Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 19 additions & 21 deletions CLAUDE.md

Large diffs are not rendered by default.

47 changes: 28 additions & 19 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,14 +1,21 @@
# RelyLoop

> **Status: alpha (MVP1, v0.1.0).** Open-source automated relevance tuning for enterprise search platforms.

RelyLoop combines an LLM-driven chat agent with an Optuna-driven optimization
loop ("Karpathy loop") to systematically tune query-time relevance on
Elasticsearch and OpenSearch. Engineers describe the problem in chat; the
agent introspects the cluster, proposes a search-space, and runs thousands
of trials against `ir_measures`-computed metrics. Winning configurations
land as Pull Requests against a central search-config Git repo, where named
approvers review and merge.
> **Status: alpha (MVP1, v0.1.0).** The only open-source tool that runs automated Bayesian search-space optimization across thousands of trials, on every major open-source search engine (Elasticsearch, OpenSearch, Apache Solr at MVP2), and ships winning configs as Pull Requests for your existing approval workflow.

A conversational LLM agent describes the problem and proposes the search
space, but the engineering moat is the loop itself, the Git-PR posture, and
the three-engine reach. RelyLoop runs **thousands of Optuna/TPE trials**
across the full query-time search space (field boosts, function scores,
fuzziness, `mm`, tie-breakers, hybrid weights — not just one slice),
evaluates each trial against `ir_measures`-computed metrics, and opens a
**Pull Request** with the winning configuration against your central
search-config Git repo. Your existing approvers and CI handle deployment;
RelyLoop never sits on the live search-serving path.

See [`docs/07_research/comparison.md`](docs/07_research/comparison.md) for
the citation-backed comparison vs OpenSearch Search Relevance Workbench,
Quepid, RRE, Chorus, and Elastic's native tooling — and why the bundle is
genuinely unique in May 2026.

## 5-minute quickstart

Expand Down Expand Up @@ -37,23 +44,25 @@ see Step 0 of the tutorial.
## What's in MVP1 / What's coming

MVP1 ships the full Karpathy loop end-to-end on Elasticsearch + OpenSearch:
chat agent, Optuna optimizer, LLM-as-judge, digest, GitHub PR worker, single-
tenant install. Observable / Production Stacks / Multi-tenant land in MVP2 →
MVP3 → MVP4.
chat agent, Optuna/TPE optimizer, LLM-as-judge, digest, GitHub PR worker,
single-tenant install. **MVP2** adds Apache Solr + UBI judgments + hybrid
UBI+LLM (bundled). **MVP3** adds local-first observability (Langfuse +
SigNoz). **GA v1** is polish + governance + hardening — no new product
surface; all six differentiators are in by MVP3.

Canonical release matrix:
[`docs/01_architecture/tech-stack.md`](docs/01_architecture/tech-stack.md) —
do not duplicate here, the matrix is the source of truth.

## Key design choices

- **Engine-agnostic** — Elasticsearch + OpenSearch in MVP1 via one adapter; Lucidworks Fusion in MVP3; pure Solr in v2.
- **Provider-agnostic** — OpenAI in MVP1; Anthropic, AWS Bedrock, Azure OpenAI, Vertex, Ollama / vLLM in MVP4.
- **Git-as-source-of-truth** — winning configs land as PRs against a central config repo; deployment is the operator's CI's job, not RelyLoop's.
- **Local-first observability** — Langfuse + SigNoz both self-hosted (MVP2+); no LLM trace data leaves the deployment VM.
- **Multi-tenant from MVP4** — single deployment serves many downstream customers in isolation.
- **Agent-first API** — every operation the in-tool orchestrator can perform is also callable by external agents; OpenAPI 3.1, idempotency keys, RFC 7807 errors, outgoing webhooks.
- **Deliberate, not real-time** — RelyLoop is for offline experimentation and change management; it does not sit on the live search-serving path.
- **Engine-neutral across the three OSS engines** — Elasticsearch + OpenSearch in MVP1 via one adapter; Apache Solr in MVP2. Lucidworks Fusion explicitly dropped (see [`chore_drop_fusion_scope/idea.md`](docs/02_product/planned_features/chore_drop_fusion_scope/idea.md)).
- **Full-search-space Bayesian/TPE optimization** — Optuna across field boosts, function scores, fuzziness, `mm`, tie-breakers, hybrid weights, LTR rescoring. Not a 66-cell grid over hybrid weights alone (the only thing OpenSearch SRW's optimizer covers today).
- **Git-as-source-of-truth** — winning configs land as PRs against a central config repo; deployment is the operator's CI's job, not RelyLoop's. OpenSearch SRW has no apply path by explicit RFC choice; this is a stable differentiator.
- **Single-endpoint LLM flexibility** — one env var (`OPENAI_BASE_URL`) is the entire LLM integration surface. Works against any OpenAI-compatible endpoint: OpenAI cloud, Ollama (local), LM Studio (local), vLLM (local or remote), HuggingFace TGI, Azure OpenAI's OpenAI-compatible mode, OpenRouter (multi-model routing), or LiteLLM proxy in front of Bedrock / Vertex / Anthropic native. Truly air-gapped deployments run RelyLoop against Ollama on the same VM with zero data leaving the network. See [`docs/08_guides/llm-endpoint-setup.md`](docs/08_guides/llm-endpoint-setup.md). Native non-OpenAI provider SDKs are in the backlog as an ergonomics upgrade — the unblocking pattern (LiteLLM proxy or OpenRouter) covers most adopters today.
- **Local-first observability** — Langfuse + SigNoz both self-hosted (MVP3); no LLM trace data leaves the deployment VM.
- **Single-tenant through GA v1** — multi-tenancy is in the backlog; SSO via reverse proxy is the recommended path for now.
- **Deliberate, not real-time** — RelyLoop is for offline experimentation and change management; it does not sit on the live search-serving path. Online learning / bandits / production-quality monitoring are a v2 Path B direction.

See spec §4 (non-goals) for the full set.

Expand Down
14 changes: 8 additions & 6 deletions architecture.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,9 @@ RelyLoop is an **off-line** relevance-tuning tool for enterprise search
platforms. The architecture has four cooperating layers:

1. **Adapter** — a thin Protocol behind which engine differences
(Elasticsearch / OpenSearch / Lucidworks Fusion) and provider differences
(OpenAI / Anthropic / Bedrock / Ollama / Vertex) are isolated.
(Elasticsearch / OpenSearch in MVP1; Apache Solr in MVP2) and LLM
provider differences (OpenAI-compatible endpoints today; Anthropic /
Bedrock / Vertex / Azure OpenAI in the backlog) are isolated.
2. **Domain** — pure Python (no I/O): study state machine, search-space
rules, query rendering, evaluator helpers.
3. **Service** — orchestrators (study runner, judgment generation, digest,
Expand All @@ -29,7 +30,7 @@ tests, and never modifies cluster schema/mapping/analyzer settings.
| [`mvp1-overview.md`](docs/01_architecture/mvp1-overview.md) | The MVP1 reading guide — start here if you're new |
| [`tech-stack.md`](docs/01_architecture/tech-stack.md) | Languages, frameworks, lockfiles, code organization, **canonical release matrix** |
| [`system-overview.md`](docs/01_architecture/system-overview.md) | Service inventory, how containers fit together |
| [`deployment.md`](docs/01_architecture/deployment.md) | Compose layout, secrets pattern, MVP1→MVP4 deployment evolution |
| [`deployment.md`](docs/01_architecture/deployment.md) | Compose layout, secrets pattern, MVP1→GA v1 deployment evolution |
| [`api-conventions.md`](docs/01_architecture/api-conventions.md) | Endpoint conventions, error envelope, pagination, idempotency |
| [`data-model.md`](docs/01_architecture/data-model.md) | Per-table column-level reference; lineage; future audit_log |
| [`adapters.md`](docs/01_architecture/adapters.md) | The `SearchAdapter` Protocol shape |
Expand Down Expand Up @@ -67,9 +68,10 @@ expected to honor them. The full text lives in [`CLAUDE.md`](CLAUDE.md):

1. **Never commit directly to `main`** — feature branches + PRs only.
2. **Secrets via mounted files** (`*_FILE` env vars), never bare env vars.
3. LLM calls go through the `BaseChatModel` abstraction once it lands at
MVP4; until then services may use the `openai` SDK directly but always
read model + base URL from `Settings`.
3. LLM calls go through the `BaseChatModel` abstraction once it lands
(backlog item — native non-OpenAI provider SDKs); until then services
may use the `openai` SDK directly but always read model + base URL
from `Settings`.
4. **Engine-specific code lives only in `backend/app/adapters/<engine>.py`**
— the orchestrator and study runner consume the unified `SearchAdapter`
Protocol.
Expand Down
6 changes: 3 additions & 3 deletions docs/00_overview/DASHBOARD.md
Original file line number Diff line number Diff line change
@@ -1,13 +1,13 @@
# RelyLoop — Release Roadmap

_Top-level index across MVP1 → GA v1+ as of **2026-05-27**. Click a release name to drill into the per-release dashboard. Theme labels sourced from [`docs/01_architecture/tech-stack.md` §"Canonical release matrix"](../01_architecture/tech-stack.md). For the rich local view, open [`dashboard.html`](dashboard.html) in a browser._
_Top-level index across MVP1 → GA v1+ as of **2026-05-28**. Click a release name to drill into the per-release dashboard. Theme labels sourced from [`docs/01_architecture/tech-stack.md` §"Canonical release matrix"](../01_architecture/tech-stack.md). For the rich local view, open [`dashboard.html`](dashboard.html) in a browser._

## Releases

| Release | Theme | Progress | Status |
|---|---|---|---|
| [MVP1 / v0.1](MVP1_DASHBOARD.md) | The Loop | 88 / 89 scoped done · 14 remaining | **In progress** |
| [MVP1.5 / v0.1.5](MVP1_5_DASHBOARD.md) | Real Signals | 1 item(s) queued | **Held / queued** |
| [MVP1 / v0.1](MVP1_DASHBOARD.md) | The Loop | 88 / 89 scoped done · 15 remaining | **In progress** |
| MVP1.5 / v0.1.5 | Real Signals | | **Not yet scoped** |
| [MVP2 / v0.2](MVP2_DASHBOARD.md) | Observable | 1 / 1 scoped done · 1 remaining | **In progress** |
| MVP3 / v0.3 | Production Stacks | — | **Not yet scoped** |
| MVP4 / v0.4 | Multi-tenant, Multi-LLM | — | **Not yet scoped** |
Expand Down
Loading
Loading