Merged
18 commits
a5f6c51
docs(feat-chat-agent): /idea-preflight ground-truth pass against post…
SoundMindsAI May 12, 2026
8858769
docs(feat-chat-agent): implementation plan (3 GPT-5.5 review cycles) …
SoundMindsAI May 12, 2026
3647cb5
feat(chat-agent): Alembic 0007_conversations_messages (Story 1.1)
SoundMindsAI May 12, 2026
5d94b5b
feat(chat-agent): ORM models Conversation + Message (Story 1.2)
SoundMindsAI May 12, 2026
54a39e3
feat(chat-agent): Conversation + Message repository (Story 1.3)
SoundMindsAI May 12, 2026
b5729c8
docs(chat-agent): Epic 1 phase-gate complete — tracker + GPT-5.5 log
SoundMindsAI May 12, 2026
9875f37
feat(chat-agent): tool registry + 5 read-only tools (Story 2.1)
SoundMindsAI May 12, 2026
b2ea590
feat(chat-agent): query-set/judgment/run_query tools + dispatch helpe…
SoundMindsAI May 12, 2026
35673ae
feat(chat-agent): study tools (Story 2.3)
SoundMindsAI May 12, 2026
0ecc4a5
feat(chat-agent): proposal/PR tools + open_pr preflight lift (Story 2.4)
SoundMindsAI May 12, 2026
a4cdcc9
feat(chat-agent): system prompt + orchestrator loop (Story 2.5)
SoundMindsAI May 12, 2026
bb05f30
feat(chat-agent): agent_chat service — sole persistence owner (Story …
SoundMindsAI May 12, 2026
9941d6e
feat(chat-agent): conversations REST + SSE endpoints (Stories 3.1 + 3.2)
SoundMindsAI May 12, 2026
6c4f4ff
feat(chat-agent): /chat surface — list page, detail page, SSE consume…
SoundMindsAI May 12, 2026
5dd70eb
docs(chat-agent): runbook + state + CLAUDE updates + 2 idea files (St…
SoundMindsAI May 12, 2026
66bfd8e
fix(chat-agent): CI green — migration head update + integration test …
SoundMindsAI May 12, 2026
fe773bb
fix(chat-agent): GPT-5.5 final-review findings F1-F4 (security + corr…
SoundMindsAI May 12, 2026
0cb4ad9
fix(infra): COPY prompts/ into Docker image + CWD-independent path re…
SoundMindsAI May 12, 2026
3 changes: 2 additions & 1 deletion CLAUDE.md
@@ -428,7 +428,7 @@ If you slip and a stub leaks into a committed file, capture it as a `bug_<slug>`
| 7 | [`feat_github_pr_worker`](docs/00_overview/implemented_features/2026_05_12_feat_github_pr_worker/) | **Complete (PR #45, merged 2026-05-12)** |
| 8 | [`feat_github_webhook`](docs/00_overview/implemented_features/2026_05_12_feat_github_webhook/) | **Complete (PR #56, merged 2026-05-12)** |
| 9 | [`feat_studies_ui`](docs/00_overview/implemented_features/2026_05_12_feat_studies_ui/) | **Complete (PR #50, pending merge)** |
-| 10 | [`feat_chat_agent`](docs/02_product/planned_features/feat_chat_agent/) | Spec approved, plan pending |
+| 10 | [`feat_chat_agent`](docs/02_product/planned_features/feat_chat_agent/) | **Complete (PR pending merge)** |
| 11 | [`feat_proposals_ui`](docs/00_overview/implemented_features/2026_05_12_feat_proposals_ui/) | **Complete (PR #58, merged 2026-05-12)** |
| 12 | [`chore_tutorial_polish`](docs/02_product/planned_features/chore_tutorial_polish/) | Spec approved, plan pending |

@@ -447,3 +447,4 @@ Run `/pipeline status` for the live view from spec dependencies.
| Local LLM (Ollama / LM Studio / vLLM / TGI) configuration | [`docs/01_architecture/llm-orchestration.md` §"OpenAI-compatible endpoints"](docs/01_architecture/llm-orchestration.md); operator-facing runbook lands with `chore_tutorial_polish` |
| LLM-as-judge worker debugging + calibration / overrides | [`docs/03_runbooks/judgment-generation-debugging.md`](docs/03_runbooks/judgment-generation-debugging.md) (`feat_llm_judgments`) |
| What data leaves the cluster on each judgment-generation call | [`docs/04_security/llm-data-flow.md`](docs/04_security/llm-data-flow.md) (`feat_llm_judgments` §15) |
+| Chat-agent debugging — replay a conversation, force a tool dispatch, inspect SSE events | [`docs/03_runbooks/agent-debugging.md`](docs/03_runbooks/agent-debugging.md) (`feat_chat_agent`) |
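The new runbook row covers inspecting SSE events. The wire events in `backend/app/agent/events.py` are framed as `event: <type>\ndata: <json>\n\n`, so a stream captured with `curl -N` can be split back into typed payloads. A minimal sketch, assuming one JSON `data:` line per frame (which holds because `json.dumps` escapes embedded newlines); `parse_sse` is a hypothetical helper, not part of this PR.

```python
import json
from collections.abc import Iterator
from typing import Any


def parse_sse(stream: str) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (event_type, payload) pairs from raw SSE text (hypothetical helper)."""
    for block in stream.split("\n\n"):
        if not block.strip():
            continue  # skip the empty tail after the final frame
        event_type = "message"  # SSE default when no `event:` field is sent
        data_lines: list[str] = []
        for line in block.split("\n"):
            field, _, value = line.partition(":")  # split on the first colon only
            if value.startswith(" "):
                value = value[1:]  # SSE strips a single leading space
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)
        if data_lines:
            yield event_type, json.loads("\n".join(data_lines))
```

Splitting on the blank line is safe here only because every payload is a single JSON line; a general SSE client would also need to handle `\r\n` line endings.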
1 change: 1 addition & 0 deletions Dockerfile
@@ -86,6 +86,7 @@ COPY --from=deps --chown=relyloop:relyloop /app/.venv /app/.venv
# needed by `uv sync` to install the project itself into the venv).
COPY --chown=relyloop:relyloop backend/ /app/backend/
COPY --chown=relyloop:relyloop migrations/ /app/migrations/
+COPY --chown=relyloop:relyloop prompts/ /app/prompts/
COPY --chown=relyloop:relyloop alembic.ini /app/alembic.ini
COPY --chown=relyloop:relyloop pyproject.toml uv.lock README.md LICENSE /app/
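The commit that adds this `COPY` also mentions a "CWD-independent path re…" change whose code is not shown in this chunk. One common way to get that property is to resolve `prompts/` relative to the module file instead of the working directory. A sketch under that assumption; `resolve_prompts_dir` and the `orchestrator.py` module name are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def resolve_prompts_dir(module_file: str | Path, levels_up: int) -> Path:
    """Locate prompts/ relative to a module file rather than the process CWD."""
    # parents[0] is the module's own directory; parents[levels_up] is the repo root.
    return Path(module_file).resolve().parents[levels_up] / "prompts"
```

For a module at `backend/app/agent/<name>.py`, `resolve_prompts_dir(__file__, levels_up=3)` yields `/app/prompts` inside the image, matching the `COPY backend/ /app/backend/` and `COPY prompts/ /app/prompts/` layout above, and the repo-root `prompts/` in a local checkout, regardless of where the process was started.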

1 change: 1 addition & 0 deletions backend/app/agent/__init__.py
@@ -0,0 +1 @@
"""Agent package — chat orchestrator + tool registry (feat_chat_agent Epic 2)."""
94 changes: 94 additions & 0 deletions backend/app/agent/confirmation.py
@@ -0,0 +1,94 @@
"""Confirmation guard primitives (feat_chat_agent Story 2.5).

* :data:`MUTATING_TOOL_NAMES`: the 7-tool set requiring confirmation per spec
  FR-5 + §19 Decision log. ``create_query_set`` is intentionally NOT on this
  list (creating an empty container is cheap to undo).
* :func:`is_affirmative`: whole-word, case-insensitive matcher against a small
  affirmative-token vocabulary.
"""

from __future__ import annotations

import re

MUTATING_TOOL_NAMES: frozenset[str] = frozenset(
    {
        "import_queries_from_csv",
        "generate_judgments_llm",
        "create_study",
        "cancel_study",
        "create_proposal_from_study",
        "create_proposal_manual",
        "open_pr",
    }
)


# Single-word and short-phrase affirmatives. Whole-word matching for the
# single-word tokens; word-boundary matching for the short phrases.
_AFFIRMATIVE_TOKENS: frozenset[str] = frozenset(
    {
        "yes",
        "y",
        "yep",
        "yeah",
        "ok",
        "okay",
        "go",
        "confirm",
        "confirmed",
        "proceed",
    }
)

_AFFIRMATIVE_PHRASES: tuple[str, ...] = (
    "go ahead",
    "do it",
    "ship it",
)


# Negation tokens that, if present, disqualify the message from being
# treated as affirmative, even if it also contains an affirmative token
# (per GPT-5.5 final-review F2: without this, "don't do it" or "no go"
# matched the affirmative-phrase substring check and unlocked dispatch).
_NEGATION_TOKENS: frozenset[str] = frozenset(
    {
        "no",
        "not",
        "don",  # don't (apostrophe stripped by the [a-z] regex)
        "dont",
        "doesn",
        "doesnt",
        "won",  # won't
        "wont",
        "cancel",
        "stop",
        "abort",
        "wait",
        "nope",
        "never",
    }
)


def is_affirmative(text: str) -> bool:
    """Return ``True`` if ``text`` reads as user affirmation of a mutating action.

    Heuristic, acceptable for MVP1; a strict state-machine confirmation can
    land at MVP2 if the heuristic misfires. Case-insensitive; whole-word
    matching on single-word tokens so "yes" matches "Yes!" but not "yesterday".
    Rejects messages containing negation tokens before checking for
    affirmation, so "don't do it", "no go", and "stop, do it later" all
    return False even though they contain affirmative phrases.
    """
    if not text:
        return False
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    if _NEGATION_TOKENS & words:
        return False
    for phrase in _AFFIRMATIVE_PHRASES:
        if re.search(rf"\b{re.escape(phrase)}\b", lowered):  # "undo it" != "do it"
            return True
    return bool(_AFFIRMATIVE_TOKENS & words)
26 changes: 26 additions & 0 deletions backend/app/agent/context.py
@@ -0,0 +1,26 @@
"""ToolContext: dependency bundle passed to every tool impl by the orchestrator.

Tools call into the service/repo layer using these. ``arq_pool`` is None when
the queue isn't connected; tools that enqueue work must raise
``QUEUE_UNAVAILABLE`` in that case (mirroring the open_pr proposal endpoint).
"""

from __future__ import annotations

from dataclasses import dataclass

from arq.connections import ArqRedis
from redis.asyncio import Redis
from sqlalchemy.ext.asyncio import AsyncSession

from backend.app.core.settings import Settings


@dataclass(frozen=True, slots=True)
class ToolContext:
    """Bundles dependencies handed to tool impls so each impl has one parameter."""

    db: AsyncSession
    redis: Redis
    arq_pool: ArqRedis | None
    settings: Settings
154 changes: 154 additions & 0 deletions backend/app/agent/events.py
@@ -0,0 +1,154 @@
r"""Stream events emitted by the orchestrator (feat_chat_agent Story 2.5).

The orchestrator is a pure async generator; it does not write to the DB.
Six event types flow through ``run_turn``:

* **Wire events** forwarded to SSE (4): :class:`TokenEvent`, :class:`ToolCallEvent`,
  :class:`ToolResultEvent`, :class:`DoneEvent`. Each carries a
  ``.to_sse_lines()`` method that produces the canonical
  ``event: <type>\ndata: <json>\n\n`` framing.
* **Persistence events** consumed internally by :func:`agent_chat.send_user_message`
  to call ``repo.create_message`` (2): :class:`AssistantMessagePersistEvent`,
  :class:`ToolMessagePersistEvent`. NOT forwarded to SSE; the visible content
  already streamed via ``TokenEvent`` / ``ToolCallEvent`` / ``ToolResultEvent``.
"""

from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any

# ---------------------------------------------------------------------------
# Wire events (forwarded to SSE)
# ---------------------------------------------------------------------------


@dataclass(frozen=True, slots=True)
class TokenEvent:
    """One streamed text delta."""

    text: str

    def to_sse_lines(self) -> str:
        """Render as canonical SSE framing."""
        return f"event: token\ndata: {json.dumps({'text': self.text})}\n\n"


@dataclass(frozen=True, slots=True)
class ToolCallEvent:
    """The LLM emitted a tool_call. Arguments are the JSON-parsed dict.

    Emitted BEFORE Pydantic validation runs, so the UI's ``<ToolCallCard>``
    renders even when the args fail validation. ``arguments`` is always a
    ``dict`` coming from ``json.loads`` of the OpenAI-supplied arguments
    string (or ``{"_raw": "<raw>"}`` if the raw string itself isn't valid
    JSON). No Python ``UUID`` objects ever enter this field, so
    ``json.dumps`` round-trips cleanly.
    """

    id: str
    name: str
    arguments: dict[str, Any]

    def to_sse_lines(self) -> str:
        """Render as canonical SSE framing."""
        payload = {"id": self.id, "name": self.name, "arguments": self.arguments}
        return f"event: tool_call\ndata: {json.dumps(payload, default=str)}\n\n"


@dataclass(frozen=True, slots=True)
class ToolResultEvent:
    """A tool dispatch terminated, either with a result or an error code."""

    id: str
    name: str
    result: dict[str, Any] | None = None
    error: str | None = None
    detail: str | None = None

    def to_sse_lines(self) -> str:
        """Render as canonical SSE framing."""
        payload: dict[str, Any] = {"id": self.id, "name": self.name}
        if self.error is not None:
            payload["error"] = self.error
            if self.detail is not None:
                payload["detail"] = self.detail
        else:
            payload["result"] = self.result
        return f"event: tool_result\ndata: {json.dumps(payload, default=str)}\n\n"


@dataclass(frozen=True, slots=True)
class DoneEvent:
    """Terminal event for a turn. Carries usage + cost on success, error code on failure."""

    conversation_id: str
    tokens_used: int | None = None
    cost_usd: float | None = None
    error: str | None = None
    iterations: int | None = None

    def to_sse_lines(self) -> str:
        """Render as canonical SSE framing."""
        payload: dict[str, Any] = {"conversation_id": self.conversation_id}
        if self.error is not None:
            payload["error"] = self.error
        else:
            if self.tokens_used is not None:
                payload["tokens_used"] = self.tokens_used
            if self.cost_usd is not None:
                payload["cost_usd"] = self.cost_usd
        return f"event: done\ndata: {json.dumps(payload, default=str)}\n\n"


# ---------------------------------------------------------------------------
# Persistence events (consumed by agent_chat, NOT forwarded to SSE)
# ---------------------------------------------------------------------------


@dataclass(frozen=True, slots=True)
class AssistantMessagePersistEvent:
    """Internal marker: agent_chat should INSERT an ``assistant``-role message.

    ``tool_calls`` is the structured list captured from the OpenAI stream
    (each item is ``{id, type, function: {name, arguments}}``) or None for
    plain text replies. ``usage`` carries the OpenAI token usage when present
    (None for the degraded-mode ``system_notice``). ``cost_usd`` is the
    computed cost from ``compute_call_cost``.
    """

    content: dict[str, Any]
    tool_calls: list[dict[str, Any]] | None = None
    usage: dict[str, int] | None = None
    cost_usd: float | None = None


@dataclass(frozen=True, slots=True)
class ToolMessagePersistEvent:
    """Internal marker: agent_chat should INSERT a ``tool``-role message."""

    tool_call_id: str
    content: dict[str, Any] = field(default_factory=dict)


StreamEvent = (
    TokenEvent
    | ToolCallEvent
    | ToolResultEvent
    | DoneEvent
    | AssistantMessagePersistEvent
    | ToolMessagePersistEvent
)
"""Union of every event the orchestrator can yield."""


__all__ = [
    "AssistantMessagePersistEvent",
    "DoneEvent",
    "StreamEvent",
    "TokenEvent",
    "ToolCallEvent",
    "ToolMessagePersistEvent",
    "ToolResultEvent",
]