feat(AGX1-272): dual-write agent_api_keys to spark-authz behind FGAC flag#248

Open
dm36 wants to merge 2 commits into main from dhruv/agx1-272-agent-api-keys-dual-write

Conversation


dm36 commented May 26, 2026

Summary

Mirrors the AGX1-274 task dual-write pattern (#246) for agent_api_keys, with the parent_agent cascade properly wired end-to-end. When the per-account FGAC_AGENT_API_KEYS_DUAL_WRITE flag is on, every agent_api_key create calls register_resource(api_key, parent=agent) (writing both the owner edge AND the parent_agent edge atomically in SpiceDB) BEFORE the Postgres write; every delete best-effort calls deregister_resource after.
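The ordering described above (register before the Postgres write, deregister after the delete) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `FakeAuthz`, the dict-backed `db`, and the function signatures are hypothetical stand-ins for the authorization service and the repository.

```python
import asyncio


class RegistrationError(Exception):
    """Raised by the stand-in authz client when a call fails."""


class FakeAuthz:
    """Hypothetical stand-in for the authorization service."""

    def __init__(self, fail_register=False, fail_deregister=False):
        self.fail_register = fail_register
        self.fail_deregister = fail_deregister
        self.registered = {}  # api_key_id -> parent agent_id

    async def register_resource(self, api_key_id, parent):
        if self.fail_register:
            raise RegistrationError("authz unavailable")
        self.registered[api_key_id] = parent

    async def deregister_resource(self, api_key_id):
        if self.fail_deregister:
            raise RegistrationError("authz unavailable")
        self.registered.pop(api_key_id, None)


async def create_api_key(db, authz, api_key_id, agent_id, flag_on):
    # Register first: if this raises, the row is never written (fail-closed).
    if flag_on:
        await authz.register_resource(api_key_id, parent=agent_id)
    db[api_key_id] = agent_id


async def delete_api_key(db, authz, api_key_id, flag_on):
    # Delete the row first; authz cleanup is best-effort and never blocks.
    db.pop(api_key_id, None)
    if flag_on:
        try:
            await authz.deregister_resource(api_key_id)
        except Exception as exc:
            print(f"deregister failed for {api_key_id}: {exc}")
```

With this shape, a registration failure leaves no Postgres row behind, while a deregistration failure can leave a stale authz entry that a later reconcile would need to clean up.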

PR A of two — PR B (AGX1-273, #252) migrates the api_key routes to authoritative auth checks against Spark.

Consolidation note

This PR was originally split into two: the dual-write (#248 itself, using grant()) and a follow-up parent-edge fix (formerly #253, using register_resource(parent=...)). Reviewer surfaced that grant() alone doesn't write the parent_agent edge — and without it, the SpiceDB schema's read = ... & parent_agent->read & ... cascade fails closed for every reader, even the owner.

Folded #253's commit into this branch via cherry-pick. PR #253 has been closed. The use case now calls register_resource(parent=AgentexResource.agent(agent_id)) instead of grant(), which writes both edges in one round-trip.
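To see why the missing parent_agent edge fails closed even for the owner, here is a toy model of an intersection permission. It is a deliberate simplification: the elided terms of the real schema are not modelled, the agent's read permission is assumed to reduce to ownership, and the tuple layout is hypothetical.

```python
# Relationship tuples are modelled as (resource, relation, subject).

def agent_read(tuples, agent, user):
    # Assumption: the agent's read permission is reduced to "is owner".
    return (agent, "owner", user) in tuples


def api_key_read(tuples, api_key, user):
    is_owner = (api_key, "owner", user) in tuples
    parents = [
        subj for (res, rel, subj) in tuples
        if res == api_key and rel == "parent_agent"
    ]
    # The arrow term walks parent_agent edges; with no edges it matches nothing.
    via_parent = any(agent_read(tuples, parent, user) for parent in parents)
    # Intersection: both terms must hold.
    return is_owner and via_parent


# What grant() alone would write: owner edges, but no parent_agent edge.
grant_only = {
    ("agent:a1", "owner", "user:u1"),
    ("api_key:k1", "owner", "user:u1"),
}
# What register_resource(parent=...) adds on top.
with_parent = grant_only | {("api_key:k1", "parent_agent", "agent:a1")}

print(api_key_read(grant_only, "api_key:k1", "user:u1"))
print(api_key_read(with_parent, "api_key:k1", "user:u1"))
```

In the grant-only case the owner term holds but the parent term has nothing to traverse, so the intersection is false.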

Cross-repo stack

| # | Repo | PR | Purpose | State |
|---|---|---|---|---|
| 1 | scaleapi/scaleapi | #144657 | sgp-authz 0.6.0 → 0.7.0; parent_resource kwarg on register_resource + cross-tenant validation | Draft |
| 2 | scaleapi/agentex | #354 | agentex-auth Port + Spark adapter + HTTP routes + api_key mapping; pin to 0.7.0 | Draft |
| 3 (this PR) | scaleapi/scale-agentex | this | scale-agentex Port + use case calling register_resource(parent=agent) | Draft |

Stack must merge in order: #1 → #2 → this.

Changes

Dual-write infrastructure (commit 7e4d50e)

  • Migration agent_api_keys: add creator_user_id, creator_service_account_id, spark_authz_zedtoken. CHECK constraint + CREATE INDEX CONCURRENTLY per Asher's task migration.
  • ORM / Entity: same three fields on AgentAPIKeyORM and AgentAPIKeyEntity.
  • Schema: AgentexResourceType.api_key, AgentexResource.api_key(...), AgentexResourceOptionalSelector.api_key(...).
  • src/utils/feature_flags.py (NEW): FeatureFlagProvider env-var allowlist; flags FGAC_TASKS, FGAC_TASKS_DUAL_WRITE, FGAC_AGENT_API_KEYS_DUAL_WRITE. Matches the shape of Asher's #246. If #246 lands first this becomes a rebase concern.
  • AgentAPIKeysUseCase: accepts authorization_service + feature_flags; threads account_id through create / delete*; best-effort revoke-after-delete that logs but doesn't block; creator_user_id / creator_service_account_id read from principal_context.
  • Routes: create_api_key, delete_agent_api_key, delete_agent_api_key_by_name resolve account_id from authorization_service.principal_context and pass it down. No read-side auth checks (PR B scope).
  • Tests: tests/integration/services/test_agent_api_key_service_dual_write.py mirrors test_task_service_dual_write.py with 8 cases.
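A per-account env-var allowlist of the kind described for FeatureFlagProvider could look roughly like the sketch below. The `is_enabled(flag, account_id)` call shape appears in this PR's sequence diagram; the `_ACCOUNT_IDS` env-var suffix and the comma-separated format are assumptions made for illustration, not the PR's actual convention.

```python
import os
from enum import Enum


class FeatureFlag(str, Enum):
    FGAC_TASKS = "FGAC_TASKS"
    FGAC_TASKS_DUAL_WRITE = "FGAC_TASKS_DUAL_WRITE"
    FGAC_AGENT_API_KEYS_DUAL_WRITE = "FGAC_AGENT_API_KEYS_DUAL_WRITE"


class FeatureFlagProvider:
    """Sketch: a flag is on for an account if the account is in the allowlist."""

    def is_enabled(self, flag, account_id):
        if account_id is None:
            return False
        # Hypothetical naming: env key derived from the flag name on each call.
        raw = os.environ.get(f"{flag.value}_ACCOUNT_IDS", "")
        allowlist = {part.strip() for part in raw.split(",") if part.strip()}
        return account_id in allowlist
```

Reading the environment on every call keeps the provider stateless, at the cost of re-parsing the allowlist each time.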

Parent-edge work (commit 9668f1a, folded in from #253)

  • src/adapters/authorization/port.py — abstract register_resource(resource, parent=None) and deregister_resource(resource) on AuthorizationGateway.
  • src/adapters/authorization/adapter_agentex_authz_proxy.py — POST /v1/authz/register and /v1/authz/deregister to agentex-auth (these endpoints are added in #354).
  • src/domain/services/authorization_service.py: register_resource / deregister_resource service methods mirror the grant/revoke pattern (principal_context override, _bypass support, parent identity in log line for cascade debugging).
  • AgentAPIKeysUseCase:
    • Swap grant → register_resource in _register_api_key_in_spark_authz, passing parent=AgentexResource.agent(agent_id). The parent edge is load-bearing.
    • Swap revoke → deregister_resource in _deregister_api_key_from_spark_authz. Atomically removes the resource + all of its relationships.
    • except Exception wrappers: fail-closed on register, best-effort on deregister.
  • Tests: rename mocks (grant/revoke → register_resource/deregister_resource); add explicit assertion that parent=AgentexResource.agent(agent.id) is passed correctly on every create (the contract that prevents silently dropping the parent edge in future changes).
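The port contract listed above can be illustrated with an abstract gateway plus an in-memory fake. Only the two method signatures come from this PR; the `Resource` dataclass, the `InMemoryGateway` class, and the edge representation are hypothetical and exist purely to show the intended semantics.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical simplified resource reference."""
    type: str
    selector: str


class AuthorizationGateway(ABC):
    @abstractmethod
    async def register_resource(self, resource, parent=None):
        """Write the owner edge and, if given, the parent edge."""

    @abstractmethod
    async def deregister_resource(self, resource):
        """Remove the resource and all of its relationships."""


class InMemoryGateway(AuthorizationGateway):
    """Test double: stores edges as (resource, relation, subject) tuples."""

    def __init__(self, principal):
        self.principal = principal
        self.edges = set()

    async def register_resource(self, resource, parent=None):
        new_edges = {(resource, "owner", self.principal)}
        if parent is not None:
            new_edges.add((resource, f"parent_{parent.type}", parent))
        # Both edges land in a single update, mimicking the atomic write.
        self.edges |= new_edges

    async def deregister_resource(self, resource):
        self.edges = {edge for edge in self.edges if edge[0] != resource}
```

A fake like this lets use-case tests assert on the resulting edges rather than only on which mock method was called.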

Structural divergence from #246

  • agent_api_keys have no service layer, so the dual-write logic lives in AgentAPIKeysUseCase rather than a parallel AgentAPIKeyService. Keeps the call sites simple — open to refactoring to a service layer if reviewers want strict layering parity with tasks.

Test plan

  • 8 / 8 dual-write integration tests pass locally (pytest agentex/tests/integration/services/test_agent_api_key_service_dual_write.py). New assertion in test_create_api_key_calls_grant_when_flag_on pins the parent edge contract:
    registered_parent = register.await_args.kwargs["parent"]
    assert registered_parent.type == AgentexResourceType.agent
    assert registered_parent.selector == agent.id
  • All other dual-write semantics preserved: flag-on, flag-off, no-creator skip, register-failure-prevents-row, deregister-failure-does-not-block-delete.
  • CI: ruff, ruff-format, alembic migration lint passed locally on commit.
  • Verify alembic upgrade/downgrade against a clean DB.
  • End-to-end against a real SpiceDB once the full cross-repo stack is merged and AUTH_PROVIDER=spark is exercised in dev.
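The parent-edge assertion in the test plan relies on `unittest.mock.AsyncMock` recording the awaited call. A self-contained version of that pattern, with `SimpleNamespace` objects standing in for AgentexResource and a hypothetical `create_key` helper in place of the use case, might look like this:

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


async def create_key(authz, agent_id, key_id):
    # Stand-in for the use case: always passes the agent as parent.
    parent = SimpleNamespace(type="agent", selector=agent_id)
    resource = SimpleNamespace(type="api_key", selector=key_id)
    await authz.register_resource(resource, parent=parent)


authz = SimpleNamespace(register_resource=AsyncMock(return_value=None))
asyncio.run(create_key(authz, "agent-1", "key-1"))

register = authz.register_resource
register.assert_awaited_once()
# await_args holds the call made on the most recent await.
registered_parent = register.await_args.kwargs["parent"]
assert registered_parent.type == "agent"
assert registered_parent.selector == "agent-1"
```

Reading the keyword from `await_args.kwargs` means the assertion fails loudly if a future change passes the parent positionally or drops it.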

Out of scope / follow-ups

  • ZedToken plumbing — adapter returns None from register_resource; column stays NULL for now (same as tasks). A follow-up will surface the token once the adapter exposes it.
  • Backfill for existing api_keys created without the parent edge. Reconcile job tracked separately.
  • Other resource types' grant → register_resource swap (task, build, deployment, schedule). Each owns its own follow-up; Asher's task PR #246 has the same parent_agent gap and is the highest-priority follow-up.

Linked: AGX1-272.
Closes #253 (consolidated here).

🤖 Generated with Claude Code

Greptile Summary

This PR introduces dual-write of agent_api_keys to Spark AuthZ behind a per-account FGAC_AGENT_API_KEYS_DUAL_WRITE feature flag, mirroring the task dual-write pattern from PR #246. The dual-write logic lives in AgentAPIKeysUseCase (no dedicated service layer), with grant-before-write on create and best-effort deregister-after-delete.

  • Migration adds creator_user_id, creator_service_account_id, and spark_authz_zedtoken columns to agent_api_keys, creates CONCURRENTLY indexes in an autocommit_block, and enforces a CHECK constraint ensuring at most one creator identity is set.
  • FeatureFlagProvider (new src/utils/feature_flags.py) resolves per-account flags via env-var allowlists; the env key is derived from the flag name at call time so no caching layer is needed.
  • AgentAPIKeysUseCase calls register_resource (owner + parent edge atomically) before the Postgres write and best-effort deregister_resource after the delete; eight new integration tests cover the main flag-on/off and failure paths.
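The "at most one creator identity" CHECK constraint can be demonstrated with SQLite standing in for Postgres. The column names follow the migration described above; the constraint name and the exact predicate are assumptions, and CREATE INDEX CONCURRENTLY is omitted because SQLite has no equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE agent_api_keys (
        id TEXT PRIMARY KEY,
        creator_user_id TEXT,
        creator_service_account_id TEXT,
        spark_authz_zedtoken TEXT,
        -- Hypothetical constraint name; allows zero or one creator column.
        CONSTRAINT ck_single_creator CHECK (
            creator_user_id IS NULL OR creator_service_account_id IS NULL
        )
    )
    """
)


def insert(key_id, user_id=None, service_account_id=None):
    conn.execute(
        "INSERT INTO agent_api_keys "
        "(id, creator_user_id, creator_service_account_id) VALUES (?, ?, ?)",
        (key_id, user_id, service_account_id),
    )


insert("k1")                             # no creator: allowed
insert("k2", user_id="u1")               # user creator: allowed
insert("k3", service_account_id="sa1")   # service-account creator: allowed
```

Setting both creator columns on one row violates the constraint and raises `sqlite3.IntegrityError`.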

Confidence Score: 5/5

Safe to merge — all new code paths are guarded by a per-account feature flag, all new columns are nullable with no schema breakage, and the dual-write contract is correctly implemented and well-tested.

The grant-before-write and best-effort-deregister patterns are implemented consistently and match the documented design. The migration uses CONCURRENTLY indexes with IF NOT EXISTS and a correct autocommit_block, safe to run against a live database. The feature flag short-circuits everything when disabled, keeping existing behaviour unchanged for all non-FGAC accounts. No logic defects were found beyond what has already been discussed in earlier review threads.

No files require special attention.

Important Files Changed

Filename Overview
agentex/database/migrations/alembic/versions/2026_05_26_1200_add_agent_api_key_creator_and_zedtoken_b2c84edb77d6.py Adds three nullable columns, creates CONCURRENTLY indexes in an autocommit_block, and adds a single-creator CHECK constraint; downgrade reverses cleanly.
agentex/src/domain/use_cases/agent_api_keys_use_case.py Adds dual-write logic: register_resource before DB create (fail-closed), deregister_resource after DB delete (best-effort); creator identity sourced from principal_context.
agentex/src/utils/feature_flags.py New FeatureFlagProvider reads per-account allowlists from env vars; env key derived from flag name at call time; straightforward and correct.
agentex/src/api/routes/agent_api_keys.py Delete and create routes now resolve account_id from principal_context and pass it to the use case; clean and symmetric.
agentex/src/domain/services/authorization_service.py Adds register_resource and deregister_resource methods to AuthorizationService, delegating to the gateway with bypass logic and logging; well-structured.
agentex/tests/integration/services/test_agent_api_key_service_dual_write.py Eight integration tests covering flag-on/off, grant failure preventing DB write, deregister failure not blocking delete, and no-creator no-op path; good coverage of the main branches.
agentex/src/adapters/orm.py Adds creator_user_id, creator_service_account_id (index=True), and spark_authz_zedtoken columns to AgentAPIKeyORM; index names match those created in the migration.
agentex/src/adapters/authorization/adapter_agentex_authz_proxy.py Implements register_resource and deregister_resource via /v1/authz/register and /v1/authz/deregister endpoints; parent is serialized when present.
agentex/src/adapters/authorization/port.py Adds register_resource and deregister_resource abstract methods to AuthorizationGateway; well-documented with the parent-edge requirement.
agentex/src/domain/entities/agent_api_keys.py Adds creator_user_id, creator_service_account_id, and spark_authz_zedtoken fields to AgentAPIKeyEntity; all nullable with clear descriptions.

Sequence Diagram

sequenceDiagram
    participant R as Route Handler
    participant UC as AgentAPIKeysUseCase
    participant FF as FeatureFlagProvider
    participant AS as AuthorizationService
    participant DB as AgentAPIKeyRepo

    Note over R,DB: CREATE flow
    R->>UC: create(name, agent_id, api_key, account_id)
    UC->>FF: is_enabled(FGAC_AGENT_API_KEYS_DUAL_WRITE, account_id)
    alt flag ON and creator resolvable
        UC->>AS: "register_resource(api_key_id, parent=agent_id)"
        AS-->>UC: None
    else flag OFF or no creator
        UC-->>UC: skip Spark registration
    end
    UC->>DB: create(AgentAPIKeyEntity)
    DB-->>UC: persisted entity
    UC-->>R: AgentAPIKeyEntity

    Note over R,DB: DELETE-by-ID flow
    R->>UC: delete(id, account_id)
    UC->>DB: delete(id)
    DB-->>UC: ok
    UC->>FF: is_enabled(FGAC_AGENT_API_KEYS_DUAL_WRITE, account_id)
    alt flag ON
        UC->>AS: deregister_resource(api_key_id)
        AS-->>UC: ok or logged failure
    end
    UC-->>R: void

    Note over R,DB: DELETE-by-name flow
    R->>UC: delete_by_agent_id_and_key_name(agent_id, key_name, account_id)
    UC->>DB: get_by_agent_id_and_name(agent_id, key_name)
    DB-->>UC: existing or None
    UC->>DB: delete_by_agent_id_and_key_name(agent_id, key_name)
    DB-->>UC: ok
    alt existing found AND flag ON
        UC->>AS: deregister_resource(existing.id)
        AS-->>UC: ok or logged failure
    end
    UC-->>R: void

Comments Outside Diff (1)

  1. agentex/tests/integration/services/test_agent_api_key_service_dual_write.py, lines 703-727

    P2 Missing test for grant-succeeds-but-DB-create-fails (orphan Spark tuple)

    The suite covers grant failure preventing the DB write, but not the reverse: grant succeeding followed by repo.create raising (e.g., a duplicate key or transient DB error). In that case a Spark tuple exists for an api_key_id that never lands in Postgres. Adding a test for this case would make the documented known-limitation explicit and guard against accidental regression if compensating logic is added later.
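One possible shape for such a test is sketched below. It uses a simplified stand-in for the use case (`create_api_key` here is hypothetical and takes plain IDs) and pins the current behaviour: registration has happened, the DB write fails, and no compensating deregister is issued.

```python
import asyncio
from unittest.mock import AsyncMock


async def create_api_key(authz, repo, key_id, agent_id):
    # Simplified stand-in for the use case: register first, then persist.
    await authz.register_resource(key_id, parent=agent_id)
    return await repo.create(key_id, agent_id)


def test_register_succeeds_but_db_create_fails():
    authz = AsyncMock()
    repo = AsyncMock()
    repo.create.side_effect = RuntimeError("duplicate key")

    try:
        asyncio.run(create_api_key(authz, repo, "key-1", "agent-1"))
        raised = False
    except RuntimeError:
        raised = True

    assert raised
    # The authz entry was written before the failure...
    authz.register_resource.assert_awaited_once_with("key-1", parent="agent-1")
    # ...and nothing cleans it up today (the known orphan limitation).
    authz.deregister_resource.assert_not_awaited()
```

If compensating logic is added later, the last assertion is the one that would need to flip.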


Reviews (2): Last reviewed commit: "feat: register_resource with parent edge..."

…AGENT_API_KEYS_DUAL_WRITE flag

Mirrors the AGX1-274 task dual-write pattern (PR #246) for agent_api_keys.

- Adds creator_user_id / creator_service_account_id / spark_authz_zedtoken
  columns to agent_api_keys, with CHECK constraint and concurrent indexes.
- On create, when FGAC_AGENT_API_KEYS_DUAL_WRITE is enabled for the caller's
  account, calls authorization_service.grant(AgentexResource.api_key(id))
  BEFORE the Postgres write. Grant failure aborts the create.
- On delete, best-effort revoke after the Postgres delete. Failures are
  logged but do not block the delete.
- Adds AgentexResourceType.api_key and AgentexResource.api_key(...) factory.
- Creates src/utils/feature_flags.py with both FGAC_TASKS_DUAL_WRITE and
  FGAC_AGENT_API_KEYS_DUAL_WRITE (file does not exist on main yet; if PR #246
  lands first this becomes a rebase concern).

Structural divergence from tasks: agent_api_keys have no service layer, so
the dual-write logic lives in AgentAPIKeysUseCase rather than a separate
service. This keeps the call site simple and avoids inventing a new layer.

Route layer (read-side auth checks) is out of scope; that's PR B (AGX1-273).
agentex-auth spark_mapping.py update is a sibling-repo concern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dm36 marked this pull request as ready for review May 26, 2026 21:31
dm36 requested a review from a team as a code owner May 26, 2026 21:31
Comment on lines +237 to +246
existing = await self.agent_api_key_repo.get_by_agent_id_and_name(
    agent_id=agent_id, name=key_name, api_key_type=api_key_type
)
await self.agent_api_key_repo.delete_by_agent_id_and_key_name(
    agent_id=agent_id, key_name=key_name, api_key_type=api_key_type
)
if existing is not None:
    await self._deregister_api_key_from_spark_authz(
        api_key_id=existing.id, account_id=account_id
    )

P1 Unconditional pre-fetch adds a DB round-trip on every delete-by-name regardless of flag state

get_by_agent_id_and_name is called before the delete on every invocation, even when account_id is None or the FGAC_AGENT_API_KEYS_DUAL_WRITE flag is off for this account. _deregister_api_key_from_spark_authz will return immediately in those cases, making the existing fetch pure overhead. The same applies to delete_by_agent_name_and_key_name. Gate the pre-fetch behind a flag check to avoid the extra query on every delete-by-name in non-FGAC accounts.
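A gated version of the delete-by-name path might look like the sketch below. The function is a free-standing simplification with hypothetical parameters (the real method lives on the use case and also passes api_key_type); it only illustrates skipping the lookup when the flag is off.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock, MagicMock

FLAG = "FGAC_AGENT_API_KEYS_DUAL_WRITE"


async def delete_by_name(repo, authz, flags, agent_id, key_name, account_id):
    # Decide up front whether the Spark deregistration can happen at all.
    dual_write = account_id is not None and flags.is_enabled(FLAG, account_id)

    existing = None
    if dual_write:
        # Only pay for the lookup when its result will be used.
        existing = await repo.get_by_agent_id_and_name(
            agent_id=agent_id, name=key_name
        )

    await repo.delete_by_agent_id_and_key_name(
        agent_id=agent_id, key_name=key_name
    )

    if existing is not None:
        try:
            await authz.deregister_resource(existing.id)
        except Exception as exc:
            # Best-effort: log and continue, the row is already gone.
            print(f"deregister failed for {existing.id}: {exc}")
```

The trade-off is that the flag is now evaluated in the use-case method as well as inside the deregister helper, so the two checks need to stay consistent.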


Comment thread agentex/src/domain/use_cases/agent_api_keys_use_case.py
Closes the parent_agent cascade gap surfaced on scaleapi/agentex#354.
The api_key dual-write (AGX1-272, PR #248) currently calls grant() which
writes the owner edge in SpiceDB but NOT the parent_agent edge. The
agent_api_key schema requires `read = ... & parent_agent->read & ...`,
so every downstream read/update fails closed without that edge.

This PR adds register_resource/deregister_resource (Port + adapter + service)
and swaps the api_keys use case from grant→register_resource with
parent=AgentexResource.agent(agent_id). Now the owner edge and parent_agent
edge are written atomically.

Stack:
- scaleapi/scaleapi#144657 — sgp-authz 0.7.0 (parent_resource kwarg).
- scaleapi/agentex#355 — agentex-auth Port + adapter + HTTP routes.
- #248 — original AGX1-272 dual-write (this stacks on it).
- THIS PR — extends #248 to use the parent-aware path.

Changes:
- Port: abstract register_resource(resource, parent=None) and
  deregister_resource(resource).
- Adapter proxy: POST /v1/authz/register and /v1/authz/deregister.
- Service: mirror existing grant/revoke pattern (principal_context override,
  _bypass support, parent in log line for cascade debugging).
- Use case: swap grant→register_resource passing parent=agent;
  swap revoke→deregister_resource. except Exception wrappers preserved
  (fail-closed on register, best-effort on deregister).
- Tests: rename mocks to register_resource/deregister_resource; assert the
  parent edge is passed correctly.

Test plan:
- pytest agentex/tests/integration/services/test_agent_api_key_service_dual_write.py
  → 8 / 8 pass.
- New test ``test_create_api_key_calls_grant_when_flag_on`` asserts
  parent.type == AgentexResourceType.agent and parent.selector == agent.id.

Other resource types' grant→register_resource swap is out of scope.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>