
fix(server): Tier 2 permission-denied parity contract (#392) #735

Closed
bokelley wants to merge 1 commit into main from claude/issue-392-tier2-parity-contract


@bokelley
Contributor

Summary

Closes #392. Closes the structural / latency / header / observability gaps in the cross-tenant onboarding-oracle clamp left over from #375.

  • Single internal PermissionDeniedError + translator (was 4 inline envelope sites in _resolve_buyer_agent)
  • Explicit deadline-relative latency budget on every denial path (env-tunable via ADCP_PERMISSION_DENIED_BUDGET_MS, default 50 ms)
  • Audit/metric parity: uniform operation label buyer_agent_registry.permission_denied, uniform details key set, agent_url hashed-truncated via sha256(...)[:12] so log-scraping cannot rebuild the side channel
  • Header parity tradeoff documented in the module docstring — Content-Length variance is bounded by details payload size; the spec's omit-on-unestablished-identity rule prevents padding the unrecognized envelope, and the intra-recognized variance (agent_url is buyer-controlled) already exceeds the recognized-vs-unrecognized delta
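The audit-parity bullet above can be illustrated with a minimal sketch. The helper names `hash_agent_url` and `build_audit_details` are hypothetical and not taken from the PR; only the operation label, the four key names, and the `sha256(...)[:12]` truncation come from the description.

```python
import hashlib
from typing import Optional

# Operation label quoted in the PR summary.
AUDIT_OPERATION = "buyer_agent_registry.permission_denied"


def hash_agent_url(agent_url: Optional[str]) -> Optional[str]:
    """Hypothetical helper: first 12 hex chars of SHA-256, or None if no URL."""
    if agent_url is None:
        return None
    return hashlib.sha256(agent_url.encode("utf-8")).hexdigest()[:12]


def build_audit_details(
    outcome: Optional[str] = None,
    reason_scope: Optional[str] = None,
    reason_status: Optional[str] = None,
    agent_url: Optional[str] = None,
) -> dict:
    """Hypothetical helper: always emit the same four keys.

    Missing values become None instead of dropping the key, so the
    presence or absence of a field cannot distinguish the branches.
    """
    return {
        "outcome": outcome,
        "reason_scope": reason_scope,
        "reason_status": reason_status,
        "agent_url_hash": hash_agent_url(agent_url),
    }


recognized = build_audit_details(
    outcome="denied",
    reason_scope="agent",
    reason_status="suspended",
    agent_url="https://buyer.example/agent",
)
unrecognized = build_audit_details()

# Same key set on both branches; only the values differ.
assert recognized.keys() == unrecognized.keys()
```

The example values (`"denied"`, `"suspended"`, the URL) are placeholders chosen for illustration.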

Security review

Self-checks performed against the design's review points:

  1. Field-presence side channel in audit emission? The audit row's details key set is identical on every denial path — outcome, reason_scope, reason_status, agent_url_hash are all present; values are None on the unrecognized branch. Verified by test_audit_emit_uniform_details_key_set and test_audit_emit_unrecognized_carries_none_hash_not_missing_key. On the wire, details is omitted entirely on unrecognized (the empty dict on AdcpError is dropped by to_wire()), preserving the existing spec-conformance behavior.

  2. Ordering leak between audit emit and budget sleep? Identical ordering on every branch, structurally enforced by routing all branches through raise_permission_denied: capture deadline → emit audit → sleep until deadline → raise. The deadline-relative sleep (rather than fixed-duration) absorbs audit-sink variance so total wall-clock is dominated by the budget regardless of branch.

  3. Padding content leak? No padding applied — the spec's omit-on-unestablished-identity rule prevents padding the unrecognized envelope. The tradeoff is documented in the module docstring with the justification that intra-recognized agent_url length variance already exceeds the recognized-vs-unrecognized delta.
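The ordering in point 2 (capture deadline → emit audit → sleep until deadline → raise) might look roughly like the sketch below. This is not the PR's code: the signature of `raise_permission_denied`, the `AuditSink` type, and the stand-in `PermissionDeniedError` class are all assumptions made for illustration.

```python
import asyncio
import time
from typing import Awaitable, Callable, NoReturn, Optional


class PermissionDeniedError(Exception):
    """Stand-in for the PR's internal error type (assumed shape)."""


# Assumed sink interface: an async callable that receives the audit details.
AuditSink = Callable[[dict], Awaitable[None]]


async def raise_permission_denied(
    details: dict,
    audit_sink: Optional[AuditSink],
    budget_s: float = 0.050,
) -> NoReturn:
    # 1. Capture the deadline before doing any branch-dependent work.
    deadline = time.perf_counter() + budget_s

    # 2. Emit the audit row. A failing sink must not bypass the gate.
    if audit_sink is not None:
        try:
            await audit_sink(details)
        except Exception:
            pass

    # 3. Sleep only for whatever is left of the budget, so time spent in
    #    the sink is absorbed rather than added on top.
    remaining = deadline - time.perf_counter()
    if remaining > 0:
        await asyncio.sleep(remaining)

    # 4. Raise the single internal error on every path.
    raise PermissionDeniedError("permission denied")
```

Note that this sketch does not bound the sink call itself: a sink that takes longer than the budget overshoots the deadline, and total latency then depends on the branch again.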

What I'd want a security reviewer to verify next:

  • Audit-sink wiring downstream. PlatformHandler now accepts permission_denied_audit_sink — verify that the v3 reference seller / SalesAgent wires this to the same sink as the registry-cache layer so SecOps sees registry events and denial events in a single stream.
  • Budget tuning under load. The 50 ms default assumes audit-sink p99 < 30 ms. Real-world DB-backed sinks can spike higher; production rollouts should establish an SLO on the sink and document the budget setting alongside it.
  • Latency-parity test sensitivity in CI. The latency-parity test isolates raise_permission_denied from PlatformHandler setup overhead so the assertion targets the parity property rather than handler jitter. Verify the test stays sub-5ms on the CI runner profile; if CI moves to noisier hardware, the budget in that test can be raised (the production setting is independent).

Test plan

  • Latency parity p99 diff < 5 ms (tests/test_tier2_parity_contract.py::test_latency_parity_p99_difference_under_budget)
  • Header parity within ±200 bytes Content-Length (test_header_content_length_within_tolerance); two unrecognized envelopes byte-identical (test_unrecognized_envelopes_are_byte_equivalent)
  • Side-effects parity: audit operation label, key set, hash-truncated discriminator (test_audit_emit_same_operation_label_across_paths, test_audit_emit_uniform_details_key_set, test_audit_emit_agent_url_is_hashed_not_plaintext, test_audit_emit_unrecognized_carries_none_hash_not_missing_key)
  • Status code parity across 4 denial paths (test_status_code_parity_across_four_paths)
  • Audit-sink failure isolation: raising sink doesn't bypass the gate (test_raising_sink_does_not_propagate)
  • Env-var configuration: tunable, malformed-value fallback, default (test_budget_env_var_*)
  • Existing tests/test_tier2_spec_conformance.py still passes (12 tests green)
  • Full unit suite green: 4814 passed, 24 skipped, 1 xfailed
  • ruff check src/ + mypy src/adcp/ clean

🤖 Generated with Claude Code

Folds the four PERMISSION_DENIED branches in `_resolve_buyer_agent`
through a single emit point with:

- A single internal PermissionDeniedError + translator (was 4 inline
  envelope sites).
- Deadline-relative latency budget on every denial path, env-tunable
  via ADCP_PERMISSION_DENIED_BUDGET_MS (default 50 ms).
- Audit-row parity: uniform operation label, uniform key set, agent_url
  hashed-truncated to defend against log-scraping.
- Header-parity tradeoff documented (Content-Length variance within the
  spec's omit-on-unestablished-identity allowance).

Closes the structural / latency / observability gaps deferred from #375
(wire-code rename).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Contributor Author

Holding for human security review before merge — this is a 1,100-line auth refactor with explicit timing-side-channel + audit/header parity concerns. The implementing agent self-flagged that an independent security-reviewer pass is the right next step.

Not blocking the 5.5.0 release. Will ship in a subsequent release after review.

@bokelley
Contributor Author

Closing as superseded by #748.

Quick recap of why this PR's wire-level parity premise no longer holds, and where the residual value moved:

Why this PR is superseded

PR #748 (merged 2026-05-20) migrated the recognized-but-denied paths to dedicated wire codes (AGENT_SUSPENDED / AGENT_BLOCKED) with recovery="terminal". That is the spec-aligned strategy adopted by AdCP 3.1.0-beta.1 (upstream PR adcontextprotocol/adcp#3906).

This PR was built against the pre-#748 world where all four denial branches funneled through PERMISSION_DENIED with different details payloads. The whole "four branches funnel through a single translator" structural contract no longer applies — the recognized-but-denied branches now intentionally diverge on the wire via dedicated codes, and that's the correct design per the spec.

Per the protocol-expert review of #748: "recognized-but-denied path no longer carries scope='agent', so it's now indistinguishable from any vanilla PERMISSION_DENIED, not just from the unrecognized path." The very wire-level oracle this PR was closing was closed differently — and better — by the spec migration.

Independent issues this PR had

Reviewed by security-reviewer-deep and ad-tech-protocol-expert-deep. Findings that would have blocked merge regardless of the supersession:

  • Wire leak of agent_url (permission_denied.py:240): the PR ships details.agent_url in plaintext on recognized branches, directly contradicting its own indistinguishability claim. Tests at test_tier2_parity_contract.py:191-205 actively pin this divergence.
  • ADCP_PERMISSION_DENIED_BUDGET_MS=0 silently disables the timing-oracle defense — no warning, no fallback.
  • ADCP_PERMISSION_DENIED_BUDGET_MS=nan yields a nan budget (no protection); inf hangs dispatch indefinitely (DoS via asyncio.sleep(inf)).
  • DEBUG-log path leaks scope / status in plaintext when audit_sink=None, defeating the audit-hashing claim.
  • 5 s sink timeout vs 50 ms budget — 100× asymmetry reopens the timing oracle at a 5 s margin if the sink times out on one branch.
  • 48-bit hash truncation is invertible by operators with the candidate URL list — the "log-scraping cannot reconstruct" claim was overstated.
  • No transport-level integration test through ADCPAgentExecutor or build_mcp_error_result. _to_mcp in translate.py leaks the discriminator via the Details: text line.
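The hash-truncation finding above can be made concrete with a short sketch. `sha256(...)[:12]` keeps 12 hex characters, which is 48 bits, and because the hash is unkeyed, anyone who holds the list of candidate agent URLs can recompute it and match logged values. The function names and URLs below are hypothetical.

```python
import hashlib


def truncated_hash(agent_url: str) -> str:
    """Unkeyed SHA-256 truncated to 12 hex chars (48 bits), as in the PR."""
    return hashlib.sha256(agent_url.encode("utf-8")).hexdigest()[:12]


def invert(logged_hash: str, candidates: list[str]) -> list[str]:
    """Recover the URL(s) behind a logged hash by hashing every candidate."""
    return [url for url in candidates if truncated_hash(url) == logged_hash]


# Hypothetical candidate list an operator might already have.
candidates = [
    "https://agent-a.example/mcp",
    "https://agent-b.example/mcp",
    "https://agent-c.example/mcp",
]

# A value as it would appear in an audit row.
logged = truncated_hash("https://agent-b.example/mcp")

print(invert(logged, candidates))
```

The cost of the lookup is one hash per candidate, so truncation adds no meaningful protection against an operator with a registry export.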

Residual value moved to #772

The narrow follow-up at #772 captures what remains useful:

  • Latency budget on the two remaining PERMISSION_DENIED branches (unrecognized + unknown-status default-reject), with the env-var hardening this PR was missing.
  • Uniform DenialAuditRecord schema across all four denial branches (incl. the dedicated-code paths) — SecOps telemetry parity, not wire parity.
  • HMAC-or-plaintext-with-documented-threat-model for the audit agent_url (rejects 48-bit truncation).
  • Transport-level integration tests.
  • Estimated ~150 LoC vs. this PR's 1,106.
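As a rough illustration of the HMAC option in the list above, a keyed digest could replace the truncated hash. This is a sketch under assumptions, not the design in #772: the function name is hypothetical, and how the key is provisioned, stored, and rotated is left out entirely.

```python
import hashlib
import hmac


def keyed_agent_url_digest(agent_url: str, key: bytes) -> str:
    """Hypothetical helper: HMAC-SHA256 of the agent URL under a secret key.

    Without the key, a reader of the audit log cannot recompute digests
    for a candidate URL list, unlike with an unkeyed truncated hash.
    """
    return hmac.new(key, agent_url.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder key for the example only; a real key would come from a secret store.
example_key = b"example-only-not-a-real-key"
digest = keyed_agent_url_digest("https://buyer.example/agent", example_key)
```

Anyone who does hold the key can still correlate digests with URLs, which is why the list above pairs this option with a documented threat model.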

The parent issue #392 will also close, since its four-item numbered design list is largely obsoleted by #748 and the residue is covered by #772.

Thanks to the original implementing agent for the careful scoping note — the "holding for human security review" comment was the right call, and the review surfaced the supersession.

@bokelley
Contributor Author

Acknowledged — thanks for the detailed write-up. The supersession by #748 (dedicated wire codes + recovery="terminal") and the residual capture in #772 make sense. The "holding for human security review" flag in this PR did its job; the reviewer findings you listed above are exactly the kind of blockers that gate should catch. No further action needed from triage on this branch.


Triaged by Claude Code. Session: https://claude.ai/code/session_01SV2WtK4kVkhhErSe8Eeind


Generated by Claude Code

bokelley added a commit that referenced this pull request May 21, 2026
…#772) (#774)

Adds PermissionDeniedBudget to floor both PERMISSION_DENIED branches in
_resolve_buyer_agent (registry-miss / no-credential, and unknown-status
default-reject) at the same deadline. Registry I/O variance between the
two branches (cache-hit-returning-None vs. real-row read) is now absorbed
into a fixed budget rather than leaking as a latency oracle.

Scope is intentionally narrow per #772 (extracted from the larger #735
that was superseded by #748's dedicated-code migration):

- Budget applies only to PERMISSION_DENIED. AGENT_SUSPENDED / AGENT_BLOCKED
  skip the budget — the code itself is the discriminator, so latency
  parity carries no additional bit.
- Env-tunable via ADCP_PERMISSION_DENIED_BUDGET_MS (default 50 ms).
- Fails closed on misconfiguration: 0, negative, nan, inf, and non-numeric
  values fall back to the default and log WARNING. The defense is never
  silently disabled.
- Deadline-relative sleep using perf_counter() so audit/registry latency
  variance is absorbed into the budget rather than added on top.
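The fail-closed parsing described in the list above could be sketched as follows. The function name `load_budget_ms` and the log message are assumptions; only the env-var name, the 50 ms default, and the set of rejected values come from the commit message.

```python
import logging
import math
import os
from typing import Mapping

logger = logging.getLogger(__name__)

ENV_VAR = "ADCP_PERMISSION_DENIED_BUDGET_MS"
DEFAULT_BUDGET_MS = 50.0


def load_budget_ms(environ: Mapping[str, str] = os.environ) -> float:
    """Hypothetical loader: return a finite, positive budget in milliseconds."""
    raw = environ.get(ENV_VAR)
    if raw is None:
        return DEFAULT_BUDGET_MS
    try:
        value = float(raw)  # note: float() accepts "nan" and "inf"
    except ValueError:
        value = math.nan  # treat non-numeric input like any other bad value
    if not math.isfinite(value) or value <= 0:
        # Fail closed: keep the defense on at the default and say so loudly.
        logger.warning(
            "%s=%r is not a finite positive number; using default %s ms",
            ENV_VAR, raw, DEFAULT_BUDGET_MS,
        )
        return DEFAULT_BUDGET_MS
    return value
```

Checking `math.isfinite` before the sign test matters because `float("nan") <= 0` is false, so a sign check alone would let nan through.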

The audit-hygiene / DenialAuditRecord work (#772 PR B) is descoped — no
concrete SecOps consumer asking for it; can be re-filed if one surfaces.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Tier 2 commercial-identity gate: latency / headers / side-effects parity contract
