Skip to content

fix(signing): Argus follow-ups on #789 — IDNA + IP literals, Protocol contract, misconfig warnings, diagnostic hygiene#798

Merged
bokelley merged 5 commits into
mainfrom
bokelley/signing-followups-argus
May 22, 2026
Merged

fix(signing): Argus follow-ups on #789 — IDNA + IP literals, Protocol contract, misconfig warnings, diagnostic hygiene#798
bokelley merged 5 commits into
mainfrom
bokelley/signing-followups-argus

Conversation

@bokelley
Copy link
Copy Markdown
Contributor

Summary

All non-blocking follow-up items from the two Argus reviews on PR #789, bundled into one PR with four cohesive commits so reviewers can read the story incrementally.

# Commit Items
1 1bda50d2 IDNA hygiene: centralized helper, IP literal short-circuit, transitional=False explicit, single trailing-dot strip, bare IPv6 brackets
2 6210c8bf BrandSourcedJwksResolver Protocol (runtime_checkable), tightened ClassVar[Literal["brand_json"]], moved _BrandJsonStaticJwksResolver above its caller
3 e5457a70 UserWarning + DeprecationWarning on misconfig (brand-json source without origins; legacy resolver with origins set)
4 733ed4e4 _extract_key_origins 512-byte per-entry cap, cold-cache jwks_uri=NoneREQUEST_SIGNATURE_JWKS_UNAVAILABLE, host-only diagnostic fallback on canonicalization failure

Eleven items handled

From Argus's first-pass review on #789:

  • NICE Pass transitional=False explicitly at all IDNA callsites → commit 1 (centralized to one helper site).
  • FOLLOW-UP UserWarning when expected_key_origins is None AND jwks_source == "brand_json" → commit 3.
  • FOLLOW-UP DeprecationWarning when origins supplied AND no jwks_source → commit 3.
  • FOLLOW-UP Cold-cache jwks_uri=NoneREQUEST_SIGNATURE_JWKS_UNAVAILABLE → commit 4.
  • NIT ClassVar[Literal["brand_json", "publisher_pin"]]Literal["brand_json"] → commit 2.

From Argus's second-pass review on #789:

  • FOLLOW-UP IDNA + IP literals (idna.encode("192.0.2.1") raises in IDNA-2008) → commit 1.
  • FOLLOW-UP _origin_host trailing-dot stripping (rstrip(".") vs docstring's "single trailing dot") → commit 1.
  • FOLLOW-UP runtime_checkable Protocol for the discriminant → commit 2.
  • FOLLOW-UP Move _BrandJsonStaticJwksResolver above its caller → commit 2.
  • NIT Per-entry length cap on _extract_key_origins → commit 4.
  • NIT expected_origin/actual_origin raw-URL fallback shape → commit 4.

Surface changes (additive, no breaking)

  • New module: adcp.signing._idna_canonicalize — private helper consolidating the four prior IDNA callsites (jwks.py, ip_pinned_transport.py, revocation_fetcher.py, key_origins.py).
  • New Protocol: adcp.signing.BrandSourcedJwksResolver — runtime_checkable, declares the jwks_source: ClassVar[Literal["brand_json"]] + jwks_uri: str contract. Adopters wiring custom resolvers conform by setting the two attributes; no inheritance required.
  • SignatureVerificationError code surface unchanged. New routing on jwks_uri=None (uses the existing REQUEST_SIGNATURE_JWKS_UNAVAILABLE code).
  • Warnings fire on two misconfig paths that previously silently no-op'd. Adopter-facing in operator logs; no behavior change for correctly-configured callers.

Tests

Suite Count Note
test_verify_from_agent_url.py 21 (+6) Protocol-conformance, warnings (3 cases), length cap, diagnostic host, jwks_uri=None
test_key_origins.py 24 (+3) IPv4/IPv6 literal short-circuit, IP-vs-IDN mismatch
Full signing surface 622 All green. ruff + mypy clean.

What's deferred (still open as separate concerns)

None — the eleven items here were the explicit follow-ups Argus flagged on #789. Future v3-identity work tracked under #350 was already closed.

Test plan

  • CI green on Python 3.10–3.13
  • Argus re-review on the IDNA helper + Protocol surface
  • Downstream import smoke for BrandSourcedJwksResolver

🤖 Generated with Claude Code

bokelley and others added 4 commits May 22, 2026 07:48
…ral short-circuit

Argus follow-ups on #789 — three IDNA-hygiene items consolidated into
one helper used by all four callsites.

1. IP literal short-circuit. idna.encode('192.0.2.1', uts46=True)
   raises in IDNA-2008 mode (rejects purely-numeric labels). Adopters
   on allow_private=True dev setups with IP-literal JWKS URIs regressed
   to SSRFValidationError after #789. Fix: gate with
   ipaddress.ip_address before IDNA; v4 + v6 (bracketed and naked) pass
   through. IPv6 compressed-canonical normalization happens via
   str(ipaddress.IPv6Address(...)).

2. transitional=False explicit at the single helper callsite.
   Default in idna>=3.x is already False (Eszett-preserving), but
   pinning documents intent and locks against future upstream default
   flip. (Note: idna package spells the kwarg 'transitional', not the
   UTS#46-document 'transitional_processing'.)

3. Single trailing-dot stripping. Previous rstrip('.') stripped
   arbitrary chains; docstring claimed single dot. Now matches.

Plus a bare-IPv6 fix in _extract_host: urlsplit('https://2001:db8::1')
interprets the first ':' as port separator. Detect bare IPv6 via
ipaddress.ip_address and bracket-wrap before urlsplit.

New module: src/adcp/signing/_idna_canonicalize.py (canonicalize_host).
Four callsites now delegate: jwks.py, ip_pinned_transport.py,
revocation_fetcher.py, key_origins.py.

3 new IP-literal regression tests in test_key_origins.py.
614 tests across impacted surface remain green. ruff + mypy clean.

Refs Argus second-pass review on #789.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s_source annotation

Argus follow-ups on #789 grouped by jwks_source contract.

1. BrandSourcedJwksResolver — runtime_checkable Protocol surfacing
   the duck-typed jwks_source + jwks_uri contract as a typed
   predicate. Verifier-side isinstance() checks work, and adopters
   declaring custom brand.json-walking resolvers can opt in by
   setting the two attributes without inheriting from this Protocol
   (typing.Protocol's structural conformance). Lives next to
   JwksResolver / AsyncJwksResolver in adcp.signing.jwks; re-exported
   from adcp.signing.

2. Tighten ClassVar[Literal[...]] annotation on both concrete
   classes from Literal['brand_json', 'publisher_pin'] to
   Literal['brand_json']. Each class produces exactly one value;
   the wider annotation would let mypy miss a future reassignment
   typo. Style only, no runtime effect.

3. Move _BrandJsonStaticJwksResolver above its caller in
   agent_resolver.py. Functionally safe under Python's call-time
   name resolution but brittle to refactor. Now sits right after the
   '# ---- verify factory ----' divider where readers expect helpers
   to live.

Two new Protocol-conformance tests in test_verify_from_agent_url.py:
- _BrandJsonStaticJwksResolver isinstance-conforms.
- A bare StaticJwksResolver does NOT conform (back-compat skip path
  for adopter resolvers predating the discriminant).

87 tests across impacted surface remain green.

Refs Argus second-pass review on #789.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Argus first-pass review follow-ups on #789. Two warnings on misconfig
paths that previously silently no-op'd:

1. UserWarning when jwks_source='brand_json' AND
   expected_key_origins is None. The resolver IS brand-json-sourced
   but the caller didn't surface the operator's declared
   identity.key_origins map, so the spec-mandated check skips. The
   adopter sees the warning in operator logs and threads the origins
   through VerifyOptions.

2. DeprecationWarning when expected_key_origins is supplied AND the
   resolver has no jwks_source attribute. Adopter upgraded the SDK
   but their custom resolver predates the discriminant. The SDK
   silently downgrades to no-check; warning gives the upgrade signal
   ('set jwks_source="brand_json" on the resolver class or conform
   to BrandSourcedJwksResolver').

Three new tests in test_verify_from_agent_url.py:
- brand_json-source + no-origins emits UserWarning
- legacy-resolver + origins emits DeprecationWarning
- happy-path (brand_json + origins) emits neither

18 tests across impacted surface remain green.

Refs Argus first-pass review on #789.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…one to JWKS_UNAVAILABLE, host-only diagnostic fallback

Three remaining Argus follow-ups from #789.

1. Per-entry length cap on _extract_key_origins. Each origin value
   is bounded at 512 bytes — well above any legitimate
   scheme+host+port shape but tight enough that a pathological
   multi-kilobyte entry from the 64 KiB capabilities body doesn't
   propagate through downstream comparisons. Oversized entries are
   SKIPPED (not truncated — a truncated host would silently match
   the wrong domain). Constant
   _MAX_KEY_ORIGIN_VALUE_BYTES = 512 documents the choice.

2. Cold-cache jwks_uri=None routes to REQUEST_SIGNATURE_JWKS_UNAVAILABLE.
   Previous behavior coerced None to '' and routed through the
   mismatch path with empty actual_origin — a resolver-side I/O
   failure misclassified as adversarial origin-mismatch on
   dashboards. Now raises JWKS_UNAVAILABLE with
   detail={'purpose': signing_purpose} so the cold-cache shape
   aggregates with other resolver-fetch failures.

3. Diagnostic host-only fallback on canonicalization failure. When
   _origin_host can't canonicalize one side, the mismatch detail's
   expected_origin / actual_origin values must still be HOST-SHAPED
   — previous fallback leaked the full raw URL into the host-labeled
   field, inconsistent with the success-path host-only shape. New
   _diagnostic_host helper falls through to _extract_host (the same
   URL/bare-host parser the canonicalization step uses) for a
   best-effort host, empty string at worst.

Three new tests in test_verify_from_agent_url.py:
- _extract_key_origins skips oversized entries
- mismatch detail uses host-only fallback (no URL leakage)
- jwks_uri=None routes to JWKS_UNAVAILABLE

622 tests across impacted surface remain green. ruff + mypy clean.

Refs Argus second-pass review on #789 (cold-cache routing, detail
shape, per-entry length cap).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/adcp/signing/jwks.py Fixed
aao-ipr-bot[bot]
aao-ipr-bot Bot previously approved these changes May 22, 2026
Copy link
Copy Markdown

@aao-ipr-bot aao-ipr-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Follow-ups noted below. Eleven items consolidated into four cohesive commits, conventional-commit semver signal is right on each, behavior changes are strict refinements over #789's silent no-op paths.

Things I checked

  • Public surface, additive only. BrandSourcedJwksResolver added to adcp.signing.__init__ (re-exported correctly at line 211 + 324). feat(signing): is the right semver — additive Protocol export, no removals, no required-field flips. The ClassVar[Literal["brand_json", "publisher_pin"]]Literal["brand_json"] narrowing on BrandJsonJwksResolver.jwks_source and _BrandJsonStaticJwksResolver.jwks_source is a static-only tightening; runtime value was always "brand_json". ad-tech-protocol-expert: sound — publisher-pin is verifier-side trust posture, modeling it as resolver-side absence is the right boundary.
  • IDNA helper byte-equality on IP path. 192.0.2.1 vs 192.0.2.10 produce distinct str(IPv4Address) outputs; bracketed/bare/compressed IPv6 forms converge symmetrically through _extract_host bracket synthesis + canonicalize_host bracket stripping. Tests at tests/test_key_origins.py:343-372 pin both the match direction and the IP-vs-IDN fail-closed direction.
  • jwks_uri=None reclassification is a strict refinement. verifier.py:579-598 raises JWKS_UNAVAILABLE where the prior code coerced to "" and routed through mismatch. Both arms raise SignatureVerificationError — no per-code retry/skip branch, no auth-bypass surface. step=7 preserved. security-reviewer: no High/Medium.
  • Warning categories. UserWarning for adopter-side missing expected_key_origins (stdlib idiom for runtime misuse) + DeprecationWarning for legacy-resolver upgrade signal (visible-by-default under pytest / __main__). Stacklevel=2 on both is correct.
  • Length cap is fail-closed. agent_resolver.py:373 skip-not-truncate is right — a truncated host silently matches the wrong domain. Verifier then surfaces the purpose as missing on the consistency check.
  • Diagnostic detail is host-shaped on both branches. _diagnostic_host falls through to _extract_host, which returns only urlsplit(...).hostname — never path/query/fragment. Test at test_verify_from_agent_url.py pins \"/\" not in detail[\"expected_origin\"].
  • Warning messages are static. No dynamic hostnames/URIs in either warnings.warn text — no trust-boundary leak.

Follow-ups (non-blocking — file as issues)

  • _extract_host covers schemeless bare IPv6, not URL-form bare IPv6. key_origins.py:253-265 bracket synthesis runs only when parts.hostname is falsy from the first urlsplit. A caller passing https://2001:db8::1/jwks.json (URL form, no brackets — malformed per RFC 3986 but observed in the wild) gets hostname=\"2001\" truthy, returns early, and downstream comparison silently uses \"2001\". Either move the IPv6 detection above the parts.hostname check, or document that URL-form-without-brackets is out of scope.
  • canonicalize_host lowercases via Python str.lower() before idna.encode(uts46=True). UTS#46 has its own case-mapping table; for code points where Python str.lower() and UTS#46 diverge (Greek final-sigma, locale-sensitive forms) the new pipeline produces different bytes than the old idna.encode(...).decode(\"ascii\").lower() ordering. Eszett is in both maps so the existing regression test passes — add a Greek-Sigma regression to pin the assumption.
  • Origin semantics worth checking against the TS SDK. RFC 6454 origin is scheme+host+port; this SDK emits host-only on both branches (which IS internally consistent — the point of the diff). Worth confirming the reference TypeScript SDK's expected_origin/actual_origin shape so the two SDKs don't surface different fields for the same error code on the wire.
  • Scope-id IPv6 (fe80::1%eth0) passes through canonicalize_host unmodified. Python 3.9+ ipaddress.ip_address accepts scope IDs; the helper round-trips the %eth0 suffix. Not a credible attack on the comparison path (self-matches), but if a brand.json pin smuggles a scope onto a JWKS URI, downstream socket.getaddrinfo honours it on Linux. Strip %... before parse, or reject scoped IPs at canonicalization.
  • warnings.warn per-call under load. Default once-per-location filter dedupes in operator logs, but a misbehaving resolver invoked at request rate still pays the warn-site overhead. Optional: gate behind an lru_cache(maxsize=1) keyed on id(resolver).
  • L1 (security-reviewer): _diagnostic_host passes attacker-supplied host strings (control chars, ANSI escapes) through to log surfaces unfiltered. Rely on log-layer sanitization, or assert ASCII-printable before returning.

Minor nits (non-blocking)

  1. Magic-constant doc-pin. _idna_canonicalize.py:80str(ip) returns IPv6 without brackets; add one sentence to the docstring so a future caller doesn't reintroduce them. Cite the asymmetry with urlsplit (also strips).
  2. Test filter loosens by substring. test_no_warning_when_both_attributes_align (test_verify_from_agent_url.py) filters caught warnings on \"jwks_source\" in str(w.message). Both new warning texts contain that substring so the test does what it claims, but assert caught == [] after simplefilter(\"always\") would catch any future unrelated warning regression on the happy path.
  3. Clarify the source is None branch. verifier.py:566 correctly skips the DeprecationWarning for explicit \"publisher_pin\" values; the docstring at line 542 covers it, but a one-line comment at line 566 would prevent a future "fix" from broadening the warning to any non-brand_json value.

Notable: this is the fourth signing-hygiene PR consolidating Argus follow-ups — the loop is doing what it's supposed to do, but worth a glance at whether the issue tracker has the residual items (#350 closed per PR description; if there's a follow-up issue for the items above, link them).

Approving on the strength of internal-consistency wins (mismatch-detail shape, IP-literal canonicalization symmetry, taxonomic JWKS_UNAVAILABLE re-routing) plus the new Protocol surfacing the duck-typed contract as a typed predicate.

…ence linter

github-code-quality flagged def __call__(...) -> ...: ... on the
new Protocol method as 'statement has no effect.' The ellipsis IS
the function body (standard Python pattern for abstract Protocol
methods — same shape as the three other Protocol __call__ methods
in this module). A one-line docstring documents intent and silences
the false positive without changing semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@aao-ipr-bot aao-ipr-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Clean follow-up bundle — eleven items in four well-scoped commits, each citing its parent review thread. Centralizing the four IDNA callsites into canonicalize_host is the right shape, and the BrandSourcedJwksResolver Protocol surfaces the duck-typed contract as a typed predicate without breaking the back-compat skip-on-absence path.

security-reviewer: no Critical/High. All seven threat questions resolve to not-exploitable — [example.com] bracket-then-IDNA fall-through fails closed in both callers; lowercase-before-IP-check is fine because ipaddress.ip_address is strict ASCII; f-string interpolation of signing_purpose is not a format-string surface; cold-cache JWKS_UNAVAILABLE routing preserves fail-closed under all postures.

ad-tech-protocol-expert: sound-with-caveats. Discriminant narrowing to Literal[\"brand_json\"] matches spec wording (publisher-pin is a step-6 routing carve-out, not a jwks_source value on the resolver). JWKS_UNAVAILABLE for cold-cache is the spec-correct code per security.mdx error table.

Things I checked

  • canonicalize_host (src/adcp/signing/_idna_canonicalize.py:44) — IP-literal short-circuit handles v4, bare v6, bracketed v6; str(ipaddress.IPv6Address(...)) yields RFC 5952 compressed form so cross-implementation byte-equality holds. transitional=False explicit (kwarg spelling matches the idna package, not the UTS#46 doc).
  • All four callsites converted: jwks.py:235, ip_pinned_transport.py:117, revocation_fetcher.py:389, key_origins.py:221. Fallback shapes are consistent (raise → caller catches; or return None / raw-lowered).
  • _maybe_check_key_origin (src/adcp/signing/verifier.py:548-604): warnings fire on the two misconfig paths with stacklevel=2, jwks_uri=None raises REQUEST_SIGNATURE_JWKS_UNAVAILABLE with step=7 and detail={\"purpose\": ...}. Happy path is unchanged.
  • _extract_key_origins 512-byte cap skips rather than truncates (truncated host would silently match the wrong domain — right call). Body is already capped at 64 KiB upstream so the total-entries bound is implicit.
  • _diagnostic_host (src/adcp/signing/key_origins.py:167) routes the canonicalization-failure path through _extract_host so the mismatch detail stays host-shaped rather than leaking full URLs into the host-labeled field.
  • BrandSourcedJwksResolver re-exported from src/adcp/signing/__init__.py:211, 324. feat(signing): is the right semver — additive, no breaking change. _BrandJsonStaticJwksResolver and BrandJsonJwksResolver both pass isinstance(x, BrandSourcedJwksResolver); bare StaticJwksResolver does not (test at tests/test_verify_from_agent_url.py:524).
  • 622 tests across the impacted surface — claim is plausible given the file scope.

Follow-ups (non-blocking — file as issues)

  1. Docstring drift on _maybe_check_key_origin. src/adcp/signing/verifier.py:530-531 still says "we fail closed via the mismatch path (actual_origin becomes None)" — that's the pre-PR behavior. New behavior at L580-598 raises REQUEST_SIGNATURE_JWKS_UNAVAILABLE. Tighten the contract wording.
  2. __all__ alphabetical drift. BrandSourcedJwksResolver at src/adcp/signing/__init__.py:324 is wedged between BrandJsonJwksResolver and BrandJsonResolverError. Correct position is after BrandJsonResolverErrorCode. The import block above sorts it correctly; only __all__ is off.
  3. _extract_key_origins skips oversized entries silently. agent_resolver.py:374-376ad-tech-protocol-expert flagged this: a skipped entry surfaces downstream as a missing declaration, which is confusing in triage. Consider a logging.warning (not a warnings.warn — operator-side, not adopter-side) with the purpose name + value length so operators can find the misconfig.
  4. runtime_checkable Protocol value-check semantics. jwks.py:155isinstance(x, BrandSourcedJwksResolver) only checks attribute presence, not value, so a custom resolver setting jwks_source = \"publisher_pin\" would isinstance-conform. The verifier mitigates by re-checking source == \"brand_json\" at verifier.py:565, so not exploitable today. Worth a one-line note in the Protocol docstring that the discriminant value is verified at the callsite, not by isinstance.
  5. Spec-side maxLength on identity.key_origins.*. ad-tech-protocol-expert: the upstream JSON Schema declares format: \"uri\" with no explicit maxLength. 512 is a defensible local clamp but unilateral — worth raising on adcontextprotocol/adcp so the schema and the SDK clamp track each other.

Minor nits (non-blocking)

  1. Weak assertion on the URL-leak test. tests/test_verify_from_agent_url.py:679\"/\" not in detail[\"expected_origin\"] passes even when _extract_host(\"not a host with spaces\") returns \"not\". Tighten to detail[\"expected_origin\"] in (\"\", \"not\") or to an empty-string expectation for unparseable inputs — the current shape doesn't actually pin the no-leak intent the docstring claims.
  2. Redundant .lower() before idna.encode. src/adcp/signing/_idna_canonicalize.py:68 — UTS#46 already case-folds. Harmless for ASCII; for exotic Unicode (Turkish dotted-İ, Greek final-sigma) Python's str.lower() and UTS#46 mapping can diverge. Not exploitable but a future footgun — consider letting uts46=True do the mapping.
  3. "Lowercased to match the other branches" claim is slightly stale. _idna_canonicalize.py:57-58 docstring step 4 promises a .lower() after idna.encode().decode(\"ascii\") that the code doesn't actually call (the encoder already returns lowercase A-labels). Behavior is correct, prose is half a step ahead. Cosmetic.

Safe to merge once CI green on 3.10–3.13.

@bokelley bokelley merged commit fcb64fc into main May 22, 2026
23 checks passed
@bokelley bokelley deleted the bokelley/signing-followups-argus branch May 22, 2026 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant