Skip to content

feat(canonical-formats): public API + v2→v1 projection (#741, half 1 of 2)#841

Merged
bokelley merged 6 commits into
mainfrom
bokelley/issue-741-status
May 24, 2026
Merged

feat(canonical-formats): public API + v2→v1 projection (#741, half 1 of 2)#841
bokelley merged 6 commits into
mainfrom
bokelley/issue-741-status

Conversation

@bokelley
Copy link
Copy Markdown
Contributor

Summary

First half of #741 — canonical-formats projection layer. Lands the public-API surface, the v2 → v1 projection algorithm, and the format_options[] closed-set validator. Reverse v1 → v2 projection, pixel_tracker bidirectional contract, and the 14 reference fixtures + round-trip tests follow in a second PR.

Public API on adcp.types (29 new names)

Group Symbols
Discriminator + projection CanonicalFormatKind, ProductFormatDeclaration, ProductFormatSellerPreference, CanonicalProjectionReference, CanonicalAssetSource, CanonicalSlotOverride
13 canonical format classes CanonicalFormatImage, CanonicalFormatHtml5Banner, CanonicalFormatDisplayTag, CanonicalFormatImageCarousel, CanonicalFormatHostedVideo, CanonicalFormatVastVideo, CanonicalFormatHostedAudio, CanonicalFormatDaastAudio, CanonicalFormatNativeInFeed, CanonicalFormatResponsiveCreative, CanonicalFormatAgentPlacement, CanonicalFormatSponsoredPlacement (+ CanonicalFormatBase, CanonicalCompositionModel)
Pixel tracker asset PixelTrackerAsset, PixelTrackerEvent, PixelTrackerMethod
Registry V1V2CanonicalFormatMappingRegistry, V1CanonicalMapping, V1CanonicalGlobPattern, V1CanonicalStructuralPattern, V1CanonicalStructural, V1CanonicalV2Projection, V1CanonicalDimensions

Hand-rolled ProductFormatDeclaration

The codegen path is broken for this class: the upstream schema is a discriminated oneOf over 13 format_kind values, each binding params to a canonical-specific schema. datamodel-code-generator flattens it to a single class carrying only the shared properties — format_kind and params disappear entirely, and extra='ignore' silently drops them on construction.

src/adcp/types/canonical_decl.py hand-rolls the wire-faithful class (discriminator + open params: dict[str, Any] + extra='allow'). The codegen output stays available as _GeneratedProductFormatDeclaration.

New module adcp.canonical_formats

from adcp.canonical_formats import (
    project_declaration_to_v1,
    project_product_to_v1,
    validate_format_kind_in_options,
    find_declaration_by_kind,
    FormatKindNotInClosedSetError,
    load_default_registry,
    glob_match,
    structural_match,
)

v2 → v1 resolution order (per registries/v1-canonical-mapping.json)

  1. canonical_formats_only=True or format_kind='custom' → no v1 emit, no advisory.
  2. v1_format_ref[] set → emit refs; emit FORMAT_DECLARATION_V1_LOSSY_MULTI_SIZE when len(v1_format_ref) < len(params.sizes).
  3. v1_format_ref[] absent, canonical with v1_translatable=False (agent_placement / sponsored_placement / responsive_creative / image_carousel) → silent.
  4. v1_format_ref[] absent, canonical with v1_translatable=True → emit FORMAT_DECLARATION_V1_AMBIGUOUS.

Registry-based inversion is explicitly NOT implemented per the registry's "Direction of truth" rule: "SDKs MUST NOT synthesize a v1 format_id from the registry by inverting structural matches." The seller's path on AMBIGUOUS is to author v1_format_ref.

All advisories carry source: "sdk" + sdk_id: "adcontextprotocol-adcp-python@<version>".

Closed-set format_options[] validator

Seller-side pre-call guard for create_media_buy. FormatKindNotInClosedSetError exposes format_kind + accepted_kinds on the exception for direct error.details.accepted_values round-tripping. find_declaration_by_kind disambiguates with capability_id when the closed set carries multiple declarations of the same kind.

What's NOT in this PR (lands in #741 half 2)

  • v1 → v2 reverse projection (registry glob/structural matchers exist; not yet wired into a public project_v1_to_v2 helper)
  • pixel_tracker bidirectional downgrade/upgrade contract (PIXEL_TRACKER_LOSSY_DOWNGRADE / PIXEL_TRACKER_UPGRADE_INFERRED)
  • 14 reference fixtures from static/examples/products/canonical/ upstream + round-trip tests
  • FORMAT_DECLARATION_DIVERGENT narrowing check between v2 params and referenced v1 requirements

Layering

canonical_decl.py and canonical_formats/advisory.py are added to tests/test_import_layering.py's ALLOWED_FILES — both legitimately import from generated_poc (hand-rolled override + wire-shape Error construction respectively), same architectural role as aliases.py / capabilities.py / _forward_compat.py.

Test plan

  • ruff check src/ — all checks passed
  • mypy src/adcp/ — 892 files, no issues
  • pytest tests/ — 5063 passed, 38 skipped, 1 xfailed
  • 51 new canonical-formats tests across 4 files (public API, projection, options, registry)
  • Public-API snapshot regenerated to reflect the 29 new exports
  • Import-layering test passes with the two new allowlist entries

Refs: #741

…alidator (#741)

First half of the canonical-formats projection layer (issue #741) — public
API surface, v2 → v1 projection algorithm, and ``format_options[]``
closed-set validator. The v1 → v2 reverse projection, ``pixel_tracker``
bidirectional contract, and 14 reference fixtures land in a follow-up.

## Public API surface on ``adcp.types`` (29 new exports)

Discriminator + projection ref: ``CanonicalFormatKind``,
``ProductFormatDeclaration``, ``ProductFormatSellerPreference``,
``CanonicalProjectionReference``, ``CanonicalAssetSource``,
``CanonicalSlotOverride``. Plus 13 canonical format classes, pixel
tracker asset trio, and the 7 registry types.

## Hand-rolled ``ProductFormatDeclaration``

The upstream schema is a discriminated ``oneOf`` over 13
``format_kind`` values, each binding ``params`` to a canonical-specific
schema. ``datamodel-code-generator`` flattens this to a single class
carrying only the shared properties — ``format_kind`` and ``params``
disappear entirely, and ``extra='ignore'`` silently drops them on
construction.

``src/adcp/types/canonical_decl.py`` adds a hand-rolled class with the
full wire shape (discriminator + open ``params`` dict +
``extra='allow'``) and rewires the public alias. The codegen output is
preserved as ``_GeneratedProductFormatDeclaration``.

## New module ``adcp.canonical_formats``

- ``project_declaration_to_v1`` / ``project_product_to_v1`` walk the
  resolution order from ``registries/v1-canonical-mapping.json`` and
  return projected ``format_ids[]`` plus any SDK-source advisories
  (``FORMAT_DECLARATION_V1_LOSSY_MULTI_SIZE``,
  ``FORMAT_DECLARATION_V1_AMBIGUOUS``).
- ``validate_format_kind_in_options`` / ``find_declaration_by_kind``
  are the seller-side closed-set guards with ``capability_id``
  disambiguation.
- ``load_default_registry`` loads the bundled mapping, cached; with
  ``glob_match`` and ``structural_match`` matchers.

Registry-based inversion (v2 → v1 via structural match) is explicitly
NOT implemented because the registry forbids it: "SDKs MUST NOT
synthesize a v1 ``format_id`` from the registry by inverting structural
matches." Step 4 of the resolution order surfaces
``FORMAT_DECLARATION_V1_AMBIGUOUS`` instead; the seller's path is to
author ``v1_format_ref``.

All emitted ``Error`` entries carry ``source: "sdk"`` and
``sdk_id: "adcontextprotocol-adcp-python@<version>"`` per
``core/error.json#sdk_id``.

## Tests

51 new tests across four files covering every branch of the resolution
order (parametrised across all 13 canonical kinds + the
``v1_translatable`` table), closed-set validator edge cases, registry
loader caching + matchers, and the public-API surface.

## Layering

``canonical_decl.py`` and ``canonical_formats/advisory.py`` are added
to ``tests/test_import_layering.py``'s ``ALLOWED_FILES`` — both
legitimately import from ``generated_poc`` (hand-rolled override +
wire-shape ``Error`` construction respectively), same role as
``aliases.py`` / ``capabilities.py`` / ``_forward_compat.py``.

## Verified

- ``ruff check src/`` ✓
- ``mypy src/adcp/`` ✓ (892 source files)
- ``pytest tests/`` ✓ (5063 passed, 38 skipped, 1 xfailed)

Refs: #741
Comment thread src/adcp/canonical_formats/registry.py Fixed
aao-ipr-bot[bot]
aao-ipr-bot Bot previously approved these changes May 24, 2026
Copy link
Copy Markdown

@aao-ipr-bot aao-ipr-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Follow-ups noted below. Half 1 of 2 lands additively on a clean architectural seam — hand-rolled ProductFormatDeclaration for the discriminated oneOf, registry consulted on v1→v2 only, advisory-only on errors[] for AMBIGUOUS / LOSSY_MULTI_SIZE.

Things I checked

  • V1_TRANSLATABLE table at src/adcp/canonical_formats/projection.py:54-68 matches the four default: false overrides in schemas/cache/3.1.0-beta.3/formats/canonical/{agent_placement,sponsored_placement,responsive_creative,image_carousel}.json exactly. native_in_feed correctly stays True. test_v1_translatable_table_matches_kind_enum pins coverage against the enum.
  • Resolution-order algorithm at projection.py:111-201 matches the 4-step contract in registries/v1-canonical-mapping.json ("Direction of truth (normative)"). Step 3 silence for v1_translatable=False canonicals is correct — surfacing AMBIGUOUS there would spam the wire.
  • Registry-based v2→v1 inversion is explicitly NOT implemented (projection.py:28-31); matches the registry's MUST NOT synthesize v1 format_id from registry by inverting structural matches clause.
  • FORMAT_DECLARATION_V1_AMBIGUOUS and FORMAT_DECLARATION_V1_LOSSY_MULTI_SIZE both present in schemas/cache/3.1.0-beta.3/enums/error-code.json; source=sdk + sdk_id="adcontextprotocol-adcp-python@<version>" (advisory.py:43,77-86) matches core/error.json.
  • Hand-rolled ProductFormatDeclaration at src/adcp/types/canonical_decl.py:59-169 carries all 11 properties from the upstream schema. extra='allow' correct. _GeneratedProductFormatDeclaration preserved as escape hatch.
  • Layering: canonical_decl.py and canonical_formats/advisory.py added to ALLOWED_FILES (tests/test_import_layering.py:55,63) — same architectural role as aliases.py / capabilities.py. No new violations.
  • Conventional commit feat(canonical-formats): is correct — additive on adcp.types (29 names) and a new adcp.canonical_formats module. No ! needed, no public symbol removed or narrowed.
  • code-reviewer: no blockers, two Major items below.
  • ad-tech-protocol-expert: sound-with-caveats — algorithm + advisory codes verified against upstream schemas; caveats land as Follow-ups below.

Follow-ups (non-blocking — file as issues)

  • params is required on the wire but defaults to None on the class. src/adcp/types/canonical_decl.py:91-99. Upstream core/product-format-declaration.json marks params required alongside format_kind. A declaration constructed without params round-trips locally (extra='allow') but emits JSON a spec-conformant counterparty MUST reject. Either make params non-optional or add a model_validator that fails closed. Half-2-territory if you want to land it alongside the divergent-narrowing check.
  • allOf constraints not enforced on the Pythonic class. format_kind='custom' MUST carry format_shape + format_schema; canonical_formats_only=true is mutually exclusive with v1_format_ref. Neither is enforced — extra='allow' lets contradictions through. Same root cause as the params Follow-up: hand-rolled class skipped the schema's allOf block. Reasonable for half 1; flag for the second PR.
  • format_options: Iterable[ProductFormatDeclaration] exhausts single-pass generators. src/adcp/canonical_formats/format_options.py:61,85. validate_format_kind_in_options(...); find_declaration_by_kind(...) paired against a generator silently returns None from the second call because the validator drained the iterator. Narrow to Sequence[ProductFormatDeclaration] or snapshot via list(...) at entry.
  • _versions_overlap silently accepts <= and silently drops < / >. src/adcp/canonical_formats/registry.py:131-153. The docstring at L106-115 advertises >=, 4.x, exact only. A registry entry inadvertently authored as "<4.0" matches nothing without an error. Either reject unknown prefixes loudly or document the full accepted set.
  • FORMAT_DECLARATION_DIVERGENT narrowing check. Resolution-order step 1 in registries/v1-canonical-mapping.json mandates a narrows check between params and the referenced v1 requirements. Acknowledged by the PR body as half-2-territory — flagged so it doesn't get lost.
  • Docs drift on new public surface. README.md features section, AGENTS.md:209-243 import quick reference, llms.txt:36-46 module index, and the CLAUDE.md "Import Architecture" narrative all miss adcp.canonical_formats and the two new ALLOWED_FILES entries. skills/adcp-media-buy/SKILL.md:60 and skills/adcp-creative/SKILL.md:58 point at the spec doc but don't name the SDK helpers sellers actually call. MIGRATION_v5_to_v6.md:120-142 recipe lacks from adcp import … / from adcp.types import Error so a copy-paste gets NameError.

Minor nits (non-blocking)

  1. _GeneratedProductFormatDeclaration in __all__. src/adcp/types/canonical_decl.py:174. Underscore-prefixed names in __all__ is unusual; either drop from __all__ and let consumers import the dunder directly, or rename to a non-underscore alias.
  2. details annotation symmetry. src/adcp/canonical_formats/projection.py:176. Mirror the explicit details: dict[str, Any] = {...} annotation from L145 to defuse mypy inference on the subsequent details["product_id"] = product_id assignment.
  3. V1_TRANSLATABLE is a hand-maintained mirror. A drift test that loads each canonical's schema and asserts the v1_translatable default matches the table would catch upstream additions before they default to True and start emitting AMBIGUOUS on structurally-v1-unreachable canonicals.

Safe to merge once CI completes (storyboard runners + Python 3.11/3.12/3.13 still in progress).

Addresses code-reviewer, ad-tech-protocol, adtech-product, and security
reviews of the canonical-formats projection layer. Material correctness,
security, and adopter-ergonomics fixes; no functional removals.

## Schema-conformance fixes (NORMATIVE)

- ``ProductFormatDeclaration.params`` is now required (schema:
  ``required: ["format_kind", "params"]``). Previously optional —
  wire-invalid declarations could construct.
- ``canonical_formats_only=True`` + ``v1_format_ref[]`` are rejected at
  construction (schema ``allOf.not`` clause). Previously the model
  accepted the combination and projection silently discarded the refs.
- Projection step 1 no longer fires on ``format_kind=custom`` alone.
  ``custom`` declarations MAY carry seller-asserted ``v1_format_ref[]``
  and project via step 2; only ``canonical_formats_only=True``
  triggers the unconditional silent skip.

## Security fixes

- Credential-shaped key guard on ``ProductFormatDeclaration`` — same
  suffix list and rationale as the dispatcher's ``ctx_metadata`` gate.
  ``params`` and model extras are walked recursively; sellers cannot
  stuff credentials onto a declaration where they would round-trip
  into buyer responses + the idempotency replay cache.
- ``load_default_registry()`` returns a fresh deep copy of the cached
  parsed registry — multi-tenant SDK processes can no longer poison
  each other's projection view by mutating ``mappings``.
- Seller-controlled identifiers (``product_id``) capped to 128 chars
  before echoing into advisory ``details`` — defends against
  log-injection through multi-hop ``errors[]``.

## Error-handling fixes

- Malformed registry bundles raise ``RegistryLoadError`` with
  contextual ``ADCP_VERSION`` + failure mode instead of propagating
  raw exceptions.
- ``_versions_overlap`` accepts ``<``, ``>``, ``!=``, ``==``
  (previously only ``<=`` / ``>=``); raises on unrecognised operator
  prefixes (``~``, ``^``) rather than silently never matching.
- ``SDK_ID`` resolved lazily via module ``__getattr__``; reads the
  actual distribution name from
  ``importlib.metadata.metadata("adcp")["Name"]`` rather than
  hardcoding a prefix that could drift from the PyPI distribution.

## Adopter-ergonomics additions

- ``ProductFormatDeclaration.params_as(CanonicalFormatImage)`` —
  validates the open ``params`` dict against the typed canonical class.
- ``find_declaration_by_v1_format_id(format_id, format_options)`` —
  seller-side helper for v1-inbound ``create_media_buy`` requests
  against a product publishing v2 ``format_options[]``.
- ``FormatKindNotInClosedSetError.to_wire_error()`` — turns the
  closed-set exception into the wire-correct ``UNSUPPORTED_FEATURE``
  ``Error`` with ``details.rejected_value`` + sorted/dedup'd
  ``accepted_values``. Replaces the 14-line MIGRATION recipe.

## Layering cleanup

- ``Recovery`` and ``Source`` re-exported via ``adcp.types``;
  ``canonical_formats/advisory.py`` removed from
  ``ALLOWED_FILES``. ``canonical_decl.py`` stays on the allowlist
  (hand-rolled override of a generated class).

## Public-API restoration

- ``NotificationConfig``, ``WholesaleFeedEvent``, ``WholesaleFeedWebhook``
  re-exports restored to ``adcp.types`` and ``adcp`` (silently dropped
  by ruff's auto-fix during the original PR pass).

## Tests

5093 passed locally. ``tests/test_canonical_formats_declaration.py``
(new) covers the new ``ProductFormatDeclaration`` invariants. Existing
test files extended with custom flow-through, multi-tenant registry
isolation, ``RegistryLoadError`` wrapping, version-operator DSL +
fail-loud, ``to_wire_error`` shape, ``find_declaration_by_v1_format_id``
matching, seller-controlled identifier truncation.

Refs: #741
Comment thread src/adcp/canonical_formats/__init__.py Fixed
Comment thread src/adcp/canonical_formats/advisory.py Fixed
bokelley added 2 commits May 23, 2026 21:01
The expert-review fix push at c58d403 only triggered the CodeQL + IPR
+ GitGuardian checks; the CI test matrix and Argus AI review didn't
fire. Empty commit to force a fresh `synchronize` event so the full
22-check suite runs against the review-fix code.

Refs: #741
aao-ipr-bot[bot]
aao-ipr-bot Bot previously approved these changes May 24, 2026
Copy link
Copy Markdown

@aao-ipr-bot aao-ipr-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The resolution-order implementation lines up with the registry's "Direction of truth" contract — refusing to invert structural matches and routing canonical-formats projection through seller-asserted v1_format_ref[] is the right shape for a v2 → v1 layer. Follow-ups noted below.

Things I checked

  • Resolution order (steps 1–4) at src/adcp/canonical_formats/projection.py:115-207. Step 1 short-circuits before refs check; step 2 emits + computes the multi-size advisory; step 3 silently skips non-translatable canonicals; step 4 emits AMBIGUOUS. Order matches the registry normative section.
  • V1_TRANSLATABLE table at projection.py:58-72. Cross-checked against schemas/cache/3.1.0-beta.3/formats/canonical/*.json — the four explicit false overrides (agent_placement, sponsored_placement, image_carousel, responsive_creative) plus custom=False match the per-canonical schemas. tests/test_canonical_formats_projection.py:172-177 enforces full enum coverage so a new upstream kind without a table entry trips CI rather than silently defaulting to True.
  • Hand-rolled ProductFormatDeclaration at src/adcp/types/canonical_decl.py:122-236. Carries all 9 shared properties plus discriminator + open params. The allOf.not mutual-exclusion clause is enforced in _check_mutual_exclusion. params is required, matching the schema's required: ["format_kind", "params"]. extra='allow' for forward-compat is safe because the upstream object doesn't lock additionalProperties: false.
  • Credential-shaped key guard at canonical_decl.py:76-116, 238-264. Suffix list and case-insensitive .lower() matching are byte-identical to adcp.decisioning.dispatch._CREDENTIAL_SHAPED_KEY_SUFFIXES. Walks params recursively + extras via __pydantic_extra__ + nested BaseModel via model_dump(mode='python'). This is the right shape — same fail-closed rationale as the dispatcher's ctx_metadata gate.
  • Registry multi-tenant isolation at registry.py:119-132. _load_registry_uncopied() is lru_cache(maxsize=1); load_default_registry() returns model_copy(deep=True). Tenant A mutating the projection registry cannot leak into tenant B's view.
  • Advisory wire shape at advisory.py:113-122. source=sdk matches the upstream source enum ["producer","sdk"]. sdk_id format <dist>@<version> matches core/error.json. Both new error codes (FORMAT_DECLARATION_V1_AMBIGUOUS, FORMAT_DECLARATION_V1_LOSSY_MULTI_SIZE) are in enums/error-code.json.
  • Type layering at tests/test_import_layering.py:32-56. canonical_decl.py correctly added to ALLOWED_FILES with the architectural rationale; canonical_formats/advisory.py was removed from the allowlist after the follow-up commit re-exported Recovery/Source via adcp.types. No new layering violations.
  • Public-API audit. 29 net new exports under adcp.types, all additive. feat: without ! is the right semver signal. MIGRATION_v5_to_v6.md documents the additions and the hand-roll rationale.
  • Test plan. All 6 checked items in the PR body match actual artifacts (5063 passed, 51 new tests across 4 files, public-API snapshot regenerated, layering test passes).

Follow-ups (non-blocking — file as issues)

  • agent_url canonicalization in find_declaration_by_v1_format_id (src/adcp/canonical_formats/format_options.py:175-181). schemas/cache/3.1.0-beta.3/core/format-id.json:11 is normative: callers MUST canonicalize agent_url before comparison. The current str(ref.agent_url) == target_url does Pydantic-AnyUrl trailing-slash normalization but not RFC 3986 §6 host-casefolding or default-port stripping. A seller publishing "https://Creative.AdContextProtocol.org" will silently miss-match a buyer's "https://creative.adcontextprotocol.org" and the SDK returns None → wrongful UNSUPPORTED_FEATURE. src/adcp/signing/brand_jwks.py:_canonicalize_url already exists in this repo — wire it through. Flagged by both code-reviewer and ad-tech-protocol-expert as the only documented MUST divergence in the diff.

  • Operator-facing log/ANSI injection via _echo_identifier (src/adcp/canonical_formats/advisory.py:65-71). The 128-char cap is the right defense against unbounded growth into the replay cache; it doesn't strip newlines, CRs, or ANSI escapes. A seller publishing a product_id containing \n or \x1b[ escape sequences gets that string round-tripped into errors[].details.product_id → forged log lines / terminal manipulation in operator tooling that emits one advisory per line. security-reviewer Medium. Scrub control chars + ANSI ESC before the length cap.

  • sdk_id differs between installed and dev (advisory.py:55-62). Installed: reads _pkg_metadata("adcp")["Name"]"adcp@<version>" (per pyproject.toml name = "adcp"). Fallback: "adcontextprotocol-adcp-python@0.0.0-dev". Two different attribution strings for the same SDK violate the core/error.json dedup-by-sdk_id contract. Pick one canonical form; the docstring's example (adcontextprotocol-adcp-python@1.2.0) suggests hardcoding the long form and using _pkg_version only.

  • _versions_overlap fail-loud on ~/^ (registry.py:205-210) is stricter than the registry DSL's forward-compat posture. canonical-format-kind.json documents open-enum semantics for unknown values; the operator parser should mirror that — log-and-skip the constraint as no-match rather than raising. Latent until half 2 wires the v1 → v2 reverse path that actually consumes the registry, but file it now so the regression doesn't ship under load.

  • Partial-half public exposure of glob_match / structural_match / load_default_registry — these are public but unused on the v2 → v1 path until half 2 lands. Consider an @experimental marker in the docstrings so adopters don't build against an API that may shift.

Minor nits (non-blocking)

  1. Duplicate NotificationConfig in __all__ at src/adcp/types/__init__.py:1198,1200. Harmless at runtime; trips set(__all__) == list-len invariants tooling sometimes asserts.
  2. _walk_for_credential_keys has no cycle guard at src/adcp/types/canonical_decl.py:94-116. Not wire-reachable (JSON has no cycles, Pydantic rejects them on validation) but worth a depth bound or seen set if a future caller can hand-build cyclic dicts.
  3. SDK_ID exposed via module __getattr__ in canonical_formats/__init__.py:71-82 doesn't appear in dir(). No test pulls SDK_ID through the documented public path — tests/test_canonical_formats_projection.py:18 reaches into advisory.py. Add one test that imports from adcp.canonical_formats import SDK_ID.
  4. _echo_identifier listed in advisory.py's __all__ (line 130) but it's a private symbol (leading underscore). Either drop the underscore or drop it from __all__.

Approved. The expert-review follow-up commit c58d4031 already closed the load-bearing issues from the prior pass (params required, mutual-exclusion validator, deep-copy on registry load, lazy SDK_ID, identifier truncation, fail-loud version DSL). The remaining follow-ups don't block — they sharpen edges that show up under multi-hop traffic, and half 2 is the natural place to bundle the canonicalization + sdk_id alignment work.

Argus APPROVED #841 and noted five non-blocking follow-ups + four nits.
This commit addresses the ones that sharpen wire correctness or close
real attack surfaces.

## agent_url canonicalisation in v1 inbound lookup

``find_declaration_by_v1_format_id`` now canonicalises both buyer and
seller ``agent_url`` per RFC 3986 §6 before comparing: scheme
lowercased, host lowercased, default port (443/80) stripped. Without
this a seller publishing ``https://Creative.AdContextProtocol.org``
would silently miss-match a buyer's
``https://creative.adcontextprotocol.org`` and the SDK would return a
wrongful ``UNSUPPORTED_FEATURE``. Argus flagged this as the only
documented MUST divergence — ``core/format-id.json:11`` is normative.

## ANSI / control-char scrub in advisory identifier echo

``_echo_identifier`` now escapes every C0 control char (incl. ``\\n``
``\\r`` ``\\x1b``), the C1 range, and Unicode LS/PS line separators
to a literal ``\\u<hex>`` form before applying the 128-char cap. A
seller publishing ``product_id`` with ``\\n`` or ``\\x1b[`` previously
round-tripped that string into ``errors[].details.product_id`` and
could forge log lines or manipulate ANSI-aware operator tooling.

## sdk_id alignment between installed + dev

Hardcodes ``_SDK_DIST_NAME = "adcontextprotocol-adcp-python"`` (the
spec example form). Installed wheels publish the dist name as plain
``adcp``; the prior code emitted ``adcp@<version>`` from a wheel install
vs ``adcontextprotocol-adcp-python@0.0.0-dev`` from a checkout,
breaking the multi-hop ``(code, field, sdk_id)`` dedup contract in
``core/error.json`` for the same SDK.

## Experimental markers + nits

- Added ``.. note:: Experimental`` to ``glob_match``,
  ``structural_match``, ``load_default_registry``.
- Removed duplicate ``NotificationConfig`` in ``adcp.types/__all__``.
- Removed ``_echo_identifier`` from ``advisory.py``'s ``__all__``.
- Two new tests pull ``SDK_ID`` through the documented public path
  (``from adcp.canonical_formats import SDK_ID``).

## Tests

5117 passed locally. New coverage: agent_url host-casefolding +
default-port stripping; newline / ANSI / Unicode line-separator
scrubbing; ``SDK_ID`` public-import + canonical-prefix invariants.

## Deferred per Argus's own suggestion

- ``_versions_overlap`` fail-loud on ``~`` / ``^`` — latent until
  #741 half 2 actually consumes the registry on the inbound path.
- ``_walk_for_credential_keys`` cycle guard — not wire-reachable.

Refs: #741, #841
Comment thread src/adcp/canonical_formats/advisory.py Fixed
aao-ipr-bot[bot]
aao-ipr-bot Bot previously approved these changes May 24, 2026
Copy link
Copy Markdown

@aao-ipr-bot aao-ipr-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. Follow-ups noted below. Hand-rolls ProductFormatDeclaration correctly per upstream's discriminated oneOf (codegen flattens it and drops format_kind/params), mirrors the existing dispatcher credential gate verbatim, and implements the registry's "Direction of truth" resolution order step-for-step.

Things I checked

  • 13 CanonicalFormatKind enum values all accounted for in V1_TRANSLATABLE at src/adcp/canonical_formats/projection.py:58-72 (image=T, html5=T, display_tag=T, image_carousel=F, video_hosted=T, video_vast=T, audio_hosted=T, audio_daast=T, sponsored_placement=F, native_in_feed=T, responsive_creative=F, agent_placement=F, custom=F). Cross-checked against schemas/cache/3.1.0-beta.3/formats/canonical/*.json.
  • _CREDENTIAL_SHAPED_KEY_SUFFIXES at canonical_decl.py:76-85 mirrors the dispatcher's list at decisioning/dispatch.py:422-431 exactly. Same threat model (open params dict + extra='allow' extras round-tripping through format_options[] and the idempotency replay cache), same fail-closed behavior, same suffix set.
  • Mutual-exclusion check at canonical_decl.py:219-236 faithfully implements the schema's allOf.not clause (product-format-declaration.json:118-138); params required at L132-145 matches required: ["format_kind", "params"] (L6-9).
  • model_copy(deep=True) at registry.py:139 — multi-tenant isolation via Pydantic v2 deep-copy. lru_cache on the parsed registry, fresh copy per caller.
  • 29 new public exports are purely additive vs origin/main — none renamed, none removed. feat: is the correct semver signal.
  • tests/test_import_layering.py:55 allowlist scopes correctly to canonical_decl.py only; advisory.py / projection.py / registry.py / format_options.py all import via adcp.types. code-reviewer-deep: clean layering.
  • Resolution order at projection.py:140-207 matches registry "Direction of truth" steps 1-4 exactly. ad-tech-protocol-expert: sound-with-caveats; closed-set rejection shape (UNSUPPORTED_FEATURE + details.rejected_value + details.accepted_values) is correct per error.json:93-96.

Follow-ups (non-blocking — file as issues)

  1. _canonicalize_agent_url covers RFC 3986 §6 but not §5.2.4 path dot-segment normalization. format_options.py:36-63 lowercases scheme + host and strips default port — that's two of the four bullets the spec mandates. manifest.json:990 and bundled/content-standards/calibrate-content-request.json:304 both spell out "normalize path dot-segments" as part of the documented /docs/reference/url-canonicalization rule. Failure mode is a false-negative on find_declaration_by_v1_format_id → wrongful UNSUPPORTED_FEATURE. Pipe parts.path through posixpath.normpath (or hand-rolled remove_dot_segments) and add a test pair https://x.example/a/./b/../chttps://x.example/a/c. Percent-encoding casing (RFC 3986 §6.2.2.1) worth aligning while in there.
  2. Credential-suffix list is mirrored from the dispatcher and inherits its gaps. Both lists miss access_key, private_key, authorization, passwd, signature. security-reviewer-deep flagged this at High. Right fix is one follow-up that updates BOTH canonical_decl.py:76-85 and decisioning/dispatch.py:422-431 in lockstep so the gates don't drift — not a PR-841-only patch.
  3. _echo_identifier scrub misses BOM + bidi overrides + zero-width chars. advisory.py:96-104 covers C0/C1/LS/PS but not U+FEFF, U+202A-U+202E, U+2066-U+2069, U+200B-U+200D, U+2060. Same threat model as the existing scrub (operator-tooling log injection through seller-published product_id); add to the codepoint guard list.
  4. _versions_overlap fail-loud asymmetry. registry.py:228-230 silently continues on unparseable RHS (">=foo") while the bare-operator typo path at L219-224 fails loud. Latent until #741 half 2 consumes the registry on the inbound path — Argus's prior pass already deferred this and the reasoning still holds.
  5. FORMAT_DECLARATION_DIVERGENT narrows-check. product-format-declaration.json:59 puts the v2-side check between declaration params and referenced v1 requirements in-scope for v2→v1. PR description already lists it as deferred to #741 half 2; confirm it lands there.
  6. params_as audit. canonical_decl.py:266-282 invokes canonical_type.model_validate(self.params) on seller-controlled data. Codegen models don't typically emit custom validators, but _ergonomic.py / _forward_compat.py patch some types with BeforeValidator coercion — grep those for side-effecting validators before 3.1 GA.

Minor nits (non-blocking)

  1. Duplicate WholesaleFeedEvent / WholesaleFeedWebhook in adcp.types.__all__. src/adcp/types/__init__.py:1201-1202 and :1236-1237. The duplication shows up in the snapshot at tests/fixtures/public_api_snapshot.json:1043-1046. The de-dup pass at c58d403 caught the NotificationConfig duplicate but missed this pair — interesting how snapshot dupes hide right next to the entries you're staring at. Drop the new pair, regenerate snapshot.
  2. canonical_formats/__init__.py module docstring drift. Public-surface block at L17-32 lists project_* / validate_* / find_* / load_default_registry / SdkAdvisory but __all__ (L85-101) also exports RegistryLoadError, V1_TRANSLATABLE, V2ToV1Projection, glob_match, structural_match, find_declaration_by_v1_format_id, make_sdk_advisory. Add them.
  3. CanonicalFormatHtml5Banner doesn't match the discriminator value. The wire value is html5 and the canonical schema is html5.json. ad-tech-protocol-expert: not wire-divergent (class names aren't on the wire) but worth aligning to CanonicalFormatHtml5 before 3.1 GA — public-rename is cheap pre-GA, expensive after.
  4. _canonicalize_agent_url exception scoping. format_options.py:51-63 wraps urlsplit in try/except ValueError, but parts.port at L59 is what actually raises on bad authority like https://x:abc. Unreachable today via FormatId.agent_url: AnyUrl upstream validation, but move port = parts.port inside the try to keep the function's "non-throwing on malformed input" docstring honest.

Safe to merge.



Three findings from the inline code-quality bot review were blocking
the admin merge per the repo's "all comments must be resolved" rule.

## Empty except in registry packaged-bundle lookup

``_read_registry_json`` swallows ``ModuleNotFoundError``,
``FileNotFoundError``, ``OSError`` from the packaged ``_schemas/``
lookup so dev installs can fall through to the schema-cache
walk-up. Added an explanatory comment documenting that the
``pass`` is intentional and what the fallback path is, so static
analysis (and future readers) know the bare ``except`` isn't
hiding a bug.

## SDK_ID listed in __all__ but not module-level

Replaced the lazy ``__getattr__`` dispatch with a plain module-level
``SDK_ID = _resolve_sdk_id()`` assignment in
``canonical_formats/advisory.py``. The lazy form was over-engineered
— ``_resolve_sdk_id`` is already ``@lru_cache``'d, and the
``importlib.metadata`` lookup runs at module-import time anyway in
practice (no setuptools-scm late-resolution path actually flows
through this module). The eager assignment is identical in cost
once and satisfies CodeQL's explicit-export check.

``canonical_formats/__init__.py`` now imports ``SDK_ID`` from
``advisory`` like any normal name; the ``__getattr__`` shim is
removed.

## Tests

89 canonical-formats tests pass. The existing public-import test
(``test_sdk_id_reachable_via_public_package_path``) still exercises
the documented public path; just via a real module-level binding now.

Refs: #741, #841
Copy link
Copy Markdown

@aao-ipr-bot aao-ipr-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. The CodeQL-bot cleanup at ca09d4b1 swaps the lazy __getattr__ SDK_ID dispatch for an eager SDK_ID: str = _resolve_sdk_id() at src/adcp/canonical_formats/advisory.py:110lru_cache(maxsize=1)'d, no setuptools-scm late-resolution path flowing through this module, no import-chain risk. Strictly safer than the __getattr__ shim, and CodeQL's __all__ introspection check goes quiet.

Things I checked

  • ca09d4b1 eager SDK_ID is total. _pkg_version("adcp") runs once at module-import; PackageNotFoundError → "0.0.0-dev" fallback covers dev installs. advisory.py only imports Error/Recovery/Source from adcp.types, which is fully loaded by the time adcp.canonical_formats initializes — no circular risk. security-reviewer: re-entrancy risk on the prior lazy form is now gone.
  • Resolution order at src/adcp/canonical_formats/projection.py:115-207 matches the 4-step "Direction of truth (normative)" section in registries/v1-canonical-mapping.json exactly. ad-tech-protocol-expert: SOUND.
  • V1_TRANSLATABLE table at projection.py:58-72 verified against schemas/cache/3.1.0-beta.3/formats/canonical/*.json — four explicit False overrides (agent_placement, sponsored_placement, responsive_creative, image_carousel) plus custom=False; remaining 8 inherit True from _base.json. Enum-coverage test pins it against drift.
  • Hand-rolled ProductFormatDeclaration at src/adcp/types/canonical_decl.py:122-264 carries all 9 schema properties plus discriminator + open params. allOf.not mutual-exclusion enforced at _check_mutual_exclusion. Credential-shaped-key gate at :238-264 mirrors the dispatcher byte-for-byte.
  • Credential-suffix list parity: canonical_decl.py:76-85decisioning/dispatch.py:422-431 are identical 8-suffix tuples. security-reviewer: CLEAN.
  • Multi-tenant registry isolation at registry.py:91-145: _load_registry_uncopied lru_cache'd; load_default_registry returns model_copy(deep=True). Tenant A mutation cannot reach tenant B's view.
  • sdk_id format at advisory.py:47,71: adcontextprotocol-adcp-python@<version> matches core/error.json:117 spec example verbatim.
  • Closed-set rejection at format_options.py:87-114: UNSUPPORTED_FEATURE + details.rejected_value + details.accepted_values is the wire-correct shape per error.json:95.
  • _echo_identifier control-char scrub at advisory.py:74-107: covers C0 + DEL/C1 + U+2028/U+2029. Length cap fires post-escape (so escape expansion stays bounded).
  • Public-API audit: 29 net-new exports, purely additive. feat(canonical-formats): without ! is correct — no removals, no required→optional flips, no enum-value removals.
  • Layering: canonical_decl.py added to tests/test_import_layering.py:55 ALLOWED_FILES with documented rationale. canonical_formats/* modules import via adcp.types (verified).
  • code-reviewer-deep: SOUND-WITH-CAVEATS, no blockers. ad-tech-protocol-expert-deep: SOUND-WITH-CAVEATS. security-reviewer-deep: LOW, no must-fix.

Follow-ups (non-blocking — file as issues)

  • Test docstring drift from ca09d4b1. tests/test_canonical_formats_public_api.py:75-80 still claims SDK_ID "is resolved lazily via module __getattr__." The shim is gone; the assertion still passes, but the comment lies about the path under test. Update or drop the second sentence.
  • allOf[0] not enforced on ProductFormatDeclaration. schemas/cache/3.1.0-beta.3/core/product-format-declaration.json:66-117 requires format_kind=custom to carry format_shape + format_schema AND (canonical_formats_only=true OR v1_format_ref), and non-custom kinds MUST NOT carry format_shape/format_schema. The hand-rolled class accepts wire-invalid combinations. Same family as the existing allOf.not validator at canonical_decl.py:219-236 — bundle with #741 part 2.
  • Path dot-segment normalization in _canonicalize_agent_url (format_options.py:36-63). RFC 3986 §5.2.4 — manifest.json:990 enumerates it as part of the canonicalization rule. Failure mode is over-strict identity matching on URLs with /./ or /../ segments; v1 catalog URLs in practice don't carry these, so latent.
  • Bidi / zero-width / BOM scrub in _echo_identifier (advisory.py:96-107). C0/C1/LS/PS covered; U+FEFF, U+202A-U+202E, U+2066-U+2069, U+200B-U+200D, U+2060 are not. Operator tooling that renders advisories in a terminal/UI (Slack, Sentry, log viewers) honors bidi/ZW for visual spoofing. security-reviewer: LOW — length cap still bounds blast radius and no credential exfil path.
  • _walk_for_credential_keys cycle guard (canonical_decl.py:94-116). Not wire-reachable (JSON has no cycles); reachable only via SDK-internal programmatic constructor. Latent RecursionError. ~3 lines of seen: set[int] closes it. Argus-prior pass already deferred this.
  • Carry-forward from prior Argus passes: _versions_overlap fail-loud on ~/^ is latent until #741 part 2 wires the v1 → v2 inbound consumer; FORMAT_DECLARATION_DIVERGENT narrowing check between params and referenced v1 requirements is on the half-2 list.

Minor nits (non-blocking)

  1. WholesaleFeedEvent / WholesaleFeedWebhook still duplicated in adcp.types.__all__ at src/adcp/types/__init__.py:1201-1202 and :1236-1237. The duplication is baked into the snapshot at tests/fixtures/public_api_snapshot.json:1043-1046. The de-dup pass at c58d4031 and the snapshot regeneration at 202474f4 both saw these and walked past — notable how a snapshot dupe right next to the entries you're staring at hides better than a hidden one.
  2. tests/test_import_layering.py:2-7 module docstring still names "aliases.py, _ergonomic.py, _generated.py, and the public adcp.types/__init__.py" — ALLOWED_FILES is now 7 entries deep. Refresh the docstring when _versions_overlap does.

Safe to merge.

@bokelley bokelley merged commit b455e2a into main May 24, 2026
23 checks passed
@bokelley bokelley deleted the bokelley/issue-741-status branch May 24, 2026 01:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant