
bug(adagents): cafemedia.com (Raptive AAO file) blocks SDK with HTTP 403 — Cloudflare challenge on httpx client fingerprint #801

@bokelley


Symptom

Calling fetch_adagents("cafemedia.com") against the largest known production AAO file (2.6 MB, declaring 6,843 properties across 6,800 publisher_domains, sole sales agent: https://interchange.io) raises:

AdagentsValidationError: Failed to fetch adagents.json: HTTP 403

Cloudflare responds with cf-mitigated: challenge, meaning the request is being scored as automated traffic.

Not a User-Agent issue

The initial hypothesis was that the AdCP-Client/1.0 UA was being blocked. I tested five User-Agent values via httpx.AsyncClient:

| UA | httpx result |
| --- | --- |
| AdCP-Client/1.0 (SDK default) | 403, cf-mitigated=challenge |
| AdCP-Client/5.7.0 | 403, cf-mitigated=challenge |
| Mozilla/5.0 | 403, cf-mitigated=challenge |
| curl/8.4.0 | 403, cf-mitigated=challenge |
| python-httpx/0.27 | 403, cf-mitigated=challenge |

All five UAs → 403 from httpx. Meanwhile curl works regardless of UA:

| UA | curl result |
| --- | --- |
| curl default | 200 |
| AdCP-Client/1.0 | 200 |
| Mozilla/5.0 | 200 |
| empty UA | 200 |

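So the discriminator isn't the User-Agent header. It's the client-level fingerprint: the TLS ClientHello (ciphers, ALPN, extension order), the HTTP/2 SETTINGS frame, header order, or some combination that Cloudflare scores as bot-like for httpx specifically. One concrete difference: httpx.AsyncClient speaks HTTP/1.1 unless constructed with http2=True, while curl negotiated HTTP/2 in the reproduction below, so the httpx client as constructed there never sends an HTTP/2 SETTINGS frame at all.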
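To make "extension order" concrete: JA3 is one published TLS client fingerprint format. It joins the ClientHello's version, cipher suites, extensions, supported groups (curves), and point formats into a comma-separated string and takes its MD5. The sketch below is illustrative only. The field values are hypothetical rather than captured from httpx or curl, and this issue does not establish which fingerprint (JA3, JA4, or something proprietary) Cloudflare actually scores here. It shows that two clients sending identical headers can still differ at this layer, and that reordering extensions alone changes a JA3 hash. Successor formats such as JA4 sort these lists before hashing, which makes them insensitive to order.

```python
import hashlib
from typing import Iterable


def is_grease(value: int) -> bool:
    """True for RFC 8701 GREASE code points (0x0a0a, 0x1a1a, ..., 0xfafa)."""
    # Both bytes are identical and each byte's low nibble is 0xa.
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _join(values: Iterable[int]) -> str:
    # JA3 drops GREASE values and keeps the rest in wire order.
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_string(version: int, ciphers, extensions, curves, point_formats) -> str:
    """JA3 text form: version,ciphers,extensions,curves,point_formats."""
    return ",".join(
        [str(version), _join(ciphers), _join(extensions), _join(curves), _join(point_formats)]
    )


def ja3_hash(*fields) -> str:
    # The JA3 fingerprint is the MD5 hex digest of the text form.
    return hashlib.md5(ja3_string(*fields).encode("ascii")).hexdigest()


# Hypothetical ClientHello field values, for illustration only.
VERSION = 771  # 0x0303, the legacy_version field
CIPHERS = [4865, 4866, 4867, 49195]
EXTENSIONS = [0, 23, 65281, 10, 11, 35, 16]
CURVES = [29, 23, 24]
POINT_FORMATS = [0]

baseline = ja3_hash(VERSION, CIPHERS, EXTENSIONS, CURVES, POINT_FORMATS)
reordered = ja3_hash(VERSION, CIPHERS, list(reversed(EXTENSIONS)), CURVES, POINT_FORMATS)
print(baseline != reordered)  # True: same extensions, different order, different JA3
```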

Reproduction

```python
import asyncio

import httpx


async def main():
    async with httpx.AsyncClient(follow_redirects=True) as c:
        r = await c.get(
            "https://cafemedia.com/.well-known/adagents.json",
            headers={"User-Agent": "AdCP-Client/1.0"},
            timeout=15,
        )
        # httpx response headers are case-insensitive, so .get works directly
        print(r.status_code, r.headers.get("cf-mitigated"))


asyncio.run(main())
# 403 challenge
```

```sh
curl -sI -A 'AdCP-Client/1.0' https://cafemedia.com/.well-known/adagents.json | head -1
# HTTP/2 200
```

Impact

Cafemedia's adagents.json is the canonical production reference for managed-network delegation (~6,800 publisher properties under Raptive). It is structurally exemplary: it uses spec-correct publisher_properties with selection_type: by_tag, and #750 (5.7.0) resolves it correctly to 6,843 properties when given the parsed data.

But every sales agent using this SDK fails to fetch the file at all: they get a 403 before resolution can run. I hit this while validating 5.7.0 end-to-end for the Prebid Sales Agent (prebid/salesagent#511) onboarding Raptive.

Possible fixes

This could be addressed in three places. I'm not opining on which is right; the maintainers know the SDK design constraints better than I do:

  1. Coordinate with Raptive / Cloudflare to allowlist the SDK. This is the semantically right fix: adagents.json is public and intended for crawlers, so a WAF rule that blocks programmatic fetches of /.well-known/adagents.json defeats the whole point. Raptive ops contact: adops@raptive.com (from cafemedia's adagents.json contact block).
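  2. Switch the fetch transport to a fingerprint-impersonating client such as curl_cffi, or httpx with a custom TLS context that mimics a browser ClientHello (Python's ssl module exposes cipher and ALPN settings but not extension order, so this can only partially mimic one). It is a heavier dependency, but likely the only way the SDK can fetch reliably behind aggressive bot management.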
  3. Emit a better error. At minimum, when status is 403 + cf-mitigated: challenge, raise a typed AdagentsBlockedByBotMitigationError with the publisher contact and a one-line remediation hint, so downstream UIs can surface "ask publisher to allowlist AAO crawlers" instead of "HTTP 403".

Option 1 is the long-term right answer; options 2/3 are defensive. They aren't mutually exclusive.
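Since options 2 and 3 compose, a defensive fetch path could try the existing transport first, fall back to a fingerprint-impersonating transport only when Cloudflare challenges, and raise the typed error if every transport is challenged. The sketch below assumes transports are injected as async callables returning a small FetchResult. FetchResult, Fetcher, is_bot_challenge, fetch_adagents_defensively, and the AdagentsBlockedByBotMitigationError shape are all hypothetical, and AdagentsValidationError is redefined locally as a stand-in rather than imported from the SDK. One design wrinkle: the publisher contact lives inside the very adagents.json that is blocked, so the error can only carry it when the caller supplies it from elsewhere (for example a previously cached copy); here it is an optional argument.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Mapping, Optional, Sequence


@dataclass
class FetchResult:
    """Transport-neutral view of an HTTP response (hypothetical helper type)."""
    status: int
    headers: Mapping[str, str] = field(default_factory=dict)
    body: bytes = b""


# A fetcher takes a URL and returns a FetchResult. One could wrap httpx,
# another a fingerprint-impersonating client such as curl_cffi.
Fetcher = Callable[[str], Awaitable[FetchResult]]


class AdagentsValidationError(Exception):
    """Local stand-in for the SDK exception of the same name."""


class AdagentsBlockedByBotMitigationError(AdagentsValidationError):
    """Hypothetical typed error along the lines of option 3."""

    def __init__(self, url: str, contact: Optional[str] = None):
        self.url = url
        self.contact = contact
        hint = "ask the publisher to allowlist AAO crawlers"
        if contact:
            hint += f" (contact: {contact})"
        super().__init__(f"Bot mitigation blocked {url}: {hint}")


def is_bot_challenge(result: FetchResult) -> bool:
    """403 plus cf-mitigated: challenge, matching header names case-insensitively."""
    headers = {k.lower(): v for k, v in result.headers.items()}
    return result.status == 403 and headers.get("cf-mitigated", "").strip().lower() == "challenge"


async def fetch_adagents_defensively(
    url: str, fetchers: Sequence[Fetcher], contact: Optional[str] = None
) -> bytes:
    """Try each fetcher in order; move to the next one only on a bot challenge."""
    if not fetchers:
        raise ValueError("at least one fetcher is required")
    for fetch in fetchers:
        result = await fetch(url)
        if is_bot_challenge(result):
            continue  # challenged: try the next (e.g. impersonating) transport
        if result.status != 200:
            # Ordinary failures keep the existing untyped error.
            raise AdagentsValidationError(f"Failed to fetch adagents.json: HTTP {result.status}")
        return result.body
    # Every transport was challenged: raise the typed error for UIs to surface.
    raise AdagentsBlockedByBotMitigationError(url, contact)


async def _demo() -> None:
    async def challenged(url: str) -> FetchResult:
        return FetchResult(403, {"CF-Mitigated": "challenge"})

    async def ok(url: str) -> FetchResult:
        return FetchResult(200, {}, b"{}")

    body = await fetch_adagents_defensively(
        "https://cafemedia.com/.well-known/adagents.json", [challenged, ok]
    )
    print(body)  # b'{}'


asyncio.run(_demo())
```

A real second fetcher could wrap curl_cffi's AsyncSession with an impersonate= browser target; that wiring is omitted here because it is exactly the extra dependency option 2 weighs. A plain 403 without the challenge header deliberately does not trigger the fallback, so ordinary access-control failures still surface as before.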

