Symptom
Calling fetch_adagents("cafemedia.com") against the largest known production AAO file (2.6 MB, declares 6,843 properties spanning 6,800 publisher_domains, sole sales agent: https://interchange.io) raises:
AdagentsValidationError: Failed to fetch adagents.json: HTTP 403
Cloudflare returns cf-mitigated: challenge — the request is being scored as automated traffic.
Not a User-Agent issue
The initial hypothesis was that the AdCP-Client/1.0 UA was being blocked. I tested five UA values via httpx.AsyncClient:
| UA | httpx result |
|---|---|
| AdCP-Client/1.0 (SDK default) | 403, cf-mitigated=challenge |
| AdCP-Client/5.7.0 | 403, cf-mitigated=challenge |
| Mozilla/5.0 | 403, cf-mitigated=challenge |
| curl/8.4.0 | 403, cf-mitigated=challenge |
| python-httpx/0.27 | 403, cf-mitigated=challenge |
All five UAs → 403 from httpx. Meanwhile curl works regardless of UA:
| UA | curl result |
|---|---|
| curl default | 200 |
| AdCP-Client/1.0 | 200 |
| Mozilla/5.0 | 200 |
| empty UA | 200 |
So the discriminator isn't the User-Agent header. It's the client-level fingerprint — TLS ClientHello (ciphers, ALPN, extension order), HTTP/2 SETTINGS frame, header order, or some combination Cloudflare scores as bot-like for httpx specifically.
Reproduction
import asyncio
import httpx

async def main():
    async with httpx.AsyncClient(follow_redirects=True) as c:
        r = await c.get(
            "https://cafemedia.com/.well-known/adagents.json",
            headers={"User-Agent": "AdCP-Client/1.0"},
            timeout=15,
        )
        # Print the status code and Cloudflare's mitigation header, if present.
        print(r.status_code, r.headers.get("cf-mitigated"))

asyncio.run(main())
# 403 challenge

curl -sI -A 'AdCP-Client/1.0' https://cafemedia.com/.well-known/adagents.json | head -1
# HTTP/2 200
Impact
Cafemedia's adagents.json is the canonical production reference for managed-network delegation (~6,800 publisher properties under Raptive). It is structurally exemplary: it uses spec-correct publisher_properties with selection_type: by_tag, and #750 (5.7.0) resolves it correctly to 6,843 properties when given the parsed data.
But every sales agent using this SDK fails to fetch the file at all — they get 403 before resolution can run. Hit while validating 5.7.0 end-to-end for the Prebid Sales Agent (prebid/salesagent#511) onboarding Raptive.
Possible fixes
Three places this could be addressed; not opining on which is right — the maintainers know the SDK design constraints better than I do:
- Coordinate with Raptive / Cloudflare to allowlist the SDK. Right thing semantically: adagents.json is public, intended for crawlers. A WAF rule that blocks programmatic fetches of /.well-known/adagents.json defeats the whole point. Raptive ops contact: adops@raptive.com (from cafemedia's adagents.json contact block).
- Switch fetch transport to a fingerprint-impersonating client like curl_cffi or [httpx with custom TLS context mimicking browser ClientHello]. Heavier dependency, but the only way the SDK reliably fetches behind aggressive bot management.
- Emit a better error. At minimum, when status is 403 + cf-mitigated: challenge, raise a typed AdagentsBlockedByBotMitigationError with the publisher contact and a one-line remediation hint, so downstream UIs can surface "ask publisher to allowlist AAO crawlers" instead of "HTTP 403".
Option 1 is the long-term right answer; options 2/3 are defensive. They aren't mutually exclusive.
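For the "emit a better error" option, a minimal sketch of what the typed error could look like. This is illustrative only, not the SDK's actual code: check_fetch_status is a hypothetical helper, AdagentsValidationError is redefined here as a stand-in so the snippet runs on its own, and publisher_contact is assumed to be supplied by the caller (the SDK cannot read it from a file it failed to fetch).

```python
from typing import Mapping, Optional


class AdagentsValidationError(Exception):
    """Stand-in for the SDK's existing error type (redefined so this runs alone)."""


class AdagentsBlockedByBotMitigationError(AdagentsValidationError):
    """The fetch was challenged by a bot-mitigation layer rather than rejected outright."""

    def __init__(self, url: str, publisher_contact: Optional[str] = None) -> None:
        self.url = url
        self.publisher_contact = publisher_contact
        hint = "ask the publisher to allowlist AAO crawlers"
        if publisher_contact:
            hint += f" (contact: {publisher_contact})"
        super().__init__(f"Fetch of {url} blocked by bot mitigation; {hint}")


def check_fetch_status(
    url: str,
    status_code: int,
    headers: Mapping[str, str],
    publisher_contact: Optional[str] = None,
) -> None:
    """Hypothetical helper: raise the most specific error for a non-200 response."""
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    mitigated = lowered.get("cf-mitigated", "").strip().lower()

    # 403 + cf-mitigated: challenge is the signature observed in this issue.
    if status_code == 403 and mitigated == "challenge":
        raise AdagentsBlockedByBotMitigationError(url, publisher_contact)
    if status_code != 200:
        raise AdagentsValidationError(
            f"Failed to fetch adagents.json: HTTP {status_code}"
        )
```

Subclassing the existing error type would keep current `except AdagentsValidationError` handlers working, while callers that care can catch the narrower type and surface the remediation hint.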
References
adcp/adagents.py:1018-1097 (_fetch_adagents_url)