[Backport 7.78.x] Fix pysnmp 7.x API compatibility and asyncio event-loop race condition by dd-octo-sts[bot] · Pull Request #23158 · DataDog/integrations-core

dd-octo-sts · 2026-04-03T15:07:40Z

Backport 7f3c485 from #23125.

Summary

Companion to #23124. Fixes all breakage introduced by the pysnmp 5.x → 7.x upgrade:

Migrates imports and call sites to the renamed 7.x API (sendVarBinds → send_varbinds, runDispatcher → run_dispatcher, etc.)
Fixes varBindTable handling — pysnmp 7.x returns a flat list of (OID, val) pairs instead of nested rows
Fixes the event loop never stopping after an SNMP response (callbacks now call loop.stop())
Fixes MIB sharing, constraint enforcement, and thread event-loop setup in the discovery thread
Fixes asyncio event-loop race condition in discovery worker threads: pysnmp 7.x's AsyncioDispatcher captures asyncio.get_event_loop() at construction time. When multiple ThreadPoolExecutor workers shared the same loop, run_dispatcher() returned immediately on all but the first worker (loop.is_running() == True), leaving ctx={} and raising KeyError('error'). Each worker now creates its own isolated event loop and re-creates the SnmpEngine within that context.

Motivating issue: VULN-59246

#23125) * Bump pyasn1 from 0.4.8 to 0.6.3 to fix CVE-2026-30922 CVE-2026-30922 (CVSS 7.5 High): pyasn1 prior to 0.6.3 is vulnerable to DoS via unbounded recursion when decoding deeply nested ASN.1 structures. The fix introduces MAX_NESTING_DEPTH=100. VULN-59246 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Add changelog entry for pyasn1 bump (PR #23095) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Update license metadata for pyasn1 0.6.3; fix orjson license parsing - pyasn1 0.6.3 changed from BSD-3-Clause to BSD-2-Clause - Add orjson override to .ddev/config.toml to handle its compound MPL-2.0 AND (Apache-2.0 OR MIT) license expression Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Bump pysnmp 5.1.0 to 7.1.22 to fix ModuleNotFoundError with pyasn1 0.6.3 pyasn1 0.6.x removed the `compat` module; pysnmp 5.x imported from it (pysnmp/debug.py: `from pyasn1.compat.octets import octs2ints`). pysnmp 7.1.22 drops this import and explicitly supports pyasn1 0.6.3. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Update pysnmp imports to 7.x API pysnmp 7.x removed the asyncore backend; update import paths: - CommunityData and friends: pysnmp.hlapi → pysnmp.hlapi.v3arch.asyncio - lcd singleton: pysnmp.hlapi.asyncore.cmdgen → instantiate from pysnmp.hlapi.v3arch.asyncio.lcd.CommandGeneratorLcdConfigurator - vbProcessor singleton: pysnmp.hlapi.asyncore.cmdgen → instantiate from pysnmp.hlapi.varbinds.CommandGeneratorVarBinds Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix import ordering after pysnmp 7.x API migration Move module-level singleton instantiations (lcd, vbProcessor) to after all import statements to satisfy E402 lint rule. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix pysnmp 7.x API incompatibilities pysnmp 7.x removed the re-export of auth constants and ObjectType from pysnmp.hlapi; use pysnmp.hlapi.v3arch.asyncio as the hlapi module alias so getattr(hlapi, protocol_name) and hlapi.ObjectType still work. UdpTransportTarget.create() is now async-only in 7.x; resolve the transport address synchronously via socket.getaddrinfo() to preserve the synchronous register_device_target() call path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix pysnmp 7.x renamed attributes - CommunityData/UsmUserData: mpModel → message_processing_model - SnmpEngine: getMibBuilder() → get_mib_builder() - MibSource: getMibSources() → get_mib_sources(), fullPath() → full_path() - CommandGeneratorVarBinds: getMibViewController() → get_mib_view_controller() Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Migrate commands.py to pysnmp 7.x API Rename makeVarBinds/unmakeVarBinds/sendVarBinds to snake_case variants. Add per-engine WeakKeyDictionary cache since make_varbinds/unmake_varbinds now expect a Dict rather than a SnmpEngine. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix pysnmp 7.x event loop never stopping after SNMP response In pysnmp 7.x, run_dispatcher() calls loop.run_forever() which does not self-terminate when pending requests finish (unlike pysnmp 5.x asyncore). Stop the loop explicitly in each callback and use the non-deprecated transport_dispatcher.run_dispatcher() API. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix pysnmp 7.x varBindTable format and ensure loop always stops pysnmp 7.x changed the GETNEXT/BULK callback varBindTable from a list of rows to a flat list of (OID, val) pairs. The old row-iteration logic raised exceptions before loop.stop() could be called, causing asyncio to swallow the exception and leave run_forever() running indefinitely. Fix varBindTable processing for getnext and bulk callbacks, and wrap loop.stop() in finally blocks so the event loop always exits. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix pysnmp 7.x compatibility: MIB sharing, constraint enforcement, thread event loop - commands.py: Pre-populate engine cache with a MibViewController backed by the engine's own MibBuilder so vbProcessor.make_varbinds and the OIDResolver share the same MIB namespace. Without this, MIBs loaded during make_varbinds (TCP-MIB etc.) were invisible to the resolver when lookup_mib=False, causing metrics like tcpRtoAlgorithm to resolve to generic mib-2 entries instead of their names. - commands.py: pysnmp 7.x bypasses constraint checking for received SimpleAsn1Type values in resolve_with_mib (via direct _value assignment). Restore the original enforce_mib_constraints behaviour by explicitly cloning through the MIB syntax after resolution; this raises SmiError/ValueConstraintError for out-of-range values. Capture callback exceptions in cbCtx so they survive the asyncio event loop and get re-raised to the caller after run_dispatcher() returns. - discovery.py: Worker threads have no asyncio event loop by default in Python 3.10+. pysnmp 7.x needs one at transport construction time (UdpTransport.__init__ calls asyncio.get_event_loop()). Create and set a new loop at the top of discover_instances() so all SNMP operations in the thread work correctly. - mibs.py: Use the non-deprecated add_mib_sources() instead of addMibSources(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix asyncio event-loop race condition in SNMP discovery workers pysnmp 7.x's AsyncioDispatcher captures asyncio.get_event_loop() at construction time. When multiple ThreadPoolExecutor workers share the same event loop (L_main or L_discovery), run_dispatcher() returns immediately on all but the first worker (loop.is_running() == True), leaving ctx={} empty. A subsequent ctx['error'] access then raises KeyError, surfaced as "Failed to collect metrics - 'error'". Additionally, abandoned pending requests leave stale timer callbacks on the shared loop that fire during subsequent checks, stopping run_forever() prematurely before the SNMP response callback fires. Fix: give each worker thread its own asyncio event loop and re-create the SnmpEngine within that context (so AsyncioDispatcher captures the worker-local loop). Pass isolated_loop=True from the executor submit call; the non-discovery single-device path is unaffected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Close per-worker asyncio event loop after each device check Each _check_device(isolated_loop=True) call creates a fresh asyncio loop but never closed it, causing file descriptors and handles to accumulate in reused ThreadPoolExecutor threads across check intervals. Add an outer try/finally that cancels any pending tasks (pysnmp's handle_timeout coroutine), waits for them to handle CancelledError, closes the loop, and clears the thread's current-loop reference. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Address code-review issues in pysnmp 7.x compatibility layer - commands.py: protect _engine_cache with a threading.Lock to eliminate the check-then-set race on the WeakKeyDictionary under concurrent workers - commands.py: use ctx.get('error') in _handle_error so a missing key (callback never fired) raises a clear error instead of a bare KeyError - commands.py: add except/re-raise to snmp_getnext and snmp_bulk callbacks matching the snmp_get pattern; unmake_varbinds exceptions previously produced an opaque KeyError on ctx['var_bind_table'] instead of the underlying SmiError - snmp.py: shadow config with copy.copy() before reinitialize_engine() so the shared InstanceConfig in discovered_instances is never mutated by worker threads; each worker gets its own _snmp_engine and device - discovery.py: wrap the discovery while-loop in try/finally to close the event loop when the thread exits, releasing UDP socket FDs accumulated during network scans Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix changelog entry: use PR #23125 instead of #23095 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Restore previous event loop instead of setting to None after cleanup asyncio.set_event_loop(None) marks _set_called=True on the policy, so any subsequent asyncio.get_event_loop() in the same thread raises RuntimeError even in the main thread (Python 3.12+). This broke tests that call discover_instances() directly in the main thread (e.g. test_discovery_tags), leaving all subsequent tests unable to create an SnmpCheck / InstanceConfig. Save and restore the caller's event loop (or None if there wasn't one) instead of unconditionally clearing it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Add clarifying comments to _check_device event loop management Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Trim verbose comments to single lines per project style Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Drain accumulated pysnmp tasks between host scans in discover_instances pysnmp 7.x schedules handle_timeout as an asyncio task. When a response arrives first, loop.stop() is called by the response callback but the handle_timeout task remains pending. On the next host's run_forever() call, the stale task fires immediately and calls loop.stop() before the new SNMP request completes, leaving ctx={} and causing rapid KeyError looping (observed as write_persistent_cache called 50k+ times in test_cache_building). Cancel and drain all pending tasks after each fetch_sysobject_oid call so the loop is clean before scanning the next host. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> (cherry picked from commit 7f3c485)

datadog-datadog-prod-us1 · 2026-04-06T19:32:04Z

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: d7551bb | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback!}

codecov · 2026-04-06T19:52:56Z

Codecov Report

❌ Patch coverage is 92.33716% with 20 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (7.78.x@3b3b36b). Learn more about missing BASE report.

Additional details and impacted files

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

dd-octo-sts bot added bot backport labels Apr 3, 2026

dd-octo-sts bot requested review from a team as code owners April 3, 2026 15:07

dd-octo-sts bot requested a review from alpatriciodd April 3, 2026 15:07

datadog-agent-integrations-bot bot added integration/snmp team/agent-integrations team/network-device-monitoring-core labels Apr 3, 2026

Kyle-Neale changed the base branch from 7.78.x to backport-23124-to-7.78.x April 3, 2026 15:30

Kyle-Neale changed the base branch from backport-23124-to-7.78.x to 7.78.x April 3, 2026 15:31

steveny91 approved these changes Apr 6, 2026

View reviewed changes

temporal-github-worker-1 bot added agent/approved ecosystems/review-requested product/review-requested labels Apr 6, 2026

Merge branch '7.78.x' into backport-23125-to-7.78.x

d7551bb

dkirov-dd merged commit c4acd40 into 7.78.x Apr 7, 2026
28 of 30 checks passed

dkirov-dd deleted the backport-23125-to-7.78.x branch April 7, 2026 09:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Backport 7.78.x] Fix pysnmp 7.x API compatibility and asyncio event-loop race condition#23158

[Backport 7.78.x] Fix pysnmp 7.x API compatibility and asyncio event-loop race condition#23158
dkirov-dd merged 2 commits into7.78.xfrom
backport-23125-to-7.78.x

dd-octo-sts bot commented Apr 3, 2026

Uh oh!

datadog-datadog-prod-us1 bot commented Apr 6, 2026 •

edited by datadog-official bot

Loading

Uh oh!

codecov bot commented Apr 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

dd-octo-sts bot commented Apr 3, 2026

Summary

Uh oh!

datadog-datadog-prod-us1 bot commented Apr 6, 2026 • edited by datadog-official bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Apr 6, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

datadog-datadog-prod-us1 bot commented Apr 6, 2026 •

edited by datadog-official bot

Loading