fix: retry transient network errors during OAuth token refresh#987
Draft
devin-ai-integration[bot] wants to merge 2 commits intomainfrom
Draft
fix: retry transient network errors during OAuth token refresh#987devin-ai-integration[bot] wants to merge 2 commits intomainfrom
devin-ai-integration[bot] wants to merge 2 commits intomainfrom
Conversation
Add ConnectionError, ConnectTimeout, and ReadTimeout to the backoff retry decorator in _make_handled_request so transient network failures during OAuth token refresh are automatically retried. When retries are exhausted, wrap the exception in AirbyteTracedException with FailureType.transient_error and a clear user-facing message instead of exposing raw Python exception class names. Also wrap generic exceptions in _make_handled_request with AirbyteTracedException and FailureType.system_error for consistent structured error reporting. Co-Authored-By: bot_apk <apk@cognition.ai>
Contributor
Author
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksTesting This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1776051256-fix-oauth-connection-error-retry#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1776051256-fix-oauth-connection-error-retryPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
The test_cloud_read_with_private_endpoint test asserted the old generic Exception message. Updated to match the new AirbyteTracedException message from _make_handled_request. Co-Authored-By: bot_apk <apk@cognition.ai>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The CDK's OAuth token refresh mechanism (
_make_handled_requestinabstract_oauth.py) did not retry on transient network errors likeConnectionError,ConnectTimeout, orReadTimeout. The@backoff.on_exceptiondecorator only retried onDefaultBackoffException(raised for HTTP 429/5xx responses). Transient network errors — such asConnectionResetErrorduring SSL handshake — fell through the exception handler and were re-raised without retry, producing a generic message like:This was classified as
system_errorinstead oftransient_error, which could prevent automatic retry at the platform level and cause unnecessary oncall alerts.Changes
Added transient network exceptions to the backoff retry decorator —
requests.exceptions.ConnectionError,ConnectTimeout, andReadTimeoutare now retried with exponential backoff (up to 300s), consistent withTRANSIENT_EXCEPTIONSinrate_limiting.pywhich already handles these for regular HTTP stream requests.Wrapped exhausted-retry network errors in
AirbyteTracedException— When retries are exhausted,refresh_access_tokencatches the transient exception and wraps it inAirbyteTracedExceptionwithFailureType.transient_errorand a clear user-facing message:"OAuth access token refresh request failed due to a network error."Wrapped generic exceptions with
AirbyteTracedException— The catch-allexcept Exceptionblock in_make_handled_requestnow raisesAirbyteTracedExceptionwithFailureType.system_errorinstead of a bareException, ensuring consistent structured error reporting.Resolves https://github.com/airbytehq/airbyte-internal-issues/issues/16179
Review & Testing Checklist for Human
refresh_access_tokenwrapper correctly catches only transient network exceptions (not allRequestExceptionsubclasses)except Exception→AirbyteTracedException(system_error)change in_make_handled_requestdoesn't alter behavior for existing exception types that should propagate differentlysource-hubspot) and verify token refresh still works normallyConnectionErrorduring token refresh to confirm retry + proper error messageNotes
source-hubspotrate_limiting.pyline 19-25 which definesTRANSIENT_EXCEPTIONSincludingConnectionErrorfor regular HTTP stream requestsLink to Devin session: https://app.devin.ai/sessions/c8765b364f2d49c6a3d6e94361bba420