Add Bugsnag error grouping with stable normalized keys#234
Merged
This was referenced May 9, 2026
Collaborator
Author
This stack of pull requests is managed by Graphite.
Ark-kun
reviewed
May 12, 2026
import re


_POD_NAME_PATTERN = re.compile(r"task-[a-zA-Z0-9]+-[a-zA-Z0-9]+")
_OBJECT_REPR_PATTERN = re.compile(r"<[^>]+ object at 0x[0-9a-fA-F]+>")
Contributor
Eventually, we might want to fix such address strings (if they are not informative).
Ark-kun
reviewed
May 12, 2026
import json
import re


_POD_NAME_PATTERN = re.compile(r"task-[a-zA-Z0-9]+-[a-zA-Z0-9]+")
Contributor
Another prefix: `tangle-ce-`. Should probably add `tangle-` too.
Collaborator
Author
Added coverage for tangle-ce- and tangle- in general
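For illustration, here is a minimal sketch of how the pod-name pattern might be widened to cover the prefixes discussed in this thread. The regex that actually landed is not shown in this conversation, so both the pattern and the helper name `strip_pod_names` are assumptions.

```python
import re

# Hypothetical widened pattern (not the PR's actual regex): matches names that
# start with "task-" or "tangle-" (which includes "tangle-ce-"), followed by
# one or more dash-separated alphanumeric segments.
_POD_NAME_PATTERN = re.compile(r"\b(?:task|tangle)(?:-[a-zA-Z0-9]+)+")


def strip_pod_names(message: str) -> str:
    """Replace every pod-like name in the message with the literal '{pod}'."""
    return _POD_NAME_PATTERN.sub("{pod}", message)


if __name__ == "__main__":
    print(strip_pod_names('pods "task-abc123-xyz9" not found'))
    print(strip_pod_names("container in pod tangle-ce-7f9c-x1 is terminated"))
```

A pattern this loose would also match ordinary hyphenated words such as "task-force", so a stricter segment rule may be preferable where free-form text can appear in messages.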
Ark-kun
approved these changes
May 13, 2026
Contributor
Ark-kun
left a comment
Great idea for noise reduction. Thank you.
Collaborator
Author
Verified this stack against known consumers ✅ Error grouping is working as intended.
morgan-wowk
commented
May 14, 2026
key_value = f"{prefix}: {normalized}" if prefix else normalized
event.add_tab("custom", {_CUSTOM_GROUPING_KEY: key_value})
if prefix and event.errors:
    try:
Collaborator
Author
For extra safety against potential package changes in the future, this is wrapped in a try/except with a fallback to no prefixing.
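The body of the `try` block is not visible in this diff excerpt, so the following is only a sketch of the defensive pattern the comment above describes. `StubError`, `StubEvent`, `attach_grouping_key`, and the choice to prefix the first error's `error_message` are all assumptions made for illustration; they stand in for the real Bugsnag event objects.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StubError:
    # Stand-in for one entry of event.errors; the field name is an assumption.
    error_message: str


@dataclass
class StubEvent:
    # Stand-in for a Bugsnag event with just enough surface for this sketch.
    errors: list = field(default_factory=list)
    tabs: dict = field(default_factory=dict)

    def add_tab(self, name: str, data: dict) -> None:
        self.tabs.setdefault(name, {}).update(data)


def attach_grouping_key(
    event, key_name: str, normalized: str, prefix: Optional[str] = None
) -> None:
    """Write the grouping key, then try to prefix the first error's message."""
    key_value = f"{prefix}: {normalized}" if prefix else normalized
    event.add_tab("custom", {key_name: key_value})
    if prefix and event.errors:
        try:
            first = event.errors[0]
            first.error_message = f"{prefix}: {first.error_message}"
        except Exception:
            # If the error object changes shape in a future package version,
            # fall back to no prefixing; the grouping key is already attached.
            pass
```

The grouping key is written before the `try`, so a failure while prefixing never costs the event its metadata.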
Introduces error_normalization.py which strips instance-specific values (pod names, IDs, memory addresses, byte offsets) from exceptions so structurally identical errors collapse to one group in Bugsnag. TANGLE_BUGSNAG_CUSTOM_GROUPING_KEY controls the metadata key name — no-op when unset, allowing Shopify deployments to set it without touching OSS code. System errors reported via record_system_error_exception are prefixed with "SYSTEM_ERROR: " for easy filtering. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
"""Tests for error_normalization module."""

import json
import unittest.mock as mock
Add Bugsnag error grouping with stable normalized keys
Introduces configurable error grouping so structurally identical exceptions collapse into a single group rather than creating a new entry per unique pod name, UUID, or memory address.
How it works
A new `TANGLE_BUGSNAG_CUSTOM_GROUPING_KEY` env var controls the metadata key name written on each Bugsnag event. When unset, the feature is a complete no-op. When set (by a deployment), every notified exception gets a `custom[<key>]` tab containing a normalized string derived from the exception type and message.

System errors reported through `record_system_error_exception` are additionally prefixed with `SYSTEM_ERROR:` so they can be filtered or grouped separately from non-system errors.

Error taxonomy
The following exception types, from one consumer use case, are normalized to stable grouping keys:
- kubernetes ApiException (404): NotFound: pods "{pod}" not found
- kubernetes ApiException (400): BadRequest: container "main" in pod {pod} is terminated
- kubernetes ApiException (400): BadRequest: container "main" in pod {pod} is waiting to start: PodInitializing
- kubernetes ApiException (400): BadRequest: container "main" in pod {pod} is not available
- kubernetes ApiException (500): InternalError: failed calling webhook "<>": context deadline exceeded
- UnicodeDecodeError: 'utf-8' codec can't decode byte at position {n}
- MaxRetryError: k8s connection pool max retries exceeded (ReadTimeoutError)
- OrchestratorError: Unexpected running container status: {object}
- ExceptionType: {message with addresses/UUIDs/IDs stripped}

Many exception types (e.g. `AttributeError`, `sqlalchemy.exc.OperationalError`) already produce stable messages and pass through the fallback unchanged.

Changes
- `error_normalization.py` (new): one public function `normalize_error_message(*, exception)` dispatching to type-specific handlers before falling back to a generic stripper that removes hex addresses, UUIDs, and long alphanumeric IDs
- `bugsnag_instrumentation.py`: reads `TANGLE_BUGSNAG_CUSTOM_GROUPING_KEY`; `_before_notify` attaches the normalized key when configured; supports an optional `grouping_prefix` passed through `notify(**metadata)`
- `orchestrator_sql.py`: `record_system_error_exception` passes `grouping_prefix="SYSTEM_ERROR"` so system errors are visually distinct
- `test_error_normalization.py` (new): 15 unit tests covering all error groups and the fallback path

OSS note
The grouping key name is not hardcoded: it is supplied entirely via `TANGLE_BUGSNAG_CUSTOM_GROUPING_KEY` at deploy time, so no internal platform names appear in OSS code.
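
To make the description above concrete, here is a rough sketch of the dispatch-then-fallback shape and the env-var gating. It is not the code from `error_normalization.py`: the handler table, the regexes, the `{uuid}`/`{addr}`/`{id}` placeholders, the 16-character threshold for a "long" ID, and the `grouping_metadata` helper are all assumptions. Only `normalize_error_message(*, exception)` and `TANGLE_BUGSNAG_CUSTOM_GROUPING_KEY` are names taken from the description.

```python
import os
import re

# Assumed patterns for the generic stripper.
_UUID = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)
_HEX_ADDRESS = re.compile(r"0x[0-9a-fA-F]+")
# "Long alphanumeric ID" is taken here to mean 16+ characters with at least one digit.
_LONG_ID = re.compile(r"\b(?=[a-zA-Z0-9]*\d)[a-zA-Z0-9]{16,}\b")


def _normalize_unicode_decode_error(exc: UnicodeDecodeError) -> str:
    # Drop the offending byte value and offset; keep only the codec name.
    return f"UnicodeDecodeError: '{exc.encoding}' codec can't decode byte at position {{n}}"


def _normalize_generic(exc: BaseException) -> str:
    # Fallback: strip instance-specific tokens, keep the rest of the message.
    message = str(exc)
    message = _UUID.sub("{uuid}", message)
    message = _HEX_ADDRESS.sub("{addr}", message)
    message = _LONG_ID.sub("{id}", message)
    return f"{type(exc).__name__}: {message}"


# Hypothetical dispatch table; the real module has more type-specific handlers.
_HANDLERS = {UnicodeDecodeError: _normalize_unicode_decode_error}


def normalize_error_message(*, exception: BaseException) -> str:
    for exc_type, handler in _HANDLERS.items():
        if isinstance(exception, exc_type):
            return handler(exception)
    return _normalize_generic(exception)


def grouping_metadata(*, exception: BaseException, environ=os.environ):
    """Return {key_name: normalized_key}, or None when the env var is unset."""
    key_name = environ.get("TANGLE_BUGSNAG_CUSTOM_GROUPING_KEY")
    if not key_name:
        return None  # feature is a no-op
    return {key_name: normalize_error_message(exception=exception)}
```

Passing `environ` as a parameter keeps the gating easy to exercise in tests without mutating the process environment.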