feat: match substring on known public aliases#104786
Conversation
| sentry_sdk.set_tag( | ||
| "num_attrs_without_intersecting_filter", | ||
| len(attrs_without_intersecting_filter.attributes), | ||
| ) |
There was a problem hiding this comment.
Bug: Debug code doubles RPC calls in production
The code block labeled as "debug code" makes an additional attribute_names_rpc call whenever a query_filter is present. This effectively doubles the RPC load for requests with query filters, increasing latency and backend load in production. The comment explicitly identifies this as debug code for checking the intersecting filter behavior, suggesting it may have been unintentionally left in the production codebase.
| if attr.name in additional_substring_matches: | ||
| # dedupe additional known attrs on the first offset | ||
| if offset == 0: | ||
| additional_substring_matches.remove(attr.name) | ||
| # we've already shown this attr in the first offset, so | ||
| # don't show it again | ||
| else: | ||
| continue |
There was a problem hiding this comment.
Bug: The additional_substring_matches set is not reset across paginated calls, causing attributes that appear on pages other than the first to be incorrectly skipped.
Severity: MEDIUM | Confidence: High
🔍 Detailed Analysis
The deduplication logic for substring-matched attributes has a flaw related to pagination. The additional_substring_matches set is initialized once. On the first page (offset == 0), matching attributes are correctly processed and removed from the set. However, on subsequent pages (offset > 0), any attribute that is still in the set (because it didn't appear on page 1) is incorrectly skipped via a continue statement. This leads to missing attributes in the final results if they don't appear on the first page of the paginated response.
💡 Suggested Fix
The logic for handling additional_substring_matches should be re-evaluated. One option is to not skip attributes on subsequent pages, but instead perform the removal and add them to the results, ensuring consistent behavior across all pages. Alternatively, collect all attributes from all pages first before performing this deduplication logic.
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/api/endpoints/organization_trace_item_stats.py#L196-L203
Potential issue: The deduplication logic for substring-matched attributes has a flaw
related to pagination. The `additional_substring_matches` set is initialized once. On
the first page (`offset == 0`), matching attributes are correctly processed and removed
from the set. However, on subsequent pages (`offset > 0`), any attribute that is still
in the set (because it didn't appear on page 1) is incorrectly skipped via a `continue`
statement. This leads to missing attributes in the final results if they don't appear on
the first page of the paginated response.
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 7207400
| # we've already shown this attr in the first offset, so | ||
| # don't show it again | ||
| else: | ||
| continue |
There was a problem hiding this comment.
Bug: Pagination skips known attributes appearing on later pages
The additional_substring_matches set is recreated fresh on each HTTP request (pagination call). When offset > 0, any RPC attribute whose internal_name exists in this set gets skipped via continue. Since the set is never modified for offset > 0, legitimate attributes that happen to match known public aliases will be excluded from page 2 onwards. For example, if sentry.op appears in the RPC response on page 2, it would be skipped because it's in additional_substring_matches.
Additional Locations (1)
Abdkhan14
left a comment
There was a problem hiding this comment.
Unblocking for testing, a lot of cursor comments to take a look at
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #104786 +/- ##
========================================
Coverage 80.47% 80.47%
========================================
Files 9363 9363
Lines 401849 401867 +18
Branches 25833 25833
========================================
+ Hits 323389 323420 +31
+ Misses 78018 78005 -13
Partials 442 442 |
we only care about known public aliases in the first offset, so comments about subsequent pagination doesn't apply |
Additionally:
removes intersecting filter from the ranked request because I'm not quite certain how it works.
adds some tags to see how the intersecting filter might impact the number of attrs returned