feat(attribute-distributions): parallelize stats query#104113
feat(attribute-distributions): parallelize stats query#104113shruthilayaj merged 6 commits intomasterfrom
Conversation
44c5aa5 to
5b6b8fa
Compare
508a20b to
3ccc2e8
Compare
5a5b131 to
02db9ee
Compare
| attrs_response = snuba_rpc.attribute_names_rpc(attrs_request) | ||
|
|
||
| # Chunk attributes and run stats query in parallel | ||
| chunked_attributes: dict[int, list[AttributeKey]] = defaultdict(list[AttributeKey]) |
There was a problem hiding this comment.
Bug: defaultdict is incorrectly instantiated with list[AttributeKey] (a type hint) instead of a callable list, causing a TypeError on first access.
Severity: CRITICAL | Confidence: High
🔍 Detailed Analysis
The defaultdict constructor is incorrectly instantiated with list[AttributeKey]. This is a type hint (a types.GenericAlias object in Python 3.9+), not a callable function. When a missing key is accessed in chunked_attributes, defaultdict attempts to call list[AttributeKey](), leading to a TypeError: 'types.GenericAlias' object is not callable. This will cause an immediate server crash in OrganizationTraceItemsStatsEndpoint.get() and OrganizationTraceItemsAttributesRankedEndpoint.get() when processing attribute distributions.
💡 Suggested Fix
Change defaultdict(list[AttributeKey]) to defaultdict(list). The type hint dict[int, list[AttributeKey]] is correct and should remain.
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/api/endpoints/organization_trace_item_stats.py#L97
Potential issue: The `defaultdict` constructor is incorrectly instantiated with
`list[AttributeKey]`. This is a type hint (a `types.GenericAlias` object in Python
3.9+), not a callable function. When a missing key is accessed in `chunked_attributes`,
`defaultdict` attempts to call `list[AttributeKey]()`, leading to a `TypeError:
'types.GenericAlias' object is not callable`. This will cause an immediate server crash
in `OrganizationTraceItemsStatsEndpoint.get()` and
`OrganizationTraceItemsAttributesRankedEndpoint.get()` when processing attribute
distributions.
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 5300574
0242c8b to
cee4d15
Compare
| for stats in result: | ||
| for stats_type, data in stats.items(): | ||
| stats_results[stats_type]["data"].update(data["data"]) |
There was a problem hiding this comment.
Bug: The organization_trace_item_stats.py endpoint's response format changed, breaking API compatibility.
Severity: CRITICAL | Confidence: High
🔍 Detailed Analysis
The organization_trace_item_stats.py endpoint's response format has changed from a dictionary structure, {"data": {stat_type: {...}}}, to a list of single-key dictionaries, {"data": [{stat_type: {...}}, ...]}. This transformation is a breaking API change that will cause clients expecting the original dictionary format to fail.
💡 Suggested Fix
Revert the response format in organization_trace_item_stats.py line 135 to return Response({"data": stats_results}) instead of Response({"data": [{k: v} for k, v in stats_results.items()]}).
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/api/endpoints/organization_trace_item_stats.py#L131-L133
Potential issue: The `organization_trace_item_stats.py` endpoint's response format has
changed from a dictionary structure, `{"data": {stat_type: {...}}}`, to a list of
single-key dictionaries, `{"data": [{stat_type: {...}}, ...]}`. This transformation is a
breaking API change that will cause clients expecting the original dictionary format to
fail.
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 5302423
| meta=attrs_meta, | ||
| limit=max_attributes, | ||
| type=attr_type, | ||
| intersecting_attributes_filter=cohort_2, |
There was a problem hiding this comment.
Bug: Suspect cohort attributes missed due to baseline-only attribute filtering
The TraceItemAttributeNamesRequest uses intersecting_attributes_filter=cohort_2 (the baseline cohort), which means only attributes present in the baseline are fetched and analyzed. These attributes are then used to query both the suspect cohort (cohort_1) and baseline cohort (cohort_2). Attributes unique to the suspect cohort are completely missed, which defeats the purpose of comparing cohorts to find differentiating attributes. The old code didn't pre-filter attributes, so it analyzed all attributes from both cohorts independently.
Abdkhan14
left a comment
There was a problem hiding this comment.
Lgtm, I would look at that seer comment to make sure we don't break the FE
stats results would have returned |
Parallelize the stats query by explicitly passing a list of attributes to fetch.
Hoping to make the stats and ranked endpoints faster this way.