Skip to content

fix(group_update): Handle last_seen integer overflow on groups#104297

Merged
yuvmen merged 7 commits intomasterfrom
yuvmen/handle_lastseen_integer_overflow
Dec 5, 2025
Merged

fix(group_update): Handle last_seen integer overflow on groups#104297
yuvmen merged 7 commits intomasterfrom
yuvmen/handle_lastseen_integer_overflow

Conversation

@yuvmen
Copy link
Copy Markdown
Member

@yuvmen yuvmen commented Dec 2, 2025

Using error upsampling groups can cross the MAX_INT value of last_seen, handle these errors and cap the value to deal with these errors for now.

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Dec 2, 2025
@codecov
Copy link
Copy Markdown

codecov bot commented Dec 2, 2025

Codecov Report

❌ Patch coverage is 86.95652% with 3 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/sentry/buffer/base.py 86.95% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##           master   #104297    +/-   ##
=========================================
  Coverage   80.57%    80.58%            
=========================================
  Files        9340      9337     -3     
  Lines      399123    399487   +364     
  Branches    25553     25553            
=========================================
+ Hits       321581    321912   +331     
- Misses      77093     77126    +33     
  Partials      449       449            

Using error upsampling groups can cross the MAX_INT value of last_seen,
handle these errors and cap the value to deal with these errors for now.
the update dict actually already holds other values than experssion, but the
direct assignment we added causes it to error. Instead of creating a complex
type for the update dict just set it to Any
@yuvmen yuvmen marked this pull request as ready for review December 3, 2025 22:54
update_kwargs["times_seen"] = MAX_INT32
try:
group.update(using=None, **update_kwargs)
logger.warning(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be worth logging this to datadog instead. I think that warnings will end up as sentry errors too, so you might want to drop these to info.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh right actually these would just spam, I guess a datadog metric is the cheapest way to keep track of it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

went ahead and replaced with metric, and added a metric for the retry failure as well

)
except Exception as retry_error:
# If the capped update also fails, log and skip
logger.warning(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth logging as an error if this fails

else:
group.update(using=None, **update_kwargs)
try:
group.update(using=None, **update_kwargs)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious whether we could just do something like this to save a db roundtrip?

if "times_seen" in update_kwargs:
    update_kwargs["times_seen"] = min(update_kwargs["times_seen"], MAX_INT32)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea I thought about it as well, and I did decide to go this route to maintain visibility and keep track of when this happens... but thinking about it some more I dont know if its worth the round trip and the ever increasing metric just to be able to tell this is going on. We will always be able to check for groups with MAX_INT last_seen to know which are affected. It felt like bad practice but now I am actually thinking this is the cleaner option, I think ill change to do this.

Copy link
Copy Markdown
Member Author

@yuvmen yuvmen Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok its not that simple, we dont use the times_seen value, we just tell it to basically inc the existing value - giving it an expression of like V + 1 to handle race conditions and such. Looks like there is no easy way to do it at the DB level without overflowing, the only way is using Case which would really slow down updates for all groups and I am reluctant to do that. I think keeping updates fast for the 99.9% case and slow for these groups might be better.
However we can actually just add a check for the current last_seen value and if it is MAX_INT, not update times_seen, which would prevent the overflow

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Least helpful here?

I think something like

for c, v in columns.items():
    if c == "times_seen":
        update_kwargs[c] = Least(F(c) + v, MAX_INT32)

runs atomically to set times_seen as the smallest of the values provided

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea I tried, but it turns out Least() has to actually calculate that value, and it overflows and errors in the DB when it does that, so no dice

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only thing that can be done in DB level that I found is adding a CASE statement, but that would really slow down updates, even Least would but I would have accepted that I think

Copy link
Copy Markdown
Contributor

@thetruecpaul thetruecpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with nits.

This is better than current state, but I think (either before landing or as a followup) I'd suggest you look into @shashjar's comment about Least. Obviously we can't do the impossible, but if there is a way to do this in fewer DB hits then that'd be awesome

logger = logging.getLogger(__name__)

# Maximum value for a 32-bit signed integer
MAX_INT32 = 2_147_483_647
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: could you use BoundedPositiveIntegerField.MAX_VALUE instead of a new constant to avoid duplication & match the Group field definition?


if not signal_only:
update_kwargs: dict[str, Expression] = {c: F(c) + v for c, v in columns.items()}
update_kwargs: dict[str, Any] = {c: F(c) + v for c, v in columns.items()}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we keep the type as constrained as possible? Maybe dict[str, Expression | int] instead of opening the door to Any?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the original Expression is actually not accurate and that dict does not contain only Expression, mypy just doesnt detect it but the update_kwargs contains other things than Expression, the real type is more convoluted and it actually not enforced so I wanted it to be clear here. The extra dict can contain datetime or any other arbitrary type, so Any is probably warranted here.

@yuvmen yuvmen enabled auto-merge (squash) December 5, 2025 18:23
@yuvmen yuvmen merged commit 55abbe6 into master Dec 5, 2025
67 checks passed
@yuvmen yuvmen deleted the yuvmen/handle_lastseen_integer_overflow branch December 5, 2025 18:35
ryan953 pushed a commit that referenced this pull request Dec 9, 2025
Using error upsampling groups can cross the MAX_INT value of last_seen,
handle these errors and cap the value to deal with these errors for now.
@github-actions github-actions bot locked and limited conversation to collaborators Dec 21, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants