ANR: per-transaction java.util.Timer in SentryTracer spawns a thread on the calling thread #5663

Description

@kozaxinan

Integration

sentry-android

Build System

Gradle

AGP Version

8.43.1

Proguard

Enabled

Other Error Monitoring Solution

Firebase Crashlytics

Version

34.14.0

Steps to Reproduce

Setup — minimal app with user-interaction tracing on:

    SentryAndroid.init(this) { options ->
        options.dsn = ""
        options.tracesSampleRate = 1.0
        options.isEnableUserInteractionTracing = true // the path under test
    }

Add any clickable/scrollable view with a resolvable view id (e.g. a Button
with android:id, or a RecyclerView) so SentryGestureListener resolves a target
and starts a transaction on the gesture.

A) Observe the thread-per-transaction behavior (any device):

  1. Set a breakpoint in the java.util.Timer constructor, or in
    SentryTracer.<init> at the new Timer(true) line.
  2. Tap the button N times (or scroll the list, pausing >idleTimeout between
    gestures so each interaction opens a fresh transaction).
  3. Observe the breakpoint hits once per interaction → one Timer constructor →
    one Thread.start() per transaction.
    Equivalent without a debugger: dump threads before and after a burst of
    taps (adb shell kill -3 <pid>, or Thread.getAllStackTraces().size). The
    thread count rises with the number of in-flight transactions, and you'll
    see transient "Timer-N" threads being created and torn down.
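
The debugger-free check in step 3 can be sketched on a plain JVM. This is a minimal illustration, not SDK code: the class and method names are hypothetical, and it assumes the default "Timer-N" thread naming that java.util.Timer uses when no name is passed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Timer;

/** Hypothetical helper: shows that every java.util.Timer instance owns one thread. */
final class TimerThreadCount {

  /** Returns the live threads whose name starts with "Timer-" (the default Timer thread name). */
  static Set<Thread> liveTimerThreads() {
    Set<Thread> result = new HashSet<>();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      if (t.getName().startsWith("Timer-")) {
        result.add(t);
      }
    }
    return result;
  }

  /** Creates n daemon timers and returns how many new "Timer-" threads appeared. */
  static int threadsAddedBy(int n) {
    Set<Thread> before = liveTimerThreads();
    List<Timer> timers = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      timers.add(new Timer(true)); // each constructor calls Thread.start()
    }
    Set<Thread> after = liveTimerThreads();
    after.removeAll(before); // keep only the threads created above
    for (Timer timer : timers) {
      timer.cancel(); // lets each timer thread terminate
    }
    return after.size();
  }

  public static void main(String[] args) {
    System.out.println("Timer threads added by 5 timers: " + threadsAddedBy(5));
  }
}
```

In an app, the same before/after comparison around a burst of taps should show the count tracking the number of in-flight transactions.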

B) Surface the main-thread blocking (reproduces the ANR signature):

  1. Run on a low-end / heavily loaded device (or an emulator throttled with
    few cores + memory pressure; adb shell am start ... while a stress app
    runs in the background). Thread creation cost is what we're amplifying.
  2. Rapidly tap/scroll to fire many user-interaction transactions in quick
    succession. Each one does Thread.start() (nativeCreate) on the main thread.
  3. Under enough memory/CPU pressure, one of those native thread spawns blocks
    on the runtime lock long enough to trip the ANR watchdog — matching the
    reported stack (SentryGestureListener.onUp → startTransaction →
    SentryTracer.<init> → new Timer → Thread.nativeCreate →
    art::ConditionVariable::WaitHoldingLocks).

Expected Result

Creating traces should be cheap and allocate as little as possible. Tracing should not consume the application's runtime resources aggressively.

Actual Result

We're seeing production ANRs where the main thread is blocked inside the SentryTracer constructor, specifically creating the per-transaction java.util.Timer. The Timer constructor starts a dedicated thread, and that thread spawn (Thread.nativeCreate) is what blocks. It happens on every user-interaction transaction (tap/scroll/swipe), so it fires on the main thread on a hot path.

This is the same underlying concern as #2130 (closed as not planned), but we now have custom ANR reports (1-second duration) showing that it has a real production impact, and we think there's a cleaner fix than the shared-Timer idea that was originally proposed.

ANR stack trace

The main thread is blocked here while handling a touch event:

at nativeCreate (Thread.java)
at start (Thread.java:1433)
at <init> (Timer.java:197)
at <init> (Timer.java:166)
at <init> (SentryTracer.java:102)
at createTransaction (DefaultSpanFactory.java:15)
at createTransaction (Scopes.java:987)
at startTransaction (Scopes.java:917)
at startTransaction (Sentry.java:1224)
at startTransaction (ScopesAdapter.java:300)
at startTracing (SentryGestureListener.java:265)
at onUp (SentryGestureListener.java:84)
at handleTouchEvent (SentryWindowCallback.java:77)
at dispatchTouchEvent (SentryWindowCallback.java:58)
... (DecorView / ViewRootImpl input dispatch)

The native frame underneath shows the thread being parked on a runtime lock during thread creation:

at art::ConditionVariable::WaitHoldingLocks

What we see in these reports:

  • ~100% on low-end devices.
  • Concentrated in emerging markets.
  • Always on a gesture.
  • The absolute user count is modest for us because we collect the custom shortened-duration ANR data only from Beta.

Why this is more than the object cost

The Timer object itself is cheap (a small TaskQueue heap + the thread wrapper). The expensive part is that every Timer constructor calls Thread.start(), and we do that once per transaction. java.util.Timer is explicitly designed for one thread to service thousands of scheduled tasks — so creating one per transaction is the opposite of how it's meant to be used.
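
To illustrate the intended usage, here is a small sketch (class and method names are hypothetical) in which a single Timer services many tasks. It records the name of every thread that runs a task, so the set it returns shows how many threads were involved.

```java
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo: one Timer, many tasks, one background thread. */
final class SharedTimerDemo {

  /** Schedules taskCount tasks on one Timer and returns the names of the threads that ran them. */
  static Set<String> threadsUsedBy(int taskCount) {
    Set<String> threadNames = ConcurrentHashMap.newKeySet();
    CountDownLatch done = new CountDownLatch(taskCount);
    Timer timer = new Timer(true); // the only thread spawn in this method
    try {
      for (int i = 0; i < taskCount; i++) {
        timer.schedule(new TimerTask() {
          @Override
          public void run() {
            threadNames.add(Thread.currentThread().getName());
            done.countDown();
          }
        }, 1L);
      }
      if (!done.await(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("tasks did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      timer.cancel();
    }
    return threadNames;
  }

  public static void main(String[] args) {
    System.out.println("Threads used for 1000 tasks: " + threadsUsedBy(1000));
  }
}
```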

Each SentryTracer only ever schedules two tasks (idle timeout + deadline timeout) and already tracks/cancels them individually (idleTimeoutTask, deadlineTimeoutTask, cancelIdleTimer(), cancelDeadlineTimer()). The only thing that requires a per-instance Timer today is the timer.cancel() in finish(). So the structure is already 90% of the way to using a shared scheduler.

Proposed fix (different from #2130)

#2130 suggested reusing a single Timer. The likely downside there: cancelled TimerTasks stay in the timer's heap until their scheduled fire time, so under high transaction volume the queue grows until purge() runs — a memory wart.
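
The retention behavior can be observed with Timer.purge(), which returns the number of cancelled tasks it removed from the queue. The sketch below is illustrative only (names are hypothetical). It keeps one live task at the head of the queue, because the timer thread does discard a cancelled task once that task reaches the head.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Hypothetical demo: cancelled TimerTasks stay queued on a shared Timer until purge(). */
final class CancelledTimerTasks {

  private static TimerTask noop() {
    return new TimerTask() {
      @Override
      public void run() {
        // intentionally empty
      }
    };
  }

  /** Schedules and cancels n tasks, then returns how many purge() had to remove. */
  static int cancelledTasksStillQueued(int n) {
    Timer timer = new Timer(true);
    try {
      // A live task with the earliest fire time stays at the head of the queue,
      // so the timer thread never reaches the cancelled tasks behind it.
      timer.schedule(noop(), 30_000L);

      TimerTask[] tasks = new TimerTask[n];
      for (int i = 0; i < n; i++) {
        tasks[i] = noop();
        timer.schedule(tasks[i], 60_000L);
      }
      for (TimerTask task : tasks) {
        task.cancel(); // marks the task cancelled; it is not removed from the queue
      }
      return timer.purge(); // number of cancelled tasks that were still queued
    } finally {
      timer.cancel();
    }
  }

  public static void main(String[] args) {
    System.out.println("Cancelled tasks still queued: " + cancelledTasksStillQueued(100));
  }
}
```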

Instead, the SDK already has the right primitive: ISentryExecutorService / SentryExecutorService — a single daemon-thread ScheduledThreadPoolExecutor(1), created lazily off the main thread, reachable via options.getExecutorService(), and already used for other delayed work (with a prewarm() for queue pre-allocation). SentryTracer is the last place still on raw java.util.Timer.

Concretely:

  1. In SentryExecutorService, call setRemoveOnCancelPolicy(true) on the ScheduledThreadPoolExecutor. This evicts cancelled tasks immediately, which sidesteps the heap-growth problem that made the shared-Timer approach unattractive — and benefits every other caller too.
  2. In SentryTracer, replace the Timer/TimerTask pair with Future<?> handles from executorService.schedule(runnable, delayMillis):
    • keep the existing cancel-previous-then-reschedule logic, using future.cancel(false);
    • in finish(), cancel only the tracer's own two futures (drop timer.cancel(), since the executor is owned by SentryOptions);
    • guard scheduling against RejectedExecutionException for the shutdown window.
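
A rough sketch of the two steps, using only java.util.concurrent so it runs outside the SDK. TimeoutScheduler, newSharedExecutor, and the thread name are hypothetical stand-ins, not the actual SentryTracer or SentryExecutorService code. The real change would go through options.getExecutorService() rather than a raw ScheduledThreadPoolExecutor.

```java
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the tracer's timeout handling on a shared executor. */
final class TimeoutScheduler {
  private final ScheduledThreadPoolExecutor executor; // shared; owned by someone else
  private final Object lock = new Object();
  private Future<?> idleTimeoutFuture;
  private Future<?> deadlineTimeoutFuture;

  TimeoutScheduler(ScheduledThreadPoolExecutor executor) {
    this.executor = executor;
  }

  /** Step 1: a single daemon-thread executor that evicts cancelled tasks immediately. */
  static ScheduledThreadPoolExecutor newSharedExecutor() {
    ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1, runnable -> {
      Thread thread = new Thread(runnable, "shared-scheduler"); // hypothetical name
      thread.setDaemon(true);
      return thread;
    });
    executor.setRemoveOnCancelPolicy(true);
    return executor;
  }

  /** Step 2: cancel the previous idle timeout, then schedule a new one. */
  void scheduleIdleTimeout(Runnable onIdle, long delayMillis) {
    synchronized (lock) {
      cancel(idleTimeoutFuture);
      idleTimeoutFuture = trySchedule(onIdle, delayMillis);
    }
  }

  void scheduleDeadlineTimeout(Runnable onDeadline, long delayMillis) {
    synchronized (lock) {
      cancel(deadlineTimeoutFuture);
      deadlineTimeoutFuture = trySchedule(onDeadline, delayMillis);
    }
  }

  /** Cancels only this instance's two futures; the shared executor keeps running. */
  void finish() {
    synchronized (lock) {
      cancel(idleTimeoutFuture);
      cancel(deadlineTimeoutFuture);
      idleTimeoutFuture = null;
      deadlineTimeoutFuture = null;
    }
  }

  private Future<?> trySchedule(Runnable task, long delayMillis) {
    try {
      return executor.schedule(task, delayMillis, TimeUnit.MILLISECONDS);
    } catch (RejectedExecutionException e) {
      return null; // executor already shut down; nothing left to schedule
    }
  }

  private static void cancel(Future<?> future) {
    if (future != null) {
      future.cancel(false); // do not interrupt a task that is already running
    }
  }

  public static void main(String[] args) {
    ScheduledThreadPoolExecutor executor = newSharedExecutor();
    TimeoutScheduler scheduler = new TimeoutScheduler(executor);
    scheduler.scheduleIdleTimeout(() -> {}, 60_000L);
    scheduler.scheduleDeadlineTimeout(() -> {}, 60_000L);
    System.out.println("queued before finish: " + executor.getQueue().size());
    scheduler.finish();
    System.out.println("queued after finish: " + executor.getQueue().size());
    executor.shutdownNow();
  }
}
```

With the remove-on-cancel policy enabled, the executor's queue should hold only live tasks, so rescheduling the idle timeout repeatedly does not accumulate cancelled entries.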
