feat(taskworker): Minimum Worker Concurrency #735

Merged
george-sentry merged 26 commits into main from george/push-taskbroker/better-child-management-alternative
Jun 30, 2026
@george-sentry george-sentry commented Jun 29, 2026

Member

Linear

Refs STREAM-1269

Description

In #731, we deferred setting the status to SERVING for workers in push mode until all threads were warmed up. Before that change, Kubernetes treated workers as ready to receive activations when they weren't, which caused significant throughput declines during worker redeployments.

Something similar happens when child processes recycle, that is, exit voluntarily after executing a configurable number of tasks. Recycling is meant to guard against memory leaks.

In some cases, child processes recycle at the same time, decreasing throughput significantly until all child processes are ready again. We can mitigate this problem by improving the child management process as follows.

  • Spawn children and wait until they are all ready before advertising SERVING
  • When a process has executed the configured maximum number of tasks, it does NOT exit right away; instead, it tells the parent that it's ready to exit and continues working
  • The parent kills as many children as it can without falling below a configurable minimum concurrency (default is 0)

This way, after startup, at least the configured minimum number of child processes is running at all times. We may still see throughput dips during worker recycle waves, but the dips should be much less severe and more predictable.
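
The release rule from the list above can be sketched roughly as follows. This is an illustration only, not the code in `worker.py`: the function name `select_children_to_release` and its parameters are hypothetical, and it assumes that children waiting to exit still count as running because they keep working until released.

```python
from collections import deque


def select_children_to_release(running, exiting, min_concurrency=0):
    """Pick children that may exit now without dropping below the minimum.

    running: number of live children, including those waiting to exit
             (assumption: they keep executing tasks until released).
    exiting: deque of children that reported they are ready to exit.
    min_concurrency: hypothetical name for the configurable minimum.
    """
    # How many children can go away before we hit the floor.
    releasable = max(0, running - min_concurrency)

    released = []
    # Release in the order the children asked to exit.
    while exiting and len(released) < releasable:
        released.append(exiting.popleft())
    return released
```

With four live children, three of them waiting to exit, and a minimum of two, only the first two waiting children are released; the third keeps working until replacements are ready. With the default minimum of 0, every waiting child is released at once, which matches the previous behavior.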

@george-sentry george-sentry requested a review from a team as a code owner June 29, 2026 20:58

linear-code Bot commented Jun 29, 2026

STREAM-1269


@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 0f61d82.


@evanh evanh left a comment

Member

Generally I think this is a good change (it all makes sense to me), but I think you could add more metrics. Some things I think would be useful:

  • How long the child had to wait before the parent set the release event
  • A gauge of how many children are in each state (pending/running/exiting)
  • How many children are waiting to exit at any one time (exiting deque)
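
As a rough illustration of the per-state gauge suggested above, the counts could be computed along these lines. The `ChildState` enum and `count_children_by_state` helper are hypothetical names, not part of the taskbroker client, and how the values get reported to a metrics backend is left out.

```python
from collections import Counter
from enum import Enum


class ChildState(Enum):
    # State names taken from the review comment; the enum itself is hypothetical.
    PENDING = "pending"
    RUNNING = "running"
    EXITING = "exiting"


def count_children_by_state(states):
    """Return a count for every state, including states with no children."""
    counts = Counter(states)
    # Emit zeros explicitly so a gauge drops back to 0 instead of going stale.
    return {state.value: counts.get(state, 0) for state in ChildState}
```

Each value in the returned dictionary would map to one gauge sample, tagged with the state name.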

@george-sentry george-sentry requested a review from evanh June 30, 2026 15:22
@george-sentry george-sentry merged commit 086764a into main Jun 30, 2026
27 checks passed
@george-sentry george-sentry deleted the george/push-taskbroker/better-child-management-alternative branch June 30, 2026 21:31