improvement(cleanup): batchTrigger fan-out, chunked queries, batched S3, faster outlier drain by waleedlatif1 · Pull Request #4688 · simstudioai/sim

waleedlatif1 · 2026-05-21T02:08:30Z

Summary

Fan cleanup-tasks/logs/soft-deletes out via tasks.batchTrigger (500 ws/chunk); bump to large-1x with concurrencyLimit: 5
Chunk bulk DELETEs (1000 IDs/stmt) and collectChatFiles JSONB SELECT (500 chats/stmt) to bound worker memory and lock duration
Replace per-key position() table scans with one LATERAL unnest scan per 200-key chunk
Route storage deletes through StorageService.deleteFiles (S3 DeleteObjects: 1000 keys/HTTP)
Raise per-run row cap to 100K so long-tail tenants (one prod workspace has 723K doomed rows) drain in days, not weeks
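The chunk sizes and the row-cap arithmetic above can be sanity-checked with a short sketch. The constants come from this summary; the function bodies are assumptions, and `runsToDrain` and `statementsPerFullRun` are illustrative names that do not appear in the diff.

```typescript
// Hypothetical sketch: constants are from the PR summary, function bodies are assumptions.
const PER_RUN_ROW_CAP = 100_000; // new per-run row cap
const DELETE_CHUNK_SIZE = 1_000; // IDs per DELETE statement

/** Split a list into consecutive chunks of at most `size` items. */
function chunkArray<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

/** Number of cleanup runs needed to drain `doomedRows` at the given per-run cap. */
function runsToDrain(doomedRows: number, perRunCap: number = PER_RUN_ROW_CAP): number {
  return Math.ceil(doomedRows / perRunCap);
}

/** Number of DELETE statements issued by a single run that hits the cap. */
function statementsPerFullRun(): number {
  return Math.ceil(PER_RUN_ROW_CAP / DELETE_CHUNK_SIZE);
}
```

Under these assumptions, `runsToDrain(723_000)` is 8 and a run that hits the cap issues 100 DELETE statements. How many calendar days that takes depends on the cron schedule, which this description does not state.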

Type of Change

Improvement

Testing

Verified position-query SQL rewrite returns identical results to original against local Postgres with seeded data
tsc, biome check, check:api-validation all pass
98 adjacent tests pass (uploads, snapshot service, billing core)
Trigger.dev batchTrigger usage validated against official docs (SDK 4.4.3, all options within documented caps)

Checklist

Code follows project style guidelines
Self-reviewed my changes
Tests added/updated and passing
No new warnings introduced
I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

vercel · 2026-05-21T02:08:35Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
docs	Skipped		May 21, 2026 2:20am

cursor · 2026-05-21T02:08:40Z

PR Summary

Medium Risk
Touches retention job dispatching and bulk deletion paths (DB + storage), so misconfiguration or logic errors could lead to missed cleanups or excessive load, but changes are primarily batching/throughput controls.

Overview
Retention cleanup jobs are now dispatched and executed in fixed-size chunks: dispatchCleanupJobs pre-resolves workspaces/retention, fans out via tasks.batchTrigger (or queue fallback), and updates payloads to carry workspaceIds, retentionHours, and label (with a runGlobalHousekeeping flag for one-off plan-wide work).

Cleanup execution is tuned for scale and bounded resource usage: cleanup tasks run on large-1x with concurrencyLimit: 5, per-run DB delete capacity is increased (100K cap), explicit-ID deletes are chunked (1000/statement), chat file collection is chunked (500 chats/query), and large-value reference checks in cleanup-logs are rewritten to chunked unnest scans instead of per-key queries.

Storage deletion is batched: cleanup flows now group keys by storage context and call StorageService.deleteFiles, which adds provider-aware bulk deletion (S3 DeleteObjects in 1000-key requests, otherwise bounded-concurrency per-file) and surfaces per-key failures for logging.
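The grouping step described above might look roughly like the sketch below. The types and the injected `deleteFiles` callback are assumptions made for illustration; the real `StorageService.deleteFiles` signature is not shown on this page.

```typescript
// Hypothetical shapes: the real StorageService types are not visible in this PR page.
type StorageContext = string; // actual context names are an assumption

interface DoomedFile {
  key: string;
  context: StorageContext;
}

interface DeleteFilesResult {
  deleted: number;
  failed: { key: string; error: string }[];
}

// Stand-in for StorageService.deleteFiles; the parameter order is assumed.
type DeleteFilesFn = (keys: string[], context: StorageContext) => Promise<DeleteFilesResult>;

/** Bucket keys by storage context, dropping duplicate (context, key) pairs. */
function groupKeysByContext(files: readonly DoomedFile[]): Record<StorageContext, string[]> {
  const groups: Record<StorageContext, string[]> = {};
  const seen = new Set<string>();
  for (const file of files) {
    const id = file.context + "\u0000" + file.key;
    if (seen.has(id)) continue;
    seen.add(id);
    if (!groups[file.context]) groups[file.context] = [];
    groups[file.context].push(file.key);
  }
  return groups;
}

/** Issue one bulk delete per context and merge the per-key failures for logging. */
async function deleteGroupedFiles(
  files: readonly DoomedFile[],
  deleteFiles: DeleteFilesFn,
): Promise<DeleteFilesResult> {
  const groups = groupKeysByContext(files);
  const total: DeleteFilesResult = { deleted: 0, failed: [] };
  for (const context of Object.keys(groups)) {
    const result = await deleteFiles(groups[context], context);
    total.deleted += result.deleted;
    total.failed = total.failed.concat(result.failed);
  }
  return total;
}
```

Grouping first means each context produces one bulk call instead of one call per file, which is what lets the S3 path use 1000-key requests.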

Reviewed by Cursor Bugbot for commit ddbdacb.

greptile-apps · 2026-05-21T02:16:52Z

Greptile Summary

This PR overhauls the cleanup pipeline for scale: workspace chunks are fanned out via tasks.batchTrigger, bulk DELETEs and JSONB SELECTs are chunked to bound lock duration and memory, the per-key position() scan loop is replaced with a single LATERAL unnest pass per 200-key chunk, and S3 storage deletes are routed through the new StorageService.deleteFiles (DeleteObjects) batch API.

Dispatcher (cleanup-dispatcher.ts): replaces sequential jobQueue.enqueue calls with tasks.batchTrigger; adds 500-workspace chunks with per-chunk labels (free/1, free/2); runGlobalHousekeeping is pinned to the first matching chunk.
Batch-delete primitives (batch-delete.ts): new chunkArray, selectRowsByIdChunks, and deleteRowsById helpers; per-run row cap raised to 100K; delete chunks capped at 1000 IDs to bound FK-trigger queue length.
Storage (storage-service.ts, s3/client.ts): new deleteFiles uses DeleteObjects (1000 keys/HTTP) for S3 and falls back to a 25-worker concurrent loop for Blob/local.
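As a rough illustration of the dispatcher bullet, the sketch below builds 500-workspace payloads with per-chunk labels. The payload fields follow the names quoted in the Cursor overview (`workspaceIds`, `retentionHours`, `label`, `runGlobalHousekeeping`), but the function body, the bare label for single-chunk plans, and setting the housekeeping flag per plan are all assumptions rather than the code in `cleanup-dispatcher.ts`.

```typescript
// Hypothetical sketch of chunked payload construction; not the dispatcher's actual code.
interface CleanupPayload {
  workspaceIds: string[];
  retentionHours: number;
  label: string;
  runGlobalHousekeeping: boolean;
}

const WORKSPACES_PER_CHUNK = 500;

function buildCleanupChunks(
  plan: string,
  workspaceIds: readonly string[],
  retentionHours: number,
  chunkSize: number = WORKSPACES_PER_CHUNK,
): CleanupPayload[] {
  const chunkCount = Math.ceil(workspaceIds.length / chunkSize);
  const payloads: CleanupPayload[] = [];
  for (let i = 0; i < chunkCount; i++) {
    payloads.push({
      workspaceIds: workspaceIds.slice(i * chunkSize, (i + 1) * chunkSize),
      retentionHours,
      // Assumption: only multi-chunk plans get an index suffix such as "free/1".
      label: chunkCount > 1 ? `${plan}/${i + 1}` : plan,
      // Plan-wide housekeeping runs once, so only the first chunk carries the flag.
      runGlobalHousekeeping: i === 0,
    });
  }
  return payloads;
}
```

Each payload would then become one item in the `tasks.batchTrigger` call, so a plan with 1,200 workspaces fans out into three task runs labelled `free/1`, `free/2`, and `free/3` under this sketch.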

Confidence Score: 5/5

Safe to merge; all changed paths are background cleanup jobs with no user-facing data mutations, and the chunking logic is correct.

The LATERAL unnest SQL rewrite is logically equivalent to the original per-key loop and handles multi-batch cross-workspace key sharing correctly. The new deleteRowsById and selectRowsByIdChunks primitives are well-bounded. The two observations flagged are low-probability edge cases that do not affect correctness under normal operating conditions.
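For readers unfamiliar with the pattern, a chunked LATERAL unnest check could be shaped like the sketch below. The table name `execution_logs`, the column `execution_data`, and the parameter layout are placeholders chosen for illustration; the query in `cleanup-logs.ts` is not reproduced on this page and may differ.

```typescript
// Hypothetical sketch: table and column names are placeholders, not the repo's schema.
const KEYS_PER_CHUNK = 200;

interface ParameterizedQuery {
  text: string;
  values: [string[], string[]]; // [$1 candidate keys, $2 IDs of rows being deleted]
}

// One pass over retained rows per chunk: each row is tested against every key in
// the chunk, instead of running one position() table scan per key.
const REFERENCED_KEYS_SQL = `
  SELECT DISTINCT k.key
  FROM execution_logs AS l
  CROSS JOIN LATERAL unnest($1::text[]) AS k(key)
  WHERE l.id <> ALL($2::text[])
    AND position(k.key IN l.execution_data::text) > 0
`;

/** Build one query per 200-key chunk; keys returned by a query are still referenced. */
function buildReferencedKeyQueries(
  candidateKeys: readonly string[],
  deletedLogIds: readonly string[],
): ParameterizedQuery[] {
  const queries: ParameterizedQuery[] = [];
  for (let i = 0; i < candidateKeys.length; i += KEYS_PER_CHUNK) {
    queries.push({
      text: REFERENCED_KEYS_SQL,
      values: [candidateKeys.slice(i, i + KEYS_PER_CHUNK), deletedLogIds.slice()],
    });
  }
  return queries;
}
```

Keys that come back from any chunk are still referenced by a retained row and would be skipped; the rest are safe to delete from storage. Excluding the deleted IDs in the `WHERE` clause matches the behaviour Greptile describes, where only retained rows are checked for references.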

cleanup-dispatcher.ts has a minor jobCount semantic change worth confirming with the monitoring team; cleanup-tasks.ts has a theoretical run-child ordering edge case only reachable with more than 100K eligible runs per workspace chunk.

Important Files Changed

Filename	Overview
apps/sim/lib/billing/cleanup-dispatcher.ts	Rewrites dispatch to use tasks.batchTrigger with 500-workspace/chunk fan-out; jobCount in the return value now reflects the number of batchTrigger API calls (typically 1), not the number of task runs triggered
apps/sim/lib/cleanup/batch-delete.ts	Adds chunked ID-list DELETE (deleteRowsById), SELECT helper (selectRowsByIdChunks), and raises the per-run cap to 100K; well-guarded with accurate upper-bound failure semantics
apps/sim/background/cleanup-logs.ts	Replaces N per-key position() scans with a LATERAL unnest scan per 200-key chunk; correctness verified — deletedLogIds are excluded so only retained rows are checked for references
apps/sim/background/cleanup-tasks.ts	Pre-selects doomed chat IDs for both copilot backend cleanup and DB deletion; run children deleted before parent runs to respect FK ordering
apps/sim/lib/uploads/core/storage-service.ts	Adds deleteFiles() using S3 DeleteObjects for batch deletes; Blob path falls back to bounded-concurrency per-file loop; correctly exported via export * as StorageService
apps/sim/lib/uploads/providers/s3/client.ts	Adds deleteManyFromS3 with 1000-key chunking and Quiet:true; correctly collects per-key errors from response.Errors and network-level errors separately
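The error handling described for `deleteManyFromS3` can be sketched as follows. To keep the example self-contained, the S3 call is injected as a `send` function whose input and output mirror the shape of the AWS SDK's DeleteObjects request and response; that wrapper, the result type, and the function body are assumptions, not the code in `s3/client.ts`.

```typescript
// Hypothetical sketch: `send` stands in for an S3 client issuing a DeleteObjects request.
interface DeleteObjectsInputLike {
  Bucket: string;
  Delete: { Objects: { Key: string }[]; Quiet: boolean };
}

interface DeleteObjectsOutputLike {
  Errors?: { Key?: string; Message?: string }[];
}

type SendDeleteObjects = (input: DeleteObjectsInputLike) => Promise<DeleteObjectsOutputLike>;

interface BulkDeleteResult {
  deleted: number;
  failed: { key: string; error: string }[];
}

const S3_KEYS_PER_REQUEST = 1_000; // DeleteObjects accepts at most 1000 keys per request

async function deleteManyFromS3(
  bucket: string,
  keys: readonly string[],
  send: SendDeleteObjects,
): Promise<BulkDeleteResult> {
  const failed: { key: string; error: string }[] = [];
  for (let i = 0; i < keys.length; i += S3_KEYS_PER_REQUEST) {
    const chunk = keys.slice(i, i + S3_KEYS_PER_REQUEST);
    try {
      // Quiet mode: the response lists only the keys that failed.
      const response = await send({
        Bucket: bucket,
        Delete: { Objects: chunk.map((Key) => ({ Key })), Quiet: true },
      });
      for (const entry of response.Errors ?? []) {
        failed.push({ key: entry.Key ?? "(unknown)", error: entry.Message ?? "unknown error" });
      }
    } catch (err) {
      // Network-level failure: every key in this request counts as failed.
      const message = err instanceof Error ? err.message : String(err);
      for (const key of chunk) failed.push({ key, error: message });
    }
  }
  return { deleted: keys.length - failed.length, failed };
}
```

Keeping the two failure sources separate means one rejected key does not mask the 999 that succeeded, while a dropped request is still reported against every key it carried.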

Sequence Diagram

sequenceDiagram
    participant Cron as Cron Route
    participant Dispatcher as cleanup-dispatcher
    participant Trigger as Trigger.dev batchTrigger
    participant Task as cleanup-* task (xN)
    participant DB as Postgres
    participant S3 as S3 DeleteObjects

    Cron->>Dispatcher: dispatchCleanupJobs(jobType)
    Dispatcher->>DB: listActiveWorkspaceCleanupScopeRows()
    Dispatcher->>DB: resolvePersonalPlanTypes / getOrgSubscription
    Dispatcher->>Dispatcher: buildCleanupChunks() 500 ws/chunk
    Dispatcher->>Trigger: tasks.batchTrigger up to 1000 payloads
    Trigger-->>Dispatcher: batchId
    Dispatcher-->>Cron: jobIds chunkCount workspaceCount

    Note over Task: Runs concurrently concurrencyLimit 5
    Task->>DB: selectRowsByIdChunks 50 batches x 2000 rows
    Task->>DB: chunkedBatchDelete onBatch filterLargeValueKeys LATERAL unnest 200 keys/chunk
    Task->>S3: StorageService.deleteFiles deleteManyFromS3 1000 keys/HTTP
    Task->>DB: DELETE WHERE id IN chunkIds 1000 IDs/stmt

Reviews (2): Last reviewed commit: "improvement(cleanup): chunk-index labels..."

… counter
Addresses Greptile review feedback:
- Disambiguate downstream logs when a plan splits into multiple workspace chunks (e.g. 'free/1', 'free/2')
- Document that deleteRowsById's failed counter is an upper bound (chunk rolls back to 0 deletes on error)
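One way to read the upper-bound note is sketched below. The injected `runDelete` callback and the result shape are assumptions made for illustration; only the 1000-ID statement size and the rollback behaviour come from this PR.

```typescript
// Hypothetical sketch of chunked deletion with an upper-bound failure counter.
const IDS_PER_STATEMENT = 1_000;

// Stand-in for one `DELETE ... WHERE id IN (...)` statement; resolves to rows deleted.
type RunDelete = (ids: string[]) => Promise<number>;

interface DeleteRowsResult {
  deleted: number;
  failed: number; // upper bound, see the catch branch
}

async function deleteRowsById(ids: readonly string[], runDelete: RunDelete): Promise<DeleteRowsResult> {
  const result: DeleteRowsResult = { deleted: 0, failed: 0 };
  for (let i = 0; i < ids.length; i += IDS_PER_STATEMENT) {
    const chunk = ids.slice(i, i + IDS_PER_STATEMENT);
    try {
      result.deleted += await runDelete(chunk);
    } catch {
      // The failed statement rolls back, so no row in this chunk was deleted.
      // Counting the whole chunk over-counts if some IDs no longer had a matching
      // row, which is why `failed` is an upper bound rather than an exact figure.
      result.failed += chunk.length;
    }
  }
  return result;
}
```

A failure in one chunk does not stop the loop, so later chunks still get their chance within the same run.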

waleedlatif1 · 2026-05-21T02:27:35Z

@greptile

waleedlatif1 · 2026-05-21T02:27:38Z

@cursor review

cursor

✅ Bugbot reviewed your changes and found no new issues!

greptile-apps Bot reviewed May 21, 2026

Comment thread apps/sim/lib/billing/cleanup-dispatcher.ts Outdated

Comment thread apps/sim/lib/cleanup/batch-delete.ts

vercel Bot temporarily deployed to Preview May 21, 2026 02:20 Inactive

cursor Bot reviewed May 21, 2026

waleedlatif1 merged commit 11ad891 into staging May 21, 2026
14 checks passed

waleedlatif1 deleted the waleedlatif1/trigger-cleanup-larger-machine branch May 21, 2026 02:51

waleedlatif1 mentioned this pull request May 21, 2026

v0.6.86: CORS updates, OAuth MCP, navigation pinning dynamic pages, google slides endpoints, DB access pattern improvements #4690

Merged

waleedlatif1 merged 2 commits into staging from waleedlatif1/trigger-cleanup-larger-machine
