improvement(cleanup): batchTrigger fan-out, chunked queries, batched S3, faster outlier drain #4688
…S3, faster outlier drain

- Fan cleanup-tasks/logs/soft-deletes out via `tasks.batchTrigger` (500 ws/chunk); bump to `large-1x` with `concurrencyLimit: 5`
- Chunk bulk DELETEs (1000 IDs/stmt) and `collectChatFiles` JSONB SELECT (500 chats/stmt) to bound worker memory and lock duration
- Replace per-key `position()` table scans with one `LATERAL unnest` scan per 200-key chunk
- Route storage deletes through `StorageService.deleteFiles` (S3 `DeleteObjects`: 1000 keys/HTTP)
- Raise per-run row cap to 100K so long-tail tenants (one prod workspace has 723K doomed rows) drain in days, not weeks
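The fan-out in the first bullet can be sketched as a pure planning step. This is a minimal sketch under stated assumptions, not the PR's code. `CleanupChunkPayload`, `chunk`, `planCleanupChunks` and `toTriggerBatches` are hypothetical names. The sketch does three things:

- It splits each plan's workspace IDs into chunks of at most 500.
- It labels chunks in the `free/1`, `free/2` style from the follow-up commit. It assumes a plan that fits in one chunk keeps its bare name.
- It groups payloads so that no single `batchTrigger` call carries more than 1000 items.

```typescript
// Hypothetical planning helpers (not the PR's code): split workspace IDs per
// plan into chunks of at most 500, label them, and cap batch size at 1000.

interface CleanupChunkPayload {
  plan: string; // billing plan, e.g. "free"
  label: string; // log label, e.g. "free/2"
  workspaceIds: string[];
}

const WORKSPACES_PER_CHUNK = 500;
const MAX_ITEMS_PER_BATCH_TRIGGER = 1000;

/** Split `items` into consecutive slices of at most `size` elements. */
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size <= 0) throw new RangeError("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

function planCleanupChunks(
  workspacesByPlan: ReadonlyMap<string, readonly string[]>,
): CleanupChunkPayload[] {
  const payloads: CleanupChunkPayload[] = [];
  for (const [plan, ids] of workspacesByPlan) {
    const slices = chunk(ids, WORKSPACES_PER_CHUNK);
    slices.forEach((workspaceIds, i) => {
      // Assumption: only plans that split into several chunks get an index.
      const label = slices.length > 1 ? `${plan}/${i + 1}` : plan;
      payloads.push({ plan, label, workspaceIds });
    });
  }
  return payloads;
}

/** Group payloads so that no single batchTrigger call exceeds the cap. */
function toTriggerBatches(
  payloads: readonly CleanupChunkPayload[],
): CleanupChunkPayload[][] {
  return chunk(payloads, MAX_ITEMS_PER_BATCH_TRIGGER);
}
```

In a Trigger.dev task, each inner array would then be mapped to `{ payload }` items and passed to `tasks.batchTrigger(taskId, items)`. That wiring is left out here because it needs the SDK and a deployed task.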
PR Summary (Medium Risk)

Overview: Cleanup execution is tuned for scale and bounded resource usage: cleanup tasks run on … Storage deletion is batched: cleanup flows now group keys by storage …

Reviewed by Cursor Bugbot for commit ddbdacb.
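The Bugbot overview says storage deletion is batched, with keys grouped by storage location. The commit routes deletes through S3 `DeleteObjects` at 1000 keys per HTTP call. Here is a minimal sketch of that grouping, assuming the grouping is by bucket.

`StoredFile`, `planDeleteObjects` and the local `DeleteObjectsInput` type are hypothetical. The input shape mirrors what AWS SDK v3's `DeleteObjectsCommand` accepts: `Bucket`, `Delete.Objects[].Key` and `Delete.Quiet`.

```typescript
// Hypothetical sketch (not the PR's StorageService.deleteFiles): group files
// by bucket, drop duplicate keys, and emit DeleteObjects-shaped inputs of at
// most 1000 keys each (the S3 DeleteObjects per-request limit).

interface StoredFile {
  bucket: string;
  key: string;
}

interface DeleteObjectsInput {
  Bucket: string;
  Delete: { Objects: { Key: string }[]; Quiet: boolean };
}

const MAX_KEYS_PER_DELETE = 1000;

function planDeleteObjects(files: readonly StoredFile[]): DeleteObjectsInput[] {
  // Group and de-duplicate keys per bucket; Map and Set keep insertion order.
  const byBucket = new Map<string, Set<string>>();
  for (const { bucket, key } of files) {
    let keys = byBucket.get(bucket);
    if (!keys) {
      keys = new Set<string>();
      byBucket.set(bucket, keys);
    }
    keys.add(key);
  }

  const requests: DeleteObjectsInput[] = [];
  for (const [bucket, keySet] of byBucket) {
    const keys = Array.from(keySet);
    for (let i = 0; i < keys.length; i += MAX_KEYS_PER_DELETE) {
      requests.push({
        Bucket: bucket,
        Delete: {
          Objects: keys.slice(i, i + MAX_KEYS_PER_DELETE).map((Key) => ({ Key })),
          Quiet: true, // response lists only keys that failed to delete
        },
      });
    }
  }
  return requests;
}
```

Each element could be sent as `new DeleteObjectsCommand(input)` through an `S3Client`. `DeleteObjects` can report per-key `Errors` in a successful HTTP response, so a caller should inspect the response body rather than rely on the status code alone.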
Greptile Summary

This PR overhauls the cleanup pipeline for scale: workspace chunks are fanned out via …
Confidence Score: 5/5

Safe to merge; all changed paths are background cleanup jobs with no user-facing data mutations, and the chunking logic is correct.

The LATERAL unnest SQL rewrite is logically equivalent to the original per-key loop and handles multi-batch cross-workspace key sharing correctly. The new `deleteRowsById` and `selectRowsByIdChunks` primitives are well-bounded.

The two flagged observations are low-probability edge cases that do not affect correctness under normal operating conditions:

- `cleanup-dispatcher.ts` has a minor `jobCount` semantic change worth confirming with the monitoring team.
- `cleanup-tasks.ts` has a theoretical run-child ordering edge case, reachable only with more than 100K eligible runs per workspace chunk.
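Greptile calls the LATERAL unnest rewrite equivalent to the old per-key loop. Below is a sketch of one plausible query shape. The table `execution_logs`, the column `payload` and the `buildReferencedKeyQueries` helper are placeholders, not the PR's schema or code.

Each query checks a 200-key chunk in a single table scan by expanding the key array alongside every row. The old approach ran one `position()` scan per key.

```typescript
// Hypothetical sketch: one parameterized query per 200-key chunk that returns
// the candidate keys still referenced by at least one row. Table and column
// names are placeholders.

interface ParamQuery {
  text: string;
  values: unknown[];
}

const KEYS_PER_QUERY = 200;

function buildReferencedKeyQueries(keys: readonly string[]): ParamQuery[] {
  const queries: ParamQuery[] = [];
  for (let i = 0; i < keys.length; i += KEYS_PER_QUERY) {
    const slice = keys.slice(i, i + KEYS_PER_QUERY);
    queries.push({
      // LATERAL unnest pairs each row with every key in the chunk, so the
      // table is scanned once per chunk instead of once per key.
      text: [
        "SELECT DISTINCT k.key",
        "FROM execution_logs AS t",
        "CROSS JOIN LATERAL unnest($1::text[]) AS k(key)",
        "WHERE position(k.key IN t.payload::text) > 0",
      ].join("\n"),
      values: [slice],
    });
  }
  return queries;
}
```

With node-postgres, each `{ text, values }` object can be passed directly to `client.query(...)`; the JS array is bound as a Postgres `text[]`. Keys that appear in no result set are the ones no row still references.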
Sequence Diagram

```mermaid
sequenceDiagram
    participant Cron as Cron Route
    participant Dispatcher as cleanup-dispatcher
    participant Trigger as Trigger.dev batchTrigger
    participant Task as cleanup-* task (xN)
    participant DB as Postgres
    participant S3 as S3 DeleteObjects
    Cron->>Dispatcher: dispatchCleanupJobs(jobType)
    Dispatcher->>DB: listActiveWorkspaceCleanupScopeRows()
    Dispatcher->>DB: resolvePersonalPlanTypes / getOrgSubscription
    Dispatcher->>Dispatcher: buildCleanupChunks() 500 ws/chunk
    Dispatcher->>Trigger: tasks.batchTrigger up to 1000 payloads
    Trigger-->>Dispatcher: batchId
    Dispatcher-->>Cron: jobIds chunkCount workspaceCount
    Note over Task: Runs concurrently concurrencyLimit 5
    Task->>DB: selectRowsByIdChunks 50 batches x 2000 rows
    Task->>DB: chunkedBatchDelete onBatch filterLargeValueKeys LATERAL unnest 200 keys/chunk
    Task->>S3: StorageService.deleteFiles deleteManyFromS3 1000 keys/HTTP
    Task->>DB: DELETE WHERE id IN chunkIds 1000 IDs/stmt
```
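The diagram's `selectRowsByIdChunks 50 batches x 2000 rows` step caps each run at 50 × 2000 = 100,000 rows. That matches the 100K per-run cap in the commit message.

Below is a sketch of such a bounded reader using keyset pagination on `id`. `selectIdsInBatches`, `FetchPage` and `runsToDrain` are hypothetical, and the real helper may page differently.

```typescript
// Hypothetical bounded reader (not the PR's selectRowsByIdChunks): pages
// through eligible IDs in ascending order, at most `maxBatches` pages of
// `batchSize` rows per run, so one run touches at most 50 x 2000 = 100,000 rows.

/**
 * Returns up to `limit` eligible IDs greater than `afterId` (all when null),
 * in ascending order; stands in for
 * `SELECT id FROM t WHERE <eligible> AND id > $1 ORDER BY id LIMIT $2`.
 */
type FetchPage = (afterId: string | null, limit: number) => Promise<string[]>;

async function* selectIdsInBatches(
  fetchPage: FetchPage,
  batchSize = 2000,
  maxBatches = 50,
): AsyncGenerator<string[], void, undefined> {
  let afterId: string | null = null;
  for (let batch = 0; batch < maxBatches; batch++) {
    const ids: string[] = await fetchPage(afterId, batchSize);
    if (ids.length === 0) return;
    yield ids;
    // A short page means the eligible set is exhausted.
    if (ids.length < batchSize) return;
    // Keyset pagination: resume strictly after the last ID seen.
    afterId = ids[ids.length - 1] ?? null;
  }
}

/** Number of capped runs needed to drain `rows` eligible rows. */
function runsToDrain(rows: number, rowsPerRun = 2000 * 50): number {
  return Math.ceil(rows / rowsPerRun);
}
```

At this cap, the 723K-row workspace from the commit message needs ceil(723,000 / 100,000) = 8 capped runs, so `runsToDrain(723000)` returns 8.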
Reviews (2): Last reviewed commit: "improvement(cleanup): chunk-index labels..."
… counter

Addresses Greptile review feedback:

- Disambiguate downstream logs when a plan splits into multiple workspace chunks (e.g. 'free/1', 'free/2')
- Document that `deleteRowsById`'s failed counter is an upper bound (a chunk rolls back to 0 deletes on error)
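The second bullet's upper-bound wording follows from running each 1000-ID chunk as one statement. If the statement fails, the whole chunk rolls back and every ID in it is counted as failed, even though some of those IDs might not have matched a row anyway.

A minimal sketch follows. `deleteIdsInChunks` is a hypothetical stand-in for the PR's `deleteRowsById`. It also assumes that processing continues after a failed chunk, which the PR does not state.

```typescript
// Hypothetical chunked delete (not the PR's deleteRowsById) showing the
// "failed is an upper bound" accounting.

/**
 * Runs one statement such as `DELETE FROM t WHERE id IN (...)` for the given
 * IDs and resolves to the number of rows actually removed. A rejection means
 * the statement rolled back and removed nothing.
 */
type DeleteChunk = (ids: string[]) => Promise<number>;

interface DeleteResult {
  deleted: number; // rows confirmed deleted
  failed: number; // upper bound: every ID in a failed chunk is counted
}

async function deleteIdsInChunks(
  ids: readonly string[],
  deleteChunk: DeleteChunk,
  chunkSize = 1000,
): Promise<DeleteResult> {
  const result: DeleteResult = { deleted: 0, failed: 0 };
  for (let i = 0; i < ids.length; i += chunkSize) {
    const chunkIds = ids.slice(i, i + chunkSize);
    try {
      result.deleted += await deleteChunk(chunkIds);
    } catch {
      // Rolled back: none of these rows were removed, but some IDs may not
      // have matched a row anyway, so this count can overstate the misses.
      result.failed += chunkIds.length;
    }
  }
  return result;
}
```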
@greptile

@cursor review
✅ Bugbot reviewed your changes and found no new issues!
Summary
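- Fan cleanup-tasks/logs/soft-deletes out via `tasks.batchTrigger` (500 ws/chunk); bump to `large-1x` with `concurrencyLimit: 5`
- Chunk bulk DELETEs (1000 IDs/stmt) and `collectChatFiles` JSONB SELECT (500 chats/stmt) to bound worker memory and lock duration
- Replace per-key `position()` table scans with one `LATERAL unnest` scan per 200-key chunk
- Route storage deletes through `StorageService.deleteFiles` (S3 `DeleteObjects`: 1000 keys/HTTP)

Type of Change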
Testing
- `tsc`, `biome check`, `check:api-validation` all pass
- `batchTrigger` usage validated against official docs (SDK 4.4.3, all options within documented caps)

Checklist