feat(retention): export/clean/rehydrate endpoints for task content by smoreinis · Pull Request #243 · scaleapi/scale-agentex

smoreinis · 2026-05-19T20:12:22Z

Summary

Adds an operational surface for bounded retention of task chat content in shared infrastructure. Callers can snapshot a task's content, delete it from the shared stores, and later restore it byte-identically — preserving message IDs and timestamps so tool-call and reasoning references between messages stay valid.

Snapshots can be transported either inline as JSON or via caller-supplied presigned URLs, so very long conversations don't bump into proxy/ALB body-size limits.

New endpoints

Method	Path	Behavior
`GET`	`/tasks/{task_id}/export`	Returns a self-contained snapshot (messages + task_states) inline. Same shape that rehydrate accepts — schema parity is the load-bearing invariant for byte-identical round-trip.
`POST`	`/tasks/{task_id}/export`	Builds the same snapshot and PUTs it to a caller-supplied presigned upload URL. Returns a small ack with byte/item counts. Use when the snapshot is too large for an inline response.
`POST`	`/tasks/{task_id}/clean`	Deletes content across Mongo `messages`, Mongo `task_states`, Postgres `events`; resets `agent_task_tracker.last_processed_event_id`; sets `tasks.cleaned_at`.
`POST`	`/tasks/{task_id}/rehydrate`	Restores content from a snapshot with caller-supplied IDs preserved; clears `cleaned_at`. Accepts either inline content (`messages` + `task_states`) or a presigned `snapshot_url` to download from — mutually exclusive.

Design notes

Two transport modes for the snapshot

For modest conversations, inline JSON works fine and is the simplest path. For very long conversations (deep reasoning content, extensive tool traces, many attachments), the snapshot can exceed nginx's 1 MB default client_max_body_size or AWS ALB's 1 MB body cap. The URL transport sidesteps both:

Export to URL: caller generates a presigned PUT URL on their object storage; Agentex uploads the JSON snapshot to it; response is a small ack so callers can verify byte/item counts without re-downloading.
Rehydrate from URL: caller passes snapshot_url (presigned GET); Agentex downloads, parses as TaskSnapshotEntity, and runs the same insert path as inline.

The existing inline paths are unchanged. Pydantic validators enforce inline-XOR-url on rehydrate; mixed payloads are rejected.

SSRF guard

Both URL paths are vulnerable to SSRF if unguarded — a malicious or careless caller could point Agentex at internal services, cloud metadata (169.254.169.254), or private network ranges. utils/url_validation.py implements a deny-known-bad guard: requires https, resolves the hostname asynchronously, and rejects any IP that is private, loopback, link-local, reserved, multicast, or unspecified. Uses loop.getaddrinfo so it doesn't block the event loop.

The guard has a known limitation: DNS rebinding (returning a public IP for the validation check and a private IP for the actual request) isn't mitigated. Acceptable for v1 given that callers are authenticated and authorized — the threat model is operator misconfiguration, not active attack.

Cleanup design

Domain logic in TaskRetentionService — both the HTTP routes and the (future) scheduled Temporal sweep workflow will call the same service methods, so the cleanup path is exercised by the same code in both contexts.
Cleanup operation order — Mongo deletes (idempotent by task_id) first, then Postgres operations, then tasks.cleaned_at last. A retry after partial failure converges because each step is idempotent and cleaned_at is the gate that keeps subsequent runs from re-doing work.
Guards on clean:
- Refuses tasks with status == RUNNING (regardless of force=true).
- Refuses tasks with unprocessed events past their cursor.
- Refuses tasks not idle for idle_days (default 7) unless force=true.
- Already-cleaned tasks return an empty result rather than raising.
Optimistic concurrency for the unprocessed-events check — no row locks; an event arriving in the narrow race window between check and delete will be deleted with the rest. Acceptable for v1; surfaced in audit logs (events_deleted > 0 on an idle-checked task is a signal).
No audit table for v1 — clean, rehydrate, and export_to_url operations emit structured log lines (task_cleanup_completed, task_rehydrate_completed, task_export_to_url_completed) with the result payload. Datadog log search is the forensic trail.
ID preservation is a caller contract — the caller (external integrator) must capture Agentex-generated IDs at write time and supply them at rehydrate time. Agentex does not enforce this; the round-trip invariant depends on it.
tasks.params is out of scope for v1 — not exported, not stripped during cleanup, not restored. If it turns out to carry chat content for specific agents, follow up.

Schema change

A single nullable column on tasks:

ALTER TABLE tasks ADD COLUMN cleaned_at TIMESTAMPTZ NULL;

This is a metadata-only ALTER (Postgres ≥11 doesn't rewrite the table). Falls within the project's safe-migration shape — passes the migration safety linter.

Other changes

adapter_mongodb.batch_create now translates pymongo BulkWriteError containing only duplicate-key sub-errors (code 11000) into DuplicateItemError (HTTP 400). Previously it fell through to the generic Exception handler and surfaced as HTTP 500. Narrowly scoped — non-duplicate bulk-write errors still surface as ServiceError.
Two new repository methods:
- EventRepository.delete_by_task_id(task_id) → int
- AgentTaskTrackerRepository.reset_cursors_for_task(task_id) → int
TaskRetentionService injects DHttpxClient for upload/download.

Tests

21 integration tests in tests/integration/api/task_retention/test_task_retention_api.py covering:

Export (inline): happy path, empty task, nonexistent task (404)
Export to URL: happy path (mock httpx PUT, body byte-equal to inline export), non-https scheme rejected, 127.0.0.1 SSRF rejected
Clean: success across all surfaces, cursor reset, RUNNING refused, already-cleaned no-op, unprocessed events refused, nonexistent task (404)
Rehydrate (inline): byte-identical round-trip, active task refused, task_id mismatch refused, per-entity task_id mismatch refused for messages and task_states, ID collision refused
Rehydrate from URL: byte-identical round-trip via downloaded snapshot, both inline + url rejected by validator, 127.0.0.1 SSRF rejected

Suite runs in ~24s via testcontainers and passes.

Test plan

Run make test FILE=tests/integration/api/task_retention/
Sanity-check the migration applies cleanly on a non-empty tasks table (metadata-only, should be instant)
Manual inline round-trip via curl against a non-RUNNING task: GET /export → save snapshot → POST /clean → POST /rehydrate with the snapshot → re-export → diff
Manual URL round-trip: generate a presigned PUT URL on a sandbox bucket, POST /export to it, generate a presigned GET URL for the uploaded object, POST /clean, then POST /rehydrate with snapshot_url pointing at the GET URL, then re-export → diff
Confirm cleaned_at surfaces in GET /tasks/{id} responses
Confirm the existing tasks API and tests still pass (no behavior change for active tasks)

Follow-ups (not in this PR)

Scheduled Temporal cleanup workflow that calls clean_task on a daily sweep
Concurrent-write defense during clean (currently a known narrow race with force=true)
Optional: gzipped body format for the URL transport (5–10× smaller for repetitive chat content); streaming/NDJSON for very large snapshots
Explicit byte and timeout caps on upload/download once we have data on real-world snapshot sizes
Decision on tasks.params content stripping if it proves to carry chat content for any agent

Greptile Summary

This PR adds a bounded-retention surface for task chat content: callers can snapshot, delete, and byte-identically restore a task's messages and states via four new endpoints (GET/POST /export, POST /clean, POST /rehydrate), backed by a new TaskRetentionService that drives the same logic for both the HTTP admin surface and a future Temporal sweep workflow.

Export / clean / rehydrate: paginated snapshot building, idempotent multi-store deletion (Mongo messages + states, Postgres events + tracker cursors, tasks.cleaned_at), and ordered batch-insert restore with caller-supplied ID preservation — all with precondition guards (RUNNING check, idle-threshold, unprocessed-events check, cleaned-state gate).
URL transport mode: presigned PUT (export) and presigned GET (rehydrate) paths for snapshots that exceed proxy body limits, protected by an async SSRF guard that resolves hostnames and rejects non-public IP ranges.
Supporting changes: BulkWriteError → DuplicateItemError translation in the MongoDB adapter for all-duplicate-key bulk write failures; cleaned_at nullable column on tasks; two new repository methods (delete_by_task_id, reset_cursors_for_task); 21 integration tests covering the round-trip invariant and all guards.

Confidence Score: 4/5

Safe to merge after stripping the presigned URL from the audit log line in export_task_to_url; all other logic is well-guarded and thoroughly tested.

The task_export_to_url_completed structured log writes the full presigned PUT URL — including the embedded HMAC signature and credential — to Datadog. Any Datadog user can extract and reuse the URL within its validity window to overwrite the object storage object. The remaining changes (SSRF guard, idempotent multi-store cleanup, ID-preserving rehydrate, adapter error translation) are solid and covered by 21 integration tests.

agentex/src/domain/services/task_retention_service.py — the export_task_to_url audit log at line 165 includes upload_url verbatim.

Security Review

Credential leakage via logs (task_retention_service.py line 165): The task_export_to_url_completed structured log emits upload_url verbatim. Presigned PUT URLs for S3/GCS embed the access key, expiry, and HMAC signature in query parameters. Any user with Datadog log access can reuse the URL to write to the object storage location within the URL's validity window. The fix is to strip query parameters before logging (log only scheme + host + path).
DNS rebinding (acknowledged in PR description): The SSRF guard in url_validation.py resolves the hostname at validation time but the subsequent HTTP request uses the original hostname. A DNS rebinding attack can return a public IP during validation and a private IP for the actual request. This is explicitly noted as an accepted limitation for v1 given authenticated callers.
No other injection, auth-bypass, or input-validation issues were identified in the new routes or service layer.

Important Files Changed

Filename	Overview
agentex/src/domain/services/task_retention_service.py	Core retention logic for export/clean/rehydrate; well-guarded with idle/status/cursor checks, but the export_task_to_url audit log emits the full presigned URL (credential leak to Datadog).
agentex/src/utils/url_validation.py	SSRF guard that enforces https-only and rejects private/loopback/reserved IPs; uses asyncio.get_event_loop() (deprecated in 3.10+) instead of get_running_loop().
agentex/src/api/routes/task_retention.py	Four new REST handlers (export GET, export-to-URL POST, clean POST, rehydrate POST) with correct DAuthorizedId gating (read/update/delete) mirroring existing task routes.
agentex/src/api/schemas/task_retention.py	Pydantic schemas with a correct _exactly_one_source validator that rejects both empty payloads and mixed inline+URL payloads on rehydrate.
agentex/src/adapters/crud_store/adapter_mongodb.py	Adds BulkWriteError → DuplicateItemError translation for ordered insert_many with all-11000 sub-errors; non-duplicate bulk errors still surface as ServiceError.
agentex/src/domain/repositories/event_repository.py	Adds delete_by_task_id — idempotent DELETE with rowcount return, consistent with the existing repository style.
agentex/src/domain/repositories/agent_task_tracker_repository.py	Adds reset_cursors_for_task — bulk UPDATE setting last_processed_event_id = NULL for all trackers tied to a task, correctly scoped within a transactional session.
agentex/database/migrations/alembic/versions/2026_05_19_1929_adding_task_cleaned_at_6c942325c828.py	Metadata-only nullable column addition (cleaned_at TIMESTAMPTZ NULL) with reversible downgrade; no table rewrite on Postgres 11+.
agentex/tests/integration/api/task_retention/test_task_retention_api.py	21 integration tests covering the full round-trip, all precondition guards, SSRF rejection, and byte-identical snapshot invariant for both inline and URL transport modes.
agentex/src/domain/use_cases/task_retention_use_case.py	Thin orchestration layer forwarding to TaskRetentionService; correctly translates the HTTP-facing force flag into enforce_idle_threshold for the service.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Route as task_retention.py
    participant UC as TaskRetentionUseCase
    participant Svc as TaskRetentionService
    participant Mongo as MongoDB
    participant PG as PostgreSQL
    participant Store as Object Storage

    Note over Caller,Store: Export (inline)
    Caller->>Route: "GET /tasks/{id}/export"
    Route->>UC: export_task(task_id)
    UC->>Svc: export_task(task_id)
    Svc->>PG: task_repository.get(id)
    Svc->>Mongo: paginated messages (asc created_at)
    Svc->>Mongo: paginated task_states (asc created_at)
    Svc-->>Caller: TaskSnapshotEntity JSON

    Note over Caller,Store: Export to URL
    Caller->>Route: "POST /tasks/{id}/export {upload_url}"
    Route->>UC: export_task_to_url(task_id, upload_url)
    UC->>Svc: export_task_to_url(...)
    Svc->>Svc: validate_external_url (SSRF guard)
    Svc->>Svc: export_task snapshot
    Svc->>Store: PUT snapshot JSON (presigned URL)
    Svc-->>Caller: "{uploaded_bytes, counts}"

    Note over Caller,Store: Clean
    Caller->>Route: "POST /tasks/{id}/clean {force, idle_days}"
    Route->>UC: clean_task(task_id, force, idle_days)
    UC->>Svc: "clean_task(enforce_idle_threshold=!force)"
    Svc->>PG: task_repository.get check cleaned_at status
    Svc->>Svc: _is_task_idle / _has_unprocessed_events
    Svc->>Mongo: delete messages by task_id
    Svc->>Mongo: delete task_states by task_id
    Svc->>PG: delete events by task_id
    Svc->>PG: reset tracker cursors
    Svc->>PG: "tasks.cleaned_at = now()"
    Svc-->>Caller: TaskCleanupResultEntity

    Note over Caller,Store: Rehydrate (inline or URL)
    Caller->>Route: "POST /tasks/{id}/rehydrate {snapshot or snapshot_url}"
    Route->>UC: rehydrate_task(task_id, snapshot or url)
    UC->>Svc: rehydrate_task(...)
    alt snapshot_url provided
        Svc->>Svc: validate_external_url (SSRF guard)
        Svc->>Store: GET snapshot JSON (presigned URL)
        Svc->>Svc: parse TaskSnapshotEntity
    end
    Svc->>Svc: validate task_id per-entity task_id
    Svc->>PG: task_repository.get assert cleaned_at NOT NULL
    Svc->>Mongo: batch_create messages (caller IDs preserved)
    Svc->>Mongo: batch_create task_states (caller IDs preserved)
    Svc->>PG: "tasks.cleaned_at = NULL"
    Svc-->>Caller: 204 No Content

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
agentex/src/domain/services/task_retention_service.py:161-170
**Presigned URL credentials written to Datadog logs**

`upload_url` is a presigned S3/GCS PUT URL whose query parameters embed the access key, date, signature, and expiry. Logging it verbatim means any user with Datadog log access can reuse the URL to overwrite the object within its validity window. The forensic goal (knowing where the upload went) is achievable by logging only the base URL (scheme + host + path, stripping query parameters) — that uniquely identifies the storage object without exposing the credential.

Strip query params before logging: `urllib.parse.urlunparse(parsed._replace(query=""))` where `parsed = urlparse(upload_url)`.

### Issue 2 of 2
agentex/src/utils/url_validation.py:35-37
`asyncio.get_event_loop()` is deprecated in Python 3.10+ when called outside a running event loop. Since `validate_external_url` is an `async` function there will always be a running loop, but `get_running_loop()` is the correct idiomatic call here — it raises `RuntimeError` immediately if no loop is running (instead of silently creating a new one) and avoids the deprecation path entirely.

```suggestion
    loop = asyncio.get_running_loop()
    try:
        infos = await loop.getaddrinfo(parsed.hostname, parsed.port or 443)
```

_{Reviews (5): Last reviewed commit: "address review: reject empty rehydrate p..." | Re-trigger Greptile}

Greptile also left 1 inline comment on this PR.

Adds an operational surface for bounded retention of task chat content in shared infrastructure. Callers can snapshot a task's content, delete it from the shared stores, and later restore it byte-identically from the snapshot — preserving message IDs and timestamps so tool-call and reasoning references remain valid. Three new endpoints under /tasks/{task_id}: - GET /export — returns a self-contained snapshot (messages + task_states) - POST /clean — deletes content across Mongo messages, Mongo task_states, Postgres events; resets agent_task_tracker cursors; sets tasks.cleaned_at - POST /rehydrate — restores content from a snapshot, clears cleaned_at Domain layer lives in TaskRetentionService so the eventual scheduled sweep workflow and the HTTP endpoints share the same code path. Cleanup uses a "Mongo deletes first, Postgres marker last" order so retries after partial failure converge correctly. The active-task, idle-threshold, and unprocessed-events guards refuse cleanup when the task isn't safe to drop. Schema: - New nullable tasks.cleaned_at column (TIMESTAMPTZ, metadata-only ALTER) - No new audit table — cleanup operations emit structured log lines Other changes: - adapter_mongodb.batch_create now translates pymongo BulkWriteError with all-duplicate-key sub-errors into DuplicateItemError (HTTP 400) instead of letting it surface as ServiceError (HTTP 500) - New EventRepository.delete_by_task_id and AgentTaskTrackerRepository.reset_cursors_for_task methods Tests: 13 integration tests covering happy paths, all precondition guards, and the byte-identical export → clean → rehydrate round-trip.

github-actions · 2026-05-19T20:13:08Z

✱ Stainless preview builds

This PR will update the agentex-sdk SDKs with the following commit messages.

openapi

feat(api): add export/clean/rehydrate to tasks, cleaned_at field, entity types

python

feat(api): add cleaned_at field to task response types

typescript

feat(api): add cleaned_at field to tasks responses

✅ agentex-sdk-openapi studio · code

Your SDK build had at least one "note" diagnostic.
generate ✅

⚠️

agentex-sdk-typescript studio · code

Your SDK build had at least one "warning" diagnostic.
generate ⚠️ → build ✅ → lint ✅ → test ✅
npm install https://pkg.stainless.com/s/agentex-sdk-typescript/d8862bdfe1c42bbcf7308272b971e713b705d17b/dist.tar.gz

⚠️

agentex-sdk-python studio · code

Your SDK build had at least one "warning" diagnostic.
generate ⚠️ → build ✅ → lint ✅ → test ✅
pip install https://pkg.stainless.com/s/agentex-sdk-python/38ed3384094f7f07f6b2482489f457fd1dc4f76d/agentex_sdk-0.11.4-py3-none-any.whl

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-05-26 23:17:30 UTC

… task_ids Two P1 issues from review. **Authorization (security)** The three retention endpoints were inheriting only the global auth middleware, not the resource-level authorization that every other /tasks/{task_id}/* route enforces. Any authenticated principal could export, clean, or rehydrate a task they don't own. Adds DAuthorizedId to all three handlers matching the existing pattern: - export → AuthorizedOperationType.read - clean → AuthorizedOperationType.delete - rehydrate → AuthorizedOperationType.update **Per-entity task_id validation** snapshot.task_id was checked against the path task_id, but each embedded TaskMessageEntity and StateEntity carries its own task_id field that batch_create forwards straight to MongoDB. A caller could pass snapshot.task_id = "A" with messages whose task_id = "B" and pollute task B's collection — Mongo has no FK to reject it. Adds explicit per-item validation in rehydrate_task before any insert. Returns 400 with the offending index in the message so the caller can find the bad entry. Tests: 2 new integration tests covering the mismatched-task_id cases for both messages and task_states. Full suite (15 tests) still passes.

danielmillerp

looks good mostly just questions

danielmillerp · 2026-05-21T17:21:38Z

+    export → clean → rehydrate is a round-trip-equivalent operation.
+    """
+    snapshot = await use_case.export_task(task_id)
+    return ExportTaskResponse.model_validate(snapshot)


this could probs be used for the cengage use case too, but looks like we are holding this in memory and then sending over wire, do we want to stream here, or output to s3 file ?

The use case I remember just wanted a list of task IDs and task names, so this would probably be too much data (since it also exports the messages as well). Not sure if there's another use case you had in mind where they also care about chat content as well.

Either way, you're totally correct that we're holding this in memory and sending this over the wire for now. Like the rehydrate case, I think the cleanest way to support larger amounts of data here would be to allow an optional presigned upload URL where we can then output this data to (to avoid worrying about long running streaming requests or having to manage which files we upload to with which cloud providers)

danielmillerp · 2026-05-21T17:23:27Z

+workflow calls the same use case (TaskRetentionUseCase.clean_task), not
+these endpoints.
+
+Authorization mirrors the existing /tasks routes via DAuthorizedId:


confirming, we get this for free with tasks auth z work being done right?

right now, this just uses the DAuthorizedId shortcuts which relies on agentex-auth and uses the existing SGP permissions table so we're getting this for free based on the initial implementation (not sure if that's the authz work you were referring to being done)

danielmillerp · 2026-05-21T17:47:16Z

+            "unprocessed-events checks still apply. Admin use only."
+        ),
+    )
+    idle_days: int = Field(


danielmillerp · 2026-05-21T18:04:13Z

+    async def delete_by_task_id(self, task_id: str) -> int:
+        """Delete all events for a task. Idempotent. Returns rows deleted."""
+        async with (
+            self.start_async_db_session(True) as session,


what happens if we force clean and then new messages come in as we are cleaning?

for events, we have a lock during the session that does the delete so we will only be able to persist new events after the delete finishes.
for messages, we page through existing messages so if we have new messages coming in then they will get returned by one of the subsequent page fetches and also get deleted.
ultimately we don't have a good way to acquire a lock on postgres and mongodb data at once so beyond aiming to run this cleanup during off hours to reduce the risk of a race, the only other thing i can think of doing is to verify that the task is still idle before each batch of deletes.

actually, thinking through this more - once the cleanup has started, the only real way to leave this task in a consistent state is to finish the cleanup. if we abort half way through, then it ends up being in neither a cleaned nor a rehydrated state and we don't have any way to roll back the cleaning since the rehydration data would be coming in from outside the system.

For very long conversations (deep reasoning content, extensive tool traces, many attachments), the rehydrate JSON body can exceed proxy / ALB body-size limits. Same risk on the export response. Adds an opt-in URL path to both operations: - POST /tasks/{task_id}/export (new) — Agentex builds the snapshot and PUTs it as JSON to a caller-supplied presigned upload URL. Returns a small ack with byte/item counts. - POST /tasks/{task_id}/rehydrate (extended) — accepts an optional snapshot_url field. When set, Agentex GETs the URL and parses the body as a TaskSnapshotEntity. Mutually exclusive with inline content; the Pydantic validator rejects mixed payloads. The existing inline paths (GET /export, POST /rehydrate with inline content) are unchanged. Calls that fit in a JSON body still take the direct route — only oversized callers need to use object storage. SSRF guard (utils/url_validation.py): deny-known-bad. Requires https, resolves the hostname, and rejects any address that is_private, is_loopback, is_link_local (covers 169.254.169.254 cloud metadata), is_reserved, is_multicast, or is_unspecified. Uses async DNS so it doesn't block the event loop. Format: plain JSON for both directions (round-trip-symmetric with the inline path). Gzipped/streaming are future possibilities. Caps: no explicit byte/timeout limits in v1. If real callers hit issues we'll add them then. Sync: callers wait for upload/download to complete. Async-with-status is a follow-up if needed. Tests: 6 new integration tests covering happy paths (mock httpx PUT/GET, verify body == inline export, verify byte-identical round-trip through URL form), SSRF rejection (127.0.0.1), scheme rejection (http), and the inline-vs-URL mutual-exclusion validator. Total suite 21 passing. Other: - TaskRetentionService now depends on DHttpxClient (already a global singleton via GlobalDependencies) - New TaskExportToUrlResultEntity carries the upload ack shape

greptile-apps · 2026-05-26T20:44:41Z

+    @model_validator(mode="after")
+    def _exactly_one_source(self):
+        has_inline = bool(self.messages or self.task_states)
+        has_url = self.snapshot_url is not None
+        if has_inline and has_url:
+            raise ValueError(
+                "Provide inline content (messages/task_states) OR snapshot_url, "
+                "not both."
+            )
+        return self


The _exactly_one_source validator guards against providing both an inline payload and snapshot_url, but it does not guard against providing neither. A request with only task_id (empty lists, no URL) passes validation; the route then constructs an empty TaskSnapshotEntity and calls rehydrate_task, which skips all inserts and clears cleaned_at. The task ends up marked as active again with no messages or task states restored — an inconsistent state that silently discards the clean/rehydrate invariant.

Suggested change

@model_validator(mode="after")

def _exactly_one_source(self):

has_inline = bool(self.messages or self.task_states)

has_url = self.snapshot_url is not None

if has_inline and has_url:

raise ValueError(

"Provide inline content (messages/task_states) OR snapshot_url, "

"not both."

)

return self

@model_validator(mode="after")

def _exactly_one_source(self):

has_inline = bool(self.messages or self.task_states)

has_url = self.snapshot_url is not None

if has_inline and has_url:

raise ValueError(

"Provide inline content (messages/task_states) OR snapshot_url, "

"not both."

)

if not has_inline and not has_url:

raise ValueError(

"Provide exactly one of: inline content (messages/task_states) "

"or snapshot_url."

)

return self

Prompt To Fix With AI

This is a comment left during a code review. Path: agentex/src/api/schemas/task_retention.py Line: 73-82 Comment: The `_exactly_one_source` validator guards against providing *both* an inline payload and `snapshot_url`, but it does not guard against providing *neither*. A request with only `task_id` (empty lists, no URL) passes validation; the route then constructs an empty `TaskSnapshotEntity` and calls `rehydrate_task`, which skips all inserts and clears `cleaned_at`. The task ends up marked as active again with no messages or task states restored — an inconsistent state that silently discards the clean/rehydrate invariant. ```suggestion @model_validator(mode="after") def _exactly_one_source(self): has_inline = bool(self.messages or self.task_states) has_url = self.snapshot_url is not None if has_inline and has_url: raise ValueError( "Provide inline content (messages/task_states) OR snapshot_url, " "not both." ) if not has_inline and not has_url: raise ValueError( "Provide exactly one of: inline content (messages/task_states) " "or snapshot_url." ) return self ``` How can I resolve this? If you propose a fix, please make it concise.

Addressed in 243924b — applied the validator change; also added test_rehydrate_rejects_empty_payload and adjusted two existing tests that previously sent empty payloads to hit downstream guards.

Greptile P1: the _exactly_one_source validator rejected "both inline AND url" but allowed "neither". A request with just task_id, empty messages/task_states, no snapshot_url passed validation, then rehydrate_task would silently skip all inserts and clear cleaned_at — leaving the task active with no content. Whatever was deleted at clean time is permanently gone. Validator now rejects empty payloads with the same shape of error as the "both" case. Two existing tests adjusted to send a real message so they reach the cleaned_at / task_id checks they were meant to exercise. New test_rehydrate_rejects_empty_payload codifies the empty-rejection so a future refactor can't silently regress. The "rehydrate an empty cleaned task" case is now unsupported. If that's ever needed, an explicit allow_empty flag is the right way to opt in; we have no use case for it today.

greptile-apps · 2026-05-26T22:22:14Z

+        logger.info(
+            "task_export_to_url_completed",
+            extra={
+                "task_id": result.task_id,
+                "upload_url": result.upload_url,
+                "uploaded_bytes": result.uploaded_bytes,
+                "messages_count": result.messages_count,
+                "task_states_count": result.task_states_count,
+            },
+        )


Presigned URL credentials written to Datadog logs

upload_url is a presigned S3/GCS PUT URL whose query parameters embed the access key, date, signature, and expiry. Logging it verbatim means any user with Datadog log access can reuse the URL to overwrite the object within its validity window. The forensic goal (knowing where the upload went) is achievable by logging only the base URL (scheme + host + path, stripping query parameters) — that uniquely identifies the storage object without exposing the credential.

Strip query params before logging: urllib.parse.urlunparse(parsed._replace(query="")) where parsed = urlparse(upload_url).

Prompt To Fix With AI

This is a comment left during a code review. Path: agentex/src/domain/services/task_retention_service.py Line: 161-170 Comment: **Presigned URL credentials written to Datadog logs** `upload_url` is a presigned S3/GCS PUT URL whose query parameters embed the access key, date, signature, and expiry. Logging it verbatim means any user with Datadog log access can reuse the URL to overwrite the object within its validity window. The forensic goal (knowing where the upload went) is achievable by logging only the base URL (scheme + host + path, stripping query parameters) — that uniquely identifies the storage object without exposing the credential. Strip query params before logging: `urllib.parse.urlunparse(parsed._replace(query=""))` where `parsed = urlparse(upload_url)`. How can I resolve this? If you propose a fix, please make it concise.

smoreinis requested a review from a team as a code owner May 19, 2026 20:12

greptile-apps Bot reviewed May 19, 2026

View reviewed changes

Comment thread agentex/src/api/routes/task_retention.py Outdated

Comment thread agentex/src/domain/services/task_retention_service.py

danielmillerp approved these changes May 21, 2026

View reviewed changes

smoreinis added 2 commits May 26, 2026 13:00

Merge branch 'main' into task-retention-endpoints

df1f0e0

greptile-apps Bot reviewed May 26, 2026

View reviewed changes

smoreinis merged commit 42fb6e5 into main May 26, 2026
32 checks passed

smoreinis deleted the task-retention-endpoints branch May 26, 2026 23:15

Conversation

smoreinis commented May 19, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

New endpoints

Design notes

Two transport modes for the snapshot

SSRF guard

Cleanup design

Schema change

Other changes

Tests

Test plan

Follow-ups (not in this PR)

Greptile Summary

Confidence Score: 4/5

Security Review

Important Files Changed

Sequence Diagram

Uh oh!

github-actions Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✱ Stainless preview builds

Uh oh!

Uh oh!

Uh oh!

danielmillerp left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

smoreinis commented May 19, 2026 •

edited by greptile-apps Bot

Loading

github-actions Bot commented May 19, 2026 •

edited

Loading