Skip to content

feat: batch event ingestion on POST /track via { type: 'batch' } envelope#377

Open
mraj602-tohands wants to merge 11 commits into
Openpanel-dev:mainfrom
Tohands-Private-Limited:feat/batch-v2-upstream
Open

feat: batch event ingestion on POST /track via { type: 'batch' } envelope#377
mraj602-tohands wants to merge 11 commits into
Openpanel-dev:mainfrom
Tohands-Private-Limited:feat/batch-v2-upstream

Conversation

@mraj602-tohands

@mraj602-tohands mraj602-tohands commented May 27, 2026

Copy link
Copy Markdown

Closes #373. Supersedes #374.

Summary

POST /track accepts a new body shape alongside the existing single-event one:

{ "type": "batch", "payload": [ { "type": "track", "payload": { ... } }, ... ] }

Up to 2000 events / 10 MB per request. Each item is dispatched through the exact same per-event pipeline as a single-event request, so offline-capable SDKs (mobile, IoT, edge) can drain a local buffer in one round-trip. No new route.

Behavior

Response. Single events keep their 200 { deviceId, sessionId }. Batch envelopes return 202 { accepted, rejected[] }.

Per-row rejection, not whole-batch failure. The envelope is validated loosely (payload: unknown[], 1–2000 items); each item is safeParsed through zTrackHandlerPayload in the controller. Failures become { index, reason: 'validation' | 'internal', error } — one malformed row in a 2000-event batch doesn't fail the other 1999, and the caller can re-send just the bad indices. alias is rejected per-row with the same message as the single-event 400.

Salts + geo fetched once per request. buildSharedRequestContext does the salts query and request-IP geo lookup once; buildEventContext reuses them per item. Only events overriding __ip trigger a second geo lookup — for a 2000-event batch that's 1 PG round-trip + 1 geo lookup instead of 2000 of each.

Bounded concurrency. Items are processed 50 at a time so an unbounded Promise.all over 2000 events can't spike Redis/geo budgets on smaller self-hosted instances. Per-item index is preserved across chunks.

Caps. 2000 events / 10 MB mirrors Mixpanel /import. The 10 MB cap uses Fastify's bodyLimit; it's route-wide, so single-event posts share the same ceiling (harmless — a single event is tiny).

Batch retries aren't deduped. Batch envelopes are exempt from the request dedup hook, so an offline-first SDK retrying a whole buffered batch isn't silently dropped.

Shared dispatch. The single-event switch lives in a shared dispatchEvent (with a never exhaustiveness check) so single and batch paths can't drift.

Reimplements the batch ingestion feature from PR Openpanel-dev#374 on top of current
upstream/main (post buffer-perf, kafka client, ClickHouse round-robin).
Adds __syncedAt property to all worker-processed events.

See PR description for full architectural details.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds POST /track batch ingestion with envelope validation, pooled per-request context (geo/salts), per-item validation and dispatch, bounded concurrency (50), and aggregated 202 responses reporting accepted and per-index rejected reasons.

Changes

Batch event ingestion with per-event validation and bounded concurrency

Layer / File(s) Summary
Batch envelope validation schema
packages/validation/src/track.validation.ts
Added TRACK_BATCH_MAX_EVENTS = 2000, zTrackBatchHandlerPayload enforcing { type: 'batch', payload: unknown[] [1..2000] }, and inferred ITrackBatchHandlerPayload type.
Per-request and per-event context refactor
apps/api/src/controllers/track.controller.ts
Introduced SharedRequestContext (projectId/geo/salts) pooled once per request via buildSharedRequestContext, and buildEventContext to derive validated timestamp and identity/device/session per event with optional __ip override triggering conditional geo re-fetch.
Event routing and single-event handler
apps/api/src/controllers/track.controller.ts
Added internal dispatchEvent to route validated event types to handlers, rejecting alias and unknown types with 400. Refactored exported handler to accept batch envelopes, reject alias early for non-batch, dispatch via dispatchEvent, and return { deviceId, sessionId } for single-event requests.
Batch handler with bounded concurrency
apps/api/src/controllers/track.controller.ts
Implemented handleBatch with BATCH_CONCURRENCY = 50, per-item zTrackHandlerPayload.safeParse validation, per-item buildEventContext + dispatchEvent execution, and aggregation into 202 response with accepted count and per-index rejected[index, reason, error] (reason: 'validation' for 4xx, 'internal' otherwise).
Route schema, body limit, and response types
apps/api/src/routes/track.router.ts
Updated POST / to accept union of single-event and batch payloads with optional client credentials, set bodyLimit: TRACK_BODY_LIMIT_BYTES (10 MB), and extended OpenAPI responses to include 202 for batches returning accepted and per-item rejected[] (index, reason, error).
Test infrastructure
apps/api/src/routes/track-batch.router.test.ts
Added hoisted mocks (db/queue/geo/common-server/redis), test lifecycle setup and per-test mock cleanup, and request/payload helpers (postTrack, postBatch, factories).
Batch endpoint and validation tests
apps/api/src/routes/track-batch.router.test.ts
Auth/envelope validation (401 missing clientId, 400 on invalid payloads), happy path 202 responses with accepted/rejected, queue dispatch for track, profile upserts for identify, mixed event types, geo re-fetch on __ip, timestamp handling, device ID pass-through, and conversion of dispatch failures into per-item internal rejections.
Per-item validation and chunk-boundary regression
apps/api/src/routes/track-batch.router.test.ts
Per-index validation failures don't fail entire batch; alias and nested batch envelopes rejected per-item; all-invalid batches skip queue calls; 200-event chunk-boundary regression asserts exact rejected index preservation across concurrency chunks.
sequenceDiagram
  participant Client
  participant Handler
  participant buildSharedRequestContext
  participant geo_lookup
  participant buildEventContext
  participant dispatchEvent
  participant queueAdd
  participant upsertProfile

  Client->>Handler: POST /track (single or batch)
  Handler->>buildSharedRequestContext: extract projectId, headers, IP -> salts
  buildSharedRequestContext->>geo_lookup: lookup(request IP)
  geo_lookup-->>buildSharedRequestContext: geo
  loop per event
    Handler->>buildEventContext: sharedCtx + raw event
    alt __ip override differs
      buildEventContext->>geo_lookup: lookup(__ip)
      geo_lookup-->>buildEventContext: geo
    end
    buildEventContext-->>dispatchEvent: TrackContext
    dispatchEvent->>queueAdd: enqueue track jobs
    dispatchEvent->>upsertProfile: identify -> upsertProfile
  end
  Handler-->>Client: 200 (single) or 202 {accepted, rejected[]}
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Behold, the batch arrives in one swift hop,
Two thousand events skip the network hop!
Each validated, dispatched, no whole-batch flop,
With geo pooled once—efficiency at the top!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 31.25% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main feature: batch event ingestion via a new envelope type on the existing POST /track route.
Linked Issues check ✅ Passed The PR implements all core coding requirements from #373: batch ingestion (up to 2000 events/10 MB), per-item validation/rejection, per-event timestamp handling for sessions, shared request context (salts/geo once), bounded concurrency (50), and preserved single-event regression.
Out of Scope Changes check ✅ Passed All changes are tightly scoped to batch transport: validation schemas, controller dispatch refactor, router union, and tests. No unrelated modifications to worker, session, or domain logic.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
apps/worker/src/jobs/events.incoming-events.test.ts (1)

553-596: 💤 Low value

Consider adding test coverage for historical events when lock is not acquired.

This test verifies historical event handling when the lock is acquired (session_start emitted). Consider adding a companion test where getLock returns false to verify that:

  1. session_start is skipped
  2. The event itself is still created
  3. sessionEnd is still not scheduled (historical path)

This would complete coverage of the historical event path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/worker/src/jobs/events.incoming-events.test.ts` around lines 553 - 596,
Add a new test alongside the existing historical-event test that simulates the
lock NOT being acquired by mocking getLock to return false; call incomingEvent
with the same historical payload and assert that createEvent is still called for
the event (but there is no 'session_start' createEvent call), sessionsQueue.add
is not called (no sessionEnd scheduled), and that the event createEvent call
preserves deviceId/sessionId; reference the incomingEvent handler, getLock mock,
createEvent mock, and sessionsQueue.add to locate and update the test harness.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/worker/src/jobs/events.incoming-event.ts`:
- Around line 217-219: The referrer fields (enrichment.referrer,
enrichment.referrer_name, enrichment.referrer_type) use "?? undefined" which can
yield undefined instead of falling back to the base event; change them to use
the same fallback pattern as the other fields by replacing "enrichment.referrer
?? undefined" with "enrichment.referrer ?? baseEvent.referrer" and likewise
"enrichment.referrer_name ?? baseEvent.referrer_name" and
"enrichment.referrer_type ?? baseEvent.referrer_type" so string-type
expectations are preserved.

---

Nitpick comments:
In `@apps/worker/src/jobs/events.incoming-events.test.ts`:
- Around line 553-596: Add a new test alongside the existing historical-event
test that simulates the lock NOT being acquired by mocking getLock to return
false; call incomingEvent with the same historical payload and assert that
createEvent is still called for the event (but there is no 'session_start'
createEvent call), sessionsQueue.add is not called (no sessionEnd scheduled),
and that the event createEvent call preserves deviceId/sessionId; reference the
incomingEvent handler, getLock mock, createEvent mock, and sessionsQueue.add to
locate and update the test harness.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 91718804-8af2-4a6e-9663-17127061acf7

📥 Commits

Reviewing files that changed from the base of the PR and between c470b8b and 4a8a8c4.

📒 Files selected for processing (9)
  • apps/api/src/controllers/event.controller.ts
  • apps/api/src/controllers/track.controller.ts
  • apps/api/src/routes/track-batch.router.test.ts
  • apps/api/src/routes/track.router.ts
  • apps/api/src/utils/ids.ts
  • apps/worker/src/jobs/events.incoming-event.ts
  • apps/worker/src/jobs/events.incoming-events.test.ts
  • packages/queue/src/queues.ts
  • packages/validation/src/track.validation.ts
💤 Files with no reviewable changes (1)
  • packages/queue/src/queues.ts

Comment thread apps/worker/src/jobs/events.incoming-event.ts Outdated
mraj602 and others added 2 commits May 28, 2026 05:19
Tests use toStrictEqual, so the new __syncedAt property in event
properties needs to be included in assertions. Uses expect.any(String)
since the exact ISO timestamp varies per run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hment

When session enrichment has null referrer fields, fall back to
baseEvent values instead of undefined. Consistent with all other
enrichment fields (path, os, browser, geo, etc.).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mraj602-tohands

Copy link
Copy Markdown
Author

Hi @lindesvard, just following up on this PR in case it was missed. Happy to make any changes or adjustments if needed. Would appreciate a review when you get a chance 🙂

@lindesvard

Copy link
Copy Markdown
Contributor

Hey, today our session management is very time dependent, so batching events might cause issues here. I have started working on improving this locally and will probably need to merge that before I can merge this (this will avoid your changes in the incoming-events file etc)

Secondly I think we should just piggy back on our existing /track endpoint. It takes a generic body

{
  "type": "batch",
  "payload": [{event}, {event}, {event}]
}

This would make proxying etc easier since we dont need to add yet another endpoint to proxy.

So what I would like you todo is to:

  1. Remove the batch endpoint and use our existing track endpoint
  2. Remove changes in the incoming events file (that will be solved in my local work to handle sessions better)

mraj602 and others added 4 commits June 4, 2026 18:46
…changes

Per review feedback on Openpanel-dev#377:
- Remove the separate /track/batch endpoint; POST /track now accepts
  { "type": "batch", "payload": [event, ...] } alongside single events,
  so proxies only ever deal with one endpoint.
- Revert all worker/session changes (events.incoming-event.ts, queue
  payload, ids.ts, event.controller.ts) back to upstream behavior —
  session handling improvements will land separately.
- Batch items reuse the existing per-event pipeline unchanged, including
  the isTimestampFromThePast flag semantics; per-item validation errors
  are reported per-index in rejected[] without failing the whole batch.
- Raise the /track body limit to 10 MB to fit max-size batches.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tch arms

Adds batch coverage for group/assign_group/increment/decrement/replay
dispatch, per-event __ip geo override, non-object items, single-event
alias 400, and queue-failure internal rejections.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mraj602-tohands mraj602-tohands changed the title feat: bulk event ingestion (POST /track/batch) for offline-first SDKs feat: batch event ingestion on POST /track via { type: 'batch' } envelope Jun 5, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/api/src/controllers/track.controller.ts (2)

594-606: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Map rejected[] to the public response shape instead of returning internal state.

This currently pushes the full BatchItemResult objects into the response, so each rejected item includes status: 'rejected'. apps/api/src/routes/track.router.ts documents the 202 body as { index, reason, error }, so the controller should emit that exact shape instead of relying on serializer behavior.

Suggested fix
   let accepted = 0;
-  const rejected: Extract<BatchItemResult, { status: 'rejected' }>[] = [];
+  const rejected: Array<{
+    index: number;
+    reason: 'validation' | 'internal';
+    error: string;
+  }> = [];
   for (const result of results) {
     if (result.status === 'accepted') {
       accepted += 1;
     } else {
-      rejected.push(result);
+      rejected.push({
+        index: result.index,
+        reason: result.reason,
+        error: result.error,
+      });
     }
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/controllers/track.controller.ts` around lines 594 - 606, The
current loop pushes full BatchItemResult objects into the rejected array and
returns them in the 202 response; update the controller (the loop that iterates
over results and the rejected array sent in reply.status(202).send) to map each
rejected item to the public shape { index, reason, error } instead of the
internal object (i.e., when result.status !== 'accepted' push or collect {
index: result.index, reason: result.reason, error: result.error }), then send
that mapped array in the response so it matches the documented contract.

156-158: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the “no user-agent => no identity” behavior.

apps/api/src/utils/ids.ts explicitly returns empty deviceId/sessionId when ua is missing. Replacing a missing header with 'unknown/1.0' here changes that contract and can collapse unrelated server-side traffic behind the same IP into the same device/session.

Suggested fix
 interface SharedRequestContext {
   projectId: string;
   requestIp: string;
-  requestUa: string;
+  requestUa?: string;
   requestHeaders: Record<string, string | undefined>;
   requestGeo: GeoLocation;
   salts: Salts;
 }
@@
-  const requestIp = request.clientIp;
-  const requestUa = request.headers['user-agent'] ?? 'unknown/1.0';
-  const requestHeaders = getStringHeaders(request.headers);
+  const requestIp = request.clientIp;
+  const requestHeaders = getStringHeaders(request.headers);
+  const requestUa = requestHeaders['user-agent'];
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/controllers/track.controller.ts` around lines 156 - 158, The
code currently forces a default user-agent string by setting requestUa =
request.headers['user-agent'] ?? 'unknown/1.0', which breaks the contract in
ids.ts that expects a missing UA to yield empty deviceId/sessionId; change this
to preserve "no UA => no identity" by leaving requestUa undefined/null when the
header is absent (e.g., requestUa = request.headers['user-agent'] as string |
undefined) and ensure downstream uses of requestUa (the identity generation code
that calls into ids.ts) continue to treat undefined/empty as "no UA" rather than
a placeholder.
🧹 Nitpick comments (2)
apps/api/src/routes/track.router.ts (1)

37-37: 💤 Low value

Consider splitting the long description for readability.

The description string is comprehensive but spans a very long single line. While this works fine for OpenAPI documentation, breaking it into a template literal with multiple lines would improve source readability.

♻️ Optional refactor for readability
-      description: `Ingest a tracking event (track, identify, group, increment, decrement, replay) or a batch of events ({ "type": "batch", "payload": [event, ...] }). Batch requests accept up to ${TRACK_BATCH_MAX_EVENTS} events and 10MB uncompressed per request; each event is dispatched through the same pipeline as a single-event request. Per-event validation failures are returned in the rejected[] array — the whole batch does not fail on a single bad row.`,
+      description: `Ingest a tracking event (track, identify, group, increment, decrement, replay) or a batch of events.
+        Batch requests use { "type": "batch", "payload": [event, ...] } and accept up to ${TRACK_BATCH_MAX_EVENTS} events and 10MB uncompressed per request.
+        Each event is dispatched through the same pipeline as a single-event request.
+        Per-event validation failures are returned in the rejected[] array — the whole batch does not fail on a single bad row.`,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/routes/track.router.ts` at line 37, The long single-line OpenAPI
description on the route config should be split across multiple template-literal
lines for readability: update the description property in track.router.ts (the
route's description string that references TRACK_BATCH_MAX_EVENTS) to a
multi-line template literal or joined string segments so the content flows
across lines while preserving the embedded ${TRACK_BATCH_MAX_EVENTS}
interpolation and exact wording; locate the description key in the route
definition and break the string into readable lines (or an array .join(''))
without changing the text or interpolation semantics.
apps/api/src/routes/track-batch.router.test.ts (1)

204-359: ⚡ Quick win

Consider adding test coverage for the 5-day historical floor.

The test suite covers historical timestamps (two hours ago), but the PR objectives state: "Enforce a 5-day historical acceptance window" with "older events rejected per-row." There's no test verifying that events with timestamps older than 5 days are rejected with a per-item validation error.

📋 Suggested test case
it('rejects events older than 5 days per-item', async () => {
  const sixDaysAgo = Date.now() - 6 * 24 * 60 * 60 * 1000;
  const res = await postBatch([
    validTrack('recent'),
    {
      type: 'track' as const,
      payload: {
        name: 'ancient_event',
        properties: { __timestamp: new Date(sixDaysAgo).toISOString() },
      },
    },
  ]);
  expect(res.statusCode).toBe(202);
  const body = res.json();
  expect(body.accepted).toBe(1);
  expect(body.rejected).toHaveLength(1);
  expect(body.rejected[0]).toMatchObject({
    index: 1,
    reason: 'validation',
  });
  expect(body.rejected[0].error).toMatch(/timestamp|past|day/i);
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/routes/track-batch.router.test.ts` around lines 204 - 359, Add a
test that verifies the 5-day historical floor by calling postBatch with one
valid track (use validTrack) and one track whose payload.properties.__timestamp
is older than 5 days (e.g., Date.now() - 6*24*60*60*1000), then assert the
response is 202, accepted is 1, rejected length is 1, and the rejected item
matches { index: 1, reason: 'validation' } and its error message matches a
timestamp/past/day regex; name the test "rejects events older than 5 days
per-item" and place it alongside the other batch tests so it exercises the same
pipeline (postBatch, validTrack).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/api/src/controllers/track.controller.ts`:
- Around line 109-128: The change removed the 5-day cutoff and now accepts
arbitrarily old __timestamp values; reintroduce a FIVE_DAYS_MS constant (e.g., 5
* 24 * 60 * 60 * 1000) and add an early check using clientTimestampNumber and
safeTimestamp: if clientTimestampNumber < safeTimestamp - FIVE_DAYS_MS, return
the per-item rejection path required by the batch contract (the same
rejection/validation response your controller uses elsewhere) before the
existing ONE_MINUTE_MS and FIFTEEN_MINUTES_MS logic so old events are rejected
instead of persisted.

---

Outside diff comments:
In `@apps/api/src/controllers/track.controller.ts`:
- Around line 594-606: The current loop pushes full BatchItemResult objects into
the rejected array and returns them in the 202 response; update the controller
(the loop that iterates over results and the rejected array sent in
reply.status(202).send) to map each rejected item to the public shape { index,
reason, error } instead of the internal object (i.e., when result.status !==
'accepted' push or collect { index: result.index, reason: result.reason, error:
result.error }), then send that mapped array in the response so it matches the
documented contract.
- Around line 156-158: The code currently forces a default user-agent string by
setting requestUa = request.headers['user-agent'] ?? 'unknown/1.0', which breaks
the contract in ids.ts that expects a missing UA to yield empty
deviceId/sessionId; change this to preserve "no UA => no identity" by leaving
requestUa undefined/null when the header is absent (e.g., requestUa =
request.headers['user-agent'] as string | undefined) and ensure downstream uses
of requestUa (the identity generation code that calls into ids.ts) continue to
treat undefined/empty as "no UA" rather than a placeholder.

---

Nitpick comments:
In `@apps/api/src/routes/track-batch.router.test.ts`:
- Around line 204-359: Add a test that verifies the 5-day historical floor by
calling postBatch with one valid track (use validTrack) and one track whose
payload.properties.__timestamp is older than 5 days (e.g., Date.now() -
6*24*60*60*1000), then assert the response is 202, accepted is 1, rejected
length is 1, and the rejected item matches { index: 1, reason: 'validation' }
and its error message matches a timestamp/past/day regex; name the test "rejects
events older than 5 days per-item" and place it alongside the other batch tests
so it exercises the same pipeline (postBatch, validTrack).

In `@apps/api/src/routes/track.router.ts`:
- Line 37: The long single-line OpenAPI description on the route config should
be split across multiple template-literal lines for readability: update the
description property in track.router.ts (the route's description string that
references TRACK_BATCH_MAX_EVENTS) to a multi-line template literal or joined
string segments so the content flows across lines while preserving the embedded
${TRACK_BATCH_MAX_EVENTS} interpolation and exact wording; locate the
description key in the route definition and break the string into readable lines
(or an array .join('')) without changing the text or interpolation semantics.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0fd25152-7171-48d5-a4a0-65a06d2b317f

📥 Commits

Reviewing files that changed from the base of the PR and between 6149a20 and 74c229f.

📒 Files selected for processing (4)
  • apps/api/src/controllers/track.controller.ts
  • apps/api/src/routes/track-batch.router.test.ts
  • apps/api/src/routes/track.router.ts
  • packages/validation/src/track.validation.ts
💤 Files with no reviewable changes (1)
  • packages/validation/src/track.validation.ts

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/api/src/controllers/track.controller.ts (2)

594-606: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Map rejected[] to the public response shape instead of returning internal state.

This currently pushes the full BatchItemResult objects into the response, so each rejected item includes status: 'rejected'. apps/api/src/routes/track.router.ts documents the 202 body as { index, reason, error }, so the controller should emit that exact shape instead of relying on serializer behavior.

Suggested fix
   let accepted = 0;
-  const rejected: Extract<BatchItemResult, { status: 'rejected' }>[] = [];
+  const rejected: Array<{
+    index: number;
+    reason: 'validation' | 'internal';
+    error: string;
+  }> = [];
   for (const result of results) {
     if (result.status === 'accepted') {
       accepted += 1;
     } else {
-      rejected.push(result);
+      rejected.push({
+        index: result.index,
+        reason: result.reason,
+        error: result.error,
+      });
     }
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/controllers/track.controller.ts` around lines 594 - 606, The
current loop pushes full BatchItemResult objects into the rejected array and
returns them in the 202 response; update the controller (the loop that iterates
over results and the rejected array sent in reply.status(202).send) to map each
rejected item to the public shape { index, reason, error } instead of the
internal object (i.e., when result.status !== 'accepted' push or collect {
index: result.index, reason: result.reason, error: result.error }), then send
that mapped array in the response so it matches the documented contract.

156-158: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the “no user-agent => no identity” behavior.

apps/api/src/utils/ids.ts explicitly returns empty deviceId/sessionId when ua is missing. Replacing a missing header with 'unknown/1.0' here changes that contract and can collapse unrelated server-side traffic behind the same IP into the same device/session.

Suggested fix
 interface SharedRequestContext {
   projectId: string;
   requestIp: string;
-  requestUa: string;
+  requestUa?: string;
   requestHeaders: Record<string, string | undefined>;
   requestGeo: GeoLocation;
   salts: Salts;
 }
@@
-  const requestIp = request.clientIp;
-  const requestUa = request.headers['user-agent'] ?? 'unknown/1.0';
-  const requestHeaders = getStringHeaders(request.headers);
+  const requestIp = request.clientIp;
+  const requestHeaders = getStringHeaders(request.headers);
+  const requestUa = requestHeaders['user-agent'];
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/controllers/track.controller.ts` around lines 156 - 158, The
code currently forces a default user-agent string by setting requestUa =
request.headers['user-agent'] ?? 'unknown/1.0', which breaks the contract in
ids.ts that expects a missing UA to yield empty deviceId/sessionId; change this
to preserve "no UA => no identity" by leaving requestUa undefined/null when the
header is absent (e.g., requestUa = request.headers['user-agent'] as string |
undefined) and ensure downstream uses of requestUa (the identity generation code
that calls into ids.ts) continue to treat undefined/empty as "no UA" rather than
a placeholder.
🧹 Nitpick comments (2)
apps/api/src/routes/track.router.ts (1)

37-37: 💤 Low value

Consider splitting the long description for readability.

The description string is comprehensive but spans a very long single line. While this works fine for OpenAPI documentation, breaking it into a template literal with multiple lines would improve source readability.

♻️ Optional refactor for readability
-      description: `Ingest a tracking event (track, identify, group, increment, decrement, replay) or a batch of events ({ "type": "batch", "payload": [event, ...] }). Batch requests accept up to ${TRACK_BATCH_MAX_EVENTS} events and 10MB uncompressed per request; each event is dispatched through the same pipeline as a single-event request. Per-event validation failures are returned in the rejected[] array — the whole batch does not fail on a single bad row.`,
+      description: `Ingest a tracking event (track, identify, group, increment, decrement, replay) or a batch of events.
+        Batch requests use { "type": "batch", "payload": [event, ...] } and accept up to ${TRACK_BATCH_MAX_EVENTS} events and 10MB uncompressed per request.
+        Each event is dispatched through the same pipeline as a single-event request.
+        Per-event validation failures are returned in the rejected[] array — the whole batch does not fail on a single bad row.`,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/routes/track.router.ts` at line 37, The long single-line OpenAPI
description on the route config should be split across multiple template-literal
lines for readability: update the description property in track.router.ts (the
route's description string that references TRACK_BATCH_MAX_EVENTS) to a
multi-line template literal or joined string segments so the content flows
across lines while preserving the embedded ${TRACK_BATCH_MAX_EVENTS}
interpolation and exact wording; locate the description key in the route
definition and break the string into readable lines (or an array .join(''))
without changing the text or interpolation semantics.
apps/api/src/routes/track-batch.router.test.ts (1)

204-359: ⚡ Quick win

Consider adding test coverage for the 5-day historical floor.

The test suite covers historical timestamps (two hours ago), but the PR objectives state: "Enforce a 5-day historical acceptance window" with "older events rejected per-row." There's no test verifying that events with timestamps older than 5 days are rejected with a per-item validation error.

📋 Suggested test case
it('rejects events older than 5 days per-item', async () => {
  const sixDaysAgo = Date.now() - 6 * 24 * 60 * 60 * 1000;
  const res = await postBatch([
    validTrack('recent'),
    {
      type: 'track' as const,
      payload: {
        name: 'ancient_event',
        properties: { __timestamp: new Date(sixDaysAgo).toISOString() },
      },
    },
  ]);
  expect(res.statusCode).toBe(202);
  const body = res.json();
  expect(body.accepted).toBe(1);
  expect(body.rejected).toHaveLength(1);
  expect(body.rejected[0]).toMatchObject({
    index: 1,
    reason: 'validation',
  });
  expect(body.rejected[0].error).toMatch(/timestamp|past|day/i);
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/routes/track-batch.router.test.ts` around lines 204 - 359, Add a
test that verifies the 5-day historical floor by calling postBatch with one
valid track (use validTrack) and one track whose payload.properties.__timestamp
is older than 5 days (e.g., Date.now() - 6*24*60*60*1000), then assert the
response is 202, accepted is 1, rejected length is 1, and the rejected item
matches { index: 1, reason: 'validation' } and its error message matches a
timestamp/past/day regex; name the test "rejects events older than 5 days
per-item" and place it alongside the other batch tests so it exercises the same
pipeline (postBatch, validTrack).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/api/src/controllers/track.controller.ts`:
- Around line 109-128: The change removed the 5-day cutoff and now accepts
arbitrarily old __timestamp values; reintroduce a FIVE_DAYS_MS constant (e.g., 5
* 24 * 60 * 60 * 1000) and add an early check using clientTimestampNumber and
safeTimestamp: if clientTimestampNumber < safeTimestamp - FIVE_DAYS_MS, return
the per-item rejection path required by the batch contract (the same
rejection/validation response your controller uses elsewhere) before the
existing ONE_MINUTE_MS and FIFTEEN_MINUTES_MS logic so old events are rejected
instead of persisted.

---

Outside diff comments:
In `@apps/api/src/controllers/track.controller.ts`:
- Around line 594-606: The current loop pushes full BatchItemResult objects into
the rejected array and returns them in the 202 response; update the controller
(the loop that iterates over results and the rejected array sent in
reply.status(202).send) to map each rejected item to the public shape { index,
reason, error } instead of the internal object (i.e., when result.status !==
'accepted' push or collect { index: result.index, reason: result.reason, error:
result.error }), then send that mapped array in the response so it matches the
documented contract.
- Around line 156-158: The code currently forces a default user-agent string by
setting requestUa = request.headers['user-agent'] ?? 'unknown/1.0', which breaks
the contract in ids.ts that expects a missing UA to yield empty
deviceId/sessionId; change this to preserve "no UA => no identity" by leaving
requestUa undefined/null when the header is absent (e.g., requestUa =
request.headers['user-agent'] as string | undefined) and ensure downstream uses
of requestUa (the identity generation code that calls into ids.ts) continue to
treat undefined/empty as "no UA" rather than a placeholder.

---

Nitpick comments:
In `@apps/api/src/routes/track-batch.router.test.ts`:
- Around line 204-359: Add a test that verifies the 5-day historical floor by
calling postBatch with one valid track (use validTrack) and one track whose
payload.properties.__timestamp is older than 5 days (e.g., Date.now() -
6*24*60*60*1000), then assert the response is 202, accepted is 1, rejected
length is 1, and the rejected item matches { index: 1, reason: 'validation' }
and its error message matches a timestamp/past/day regex; name the test "rejects
events older than 5 days per-item" and place it alongside the other batch tests
so it exercises the same pipeline (postBatch, validTrack).

In `@apps/api/src/routes/track.router.ts`:
- Line 37: The long single-line OpenAPI description on the route config should
be split across multiple template-literal lines for readability: update the
description property in track.router.ts (the route's description string that
references TRACK_BATCH_MAX_EVENTS) to a multi-line template literal or joined
string segments so the content flows across lines while preserving the embedded
${TRACK_BATCH_MAX_EVENTS} interpolation and exact wording; locate the
description key in the route definition and break the string into readable lines
(or an array .join('')) without changing the text or interpolation semantics.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0fd25152-7171-48d5-a4a0-65a06d2b317f

📥 Commits

Reviewing files that changed from the base of the PR and between 6149a20 and 74c229f.

📒 Files selected for processing (4)
  • apps/api/src/controllers/track.controller.ts
  • apps/api/src/routes/track-batch.router.test.ts
  • apps/api/src/routes/track.router.ts
  • packages/validation/src/track.validation.ts
💤 Files with no reviewable changes (1)
  • packages/validation/src/track.validation.ts
🛑 Comments failed to post (1)
apps/api/src/controllers/track.controller.ts (1)

109-128: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore the 5-day rejection path for historical events.

This now accepts any past __timestamp and only marks it as historical after 15 minutes. That breaks the batch contract in this PR and will persist arbitrarily old events instead of rejecting them per item.

Suggested fix
   const ONE_MINUTE_MS = 60 * 1000;
+  const FIVE_DAYS_MS = 5 * 24 * 60 * ONE_MINUTE_MS;
   const FIFTEEN_MINUTES_MS = 15 * ONE_MINUTE_MS;

   // Use safeTimestamp if invalid or more than 1 minute in the future
   if (
     Number.isNaN(clientTimestampNumber) ||
     clientTimestampNumber > safeTimestamp + ONE_MINUTE_MS
   ) {
     return { timestamp: safeTimestamp, isTimestampFromThePast: false };
   }
+
+  if (clientTimestampNumber < safeTimestamp - FIVE_DAYS_MS) {
+    throw new HttpError('Timestamp is older than 5 days', { status: 400 });
+  }

   // isTimestampFromThePast is true only if timestamp is older than 15 minutes
   const isTimestampFromThePast =
     clientTimestampNumber < safeTimestamp - FIFTEEN_MINUTES_MS;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

  // Constants for time validation
  const ONE_MINUTE_MS = 60 * 1000;
  const FIVE_DAYS_MS = 5 * 24 * 60 * ONE_MINUTE_MS;
  const FIFTEEN_MINUTES_MS = 15 * ONE_MINUTE_MS;

  // Use safeTimestamp if invalid or more than 1 minute in the future
  if (
    Number.isNaN(clientTimestampNumber) ||
    clientTimestampNumber > safeTimestamp + ONE_MINUTE_MS
  ) {
    return { timestamp: safeTimestamp, isTimestampFromThePast: false };
  }

  if (clientTimestampNumber < safeTimestamp - FIVE_DAYS_MS) {
    throw new HttpError('Timestamp is older than 5 days', { status: 400 });
  }

  // isTimestampFromThePast is true only if timestamp is older than 15 minutes
  const isTimestampFromThePast =
    clientTimestampNumber < safeTimestamp - FIFTEEN_MINUTES_MS;

  return {
    timestamp: clientTimestampNumber,
    isTimestampFromThePast,
  };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/controllers/track.controller.ts` around lines 109 - 128, The
change removed the 5-day cutoff and now accepts arbitrarily old __timestamp
values; reintroduce a FIVE_DAYS_MS constant (e.g., 5 * 24 * 60 * 60 * 1000) and
add an early check using clientTimestampNumber and safeTimestamp: if
clientTimestampNumber < safeTimestamp - FIVE_DAYS_MS, return the per-item
rejection path required by the batch contract (the same rejection/validation
response your controller uses elsewhere) before the existing ONE_MINUTE_MS and
FIFTEEN_MINUTES_MS logic so old events are rejected instead of persisted.

@mraj602-tohands

Copy link
Copy Markdown
Author

@lindesvard Reworked as requested:

  1. /track/batch is gone — batch now piggybacks on POST /track as { "type": "batch", "payload": [...] }. Single-event requests are untouched (same 200 response); envelopes return 202 with { accepted, rejected[] } for per-row errors.
  2. All worker/incoming-events changes removed — the diff is now 4 API-side files (validation, controller, router, tests). getTimestamp/session semantics are exactly as on main, so your session rework won't collide with this; batched historical events will pick it up automatically since they go through the same per-event pipeline.

Happy to adjust the caps (2000 events / 10 MB) or the response shape if you'd prefer different numbers.

@lindesvard

Copy link
Copy Markdown
Contributor

I have merged my PR now, I see that you have a new merge conflict (probably after my PR)

mraj602 and others added 4 commits June 12, 2026 14:48
Resolve the single conflict in track.controller.ts caused by upstream's
new session management (Openpanel-dev#393):

- Adopt the PR's shared `dispatchEvent(validatedBody, context)` call in
  `handler` (keeps single-event and batch paths from drifting).
- Fold main's `replay` session-id backcompat into `dispatchEvent`'s replay
  case: prefer the SDK-echoed `payload.sessionId`, else the server-derived
  `context.sessionId` — mirroring the single-event /track behaviour main
  introduced.

The merge also adopts main's getDeviceId(eventTimeMs) + getOverrideDeviceId
into buildEventContext (no batch behaviour change). getTimestamp is identical
to main; no 5-day floor reintroduced (this PR remains transport-only).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 202 batch response pushed the internal BatchItemResult (carrying
status: 'rejected') into rejected[]. The route's 202 schema already
strips the extra field on the wire, but this makes the controller emit
the documented shape directly instead of relying on serializer behaviour.

Addresses CodeRabbit's review on PR Openpanel-dev#377. No wire change: existing SDK
consumers (openpanel_flutter, lvgl-cpp-example) read only index/reason/
error and ignore status.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After merging main's new session management, POST /track runs the
`duplicateHook` (a 100 ms body-hash dedup) as a preValidation hook. It
only exempted `replay`, so a `{ type: 'batch' }` envelope was subject to
dedup: an offline-first SDK retrying a whole batch within the window would
get the entire batch silently dropped and answered with `200 'Duplicate
event'` instead of the `202 { accepted, rejected }` shape its consumers
parse.

Skip the dedup for batch envelopes too (origin-bearing clients only ever
triggered it; server-side IoT traffic without an origin header was already
safe). Adds a unit test covering batch/replay skips and the single-event
dedup path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- The 10 MB bodyLimit is route-wide; correct the comment that implied
  single events were unaffected (their cap is raised above Fastify's
  default — harmless, but worth stating accurately).
- Break the long single-line OpenAPI description into multiple lines for
  readability (no wording/interpolation change). Addresses CodeRabbit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mraj602-tohands

Copy link
Copy Markdown
Author

Rebased onto current main and pushed a few follow-ups so this is clean to merge:

  • Merge with main (new session management, feat: new session management #393). The one conflict was the single-event dispatch in handler; resolved by keeping the shared dispatchEvent(...) and folding feat: new session management #393's replay session-id backcompat into it, so single and batch paths stay in sync.
  • rejected[] shape. Now emits the documented { index, reason, error } directly instead of relying on the response serializer to strip the internal status field.
  • Dedup hook. POST /track runs feat: new session management #393's duplicateHook (100 ms body-hash dedup), which only exempted replay. Batch envelopes are now exempted too — otherwise an offline-first SDK retrying a whole batch could have it silently dropped and answered with 200 'Duplicate event' instead of the 202 { accepted, rejected } shape. Covered by a new duplicate.hook.test.ts.

On the #373 acceptance criteria around historical sessions / the 5-day floor: this PR is intentionally transport-only — those were deferred to the session-management rework, which has since landed as #393. Because the PR now sits on top of #393, batched historical events already get a deterministic session derived from their own __timestamp (each event flows through getDeviceId({ ..., eventTimeMs })). The one piece not included is the hard 5-day rejection floormain's getTimestamp has no such floor, so adding it here would be a behavioral change beyond transport. Happy to add it as a follow-up if you'd like it enforced at ingestion.

@mraj602-tohands

mraj602-tohands commented Jun 12, 2026

Copy link
Copy Markdown
Author

@lindesvard resolved the conflict and rebased on top of #393 — should be good to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bulk event ingestion (POST /track/batch) for offline-first SDKs — IoT, mobile, edge

3 participants