Rework telemetry event categories#268
Conversation
Split the overloaded `system` category into intent-based categories so the set a caller configures matches the set they see on events. `system` now means VM health only (oom kills, service crashes); client attach/detach lifecycle moves to `connection`, periodic screenshots to `screenshot`, captcha outcomes to `captcha`, CDP-collector health to the auto-managed `monitor`, and `api` is renamed `control`. kernel-images-api is now authoritative on category: a known event type is assigned its canonical category from a generated `CategoryForType` lookup (derived from openapi.yaml via `go:generate`), and any caller-supplied value is ignored. Unknown custom types must carry an explicit category. Behavior changes: - Nothing is force-always-on. The publish filter is "you get the categories you enabled", with `monitor` riding along automatically whenever a CDP category is captured. - Default (empty config / enabled:true) captures every category except `screenshot`, which is heavy base64 data and opt-in. - The CDP collector starts only when a CDP category is enabled, so a system-only subscriber pays no collector cost. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Created a monitoring plan for this PR. What this PR does: Expands the browser telemetry event taxonomy from 5 to 9 categories and renames the Intended effect:
Risks:
Status updates will be posted automatically on this PR as monitoring progresses. |
Reconcile the CDP collector before committing the new config instead of after. The collector start is the only fallible step, so doing it first means a failure returns 500 without mutating the session config or middleware, for both fresh and already-active PUT/PATCH. Resolves the default category set in telemetryConfigFromOAPI so the effective categories are known at reconcile time. Addresses Bugbot review on #268. Co-authored-by: Cursor <cursoragent@cursor.com>
The prior atomicity fix reconciled the collector before committing the config, which dropped any events the collector emitted in the window before the session filter went live. Commit the config first (infallible) so the filter is active before the collector starts, then reconcile, and roll back fully to the prior config on a collector start failure. Reverting never needs a fallible collector start, so rollback cannot fail. Addresses the second Bugbot review on #268. Co-authored-by: Cursor <cursoragent@cursor.com>
The CDP collector captures a screenshot via ffmpeg on page-load and uncaught-exception events (throttled to once per 2s). With the screenshot category off by default, those captures ran and were then dropped at the telemetry filter, spending ffmpeg on output nobody receives. Pass a screenshotEnabled predicate (wired to TelemetrySession.CategoryEnabled) into the monitor and skip the capture entirely when the category is disabled. A nil predicate always captures, preserving existing test behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
|
bugbot run |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit b1b4bd4. Configure here.
rgarcia
left a comment
There was a problem hiding this comment.
lgtm, just one thought about control naming and whether it applies to all API calls in a way that makes sense
also remember to update openapi.yaml in control plane api once this is live
Summary
Cleans up the browser telemetry category taxonomy before it's really in customers' hands. The driving problems:
systemhad become a junk drawer (VM crashes, CDP-connection lifecycle, screenshots, and collector health all in one always-on bucket), a few events had no category and silently rodesystem, andsystemcould not be turned off.One consistent, intent-based category set
Events are categorized by what happened, and the set a caller configures equals the set they see:
console,network,page,interaction,control(wasapi),connection(CDP + live-view attach/detach),system(VM health only),screenshot,captcha, plus the auto-managedmonitor(CDP-collector health).Category is server-authoritative
kernel-images-apiowns category assignment. A known eventtypeis stamped with its canonical category from a generatedCategoryForTypelookup and any caller-supplied value is ignored; unknown custom types must carry an explicit category (400 otherwise). The lookup is generated fromopenapi.yamlviago:generate(wired intomake oapi-generate), so the spec stays the single source of truth and CI catches drift on a dirty diff.Behavior
monitorriding along automatically whenever a CDP category is captured.enabled: true) = every category exceptscreenshot(heavy base64, opt-in).system-only subscriber (e.g. "tell me about OOM/crashes, not page activity") pays no collector cost.Test plan
go build ./...,go vet ./...,gofmtcleanmake oapi-generateregeneratesoapi.go+category_gen.gocleanlylib/recorderffmpeg/pulse-audio test failures are unrelated (fail onmaintoo)Made with Cursor
Note
Medium Risk
Changes telemetry filtering, publish validation, and PUT/PATCH apply semantics across API and collectors; misconfiguration or rollback bugs could drop events or leave partial state until fixed.
Overview
This PR reworks browser telemetry categories so they match intent, are server-authoritative, and align with configurable capture (nothing is force-always-on).
Taxonomy: Replaces the old five-knob model (
api→control, splits lifecycle out ofsystem) withconnection(CDP/live view),screenshot(opt-in),captcha, and automonitor(CDP collector health, not a user toggle). OpenAPI, generatedoapi, and producers (cdpmonitor,devtoolsproxy, middleware) emit the new categories.Publish path:
CategoryForType(generated fromopenapi.yamlviacategorygen, wired intomake oapi-generate) stamps known event types; custom types must send a valid category or get 400. Clientcategoryon known types is ignored.Config & runtime: Empty PUT/PATCH resolves to
DefaultCategories(all on exceptscreenshot). Clearing requires every user category off.TelemetrySessionfilters only enabled categories and addsmonitorwhen any CDP category is on. PUT/PATCH commit config thenreconcileTelemetryState(CDP monitor +api_callmiddleware); failed collector start rolls back and can return 500 on PATCH too. Screenshots skip ffmpeg when the screenshot category is disabled.Risk: Medium — broad telemetry/API behavior change and new failure modes on config apply; instance-side only per PR notes (public SDK/docs follow separately).
Reviewed by Cursor Bugbot for commit b1b4bd4. Bugbot is set up for automated code reviews on this repo. Configure here.