fix(appkit): forward async handler rejections to a terminal error middleware instead of crashing#439
Open
MarioCadenas wants to merge 6 commits into
Open
fix(appkit): forward async handler rejections to a terminal error middleware instead of crashing#439MarioCadenas wants to merge 6 commits into
MarioCadenas wants to merge 6 commits into
Conversation
…s return JSON errors Route handlers registered via Plugin.route() are async, but Express 4 does not catch rejected promises from async handlers. When a handler rejected before writing a response — e.g. asUser(req) throwing AuthenticationError because the x-forwarded-access-token / x-forwarded-user headers were missing on genie/serving routes — the rejection escaped Express entirely, the request hung with no response, and the unhandledRejection crashed the Node process (default behavior since Node 15). Any direct hit without proxy headers (scanner, curl, proxy misconfig) could take the app down. Fix: - Plugin.route() now wraps handlers to catch sync throws and async rejections and forward them to Express via next(err). - ServerPlugin registers a terminal (err, req, res, next) middleware after all routes, extensions, and frontend middleware. It logs the error, guards res.headersSent, maps AppKitError to its statusCode with its message, preserves HTTP statusCode-bearing errors (masking 5xx messages in production), and maps unknown errors to 500 with a generic message in production — matching Plugin.execute() semantics. Adds unit tests for the middleware and integration tests asserting a handler that throws AuthenticationError returns 401 JSON and a generic throw returns 500 JSON, with no unhandledRejection in either case. Co-authored-by: Isaac Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…logging/masking Follow-up to the terminal error middleware, addressing review findings: - Factor the async-rejection wrapper into forwardAsyncErrors (utils/ safe-handler.ts) and use it from both Plugin.route() and PluginContext applyRoute/applyMiddleware, so handlers registered through the buffering APIs no longer escape Express 4 as unhandledRejections. 4-arg error middleware is passed through unwrapped (Express detects it by arity). extend() JSDoc now states that async handlers registered directly on the app must handle their own rejections. - errorHandlerMiddleware resolves the status first and logs 4xx at warn with just the message (expected client errors), 5xx/unknown at error with the full error object. - 5xx AppKitError messages are masked as "Server error" outside development, mirroring Plugin.execute(); 4xx messages stay as-is. - hasHttpStatusCode now also requires Number.isInteger(statusCode). - Tests: pin NODE_ENV in the integration suite and scope its unhandledRejection listener to this suite's errors; add coverage for buffered-route rejections, log-level branching, and 5xx masking. Co-authored-by: Isaac Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…pKitError in execute - Agents /invocations and /responses handlers now return the _handleInvoke promise so the forwardAsyncErrors wrapper applied by PluginContext.addRoute can forward rejections to the error middleware instead of letting them escape as unhandledRejection. Added an integration test locking in the returned-promise contract and a note on forwardAsyncErrors that handlers must return their promise. - Plugin.execute() now applies the same 5xx disclosure policy as the server error middleware: AppKitError messages with statusCode >= 500 are masked as "Server error" in production and preserved in development; 4xx messages are always preserved. Added unit tests covering both environments. Co-authored-by: Isaac Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…eak abort detection Commit 502fd53 made Plugin.execute() mask 5xx AppKitError messages as "Server error" in production. The analytics streaming path reconstructed error semantics by string-matching the failure message ("operation was aborted", "statement was canceled", ...), so in production the mask hid the text, the abort branch never matched, and user cancellations surfaced as generic statement failures (prod-only divergence). Keep the masking (real disclosure win) and stop depending on message text instead: - Add an optional, machine-readable `code` to the failure arm of ExecutionResult. Unlike `message`, it is never masked: derived from "ABORTED" for AbortError-shaped errors, AppKitError.code, or the error class name. - Give ExecutionError.canceled() a stable EXECUTION_CANCELED code so cancellation stays distinguishable from other execution errors. - Analytics now branches on `code` first (extracted into a testable throwFromFailedSqlResult helper) and keeps message sniffing as a fallback for codeless results, so behavior is strictly no-worse. - Tests cover code survival through production masking and the cancellation -> AbortError path with masked messages. The serving plugin's startsWith("Unknown request parameters:") check was audited too: that error is thrown by resolveAndFilter() before execute(), so it never goes through the masking path and needs no change. Co-authored-by: Isaac Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…code exports ExecutionError and ExecutionResult gained the machine-readable error code surface (ExecutionError.CANCELED_CODE, the constructor code option, and the optional ExecutionResult.code field). Regenerate the TypeDoc API reference so the committed docs match the package code and the docs CI check passes. Co-authored-by: Isaac Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
… 5xx masking + error-code The execute() 5xx-AppKitError message masking and the ExecutionResult.code machinery (plus the ExecutionError.CANCELED_CODE / ABORTED_ERROR_CODE consts and the analytics throwFromFailedSqlResult refactor that depended on them) were scope creep unrelated to the async-rejection crash fix. They have been reverted to main's existing behavior, so this change carries no new risk. Machine-readable error codes will be addressed separately as audit item A-4. The PR now contains only the reliability fix: async route-handler rejections are forwarded to a terminal Express error middleware that returns a proper HTTP error, instead of crashing the Node process. Co-authored-by: Isaac Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Express 4 does not catch rejections from
asyncroute handlers. Any handler that rejects — a downstream Databricks API failure, a bug in first- or third-party plugin code,asUser()throwing when OBO token forwarding isn't enabled, etc. — propagated as an unhandled rejection that crashed the whole Node process and left the triggering request hanging.This change forwards both synchronous throws and async rejections from route handlers to Express's
next(err), where a terminal error middleware turns them into a proper HTTP error response (with status mapping, 4xx-warn / 5xx-error logging, aheadersSentguard, and production 5xx message masking for errors that reach the middleware).Why this is a reliability fix, not a security fix
In a normal Databricks Apps deployment the Apps proxy sits in front of the app, so the "anonymous unauthenticated request triggers a crash" path is unlikely to be reachable externally. The real value here is fault isolation: one bad handler (a plugin bug, a flaky upstream, a misconfigured OBO call) should not take down the server for every other user. This is about keeping the process alive and returning a clean error, not about authentication or authorization.
Scope
This PR was trimmed to just the async-rejection forwarding + terminal error middleware. Two unrelated threads that had been bundled in were removed and reverted to
main's existing behavior:execute()5xx-AppKitErrormessage masking.ExecutionResult.code/ExecutionError.CANCELED_CODE/ABORTED_ERROR_CODEmachinery (and the analyticsthrowFromFailedSqlResultrefactor that depended on it).Machine-readable error codes will be handled separately as audit item A-4.
Changes
utils/safe-handler.ts—forwardAsyncErrorswrapper.plugin.ts— route registration wraps handlers withforwardAsyncErrors.core/plugin-context.ts— wraps buffered routes/middleware.plugins/server/index.ts— terminal error middleware.plugins/agents/agents.ts— handler returns its promise so the wrapper can catch it.This pull request and its description were written by Isaac.