Skip to content

perf(mcp): per-server tool cache + surface OAuth start errors#4691

Merged
waleedlatif1 merged 8 commits into
stagingfrom
waleedlatif1/mcp-oauth
May 21, 2026
Merged

perf(mcp): per-server tool cache + surface OAuth start errors#4691
waleedlatif1 merged 8 commits into
stagingfrom
waleedlatif1/mcp-oauth

Conversation

@waleedlatif1
Copy link
Copy Markdown
Collaborator

Summary

  • Cache MCP tool discovery per-(workspaceId, serverId) instead of one entry per workspace, so a slow or broken server can't invalidate every other server's cached tools (previously one timeout on Exa stalled the whole MCP Tools page on "Loading…")
  • Surface the actual authorization-server error in the OAuth start route instead of the generic "Failed to start OAuth flow" toast (parses the SDK's Raw body: envelope, falls back gracefully when format doesn't match)
  • Added 7 unit tests covering cache hit/miss, partial failure isolation, forceRefresh, OAuth-pending soft skip, empty workspace, clearCache, and cross-workspace isolation

Type of Change

  • Bug fix
  • Performance improvement

Testing

Tested manually on staging against Exa (timing out), PlanetScale (healthy), and semrush (OAuth-pending) — confirmed healthy servers render instantly from cache while broken ones retry. All 265 MCP unit tests pass.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link
Copy Markdown

vercel Bot commented May 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 21, 2026 3:57am

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 21, 2026

PR Summary

Medium Risk
Medium risk because it changes MCP tool discovery caching/invalidation behavior and modifies error messaging in the OAuth start API, which could affect UI freshness and troubleshooting if edge cases are missed.

Overview
MCP tool discovery is now cached per (workspaceId, serverId) instead of a single workspace-wide entry, so one slow/failing server no longer blocks or invalidates cached tools from healthy servers; forceRefresh bypasses all per-server caches and list-change events only invalidate the affected server.

OAuth start error handling now surfaces safe, user-facing authorization-server messages via a new surfaceOauthError helper (including parsing vendor Raw body envelopes and truncation), while non-OAuth/internal failures return the generic Failed to start OAuth flow to avoid leaking infrastructure details.

Adds focused unit tests for per-server caching behavior (hits, partial failures, OAuth-pending soft skip, clearCache, cross-workspace isolation) and for OAuth error surfacing/non-leakage.

Reviewed by Cursor Bugbot for commit e7eb5f2. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 21, 2026

Greptile Summary

This PR refactors the MCP tool-discovery cache from a single per-workspace entry to a per-(workspaceId, serverId) entry, so a slow or broken server can no longer block every other server's cached tools. It also improves the OAuth start route by extracting vendor error messages from the SDK's Raw body: envelope rather than returning a generic toast for all failures.

  • Per-server caching (service.ts): discoverTools now writes and reads individual cache keys per server using a DiscoveryOutcome discriminated union; failed servers are excluded without invalidating healthy-server caches; clearCache(workspaceId) enumerates DB rows to delete each key individually.
  • OAuth error surfacing (route.ts): surfaceOauthError differentiates spec-compliant OAuthError subclasses (RFC 6749 fields) from ServerError wrappers (Raw body JSON parsing), truncates at 250 chars, and only surfaces OAuth-originating errors to the client; DB/network errors still get the generic message.
  • Tests (service.test.ts, route.test.ts): 7 new unit tests cover cache hit/miss, partial-failure isolation, forceRefresh, OAuth-pending soft-skip, empty workspace, clearCache, cross-workspace isolation, and error-surfacing behavior.

Confidence Score: 5/5

Safe to merge — the caching refactor is well-isolated, non-OAuth errors still receive the generic client message, and 7 new tests cover the key paths.

The per-server cache change is a targeted replacement of one Map key strategy with another; all branches (cache hit, fetch, OAuth-pending, error) are covered by new tests and the discriminated union makes the control flow easy to follow. The OAuth error-surfacing change is correctly guarded so only OAuthError instances reach surfaceOauthError, while DB and network errors stay behind the generic fallback. No existing behavior is regressed.

No files require special attention; the previous review thread and outside-diff comment capture the only two residual observations.

Important Files Changed

Filename Overview
apps/sim/lib/mcp/service.ts Per-server cache keys, DiscoveryOutcome discriminated union, awaited cache writes before return, and fire-and-forget status updates. clearCache(workspaceId) now makes a DB query to enumerate server IDs (noted in previous review). Connection-manager subscribe callback now directly deletes the specific server key without a DB round-trip.
apps/sim/app/api/mcp/oauth/start/route.ts Adds exported surfaceOauthError to parse vendor error bodies and RFC 6749 fields; outer catch guards non-OAuth errors behind a generic message. The fallback error.message.split newline for ServerErrors without a Raw-body envelope is discussed in the existing review thread.
apps/sim/lib/mcp/service.test.ts New test file with 7 tests covering all major branches of the per-server cache change; mocks are well-isolated; uses beforeEach to clear the singleton cache between tests.
apps/sim/app/api/mcp/oauth/start/route.test.ts Extends existing OAuth route tests with a DB-error non-leak assertion and a standalone surfaceOauthError describe block covering typed errors, Raw body parsing, truncation, and non-Error fallbacks.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[discoverTools called] --> B[getWorkspaceServers]
    B --> C{servers empty?}
    C -- yes --> D[return empty array]
    C -- no --> E[Promise.all over servers]

    E --> F{forceRefresh?}
    F -- no --> G{cache hit for\nworkspace:id:server:id?}
    G -- hit --> H[DiscoveryOutcome: cached]
    G -- miss --> I[resolveConfigEnvVars + createClient + listTools]
    F -- yes --> I

    I -- success --> J[DiscoveryOutcome: fetched]
    I -- OAuthError / UnauthorizedError --> K[DiscoveryOutcome: oauth-pending]
    I -- other error --> L[DiscoveryOutcome: error]

    H --> M[aggregate allTools]
    J --> N[write per-server cache key\nawait Promise.allSettled cacheWrites]
    J --> M
    K --> O[fire DB update: status=disconnected]
    L --> P[fire DB update: status=failed\nincrement failedCount]

    N --> M
    O --> Q[deferredSideEffects fire-and-forget]
    P --> Q

    M --> R[await cacheWrites]
    R --> S[fire deferredSideEffects]
    S --> T[mcpConnectionManager.connect for fetched servers]
    T --> U[return allTools]
Loading

Reviews (4): Last reviewed commit: "chore(mcp): tighten clearCache comment t..." | Re-trigger Greptile

Comment thread apps/sim/app/api/mcp/oauth/start/route.ts Outdated
@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

Comment thread apps/sim/app/api/mcp/oauth/start/route.ts Outdated
Comment thread apps/sim/lib/mcp/service.ts
@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

Comment thread apps/sim/lib/mcp/service.ts
@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit e7eb5f2. Configure here.

@waleedlatif1 waleedlatif1 merged commit d730015 into staging May 21, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/mcp-oauth branch May 21, 2026 06:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant