auth describe: try both verification endpoints before reporting failure #5512

Open
simonfaltum wants to merge 4 commits into main from simonfaltum/describe-try-both

@simonfaltum simonfaltum commented Jun 9, 2026

Member

Why

databricks auth describe makes a single verification call based on the client type it resolved (workspace client: CurrentUser.Me, account client: Workspaces.List). If that one call fails, describe prints "Unable to authenticate" even when the credentials are perfectly valid. A profile with an accounts host plus both account_id and workspace_id (older logins wrote this shape) resolves to a workspace client. The token works, but CurrentUser.Me against the accounts host always fails with HTTP 400, so describe reports failure while databricks auth token succeeds.

Fixes #5479.

Changes

Previously, describe gave up after one failed verification call. Now it tries the other endpoint and reports failure only when neither call verifies the credentials.

If the first verification call fails, describe builds the other client type from the same resolved config (non-interactively, over the same config pointer) and tries its verification call. The account branch falls back to CurrentUser.Me; the workspace branch falls back to Workspaces.List only when an account_id is configured, because account clients require one. If the fallback succeeds, describe reports success with the matching fields (the username from Me, or the account ID for account-side success). If both calls fail, describe reports the first error. Success paths still make exactly one call.

An earlier revision also treated HTTP 403 on both checks as proof of authentication (to cover non-admin users on account hosts, where Workspaces.List is admin-only). That heuristic was dropped: 403 also comes from network-level gates such as IP access lists and private link, which can answer before credentials are validated, so it is not reliable proof. For non-admin account users, describe keeps reporting the underlying permission error as-is.
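
To make the reasoning concrete, here is a small sketch of how the status codes discussed above could be read as evidence. It is only an illustration under stated assumptions: `verdict` and `classifyStatus` are hypothetical names, and describe itself does not classify responses this way; it simply reports the verification error.

```go
package main

import "net/http"

// verdict is a hypothetical label for what one response tells us
// about the credentials.
type verdict string

const (
	proven       verdict = "proven"       // the call succeeded, so the credentials work
	rejected     verdict = "rejected"     // the server refused the credentials
	inconclusive verdict = "inconclusive" // says nothing reliable about the credentials
)

// classifyStatus treats only a 2xx response as proof of authentication.
// A 403 is inconclusive because a network-level gate (IP access list,
// private link) may answer before the token is validated.
func classifyStatus(status int) verdict {
	switch {
	case status >= 200 && status < 300:
		return proven
	case status == http.StatusUnauthorized:
		return rejected
	default:
		return inconclusive
	}
}
```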

The output templates are unchanged.

Test plan

  • Unit tests in cmd/auth/describe_test.go: workspace check fails then account check succeeds; account check fails then workspace check succeeds; both fail reports the first error; no second call without an account_id
  • Acceptance test acceptance/cmd/auth/describe/account-host-with-workspace-id: end-to-end reproduction of the issue (workspace client on an account host, Me returns 400, account fallback succeeds)
  • Existing describe unit and acceptance tests pass unchanged (happy paths still make exactly one verification call)
  • ./task checks

This pull request and its description were written by Isaac.

getAuthStatus made exactly one verification call based on the client type
MustAnyClient picked (account client -> Workspaces.List, workspace client ->
CurrentUser.Me) and reported "Unable to authenticate" when that call
failed, even when the credentials were valid. Account console profiles that
also carry a workspace_id resolve to a workspace client, and CurrentUser.Me
always fails against the accounts host (#5479).

If the first verification call fails, build the other client type from the
same resolved config (non-interactively, over the same config pointer) and
try its verification call before reporting failure. If both fail, report the
first error. Success paths still make exactly one call.

Co-authored-by: Isaac
A 403 response means the server authenticated the caller and refused the
operation; invalid credentials produce a 401. When both verification
endpoints fail but at least one failure is an HTTP 403 API error, report
success instead of failure.

This makes describe truthful for non-admin users on account hosts, where
Workspaces.List is account-admin-only (returns 403) and the account console
cannot serve CurrentUser.Me (returns 400), so both checks fail even though
the credentials are perfectly valid. Username stays empty in this case; the
output template omits it.

Co-authored-by: Isaac
@eng-dev-ecosystem-bot

Collaborator

Commit: 5d3fa86

Run: 27234296810

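| Env | 🟨 KNOWN | 🔄 flaky | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 🟨 aws linux | 7 | | | 15 | 261 | 930 | 8:43 |
| 🟨 aws windows | 7 | | | 15 | 263 | 928 | 15:20 |
| 💚 aws-ucws linux | | | 7 | 15 | 357 | 844 | 8:58 |
| 💚 aws-ucws windows | | | 7 | 15 | 359 | 842 | 12:29 |
| 💚 azure linux | | | 1 | 17 | 264 | 928 | 7:14 |
| 💚 azure windows | | | 1 | 17 | 266 | 926 | 11:08 |
| 💚 azure-ucws linux | | | 1 | 17 | 362 | 840 | 11:12 |
| 🔄 azure-ucws windows | | 2 | 1 | 17 | 362 | 838 | 13:02 |
| 💚 gcp linux | | | 1 | 17 | 260 | 931 | 8:09 |
| 💚 gcp windows | | | 1 | 17 | 262 | 929 | 11:12 |
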
24 interesting tests: 15 SKIP, 7 KNOWN, 2 flaky

| Test Name | aws linux | aws windows | aws-ucws linux | aws-ucws windows | azure linux | azure windows | azure-ucws linux | azure-ucws windows | gcp linux | gcp windows |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 🟨 TestAccept | 🟨K | 🟨K | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R |
| 🙈 TestAccept/bundle/invariant/no_drift | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/permissions | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions | 🟨K | 🟨K | 💚R | 💚R | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 🟨K | 🟨K | 💚R | 💚R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 🟨K | 🟨K | 💚R | 💚R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions | 🟨K | 🟨K | 💚R | 💚R | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 🟨K | 🟨K | 💚R | 💚R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 🟨K | 🟨K | 💚R | 💚R | | | | | | |
| 🙈 TestAccept/bundle/resources/postgres_branches/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/recreate | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/replace_existing | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/update_protected | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/without_branch_id | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/recreate | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_projects/update_display_name | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/synced_database_tables/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/vector_search_indexes/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/vector_search_indexes/grants/select | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/ssh/connection | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🔄 TestFsCpFileToDir | ✅p | ✅p | ✅p | ✅p | ✅p | ✅p | ✅p | 🔄f | ✅p | ✅p |
| 🔄 TestFsCpFileToDir/local_to_uc-volumes | 🙈s | 🙈s | ✅p | ✅p | 🙈s | 🙈s | ✅p | 🔄f | 🙈s | 🙈s |

Top 28 slowest tests (at least 2 minutes):

| duration | env | testname |
| --- | --- | --- |
| 7:05 | aws-ucws windows | TestAccept |
| 6:16 | gcp windows | TestAccept |
| 6:16 | azure windows | TestAccept |
| 6:08 | azure-ucws windows | TestAccept |
| 4:19 | gcp windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 4:17 | gcp linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:59 | gcp linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:55 | gcp windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:33 | aws-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:20 | azure linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:15 | azure windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:12 | azure-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:12 | aws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:09 | azure-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:06 | aws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:02 | aws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:01 | gcp linux | TestAccept |
| 2:58 | aws-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:58 | aws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:54 | azure linux | TestAccept |
| 2:53 | aws-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:50 | azure-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:46 | azure-ucws linux | TestAccept |
| 2:44 | aws-ucws linux | TestAccept |
| 2:43 | aws-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:41 | azure-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:35 | azure linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:29 | azure windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |

A 403 does not only come from authz denials: network-level gates (IP access
lists, private link) also answer 403, and on workspace hosts those can fire
before the token is validated, so treating 403 as proof of authentication
can report success for invalid credentials. Report the verification failure
as-is instead. Non-admin account users are covered by the server-side
list-workspaces authz rollout, which already allows non-admins on GCP and
AWS public prod.

This reverts commit 5d3fa86.

Co-authored-by: Isaac
Comment thread cmd/auth/describe_test.go
require.Equal(t, "error", status.Status)
assert.ErrorIs(t, status.Error, listErr)
}

Contributor

[nit] Could we somehow remove duplication from these tests? Maybe a table driven if it's easy to do?

Member Author

Done in a2473db: folded the five fallback tests into one table-driven test (TestGetAuthStatusVerificationFallback). A zero status in the test server now means "this endpoint must not be called", which absorbs the no-account-id case into the table. Note the file also shrank in the meantime: the 403-as-success commit was dropped from the PR, so the three tests for it are gone.

Review feedback: the five fallback tests repeated the same wiring. One
table-driven test now covers both branches; a zero status in
describeVerifyServer marks an endpoint that must not be called, which
absorbs the no-account-id case.

Co-authored-by: Isaac

Development

Successfully merging this pull request may close these issues.

auth login writes workspace_id to account-level profile, breaking auth describe

3 participants