
tests: unblock CLI integration nightlies after main-branch drift #5076

Merged
simonfaltum merged 5 commits into main from simonfaltum/fix-ci-accumulated-breakage
Apr 23, 2026

@simonfaltum
Member

Why

Integration test nightlies (cli-isolated-pr.yml) have been red on every main run since 2026-04-02, when #4899 temporarily disabled the trigger. The trigger was re-enabled in #5034 and all accumulated failures surfaced at once. Nothing in any in-flight feature PR is to blame; this PR just clears the backlog so nightly signal goes green again.

Two independent regressions:

  1. The host-metadata cache (#5011) regenerated goldens for tests that run locally, but could not touch Cloud=true, Local=false suites. acceptance/selftest/record_cloud/{pipeline-crud,workspace-file-io} still expected the pre-cache /.well-known/databricks-config calls.
  2. Lakeview server behavior now varies by cloud on workspace import. AWS staging includes serialized_dashboard in the updated fields; GCP production no longer clears warehouse_id. The exact-match assertions in TestDashboardAssumptions_WorkspaceImport fail differently on each cloud.

Changes

Before: record_cloud goldens include redundant /.well-known/databricks-config GETs; dashboard test hard-codes exact updated/deleted fields.

Now: goldens regenerated against e2-dogfood (only diff is removal of the cached requests); dashboard assertions use assert.Subset so they tolerate cross-cloud drift but still fail on anything outside the known-allowed set.

  • acceptance/selftest/record_cloud/pipeline-crud/output.txt, acceptance/selftest/record_cloud/workspace-file-io/output.txt: rerun with -update under CLOUD_ENV=aws against e2-dogfood. Both terraform and direct variants produce identical output.
  • integration/assumptions/dashboard_assumptions_test.go: etag and update_time must appear in updated fields; serialized_dashboard is allowed; warehouse_id is the only allowed deletion. Comment points to the observed cross-cloud split so the next reader knows why.

Follows the pattern of the previous Lakeview-behavior-change fix in #4640.
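
For readers unfamiliar with the two-sided check, here is a minimal stdlib-only sketch of the idea: every required field must be present, and nothing outside the allowed set may appear. The helper names `isSubset` and `checkFields` are hypothetical and exist only for illustration; the actual test uses testify's `assert.Subset`, and the field lists below are taken from the description above rather than from the test file.

```go
package main

import "fmt"

// isSubset reports whether every element of sub also appears in super.
// Argument order mirrors testify's assert.Subset(t, list, subset).
func isSubset(super, sub []string) bool {
	seen := make(map[string]struct{}, len(super))
	for _, s := range super {
		seen[s] = struct{}{}
	}
	for _, s := range sub {
		if _, ok := seen[s]; !ok {
			return false
		}
	}
	return true
}

// checkFields enforces required ⊆ actual ⊆ allowed (hypothetical helper).
func checkFields(actual, required, allowed []string) error {
	if !isSubset(actual, required) {
		return fmt.Errorf("missing required field: got %v, need %v", actual, required)
	}
	if !isSubset(allowed, actual) {
		return fmt.Errorf("unexpected field: got %v, allowed %v", actual, allowed)
	}
	return nil
}

func main() {
	// Updated fields: etag and update_time are required, serialized_dashboard is tolerated.
	required := []string{"etag", "update_time"}
	allowed := []string{"etag", "update_time", "serialized_dashboard"}

	fmt.Println(checkFields([]string{"etag", "update_time"}, required, allowed))
	fmt.Println(checkFields([]string{"etag", "update_time", "serialized_dashboard"}, required, allowed))
	fmt.Println(checkFields([]string{"etag", "update_time", "display_name"}, required, allowed))

	// Deleted fields: nothing is required, warehouse_id is the only allowed deletion.
	fmt.Println(checkFields([]string{}, nil, []string{"warehouse_id"}))
	fmt.Println(checkFields([]string{"warehouse_id"}, nil, []string{"warehouse_id"}))
}
```

The first two calls and both deletion calls return nil, which covers both clouds' observed behavior; the third returns an error because `display_name` is outside the allowed set.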

Test plan

  • make checks clean
  • make lint clean (0 issues)
  • go test ./acceptance -run 'TestAccept/selftest/record_cloud/{workspace-file-io,pipeline-crud}' passes against e2-dogfood (both terraform and direct variants)
  • go test ./integration/assumptions -run TestDashboardAssumptions_WorkspaceImport passes against e2-dogfood
  • cli-isolated-pr.yml integration run on this branch comes back green

Two unrelated main-branch regressions surfaced after the integration
test trigger was re-enabled in #5034. Neither is caused by any
in-flight feature PR; they just sat behind the disabled trigger.

- record_cloud goldens: the host-metadata cache (#5011) regenerated
  most goldens but could not touch Cloud=true/Local=false suites.
  Regenerated workspace-file-io and pipeline-crud against e2-dogfood;
  changes are purely removal of redundant /.well-known/databricks-config
  requests.
- dashboard assumptions: Lakeview behavior now varies across clouds:
  AWS staging includes serialized_dashboard in updates; GCP production
  no longer clears warehouse_id on workspace import. Loosened both
  assertions to Subset checks that still catch unexpected drift.

Co-authored-by: Isaac

Simplification pass: two assert.Contains + assert.Subset become
assert.Subset(actual, required) + assert.Subset(allowed, actual),
and the two comments merge into one.

Co-authored-by: Isaac

Same post-cache drift as the other two record_cloud suites; this one
is gated on RequiresUnityCatalog so it skipped on non-UC nightlies but
would fail on the aws-prod-ucws-is / azure-prod-ucws-is runners.
Regenerated against e2-dogfood with TEST_METASTORE_ID set. Flagged by
codex review.

Co-authored-by: Isaac

…te_one/cloud

The prod-is test workspace no longer returns display_name for
test-dabs-1@databricks.com and test-dabs-2@databricks.com in the
permissions API response (the SPN owner still does). Predictable
server-side drift, not reproducible against e2-dogfood because the
SPN/user distinction differs. Mass-replacement edit per the testing
rule exception.

Co-authored-by: Isaac
@simonfaltum
Member Author

Reverted f05d5cd96 in a30a7392b.

The earlier theory (prod-is stopped returning display_name for test-dabs users) didn't hold. The Azure cloud run on this PR returned display_name consistently across all three retries, so removing the lines made the diff worse. Putting them back to match what the API actually emits.

If it turns out to be flaky between envs, a display_name replacement under acceptance/bin/ would be a cleaner next step than an edit loop on the golden files.
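
As a rough illustration of what such a normalization could look like, the sketch below decodes a JSON payload, drops every `display_name` key, and re-encodes it, so that responses with and without the field compare equal. This is only a sketch under assumptions: the helper names `stripKey` and `normalize` and the sample payload shape are hypothetical, and nothing here reflects how scripts under acceptance/bin/ are actually written.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stripKey removes every occurrence of key from a decoded JSON value,
// walking nested objects and arrays. Maps are modified in place.
func stripKey(v any, key string) {
	switch t := v.(type) {
	case map[string]any:
		delete(t, key)
		for _, child := range t {
			stripKey(child, key)
		}
	case []any:
		for _, child := range t {
			stripKey(child, key)
		}
	}
}

// normalize decodes raw JSON, drops key everywhere, and re-encodes it with
// stable indentation (encoding/json sorts map keys on output).
func normalize(raw []byte, key string) (string, error) {
	var v any
	if err := json.Unmarshal(raw, &v); err != nil {
		return "", err
	}
	stripKey(v, key)
	out, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Hypothetical payloads: the same response with and without display_name.
	with := []byte(`{"acl":[{"user_name":"test-dabs-1@databricks.com","display_name":"Example"}]}`)
	without := []byte(`{"acl":[{"user_name":"test-dabs-1@databricks.com"}]}`)

	a, errA := normalize(with, "display_name")
	b, errB := normalize(without, "display_name")
	if errA != nil || errB != nil {
		panic(fmt.Sprint(errA, errB))
	}
	fmt.Println(a == b)
}
```

Dropping the key at the JSON level, rather than deleting matching text lines, avoids trailing-comma differences when the field is the last entry in an object.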

@simonfaltum simonfaltum disabled auto-merge April 23, 2026 18:51
@simonfaltum simonfaltum merged commit 5166341 into main Apr 23, 2026
22 of 23 checks passed
@simonfaltum simonfaltum deleted the simonfaltum/fix-ci-accumulated-breakage branch April 23, 2026 18:51