Initial implementation of the local cache layer #3678
…able caching in populate_current_user_cached
| "bundle_mode": "TYPE_UNSPECIFIED", | ||
| "workspace_artifact_path_type": "WORKSPACE_FILE_SYSTEM" | ||
| "workspace_artifact_path_type": "WORKSPACE_FILE_SYSTEM", | ||
| "local_cache_measurements_ms": [...redacted...] |
(optional) maybe worth recording the keys for the int map.
| // TestFingerprintStability tests that the fingerprintToHash function returns the same hash for the same input. | ||
| func TestFingerprintStability(t *testing.T) { | ||
| fingerprint1 := struct { | ||
| Key string `json:"key"` |
What happens if there's an embedded struct with an override for the same field:
type foo struct {
    A string
    B string
}
type bar struct {
    foo
    A string
}
Do bar{A: "abc"} and bar{foo: foo{A: "abc"}} compute to the same hash or different ones?
This is a scenario that arises in dashboards, so at least we should document what happens in a unit test.
This has to be a separate hash, but why do dashboards matter here?
> but why do dashboards matter here?
Maybe someday we might want to cache dashboards... Not a real use case, but it is at least worth pointing out that the hashing is not bulletproof.
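To make the embedded-struct question concrete, here is a minimal sketch. It assumes the fingerprint is JSON-encoded with encoding/json and then hashed with SHA256; the thread does not confirm that this is exactly what fingerprintToHash does, and hashOf is a hypothetical stand-in for it. Under that assumption, encoding/json follows Go's field-shadowing rules, so the outer A wins and the embedded foo.A is dropped from the encoded output.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

type foo struct {
	A string
	B string
}

type bar struct {
	foo
	A string // shadows foo.A
}

// hashOf is a hypothetical stand-in for the PR's fingerprint hashing:
// JSON-encode the value, then take the SHA256 of the encoded bytes.
func hashOf(v any) string {
	data, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	outer := bar{A: "abc"}           // sets the outer field
	inner := bar{foo: foo{A: "abc"}} // sets only the shadowed field
	other := bar{foo: foo{A: "xyz"}} // a different shadowed value

	// The outer field is encoded, the shadowed one is not, so these differ.
	fmt.Println(hashOf(outer) == hashOf(inner)) // prints false

	// Both values encode identically because foo.A never reaches the JSON,
	// so two distinct inputs collide on the same hash.
	fmt.Println(hashOf(inner) == hashOf(other)) // prints true
}
```

In this sketch the two literals from the question hash differently, but the second comparison shows the weaker spot: values that differ only in a shadowed field produce the same key. A unit test documenting both cases would cover the concern raised above.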
| // getAuthorizationHeader extracts the Authorization header from the workspace client configuration. | ||
| // If it fails to authenticate, it returns an empty string. | ||
| func (b *Bundle) getAuthorizationHeader() string { |
When doing the cache miss analysis, it'll be interesting to correlate that with auth type. I would expect oauth-u2m to have more misses due to short-lived API tokens.
| === First call in a session is expected to be a cache miss: | ||
| [DEBUG_TIMESTAMP] Debug: [Local Cache] using cache key: [SHA256_HASH] | ||
| [DEBUG_TIMESTAMP] Debug: [Local Cache] failed to stat cache file: (redacted) | ||
| [DEBUG_TIMESTAMP] Debug: [Local Cache] cache miss, computing |
nit: should this include the name of the function being computed? This may be more relevant in the future, when there is more than one function.
| account Databricks Account Commands | ||
| api Perform Databricks API call | ||
| auth Authentication related commands | ||
| cache Local cache related commands |
From the description it's not obvious that this is a user-level cache (and not whatever we have in .databricks). Perhaps make it explicit: "user-level cache"? cc @juliacrawf-db
| Use: "cache", | ||
| Short: "Local cache related commands", | ||
| Long: "Manage local cache used by the Databricks CLI for improved performance", | ||
| } |
Please update the short/long for this.
pietern
left a comment
@andrewnester Please take a look at the PR summary as well.
The implementation has drifted from the original summary.
Commit: 100a4a6
22 interesting tests: 14 flaky, 7 RECOVERED, 1 SKIP
## Changes

This PR introduces a local file-based cache layer for the Databricks CLI to improve performance of repeated operations.

New libs/cache package:
- Standalone generic function GetOrCompute[T any](ctx, cache, fingerprint, compute) that works with any type
- File-based cache implementation (fileCache) storing JSON-encoded data
- SHA256-based fingerprinting for cache keys from any struct
- Automatic cleanup of expired cache files on initialization
- Fail-open behavior: cache errors never block operations, just trigger recomputation
- Cache isolation by CLI version: ~/.cache/databricks/<version>/<component>/

Cache Modes:
- Measurement mode (default): Cache disabled, but still measures potential savings via telemetry
- Enabled mode: Set DATABRICKS_CACHE_ENABLED=true to actually use cached values

New CLI command:
- databricks cache clear - Removes all cached files across all CLI versions

Bundle Integration:
- New InitializeCache() mutator to set up cache in bundle initialization phase
- PopulateCurrentUser now uses cache for CurrentUser.Me() API call

## Why

This is the first attempt to speed up subsequent `databricks bundle` commands that a bundle developer runs while developing a bundle.

## Tests

- changed existing acceptance tests to use a dedicated cache folder
- added new acceptance test for overall caching functionality and telemetry
- added new acceptance test for clearing the cache
- added unit tests for `libs/cache`

---------

Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: Andrew Nester <andrew.nester.dev@gmail.com>