Merged
2 changes: 1 addition & 1 deletion databricks.yml
@@ -142,7 +142,7 @@ targets:
is_continuous: false
pytest_marks: "not dev_skip and not freshness and not manual"
alert_emails: qqqq.can.i.grab.user.email.com
#qqqq unclear whether seeing uses same stuff these should be own vars or dev ones
#qqqq unclear whether personal environment should use personal vars or dev vars
env_serverless_usage_policies:
env_test_all_id: ${var.serverless_usage_policies.dev_test_all_id}
env_test_dq_id: ${var.serverless_usage_policies.dev_test_dq_id}
2 changes: 1 addition & 1 deletion docs/But why is it like that.md
@@ -17,4 +17,4 @@ from teamname_functions import core_func as core

## DUMMY_POLICY_FOR_BUNDLE_ID

Without a databricks.yml per target, bundle validate validates all included yml files at the top level, even if they are sync excluded, e.g. prod excluding test files. So this policy should not show up in usage. It also means we need to declare IDs for policies we are not using
Without a databricks.yml per target, bundle validate validates all included yml files at the top level, even if they are sync excluded, e.g. prod excluding test files. So this policy should not show up in usage. It also means we need to declare IDs for policies we are not using
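
A minimal sketch of the point above (the variable names and structure here are assumptions, not the repo's actual layout): every referenced policy variable needs a value in every target, hence a dummy placeholder for policies a target never uses.

```yaml
# Hypothetical sketch: bundle validate resolves every included yml at the top
# level, even sync-excluded ones, so each referenced policy variable must have
# a value in every target -- a dummy ID stands in for unused policies.
variables:
  serverless_usage_policies:
    type: complex
    default:
      prod_only_id: "DUMMY_POLICY_FOR_BUNDLE_ID"  # never actually used in this target
```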
6 changes: 6 additions & 0 deletions docs/Databricks Limitations.md
@@ -0,0 +1,6 @@
# Databricks Limitations

## Points to revisit / discuss with other Databricks teams
- Using NHS geographies in Databricks
- How do they do their budget policies?
- Do they use serverless or their own compute, and what is the cost impact of the Microsoft discount?
100 changes: 64 additions & 36 deletions docs/Cost Visibility.md → docs/Tags and Cost Visibility.md
@@ -1,47 +1,23 @@
# Cost Visibility
# Tags and Policy Budgets

## What you need to do
Todo: how to assign tagging
Todo: add reminder in pull request template
For every notebook, a tag or query? to identify it???
Tags and cost visibility go together in this document because budget policies allow us to track serverless costs, and tags allow us to sort pipelines and jobs in the Databricks UI, amongst other things.

## Approaches
Todo: table, dashboard, budget tagging, and where triggered, when to pick the tag, what the tag means
Budget policies are good for this; SQL would be used for stored procs if we continue using them
Having tags set in the tags section and tags in serverless budget policies could get confusing, so serverless budget policies should be used for every job, pipeline and notebook, and tags occasionally applied where something different or more granularity is required.

## How to view visibility
Todo: link dashboards etc encourage/invite looking at, improvements
The serverless budget policies override the tags set in jobs and pipelines for budget tables (where they match), and allow us to capture serverless costs.
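
As a hedged illustration of carrying both (resource and variable names are assumptions modelled on this repo's conventions, not confirmed config), a job can set a usage policy alongside UI-visible tags:

```yaml
# Hypothetical sketch: the serverless budget policy captures serverless cost,
# while the tags stay visible in the Databricks UI; where the two overlap in
# the budget tables, the policy's values win.
resources:
  jobs:
    example_job:  # hypothetical name
      name: "[${bundle.target}] example_job"
      usage_policy_id: ${var.env_serverless_usage_policies.env_test_all_id}
      tags:
        costcentre: Test
```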

# FYI
[budget policies limitations and delay](https://learn.microsoft.com/en-us/azure/databricks/admin/usage/budget-policies)
- Updates to tags won't be reflected in new pipeline updates if the pipeline is in Development mode. The changes take 24 hours to propagate.
- Pipelines triggered by jobs do not inherit the job's serverless budget policy. Users must set the pipeline's policy.
Tags set in the tags section are nice because the UI currently uses them and not the policy-set ones.

# What you don't need to know
## Policy and Tags Reference

## Setup Budget Policies
⚠️ This is a first pass at the budget policy strategy and is worth some more thought: should we track data coming in/out, who is consuming it, batch or live consumption, etc.?

The 00_ prefix is used because the default falls to the highest name alphabetically. (We should change this to 05 to give headroom.)
The tag format is ```<env>_<costcentre>_<detail>```
Everyone = People
*Maybe we should have tags in Staging_Analysis_Batch for ingestion or medallion, but until we have some good queries and dashboards we won't know what the most useful tags are*

All = People + SPs
Policy naming:
```<number (if you want it to default to a certain order)>-<Env>-<CostCentre>-<Test or Owner (whichever is most useful to know)>```
You may add DEFAULT on the end; it marks a policy to fall back on when one isn't assigned.

I believe we can use anchors to give these to the DAB for the SPs to use rather than having to apply them

Service principal ones will be given via the DAB and anchors, so they are easy to update, but ones for people will need to be set in the UI.
People are given permissions through groups

The other columns are some of the tags the policies will have

qqqq maybe owner names and groups need to become the same. YES

**This approach means you have to apply policies, not just have them auto-applied**
**Job tasks need to have policies, and tasks need to be separate enough to each have a single policy, not span two**

Unfortunately the DAB is currently not setting permissions for SPs to use the serverless policies, so they have to be set through the UI

*Maybe we should have tags in Staging_Analysis_Batch for ingestion or medallion, but until we have some good queries and dashboards the most useful tags to have won't be clear*

| Policy Name | Env | CostCentre | Test / Owner |
|---|---|---|---|
@@ -70,4 +46,56 @@ Unfortunately currently dab is not setting permissions for sps to use the server
| Dev_Test_All | Dev | Test | All |
| Dev_Test_DQ | Dev | Test | DQ |
| Dev_Test_Int | Dev | Test | Int |
| Dev_Test_Unit | Dev | Test | Unit |
| Dev_Test_Unit | Dev | Test | Unit |




## Cost Visibility

### What you need to do
Todo: how to assign tagging
Todo: add reminder in pull request template
For every notebook, a tag or query? to identify it???
To connect the policies to Azure, drop down the Databricks instances menu, then Settings, then Usage.

### Approaches
Todo: table, dashboard, budget tagging, and where triggered, when to pick the tag, what the tag means
Budget policies are good for this; SQL would be used for stored procs if we continue using them

### How to view visibility
Todo: link dashboards etc encourage/invite looking at, improvements

# FYI
[budget policies limitations and delay](https://learn.microsoft.com/en-us/azure/databricks/admin/usage/budget-policies)
- Updates to tags won't be reflected in new pipeline updates if the pipeline is in Development mode. The changes take 24 hours to propagate.
- Pipelines triggered by jobs do not inherit the job's serverless budget policy. Users must set the pipeline's policy.

# What you don't need to know

## Azure
- Budgets' minimum granularity is 1 month

## Budget calculation limitations
[Microsoft link serverless budgets](https://learn.microsoft.com/en-us/azure/databricks/admin/account-settings/budgets#known-limitations)
> Budgets do not factor in any billing credits or negotiated discounts your account might have. The spent amount is calculated by multiplying usage by the SKU list price.
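
A toy calculation (illustrative only; the usage and price figures are made up) showing why the budget figure can overstate real spend when a negotiated discount applies:

```python
# Budgets multiply usage by the SKU list price and ignore credits/discounts,
# so the reported spend is an upper bound on what is actually billed.

def budget_spend(dbu_usage: float, sku_list_price: float) -> float:
    """Spend as the budget feature calculates it (list price, no discounts)."""
    return dbu_usage * sku_list_price

def billed_spend(dbu_usage: float, sku_list_price: float, discount: float) -> float:
    """Spend after a negotiated discount, e.g. discount=0.2 for 20% off."""
    return dbu_usage * sku_list_price * (1 - discount)

print(budget_spend(1000, 0.5))       # 500.0 -- what the budget reports
print(billed_spend(1000, 0.5, 0.2))  # 400.0 -- what is actually billed
```

So a budget alert can fire before the discounted bill actually reaches the threshold.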

## Setup Budget Policies

The 00_ prefix is used because the default falls to the highest name alphabetically. (We should change this to 05 to give headroom.)
The tag format is ```<env>_<costcentre>_<detail>```
Everyone = People

All = People + SPs

I believe we can use anchors to give these to the DAB for the SPs to use rather than having to apply them
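
A hedged sketch of that anchors idea (the job names are hypothetical, and that the DAB accepts anchors exactly like this is an assumption): the policy ID is written once and reused, so updating it is a one-line change.

```yaml
# Hypothetical sketch: define the policy ID once via a YAML anchor and reuse
# it with an alias across every SP-run job.
resources:
  jobs:
    job_a:  # hypothetical
      usage_policy_id: &test_all_policy ${var.serverless_usage_policies.dev_test_all_id}
    job_b:  # hypothetical
      usage_policy_id: *test_all_policy
```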

Service principal budget policies will be given via the DAB and anchors, so they are easy to update, but ones for people will need to be set in the UI.
People are given permissions to use policies through groups. They apply policies to notebooks etc. by clicking the vertical slider symbol on the right and setting the policy. (When making jobs and pipelines, the policy and tags should be defined and should match each other.)

The other columns are some of the tags the policies will have

**This approach means you have to apply policies, not just have them auto-applied**
**Job tasks need to have policies, and tasks need to be separate enough to each have a single policy, not span two**

Unfortunately the DAB is currently not setting permissions for SPs to use the serverless policies, so they have to be set through the UI.
5 changes: 2 additions & 3 deletions resources/jobs/utils/full_data_refresh.yml
@@ -3,16 +3,15 @@ resources:
jobs:
medallion_refresh:
name: "[${bundle.target}] medallion_refresh"
# Let policy be set by the calling task
tasks:
- task_key: run_bronze_pipeline
pipeline_task:
pipeline_id: ${resources.pipelines.pipeline_ods_ingestion.id}
# Serverless policy not included, as this gets run by the DQ tests. The best approach would be a policy per task in the job calling this, and separating the notebooks so each test task can be separate. [Not sure it can be done task-based](https://learn.microsoft.com/en-us/azure/databricks/admin/usage/budget-policies?source=recommendations#assign-permissions-on-a-policy)


# Exclude streaming pipelines: if they are triggered whilst active, an error occurs
# - task_key: run_silver_pipeline
# depends_on:
# - task_key: run_bronze_pipeline
# pipeline_task:
# pipeline_id: ${resources.pipelines.pipeline_isreporter_live_dlt.id}
# pipeline_id: ${resources.pipelines.pipeline_isreporter_live_dlt.id}
1 change: 1 addition & 0 deletions resources/pipeline/silver/isreporter_dlt.yml
@@ -16,6 +16,7 @@ resources:
catalog: ${var.catalog}
target: ${var.schema_prefix}${var.layer_silver}_${var.domain_reporting}
serverless: true

# configuration:
############### Resource yml files for set of pipelines #################
# If we do bronze, silver ... tranformation based layers with own yml files will define layer level vars here
1 change: 0 additions & 1 deletion resources/test/integration_test_job.yml
@@ -7,7 +7,6 @@ resources:
name: "[${bundle.target}] Integration & Unit Test Runner"

usage_policy_id: ${var.env_serverless_usage_policies.env_test_all_id}

tasks:
##############################################
####################