58 changes: 58 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -38,6 +38,9 @@ And if this happens teams don't get the benefit of others' practice, knowledge of
## refs
[first google but its ok its tips on good peer reviews](https://smartbear.com/learn/code-review/best-practices-for-peer-code-review/)

[How to refactor](https://nhsdigital.github.io/rap-community-of-practice/training_resources/coding_tips/refactoring-guide/)

---

# Pull Request Guidance
- You can open pull requests as drafts if you like
@@ -52,6 +55,8 @@ These are not show-stoppers, just good to check
- commit naming convention followed
- branch correctly named
- searched for any personal codes you use to leave notes for yourself when developing e.g. "zzzz" to help you tidy up
- If a big table is created or changed, has `ANALYZE TABLE` been included to improve cost/efficiency? (For pipelines, not exploratory notebooks etc.)
- After deploying the DAB, run new jobs and pipelines in your personal environment and check compute/query time to build up an expectation of what a suboptimal time looks like; include the time in the PR comment so reviewers can offer input. (You can do this in Jobs and Pipelines, or by clicking "See performance" in SQL query cells.)
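As a sketch of the `ANALYZE TABLE` check above (the table name here is illustrative, not from this repo):

```sql
-- After a large table build in a pipeline, collect statistics so the
-- query optimizer can plan joins and scans more cheaply.
ANALYZE TABLE my_catalog.my_schema.big_table COMPUTE STATISTICS FOR ALL COLUMNS;
```

Computing statistics for all columns can itself be slow on very wide tables; `FOR COLUMNS col1, col2` limits collection to the join/filter columns that matter.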

### Contribute to code quality and the future
- lint (Run_lint targeted files and make improvements)
@@ -73,8 +78,11 @@ These are not show-stoppers, just good to check
### Contribute to planning
- If this task highlighted a need for any tasks to be created, please make these recommendations or open discussions

---

# Pull Request Form (Please Complete)
🙂🙂🙂🙂🙂🙂 **Beginning Pull Request form** 🙂🙂🙂🙂🙂🙂

*based on https://github.com/TechnologyEnhancedLearning/LearningHub.Nhs.WebUI/blob/master/.github/pull_request_template.md?plain=1*


@@ -125,3 +133,53 @@ Recommendation may be for future practice, for future refactor tasks, or just go
- [ ] Agreed additional commits/Approved and let author know
- [ ] Added any additional insight to the jira ticket for the testers


🙂🙂🙂🙂🙂🙂 **End of Pull Request Form** 🙂🙂🙂🙂🙂🙂

---

# After Merging to an Environment Branch
**If you are managing environments, you should always be a required peer reviewer for branches into dev, staging and prod.**
Having approved the feature-branch owner to merge, or having merged the branch as part of approval, it is vital to then check the environments and the GitHub Actions pipeline.

## Checks for all environments
[Check git deployment action for that environment succeeded](https://github.com/TechnologyEnhancedLearning/DatabricksPOC/actions)
- [ ] Check the environment deployed to
- [ ] Check jobs and pipelines: they should all pass, though occasional failures can happen that only require a restart; unless expected, they should rarely run for more than a minute or two
- remember to check all pages
- [ ] Check dashboards for usage, and recheck after an hour or day, for any unexpected behaviours such as step changes


## Dev
- Doesn't have continuous processes (see databricks.yml setup differences)

### Dev Checks
- qqqq dashboard needs creating
- [ ] [Dev dashboard](https://adb-3560006266579683.3.azuredatabricks.net/sql/dashboardsv3/01f11c8a7b89118788f1391e0e6afe7f/pages/19beeac3?o=3560006266579683)
- [ ] [Dev pipelines and jobs](https://adb-295718430158257.17.azuredatabricks.net/jobs?o=295718430158257)
- [ ] [Git actions](https://github.com/TechnologyEnhancedLearning/DatabricksPOC/actions)
- [ ] Checked Dev email address post-deployment (qqqq: would be good if deployment emailed the address, so there is a before and after)

## Staging
- Does have continuous processes (see databricks.yml setup differences)

### Staging Checks
- qqqq dashboard needs creating
- [ ] [Staging dashboard](https://adb-3642283292081870.10.azuredatabricks.net/sql/dashboardsv3/01f11c8a699f1cfd981cc67dd50ab74f/pages/b36dd0e6?o=3642283292081870)
- [ ] [staging pipelines and jobs](https://adb-3642283292081870.10.azuredatabricks.net/jobs?o=3642283292081870)
- [ ] [Git actions](https://github.com/TechnologyEnhancedLearning/DatabricksPOC/actions)
- [ ] Checked Staging email address post-deployment (qqqq: would be good if deployment emailed the address, so there is a before and after)

## Prod
- Prod doesn't run tests
- Does have continuous processes (see databricks.yml setup differences)

### Prod Checks
- qqqq dashboard needs creating
- [ ] [Prod dashboard](https://adb-7405617206100704.4.azuredatabricks.net/sql/dashboardsv3/01f12394d70414b8935a1d8390041e0f?o=7405617206100704)
- [ ] [Prod pipelines and jobs](https://adb-7405617206100704.4.azuredatabricks.net/jobs?o=7405617206100704)
- [ ] [Git actions](https://github.com/TechnologyEnhancedLearning/DatabricksPOC/actions)
- [ ] Checked Prod email address post-deployment (qqqq: would be good if deployment emailed the address, so there is a before and after)

*Having deployed to Prod, a manual release version and description may be desired, e.g. [github release page for poc](https://github.com/TechnologyEnhancedLearning/DatabricksPOC/releases/tag/manul-tag)*


22 changes: 22 additions & 0 deletions .github/copilot-instructions.md
@@ -109,6 +109,15 @@ Agent instruction: Commit changes only if all local unit tests pass, lint issues

# Human provided instruction to the agent

<!--When updating the human-provided context, consider if other files need updating too
Team-wide AI assistant instructions:
/Workspace/ a file called assistant_workspace_instructions.md
Your personal AI assistant instruction additions:
Workspace/users/yourusername a file called .assistant_instructions.md
GitHub Copilot context: /Workspace/Users/username/Branch/.github/copilot-instructions.md
GitHub agents can be made here if desired: Workspace/Users/username/.assistant
-->

The above was generated by AI. What our team wants is:

## Pull Request
Expand All @@ -126,3 +135,16 @@ The above was generated by AI. What our teams wants is:
- If a change is declined and it may be a change we do not want recommended in future, suggest an addition to copilot-instructions.md or the github/instructions files, the docs/AI-readme.md, and the .assistant file in the workspace
- Search for up-to-date Databricks information before making recommendations about Databricks, and ensure it is relevant for DABs git-source-controlled solutions with 3 workspaces deployed via git
- Gently mention good commit names and branch names to aid improvement in using git

## Context
- we use DABs
- we have a dev/staging/prod setup
- dev is for analysts, staging is for the testing team
- we want unit-testable code
- we are migrating existing SQL to Databricks
- Databricks is serving warehoused dashboard data and live read data for our LMS

## Recommendations
- recommend Spark over using libraries
- recommend refactoring to Python functions and unit tests where appropriate
- recommend changes that make code unit-testable over integration-testable
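A minimal sketch of the kind of refactor meant here (the function name and threshold are illustrative): pull row-level logic out of SQL or notebook cells into a pure Python function, so it can be unit tested without a cluster.

```python
def classify_completion(score, threshold=0.8):
    """Pure function: no Spark session or warehouse needed to test it."""
    if score is None:
        return "unknown"
    return "complete" if score >= threshold else "incomplete"


def test_classify_completion():
    # Runs under plain pytest, locally or in CI, with no cluster spin-up.
    assert classify_completion(0.9) == "complete"
    assert classify_completion(0.5) == "incomplete"
    assert classify_completion(None) == "unknown"
```

The same function can still be applied at scale later (e.g. via a Spark UDF), but the logic itself stays testable in isolation.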
5 changes: 5 additions & 0 deletions README.md
@@ -51,6 +51,11 @@ See Environments in docs folder
- youtube show original docs on dbx and has a repo link
- can thread it

## Refs: Infrastructure Architecture
[dbx official blog: useful workspace separation](https://www.databricks.com/blog/2022/03/10/functional-workspace-organization-on-databricks.html)

[Matches our approach: dbx bundle architecture](https://learn.microsoft.com/en-gb/azure/databricks/dev-tools/bundles/)

# Databricks Structure

[DBX POC](https://adb-295718430158257.17.azuredatabricks.net/browse/folders/2302733728786158?o=295718430158257)/
17 changes: 17 additions & 0 deletions databricks.yml
@@ -54,6 +54,9 @@ variables:
env_name:
description: Deployment environment name (personal, dev, staging, prod)

is_continuous:
default: "false"
description: Set to true for continuous DLT pipeline mode. Staging and prod run continuous to mirror production behaviour. Personal and dev run triggered to save cost.
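# A sketch (illustrative resource name, commented out, not a real resource in
# this repo) of how a DLT pipeline would pick this up via its `continuous` field:
#   resources:
#     pipelines:
#       example_pipeline:
#         continuous: ${var.is_continuous}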


# ============================================================
Expand All @@ -65,7 +68,13 @@ variables:
storage_account:
description: Databricks workspace-dedicated storage account (dev, staging, prod)

# ============================================================
# Notification Configuration
# ============================================================

# qqqq implement these with vars that are not exposed to git
alert_emails:
description: List of email addresses for failure notifications
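# A sketch (illustrative resource name, commented out) of how a job would
# route failures to these addresses via email notifications:
#   resources:
#     jobs:
#       example_job:
#         email_notifications:
#           on_failure: ${var.alert_emails}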

# ============================================================
# Service Principal Identity
@@ -125,7 +134,9 @@ targets:
schema_prefix: ${workspace.current_user.short_name}_
# dev storage account
storage_account: unifiedrptdeltalake
is_continuous: false
pytest_marks: "not dev_skip and not freshness and not manual"
alert_emails: qqqq.can.i.grab.user.email.com
permissions:
- level: CAN_MANAGE
user_name: ${workspace.current_user.userName}
Expand All @@ -148,7 +159,9 @@ targets:
env_name: dev
catalog: dev_catalog
storage_account: unifiedrptdeltalake
is_continuous: false
pytest_marks: "not dev_skip and not freshness and not manual"
alert_emails: BUNDLE_VAR_dev_alert_emails_qqqq
permissions:
- group_name: dev_env_users
level: CAN_VIEW
Expand All @@ -168,8 +181,10 @@ targets:
env_name: staging
catalog: staging_catalog
# Staging storage account
# commented out for safety # is_continuous: true  # ⚠️ warning: check cost behaviour before prod
storage_account: unifiedrptdeltalake
pytest_marks: "not staging_skip and not freshness and not manual"
alert_emails: BUNDLE_VAR_staging_alert_emails_qqqq
permissions:
- group_name: staging_env_users
level: CAN_VIEW
@@ -201,7 +216,9 @@ targets:
env_name: prod
catalog: prod_catalog
# Prod storage account
# commented out for safety # is_continuous: true  # ⚠️ warning: check cost behaviour before prod
storage_account: unifiedrptdeltalake
alert_emails: BUNDLE_VAR_staging_alert_emails_qqqq
# Prod will not have tests in its file system so no pytest_marks here
permissions:
- group_name: prod_env_users
44 changes: 44 additions & 0 deletions docs/AI Usage.md
@@ -0,0 +1,44 @@
# Notes on what we want from GitHub Copilot and Databricks Assistant

Context files for AI can be found at:

Team-wide AI assistant instructions:
/Workspace/ a file called assistant_workspace_instructions.md

Your personal AI assistant instruction additions:
Workspace/users/yourusername a file called .assistant_instructions.md

GitHub Copilot context: /Workspace/Users/username/Branch/.github/copilot-instructions.md

GitHub agents can be made here if desired: Workspace/Users/username/.assistant

*If using both the GitHub Copilot context and assistant_workspace_instructions, it is desirable to update both at the same time when changes are made.*

# ⚠️WARNING⚠️
- Databricks AI has a low limit for requests, so:
  - write a detailed prompt to scaffold work
  - write a detailed prompt to review when done
  - write in SQL and use it to translate to Python when learning Python
  - use other external AI for questions not requiring access to files or our specific prebuilt prompts
  - prebuilt prompts can be used in external AI as well
- GitHub Copilot can be used manually, but not as an automatic PR tool **yet** without paying for it (though it is planned to be free)

# Using github copilot in the short term
- For now you can try the [GitHub Copilot client](https://github.com/copilot), select the repo, and use a prompt like:
> "Please confirm the name of the most recent pull request on this repo. Then using the .github/copilot-instructions.md as your context prompt provide a peer review of the pull request's code changes, by providing file names and lines."
- On a second screen you may want to go to the pull request and click the "Files changed" tab
- If the bot needs help understanding or describing certain files, please consider adding this to the context file for the future, so we can continually improve its context

# Tips
- Do a few high-quality, detailed prompts
- Forward slash offers some prebuilt context options to select
- Using the inline AI button will focus on the work in your window
- Using it from the sidebar means you can, for example, get it to find the closest example in the project of what you're trying to do, for reference
- Tell it what you want it to consider, what you want to achieve, how you want it to help, the type of response you want, and what your priorities are

# refs
[copilot in git code review](https://docs.github.com/en/copilot/how-tos/use-copilot-agents/request-a-code-review/use-code-review)

[github custom context repo prompt generator](https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions)



100 changes: 0 additions & 100 deletions docs/AI refinements.md

This file was deleted.
