Merged
21 changes: 21 additions & 0 deletions LICENCE
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
MIT License

Copyright (c) [year] [fullname]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
177 changes: 176 additions & 1 deletion docs/phil-feedback.md
@@ -1,3 +1,178 @@
# Feedback

Collect feedback here qqqq
Collect feedback here qqqq


## Concerns to check
- Environments and Packages in https://nhsdigital.github.io/rap-community-of-practice/training_resources/python/intro-to-python/: am I doing enough here? Is it set by my toml?

## Reviewing Other DBX Repos
- [Nice standard GitHub flow explanation: a resource for getting a clear feel for the process and all things GitHub](https://docs.github.com/en/get-started/using-github/github-flow)
- add to Git docs once other changes are made


## Make a RAP doc
[RAP](https://github.com/NHSDigital/rap-community-of-practice/tree/main/docs)
[What RAP is and why it matters: high-level gov strategy, long](https://analysisfunction.civilservice.gov.uk/policy-store/reproducible-analytical-pipelines-strategy/)

""What you need for RAP
There is no specific tool that is required to build a RAP, but both R and Python provide the power and flexibility to carry out end-to-end analytical processes, from data source to final presentation.

Once the minimum RAP has been implemented statisticians and analysts should attempt to further develop their pipeline using:

- functions or code modularity
- unit testing of functions
- error handling for functions
- documentation of functions
- packaging
- code style
- input data validation
- logging of data and the analysis
- continuous integration
- dependency management"
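
A minimal sketch of how a few items from that list (modular functions, documentation, input data validation, error handling, logging) might look together in Python. The column names `region` and `admissions`, the function names, and the sample data are all hypothetical, made up for illustration; this is not taken from any of our pipelines.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("rap_example")

# Hypothetical column names, assumed for this sketch only.
REQUIRED_COLUMNS = {"region", "admissions"}


def validate_rows(rows):
    """Check every row has the required columns and a non-negative count.

    Args:
        rows: a list of dicts, one per input record.

    Raises:
        ValueError: if a column is missing or a count is negative.
    """
    for index, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            raise ValueError(f"Row {index} is missing columns: {sorted(missing)}")
        if row["admissions"] < 0:
            raise ValueError(f"Row {index} has a negative admissions count")


def total_by_region(rows):
    """Sum admissions per region.

    Args:
        rows: a list of dicts with 'region' and 'admissions' keys.

    Returns:
        A dict mapping region name to total admissions.
    """
    validate_rows(rows)  # input data validation before any analysis
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["admissions"]
    logger.info("Summed %d rows into %d regions", len(rows), len(totals))
    return totals


if __name__ == "__main__":
    sample = [
        {"region": "North", "admissions": 10},
        {"region": "South", "admissions": 4},
        {"region": "North", "admissions": 6},
    ]
    print(total_by_region(sample))
```

Keeping validation in its own function means it can be unit tested separately from the aggregation step.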

[Nice RAP blog](https://analysisfunction.civilservice.gov.uk/blog/the-nature-of-reproducible-analytical-pipelines-rap/)
- should we do unit tests together initially, or pair-code initially?
“Just-in-time” learning is the best approach
"As we progressed through the project we were inevitably presented with new concepts or tasks unfamiliar to us as a working group. We got into the habit of tackling these with bespoke just-in-time training sessions for the team. When it came to test parts of the code we had developed, we held a unit testing session, facilitated by the BPI Team."
..."We had lots of sessions like this; when we needed to resolve merge conflicts in Git, complete peer reviewing and develop documentation"


Add docs on Git branches and commits

"https://analysisfunction.civilservice.gov.uk/blog/the-nature-of-reproducible-analytical-pipelines-rap/"
- power of git
- was a learning curve
- "It’s a completely new way of working for us so it took some time for us to get to grips with it, but we got there. If we were to do this project again, I would push my team to use this to host their code from the first line they developed. I would also encourage them to get into the habit of committing to our repository much more frequently. When it came to quality assuring our pipeline, time elapsed between commits meant we had large chunks of code to check. This would have been a much more streamlined and manageable process had we used Git little and often from the start."


[cute RAP overview, friendly read](https://nhsdigital.github.io/rap-community-of-practice/#what-is-rap)

""Recommendation 7: promote and resource ‘Reproducible Analytical Pipelines’ ... as the minimum standard for academic and NHS data analysis"
Data Saves Lives, 2022 government strategy report"

**Can we do this?**
"Our RAP Service
The NHS England Data Science team offers support to NHSE teams looking to implement RAP.

We'll:

- Work alongside your team for 6-12 weeks
- Work with you to recreate one of your processes in the RAP way
- Deliver training in Git, Python, R, Databricks, and anything else you'll need

Learn more:"

[RAP tools: Git, PySpark, etc. Great](https://nhsdigital.github.io/rap-community-of-practice/training_resources/git/introduction-to-git/)
- this is really good
- Learning Hub Python course, 2 days
- https://analysisfunction.civilservice.gov.uk/training/introduction-to-python/
- Learning Hub PySpark course, 2 days
- https://analysisfunction.civilservice.gov.uk/training/introduction-to-pyspark/

[RAP: 7-min explanation, 7-min questions, NHS video](https://www.youtube.com/watch?v=npEh7RmdTKM)
- Python, Git, etc.
- process mapping 5.13
- advice: write the function names at the beginning, just don't populate them yet
- write the test names for them too, but don't create the tests yet

- DBX git testing, setting us up well for rap
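
A rough sketch of the "names first" advice from the video: stub the functions and name the tests before writing either. The function and test names here are hypothetical placeholders for one ingestion-to-dashboard process. It uses the standard library `unittest` so it runs anywhere; with pytest the equivalent marker would be `@pytest.mark.skip`.

```python
import unittest


# Stubs: names and docstrings only, bodies to be filled in later.
def load_raw_data(path):
    """Read the raw extract from `path`. (Stub.)"""
    raise NotImplementedError


def clean_records(records):
    """Drop invalid records and standardise column names. (Stub.)"""
    raise NotImplementedError


def build_summary(records):
    """Aggregate cleaned records into the dashboard table. (Stub.)"""
    raise NotImplementedError


class TestPipelineSteps(unittest.TestCase):
    """Test names agreed up front; each is skipped until it is written."""

    @unittest.skip("test not written yet")
    def test_load_raw_data_reads_all_rows(self):
        pass

    @unittest.skip("test not written yet")
    def test_clean_records_drops_invalid_rows(self):
        pass

    @unittest.skip("test not written yet")
    def test_build_summary_totals_match_input(self):
        pass
```

Skipped tests show up in the test report, so the list of outstanding work stays visible to the whole team.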


- add this to code quality and the peer review doc
https://nhsdigital.github.io/rap-community-of-practice/training_resources/coding_tips/refactoring-guide/



[Time management when starting to use RAP](https://nhsdigital.github.io/rap-community-of-practice/implementing_RAP/rap-readiness/)
- e.g. 15 hours of tool training recommended
- thin slice is what we are after as our next step, I think!
- target a level of RAP <- we should do this

[Preparing for RAP: list of code, docs, and tool resources to help get RAP-ready](https://nhsdigital.github.io/rap-community-of-practice/tags/#preparing-for-rap)
- really useful list here
- coding tips section


Add to unit testing readme and PR
https://nhsdigital.github.io/rap-community-of-practice/training_resources/python/unit-testing/
- provides good guidance
- good that all these docs recommend Python and pytest

- add to PR: does Python have docstrings?
"""
Converts temperatures in Fahrenheit to Celsius.

Takes the input temperature (in Fahrenheit), calculates the value of
the same temperature in Celsius, and then returns this value.

Args:
    temp: a float value representing a temperature in Fahrenheit.

Returns:
    The input temperature converted into Celsius

Example:
    fahrenheit_to_celsius(temp=77)
    >>> 25.0
"""
- this will help us identify when to use other functions, and how to adapt them if they need to change to support more callers
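
For reference, a sketch of that docstring attached to a working function. The function body, the type check, and the `Raises:` section are my additions for illustration and are not part of the quoted docstring.

```python
def fahrenheit_to_celsius(temp):
    """Converts temperatures in Fahrenheit to Celsius.

    Takes the input temperature (in Fahrenheit), calculates the value of
    the same temperature in Celsius, and then returns this value.

    Args:
        temp: a float value representing a temperature in Fahrenheit.

    Returns:
        The input temperature converted into Celsius

    Raises:
        TypeError: if temp is not a number (added for this sketch).
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(temp, bool) or not isinstance(temp, (int, float)):
        raise TypeError("temp must be a number")
    return (temp - 32) * 5 / 9


if __name__ == "__main__":
    print(fahrenheit_to_celsius(temp=77))
    # The docstring is available at runtime, e.g. via help() or __doc__.
    print(fahrenheit_to_celsius.__doc__.splitlines()[0])
```

Because the docstring travels with the function, `help(fahrenheit_to_celsius)` in a notebook or IDE shows callers what it expects without opening the source.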


IDE and Python over Jupyter
"Jupyter notebooks require quite a bit of arduous coding gymnastics to perform what in Python files would be simple imports"
but notebooks can be made easier with utility functions that turn them into Python libraries, sort of: https://jupyter-notebook.readthedocs.io/en/4.x/examples/Notebook/rstversions/Importing%20Notebooks.html


Create Next steps doc
Next Steps. Future task

PT recommends
https://nhsdigital.github.io/rap-community-of-practice/implementing_RAP/thin-slice-strategy/
focus on one ingestion → dashboard process then slice

implement, review as a team, try to maximise RAP good practice using the pipeline

Everyone is a reviewer on PRs; take detours for short training dives on anything that doesn't feel right or is a blocker.
The POC project has good-practice docs, references, and some training references for PySpark, Python, unit testing, etc.


[Massive gov doc on RAP, maybe useful if looking for something specific; it's long](https://www.gov.uk/guidance/the-aqua-book)

[How to refactor](https://nhsdigital.github.io/rap-community-of-practice/training_resources/coding_tips/refactoring-guide/)

[Click "newbie" for the Git guide for RAP](https://nhsdigital.github.io/rap-community-of-practice/implementing_RAP/skills_for_rap/git_for_rap/)

[Quality coding, e.g doc manual coding in repo?](https://nhsdigital.github.io/rap-community-of-practice/implementing_RAP/workflow/quality-assuring-analytical-outputs/)


[RAP levels: really shows we are on the right track](https://nhsdigital.github.io/rap-community-of-practice/introduction_to_RAP/levels_of_RAP/)
- we should probably aim for silver, as we should achieve a lot but must ensure the baseline
- we are also looking at hitting most of gold with CI/CD, Databricks, and DABs


[Python style guide, if interested; useful not just for linting but also for naming, etc.](https://peps.python.org/pep-0008/)

https://nhsdigital.github.io/rap-community-of-practice/training_resources/pyspark/pyspark-style-guide/
"We avoid using pandas or koalas because it adds another layer of learning. The PySpark method chaining syntax is easy to learn, easy to read, and will be familiar for anyone who has used SQL."


PySpark: dynamic, modular queries; lazy evaluation.

unit test: expected, error handling, edge cases <- already got this comment but make sure I put it in
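
A small sketch of those three test categories side by side. The `percentage` function is a hypothetical example, not something from the repo, and the tests use the standard library `unittest` (pytest can collect and run them unchanged).

```python
import unittest


def percentage(part, whole):
    """Return `part` as a percentage of `whole`.

    Raises:
        ValueError: if `whole` is zero.
    """
    if whole == 0:
        raise ValueError("whole must not be zero")
    return 100 * part / whole


class TestPercentage(unittest.TestCase):
    def test_expected_value(self):
        # Expected: a typical input gives the known answer.
        self.assertEqual(percentage(1, 4), 25.0)

    def test_edge_case_zero_part(self):
        # Edge case: a zero numerator is valid and gives zero.
        self.assertEqual(percentage(0, 4), 0.0)

    def test_error_on_zero_whole(self):
        # Error handling: a zero denominator raises a clear error.
        with self.assertRaises(ValueError):
            percentage(1, 0)
```

Saved as a file, this could be run with `python -m unittest <file>` or with `pytest`.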

python > pandas

[pyspark RAP](https://nhsdigital.github.io/rap-community-of-practice/training_resources/pyspark/) put this in PR

[another pyguide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md)
[another approachable longer doc about RAP and what it involves](https://best-practice-and-impact.github.io/qa-of-code-guidance/principles.html)


[gov rap policies](https://github.com/NHSDigital/rap-community-of-practice/blob/main/docs/introduction_to_RAP/gov-policy-on-rap.md)


# Not really needed but nice to comment on somewhere
[open source 1](https://www.gov.uk/guidance/be-open-and-use-open-source)
[open source 2](https://gds.blog.gov.uk/2017/09/04/the-benefits-of-coding-in-the-open/)
[open source 3](https://github.com/nhsx/open-source-policy/blob/main/open-source-policy.md)

2 changes: 1 addition & 1 deletion resources/pipeline/silver/isreporter_dlt.yml
@@ -9,7 +9,7 @@ resources:
photon: true
# good practice to specify its something to do with dlt having beta version?
channel: current
-continuous: true # maybe triggered for POC once works
+continuous: false # maybe triggered for POC once works
# By defining catalog here we set it for all jobs in the pipeline without needing to specify it with the variable when defining a table
catalog: ${var.catalog}
target: ${var.schema_prefix}${var.layer_silver}_${var.domain_reporting}
98 changes: 98 additions & 0 deletions scratch/dlt-ingestdeleteme-cluster.md.ipynb
@@ -0,0 +1,98 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "ce0b4748-c235-4a29-ae6a-487a4f491d59",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"outputs": [],
"source": [
"%sh\n",
"rm -rf /dbfs/device_stream\n",
"mkdir -p /dbfs/device_stream\n",
"curl -L -o /dbfs/device_stream/device_data.csv https://github.com/MicrosoftLearning/mslearn-databricks/raw/main/data/device_data.csv"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "ee5c0ce0-74fe-441a-ba87-b6ffc46f1d22",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"outputs": [],
"source": [
" from pyspark.sql.functions import *\n",
" from pyspark.sql.types import *\n",
"\n",
" # Define the schema for the incoming data\n",
" schema = StructType([\n",
" StructField(\"device_id\", StringType(), True),\n",
" StructField(\"timestamp\", TimestampType(), True),\n",
" StructField(\"temperature\", DoubleType(), True),\n",
" StructField(\"humidity\", DoubleType(), True)\n",
" ])\n",
"\n",
" # Read streaming data from folder\n",
" inputPath = '/device_stream/'\n",
" iotstream = spark.readStream.schema(schema).option(\"header\", \"true\").csv(inputPath)\n",
" print(\"Source stream created...\")\n",
"\n",
" # Write the data to a Delta table\n",
" query = (iotstream\n",
" .writeStream\n",
" .format(\"delta\")\n",
" .option(\"checkpointLocation\", \"/tmp/checkpoints/iot_data\")\n",
" .start(\"/tmp/delta/iot_data\"))"
]
}
],
"metadata": {
"application/vnd.databricks.v1+notebook": {
"computePreferences": null,
"dashboards": [],
"environmentMetadata": {
"base_environment": "",
"environment_version": "4"
},
"inputWidgetPreferences": null,
"language": "python",
"notebookMetadata": {
"mostRecentlyExecutedCommandWithImplicitDF": {
"commandId": 6527703006113399,
"dataframes": [
"_sqldf"
]
},
"pythonIndentUnit": 4
},
"notebookName": "dlt-ingestdeleteme-cluster.md",
"widgets": {}
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
3 changes: 2 additions & 1 deletion tests/Run_Lint.ipynb
@@ -511,7 +511,8 @@
"widgetDisplayType": "Text",
"validationRegex": null
},
-"parameterDataType": "String"
+"parameterDataType": "String",
+"dynamic": false
},
"widgetInfo": {
"widgetType": "text",
9 changes: 6 additions & 3 deletions tests/Run_Tests.ipynb
@@ -491,7 +491,8 @@
"widgetDisplayType": "Text",
"validationRegex": null
},
-"parameterDataType": "String"
+"parameterDataType": "String",
+"dynamic": false
},
"widgetInfo": {
"widgetType": "text",
@@ -517,7 +518,8 @@
"widgetDisplayType": "Text",
"validationRegex": null
},
-"parameterDataType": "String"
+"parameterDataType": "String",
+"dynamic": false
},
"widgetInfo": {
"widgetType": "text",
@@ -543,7 +545,8 @@
"widgetDisplayType": "Text",
"validationRegex": null
},
-"parameterDataType": "String"
+"parameterDataType": "String",
+"dynamic": false
},
"widgetInfo": {
"widgetType": "text",