chore(dataflow/gemma): update dependencies and format code #14321
XrossFox wants to merge 13 commits into
Code Review
This pull request updates the Gemma Dataflow sample by changing the Python SDK base image from version 3.14 to 3.11, upgrading TensorFlow to 2.20.0, and updating Apache Beam to 2.74.0. It also upgrades several runtime and test dependencies and applies code formatting across multiple files. Feedback on the changes suggests updating noxfile_config.py to ignore Python versions 3.12, 3.13, and 3.14 to align with the target Python 3.11 environment and prevent redundant test runs.
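The noxfile_config.py feedback could look roughly like the sketch below. It assumes the sample uses the TEST_CONFIG_OVERRIDE dictionary with an "ignored_versions" key; the file itself is not shown in this thread, so the full version list here is illustrative rather than the PR's actual contents.

```python
# noxfile_config.py (sketch; the exact version list is an assumption)
TEST_CONFIG_OVERRIDE = {
    # Skip every interpreter except 3.11, the version the container targets,
    # so the test session does not repeat on 3.12, 3.13, and 3.14.
    "ignored_versions": ["3.9", "3.10", "3.12", "3.13", "3.14"],
}
```

With this in place, only the Python 3.11 session would run for the sample.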
- Update tensorflow base image to 2.20.0-gpu and beam sdk to 3.11/2.74.0
- Update apache_beam, keras, keras_nlp, and protobuf dependencies
- Update test dependencies including google-cloud-aiplatform, storage, and pytest
- Format custom_model_gemma.py and e2e_test.py
- Update ignored python versions in noxfile_config.py
24624a0 to 7534737

/gemini review
Code Review
This pull request updates the project to Python 3.11, bumps dependency versions in the Dockerfile and requirements files, and modifies the end-to-end test configuration. The reviewer identified critical issues where several specified Docker image and PyPI package versions do not exist, which will cause build and installation failures. Additionally, changing the GCS test bucket to a hardcoded personal bucket in the end-to-end tests will break the CI/CD pipeline.
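One way to avoid a hardcoded personal bucket is to resolve the bucket name from the environment, as in the minimal sketch below. The variable name GEMMA_TEST_BUCKET and the helper resolve_test_bucket are hypothetical; this thread does not show how the repository's CI actually supplies the bucket, so treat this as an illustration only.

```python
import os
from collections.abc import Mapping

# Hypothetical variable name; the repository's real CI configuration may differ.
BUCKET_ENV_VAR = "GEMMA_TEST_BUCKET"


def resolve_test_bucket(env: Mapping[str, str] = os.environ) -> str:
    """Return the GCS bucket name for the e2e test, failing fast if unset."""
    bucket = env.get(BUCKET_ENV_VAR, "").strip()
    if not bucket:
        raise RuntimeError(
            f"Set {BUCKET_ENV_VAR} to the bucket that holds the test artifacts."
        )
    # Accept either "my-bucket" or "gs://my-bucket/" and return the bare name.
    return bucket.removeprefix("gs://").rstrip("/")
```

A developer running locally would export the variable once, while CI would inject its own value, so no personal bucket name lands in the test file.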
    DATAFLOW_MACHINE_TYPE = "g2-standard-4"
    DATAFLOW_MACHINE_TYPE = "g2-standard-8"
    # NOTE: For local testing, ensure the 'gemma_2b_en' directory is uploaded
I believe this message can be improved for better readability.
    # NOTE: For local testing, ensure the 'gemma_2b_en' directory is uploaded
    # TODO(developer): For local testing, ensure the 'gemma_2b_en' directory is uploaded
    # to your GCS bucket. Update the constant below to point to the root path
    # of the uploaded directory (e.g., 'gs://<your-bucket-name>/path/to/gemma_2b_en').
    f"--temp_location=gs://{bucket_name}/temp",
    f"--region={location}",
    f"--machine_type={DATAFLOW_MACHINE_TYPE}",
    "--disk_size_gb=100",
Is there a specific reason for hardcoding the disk size, rather than passing it as a parameter?
I don't think this value needs to change again. It was set to 100 GB because the default is 30 GB, and the job was throwing errors due to the disk size limitation. 100 GB should be more than enough, both now and in the future.
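A middle ground between a hardcoded flag and a full test parameter is a named constant next to DATAFLOW_MACHINE_TYPE, sketched below. DATAFLOW_DISK_SIZE_GB and build_worker_args are hypothetical names introduced for illustration; the test in this PR builds its argument list inline.

```python
DATAFLOW_MACHINE_TYPE = "g2-standard-8"
# Hypothetical constant: 100 GB was chosen in this PR because the smaller
# default disk caused failures due to the disk size limitation.
DATAFLOW_DISK_SIZE_GB = 100


def build_worker_args(
    machine_type: str = DATAFLOW_MACHINE_TYPE,
    disk_size_gb: int = DATAFLOW_DISK_SIZE_GB,
) -> list[str]:
    """Return the worker-sizing flags for the pipeline launch command."""
    return [
        f"--machine_type={machine_type}",
        f"--disk_size_gb={disk_size_gb}",
    ]
```

This keeps the value fixed, as the author prefers, while documenting why it was chosen and leaving a single place to change it.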

Description
Fixes b/521868825
Checklist
Testing
Compliance & Style
Post-Approval Actions