Refactor download_url test into TestDownloadUrl class#8829
Refactor download_url test into TestDownloadUrl class#8829e-mny wants to merge 6 commits intoProject-MONAI:devfrom
Conversation
for more information, see https://pre-commit.ci
📝 WalkthroughWalkthroughUpdates test-data URLs in tests/testing_data/data_config.json to point to the HuggingFace testing_data dataset. Modifies tests/test_utils.py: expands skip logic to treat ValueError messages containing "hash check" as skip-worthy, updates DOWNLOAD_FAIL_MSGS to use "hash check", adds SAMPLE_TIFF constants and a TestDownloadUrl unit test that verifies download success with a correct hash and that an incorrect hash raises RuntimeError. Changes monai/apps/utils.py: download_url now raises ValueError (with an expanded message) when a post-download hash check fails. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/testing_data/data_config.json (1)
88-92:⚠️ Potential issue | 🟡 MinorTwo URLs were missed in the HF migration.
images.nrrd_example(line 89) andconfigs.test_meta_file(line 165) still point atgithub.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/.... Per the PR objective (issue#8510), all test data URLs should move to the HF dataset. Please migrate these too (or note why they're exempt).#!/bin/bash # Confirm remaining GitHub Releases URLs in the config. rg -n 'MONAI-extra-test-data' tests/testing_data/data_config.jsonAlso applies to: 163-168
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/testing_data/data_config.json` around lines 88 - 92, The two remaining test-data entries images.nrrd_example and configs.test_meta_file still point to the old GitHub release; update their "url" fields to the corresponding HuggingFace dataset URLs used by the rest of the config (replace the github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/... links with the HF dataset path), and if the file content or location changed on HF, update the "hash_val" (keeping "hash_type": "sha256") to the new checksum; after changing images.nrrd_example and configs.test_meta_file in data_config.json, run the repo search used in the PR verification (e.g., rg -n 'MONAI-extra-test-data' tests/testing_data/data_config.json) to ensure no remaining GitHub-release links remain.
🧹 Nitpick comments (1)
tests/test_utils.py (1)
195-215: Consider sourcing URL/hash fromtesting_data_configinstead of hard-coding.
wsi_generic_tiffindata_config.jsonalready holds the same URL/hash_type/hash_val. Usingtesting_data_config("images", "wsi_generic_tiff", ...)would avoid duplication and keep the test in sync if the config is ever updated.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_utils.py` around lines 195 - 215, The test_download_url test currently hard-codes SAMPLE_TIFF, SAMPLE_TIFF_HASH and SAMPLE_TIFF_HASH_TYPE; replace those with values loaded from the shared test config by calling testing_data_config("images", "wsi_generic_tiff", "url"), testing_data_config("images", "wsi_generic_tiff", "hash_val") and testing_data_config("images", "wsi_generic_tiff", "hash_type") (or destructure a single call if helper returns all fields) and pass those into download_url instead of the constants; update imports in tests/test_utils.py to include testing_data_config and keep the existing download_url usage and assertions unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_utils.py`:
- Around line 209-215: Replace the second transient download with a
deterministic hash-mismatch check: first download SAMPLE_TIFF once (guarded by
skip_if_downloading_fails()) into a temp filepath (e.g., model.tiff) using
download_url, then call download_url again with the same filepath but with the
incorrect hash_val to trigger the hash-validation branch and
assertRaises(RuntimeError); this avoids re-downloading to model_bad.tiff and
eliminates network flakiness—use the symbols download_url,
skip_if_downloading_fails, SAMPLE_TIFF, and SAMPLE_TIFF_HASH_TYPE to locate and
modify the test.
---
Outside diff comments:
In `@tests/testing_data/data_config.json`:
- Around line 88-92: The two remaining test-data entries images.nrrd_example and
configs.test_meta_file still point to the old GitHub release; update their "url"
fields to the corresponding HuggingFace dataset URLs used by the rest of the
config (replace the
github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/... links
with the HF dataset path), and if the file content or location changed on HF,
update the "hash_val" (keeping "hash_type": "sha256") to the new checksum; after
changing images.nrrd_example and configs.test_meta_file in data_config.json, run
the repo search used in the PR verification (e.g., rg -n 'MONAI-extra-test-data'
tests/testing_data/data_config.json) to ensure no remaining GitHub-release links
remain.
---
Nitpick comments:
In `@tests/test_utils.py`:
- Around line 195-215: The test_download_url test currently hard-codes
SAMPLE_TIFF, SAMPLE_TIFF_HASH and SAMPLE_TIFF_HASH_TYPE; replace those with
values loaded from the shared test config by calling
testing_data_config("images", "wsi_generic_tiff", "url"),
testing_data_config("images", "wsi_generic_tiff", "hash_val") and
testing_data_config("images", "wsi_generic_tiff", "hash_type") (or destructure a
single call if helper returns all fields) and pass those into download_url
instead of the constants; update imports in tests/test_utils.py to include
testing_data_config and keep the existing download_url usage and assertions
unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d4c21ce8-3d39-4944-869e-df48b5adeeb6
📒 Files selected for processing (2)
tests/test_utils.pytests/testing_data/data_config.json
|
Please have a look at the DCO issue, we should be able to do a remedial commit to sort it. |
DCO Remediation Commit for Enoch Mok <enochmokny@gmail.com> I, Enoch Mok <enochmokny@gmail.com>, hereby add my Signed-off-by to this commit: d02da49 Signed-off-by: Enoch Mok <enochmokny@gmail.com>
for more information, see https://pre-commit.ci
|
Thanks @ericspod for your guidance throughout this, I am excited to have made my first contribution (albeit small)! I do wish to be doing this consistently - may I kindly ask if you guys have preference on which issues I should focus on, whether those in the 'Issues' tab or under MONAI v1.6 are more important to you guys? To future collaborations and contributions 🍻 |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/test_utils.py (1)
179-187:⚠️ Potential issue | 🔴 CriticalBroken
exceptchain —rt_eis undefined and non-matchingRuntimeErrors are now swallowed.The new
except ValueErrorblock was inserted between the(RuntimeError, OSError)body and itsraise rt_e, so:
- Line 187
raise rt_enow lives inside theexcept ValueErrorhandler, wherert_eis not in scope (Ruff F821). If aValueErrorwithout"hash check"is raised, this path explodes withNameError.- The original intent — re-raising a
RuntimeError/OSErrorwhose message doesn't matchDOWNLOAD_FAIL_MSGS— is gone. Such errors are now silently swallowed, masking real failures.- Missing
fromon the re-raise (B904).Proposed fix
except (RuntimeError, OSError) as rt_e: err_str = str(rt_e) if any(k in err_str for k in DOWNLOAD_FAIL_MSGS): raise unittest.SkipTest(f"Error while downloading: {rt_e}") from rt_e # incomplete download + raise except ValueError as v_e: if "hash check" in str(v_e): raise unittest.SkipTest(f"Hash value error while downloading: {v_e}") from v_e - - raise rt_e + raise🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_utils.py` around lines 179 - 187, The except chain currently mis-scopes rt_e and swallows RuntimeError/OSError; move the logic so that the (RuntimeError, OSError) except (the block that defines rt_e) either re-raises the original exception when its message doesn't match DOWNLOAD_FAIL_MSGS (use "raise rt_e from rt_e" or simply "raise" inside that except to preserve traceback), and keep the ValueError except separate (handle "hash check" by raising unittest.SkipTest with v_e, otherwise re-raise the ValueError with "raise" or "raise v_e from v_e"); ensure rt_e is not referenced inside the except ValueError block and that all re-raises include proper "from" chaining where applicable.monai/apps/utils.py (1)
215-276:⚠️ Potential issue | 🔴 CriticalDocstring and breaking change:
RuntimeErrorvsValueErrorLine 223 documents
RuntimeErrorfor downloaded-file hash failure, but line 271 raisesValueError. This breaks both the docstring and downstream code:
tests/apps/test_download_and_extract.pyline 45 catches(ContentTooShortError, HTTPError, RuntimeError)—ValueErrorescapes uncaught.- Line 48 checks
str(e).startswith("md5 check"), but the new message is"md5 hash check ...", so the assertion fails even if caught.- External code catching
RuntimeErrorarounddownload_urlis similarly broken.Fix: Change line 271 back to
RuntimeError(simplest, non-breaking) or update the docstring at line 223 and all call sites together.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@monai/apps/utils.py` around lines 215 - 276, The post-download hash-failure currently raises ValueError (the raise after logger.info and check_hash) which conflicts with the docstring/tests expecting a RuntimeError and a message starting with "{hash_type} check"; change that raise to raise RuntimeError and restore the message prefix to "{hash_type} check of downloaded file failed: URL={url}, filepath={filepath}, expected {hash_type}={hash_val}..." so callers/tests that catch RuntimeError and check str(e).startswith("{hash_type} check") will continue to work; keep the check_hash call and other surrounding logic (functions: check_hash, _download_with_progress, _basename) unchanged.
🧹 Nitpick comments (1)
tests/test_utils.py (1)
195-218: Docstring nit: document the actual class contract.Per the repo's Python guidelines (Google-style docstrings documenting raised exceptions), the class/method docstrings here say
Raises: RuntimeErrorbut post-fix the hash-mismatch path should beValueError. Update once the exception type is settled (see comment on Line 212).As per coding guidelines: "Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_utils.py` around lines 195 - 218, Update the docstring and test to reflect the correct exception contract for download_url: change the "Raises: RuntimeError" entry in the test_download_url docstring to "Raises: ValueError" and update the assertion that expects an exception (self.assertRaises(RuntimeError)) to expect ValueError instead; refer to the TestDownloadUrl class and its test_download_url method and the download_url call to ensure the docstring and assertion match the download_url behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_utils.py`:
- Line 82: The test expectation for the hash-failure message doesn't match the
RuntimeError raised in download_url's existing-file branch
(RuntimeError(f"{hash_type} check of existing file failed: ...")) — update the
test in tests/test_utils.py to assert a common substring present in both
messages (e.g., " check" or include both variants like f"{hash_type} check" and
"hash check") or change the assertion to check for f"{hash_type} check" so it
matches the existing-file error and the downloaded-file error consistently;
reference the download_url function and the RuntimeError message formatting when
making the change.
- Around line 212-218: The test currently expects a RuntimeError but
download_url now raises ValueError on the fresh-download/hash-mismatch path:
update the assertion to use self.assertRaises(ValueError) for the download_url
call (identify the call by download_url(..., filepath=os.path.join(tempdir,
"model_bad.tiff"), hash_val="0"*64, hash_type=SAMPLE_TIFF_HASH_TYPE)).
Additionally, wrap that download_url invocation in a narrow network-exception
guard so CI flakes don’t fail the assertion: catch network-related exceptions
(e.g., requests.exceptions.RequestException / urllib.error.URLError / OSError)
and call or re-use skip_if_downloading_fails() (or otherwise skip the test) for
those cases, but do not catch or mask the ValueError path so the assertRaises
still verifies the hash-error behavior.
---
Outside diff comments:
In `@monai/apps/utils.py`:
- Around line 215-276: The post-download hash-failure currently raises
ValueError (the raise after logger.info and check_hash) which conflicts with the
docstring/tests expecting a RuntimeError and a message starting with
"{hash_type} check"; change that raise to raise RuntimeError and restore the
message prefix to "{hash_type} check of downloaded file failed: URL={url},
filepath={filepath}, expected {hash_type}={hash_val}..." so callers/tests that
catch RuntimeError and check str(e).startswith("{hash_type} check") will
continue to work; keep the check_hash call and other surrounding logic
(functions: check_hash, _download_with_progress, _basename) unchanged.
In `@tests/test_utils.py`:
- Around line 179-187: The except chain currently mis-scopes rt_e and swallows
RuntimeError/OSError; move the logic so that the (RuntimeError, OSError) except
(the block that defines rt_e) either re-raises the original exception when its
message doesn't match DOWNLOAD_FAIL_MSGS (use "raise rt_e from rt_e" or simply
"raise" inside that except to preserve traceback), and keep the ValueError
except separate (handle "hash check" by raising unittest.SkipTest with v_e,
otherwise re-raise the ValueError with "raise" or "raise v_e from v_e"); ensure
rt_e is not referenced inside the except ValueError block and that all re-raises
include proper "from" chaining where applicable.
---
Nitpick comments:
In `@tests/test_utils.py`:
- Around line 195-218: Update the docstring and test to reflect the correct
exception contract for download_url: change the "Raises: RuntimeError" entry in
the test_download_url docstring to "Raises: ValueError" and update the assertion
that expects an exception (self.assertRaises(RuntimeError)) to expect ValueError
instead; refer to the TestDownloadUrl class and its test_download_url method and
the download_url call to ensure the docstring and assertion match the
download_url behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e6a9e321-e574-4979-b315-d9e2ce06b551
📒 Files selected for processing (2)
monai/apps/utils.pytests/test_utils.py
| "network issue", | ||
| "gdown dependency", # gdown not installed | ||
| "md5 check", | ||
| "hash check", # check hash value of downloaded file |
There was a problem hiding this comment.
"hash check" no longer matches the existing-file failure message.
The existing-file branch in download_url raises RuntimeError(f"{hash_type} check of existing file failed: ...") — which does not contain the substring "hash check". Only the new downloaded-file message ("{hash_type} hash check of downloaded file failed") matches. Consider using " check" or keeping both "md5 check" and "hash check", or aligning both error messages to share a common phrase.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_utils.py` at line 82, The test expectation for the hash-failure
message doesn't match the RuntimeError raised in download_url's existing-file
branch (RuntimeError(f"{hash_type} check of existing file failed: ...")) —
update the test in tests/test_utils.py to assert a common substring present in
both messages (e.g., " check" or include both variants like f"{hash_type} check"
and "hash check") or change the assertion to check for f"{hash_type} check" so
it matches the existing-file error and the downloaded-file error consistently;
reference the download_url function and the RuntimeError message formatting when
making the change.
| with self.assertRaises(RuntimeError): | ||
| download_url( | ||
| url=SAMPLE_TIFF, | ||
| filepath=os.path.join(tempdir, "model_bad.tiff"), | ||
| hash_val="0" * 64, | ||
| hash_type=SAMPLE_TIFF_HASH_TYPE, | ||
| ) |
There was a problem hiding this comment.
assertRaises(RuntimeError) will not match — download_url now raises ValueError on this path.
model_bad.tiff does not pre-exist, so this goes through the fresh-download branch, which per the companion change in monai/apps/utils.py now raises ValueError (not RuntimeError). The test will fail deterministically whenever the network is healthy enough to complete the download.
- with self.assertRaises(RuntimeError):
+ with self.assertRaises(ValueError):
download_url(
url=SAMPLE_TIFF,
filepath=os.path.join(tempdir, "model_bad.tiff"),
hash_val="0" * 64,
hash_type=SAMPLE_TIFF_HASH_TYPE,
)Also worth wrapping network exceptions here so CI flakes don't masquerade as assertion failures — but per @ericspod's earlier comment you want the second download to be a real attempt, so only catch non-hash errors (e.g., let skip_if_downloading_fails() handle transients while still asserting the hash path raises).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_utils.py` around lines 212 - 218, The test currently expects a
RuntimeError but download_url now raises ValueError on the
fresh-download/hash-mismatch path: update the assertion to use
self.assertRaises(ValueError) for the download_url call (identify the call by
download_url(..., filepath=os.path.join(tempdir, "model_bad.tiff"),
hash_val="0"*64, hash_type=SAMPLE_TIFF_HASH_TYPE)). Additionally, wrap that
download_url invocation in a narrow network-exception guard so CI flakes don’t
fail the assertion: catch network-related exceptions (e.g.,
requests.exceptions.RequestException / urllib.error.URLError / OSError) and call
or re-use skip_if_downloading_fails() (or otherwise skip the test) for those
cases, but do not catch or mask the ValueError path so the assertRaises still
verifies the hash-error behavior.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
tests/test_utils.py (2)
198-218:⚠️ Potential issue | 🔴 CriticalExpect
ValueErrorfor the bad-hash download.
download_urlnow raisesValueErroron the fresh-download hash-mismatch path, so this assertion and docstring are stale.Proposed fix
- Raises: - RuntimeError: When the downloaded file's hash does not match. + Raises: + AssertionError: If the bad hash does not fail validation. @@ - with self.assertRaises(RuntimeError): + with self.assertRaises(ValueError): download_url( url=SAMPLE_TIFF, filepath=os.path.join(tempdir, "model_bad.tiff"),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_utils.py` around lines 198 - 218, Update the test and docstring to expect ValueError instead of RuntimeError for a hash-mismatch on a fresh download: change the test_download_url docstring "Raises" section and the assertion using self.assertRaises(RuntimeError) to self.assertRaises(ValueError), referencing the download_url function and the test_download_url test that triggers the bad-hash path.
78-87:⚠️ Potential issue | 🟡 MinorKeep a sentinel for existing-file hash failures too.
"hash check"matches downloaded-file failures, but not the existing-file message pattern likesha256 check of existing file failed, so cached hash failures may no longer be skipped.Proposed fix
- "hash check", # check hash value of downloaded file + "hash check", # check hash value of downloaded file + "check of existing file failed", # check hash value of existing file🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_utils.py` around lines 78 - 87, The DOWNLOAD_FAIL_MSGS tuple used to detect download/cache failures misses the cached-file hash-failure pattern, so add an additional sentinel string that matches messages like "sha256 check of existing file failed" (or a broader phrase such as "check of existing file failed" or "sha256 check") to DOWNLOAD_FAIL_MSGS so cached hash failures are recognized and skipped; update the DOWNLOAD_FAIL_MSGS definition (the DOWNLOAD_FAIL_MSGS symbol) to include that sentinel alongside the existing "hash check" entry.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_utils.py`:
- Around line 185-187: The except block in tests/test_utils.py catches
ValueError as v_e and only raises SkipTest when the message contains "hash
check", but swallows all other ValueError instances; update the except
ValueError handler (the branch handling v_e) so that if "hash check" is not in
str(v_e) you re-raise the original ValueError (raise v_e) instead of letting it
pass silently—keep the existing SkipTest behavior when the message matches.
---
Duplicate comments:
In `@tests/test_utils.py`:
- Around line 198-218: Update the test and docstring to expect ValueError
instead of RuntimeError for a hash-mismatch on a fresh download: change the
test_download_url docstring "Raises" section and the assertion using
self.assertRaises(RuntimeError) to self.assertRaises(ValueError), referencing
the download_url function and the test_download_url test that triggers the
bad-hash path.
- Around line 78-87: The DOWNLOAD_FAIL_MSGS tuple used to detect download/cache
failures misses the cached-file hash-failure pattern, so add an additional
sentinel string that matches messages like "sha256 check of existing file
failed" (or a broader phrase such as "check of existing file failed" or "sha256
check") to DOWNLOAD_FAIL_MSGS so cached hash failures are recognized and
skipped; update the DOWNLOAD_FAIL_MSGS definition (the DOWNLOAD_FAIL_MSGS
symbol) to include that sentinel alongside the existing "hash check" entry.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e8f5b280-b860-48e2-8ba3-4aef3a1f0fe9
📒 Files selected for processing (1)
tests/test_utils.py
| except ValueError as v_e: | ||
| if "hash check" in str(v_e): | ||
| raise unittest.SkipTest(f"Hash value error while downloading: {v_e}") from v_e |
There was a problem hiding this comment.
Re-raise unrelated ValueErrors.
A non-hash ValueError currently falls through and gets swallowed, which can make failing tests pass.
Proposed fix
except ValueError as v_e:
if "hash check" in str(v_e):
raise unittest.SkipTest(f"Hash value error while downloading: {v_e}") from v_e
+ raise🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_utils.py` around lines 185 - 187, The except block in
tests/test_utils.py catches ValueError as v_e and only raises SkipTest when the
message contains "hash check", but swallows all other ValueError instances;
update the except ValueError handler (the branch handling v_e) so that if "hash
check" is not in str(v_e) you re-raise the original ValueError (raise v_e)
instead of letting it pass silently—keep the existing SkipTest behavior when the
message matches.
Fixes #8510
Description
Types of changes
./runtests.sh -f -u --net --coverage../runtests.sh --quick --unittests --disttests.