Skip to content

Validate downloaded file integrity and raise ValueError on hash mismatch#8833

Open
e-mny wants to merge 11 commits intoProject-MONAI:devfrom
e-mny:8832-hashvalueerror
Open

Validate downloaded file integrity and raise ValueError on hash mismatch#8833
e-mny wants to merge 11 commits intoProject-MONAI:devfrom
e-mny:8832-hashvalueerror

Conversation

@e-mny
Copy link
Copy Markdown

@e-mny e-mny commented Apr 18, 2026

Fixes #8832 .

Description

Adds a check on downloaded files to verify their hash against the expected value and raises a ValueError if there is a mismatch, indicating possible corruption or tampering.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality)..
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.

e-mny and others added 8 commits April 17, 2026 13:04
DCO Remediation Commit for Enoch Mok <enochmokny@gmail.com>

I, Enoch Mok <enochmokny@gmail.com>, hereby add my Signed-off-by to this commit: d02da49

Signed-off-by: Enoch Mok <enochmokny@gmail.com>
Signed-off-by: Enoch Mok <enochmokny@gmail.com>
…tch; used ./runtests.sh autofix

Signed-off-by: Enoch Mok <enochmokny@gmail.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 18, 2026

📝 Walkthrough

Walkthrough

monai/apps/utils.py: changed hash-failure exceptions from RuntimeError to ValueError for both newly downloaded and existing files; moved post-download hash validation to validate the temporary download file before renaming and removed the later final-file check; expanded failure message to include URL, expected hash, and corruption/tamper guidance. tests/test_utils.py: updated download-failure sentinel to "hash check", added ValueError handling in skip_if_downloading_fails(), introduced SAMPLE_TIFF, SAMPLE_TIFF_HASH, SAMPLE_TIFF_HASH_TYPE, and added TestDownloadUrl covering success and hash-mismatch failure. tests/apps/test_download_and_extract.py: accept ValueError/"hash check" for bad-hash assertions. tests/testing_data/data_config.json: updated all artifact URLs to Hugging Face.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Title accurately summarizes the main change: replacing RuntimeError with ValueError for hash mismatches on downloaded files.
Description check ✅ Passed Description covers required sections but has formatting issues (double period after 'functionality..') and missing details on new test additions.
Linked Issues check ✅ Passed PR successfully addresses #8832: raises ValueError instead of RuntimeError on hash mismatch, making it visible and not suppressed by skip_if_download_fails.
Out of Scope Changes check ✅ Passed All changes are in-scope: exception type change, error messaging improvements, test additions, and test data URL migration are all necessary for implementing hash validation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
monai/apps/utils.py (2)

263-276: ⚠️ Potential issue | 🟠 Major

Validate the temporary file before caching it.

The file is moved to filepath before the hash check, so a mismatch leaves the corrupted/tampered file in the cache and later retries hit the bad existing file.

Proposed fix
-            file_dir = filepath.parent
-            if file_dir:
-                os.makedirs(file_dir, exist_ok=True)
-            shutil.move(f"{tmp_name}", f"{filepath}")  # copy the downloaded to a user-specified cache.
+            if not check_hash(tmp_name, hash_val, hash_type):
+                raise ValueError(
+                    f"{hash_type} hash check of downloaded file failed: URL={url}, "
+                    f"filepath={filepath}, expected {hash_type}={hash_val}, "
+                    f"The file may be corrupted or tampered with. "
+                    "Please retry the download or verify the source."
+                )
+            file_dir = filepath.parent
+            if file_dir:
+                os.makedirs(file_dir, exist_ok=True)
+            shutil.move(f"{tmp_name}", f"{filepath}")  # copy the downloaded to a user-specified cache.
     except (PermissionError, NotADirectoryError):  # project-monai/monai issue `#3613` `#3757` for windows
         pass
     logger.info(f"Downloaded: {filepath}")
-    if not check_hash(filepath, hash_val, hash_type):
-        raise ValueError(
-            f"{hash_type} hash check of downloaded file failed: URL={url}, "
-            f"filepath={filepath}, expected {hash_type}={hash_val}, "
-            f"The file may be corrupted or tampered with. "
-            "Please retry the download or verify the source."
-        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/apps/utils.py` around lines 263 - 276, The code moves the downloaded
temporary file (tmp_name) to the final cache path (filepath) before calling
check_hash, which can leave a corrupted file cached; instead, run check_hash
against the temporary file (use check_hash(tmp_name, hash_val, hash_type)), only
create the target directory and shutil.move(tmp_name, filepath) after the hash
passes, and if the hash fails delete the temporary file and raise the ValueError
(preserve the same error message). Update the logic around tmp_name, filepath
and the check_hash call accordingly in the same function where tmp_name and
filepath are used so that corrupted downloads are never moved into the cache.

215-224: ⚠️ Potential issue | 🟡 Minor

Document the new ValueError contract.

Line 223 still documents RuntimeError, but downloaded hash mismatches now raise ValueError.

Proposed fix
-        RuntimeError: When the hash validation of the ``url`` downloaded file fails.
+        ValueError: When the hash validation of the ``url`` downloaded file fails.

As per coding guidelines, "Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/apps/utils.py` around lines 215 - 224, Update the docstring Raises
section for the download routine in monai/apps/utils.py so it documents the new
ValueError contract: replace the RuntimeError entry that states "When the hash
validation of the ``url`` downloaded file fails." (and any other RuntimeError
entry referring to downloaded hash mismatch) with ValueError, and ensure the
entry that mentions the existing ``filepath`` hash validation also reflects the
correct exception type if it changed; keep other urllib-related exceptions
(URLError, HTTPError, ContentTooShortError, IOError) as-is and keep messaging
consistent with the parameters ``filepath`` and ``url``.
tests/test_utils.py (1)

78-87: ⚠️ Potential issue | 🔴 Critical

Do not skip or swallow ValueError hash mismatches.

This defeats the new ValueError behavior, and non-matching ValueErrors fall through silently because the branch does not re-raise.

Proposed fix
 DOWNLOAD_FAIL_MSGS = (
     "unexpected EOF",  # incomplete download
     "network issue",
     "gdown dependency",  # gdown not installed
-    "hash check",  # check hash value of downloaded file
     "limit",  # HTTP Error 503: Egress is over the account limit
     "authenticate",
     "timed out",  # urlopen error [Errno 110] Connection timed out
     "HTTPError",  # HTTPError: 429 Client Error: Too Many Requests for huggingface hub
 )
@@
-    except ValueError as v_e:
-        if "hash check" in str(v_e):
-            raise unittest.SkipTest(f"Hash value error while downloading: {v_e}") from v_e

Also applies to: 179-187

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_utils.py` around lines 78 - 87, The code currently swallows
ValueError exceptions when matching messages in DOWNLOAD_FAIL_MSGS; change the
exception handling where DOWNLOAD_FAIL_MSGS is consulted so that if the caught
exception is a ValueError it is re-raised immediately (do not treat it as a
benign download failure), while preserving the existing fallback logic for other
exception types; update both places that check DOWNLOAD_FAIL_MSGS (the exception
handlers that inspect exception messages around the DOWNLOAD_FAIL_MSGS use) to
first if isinstance(e, ValueError): raise before matching messages.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_utils.py`:
- Around line 198-218: The test asserts the wrong exception type: update
test_download_url to expect ValueError (not RuntimeError) when download_url
detects a hash mismatch, and also update the test's docstring "Raises:" section
to list ValueError instead of RuntimeError; locate the assertion block that
calls download_url with a bad hash and change the exception class accordingly
and adjust the docstring text to reflect ValueError as the raised exception from
download_url.

---

Outside diff comments:
In `@monai/apps/utils.py`:
- Around line 263-276: The code moves the downloaded temporary file (tmp_name)
to the final cache path (filepath) before calling check_hash, which can leave a
corrupted file cached; instead, run check_hash against the temporary file (use
check_hash(tmp_name, hash_val, hash_type)), only create the target directory and
shutil.move(tmp_name, filepath) after the hash passes, and if the hash fails
delete the temporary file and raise the ValueError (preserve the same error
message). Update the logic around tmp_name, filepath and the check_hash call
accordingly in the same function where tmp_name and filepath are used so that
corrupted downloads are never moved into the cache.
- Around line 215-224: Update the docstring Raises section for the download
routine in monai/apps/utils.py so it documents the new ValueError contract:
replace the RuntimeError entry that states "When the hash validation of the
``url`` downloaded file fails." (and any other RuntimeError entry referring to
downloaded hash mismatch) with ValueError, and ensure the entry that mentions
the existing ``filepath`` hash validation also reflects the correct exception
type if it changed; keep other urllib-related exceptions (URLError, HTTPError,
ContentTooShortError, IOError) as-is and keep messaging consistent with the
parameters ``filepath`` and ``url``.

In `@tests/test_utils.py`:
- Around line 78-87: The code currently swallows ValueError exceptions when
matching messages in DOWNLOAD_FAIL_MSGS; change the exception handling where
DOWNLOAD_FAIL_MSGS is consulted so that if the caught exception is a ValueError
it is re-raised immediately (do not treat it as a benign download failure),
while preserving the existing fallback logic for other exception types; update
both places that check DOWNLOAD_FAIL_MSGS (the exception handlers that inspect
exception messages around the DOWNLOAD_FAIL_MSGS use) to first if isinstance(e,
ValueError): raise before matching messages.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9b3e7460-f1d0-4896-8762-66d40df2812f

📥 Commits

Reviewing files that changed from the base of the PR and between 851054c and 51a49b0.

📒 Files selected for processing (3)
  • monai/apps/utils.py
  • tests/test_utils.py
  • tests/testing_data/data_config.json

Comment thread tests/test_utils.py
Signed-off-by: Enoch Mok <enochmokny@gmail.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
monai/apps/utils.py (1)

258-272: ⚠️ Potential issue | 🔴 Critical

🐛 Critical: hash check runs against the wrong path — always fails for new downloads.

At line 262 you call check_hash(filepath, ...), but the downloaded bytes live at tmp_name. filepath isn't created until the shutil.move on line 272. check_hash will hit its open(...) except branch, log an error, and return False, so every successfully-downloaded file will raise ValueError("... hash check of downloaded file failed ...") regardless of whether the hash actually matches.

The reason the new TestDownloadUrl.test_download_url "passes" today is that skip_if_downloading_fails() converts this ValueError into a SkipTest — the success path isn't actually being exercised.

🔧 Proposed fix
             if not tmp_name.exists():
                 raise RuntimeError(
                     f"Download of file from {url} to {filepath} failed due to network issue or denied permission."
                 )
-            if not check_hash(filepath, hash_val, hash_type):
+            if not check_hash(tmp_name, hash_val, hash_type):
                 raise ValueError(
                     f"{hash_type} hash check of downloaded file failed: URL={url}, "
                     f"filepath={filepath}, expected {hash_type}={hash_val}, "
                     f"The file may be corrupted or tampered with. "
                     "Please retry the download or verify the source."
                 )
             file_dir = filepath.parent
             if file_dir:
                 os.makedirs(file_dir, exist_ok=True)
             shutil.move(f"{tmp_name}", f"{filepath}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/apps/utils.py` around lines 258 - 272, The hash check is being done
against filepath which doesn't exist yet; change the code to run check_hash on
the temporary download path tmp_name (i.e., call check_hash(tmp_name, hash_val,
hash_type)) before creating/moving to filepath, then ensure the destination
directory (file_dir) is created and finally shutil.move(tmp_name, filepath);
update any related error messages if needed to reference the tmp_name when
reporting the downloaded file being validated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_utils.py`:
- Around line 212-218: The failing test is network-fragile because the second
download_url call expecting a ValueError can instead raise network errors; wrap
the bad-hash scenario in the same network-skip guard or separate network from
hash checking: first call download_url under skip_if_downloading_fails() with
the correct hash to ensure the file is present, then call the pure hash checker
(e.g., check_hash or whatever validation routine used by download_url) against
that file to assert it raises ValueError for a bad hash, or re-enter
skip_if_downloading_fails() around the bad-hash download and convert network
exceptions to skips while asserting that a returned/created file triggers the
ValueError path. Ensure you reference download_url and skip_if_downloading_fails
(and the module's hash validation function) when making the change.
- Around line 185-187: The except ValueError handler in tests/test_utils.py
currently only raises SkipTest when the exception message contains "hash check"
and otherwise swallows the ValueError; modify the except ValueError as v_e block
to re-raise v_e when the message does not indicate a hash-check failure (i.e.,
keep the unittest.SkipTest(f"Hash value error while downloading: {v_e}") from
v_e branch for "hash check" and add a plain raise v_e for all other cases) so
genuine ValueErrors from the test body are propagated.

---

Outside diff comments:
In `@monai/apps/utils.py`:
- Around line 258-272: The hash check is being done against filepath which
doesn't exist yet; change the code to run check_hash on the temporary download
path tmp_name (i.e., call check_hash(tmp_name, hash_val, hash_type)) before
creating/moving to filepath, then ensure the destination directory (file_dir) is
created and finally shutil.move(tmp_name, filepath); update any related error
messages if needed to reference the tmp_name when reporting the downloaded file
being validated.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b3be07b5-74d1-45d2-abef-997ad1617025

📥 Commits

Reviewing files that changed from the base of the PR and between 51a49b0 and 98ccecc.

📒 Files selected for processing (2)
  • monai/apps/utils.py
  • tests/test_utils.py

Comment thread tests/test_utils.py
Comment thread tests/test_utils.py Outdated
Signed-off-by: Enoch Mok <enochmokny@gmail.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
monai/apps/utils.py (1)

215-223: ⚠️ Potential issue | 🟡 Minor

Update the stale raised-exception docs.

Line 216 still says existing-file hash validation raises RuntimeError, but Line 231 now raises ValueError.

Proposed fix
     Raises:
-        RuntimeError: When the hash validation of the ``filepath`` existing file fails.
+        ValueError: When the hash validation of the ``filepath`` existing file fails.
         RuntimeError: When a network issue or denied permission prevents the
             file download from ``url`` to ``filepath``.

As per coding guidelines, "Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/apps/utils.py` around lines 215 - 223, The docstring's Raises section
is inconsistent: it still claims a RuntimeError is raised for an existing-file
hash mismatch (refers to ``filepath``) while the implementation now raises
ValueError for the downloaded-URL hash check (refers to ``url``). Edit the
Raises block in the function's Google-style docstring (the one mentioning
filepath and url) so the exception types and descriptions match the current
behavior: replace the stale RuntimeError mention with the correct exception
type(s) and ensure the entries for the filepath/hash check and the
url/download/hash check accurately state ValueError (or RuntimeError only where
actually raised), and keep other URL/HTTP/IO exceptions unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@monai/apps/utils.py`:
- Around line 231-233: Tests expecting a RuntimeError for hash-mismatch should
be updated to expect ValueError and the revised message text; change the
assertions in tests/apps/test_download_and_extract.py (the block that calls
download_url(... wrong_md5) and the other occurrence around the same check) to
catch ValueError instead of RuntimeError and assert the message contains "md5
hash check" (or more generically "{hash_type} hash check") to match the new
raise in monai/apps/utils.py (the ValueError raised in the hash check path).

In `@tests/test_utils.py`:
- Around line 213-215: The test currently calls
download_url(filepath=model_path, hash_val="0"*64,
hash_type=SAMPLE_TIFF_HASH_TYPE) without the required url, causing a TypeError
before the intended hash check; update the test to pass a valid url (e.g., the
same model_path or a known test URL) as the url argument when calling
download_url so the function runs to the hash validation and raises ValueError
as expected (adjust the call in tests/test_utils.py where download_url is
invoked inside the with self.assertRaises(ValueError) block).

---

Outside diff comments:
In `@monai/apps/utils.py`:
- Around line 215-223: The docstring's Raises section is inconsistent: it still
claims a RuntimeError is raised for an existing-file hash mismatch (refers to
``filepath``) while the implementation now raises ValueError for the
downloaded-URL hash check (refers to ``url``). Edit the Raises block in the
function's Google-style docstring (the one mentioning filepath and url) so the
exception types and descriptions match the current behavior: replace the stale
RuntimeError mention with the correct exception type(s) and ensure the entries
for the filepath/hash check and the url/download/hash check accurately state
ValueError (or RuntimeError only where actually raised), and keep other
URL/HTTP/IO exceptions unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a321ffa5-076e-4483-8541-5bec4e0da138

📥 Commits

Reviewing files that changed from the base of the PR and between 98ccecc and 29ec2fe.

📒 Files selected for processing (2)
  • monai/apps/utils.py
  • tests/test_utils.py

Comment thread monai/apps/utils.py
Comment thread tests/test_utils.py Outdated
Signed-off-by: Enoch Mok <enochmokny@gmail.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_utils.py (1)

78-83: ⚠️ Potential issue | 🟠 Major

Do not suppress hash-integrity failures here.

This reintroduces the behavior the PR is meant to avoid: hash mismatches become SkipTest instead of visible failures under skip_if_downloading_fails().

Proposed fix
 DOWNLOAD_FAIL_MSGS = (
     "unexpected EOF",  # incomplete download
     "network issue",
     "gdown dependency",  # gdown not installed
-    "hash check",  # check hash value of downloaded file
     "limit",  # HTTP Error 503: Egress is over the account limit
     "authenticate",
     "timed out",  # urlopen error [Errno 110] Connection timed out
     "HTTPError",  # HTTPError: 429 Client Error: Too Many Requests for huggingface hub
 )
@@
-    except ValueError as v_e:
-        if "hash check" in str(v_e):
-            raise unittest.SkipTest(f"Hash value error while downloading: {v_e}") from v_e
-
-        raise v_e
-
-

Also applies to: 179-189

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_utils.py` around lines 78 - 83, The DOWNLOAD_FAIL_MSGS list used
by skip_if_downloading_fails() is incorrectly treating hash-integrity failures
as transient download failures; remove the "hash check" entry from
DOWNLOAD_FAIL_MSGS so hash mismatches raise test failures rather than being
converted to SkipTest, and make the same change to the other identical
failure-message tuple used later in the file (the second
DOWNLOAD_FAIL_MSGS-style list referenced by skip_if_downloading_fails()). Ensure
skip_if_downloading_fails() continues to match only genuine
download/network/gdown/limit messages and not integrity/hash errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_utils.py`:
- Around line 197-217: The test reuses model_path so download_url sees the
existing file and skips re-downloading, preventing the hash-mismatch path from
being exercised; update TestDownloadUrl.test_download_url to either delete
model_path (os.remove(model_path)) before calling download_url for the mismatch
case or use a new filepath (e.g., model_path_mismatch) and wrap that mismatch
download call with skip_if_downloading_fails() so the test forces a fresh
download and still skips in quick/network-failing environments; reference
download_url, model_path, TestDownloadUrl.test_download_url, and
skip_if_downloading_fails when making the change.

---

Outside diff comments:
In `@tests/test_utils.py`:
- Around line 78-83: The DOWNLOAD_FAIL_MSGS list used by
skip_if_downloading_fails() is incorrectly treating hash-integrity failures as
transient download failures; remove the "hash check" entry from
DOWNLOAD_FAIL_MSGS so hash mismatches raise test failures rather than being
converted to SkipTest, and make the same change to the other identical
failure-message tuple used later in the file (the second
DOWNLOAD_FAIL_MSGS-style list referenced by skip_if_downloading_fails()). Ensure
skip_if_downloading_fails() continues to match only genuine
download/network/gdown/limit messages and not integrity/hash errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7b5d0b2c-e70c-4e3d-bd08-86cee7116a1d

📥 Commits

Reviewing files that changed from the base of the PR and between 29ec2fe and 606e7ca.

📒 Files selected for processing (2)
  • tests/apps/test_download_and_extract.py
  • tests/test_utils.py

Comment thread tests/test_utils.py
Comment on lines +197 to +217
class TestDownloadUrl(unittest.TestCase):
"""Exercise ``download_url`` success and hash-mismatch paths."""

def test_download_url(self):
"""
Download a sample TIFF and validate hash handling.
"""
with tempfile.TemporaryDirectory() as tempdir:
model_path = os.path.join(tempdir, "model.tiff")

with skip_if_downloading_fails():
download_url(
url=SAMPLE_TIFF, filepath=model_path, hash_val=SAMPLE_TIFF_HASH, hash_type=SAMPLE_TIFF_HASH_TYPE
)

# Verify file exists before testing hash mismatch
if not os.path.exists(model_path):
self.skipTest("File was not downloaded successfully")

with self.assertRaises(ValueError):
download_url(url=SAMPLE_TIFF, filepath=model_path, hash_val="0" * 64, hash_type=SAMPLE_TIFF_HASH_TYPE)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Exercise the downloaded-file mismatch path and skip in quick mode.

Line 217 reuses model_path, so download_url() validates the existing file and never reaches the new temp-download hash check. This also adds a network test to quick discovery.

Proposed fix
+@unittest.skipIf(os.environ.get(quick_test_var, "").lower() == "true", "Skipping slow tests")
 class TestDownloadUrl(unittest.TestCase):
     """Exercise ``download_url`` success and hash-mismatch paths."""
@@
-            with self.assertRaises(ValueError):
-                download_url(url=SAMPLE_TIFF, filepath=model_path, hash_val="0" * 64, hash_type=SAMPLE_TIFF_HASH_TYPE)
+            bad_model_path = os.path.join(tempdir, "model_bad.tiff")
+            with self.assertRaisesRegex(ValueError, "hash check"):
+                download_url(
+                    url=Path(model_path).resolve().as_uri(),
+                    filepath=bad_model_path,
+                    hash_val="0" * 64,
+                    hash_type=SAMPLE_TIFF_HASH_TYPE,
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_utils.py` around lines 197 - 217, The test reuses model_path so
download_url sees the existing file and skips re-downloading, preventing the
hash-mismatch path from being exercised; update
TestDownloadUrl.test_download_url to either delete model_path
(os.remove(model_path)) before calling download_url for the mismatch case or use
a new filepath (e.g., model_path_mismatch) and wrap that mismatch download call
with skip_if_downloading_fails() so the test forces a fresh download and still
skips in quick/network-failing environments; reference download_url, model_path,
TestDownloadUrl.test_download_url, and skip_if_downloading_fails when making the
change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Raise different exception when hash value of download is wrong

1 participant