
block: eliminate byte concat copies in peek() with a memoryview #19

Merged
Eeems merged 5 commits into Eeems:main from kroketio:perf/peek-memoryview
Mar 10, 2026
Conversation

@kroketio
Contributor

@kroketio kroketio commented Mar 10, 2026

When writing a 19 MB file to disk (Volume(...); volume.inode_at("/test_file"); f.write(inode.open().read())), I noticed the file writing was slow, so I profiled it. Most of the time was spent in block.py:peek() (21 seconds):

python3 -m pstats profile.out
Welcome to the profile statistics browser.
profile.out% sort cumulative
profile.out% stats 20
Tue Mar 10 10:21:09 2026    profile.out

         195853 function calls (194557 primitive calls) in 21.915 seconds

   Ordered by: cumulative time
   List reduced from 1057 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     88/1    0.002    0.000   21.915   21.915 {built-in method builtins.exec}
        1    0.000    0.000   21.915   21.915 /home/dsc/foo/bar/test.py:1(<module>)
       42    0.000    0.000   21.848    0.520 /home/dsc/foo/bar/ext4/block.py:93(read)
       42   21.759    0.518   21.848    0.520 /home/dsc/foo/bar/ext4/block.py:105(peek)
     4689    0.008    0.000    0.088    0.000 /home/dsc/foo/bar/ext4/block.py:38(__getitem__)
     4689    0.012    0.000    0.061    0.000 /home/dsc/foo/bar/ext4/extent.py:42(__getitem__)
     98/2    0.000    0.000    0.040    0.020 <frozen importlib._bootstrap>:1165(_find_and_load)
     98/2    0.000    0.000    0.040    0.020 <frozen importlib._bootstrap>:1120(_find_and_load_unlocked)
     89/3    0.000    0.000    0.040    0.013 <frozen importlib._bootstrap>:666(_load_unlocked)
     71/3    0.000    0.000    0.040    0.013 <frozen importlib._bootstrap_external>:934(exec_module)
    207/5    0.000    0.000    0.040    0.008 <frozen importlib._bootstrap>:233(_call_with_frames_removed)
     4731    0.005    0.000    0.027    0.000 /home/dsc/foo/bar/ext4/volume.py:167(read)
        1    0.000    0.000    0.020    0.020 /home/dsc/foo/bar/admp_src/magic.py:1(<module>)
        1    0.000    0.000    0.019    0.019 /home/dsc/foo/bar/ext4/__init__.py:1(<module>)
     9930    0.008    0.000    0.018    0.000 /home/dsc/foo/bar/ext4/extent.py:39(__contains__)
     4815    0.015    0.000    0.015    0.000 {method 'read' of '_io.BufferedReader' objects}
       77    0.013    0.000    0.013    0.000 {method '__exit__' of '_io._IOBase' objects}
        1    0.011    0.011    0.011    0.011 {method 'write' of '_io.BufferedWriter' objects}
        1    0.000    0.000    0.011    0.011 /home/dsc/foo/bar/ext4/volume.py:1(<module>)
  370/364    0.003    0.000    0.009    0.000 {built-in method builtins.__build_class__}
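
For reference, the same kind of report can be produced without the interactive pstats browser. This is a minimal sketch, not code from this PR: slow_concat is a hypothetical stand-in workload and profile_report is a hypothetical helper name. Only cProfile and pstats from the standard library are used.

```python
import cProfile
import io
import pstats


def slow_concat(chunks):
    """Hypothetical stand-in workload: repeated bytes concatenation."""
    data = b""
    for chunk in chunks:
        data += chunk  # each += may copy everything accumulated so far
    return data


def profile_report(func, *args, limit=20):
    """Run func under cProfile; return its result and a text report
    sorted by cumulative time (like 'sort cumulative' + 'stats 20')."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, out.getvalue()


result, report = profile_report(slow_concat, [b"x" * 4096] * 256)
print(report)
```

A function whose tottime is close to its cumtime, as peek() is in the output above, is spending the time in its own body rather than in the functions it calls.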

With this commit, file writing is a lot faster: 100 ms for 50 MB on my machine.
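
To illustrate the idea behind the change: repeated bytes concatenation allocates a new object and copies the accumulated data on every step, while collecting memoryview slices and joining them once copies each byte a single time. The sketch below uses hypothetical function names and is not the code in ext4/block.py. No timings are shown because they vary by machine.

```python
def assemble_concat(blocks, start, size):
    """Quadratic pattern: every += builds a new, larger bytes object."""
    data = b""
    for block in blocks:
        data += block
    return data[start:start + size]


def assemble_views(blocks, start, size):
    """Collect zero-copy memoryview slices, then join them once."""
    parts = []
    remaining = size
    offset = start
    for block in blocks:
        if remaining <= 0:
            break
        if offset >= len(block):
            # The requested range starts after this block; skip it.
            offset -= len(block)
            continue
        view = memoryview(block)[offset:offset + remaining]
        parts.append(view)
        remaining -= len(view)
        offset = 0  # later blocks are read from their beginning
    return b"".join(parts)  # bytes.join accepts any bytes-like objects


blocks = [bytes([i]) * 8 for i in range(4)]
assert assemble_views(blocks, 5, 10) == assemble_concat(blocks, 5, 10)
```

Both functions return the same bytes; only the number of intermediate copies differs.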

Summary by CodeRabbit

  • Refactor

    • More efficient block-data assembly and safer handling of missing/partial blocks, improving memory use and read/peek reliability for large or sparse files.
  • Tests

    • Expanded test coverage and stress scenarios: increased iteration counts, more generated test files and attribute operations, and additional runtime validations for block I/O peek/seek behavior.

@coderabbitai

coderabbitai Bot commented Mar 10, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 24519102-9ab0-4137-8d26-487d2b43fd8a

📥 Commits

Reviewing files that changed from the base of the PR and between 7d8f8b7 and 14b0d0a.

📒 Files selected for processing (2)
  • _test_image.sh
  • test.py
📝 Walkthrough

Walkthrough

Adds a typed constructor and internal _null_block to BlockIOBlocks, changes block lookup to fall back to _null_block, refactors BlockIO.peek to build results from memoryview slices, and extends tests and the test image script to exercise larger ranges and more attribute operations.
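
The fallback described in the walkthrough can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: SparseBlocks and read_range are hypothetical names, blocks are held in a plain dict, and the null block is assumed to be zero-filled (holes in sparse files read as zeros).

```python
class SparseBlocks:
    """Hypothetical stand-in for a block lookup; not BlockIOBlocks itself."""

    def __init__(self, block_size, blocks):
        self.block_size = block_size
        self._blocks = blocks  # dict: block index -> bytes
        # One shared zero-filled block, allocated once and reused for holes.
        self._null_block = bytes(block_size)

    def __getitem__(self, index):
        # Missing blocks fall back to the null block instead of raising.
        return self._blocks.get(index, self._null_block)


def read_range(blocks, first, count):
    """Join `count` consecutive blocks starting at block index `first`."""
    return b"".join(memoryview(blocks[i]) for i in range(first, first + count))


blocks = SparseBlocks(4, {0: b"abcd", 2: b"wxyz"})
assert read_range(blocks, 0, 3) == b"abcd\x00\x00\x00\x00wxyz"
```

Reusing a single null block means a large hole costs one small allocation rather than one per missing block.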

Changes

| Cohort / File(s) | Summary |
|---|---|
| BlockIO memoryview refactor: ext4/block.py | Adds def __init__(self, blockio: "BlockIO") and self.blockio: BlockIO, introduces internal _null_block, updates __getitem__ to return _null_block when missing, and refactors BlockIO.peek/read logic to accumulate memoryview slices and concatenate them into final bytes. Review attention: correctness of _null_block semantics, memoryview slicing/concatenation, offset handling, and potential edge-case performance. |
| Expanded test script: _test_image.sh | Increases initial loop range from 1..100 to 1..1000 and adds a new loop creating 100 test$i.txt files with repeated setfattr calls, increasing IO and attr operations to exercise filesystem behaviors. Review attention: test runtime, disk usage, and idempotency. |
| Test assertions & seeks: test.py | Adds runtime checks for BlockIO behavior: asserts b.peek(0) is falsy, computes size = volume.block_size + 1, asserts len(b.peek(size)) == size, and adds a seek to volume.block_size - 5. Review attention: correctness of new assertions, seek semantics relative to block boundaries. |
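
The checks listed for test.py can be mirrored on a toy object. This is a hedged sketch only: ToyBlockIO is a hypothetical class written for this example, and its peek and seek signatures are assumptions rather than the library's API. It shows a zero-length peek, a peek that crosses a block boundary, and a seek to just before a boundary.

```python
class ToyBlockIO:
    """Hypothetical in-memory reader with fixed-size blocks."""

    def __init__(self, data, block_size):
        self.data = data
        self.block_size = block_size
        self.cursor = 0

    def seek(self, offset):
        self.cursor = offset
        return self.cursor

    def peek(self, size):
        """Return up to `size` bytes at the cursor without moving it."""
        if size <= 0:
            return b""
        end = min(self.cursor + size, len(self.data))
        view = memoryview(self.data)
        parts = []
        position = self.cursor
        while position < end:
            # Stop each slice at the next block boundary or at `end`.
            boundary = (position // self.block_size + 1) * self.block_size
            stop = min(boundary, end)
            parts.append(view[position:stop])
            position = stop
        return b"".join(parts)


block_size = 16
b = ToyBlockIO(bytes(range(64)), block_size)
assert not b.peek(0)                      # zero-length peek is falsy
size = block_size + 1
assert len(b.peek(size)) == size          # spans two blocks
b.seek(block_size - 5)
assert b.peek(10) == bytes(range(11, 21)) # crosses the first boundary
assert b.cursor == block_size - 5         # peek did not advance the cursor
```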

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Fix #14 #16 — also modifies ext4/block.py's BlockIO.peek; related work on block accumulation and EOF/clamping behavior.

Poem

🐰 I hopped on blocks and nudged a null,
Collected views where bytes were full,
Sliced the start and joined the rest,
Overflow tested, attributes pressed—
A tiny rabbit's file-system lullaby. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: replacing byte concatenation copies in peek() with a memoryview-based approach for performance improvement. |


@kroketio kroketio force-pushed the perf/peek-memoryview branch from a6f1b47 to b631cd4 Compare March 10, 2026 10:18
@kroketio kroketio changed the title ext4/block: eliminate byte concat copies in peek() with a memoryview block: eliminate byte concat copies in peek() with a memoryview Mar 10, 2026

@Eeems
Owner

Eeems commented Mar 10, 2026

When writing a 19 MB file to disk (Volume(...); volume.inode_at("/test_file"); f.write(inode.open().read())), I noticed the file writing was slow, so I profiled it. Most of the time was spent in block.py:peek() (21 seconds)

I assume you mean file reading was slow, unless you are working on something rather interesting?

@Eeems Eeems force-pushed the perf/peek-memoryview branch from 7d8f8b7 to 14b0d0a Compare March 10, 2026 18:03
@Eeems Eeems merged commit 797cd94 into Eeems:main Mar 10, 2026
1 check passed
@kroketio
Contributor Author

When writing a 19 MB file to disk (Volume(...); volume.inode_at("/test_file"); f.write(inode.open().read())), I noticed the file writing was slow, so I profiled it. Most of the time was spent in block.py:peek() (21 seconds)

I assume you mean file reading was slow, unless you are working on something rather interesting?

Sorry, yes, reading. I'm working on something boring: https://git.maemo.org/sanderfoobar/admp

cheers!

@Eeems
Owner

Eeems commented Mar 12, 2026

... working on something boring; https://git.maemo.org/sanderfoobar/admp

cheers!

@kroketio You should include the license files in the vendored dependency folders to be compliant with the MIT license terms for each of them. Interesting project though!

@kroketio
Contributor Author

@Eeems done, thanks.


2 participants