Improve hash functions: FNV-1a for strings, MurmurHash3 for ireps#8976
Open
tautschnig wants to merge 1 commit into
Open
Improve hash functions: FNV-1a for strings, MurmurHash3 for ireps#8976tautschnig wants to merge 1 commit into
tautschnig wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR improves hash distribution and reduces collisions by switching string hashing to FNV-1a with a MurmurHash3-style finalization mix, and enabling MurmurHash3 for irep hash combining.
Changes:
- Replace the existing djb2-variant string hash with FNV-1a and add
fmix64finalization. - Enable
IREP_HASH_MURMURHASH3instead of the BASIC combiner. - Add comments describing the rationale for improved avalanche properties.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/util/string_hash.cpp | Updates string hashing algorithm and introduces MurmurHash3 finalization mix. |
| src/util/irep_hash.h | Switches irep hashing configuration to MurmurHash3. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #8976 +/- ##
========================================
Coverage 80.57% 80.57%
========================================
Files 1708 1709 +1
Lines 189217 189267 +50
Branches 73 73
========================================
+ Hits 152460 152502 +42
- Misses 36757 36765 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Replace the djb2-variant string hash (h = h*31 + c) with FNV-1a which has better avalanche properties for strings with common prefixes (typical of CBMC's SSA-renamed symbols). Add a MurmurHash3 finalization mix for additional distribution quality. Pick the FNV-1a variant width to match `std::size_t` so that 32-bit builds use 32-bit constants and arithmetic (FNV-1a-32 prime 0x01000193, offset basis 0x811c9dc5, finalised with Murmur fmix32), avoiding libgcc 64-bit multiplication helpers; 64-bit builds use the wider FNV-1a-64 path with Murmur fmix64. Switch the irep hash combiner from BASIC (rotate+xor) to MurmurHash3 which was already implemented in irep_hash.h but not enabled. MurmurHash3 provides better mixing of sub-expression hashes; the existing 32-bit and 64-bit specialisations of murmurhash3_hash_combine / _finalize already cover both build configurations. Add preprocessor checks that enforce exactly one of IREP_HASH_BASIC, IREP_HASH_MURMURHASH2A, IREP_HASH_MURMURHASH3 is defined, so that future contributors do not accidentally enable multiple strategies (or none) by editing the comment toggles. Both changes reduce hash collisions in the dstring table (36% -> ~30%) and the prop_conv expression cache, improving hash table lookup performance throughout the pipeline. Impact on Collections-C monolithic: 51s -> 47s (8% faster) Add unit tests in unit/util/string_hash.cpp covering determinism, avalanche on minimally-different inputs, prefix-distinguishing via length-mixing, and a sanity-level distribution check on a batch of CBMC-style SSA-renamed names. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
0706180 to
52a2b89
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Replace the djb2-variant string hash (h = h*31 + c) with FNV-1a which has better avalanche properties for strings with common prefixes (typical of CBMC's SSA-renamed symbols). Add a MurmurHash3 finalization mix for additional distribution quality.
Switch the irep hash combiner from BASIC (rotate+xor) to MurmurHash3 which was already implemented in irep_hash.h but not enabled. MurmurHash3 provides better mixing of sub-expression hashes.
Both changes reduce hash collisions in the dstring table (36% -> ~30%) and the prop_conv expression cache, improving hash table lookup performance throughout the pipeline.
Impact on Collections-C monolithic: 51s -> 47s (8% faster)