Skip to content

Improve hash functions: FNV-1a for strings, MurmurHash3 for ireps#8976

Open
tautschnig wants to merge 1 commit into
diffblue:developfrom
tautschnig:change-hash-functions
Open

Improve hash functions: FNV-1a for strings, MurmurHash3 for ireps#8976
tautschnig wants to merge 1 commit into
diffblue:developfrom
tautschnig:change-hash-functions

Conversation

@tautschnig
Copy link
Copy Markdown
Collaborator

Replace the djb2-variant string hash (h = h*31 + c) with FNV-1a which has better avalanche properties for strings with common prefixes (typical of CBMC's SSA-renamed symbols). Add a MurmurHash3 finalization mix for additional distribution quality.

Switch the irep hash combiner from BASIC (rotate+xor) to MurmurHash3 which was already implemented in irep_hash.h but not enabled. MurmurHash3 provides better mixing of sub-expression hashes.

Both changes reduce hash collisions in the dstring table (36% -> ~30%) and the prop_conv expression cache, improving hash table lookup performance throughout the pipeline.

Impact on Collections-C monolithic: 51s -> 47s (8% faster)

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig self-assigned this Apr 27, 2026
Copilot AI review requested due to automatic review settings April 27, 2026 10:43
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves hash distribution and reduces collisions by switching string hashing to FNV-1a with a MurmurHash3-style finalization mix, and enabling MurmurHash3 for irep hash combining.

Changes:

  • Replace the existing djb2-variant string hash with FNV-1a and add fmix64 finalization.
  • Enable IREP_HASH_MURMURHASH3 instead of the BASIC combiner.
  • Add comments describing the rationale for improved avalanche properties.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
src/util/string_hash.cpp Updates string hashing algorithm and introduces MurmurHash3 finalization mix.
src/util/irep_hash.h Switches irep hashing configuration to MurmurHash3.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/util/string_hash.cpp Outdated
Comment thread src/util/string_hash.cpp Outdated
Comment thread src/util/string_hash.cpp Outdated
Comment thread src/util/irep_hash.h
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.57%. Comparing base (df24632) to head (52a2b89).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #8976   +/-   ##
========================================
  Coverage    80.57%   80.57%           
========================================
  Files         1708     1709    +1     
  Lines       189217   189267   +50     
  Branches        73       73           
========================================
+ Hits        152460   152502   +42     
- Misses       36757    36765    +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Replace the djb2-variant string hash (h = h*31 + c) with FNV-1a
which has better avalanche properties for strings with common
prefixes (typical of CBMC's SSA-renamed symbols). Add a MurmurHash3
finalization mix for additional distribution quality.

Pick the FNV-1a variant width to match `std::size_t` so that 32-bit
builds use 32-bit constants and arithmetic (FNV-1a-32 prime
0x01000193, offset basis 0x811c9dc5, finalised with Murmur fmix32),
avoiding libgcc 64-bit multiplication helpers; 64-bit builds use
the wider FNV-1a-64 path with Murmur fmix64.

Switch the irep hash combiner from BASIC (rotate+xor) to MurmurHash3
which was already implemented in irep_hash.h but not enabled.
MurmurHash3 provides better mixing of sub-expression hashes; the
existing 32-bit and 64-bit specialisations of
murmurhash3_hash_combine / _finalize already cover both build
configurations. Add preprocessor checks that enforce exactly one of
IREP_HASH_BASIC, IREP_HASH_MURMURHASH2A, IREP_HASH_MURMURHASH3 is
defined, so that future contributors do not accidentally enable
multiple strategies (or none) by editing the comment toggles.

Both changes reduce hash collisions in the dstring table (36% -> ~30%)
and the prop_conv expression cache, improving hash table lookup
performance throughout the pipeline.

Impact on Collections-C monolithic: 51s -> 47s (8% faster)

Add unit tests in unit/util/string_hash.cpp covering determinism,
avalanche on minimally-different inputs, prefix-distinguishing via
length-mixing, and a sanity-level distribution check on a batch of
CBMC-style SSA-renamed names.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
@tautschnig tautschnig force-pushed the change-hash-functions branch from 0706180 to 52a2b89 Compare May 27, 2026 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants