Hi Dirk, sorry if this falls short of the effort you expect for an issue. I just wanted to flag something that was reported to collapse (#648) and that affects both my hash functions written in C and your hash functions:
library(collapse)
#> Warning: package 'collapse' was built under R version 4.3.3
#> collapse 2.0.17, see ?`collapse-package` or ?`collapse-documentation`
x = round(rnorm(100))
unique(x) # R
#> [1] 1 0 2 -1 -2
funique(x) # My hash function in C
#> [1] 1 0 0 2 -1 -2
funique(x, sort = TRUE) # Rcpp::sugar::sort_unique()
#> [1] -2 -1 0 0 1 2
# More explicit proof
collapse:::sortuniqueCpp(x)
#> [1] -2 -1 0 0 1 2
# The solution
y = x + 0L
funique(y)
#> [1] 1 0 2 -1 -2
collapse:::sortuniqueCpp(y)
#> [1] -2 -1 0 1 2
Created on 2024-10-31 with reprex v2.0.2
In words: R functions like round() can return negative zero (-0) as well as positive zero. The two compare equal, but their hashes differ, so they show up as separate unique values. A fairly efficient remedy is to add an integer zero, as in x + 0L above (this makes my very efficient C hash roughly 3% slower). I'm considering rolling this out, but of course I cannot control your code, so I'm just passing it along as food for thought.
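To make the mechanism concrete, here is a minimal C sketch of why the two zeros land in different hash buckets and why adding zero fixes it. It is not the code from collapse or Rcpp: `double_bits`, `normalize_zero` and `hash_double` are hypothetical names, the hash is a toy, and the sketch assumes IEEE 754 doubles with the default rounding mode and no `-ffast-math`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the raw 64-bit pattern of a double, which is
   what a hash over the bytes of the value actually sees. */
static uint64_t double_bits(double x) {
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Hypothetical normalisation: under round-to-nearest, -0.0 + 0.0 is
   +0.0, while finite non-zero values and infinities are unchanged. */
static double normalize_zero(double x) {
    return x + 0.0;
}

/* Toy hash for illustration only (assumption: any hash that works on
   the bit pattern behaves the same way for the two zeros). */
static uint32_t hash_double(double x, int normalize) {
    uint64_t u = double_bits(normalize ? normalize_zero(x) : x);
    return (uint32_t)((u ^ (u >> 32)) * 2654435761u);
}
```

With these definitions, `0.0 == -0.0` is true, yet `hash_double(0.0, 0)` and `hash_double(-0.0, 0)` differ because the sign bit is set in one of them. Passing `normalize = 1` gives both zeros the same hash, which is what x + 0L does on the R side.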