Cross-checking hashing algorithm
Using an established hash functions over the raw bytes of x
has a few disadvantages:
- C/Rust structures contain padding bytes between consecutive fields (due to alignment requirements), and we must not include this padding in the hash.
- Pointer addresses are non-deterministic due to ASLR and other factors, so we must hash them by dereference instead of address.
For these reasons, we have chosen to design our own type-aware hashing algorithms.The algorithms hash each value differently depending on its type, and are implemented by functions with the following signature:
We use recursive hashing algorithms for complex types.To prevent infinite recursion and long hashing times, we limit the recursion depth to a fixed value.When recursion reaches this limit, the hash function returns a constant hash instead of going deeper.
Aggregate (or non-trivial) types:
Structures. We hash the contents of each structure by recursively hashing each field (with depth increased by one), then aggregating all the hashes into one. We currently use the JodyHash function for the latter.
Pointers. We avoid hashing pointers by address for the reasons listed above.Instead, we hash each pointer by recursively hashing its dereferenced value (with depth increased by one).We have two special cases here that we need to handle:
- Null pointers, which our hash functions check and return a special hard-coded hash value for.
Other data types, e.g., unions and structures containing bitfields, are difficult to hash programatically and require the user to specify a manual hash function.
The can be used to specify different hashing algorithm separately for simple and aggregate types.