MurmurHash3 behaviour

Fri Aug 19 14:03:22 PDT 2016

On Friday, 19 August 2016 at 20:28:13 UTC, Cauterite wrote:
> Regarding the MurmurHash3 implementation in core.internal.hash, 
> it is my understanding that:
>     // assuming a and b are uints
>     bytesHash([a, b], 0) == bytesHash([b], bytesHash([a], 0))
> Is this correct?
> I'm just not quite certain of this property when I try to read 
> the code myself, and I don't know much about hash algorithms.

Phobos nightly contains MurmurHash3 in std.digest (I don't know 
why there are two implementations though), but I think this one 
will be easier to use ;-)

http://dlang.org/phobos-prerelease/std_digest_murmurhash.html

The API for bytesHash is

size_t bytesHash(const(void)* buf, size_t len, size_t seed)

so shouldn't be?

auto arr = [a,b];
bytesHash(&arr, arr.length, 42);

> I'm just not quite certain of this property when I try to read

AFAIK the output of a hash isn't generally equal to the its 
upcoming seed as pre & postpreprocessing may apply. In the case 
of Murmurhash3 its post-processing (h1 ^= len):

void main() {
     import std.stdio;
     import core.internal.hash;
     uint a = 4, b = 23;
     auto arr = [a, b];

     auto b1 = bytesHash(&arr, arr.length, 42);

     auto l = arr[0..1], r = arr[1..$];
     auto b2a = bytesHash(&l, 1, 42);
     auto b2b= bytesHash(&r, 1, b2a);
     assert(b1 == b2b);
}

However with std.digest you can "save" the current state of a 
hash function and continue to give new items to the hasher:

void main()
{

     import std.digest.murmurhash; // required Phobos nightly
     uint a = 4, b = 23;
     auto arr = [a, b];
     auto l = arr[0..1], r = arr[1..$];

     MurmurHash3!32 hasher;
     hasher.put(cast(ubyte[]) l);
     hasher.put(cast(ubyte[]) r);

     auto v1 = hasher.finish();

     hasher = MurmurHash3!32();
     hasher.put(cast(ubyte[]) arr);
     auto v2 = hasher.finish();

     assert(v1 == v2);
}