"Rolling Hash computation" or "Content Defined Chunking"
notna via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Tue May 9 11:17:45 PDT 2017
On Saturday, 6 May 2017 at 07:21:51 UTC, Johannes Pfau wrote:
> Am Mon, 01 May 2017 21:01:43 +0000
> schrieb notna <notna.remove.this at ist-einmalig.de>:
>
>> Hi Dlander's.
>>
>> Found some interesting reads ([1] [2] [3]) about the $SUBJECT
>> and wonder if there is anything available in the Dland?!
>>
>> If yes, pls. share.
>> If not, how could it be done (D'ish)
>>
>> [1] -
>> https://moinakg.wordpress.com/2013/06/22/high-performance-content-defined-chunking/
>> -
>> https://github.com/moinakg/pcompress/blob/master/rabin/rabin_dedup.c
>>
>> [2] -
>> https://restic.github.io/blog/2015-09-12/restic-foundation1-cdc
>>
>> [3] - http://www.infoarena.ro/blog/rolling-hash
>>
>> Thanks & regards
>
> Interesting concept. I'm not aware of any D implementation but
> it shouldn't be difficult to implement this in D:
> https://en.wikipedia.org/wiki/Rolling_hash#Cyclic_polynomial
>
> There's a BSD licensed haskell implementation, so a BSD
> licensed port
> would be very easy to implement:
> https://hackage.haskell.org/package/optimal-blocks-0.1.0
> https://hackage.haskell.org/package/optimal-blocks-0.1.0/docs/src/Algorithm-OptimalBlocks-BuzzHash.html
>
> To make an implementation D'ish it could integrate with either
> std.digest or process input ranges. If you want to use it
> exclusively for chunking your code can be more efficient
> (process InputRange until a boundary condition is met). When
> using input ranges, prefer some kind of buffered approach,
> Range!ubyte[] instead of Range!ubyte for better performance.
>
> If you really want the rolling hash value for each byte in a
> sequence this will be less efficient as you'll have to enter
> data byte-by-byte. In this case it's extremely important for
> performance that your function can be inlined, so use templates:
>
> ubyte[] data;
> foreach(b; data)
> {
> // This needs to be inlined for performance reasons
> rollinghash.put(b);
> }
>
> -- Johannes
Thanks for the feedback, Johannes.
I hoped there may already be something in Mir or Weka.io or
somewhere else... Will read the Golang, C and C++ source and see
if my Dlang is good enough for ranges and the like magic...
More information about the Digitalmars-d-learn
mailing list