Chunker - Content-Defined Chunking based on Rabin Checksums
Vladimir Panteleev
thecybershadow.lists at gmail.com
Sat Sep 21 03:11:11 UTC 2019
Hi,
This is a D port of a Go package implementing Content-Defined
Chunking:
https://github.com/CyberShadow/chunker
The package contains the following modules:
- chunker.polynomials - implements Pol, a type which represents a
polynomial from F_2[X]. I'm not quite sure what that is, but they
seem to be very useful.
- chunker.rabin - implements RabinHash, which calculates a
rolling Rabin Fingerprint.
- chunker - implements Chunker, an adapter range which accepts
chunks of bytes (such as from File.byChunk) and emits
variable-size content-defined chunks, which are split when the
local Rabin Fingerprint reaches a certain value.
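To illustrate the idea behind content-defined chunking, here is a
minimal sketch in Go (the language of the original package). It does
not use the API of either package. It also swaps the Rabin Fingerprint
over F_2[X] for a simpler multiplicative rolling hash modulo 2^64, and
all constants and function names are assumptions chosen for the
example.

```go
package main

import (
	"fmt"
	"math/rand"
)

// All constants are illustrative assumptions, not values from the chunker package.
const (
	windowSize = 64            // bytes covered by the rolling hash
	minSize    = 512           // never cut before a chunk has this many bytes
	maxSize    = 8192          // always cut once a chunk has this many bytes
	splitBits  = 10            // cut when the top 10 hash bits are all zero
	base       = 0x100000001b3 // arbitrary odd multiplier (the 64-bit FNV prime)
)

// split returns the end offset of every chunk in data.
func split(data []byte) []int {
	// pow = base^windowSize, used to remove the byte that leaves the window.
	var pow uint64 = 1
	for i := 0; i < windowSize; i++ {
		pow *= base
	}
	var cuts []int
	var h uint64
	start := 0
	for i, b := range data {
		h = h*base + uint64(b) // slide the new byte in
		if i-start >= windowSize {
			h -= pow * uint64(data[i-windowSize]) // slide the oldest byte out
		}
		size := i - start + 1
		if (size >= minSize && h>>(64-splitBits) == 0) || size == maxSize {
			cuts = append(cuts, i+1)
			start = i + 1
			h = 0
		}
	}
	if start < len(data) {
		cuts = append(cuts, len(data)) // trailing partial chunk
	}
	return cuts
}

// countShared reports how many chunks of b also occur, byte for byte, in a.
func countShared(a, b []byte) (total, shared int) {
	seen := make(map[string]bool)
	start := 0
	for _, end := range split(a) {
		seen[string(a[start:end])] = true
		start = end
	}
	start = 0
	for _, end := range split(b) {
		total++
		if seen[string(b[start:end])] {
			shared++
		}
		start = end
	}
	return total, shared
}

func main() {
	rng := rand.New(rand.NewSource(1))
	data := make([]byte, 1<<20)
	for i := range data {
		data[i] = byte(rng.Intn(256))
	}
	// Prepend one byte. Fixed-size blocks would all shift, but
	// content-defined cut points should mostly stay where they were.
	shifted := append([]byte{0xff}, data...)
	total, shared := countShared(data, shifted)
	fmt.Printf("%d of %d chunks unchanged after a one-byte insertion\n", shared, total)
}
```

Because a cut depends only on the last windowSize bytes (once the
minimum size is reached), an insertion near the start of the input
should disturb only the chunks around it. That property is what makes
this kind of chunking useful for deduplicating backups.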
Links
-----
- Wikipedia:
https://en.wikipedia.org/wiki/Rolling_hash#Rabin_fingerprint
- Original Go version: https://github.com/restic/chunker
- Dub package: https://code.dlang.org/packages/chunker
- Documentation: https://chunker.dpldocs.info/chunker.html
(courtesy of Adam Ruppe's dpldocs service)
- Example:
https://github.com/cybershadow/chunker/blob/master/src/chunker/example.d
Differences from the Go version
-------------------------------
- Chunker was adapted to be a D range and accept D ranges as
input.
- The Rabin Fingerprint implementation was extracted out of
Chunker and into its own module. It is usable stand-alone.
- Significant refactorings and simplifications of the
implementation. The original code sacrificed some readability
to work around limitations of the Go language and its
compiler's optimizer in order to achieve reasonable performance.
- 20% faster than the Go version (LDC release build).
- Improved test coverage and symbol documentation.
The original package was written by Alexander Neumann and is used
in the restic backup program.