Stream-Based Processing of Range Chunks in D
qznc
qznc at web.de
Tue Dec 10 02:28:39 PST 2013
On Tuesday, 10 December 2013 at 09:57:44 UTC, Nordlöw wrote:
> I'm looking for an elegant way to perform chunk-stream-based
> processing of arrays/ranges. I'm building a file
> indexing/search engine in D that calculates various kinds of
> statistics on files such as histograms and SHA1-digests. I want
> these calculations to be performed in a single pass with
> regards to data-access locality.
>
> Here is an excerpt from the engine
>
> /** Process File in Cache-Friendly Chunks. */
> void calculateCStatInChunks(immutable(ubyte[]) src,
>                             size_t chunkSize,
>                             bool doSHA1,
>                             bool doBHist8) {
>     if (!_cstat.contentsDigest[].allZeros) { doSHA1 = false; }
>     if (!_cstat.bhist8.allZeros) { doBHist8 = false; }
>
>     import std.digest.sha;
>     SHA1 sha1;
>     if (doSHA1) { sha1.start(); }
>
>     import std.range: chunks;
>     foreach (chunk; src.chunks(chunkSize)) {
>         if (doSHA1) { sha1.put(chunk); }
>         if (doBHist8) { /*...*/ }
>     }
>
>     if (doSHA1) {
>         _cstat.contentsDigest = sha1.finish();
>     }
> }
>
> Seemingly this is not a very elegant (functional) approach, as I
> have to spread the logic for each statistic (reducer) across three
> different places in the code, namely `start`, `put`, and `finish`.
>
> Does anybody have suggestions/references for Haskell-monad-like,
> stream-based APIs that could make this code more component-based
> in D style?
You could make a range step for each kind of statistic, which
passes the input range through unchanged and does its job as a
side effect:
  SHA1 sha1;
  src.chunks(chunkSize)
     .add_sha1(doSHA1, &sha1)
     .add_bhist(doBHist8)
     .strict_consuming();
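The names add_sha1, add_bhist and strict_consuming above are hypothetical, but the pass-through-with-side-effect stage they describe can be sketched with std.range.tee (added to Phobos after this thread), which yields each element unchanged while invoking a callback on it:

```d
import std.digest.sha : SHA1;
import std.range : chunks, tee;
import std.algorithm.iteration : each;

void main()
{
    immutable ubyte[] src = [1, 2, 3, 4, 5, 6, 7, 8];

    SHA1 sha1;
    sha1.start();

    // tee passes each chunk through unchanged and feeds it to the
    // digest as a side effect; each() strictly drives the lazy
    // pipeline, playing the role of strict_consuming above.
    src.chunks(4)
       .tee!(chunk => sha1.put(chunk))
       .each!((chunk) {});

    auto digest = sha1.finish();
}
```

Further statistics become additional tee stages on the same pipeline, so each reducer's logic stays in one place.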
You could try to use constructor/destructor mechanisms for
sha1.start and sha1.finish. Or at least scope guards:

  SHA1 sha1;
  if (doSHA1) { sha1.start(); }
  scope(exit) if (doSHA1) { _cstat.contentsDigest = sha1.finish(); }
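The constructor/destructor variant could look like the following sketch: a hypothetical RAII wrapper that calls start() on construction and writes the digest to a caller-supplied destination when it goes out of scope (SHA1.finish returns a ubyte[20]):

```d
import std.digest.sha : SHA1;

// Hypothetical RAII wrapper: start() in the constructor, finish()
// delivered to the destination pointer in the destructor. Passing
// null disables the digest entirely.
struct ScopedSHA1
{
    SHA1 hasher;
    ubyte[20]* dest;

    this(ubyte[20]* dest)
    {
        this.dest = dest;
        if (dest !is null)
            hasher.start();
    }

    void put(scope const(ubyte)[] data)
    {
        if (dest !is null)
            hasher.put(data);
    }

    ~this()
    {
        if (dest !is null)
            *dest = hasher.finish();
    }
}
```

Inside calculateCStatInChunks that would read as `auto h = ScopedSHA1(doSHA1 ? &_cstat.contentsDigest : null);` followed by plain `h.put(chunk)` in the loop, with no trailing finish step needed.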
More information about the Digitalmars-d-learn mailing list