Stream-Based Processing of Range Chunks in D

"Nordlöw" per.nordlow at gmail.com
Tue Dec 10 01:57:42 PST 2013


I'm looking for an elegant way to do chunked, stream-based 
processing of arrays/ranges. I'm building a file indexing/search 
engine in D that calculates various statistics on files, such as 
byte histograms and SHA-1 digests. I want all of these to be 
computed in a single pass over the data, for the sake of 
data-access locality.

Here is an excerpt from the engine:

     /** Process File in Cache Friendly Chunks. */
     void calculateCStatInChunks(immutable (ubyte[]) src,
                                 size_t chunkSize,
                                 bool doSHA1, bool doBHist8) {
         // Skip statistics that have already been computed.
         if (!_cstat.contentsDigest[].allZeros) { doSHA1 = false; }
         if (!_cstat.bhist8.allZeros) { doBHist8 = false; }

         import std.digest.sha;
         SHA1 sha1;
         if (doSHA1) { sha1.start(); }

         import std.range: chunks;
         foreach (chunk; src.chunks(chunkSize)) {
             if (doSHA1) { sha1.put(chunk); }
             if (doBHist8) { /*...*/ }
         }

         if (doSHA1) {
             _cstat.contentsDigest = sha1.finish();
         }
     }

This does not seem like a very elegant (functional) approach, as 
the logic for each statistic (reducer) is spread across three 
different places in the code: `start`, `put` and `finish`.
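
To make the goal more concrete, here is a minimal sketch of the 
kind of composition I have in mind. It assumes every statistic is 
a struct exposing `start`, `put` and `finish`, the way `SHA1` 
does. `BHist8` and `reduceInChunks` are hypothetical names made up 
for this example and are not part of Phobos; only `SHA1`, `sha1Of` 
and `chunks` come from the standard library.

```d
import std.digest.sha : SHA1, sha1Of;
import std.range : chunks;

/// Hypothetical reducer (not in Phobos): histogram of byte values.
struct BHist8
{
    size_t[256] counts;

    void start() { counts[] = 0; }
    void put(scope const(ubyte)[] data)
    {
        foreach (b; data) { ++counts[b]; }
    }
    size_t[256] finish() { return counts; }
}

/// Hypothetical driver: one pass over `src`, handing each chunk
/// to every reducer before moving on to the next chunk.
void reduceInChunks(Reducers...)(const(ubyte)[] src,
                                 size_t chunkSize,
                                 ref Reducers reducers)
{
    // Iterate over the reducer types at compile time and index
    // into the (by-reference) parameter sequence.
    foreach (i, Unused; Reducers) { reducers[i].start(); }
    foreach (chunk; src.chunks(chunkSize))
    {
        foreach (i, Unused; Reducers) { reducers[i].put(chunk); }
    }
}

void main()
{
    immutable(ubyte)[] data = [1, 2, 3, 1, 1];

    SHA1 sha1;
    BHist8 hist;
    reduceInChunks(data, 2, sha1, hist);

    // Chunked hashing must agree with hashing the whole array at once.
    assert(sha1.finish() == sha1Of(data));
    // The byte value 1 occurs three times in `data`.
    assert(hist.finish()[1] == 3);
}
```

This moves the `start`/`put` bookkeeping into one generic place, 
but the caller still has to call `finish` on each reducer and 
decide which reducers to pass in, so it is only a partial answer 
to what I am after.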

Does anybody have suggestions or references for Haskell-monad-like 
stream-based APIs that could make this code more component-based, 
in idiomatic D style?


More information about the Digitalmars-d-learn mailing list