Efficiently streaming data to associative array

Steven Schveighoffer via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Tue Aug 8 09:00:17 PDT 2017


On 8/8/17 11:28 AM, Guillaume Chatelet wrote:
> Let's say I'm processing megabytes of data: I'm lazily iterating over the
> incoming lines and storing data in an associative array. I don't want to
> copy unless I have to.
> 
> Contrived example follows:
> 
> input file
> ----------
> a,b,15
> c,d,12
> ....
> 
> Efficient ingestion
> -------------------
> import std.format : formattedRead;
> import std.stdio : stdin, writeln;
> 
> void main() {
> 
>    size_t[string][string] indexed_map;
> 
>    foreach(char[] line ; stdin.byLine) {
>      char[] a;
>      char[] b;
>      size_t value;
>      line.formattedRead!"%s,%s,%d"(a,b,value);
> 
>      auto pA = a in indexed_map;
>      if(pA is null) {
>        pA = &(indexed_map[a.idup] = (size_t[string]).init);
>      }
> 
>      auto pB = b in (*pA);
>      if(pB is null) {
>        pB = &((*pA)[b.idup] = size_t.init);
>      }
> 
>      // Technically unneeded but let's say we have more than 2 dimensions.
>      (*pB) = value;
>    }
> 
>    indexed_map.writeln;
> }
> 
> 
> I'd describe this code as ugly but fast. Any ideas on how to make it less
> ugly? Is there something in Phobos to help?

I wouldn't use formattedRead, as I think this is going to allocate 
temporaries for a and b.
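
One possible alternative, as a rough sketch rather than a measured 
improvement: split the line lazily with std.algorithm's splitter, which 
yields slices of the input buffer, and call idup only when a key is 
actually inserted. The helper name `ingest` is hypothetical, and the code 
assumes every line is well formed (exactly three comma-separated fields, 
the last one an unsigned integer). Unlike the pointer trick above, it 
does a second lookup when a new outer key is inserted.

```d
import std.algorithm.iteration : splitter;
import std.conv : to;

/// Hypothetical helper: parses one "a,b,value" line and stores it in `map`.
/// Assumes the line is well formed; no error handling is attempted.
void ingest(ref size_t[string][string] map, const(char)[] line)
{
    // splitter is lazy and yields slices of `line`, so no copies are made here.
    auto fields = line.splitter(',');
    const(char)[] a = fields.front;
    fields.popFront();
    const(char)[] b = fields.front;
    fields.popFront();
    immutable value = fields.front.to!size_t;

    // Look up with the non-owning slice; copy the key only when inserting.
    auto pA = a in map;
    if (pA is null)
    {
        map[a.idup] = null;   // insert an empty inner array under an owned key
        pA = a in map;        // second lookup, only on the insertion path
    }

    if (auto pB = b in *pA)
        *pB = value;          // key already present: overwrite in place
    else
        (*pA)[b.idup] = value;
}
```

It would be called as `ingest(indexed_map, line)` from inside the 
`foreach (line; stdin.byLine)` loop. Because byLine reuses its buffer, the 
idup calls are what keep the stored keys valid after the next line is read.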

Note, this is very close to Jon Degenhardt's blog post in May: 
https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/

-Steve

