Efficiently streaming data to associative array

Guillaume Chatelet via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Tue Aug 8 09:10:08 PDT 2017


On Tuesday, 8 August 2017 at 16:00:17 UTC, Steven Schveighoffer 
wrote:
> On 8/8/17 11:28 AM, Guillaume Chatelet wrote:
>> Let's say I'm processing megabytes of data, lazily iterating over
>> the incoming lines and storing data in an associative array. I
>> don't want to copy unless I have to.
>> 
>> Contrived example follows:
>> 
>> input file
>> ----------
>> a,b,15
>> c,d,12
>> ....
>> 
>> Efficient ingestion
>> -------------------
>> import std.format;
>> import std.stdio;
>> 
>> void main() {
>> 
>>    size_t[string][string] indexed_map;
>> 
>>    foreach(char[] line ; stdin.byLine) {
>>      char[] a;
>>      char[] b;
>>      size_t value;
>>      line.formattedRead!"%s,%s,%d"(a,b,value);
>> 
>>      auto pA = a in indexed_map;
>>      if(pA is null) {
>>        pA = &(indexed_map[a.idup] = (size_t[string]).init);
>>      }
>> 
>>      auto pB = b in (*pA);
>>      if(pB is null) {
>>        pB = &((*pA)[b.idup] = size_t.init);
>>      }
>> 
>>      // Technically unneeded, but let's say we have more than 2 dimensions.
>>      (*pB) = value;
>>    }
>> 
>>    indexed_map.writeln;
>> }
>> 
>> 
>> I'd call this code ugly but fast. Any idea how to make it
>> less ugly? Is there something in Phobos to help?
>
> I wouldn't use formattedRead, as I think this is going to 
> allocate temporaries for a and b.
>
> Note, this is very close to Jon Degenhardt's blog post in May: 
> https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
>
> -Steve
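Steve's point is that formattedRead fills `a` and `b` with fresh copies on every line. A minimal sketch of one alternative, assuming every line is exactly `key,key,number` with no trailing whitespace: split the line with Phobos' `std.algorithm.searching.findSplit`, so `a` and `b` are just slices of the `byLine` buffer, and `idup` them only when they become new keys. `splitFields` is a hypothetical helper made up for this sketch, not something from the thread or from Phobos.

```d
import std.algorithm.searching : findSplit;
import std.conv : to;
import std.stdio : stdin, writeln;

// Hypothetical helper: splits "a,b,15" into slices of `line` (no copies).
// Returns false if the line does not contain two commas.
bool splitFields(const(char)[] line, out const(char)[] a,
                 out const(char)[] b, out size_t value)
{
    auto first = line.findSplit(",");
    if (!first) return false;            // no first comma
    auto second = first[2].findSplit(",");
    if (!second) return false;           // no second comma
    a = first[0];                        // slice of line
    b = second[0];                       // slice of line
    value = second[2].to!size_t;         // throws ConvException on non-digits
    return true;
}

void main()
{
    size_t[string][string] indexed_map;

    foreach (line; stdin.byLine)         // line aliases a reused buffer
    {
        const(char)[] a, b;
        size_t value;
        if (!splitFields(line, a, b, value))
            continue;                    // skip malformed lines

        // a and b still point into byLine's buffer, so copy them
        // only when they are inserted as new keys.
        auto inner = a in indexed_map;
        if (inner is null)
        {
            indexed_map[a.idup] = null;
            inner = a in indexed_map;
        }
        if (auto p = b in *inner)
            *p = value;
        else
            (*inner)[b.idup] = value;
    }

    writeln(indexed_map);
}
```

Looking up a `string`-keyed AA with a `const(char)[]` key is allowed in D, which is what makes the "look up first, copy on insert" pattern work without casts.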

I haven't dug into formattedRead yet, but thanks for letting me know :)
I was mostly asking about the AA pattern. I guess the best I can
do is a templated function to hide the ugliness.


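import std.format;
import std.stdio;

// Looks up key without copying it; idups it only when a new entry is inserted.
ref Value GetWithDefault(Value)(ref Value[string] map, const(char[]) key) {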
   auto pValue = key in map;
   if(pValue) return *pValue;
   return map[key.idup] = Value.init;
}

void main() {

   size_t[string][string] indexed_map;

   foreach(char[] line ; stdin.byLine) {
     char[] a;
     char[] b;
     size_t value;
     line.formattedRead!"%s,%s,%d"(a,b,value);

     indexed_map.GetWithDefault(a).GetWithDefault(b) = value;
   }

   indexed_map.writeln;
}
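As a quick check of the helper, including the "more than 2 dimensions" case from the first version's comment, here is a sketch that drives `GetWithDefault` from one reused `char[]` buffer, standing in for the buffer `byLine` recycles. The helper is copied in so the block compiles on its own; `buildDemoIndex` and the sample keys are made up for illustration. Because keys are `idup`ed only on insertion, overwriting the buffer afterwards leaves the stored keys intact, and a repeated key is found by lookup alone.

```d
import std.stdio : writeln;

// The post's helper: hit returns the existing slot, miss copies the key once.
ref Value GetWithDefault(Value)(ref Value[string] map, const(char)[] key)
{
    if (auto pValue = key in map)
        return *pValue;
    return map[key.idup] = Value.init;
}

// Hypothetical demo: three nested levels fed from a single mutable buffer.
size_t[string][string][string] buildDemoIndex()
{
    size_t[string][string][string] index;

    char[] buf = "aa".dup;
    index.GetWithDefault(buf).GetWithDefault("x").GetWithDefault("y") = 1;

    buf[] = "bb";    // overwrite the buffer in place, like byLine does
    index.GetWithDefault(buf).GetWithDefault("x").GetWithDefault("y") = 2;

    buf[] = "aa";    // existing top-level key: found, not copied again
    index.GetWithDefault(buf).GetWithDefault("x").GetWithDefault("z") = 3;

    return index;
}

void main()
{
    auto index = buildDemoIndex();
    writeln(index.length);            // 2
    writeln(index["aa"]["x"]["z"]);   // 3
}
```

Each call returns a `ref` into the enclosing map, which is why the chain can end in a plain assignment no matter how many levels deep it goes.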


Not too bad, actually!

