Efficiently streaming data to associative array
Guillaume Chatelet via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Tue Aug 8 09:10:08 PDT 2017
On Tuesday, 8 August 2017 at 16:00:17 UTC, Steven Schveighoffer
wrote:
> On 8/8/17 11:28 AM, Guillaume Chatelet wrote:
>> Let's say I'm processing megabytes of data: I'm lazily iterating
>> over the incoming lines and storing data in an associative array.
>> I don't want to copy unless I have to.
>>
>> Contrived example follows:
>>
>> input file
>> ----------
>> a,b,15
>> c,d,12
>> ....
>>
>> Efficient ingestion
>> -------------------
>> import std.format : formattedRead;
>> import std.stdio;
>>
>> void main() {
>>     size_t[string][string] indexed_map;
>>
>>     foreach (char[] line; stdin.byLine) {
>>         char[] a;
>>         char[] b;
>>         size_t value;
>>         line.formattedRead!"%s,%s,%d"(a, b, value);
>>
>>         // Look up with the borrowed slice; idup only on first insertion.
>>         auto pA = a in indexed_map;
>>         if (pA is null) {
>>             pA = &(indexed_map[a.idup] = (size_t[string]).init);
>>         }
>>
>>         auto pB = b in (*pA);
>>         if (pB is null) {
>>             pB = &((*pA)[b.idup] = size_t.init);
>>         }
>>
>>         // Technically unneeded but let's say we have more than 2 dimensions.
>>         (*pB) = value;
>>     }
>>
>>     indexed_map.writeln;
>> }
>>
>>
>> I qualify this code as ugly but fast. Any idea on how to make
>> this less ugly? Is there something in Phobos to help?
>
> I wouldn't use formattedRead, as I think this is going to
> allocate temporaries for a and b.
>
> Note, this is very close to Jon Degenhardt's blog post in May:
> https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
>
> -Steve
I haven't dug into formattedRead yet, but thanks for letting me
know :)
I was mostly asking about the lookup-then-insert pattern with the
AA. I guess the best I can do is a templated function to hide the
ugliness:
import std.format : formattedRead;
import std.stdio;

// Returns a reference to map[key], inserting Value.init first if the key
// is missing. The key is copied (idup) only on insertion.
ref Value GetWithDefault(Value)(ref Value[string] map, const(char[]) key) {
    auto pValue = key in map;
    if (pValue) return *pValue;
    return map[key.idup] = Value.init;
}

void main() {
    size_t[string][string] indexed_map;
    foreach (char[] line; stdin.byLine) {
        char[] a;
        char[] b;
        size_t value;
        line.formattedRead!"%s,%s,%d"(a, b, value);
        indexed_map.GetWithDefault(a).GetWithDefault(b) = value;
    }
    indexed_map.writeln;
}
Not too bad, actually!
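
On Steve's point about formattedRead possibly allocating temporaries, here is a rough sketch of one way the parsing might be done with std.algorithm's lazy splitter instead, so that a and b are just slices of the current line. This is an illustration under assumptions, not something taken from the thread: the names getOrInsert and ingest are made up for the example, every line is assumed to have exactly three comma-separated fields, and there is no error handling for malformed input.

```d
import std.algorithm.iteration : splitter;
import std.conv : to;

// Same idea as GetWithDefault: look up with a borrowed slice and copy the
// key (idup) only when it is inserted for the first time.
// (getOrInsert is a hypothetical name used for this sketch.)
ref Value getOrInsert(Value)(ref Value[string] map, const(char)[] key) {
    if (auto p = key in map) return *p;
    return map[key.idup] = Value.init;
}

// Hypothetical helper: builds the two-level map from any range of lines.
// Assumes each line is well formed: "key1,key2,number".
size_t[string][string] ingest(Lines)(Lines lines) {
    size_t[string][string] indexed;
    foreach (line; lines) {
        auto fields = line.splitter(',');   // lazy; yields slices of line
        auto a = fields.front;
        fields.popFront();
        auto b = fields.front;
        fields.popFront();
        auto value = fields.front.to!size_t;
        indexed.getOrInsert(a).getOrInsert(b) = value;
    }
    return indexed;
}
```

With this, the body of main would reduce to something like `stdin.byLine.ingest.writeln;` (plus `import std.stdio;`). Since byLine reuses its buffer between iterations, the slices are only valid within one loop pass, which is why the keys are still idup'ed on insertion.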