d word counting approach performs well but has higher mem usage

Stanislav Blinov stanislav.blinov at gmail.com
Sun Nov 4 00:41:46 UTC 2018


On Saturday, 3 November 2018 at 14:26:02 UTC, dwdv wrote:

> Assoc array allocations?

Yup. AAs do keep their memory around (supposedly for reuse). You 
can insert calls to GC.stats (import core.memory) at various 
points to see actual GC heap usage. If you don't touch that AA at 
all you'll only use up some Kb of the GC heap when reading the 
file.
Why it consumes so much is a question to the implementation.

> What did I do wrong?

Well, you didn't actually put the keys into the AA ;) I'm 
guessing you didn't look closely at the output, otherwise you 
would've noticed that something was wrong.

AAs want immutable keys. .byLine returns (in this case) a char[]. 
It's a slice of it's internal buffer that is reused on reading 
each line; it gets overwritten on every iteration. This way the 
reading loop only consumes as much as the longest line requires. 
But a `char[]` is not a `string` and you wouldn't be able to 
index the AA with it:

```
Error: associative arrays can only be assigned values with 
immutable keys, not char[]
```

But by putting `const` in `foreach` you tricked the compiler into 
letting you index the AA with a (supposed) const key. Which, of 
course, went fine as far as insertion/updates went, since hashes 
still matched. But when you iterate later, pretty much every key 
is in fact a reference to some older memory, which is still 
somewhere on the GC heap; you don't get a segfault, but neither 
do you get correct "words".

You *need* to have an actual `string` when you first insert into 
the AA.

> ```d 
> ===================================================================
> void main()
> {
>     import std.stdio, std.algorithm, std.range;
       import core.memory;
>
>     int[string] count;

       void updateCount(char[] word) {
           auto ptr = word in count;
           if (!ptr)
               // note the .idup!
               count[word.idup] = 1;
           else
               (*ptr)++;
       }

       // no const!
>     foreach(word; stdin.byLine.map!splitter.joiner) {
           updateCount(word);
>     }
>
>     //or even:
>     //foreach(line; stdin.byLine) {
             // no const!
>     //    foreach(word; line.splitter) {
       //        updateCount(word);
>     //    }
>     //}

       writeln(GC.stats);
       GC.collect;
       writeln(GC.stats);

>     count.byKeyValue
>         .array
>         .sort!((a, b) => a.value > b.value)
>         .each!(a => writefln("%d %s", a.value, a.key));

       writeln(GC.stats);
       GC.collect;
       writeln(GC.stats);
> }
> ```

Note that if you .clear() and even .destroy() the AA, it'll still 
keep a bunch of memory allocated. I guess built-in AAs just love 
to hoard.


More information about the Digitalmars-d-learn mailing list