d word counting approach performs well but has higher mem usage
Stanislav Blinov
stanislav.blinov at gmail.com
Sun Nov 4 00:41:46 UTC 2018
On Saturday, 3 November 2018 at 14:26:02 UTC, dwdv wrote:
> Assoc array allocations?
Yup. AAs do keep their memory around (supposedly for reuse). You
can insert calls to GC.stats (import core.memory) at various
points to see actual GC heap usage. If you don't touch that AA at
all you'll only use up some Kb of the GC heap when reading the
file.
Why it consumes so much is a question to the implementation.
> What did I do wrong?
Well, you didn't actually put the keys into the AA ;) I'm
guessing you didn't look closely at the output, otherwise you
would've noticed that something was wrong.
AAs want immutable keys. .byLine returns (in this case) a char[].
It's a slice of it's internal buffer that is reused on reading
each line; it gets overwritten on every iteration. This way the
reading loop only consumes as much as the longest line requires.
But a `char[]` is not a `string` and you wouldn't be able to
index the AA with it:
```
Error: associative arrays can only be assigned values with
immutable keys, not char[]
```
But by putting `const` in `foreach` you tricked the compiler into
letting you index the AA with a (supposed) const key. Which, of
course, went fine as far as insertion/updates went, since hashes
still matched. But when you iterate later, pretty much every key
is in fact a reference to some older memory, which is still
somewhere on the GC heap; you don't get a segfault, but neither
do you get correct "words".
You *need* to have an actual `string` when you first insert into
the AA.
> ```d
> ===================================================================
> void main()
> {
> import std.stdio, std.algorithm, std.range;
import core.memory;
>
> int[string] count;
void updateCount(char[] word) {
auto ptr = word in count;
if (!ptr)
// note the .idup!
count[word.idup] = 1;
else
(*ptr)++;
}
// no const!
> foreach(word; stdin.byLine.map!splitter.joiner) {
updateCount(word);
> }
>
> //or even:
> //foreach(line; stdin.byLine) {
// no const!
> // foreach(word; line.splitter) {
// updateCount(word);
> // }
> //}
writeln(GC.stats);
GC.collect;
writeln(GC.stats);
> count.byKeyValue
> .array
> .sort!((a, b) => a.value > b.value)
> .each!(a => writefln("%d %s", a.value, a.key));
writeln(GC.stats);
GC.collect;
writeln(GC.stats);
> }
> ```
Note that if you .clear() and even .destroy() the AA, it'll still
keep a bunch of memory allocated. I guess built-in AAs just love
to hoard.
More information about the Digitalmars-d-learn
mailing list