Trying to reduce memory usage

Fri Feb 19 00:13:19 UTC 2021

On Wednesday, 17 February 2021 at 04:10:24 UTC, tsbockman wrote:
> I spent some time experimenting with this problem, and here is 
> the best solution I found, assuming that perfect de-duplication 
> is required. (I'll put the code up on GitHub / dub if anyone 
> wants to have a look.)

It would be interesting to see how the performance compares to 
tsv-uniq 
(https://github.com/eBay/tsv-utils/tree/master/tsv-uniq). The 
prebuilt binaries turn on all the optimizations 
(https://github.com/eBay/tsv-utils/releases).

tsv-uniq wasn't included in the different comparative benchmarks 
I published, but I did run my own benchmarks and it holds up 
well. However, it should not be hard to beat it. What might be 
more interesting is what the delta is.

tsv-uniq uses the most straightforward approach: it puts each key 
into an associative array. No custom data structures. Enough 
memory is required to hold all the unique keys at once, so it 
won't handle arbitrarily large data sets. It would be interesting 
to see how the straightforward approach compares with the more 
highly tuned approach.
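
For anyone who wants a concrete picture of the straightforward approach, here is a minimal sketch. It is not tsv-uniq's actual source (that is D code keyed on an associative array); a Python set stands in for the associative array, the whole line is used as the key, and the name `unique_lines` is made up for illustration.

```python
import sys
from typing import Iterable, Iterator


def unique_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in first-seen order."""
    # The set keeps every unique key, so memory grows with the
    # number of distinct lines rather than with the input size.
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


if __name__ == "__main__":
    # Stream stdin to stdout, dropping lines that were already seen.
    sys.stdout.writelines(unique_lines(sys.stdin))
```

The input is streamed, but the set of keys is not, which is why this approach stops working once the unique keys no longer fit in memory.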

--Jon