Command line tool for weighted reservoir sampling
Jon Degenhardt via Digitalmars-d-announce
digitalmars-d-announce at puremagic.com
Sun Jan 22 16:06:00 PST 2017
I released a new tool for weighted random sampling of tabular
data files: tsv-sample. It's one of several tools recently added
to the TSV file toolkit I released last year. These tools are
especially useful when data files are too large to comfortably
read entirely into memory in R and similar apps.
I'll publish an announcement of a broader set of tool updates in
the next few weeks. I have some performance benchmarks to finish
first. However, weighted reservoir sampling algorithms are
interesting, so I thought there might be enough interest to
warrant a separate announcement.
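
For readers unfamiliar with the technique, here is a minimal sketch of
one well-known weighted reservoir sampling scheme (Efraimidis and
Spirakis, "A-Res"). It is an illustration in Python, not the tsv-sample
source, and it does not claim to match what tsv-sample does internally.
The function name, the (value, weight) input shape, and the rng
parameter are assumptions made for the example. Each item gets a random
key u ** (1 / weight), and the k items with the largest keys are kept
in a small heap, so the input is read once and never held in memory.

```python
import heapq
import random


def weighted_reservoir_sample(items, k, rng=None):
    """Pick up to k values from an iterable of (value, weight) pairs.

    Hypothetical helper for illustration: items with larger weights are
    more likely to be chosen; items with non-positive weights are skipped.
    """
    if k <= 0:
        return []
    rng = rng or random.Random()
    heap = []  # min-heap of (key, index, value); smallest key on top
    for index, (value, weight) in enumerate(items):
        if weight <= 0:
            continue
        # Random key: u ** (1 / weight), with u uniform in [0, 1).
        key = rng.random() ** (1.0 / weight)
        entry = (key, index, value)  # index breaks ties between equal keys
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # New key beats the smallest key in the reservoir: swap it in.
            heapq.heapreplace(heap, entry)
    # Return values ordered from largest key to smallest.
    return [value for _, _, value in sorted(heap, reverse=True)]


if __name__ == "__main__":
    rows = [("a", 1.0), ("b", 5.0), ("c", 0.5), ("d", 10.0)]
    print(weighted_reservoir_sample(rows, 2))
```

Memory use is proportional to k rather than to the size of the input,
which is what makes this kind of sampling practical for files that are
too large to load whole.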
Repo: https://github.com/eBay/tsv-utils-dlang
tsv-sample code:
https://github.com/eBay/tsv-utils-dlang/blob/master/tsv-sample/src/tsv-sample.d
--Jon