Command line tool for weighted reservoir sampling

Jon Degenhardt via Digitalmars-d-announce digitalmars-d-announce at puremagic.com
Sun Jan 22 16:06:00 PST 2017


I released a new tool for weighted random sampling of tabular 
data files: tsv-sample. It's one of several tools recently added
to the TSV file toolkit I released last year. These tools are
especially useful when data files are larger than is desirable to 
read entirely into memory in R and similar apps.

I'll publish an announcement of a broader set of tool updates in
the next few weeks. I have some performance benchmarks to finish
first. However, weighted reservoir sampling algorithms are
interesting, so I thought there might be enough interest to
warrant a separate announcement.
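
For readers who have not seen weighted reservoir sampling before, here is a rough sketch of one well-known approach, the Efraimidis-Spirakis "A-Res" scheme. It is written in Python for brevity and is only an illustration. This message does not say which algorithm tsv-sample uses, and the function name, the (value, weight) input shape, and the parameters below are all made up for the example. The idea: give each item a random key u^(1/w), where u is uniform in [0, 1) and w is the item's weight, and keep the k items with the largest keys. A min-heap of size k does this in one pass without holding the whole input in memory.

```python
import heapq
import random


def weighted_reservoir_sample(items, k, rng=None):
    """Pick up to k values from an iterable of (value, weight) pairs.

    Hypothetical helper for illustration (A-Res scheme); not taken
    from tsv-sample. Sampling is without replacement, and items with
    larger weights are more likely to be kept.
    """
    if k <= 0:
        return []
    rng = rng or random.Random()
    heap = []  # min-heap of (key, sequence number, value)
    for seq, (value, weight) in enumerate(items):
        if weight <= 0:
            continue  # assumption: non-positive weights are never sampled
        key = rng.random() ** (1.0 / weight)
        entry = (key, seq, value)  # seq breaks ties between equal keys
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # New key beats the smallest key in the reservoir: swap it in.
            heapq.heapreplace(heap, entry)
    # Report the kept values, largest key first.
    return [value for _, _, value in sorted(heap, reverse=True)]


if __name__ == "__main__":
    rows = [("a", 1.0), ("b", 2.0), ("c", 10.0), ("d", 0.5)]
    print(weighted_reservoir_sample(rows, 2))
```

Memory use is proportional to k rather than to the input size, which is what makes this style of sampling a good fit for files too large to load whole.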

Repo: https://github.com/eBay/tsv-utils-dlang
tsv-sample code: 
https://github.com/eBay/tsv-utils-dlang/blob/master/tsv-sample/src/tsv-sample.d

--Jon

