Updates to the tsv-utils toolkit

Jon Degenhardt via Digitalmars-d-announce digitalmars-d-announce at puremagic.com
Wed Feb 22 10:12:50 PST 2017


It's not quite a year since the open-sourcing of eBay's tsv 
utilities. Since then there have been a number of additions and 
updates, and the tools form a more complete package. The tools 
assist with manipulation of tabular data files common in machine 
learning and data mining environments. They work alongside 
traditional Unix command line tools like 'cut' and 'sort'. They 
also fit well with data mining and stats packages like R and 
Pandas.

The tools include filtering, slicing, joins and other 
manipulation, sampling, and statistical calculations. If you find 
yourself working with large data files from a Unix shell, you may 
like these tools.
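
For readers unfamiliar with this style of processing, here is a 
rough Python sketch of what "filtering" and "slicing" a 
tab-separated file means. It is only an illustration and is not 
taken from the tools themselves: the function name, the column 
names and the sample rows are all made up for this example, and 
it assumes a header line and a numeric filter column.

```python
def filter_and_slice(lines, filter_col, min_value, keep_cols):
    """Keep rows where filter_col >= min_value, then keep only keep_cols.

    `lines` is a list of tab-separated strings; the first one is the header.
    Hypothetical helper for illustration only, not part of tsv-utils.
    """
    header = lines[0].rstrip("\n").split("\t")
    filter_idx = header.index(filter_col)
    keep_idx = [header.index(c) for c in keep_cols]

    out = ["\t".join(keep_cols)]  # new header with only the kept columns
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        # Filtering: drop rows whose value is below the threshold.
        if float(fields[filter_idx]) >= min_value:
            # Slicing: emit only the requested columns, in the requested order.
            out.append("\t".join(fields[i] for i in keep_idx))
    return out


# Made-up sample data.
sample = [
    "id\tcolor\tscore\n",
    "1\tred\t0.9\n",
    "2\tblue\t0.2\n",
    "3\tgreen\t0.7\n",
]
result = filter_and_slice(sample, "score", 0.5, ["id", "color"])
```

The sketch loads nothing beyond the list it is given and makes a 
single pass over the rows, which is the same streaming pattern 
that lets command line tools be chained together with pipes.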

Speed matters when processing large data files, and these tools 
are fast. I've published new benchmarks comparing the tools to 
similar tools written in several natively compiled programming 
languages. The tools are the fastest on five of the six 
benchmarks run, generally by significant margins. It's a good 
result for the D programming language. The benchmarks may be of 
interest regardless of your interest in the tools themselves.

Repository: https://github.com/eBay/tsv-utils-dlang
Performance benchmarks: 
https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md

--Jon
