Updates to the tsv-utils toolkit
Jon Degenhardt via Digitalmars-d-announce
digitalmars-d-announce at puremagic.com
Sat Mar 4 11:48:21 PST 2017
On Wednesday, 22 February 2017 at 18:12:50 UTC, Jon Degenhardt
wrote:
> It's not quite a year since the open-sourcing of eBay's tsv
> utilities. Since then there have been a number of additions and
> updates, and the tools form a more complete package. The tools
> assist with manipulation of tabular data files common in
> machine learning and data mining environments. They work
> alongside traditional Unix command line tools like 'cut' and
> 'sort'. They also fit well with data mining and stats packages
> like R and Pandas.
>
> The tools include filtering, slicing, joins and other
> manipulation, sampling, and statistical calculations. If you
> find yourself working with large data files from a Unix shell,
> you may like these tools.
>
> Speed matters when processing large data files, and these tools
> are fast. I've published new benchmarks comparing the tools to
> similar tools written in several native compiled programming
> languages. The tools are the fastest on five of the six
> benchmarks run, generally by significant margins. It's a good
> result for the D programming language. The benchmarks may be of
> interest regardless of your interest in the tools themselves.
>
> Repository: https://github.com/eBay/tsv-utils-dlang
> Performance benchmarks:
> https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md
>
> --Jon
One more update: Schveiguy helped identify the performance
bottleneck in the csv2tsv tool, and the tools are now the fastest
on all six benchmarks. The benchmarks have been updated (and
reformatted a bit). The summary table is here:
https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md#top-four-in-each-benchmark