tsv-utils 2.0 release: Named field support

Jon Degenhardt jond at noreply.com
Sun Jul 26 20:28:56 UTC 2020


Hi all,

I'm happy to announce a new major release of eBay's TSV 
Utilities. The 2.0 release supports named field selection in all 
of the tools, a significant usability enhancement.

For those not familiar, tsv-utils is a set of command line tools 
for manipulating tabular data files of the type commonly found in 
machine learning and data mining environments. The tools cover 
filtering, statistics, sampling, joins, etc. They are patterned 
after traditional Unix command line tools like 'cut', 'grep', 
'sort', etc., and are intended to work with these tools. Each tool is a 
standalone executable. Most people will only care about a subset 
of the tools. It is not necessary to learn the entire toolkit to 
get value from the tools.

The tools are all written in D and are the fastest tools of their 
type available (benchmarks are on the GitHub repository).

Previous versions of the tools referenced fields by field number, 
same as traditional Unix tools like 'cut'. In version 2.0, 
tsv-utils tools take fields either by field number or by field 
name, for files with header lines. A few examples using 
'tsv-select', a tool similar to 'cut' that also supports field 
reordering and dropping fields:

$ # Field numbers: Output fields 2 and 1, in that order.
$ tsv-select -f 2,1 data.tsv

$ # Field names: Output the 'Name' and 'RecordNum' fields.
$ tsv-select -H -f Name,RecordNum data.tsv

$ # Drop the 'Color' field, keep everything else.
$ tsv-select -H --exclude Color file.tsv

$ # Drop all the fields ending in '_time'
$ tsv-select -H -e '*_time' data.tsv
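
To illustrate what named fields save, here is a minimal sketch of 
the name-to-number lookup done by hand with standard Unix tools. 
It is not how tsv-utils implements the feature. The file 
'sample.tsv', its contents, and the helper 'field_number' are all 
made up for the example.

```shell
# Build a tiny sample file (hypothetical data, for illustration only).
printf 'RecordNum\tName\tColor\n1\tapple\tred\n2\tplum\tpurple\n' > sample.tsv

# field_number FILE NAME (hypothetical helper):
# print the 1-based position of the field called NAME in FILE's header line.
field_number() {
    head -n 1 "$1" | tr '\t' '\n' | grep -n -x -F -- "$2" | cut -d: -f1
}

# Resolve the two field names to numbers, then print them in that order.
name_col=$(field_number sample.tsv Name)
num_col=$(field_number sample.tsv RecordNum)
awk -F '\t' -v OFS='\t' -v a="$name_col" -v b="$num_col" \
    '{ print $a, $b }' sample.tsv
```

With tsv-utils 2.0 the lookup step is unnecessary: the 
'tsv-select -H -f Name,RecordNum' form shown above takes the 
names directly.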

More information is available on the tsv-utils GitHub repository, 
including documentation and pre-built binaries: 
https://github.com/eBay/tsv-utils

--Jon

