dmd command line scripting experiments and observations
Witold Baryluk
witold.baryluk at
Mon Dec 25 11:59:35 UTC 2023
For a very long time I have been using bash, grep, sed, awk, the
usual suspects on Unix, as they are super quick to type,
incremental, etc. Once the complexity is too big I usually switch
to Python (decades ago it might have been Perl or PHP).
I often embed small snippets of grep or awk in other tools that
just need to do something with some text files, for example to
do some pre-processing for plotting in Gnuplot.
I even wrote my own line-column processing "language", called
`kolumny` (over a decade ago), to help with similar tasks. And
while it does work well, I rarely use it (once a year these days,
sadly), because it is not really a full language.
Yesterday I needed to do some simple processing before
plotting in gnuplot:
set ylabel "locking rate [M/s]"
plot "<grep ^mx1 foo.txt" using 3:($3*$4/$9/1e6) title "RWMutex", \
     "<grep ^mx2 foo.txt" using 3:($3*$4/$9/1e6) title "drwMutex"
where a file `foo.txt` has things like this:
mx1 32 1 10000000 0.0001 1 100 100 0.552091302 552.091302ms
mx1 32 1 10000000 0.0001 1 100 100 0.552518653 552.518653ms
mx1 32 1 10000000 0.0001 1 100 100 0.562133796 562.133796ms
mx2 32 1 10000000 0.0001 1 100 100 0.613519317 613.519317ms
mx2 32 1 10000000 0.0001 1 100 100 0.602255619 602.255619ms
mx1 32 2 10000000 0.0001 1 100 100 1.489152483 1.489152483s
mx1 32 2 10000000 0.0001 1 100 100 1.469110205 1.469110205s
mx2 32 64 10000000 0.0001 1 100 100 8.84282034 8.84282034s
Ok, so my gnuplot script works, but now I have a lot of points
for each x.
I would like to take the max throughput (lowest time, in column 9),
and only use that. Or maybe the median. (Definitely not the average.)
And I didn't feel like doing this in `awk`.
So I started exploring rdmd a bit:
First or second attempt:
`rdmd --eval='float[][int] g; foreach (line;
stdin.byLine.filter!(x=>x.matchFirst("^mx1"))) { auto a =
line.split; auto c=a[2].to!int; auto rate=c * a[3].to!float /
a[8].to!float; g[c] ~= rate; } foreach (c, values; g) {
writeln(c, " ", values.reduce!max); }' < foo.txt`
Removed redundant `()` to make code shorter.
45 2.29471e+07
26 2.25617e+07
52 2.26505e+07
43 2.30352e+07
17 2.32184e+07
34 2.33697e+07
60 2.26649e+07
61 2.25918e+07
Ok. "Works"
That is not good for a few reasons.
1) Still kind of long.
2) Cannot easily be embedded into a gnuplot script, because it
uses both `'` and `"`.
3) I group by `c` using a map (associative array), but that
means the output will be unordered when printed. If I switch to
plotting with lines instead of the default points, I want
ascending order, otherwise the plot will be a chaos of lines. This
could be fixed by piping the output to `sort -n -k 1`, but that
a) is less efficient, and b) makes things even longer. The obvious
way would be to remember the previous `c` and aggregate on the
fly: faster, ordered by design (because the input is ordered),
and using less memory.
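The "remember the previous `c` and aggregate on the fly" idea is independent of D. As a rough illustration only (Python rather than D, a made-up function name `max_rate_per_group`, and made-up sample rows in the format of `foo.txt`), it could look like this, assuming rows with the same `c` are adjacent in the input:

```python
from itertools import groupby

def max_rate_per_group(lines, prefix="mx1"):
    """Yield (c, max_rate) for each run of consecutive rows sharing column 3.

    Assumes rows with the same c are adjacent (as in foo.txt), so only one
    group is held at a time and the output order follows the input order.
    """
    rows = (line.split() for line in lines if line.startswith(prefix))
    for c, group in groupby(rows, key=lambda a: int(a[2])):
        # rate = c * iterations / seconds, the same formula as the one-liner
        yield c, max(c * float(a[3]) / float(a[8]) for a in group)

# Made-up rows, only mimicking the column layout of foo.txt.
sample = [
    "mx1 32 1 10000000 0.0001 1 100 100 0.5 500ms",
    "mx1 32 1 10000000 0.0001 1 100 100 0.25 250ms",
    "mx2 32 1 10000000 0.0001 1 100 100 0.125 125ms",
    "mx1 32 2 10000000 0.0001 1 100 100 1.0 1s",
]

for c, rate in max_rate_per_group(sample):
    print(c, rate)
```

Swapping `max` for `statistics.median` would give the median variant mentioned earlier, at the cost of holding one group's values in memory.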
Next attempt (not fully correct), trying to rectify a few things
incrementally, not shooting for the perfect solution yet, just
exploring a bit more:
`rdmd --eval='auto prev_c = 0; auto max_rate=0.0; foreach (a;
stdin.byLine.filter!(x=>x.matchFirst(``^mx1``)).map!split) { auto
c=a[2].to!int; auto rate=c * a[3].to!float / a[8].to!float; if
(prev_c != c) { writeln(prev_c, `` ``, max_rate); max_rate=0;}
prev_c=c;max_rate=max(max_rate,rate);}' < foo.txt`
0 0
1 1.81999e+07
2 1.3897e+07
3 1.68113e+07
4 1.77501e+07
5 1.77466e+07
6 2.00162e+07
7 2.00754e+07
8 2.24083e+07
9 2.43998e+07
63 2.24421e+07
Some progress, but not quite there (obviously). We do not output
a line for 64, because the `prev_c != c` check only happens inside
the loop, so the last group is never flushed. We need one more
`writeln` after the loop.
Let's fix this then.
`rdmd --eval='auto prev_c = 0; auto max_rate=0.0; foreach (a;
stdin.byLine.filter!(x=>x.matchFirst(``^mx1``)).map!split) { auto
c=a[2].to!int; auto rate=c * a[3].to!float / a[8].to!float; if
(prev_c != c) { writeln(prev_c, `` ``, max_rate); max_rate=0;}
prev_c=c;max_rate=max(max_rate,rate);} writeln(prev_c, `` ``,
max_rate); max_rate=0;' < foo.txt`
A bit hairy, but it does the job. (It still prints the initial
`0 0` line, but that is easy to fix with something like
`if (prev_c != c && prev_c)`.)
Let's reimplement it in awk, for an unfair comparison:
`awk 'BEGIN{prev_c = 0; max_rate=0.0;} /^mx1/{ c=$3;
rate=c*$4/$9; if (prev_c != c) { print prev_c, max_rate;
max_rate=0;} prev_c=c;if(rate>max_rate)max_rate=rate;} END{print
prev_c, max_rate;}' < foo.txt`
Quite a bit shorter.
There are things that would be hard to do in D, but still possible.
`auto x = ...`, replace with `x:=...` (like in Go). This could be
done with a simple preprocessor (even just a `sed -E -e
's/([a-zA-Z0-9_]+) *:=/auto \1=/g'`) before passing to `gdmd`.
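To make the effect of such a preprocessor concrete, here is a small Python sketch of the same substitution. The function name `expand_short_decl` is made up for this example; the pattern mirrors the `sed` expression above, with the identifier restricted to not start with a digit.

```python
import re

def expand_short_decl(code):
    """Rewrite `name := expr` into `auto name = expr` (purely textual)."""
    return re.sub(r"([a-zA-Z_][a-zA-Z0-9_]*) *:=", r"auto \1 =", code)

print(expand_short_decl("prev_c := 0; max_rate:=0.0;"))
# Being purely textual, it also rewrites `:=` inside string literals:
print(expand_short_decl('echo("a:=b")'))
```

The second call shows the main limitation of a regex-only approach: it has no notion of D string literals or comments, so a `:=` inside them would be rewritten as well.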
`/regexp/{} /regexp/{}`, and `foreach (a......)`, replace with
an abstraction that does this for us.
This should be possible to implement, probably with an API like this:
each( // implicitly on stdin.byLine()
    (a, m) => { // a is just the line split on whitespace,
                // m is the regexp match groups (optional)
        c := a[2].to!int;
    },
    ..., // more matchers.
    ..., // All matching matchers are executed in order, not just the first one.
    ..., // A delegate with no preceding matcher is equivalent to ".*" matching.
);
We can accept both `void` delegates and ones returning `int`,
e.g. if we want to do something like a loop `break`. But in
scripting, instead of a `break` in the main loop, you will usually
just exit the whole script. So not super useful. (`continue` works
by just returning from a void delegate, so not a concern.)
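The dispatch rules described above (every matching matcher runs, in order; a handler without a matcher behaves like `.*`) can be sketched in a few lines. This is Python rather than D and only an illustration of the semantics; the `(regex, action)` tuple convention and the explicit `lines` argument are assumptions of this sketch, not part of the proposed D API.

```python
import re

def each(lines, *handlers):
    """Run awk-style (pattern, action) pairs over lines.

    Each handler is either a (regex, action) tuple or a bare action, which
    is treated like a ".*" pattern. Every matching handler runs, in order.
    """
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split()
        for handler in handlers:
            if isinstance(handler, tuple):
                pattern, action = handler
            else:
                pattern, action = ".*", handler
            m = re.search(pattern, line)
            if m:
                action(fields, m)

seen = []
each(
    ["mx1 32 1", "mx2 32 2"],
    (r"^mx1", lambda a, m: seen.append(("mx1", a[2]))),
    (r"^mx(\d)", lambda a, m: seen.append(("any", m.group(1)))),
    lambda a, m: seen.append(("all", len(a))),
)
print(seen)
```

Returning early from an action plays the role of `continue` for that handler only; the remaining handlers still run for the same line.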
A more advanced `each` could allow multiple predicates, multiple
regexps, and possibly some conditions (`&&`, `||`). We could
invent a mini DSL for this, or use operator overloading (maybe,
as not all operators are overloadable in D, e.g. overloading
comparison operators is very problematic in D; it was possible in
D1, but not in D2).
We can also add the original full line (unsplit) as the first
element of `a`, so `a[0]` is just like awk's `$0` (whole line),
and `a[1]` is just like `$1` (columns, with the first one being `$1`).
Note: We do not want to put this `each` implicitly into a runner
script, because often we want to do things before it. This could
be done with something like `--begin` and `--end`, but that is
more verbose. Plus `--begin` and `--end` would make it harder to
port command line code to a file-based script.
On the other front, `to!int`, we can do better too. One option is
to provide short helper functions for common type conversions
like `to!int`. So instead of:
c := a[2].to!int;
we do
c := a[2].INT;
rate:=c * a[3].F32 / a[8].F32;
Ok, how about making `each` smarter, so that it does not just
split the input line into columns of strings (`string[]`), but
instead puts each value into a custom library type that provides
dynamic typing? Something like `DynamicTypeValue[]`, with operator
overloading for arithmetic, comparison and toString functions.
c := a[2];
rate := c * a[3] / a[8];
Surely possible.
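To illustrate the intended semantics (the raw column text is kept, and arithmetic or comparison works only if the text parses as a number), here is a small sketch. It is written in Python, which is dynamically typed already, so it only mimics the behaviour the D type would need; the class name `DynamicValue` and its methods are made up for this example.

```python
from functools import total_ordering

@total_ordering
class DynamicValue:
    """Holds the raw column text; acts as a number only if it parses as one."""

    def __init__(self, text):
        self.text = str(text)

    def number(self):
        try:
            return float(self.text)
        except ValueError:
            # Strong typing: no silent coercion of non-numeric text to 0.
            raise TypeError(f"not a number: {self.text!r}") from None

    @staticmethod
    def _as_number(other):
        if isinstance(other, DynamicValue):
            return other.number()
        return float(other)

    def __mul__(self, other):
        return DynamicValue(self.number() * self._as_number(other))

    __rmul__ = __mul__

    def __truediv__(self, other):
        return DynamicValue(self.number() / self._as_number(other))

    def __lt__(self, other):
        return self.number() < self._as_number(other)

    def __eq__(self, other):
        return self.number() == self._as_number(other)

    def __str__(self):
        return self.text

# Made-up row in the column layout of foo.txt.
a = [DynamicValue(x) for x in "mx1 32 2 10000000 0.0001 1 100 100 0.5 500ms".split()]
c = a[2]
rate = c * a[3] / a[8]
print(c, rate, max(0.0, rate))
```

Multiplying two non-numeric columns such as `a[0] * a[1]` raises `TypeError` here, which corresponds to the "dynamic but still strong typing" goal.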
Let's also add an awk-like print (similar to Python's `print`),
which puts a space between each argument for us, and for good
measure, let's use the old PHP `echo` construct, to save one extra
character.
How this would look:
`./dm 'prev_c := 0;max_rate:=0.0; each("^mx1", (DT[] a){ c:=a[2];
rate:=c * a[3] / a[8]; if (prev_c != c) { echo(prev_c, max_rate);
max_rate=0;} prev_c=c;max_rate=max(max_rate,rate);});
echo(prev_c, max_rate);' ./foo.txt`
That looks pretty nice. Not optimal, but not too bad. Only 14
more characters than awk (203 bytes, vs 189).
Note: I do not quite have a full solution for `DynamicTypeValue`
(hashing support is missing, so it cannot be used as a key in an
associative array), but the prototype is kind of working.
Unfortunately it is not quite working, even with some tries:
$ ./dm ....
Error: cannot implicitly convert expression `c` of type `DT` to
`int`
Error: cannot implicitly convert expression `rate` of type `DT`
to `double`
Failed: ["/usr/bin/dmd", "-d", "-v", "-o-",
This boils down to:
int prev_c = 0;
prev_c = DT("1");
not compiling. I defined `opCast`, but this is only for explicit
casts. If I were able to allow semi-implicit casts for my type,
that would work perfectly.
There was also a small issue with `max`,
`std.algorithm.comparison.max` complains a bit about comparing
`DT` and `double`:
Error: none of the overloads of template
`std.algorithm.comparison.max` are callable using argument types
`!()(double, DT)`
Candidates are: `max(T...)(T args)`
with `T = (double, DT)`
whose parameters have the following constraints:
` T.length >= 2
> !is(CommonType!T == void)
` `~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
`max(T, U)(T a, U b)`
with `T = double,
U = DT`
whose parameters have the following constraints:
` > is(T == U)
- is(typeof(a < b))
` `~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
Tip: not satisfied constraints are marked with `>`
Fair enough, I could provide my own `max` and `min`, and possibly
a few more functions (e.g. functions like std.math.sqrt, abs, etc.),
to operate easily on DT. It is hard to do this fully transparently
for everything, but it should be possible to cover at least
everything that `awk` has too.
Doing the `$1` -> `a[0]` translation is trivial using some regular
expressions. It could save 2 characters, but that is not a lot.
In summary:
In pure form, the D language and rdmd are usable, but rather
verbose (mostly due to `auto`, long function names like
`writeln`, and the extra arguments they require for putting spaces
between arguments). But still usable. The script I wrote is
probably close to the limit of what would be acceptable, which
is not great, because the example script does very little.
With some hacks, preprocessing, and an extra library type and
functions, it is possible to make usage way easier and code way
shorter, very comparable to awk. (I didn't test other
functions, like opening and operating on files, but it should
not be too dissimilar.)
Some operator overloading facilities of the D programming
language are lacking to make this fully usable, though.
The inability to opt in to implicit `opCast` conversions makes
it impossible to develop a fully dynamic and easy-to-use solution.
What do you think?
For reference, the `dm` script:
#!/usr/bin/env python3
import os
import re
import subprocess
import sys
code = sys.argv[1]
filenames = sys.argv[2:]
header = """
struct DT {
    string x_;
    this(string x) { x_ = x; }
    this(float x) { x_ = to!string(x); }
    this(int x) { x_ = to!string(x); }
    // string toString() const { return to!string(x_); }
    string toString() const { return x_; }
    // True if the stored text parses as T.
    bool can(T)() const {
        try { to!T(x_); } catch (Exception) { return false; } return true;
    }
    bool numeric() const { return can!double(); }
    double number() const { return to!double(x_); }
    auto opBinary(string op)(const ref DT other) const {
        if (numeric() && other.numeric()) {
            const n = number();
            const m = other.number();
            return DT(to!string(mixin("n " ~ op ~ " m")));
        }
        throw new Exception("cannot perform " ~ op ~ " on string");
    }
    auto opBinary(string op, Other)(const ref Other other) const {
        if (numeric()) {
            // static assert(is(other : float, double, int, uint));
            // TODO(baryluk): We could maybe support adding string too.
            // Not super useful tho.
            // I want dynamic typing, but still to be strong typing.
            // Not weak like PHP or JavaScript.
            return DT(to!string(mixin("number() " ~ op ~ " other")));
        }
        // We could possibly allow number + string, and string +
        // string, and string * int
        throw new Exception("cannot perform " ~ op ~ " on string");
    }
    // opUnary, -, ~
    // negation, ! - i.e. !c, where c is a string representing an
    // integer: if c == "0", then !c will be true.
    // todo support some bool?
    int opCmp(const ref const(DT) other) const {
        if (numeric() && other.numeric()) {
            const n = number();
            const m = other.number();
            return (n > m) - (n < m);
        }
        if (!numeric() && !other.numeric()) {
            // Return -1/0/1, not a bool, so ordering works for strings too.
            return (x_ > other.x_) - (x_ < other.x_);
        }
        throw new Exception("cannot compare string with other");
    }
    int opCmp(Other)(const ref Other other) const {
        // static if (is(Other: int, float, ...));
        if (numeric()) {
            const n = number();
            return (n > other) - (n < other); // Quick hack
        }
        static if (is(Other == string)) {
            return (x_ > other) - (x_ < other);
        } else {
            throw new Exception("cannot compare string with other");
        }
    }
    bool opEquals(const ref DT other) const {
        return this.opCmp(other) == 0;
    }
    bool opEquals(Other)(const ref Other other) const {
        return this.opCmp(other) == 0;
    }
    // This also handles !value, which D rewrites to !cast(bool)value,
    // so return the truthiness itself: non-zero number or non-empty string.
    bool opCast(T)() const if (is(T == bool)) {
        if (numeric()) {
            return number() != 0;
        }
        return x_.length != 0;
    }
    auto opCast(T)() const if (is(T == string)) {
        return x_;
    }
    auto opCast(T)() const { // if T is numeric, i.e. int, double
        pragma(msg, "casting to", T);
        return cast(T) number();
    }
    auto opAssign(const ref DT other) {
        x_ = other.x_;
        return this;
    }
    auto opAssign(Other)(const ref Other other) {
        x_ = to!string(other);
        return this;
    }
}
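void echo(T...)(T args) {
    // Print all but the last argument, each followed by a space.
    foreach (arg; args[0..$-1]) {
        write(arg);
        write(' ');
    }
    writeln(args[$-1]);
}
void each(D)(string re, D dg) { // just an initial prototype
    foreach (line; stdin.byLine) {
        if (line.matchFirst(re)) {
            // byLine reuses its buffer, so copy each column with idup.
            dg(line.split().map!(x => DT(x.idup)).array);
        }
    }
}
"""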
code = re.sub(r"([a-zA-Z_][a-zA-Z0-9_]*) *:=", r" auto \1=", code)
# print(header+code)
with subprocess.Popen(["rdmd", f"--eval={header}{code}"],
                      stdin=subprocess.PIPE, text=True) as p:
    # Feed the contents of all input files to the D program's stdin.
    for filename in filenames:
        with open(filename) as f:
            for line in f:
                p.stdin.write(line)