More on "Component Programming"
bearophile
bearophileHUGS at lycos.com
Mon May 27 14:36:10 PDT 2013
This simple task on Rosettacode site is useful to show some uses
of Phobos and the "component programming" recently discussed by
Walter (other languages use a different name to denote the same
idea).
Given a dictionary file of different words, it asks to find any
of the longest anagram pairs, that also share no equal chars in
the same position (so they are named deranged anagrams):
http://rosettacode.org/wiki/Anagrams/Deranged_anagrams#D
There are many ways to do this in D+Phobos. The following
solution is long, but it's quite fast (the "warmed up" run-time
is only about 0.03 seconds with a dictionary of about 200 KB, on
an old CPU core), I have chosen it over simple solutions because
it gives me a chance to discuss certain things:
import std.stdio, std.file, std.algorithm, std.string,
std.typecons, std.range, std.functional;
auto findDeranged(in string[] words) pure /*nothrow*/ {
//return words.pairwise.filter!(ww=> ww[].zip.all!q{a[0] !=
a[1]});
Tuple!(string, string)[] result;
foreach (immutable i, const w1; words)
foreach (const w2; words[i + 1 .. $])
if (zip(w1, w2).all!q{ a[0] != a[1] })
result ~= tuple(w1, w2);
return result;
}
void main() {
Appender!(string[])[30] wClasses;
foreach (word;
std.algorithm.splitter("unixdict.txt".readText))
wClasses[$ - word.length] ~= word;
"Longest deranged anagrams:".writeln;
foreach (words; wClasses[].map!q{ a.data
}.filter!(not!empty)) {
string[][const ubyte[]] anags; // Assume ASCII input.
foreach (w; words)
anags[w.dup.representation.sort().release.idup] ~= w;
auto pairs = anags.byValue.map!findDeranged.join;
if (!pairs.empty)
return writefln(" %s, %s", pairs.front[]);
}
}
- - - - - - - - - - - -
That program contains five foreach loops. Foreach loops are not
evil and I like them, but for a certain kind of programming
(discussed recently by Walter, and also common in F# and other
languages) every time you use a for/foreach it's one small
"failure" for the standard library :-)
The following weird (untested and maybe buggy) program replaces
all the foreach loops with higher order functions and other
library functions. It can't be compiled because it uses some
things not yet present in Phobos (on the Rosettacode page there
is also a slower and simpler D solution of this problem that uses
only one foreach):
void main() {
import std.stdio, std.file, std.algorithm, std.string,
std.typecons, std.range, std.functional;
"unixdict.txt"
.readText
.splitter
.classify!q{ a.length }
.map!q{ a.values } // .byValue is almost OK.
.array
.schwartzSort!q{ -a[0].length }
.release
.map!(words => words
.classify!q{ a
.dup
.representation
.sort()
.release
.idup }
.byValue
.map!(words => words
.pairwise
.filter!(ww => ww[]
.zip
.all!q{ a[0] !=
a[1] }))
.join)
.filter(not!empty)
.front[]
.binaryReverseArgs!writefln(" %s, %s");
}
A copy of the same code if the newsgroup has messed up the
formatting and indents, turning that code into a soup:
http://codepad.org/L4TyDkcQ
I am not suggesting you to write whole D script-like programs in
this strange style. But I think Phobos should offer all the tools
to write a program like this, because even if you don't want to
write a whole little program in this style, you sometimes want to
use some parts of it or some other parts of it, so I think all
the most common and distinct micro-patterns should be contained
in Phobos.
- - - - - - - - - - - -
"binaryReverseArgs" is in the std.functional module. Here it
allows the use of writefln in UFCS style, inverting the
formatting string position. I think I'd like a shorter and more
handy name for it. In Haskell it's named "flip", and its usage is
not uncommon.
- - - - - - - - - - - -
"classify" is a simple function, that given a forward range of T
and an optional function T->K, returns an associative array
T[][K]. (Arrays are used by default as values. But maybe you can
optionally specify a different type of values, like Appenders,
Arrays, sets, etc). (Currently in Phobos the only function to
build an associative array is std.array.assocArray, but here we
need something different).
(http://d.puremagic.com/issues/show_bug.cgi?id=5502 ).
[1, 7, 6, 3, 2].classify!(x => x % 2 ? "odd": "even").writeln;
==>
["odd": [1, 7, 3], "even": [6, 2]]
- - - - - - - - - - - -
"pairwise" is a very useful lazy range similar to
cartesianProduct, but it yields only the ordered pairs, so they
cover only about half (a triangle) of the square matrix of the
possibilities.
(http://d.puremagic.com/issues/show_bug.cgi?id=6788 ).
This simple example shows the difference:
import std.stdio, std.algorithm;
void main() {
auto data = [1, 2, 3, 4];
foreach (xy; cartesianProduct(data, data))
writeln(xy);
}
Generates the tuples:
(1, 1)
(2, 1)
(3, 1)
(4, 1)
(1, 2)
(2, 2)
(3, 2)
(4, 2)
(1, 3)
(2, 3)
(3, 3)
(4, 3)
(1, 4)
(2, 4)
(3, 4)
(4, 4)
While:
import std.stdio, std.range;
void main() {
auto data = [1, 2, 3, 4];
foreach (tup; pairwise(data))
writeln(tup);
}
Should generate:
(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)
In the Python standard library there is a lazy generator that's
more general than pairwise:
>>> from itertools import combinations
>>> list(combinations([1, 2, 3, 4], 2))
[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
So if you prefer that more general solution the D code becomes:
...
.map!(words => words
.combinations(2)
.filter!(ww => ww[]
...
Bye,
bearophile
More information about the Digitalmars-d
mailing list