Recommendations on avoiding range pipeline type hell

H. S. Teoh hsteoh at quickfur.ath.cx
Sun May 16 15:27:43 UTC 2021


On Sat, May 15, 2021 at 11:25:10AM +0000, Chris Piker via Digitalmars-d-learn wrote:
[...]
> Basically the issue is that if one attempts to make a range based
> pipeline aka:
> 
> ```d
> auto mega_range = range1.range2!(lambda2).range3!(lambda3);
> ```
> Then the type definition of mega_range is something in the order of:
> 
> ```d
>   TYPE_range3!( TYPE_range2!( TYPE_range1, TYPE_lambda2 ), TYPE_lambda3 );
> ```
> So the type tree builds to infinity and the type of `range3` is very
> much determined by the lambda I gave to `range2`.  To me this seems
> kinda crazy.

Perhaps it's crazy, but as others have mentioned, these are Voldemort
types; you're not *meant* to know what the concrete type is, merely that
it satisfies the range API. It's sorta kinda like the compile-time
functional analogue of a Java-style interface: you're not meant to know
what the concrete derived type is, just that it implements that
interface.
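
To make that concrete, here is a minimal sketch of code that consumes a range
purely through the range API, without ever naming its concrete type.  The
helper name `sumAll` is hypothetical, made up for this illustration; the
constraint uses `isInputRange` and `ElementType` from std.range.primitives.

```d
import std.algorithm : map;
import std.range : iota;
import std.range.primitives : ElementType, isInputRange;

// Hypothetical helper: accepts *any* input range whose elements convert
// to long.  The concrete type R is never spelled out by the caller.
long sumAll(R)(R r)
if (isInputRange!R && is(ElementType!R : long))
{
    long total = 0;
    foreach (x; r)
        total += x;
    return total;
}

void main()
{
    // The type of `doubled` is a Voldemort type from std.algorithm.
    auto doubled = iota(5).map!(x => x * 2);
    assert(sumAll(doubled) == 20);      // 0 + 2 + 4 + 6 + 8

    // A plain array satisfies the same API.
    assert(sumAll([1, 2, 3]) == 6);
}
```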


[...]
> But, loops are bad.  On the D blog I've seen knowledgeable people say
> all loops are bugs.

I wouldn't say all loops are bugs. If they were, why does D still have
looping constructs? :D  But it's true that most loops should be
refactored into functional-style components instead. Nested loops are
especially evil if written carelessly or uncontrollably.


> But how do you get rid of them without descending into Type Hell(tm).

Generally, when using ranges you just let the compiler infer the type
for you, usually with `auto`:

	auto mySuperLongPipeline = inputData
		.map!(...)
		.filter!(...)
		.splitter!(...)
		.joiner!(...)
		.whateverElseYouGot();
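
The same inference works across function boundaries.  As a sketch (the
function name `oddCubes` is made up for illustration), a function can return
a pipeline with an `auto` return type, and if you ever need to refer to the
type, `typeof` gives you a handle on it without spelling it out:

```d
import std.algorithm : equal, filter, map;
import std.range : iota;

// Hypothetical example: the return type is inferred, never written down.
auto oddCubes(int n)
{
    return iota(n)
        .filter!(x => x % 2 == 1)
        .map!(x => x * x * x);
}

void main()
{
    auto r = oddCubes(6);

    // Name the type via typeof instead of writing out the template tree.
    alias R = typeof(r);
    R copy = r;

    assert(copy.equal([1, 27, 125]));
}
```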


> Is there any way to get some type erasure on the stack?

You can wrap a range in a heap-allocated OO object using the helpers in
std.range.interfaces, e.g., .inputRangeObject.  Then you can use the
interface as a handle to refer to the range.
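
A minimal sketch of what that looks like, assuming a pipeline of `int`
elements (the function name `evenSquares` is hypothetical).  Note that the
wrapper is a class object, so this is type erasure on the heap, not on the
stack, and each element access goes through a virtual call:

```d
import std.algorithm : equal, filter, map;
import std.range : iota;
import std.range.interfaces : InputRange, inputRangeObject;

// The return type is now a short, fixed interface type, no matter how
// long the pipeline behind it is.
InputRange!int evenSquares(int n)
{
    return iota(n)
        .map!(x => x * x)
        .filter!(x => x % 2 == 0)
        .inputRangeObject;   // wraps the pipeline in a heap-allocated object
}

void main()
{
    InputRange!int r = evenSquares(6);
    assert(r.equal([0, 4, 16]));
}
```

The interface type can be stored in struct fields, arrays, or function
signatures without dragging the template tree along with it.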

Once I wrote a program almost entirely in a single pipeline.  It started
from a math function, piped into an abstract 2D array (a generalization
of ranges), filtered, transformed, mapped into a color scheme,
superimposed on top of some rendered text, then piped to a
pipeline-based implementation of PNG-generation code that produced a
range of bytes in a PNG file that's then piped into
std.stdio.File.bufferedWrite.  The resulting type of the main pipeline
was so hilariously huge that in an older version of dmd it produced a
mangled symbol several *megabytes* long (by that I mean the *name* of
the symbol was measured in MB), not to mention that it tickled several
O(N^2) algorithms in dmd that caused it to explode in memory consumption
and slow down to an unusable crawl.

The mangled symbol problem was fixed shortly afterwards, probably partly due to my
complaint about it :-P -- kudos to Rainer for the fix!

Eventually I inserted this line into my code:

        .arrayObject    // Behold, type erasure!

(which is my equivalent of .inputRangeObject) and immediately observed a
significant speedup in compilation time and reduction in executable
size. :-D

The pipeline-based PNG emitter also leaves a lot to be desired in terms
of runtime speed... if I were to do this again, I'd go for a traditional
imperative-style PNG generator with hand-coded loops instead of the
fancy pipeline-based one I wrote.

Pipelines are all good and everything, but sometimes you *really* just
need a good ole traditional OO-style heap allocation and hand-written
loop.  Don't pick a tool just because of the idealism behind it, I say;
pick the tool best suited for the job.


T

-- 
Computers aren't intelligent; they only think they are.

