[Issue 6256] [patch] std.algorithm.map does not support static arrays and has 'length' for narrow strings.

via Digitalmars-d-bugs digitalmars-d-bugs at puremagic.com
Fri Dec 5 17:31:33 PST 2014


https://issues.dlang.org/show_bug.cgi?id=6256

--- Comment #5 from hsteoh at quickfur.ath.cx ---
Sure, open another enhancement request for this.

GDC already performs loop unrolling (and LDC probably does too), so it's not
clear that instantiating many copies of a function, identical except for the
static array length, would give any advantage over a single general-purpose
function that takes the length as a runtime parameter; the extra copies are a
lot of template bloat. Advanced loop unrolling, of the kind that I've observed
GDC do, can unroll a loop into blocks of n iterations. If the incoming array
length is exactly n, it runs through the entire block with no conditionals
until the end; if the length is < n, it branches to the middle of the block,
so that again no conditionals are evaluated until the end. A single function
also reduces pressure on the CPU cache: there is no risk of the code page
containing the function being filled with other instantiations of the same
function for other static array lengths, which would cost a RAM roundtrip
when calling and returning from the function. The single function could
instead fit on the same page as its caller, eliminating that roundtrip.
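
To make the comparison concrete, here is a minimal sketch in D of the two
designs, plus a hand-written imitation of the "branch into the middle of the
block" technique. All names (sumStatic, sumSlice, sumJumpIn) are hypothetical
and chosen only for illustration, and the switch is not actual GDC output; it
just mimics the shape of the code described above for a block of n = 4.

```d
// Variant 1: a template parameterized on the static array length.
// Every distinct N used in the program produces its own instantiation.
int sumStatic(size_t N)(ref const int[N] arr)
{
    int total = 0;
    foreach (i; 0 .. N) // N is a compile-time constant here
        total += arr[i];
    return total;
}

// Variant 2: one general-purpose function; the length arrives at run time
// as part of the slice.
int sumSlice(const(int)[] arr)
{
    int total = 0;
    foreach (x; arr)
        total += x;
    return total;
}

// Hand-written imitation of an unrolled block of 4 iterations: a single
// branch on the length jumps into the block, and execution then runs to
// the end with no further length checks.
int sumJumpIn(const(int)[] arr)
{
    assert(arr.length <= 4, "this sketch only handles up to 4 elements");
    int total = 0;
    switch (arr.length)
    {
        case 4: total += arr[3]; goto case 3;
        case 3: total += arr[2]; goto case 2;
        case 2: total += arr[1]; goto case 1;
        case 1: total += arr[0]; goto case 0;
        case 0: break;
        default: assert(0, "unreachable: length checked above");
    }
    return total;
}

void main()
{
    int[3] a = [1, 2, 3];
    int[4] b = [1, 2, 3, 4];

    // Two separate instantiations: sumStatic!3 and sumStatic!4.
    assert(sumStatic(a) == 6);
    assert(sumStatic(b) == 10);

    // One shared function body; a[] slices the static array.
    assert(sumSlice(a[]) == 6);
    assert(sumSlice(b[]) == 10);

    // One shared block, entered at different points.
    assert(sumJumpIn(a[]) == 6);
    assert(sumJumpIn(b[]) == 10);
}
```

Whether a given compiler actually emits something like sumJumpIn for
sumSlice depends on the compiler, its flags, and the target, so inspect the
generated assembly before relying on it.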

In any case, the complexity of modern hardware complicates the picture, so
it's hard to say which approach will give better performance. If you have many
different-sized static arrays in your program, the template bloat can become
quite horrendous: e.g., if you have static arrays of every length from 1 to 10
and every instantiation is fully unrolled, you'd have 1 + 2 + ... + 10 = 55
copies of the loop body. An advanced loop unroller could accomplish basically
the same thing at the cost of just 1 extra branch and without any of the
template bloat: at most 10 copies of the loop body are needed for every such
array to be looped over with no conditionals until the end. Exactly where to
draw the line is quite machine-dependent, and I would rather let the optimizer
do its job than force template instantiations that may not actually do any
better.
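
The counts in the example above can be checked with a few lines of D. This is
only a sketch of the arithmetic: the function name is hypothetical, and it
assumes that a fully unrolled instantiation for length n contains exactly n
copies of the loop body.

```d
// Total loop-body copies if there is one fully unrolled instantiation
// for each static array length from 1 to maxLen.
size_t copiesIfEachLengthUnrolled(size_t maxLen)
{
    size_t copies = 0;
    foreach (n; 1 .. maxLen + 1)
        copies += n; // the instantiation for length n holds n copies
    return copies;
}

void main()
{
    enum size_t maxLen = 10;

    // Per-length instantiations: 1 + 2 + ... + 10 copies.
    assert(copiesIfEachLengthUnrolled(maxLen) == 55);

    // A single shared unrolled block needs at most maxLen copies.
    size_t sharedBlockCopies = maxLen;
    assert(sharedBlockCopies == 10);
}
```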
