range behaviour

via Digitalmars-d digitalmars-d at puremagic.com
Wed May 14 14:05:21 PDT 2014


On Wednesday, 14 May 2014 at 19:32:57 UTC, Marc Schütz wrote:
> Compile this with:
>
>     ldc2 -O3 -release -output-s demo.d
>
> and have a look at the generated assembly code for the 
> Filter.sum() functions. All of them have been inlined as much 
> as possible, and in particular the variable `is_initialized` 
> has disappeared, even in the version that uses an external 
> (unknown to the compiler) predicate.

I only have DMD installed, but I'll take your word for it. 
However, I don't feel confident that compilers will ever catch 
up, even for simple loops.

Take floating point, for instance. Floating-point math is 
inexact, which means the compiler would have to guess what kind 
of accuracy you are happy with… It can't, so it can't optimize 
very well, even when unrolling loops. Why is that? Because even 
if it can emit SIMD instructions, it cannot decouple the 
dependencies between consecutive elements without letting the 
parallel calculations drift apart over time. It has to assume 
the worst case.

If you have a simple generator like this:

sample[n] = f( sample[n-1] )

you could in theory do

sample[n] = f(f(f(f( sample[n-4] ))))
sample[n+1] = f(f(f(f( sample[n-3] ))))
sample[n+2] = f(f(f(f( sample[n-2] ))))
sample[n+3] = f(f(f(f( sample[n-1] ))))

But because of floating-point inaccuracies you would then risk 
that sample[BIGNUMBER] and sample[BIGNUMBER+1] end up completely 
disconnected, which could be a disaster. So only hand 
optimization, with analysis of the math and its numerical 
stability, is enough to get the SIMD speed-up safely.

> This means that we can have implementations even without a 
> guaranteed call to `empty`, and still have comparable 
> performance to eagerly initialized ranges where it matters most.

Maybe you can. :-) I will have to get my hands on ldc and try to 
create a counterexample… Hm.
