range behaviour
via Digitalmars-d
digitalmars-d at puremagic.com
Wed May 14 14:05:21 PDT 2014
On Wednesday, 14 May 2014 at 19:32:57 UTC, Marc Schütz wrote:
> Compile this with:
>
> ldc2 -O3 -release -output-s demo.d
>
> and have a look at the generated assembly code for the
> Filter.sum() functions. All of them have been inlined as much
> as possible, and in particular the variable `is_initialized`
> has disappeared, even in the version that uses an external
> (unknown to the compiler) predicate.
I only have DMD installed, but I'll take your word for it.
However, I don't feel confident that compilers will ever catch
up, even for simple loops.
Take floating point, for instance. Floating-point math is
inexact, which means the compiler would have to guess how much
accuracy you are happy with… It can't, so it can't optimize
very well, even when unrolling loops. Why is that? Because even
if it can emit SIMD instructions, it cannot decouple the
dependencies between consecutive elements without introducing
drift between the calculations over time. It has to assume the
worst case.
If you have a simple generator like this:
sample[n] = f( sample[n-1] )
you could in theory do
sample[n] = f(f(f(f( sample[n-4] ))))
sample[n+1] = f(f(f(f( sample[n-3] ))))
sample[n+2] = f(f(f(f( sample[n-2] ))))
sample[n+3] = f(f(f(f( sample[n-1] ))))
But because of floating-point inaccuracies you would then risk
that sample[BIGNUMBER] and sample[BIGNUMBER+1] end up completely
disconnected, which could be a disaster. So only hand
optimization, together with analysis of the math and its
stability, is enough to get the SIMD speed-up.
> This means that we can have implementations even without a
> guaranteed call to `empty`, and still have comparable
> performance to eagerly initialized ranges where it matters most.
Maybe you can. :-) I will have to get my hands on ldc and try to
create a counterexample… Hm.