My ACCU 2016 keynote video available online
Jens Müller via Digitalmars-d-announce
digitalmars-d-announce at puremagic.com
Thu May 19 01:12:13 PDT 2016
On Monday, 16 May 2016 at 13:46:11 UTC, Andrei Alexandrescu wrote:
> Uses D for examples, showcases Design by Introspection, and
> rediscovers a fast partition routine. It was quite well
> received. https://www.youtube.com/watch?v=AxnotgLql0k
>
> Andrei
Nice presentation.
The code applying the sentinel optimization assumes the input is mutable.
That needs to be checked for. It's fine for partition, because partition is
assumed to work in place, but for other algorithms it's not so obvious. It's
a pity that the optimization only works for non-const input: it conflicts
with the advice to make input const if the function doesn't change it, which
makes the optimization less likely to be applicable. One might relax the
const requirement to mean "the input is identical at the function's return
to what it was at its beginning", but that's a different story, I guess.
Coming up with another implementation might also work, using chain or so,
but typically the sentinel optimization assumes mutability.
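To make the mutability point concrete, here is a minimal sketch (my own illustration, not code from the talk) of the classic sentinel trick for linear search: plant the needle in the last slot so the inner loop needs no bounds check, then restore the original element afterwards. The input is bit-for-bit identical at return, yet it cannot be typed const:

---
// Sentinel linear search: temporarily overwrites a[$ - 1], then restores it.
// Requires mutable input even though the array is unchanged on return.
size_t findSentinel(int[] a, int needle)
{
    if (a.length == 0) return 0;
    immutable last = a[$ - 1];
    a[$ - 1] = needle;           // plant the sentinel
    size_t i = 0;
    while (a[i] != needle) ++i;  // no i < a.length check needed
    a[$ - 1] = last;             // restore: input identical at return
    if (i + 1 == a.length && last != needle)
        return a.length;         // only the sentinel matched
    return i;
}

unittest
{
    int[] a = [3, 1, 4, 1, 5];
    assert(findSentinel(a, 4) == 2);
    assert(findSentinel(a, 9) == a.length);
    assert(a == [3, 1, 4, 1, 5]); // unchanged on return
}
---

This is exactly the "identical at return" relaxation of const: the observable state is preserved, but the type system can't express it.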
I didn't get the idea behind sentinels for the sparse dot product. I picked
the smallest of the last elements (so you need bidirectional ranges) and fix
up as needed. With gdc I get a speedup (baseline over new implementation) of
1.2 in the best case and >1.0 in the worst case; on average it's about 1.1,
I would say. I expected more. How would you approach sentinels for the
sparse dot product? Can you elaborate on the idea from the video? I didn't
get it.
The baseline (dot1 in the graphs) is the straightforward version:
---
size_t i = 0, j = 0;
double sum = 0;
while (i < a.length && j < b.length)
{
    if (a[i].index < b[j].index) i++;
    else if (a[i].index > b[j].index) j++;
    else
    {
        assert(a[i].index == b[j].index);
        sum += a[i].value * b[j].value;
        i++;
        j++;
    }
}
return sum;
---
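For comparison, here is one way I could imagine applying a sentinel here (my guess at the idea, not necessarily what the video means): append a common terminating index to both arrays so the two bounds checks in the loop condition disappear, and only the (rarer) match branch tests for the sentinel. The `Pair` struct and `dotSentinel` name are my own; note that appending the sentinels mutates (or reallocates) the inputs, which is exactly the const problem from above:

---
struct Pair { size_t index; double value; }

// Sentinel variant: both arrays get a trailing entry with index size_t.max,
// so the hot branches need no bounds checks. The arrays are assumed sorted
// by index; the sentinels are appended here (mutation/allocation).
double dotSentinel(Pair[] a, Pair[] b)
{
    a ~= Pair(size_t.max, 0); // hypothetical: a real implementation would
    b ~= Pair(size_t.max, 0); // reuse spare capacity or restore afterwards
    double sum = 0;
    size_t i = 0, j = 0;
    for (;;)
    {
        if (a[i].index < b[j].index) i++;
        else if (a[i].index > b[j].index) j++;
        else
        {
            if (a[i].index == size_t.max) break; // both hit the sentinel
            sum += a[i].value * b[j].value;
            i++;
            j++;
        }
    }
    return sum;
}

unittest
{
    auto a = [Pair(0, 2.0), Pair(3, 1.0), Pair(7, 4.0)];
    auto b = [Pair(3, 5.0), Pair(7, 0.5)];
    assert(dotSentinel(a, b) == 1.0 * 5.0 + 4.0 * 0.5);
}
---

Whether this actually beats the baseline presumably depends on how well the compiler already hoists the bounds checks.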
BTW the effects vary greatly across compilers. With dmd the optimized
version is slowest and the baseline is best. Weird. With gdc the optimized
version is best, and gdc's code is always faster than dmd's. With ldc it's
really strange: slower than dmd. I assume I'm doing something wrong here.
Used compiler flags:

dmd v2.071.0
  -wi -dw -g -O -inline -release -noboundscheck
gdc (crosstool-NG 203be35 - 20160205-2.066.1-e95a735b97) 5.2.0
  -Wall -g -O3 -fomit-frame-pointer -finline-functions -frelease
  -fno-bounds-check -ffast-math
ldc (0.15.2-beta2) based on DMD v2.066.1 and LLVM 3.6.1
  -wi -dw -g -O3 -enable-inlining -release -boundscheck=off

Am I missing some flags?
I uploaded my plots:
- running time https://www.scribd.com/doc/312951947/Running-Time
- speed up https://www.scribd.com/doc/312951964/Speedup
*Disclaimer*
I hope most of this makes sense but take it with a grain of salt.
Jens
PS
It seems the mailing list interface does not work: I cannot send replies
via mail anymore. I wrote to Brad Roberts but no answer yet.