My ACCU 2016 keynote video available online
Andrei Alexandrescu via Digitalmars-d-announce
digitalmars-d-announce at puremagic.com
Thu May 19 05:04:31 PDT 2016
On 5/19/16 4:12 AM, Jens Müller wrote:
> The code applying the sentinel optimization assumes mutability of
> the input. That needs to be checked for.
Indeed. As I mentioned after discussing find, I didn't worry about those
checks assuming they were obvious.
> That's fine for partition
> because that is assumed to be in-place. But for other algorithms it's
> not so obvious. It's sad that the optimization works only for
> non-const input.
Several optimizations only apply to mutable data. Others apply to
immutable data. It's the way things go.
> I didn't get the idea behind sentinels for sparse dot product. I picked the
> smallest of the last elements (so you need bidirectional ranges) and fix
> up as
> needed. For gdc I get a speedup (baseline over new implementation) of
> 1.2 in
> best case and >1.0 in worst case. On average it's about 1.1 I would say. I
> expected more. How would you approach sentinels with the sparse dot
> product. Can
> you elaborate the idea from the video? I didn't get it.
That's the idea - to only need to bounds check on one of the three
possibilities.
What test data did you use?
10%-20% win on dot product is significant because for many algorithms
dot product is a kernel and dominates everything else. For those any win
goes straight to the bottom line.
> The base line (dot1 in the graphs) is the straightforward version
>
> ---
> size_t i,j = 0;
> double sum = 0;
> while (i < a.length && j < b.length)
> {
> if (a[i].index < b[j].index) i++;
> else if (a[i].index > b[j].index) j++;
> else
> {
> assert(a[i].index == b[j].index);
> sum += a[i].value * b[j].value;
> i++;
> j++;
> }
> }
> return sum;
> ---
That does redundant checking. There's a better baseline:
---
if (a.length == 0 || b.length == 0)
return 0;
const amax = a.length - 1, bmax = b.length - 1;
size_t i,j = 0;
double sum = 0;
for (;;)
{
if (a[i].index < b[j].index) {
if (i++ == amax) break;
}
else if (a[i].index > b[j].index) {
bumpJ: if (j++ == bmax) break;
}
else
{
assert(a[i].index == b[j].index);
sum += a[i].value * b[j].value;
if (i++ == amax) break;
goto bumpJ;
}
}
return sum;
---
Then if you add the sentinel you only need the bounds tests in the third
case.
> BTW the effects vary greatly for different compilers.
> For example with dmd the optimized version is slowest. The baseline is
> best. Weird. With gdc the optimized is best and gdc's code is always
> faster than dmd's code. With ldc it's really strange. Slower than dmd. I
> assume I'm doing something wrong here.
>
> Used compiler flags
> dmd v2.071.0
> -wi -dw -g -O -inline -release -noboundscheck
> gdc (crosstool-NG 203be35 - 20160205-2.066.1-e95a735b97) 5.2.0
> -Wall -g -O3 -fomit-frame-pointer -finline-functions -frelease
> -fno-bounds-check -ffast-math
> ldc (0.15.2-beta2) based on DMD v2.066.1 and LLVM 3.6.1
> -wi -dw -g -O3 -enable-inlining -release -boundscheck=off
>
> Am I missing some flags?
These look reasonable.
> I uploaded my plots.
> - running time https://www.scribd.com/doc/312951947/Running-Time
> - speed up https://www.scribd.com/doc/312951964/Speedup
What is dot2? Could you please redo the experiments with the modified
code as well?
Thanks!
Andrei
More information about the Digitalmars-d-announce
mailing list