[GSoC] 'Independency of D from the C Standard Library' progress and update thread
Stefanos Baziotis
sdi1600105 at di.uoa.gr
Sat Jul 6 16:10:28 UTC 2019
On Saturday, 6 July 2019 at 15:33:44 UTC, Piotrek wrote:
>
>
> I used the old repo for Dmemset. With Dmemutils it works now. I
> removed static foreach from benchmark.d in order to run gdc.
> Text results:
> https://github.com/PiotrekDlang/Dmemutils/tree/master/Dmemset/output
>
Great. Earlier today I realized that there were problems with
static foreach, so the main repo now uses only mixin.
Basically, I should have been able to do:

    version (GNU)
    {
        // mixin
    }
    else
    {
        // static foreach
    }

but that didn't work, meaning GDC still tried to compile the
static foreach.
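A likely explanation (my reading, not something confirmed in this
thread): a version block only selects which branch is compiled, but
both branches still have to parse. A compiler whose frontend does not
know the static foreach syntax rejects the file before the version
condition matters. The contents of a string mixin are only parsed when
the mixin is actually expanded, which is why the mixin route works. A
minimal sketch of that route, with made-up names (testSize, genCalls,
runAll are illustrative and not taken from the Dmemutils repo):

```d
import std.conv : to;

// Running total, only here so the effect of the generated calls is visible.
size_t total;

// Stand-in for one per-size benchmark or test instantiation (hypothetical).
void testSize(size_t n)()
{
    total += n;
}

// Evaluated at compile time (CTFE); returns source text such as
// "testSize!(1)();\ntestSize!(2)();\n".
string genCalls(size_t lo, size_t hi)
{
    string code;
    foreach (i; lo .. hi + 1)
        code ~= "testSize!(" ~ to!string(i) ~ ")();\n";
    return code;
}

void runAll()
{
    // Expands to testSize!(1)(); testSize!(2)(); testSize!(3)(); testSize!(4)();
    mixin(genCalls(1, 4));
}
```

This does the same unrolling a static foreach over 1 .. 5 would do,
but uses only plain strings and CTFE.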
Anyway, the benchmarks look good. With DMD, small sizes are not
so good, but the big ones are better. DMD is no longer the focus,
though, since the project has switched to GDC and LDC.
If you're interested, there is a lot to say regarding
optimization for DMD. Some of it has been said in this thread, as
initially the project was focused on DMD. I'm actually thinking
of writing an article so that maybe I can help the next guy that
tries to optimize for DMD. I don't think it's a good decision to
care at all about optimization in DMD, but one might. And it's
a hard road.
A tl;dr is that, for me at least, the only way to reach parity
with libc is using (inline) ASM.
But the important benchmarks are the ones for GDC and LDC. They
agree with my benchmarks on AMD, and the result is that Dmemset
reaches total parity with libc memset().
That's great to have from an Intel user as well, thanks for your
time!
>
> It seems it wasn't related to this change. Looks like heisen
> optimization.
>
Again, DMD. Quite an unpredictable compiler.
>
> Funnily enough, DMD (with Dmemset) holds the speed record, over
> 50 GB/s, copying some big block sizes.
>
DMD might have been able to get these results due to inlining
that was unrelated to the actual function (i.e. the benchmark
code got inlined).
>
> However, aren't smaller sizes more important?
>
Again, fortunately DMD is not the focus, but I guess one way to
somewhat answer this question is to do a report of the sizes
used in the D runtime, since this is targeted at the D runtime.
Something like this:
https://forum.dlang.org/post/jdfiqpronazgglrkmwfq@forum.dlang.org
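One rough way to collect such a report is to route calls through a
wrapper that tallies the requested sizes. The sketch below is only an
illustration: countedMemset and sizeCounts are hypothetical names, and
this is not necessarily how the linked post gathered its numbers.

```d
import core.stdc.string : memset;

// Hypothetical tally: size in bytes -> number of calls with that size.
size_t[size_t] sizeCounts;

// Records the requested size, then forwards to libc memset.
void* countedMemset(void* dst, int value, size_t n)
{
    if (auto count = n in sizeCounts)
        ++*count;          // size seen before: bump its counter
    else
        sizeCounts[n] = 1; // first call with this size
    return memset(dst, value, n);
}
```

Dumping sizeCounts at program exit would give a histogram of the sizes
a particular workload actually uses.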
But this is not enough. A big part of optimization is to know the
most common cases (which could be the data format, size, hardware
etc.) and optimize for those first. And such a report is not
adequate to show us the most common cases:
- For one, different sizes might eventually be added or removed,
  and so the common cases might change.
- Someone might want to use this function outside of the D
  runtime.
So, Dmemset() should be on par with or better than libc, which is
(currently) achieved.
Note something interesting: GDC gets these results with the naive
version. This version is literally an 8-line for loop.
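For illustration, a naive fill of that kind could look roughly like
the sketch below. This is not the code from the PR; the name
naiveMemset and its signature are my own assumptions.

```d
// Naive byte-by-byte fill: no alignment handling, no SIMD, no unrolling.
// Any vectorization is left entirely to the compiler's optimizer.
void naiveMemset(void* dst, ubyte value, size_t n) @nogc nothrow
{
    auto p = cast(ubyte*) dst;
    foreach (i; 0 .. n)
        p[i] = value;
}
```

With a loop this simple, the speed depends on how well the optimizer
turns it into wide stores.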
>
> One issue is it should be tested on all variation of HW and OS.
> At least it can be placed in experimental module.
Right, it's currently PR'd to the D runtime:
https://github.com/dlang/druntime/pull/2662
Just like you said, in an experimental module. :P
Best regards,
Stefanos