[GSoC] 'Independency of D from the C Standard Library' progress and update thread

Thu Sep 5 21:50:04 UTC 2019

On Thursday, 5 September 2019 at 21:17:07 UTC, H. S. Teoh wrote:

Thanks for the descriptive comment! Some comments from me:

>
> Read the discussion that Stefanos referred to. Here are some of the
> key blocking issues:
>
> - C library APIs like memcpy, memset, etc., are not only in the C
>   library, but are often implemented as *intrinsics* in compilers. One
>   of the most important effects of this is that optimizers recognize
>   them and understand their semantics, and can sometimes produce better
>   code because of that. For example:
>
> 	int x, y=5;
> 	memcpy(&x, &y, int.sizeof); // C version
> 	... // optimizer knows that now x==5.
>
>   Using a D version of memcpy in the above code can mean that the
>   optimizer does *not* recognize that x==5, which can lead to poorer
>   performance.
>
> - Even if the previous point isn't an issue, there's still the problem
>   of maintenance: the D version of mem* needs to be continuously updated
>   because hardware is constantly evolving, and it takes significant
>   manpower to (1) port the implementation to every supported
>   architecture, (2) make sure they take maximum advantage of the quirks
>   of the targeted platform, and (3) checking that they are actually
>   faster than the C implementations (which is available on basically any
>   new platform anyway).
>

- For the first two, let me thank Manu and Johan again, who helped
me realize them! Note also that we don't currently know of a way of
informing LLVM or GCC about these semantics and thus getting this
optimization. The closest thing we have is LLVM recognizing, by name,
that a function does what e.g. memcpy() does, which is a bad
assumption to build upon.

> - D already has syntax for abstractly representing a memcpy operation:
>   a[] = b[]. This syntax is type-safe, memory-safe, and the compiler can
>   lower it to whatever it likes, including memcpy, or a custom
>   implementation specialized for the target platform. That's where such
>   primitives really belong, actually. (Historically they went into the C
>   library, but these days compilers are more and more building them into
>   intrinsics that can drive various codegen strategies (inlining,
>   arch-specific optimizations, etc). They're gradually becoming more
>   like low-level compiler primitives than your average C library
>   functions.)
>

AFAIK, this is implemented in druntime, and druntime calls memcpy().
Essentially, the goal of this project was to create versions that
would be used by druntime, not by the user. Other than that,
I agree!
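
To make the a[] = b[] point concrete, here is a minimal sketch that puts the slice copy next to the explicit memcpy() call it stands in for. The helper names are hypothetical and only there for illustration, and what the slice copy actually lowers to depends on the compiler and druntime version, so take this as a comparison of the two spellings, not as a description of the lowering.

```d
import core.stdc.string : memcpy;

// Hypothetical helper names, for illustration only.
void copyWithSlice(int[] dst, const(int)[] src)
{
    // Typed array copy: the element type is known to the compiler and
    // mismatched lengths are rejected at run time (with bounds checks on).
    dst[] = src[];
}

void copyWithMemcpy(int[] dst, const(int)[] src)
{
    // The untyped C call: the byte count is computed by hand.
    assert(dst.length == src.length);
    memcpy(dst.ptr, src.ptr, src.length * int.sizeof);
}

void main()
{
    int[4] src = [1, 2, 3, 4];
    int[4] a, b;

    copyWithSlice(a[], src[]);
    copyWithMemcpy(b[], src[]);

    // Both spellings produce the same contents.
    assert(a == src);
    assert(b == src);
}
```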

> The current work Stefanos has produced has a big performance 
> impact mainly only in DMD, which is known to have a weak 
> optimizer,

Actually, when I was optimizing for DMD, I used assembly mainly
because I had to match libc in performance, and with DMD the only
way to do that is assembly. A more useful goal would be to not try
to match libc (certainly not on x86_64), but rather to create
optimized versions in generic D. Meaning, to optimize purely based
on algorithms, with very few assumptions about the hardware. Much
like musl.
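
As a rough illustration of what "generic D, purely algorithmic" could look like, here is a minimal sketch of a word-at-a-time copy with no SIMD and no assembly. The name genericMemcpy is hypothetical; this is not the code from the project and it has not been benchmarked. The only hardware assumption is that a size_t load/store through a size_t-aligned pointer is fine.

```d
// Hypothetical name; a sketch, not the project's actual implementation.
void* genericMemcpy(void* dst, const(void)* src, size_t n) nothrow @nogc
{
    auto d = cast(ubyte*) dst;
    auto s = cast(const(ubyte)*) src;
    enum W = size_t.sizeof;

    // Word copies are only attempted when both pointers share the same
    // misalignment, so that aligning one also aligns the other.
    if ((cast(size_t) d) % W == (cast(size_t) s) % W)
    {
        // Copy single bytes until the pointers are word-aligned.
        while (n > 0 && (cast(size_t) d) % W != 0)
        {
            *d++ = *s++;
            --n;
        }
        // Copy one machine word at a time.
        while (n >= W)
        {
            *cast(size_t*) d = *cast(const(size_t)*) s;
            d += W;
            s += W;
            n -= W;
        }
    }

    // Remaining tail bytes, or the whole buffer in the misaligned case.
    while (n > 0)
    {
        *d++ = *s++;
        --n;
    }
    return dst;
}

void main()
{
    ubyte[37] src;
    foreach (i, ref b; src)
        b = cast(ubyte) i;

    // Copy of the whole buffer.
    ubyte[37] dst;
    genericMemcpy(dst.ptr, src.ptr, src.length);
    assert(dst == src);

    // Copy with different offsets on both sides (byte-wise fallback).
    ubyte[37] dst2;
    genericMemcpy(dst2.ptr + 1, src.ptr + 3, 20);
    assert(dst2[1 .. 21] == src[3 .. 23]);
}
```

Like memcpy(), this assumes the two regions do not overlap.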

> and anyone who cares about runtime performance ought to be 
> using GDC or LDC anyway. In GDC/LDC using these custom D 
> implementations wind up being worse because they defeat the 
> respective optimizers (they no longer recognize memcpy/etc. 
> semantics from these functions, so can't optimize based on 
> that).

Actually, this project matched libc with LDC and GDC in 1-1
benchmarks using D and SIMD functions (but not ASM). The problem
arises when the functions are used in context, exactly for the
reasons you described.

> So lot of the effort ended up being directed towards working 
> around flaws in DMD's optimizer rather than producing *actual* 
> improvement over C's mem* primitives.

Yes, essentially that was one of my first objections. To counteract
the DMD flaws, you have to write ASM (if you want parity), which in
turn raises the question: then why do it? This is what libc already
does.

> This is really the wrong way to go about things IMO; we should 
> rather be fixing DMD's optimizer instead. But once that's done 
> there's even less reason to implement mem* ourselves.

IMHO, fixing the DMD optimizer is not a good way to go. Rather, as
I said above, aim for a generic D implementation, _without_ SIMD,
based purely on algorithms. This can be useful for systems that
don't have libc, and since the DMD optimizer does not use intrinsics
the way LLVM / GCC do, the aforementioned problems are not problems
there. Essentially, it's a win-win situation.

- Stefanos