[GSoC] 'Independency of D from the C Standard Library' progress and update thread

Thu Sep 5 21:17:07 UTC 2019

On Thu, Sep 05, 2019 at 08:16:24PM +0000, 12345swordy via Digitalmars-d wrote:
[...]
> - It is easier to debug and read in the D language than in the C language.
> - I was shown faster memory allocation speed compared to libc.
> - Other memory allocators are not part of the D language standard library.
> 
> Most importantly, this is yet another disappointing development I've seen
> in regards to the development of the D language.
[...]

Read the discussion that Stefanos referred to. Here are some of the
key blocking issues:

- C library APIs like memcpy, memset, etc., are not only in the C
  library, but are often implemented as *intrinsics* in compilers. One
  of the most important effects of this is that optimizers recognize
  them and understand their semantics, and can sometimes produce better
  code because of that. For example:

	import core.stdc.string : memcpy;

	int x, y = 5;
	memcpy(&x, &y, int.sizeof); // C version
	// ... the optimizer now knows that x == 5.

  Using a D version of memcpy in the above code can mean that the
  optimizer does *not* recognize that x==5, which can lead to poorer
  performance.

- Even if the previous point isn't an issue, there's still the problem
  of maintenance: the D version of mem* needs to be continuously updated
  because hardware is constantly evolving, and it takes significant
  manpower to (1) port the implementation to every supported
  architecture, (2) make sure each port takes maximum advantage of the
  quirks of the targeted platform, and (3) check that it is actually
  faster than the C implementation (which is available on basically any
  new platform anyway).

- D already has syntax for abstractly representing a memcpy operation:
  a[] = b[]. This syntax is type-safe, memory-safe, and the compiler can
  lower it to whatever it likes, including memcpy, or a custom
  implementation specialized for the target platform. That's where such
  primitives really belong, actually. (Historically they went into the C
  library, but these days compilers are increasingly building them in as
  intrinsics that can drive various codegen strategies: inlining,
  arch-specific optimizations, etc. They're gradually becoming more
  like low-level compiler primitives than your average C library
  functions.)
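
To make the contrast between the first and third points concrete, here
is a minimal sketch of the same copy written both ways. It is not taken
from Stefanos's work; the function names copyWithMemcpy and
copyWithSlice are made up for illustration, and the only library call
assumed is memcpy from druntime's core.stdc.string.

```d
import core.stdc.string : memcpy;

// C-style copy: untyped pointers and a hand-computed byte count,
// so the function cannot be @safe.
void copyWithMemcpy(int[] dst, const(int)[] src) @system
{
    assert(dst.length == src.length);
    memcpy(dst.ptr, src.ptr, src.length * int.sizeof);
}

// D array-copy syntax: element types are checked at compile time,
// and the compiler decides how to lower the copy.
void copyWithSlice(int[] dst, const(int)[] src) @safe
{
    dst[] = src[];
}

unittest
{
    const int[4] src = [1, 2, 3, 4];
    int[4] viaMemcpy, viaSlice;
    copyWithMemcpy(viaMemcpy[], src[]);
    copyWithSlice(viaSlice[], src[]);
    assert(viaMemcpy == viaSlice);
}
```

Saved as, say, copy.d, this can be tried with
`dmd -unittest -main -run copy.d`. Both functions produce the same
result; the difference is that the slice form can be marked @safe and,
with bounds checking enabled, a length mismatch is reported at run time
instead of overrunning the destination.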

The current work Stefanos has produced has a big performance impact
mainly in DMD, which is known to have a weak optimizer, and anyone
who cares about runtime performance ought to be using GDC or LDC anyway.
In GDC/LDC, using these custom D implementations winds up being worse
because they defeat the respective optimizers (which no longer recognize
memcpy/etc. semantics in these functions, so can't optimize based on
that).  So a lot of the effort ended up being directed towards working
around flaws in DMD's optimizer rather than producing *actual*
improvement over C's mem* primitives. This is really the wrong way to go
about things IMO; we should be fixing DMD's optimizer instead.
But once that's done there's even less reason to implement mem*
ourselves.

Note that this does not preclude the D compiler from, e.g., translating
statements like `a[] = b[]` into target-optimized instructions instead
of calling a function named 'memcpy'.  I'd argue that it's the
compiler's job (more specifically, the optimizer's job) to do the best
translation of a[] = b[] into machine code, not the standard library's
problem to account for N versions of M platforms in a gigantic
unmaintainable block of static if'd (or version'd) custom
implementations, whose only real value is to let us pat ourselves on
the back that yes, we have our own memcpy/memset/etc. implementation
that we wrote in D, just because we can.  Porting the D compiler to a
new architecture already requires codegen work anyway, and work on
memory-copying/moving primitives really should be included under that
umbrella, rather than poorly reinvented in the runtime library.
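
As a rough illustration of the maintenance burden I mean, here is a
hypothetical sketch of a library-side copy routine. The name
libraryCopy is invented for this example, and the per-platform bodies
are plain loops standing in for tuned code that someone would have to
write and benchmark for each target. X86_64 and AArch64 are the
predefined D version identifiers.

```d
// Hypothetical library-side copy: one branch per supported target.
// Each branch is a placeholder loop; a real implementation would need
// separately tuned and benchmarked code in every one of them.
void libraryCopy(ubyte[] dst, const(ubyte)[] src) @safe
{
    assert(dst.length == src.length);
    version (X86_64)
    {
        // Placeholder for an x86-64 specific path.
        foreach (i; 0 .. src.length) dst[i] = src[i];
    }
    else version (AArch64)
    {
        // Placeholder for an AArch64 specific path.
        foreach (i; 0 .. src.length) dst[i] = src[i];
    }
    else
    {
        // Generic fallback for every other target.
        foreach (i; 0 .. src.length) dst[i] = src[i];
    }
}

unittest
{
    const ubyte[5] src = [1, 2, 3, 4, 5];
    ubyte[5] viaLibrary, viaCompiler;
    libraryCopy(viaLibrary[], src[]);
    viaCompiler[] = src[]; // the compiler picks the lowering
    assert(viaLibrary == viaCompiler);
}
```

The single line `viaCompiler[] = src[]` in the unittest does the same
job as the whole function above it, and any per-target tuning for it
would live in the compiler's codegen rather than in a growing chain of
version blocks.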

T

-- 
Curiosity kills the cat. Moral: don't be the cat.