[GSoC] 'Independency of D from the C Standard Library' progress and update thread

Tue Jun 4 01:11:44 UTC 2019

On Monday, 3 June 2019 at 22:45:28 UTC, Andrei Alexandrescu wrote:

> At 512 lines including tests, it seems on the involved side. 
> The benchmarks ought to show a hefty improvement to match. Are 
> there benchmark results available?

I did some initial benchmarks at 
https://github.com/JinShil/memcpyD when I made the first 
feasibility study to see if this project was worth pursuing.  The 
initial results were encouraging, which is why we're taking it 
further in this project.

I'll work with Stefanos to get a more polished implementation 
that users can download and run for themselves.

> Quoting the rationale from the motivation in another thread:
>
> 1) C’s implementations are not type-safe and memory-safe.
> 2) C’s implementations have accumulated a lot of cruft over the 
> years.
> 3) Cross-compiling is more difficult as now one should have 
> available and configured a C runtime and toolchain apart from 
> the D runtime. This makes it difficult for D to create 
> freestanding software.

> 4) Type-safety and memory safety (bounds-checking etc.)
> 5) Templates to branch to an optimal implementation at 
> compile-time.
> 6) Inlining, as the branching in C happens at runtime.
> 7) Compile-Time Function Execution (CTFE) and introspection 
> (type info).
>
> My view on formulating motivation is simple: do it like a 
> scientist. Argue the facts. If facts are not available, argue 
> fundaments and universal principles. If such are not available, 
> the motivation is too weak.

Yes, the motivation could be improved, but the time for 
motivating this project was 2 months ago, not now.  Now the 
project is underway, and we need to see it to completion.  The 
focus now should be on providing feedback on the implementations, 
not on the rationale/motivation.

> (1) checks the "facts" box but has the obvious comeback "then 
> how about a 2-line trusted wrapper over memcpy?" that needs to 
> be explained. Related, obviously people who reach for memcpy() 
> are often not looking for a safe primitive. a[] = b[] is safe, 
> syntactically simple, and could lower to anything including 
> memcpy.

Part of the motivation is so druntime no longer has a hard 
intrinsic dependency on libc.  If you just wrap the libc function, 
you're not achieving that goal.

Now, that being said, it is way out of the scope of this project 
to provide a D implementation of memcpy for all platforms, 
architectures and microarchitectures that D supports.  So, we 
need to deal with that.

Before I elaborate further, it's important to understand that 
druntime is currently a monolith that is not architected or 
structured properly.  druntime is supposed to be the language 
implementation, not libc bindings, libc++ bindings, windows 
bindings, linux bindings, low-level code (whatever that means), 
etc.

The language implementation *will* require certain features of 
the underlying operating system and hardware. Some of those 
features may be provided by libc, but that decision should be 
made on a platform-by-platform basis.  So what we hope to achieve 
with this project is an idiomatic-D memory copy/compare 
interface.  That interface may simply forward to libc for those 
features that don't have an optimized D implementation.  Other 
platforms may choose to implement a highly optimized 
implementation in D.  Other platforms may choose to mix the two 
(e.g. an optimized D implementation for small copies, and forward 
to libc for large copies).  Others may choose to just implement a 
simple while-loop because they either don't want to obtain a C 
toolchain (those cross-compiling to embedded targets) or because 
there isn't a C implementation available (new platforms like 
WASM).  This project aims to remove druntime's dependency on libc, but 
the platform port of druntime may still choose to depend on it.
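
To make the forwarding idea concrete, here is a minimal sketch of 
how a port might pick its strategy at compile time.  Everything in 
it is hypothetical: `copyObject`, `bytewiseCopy`, the 
`smallCopyLimit` threshold and the `NoLibc` version identifier are 
names made up for illustration, not part of druntime or of this 
project.  It also assumes `T` is plain data with no postblit or 
destructor.

```d
// Hypothetical threshold; a real port would choose it from benchmarks.
enum size_t smallCopyLimit = 64;

// Simple loop: needs no C toolchain, so it works on freestanding targets.
void bytewiseCopy(ubyte* dst, const(ubyte)* src, size_t n)
{
    foreach (i; 0 .. n)
        dst[i] = src[i];
}

// Typed copy: the size of T is known at compile time, so the choice
// between strategies costs nothing at run time.
void copyObject(T)(ref T dst, ref const T src) @trusted
{
    static if (T.sizeof <= smallCopyLimit)
    {
        // Small objects: stay in D.
        bytewiseCopy(cast(ubyte*) &dst, cast(const(ubyte)*) &src, T.sizeof);
    }
    else
    {
        version (NoLibc)
        {
            // Port without libc: fall back to the simple loop.
            bytewiseCopy(cast(ubyte*) &dst, cast(const(ubyte)*) &src, T.sizeof);
        }
        else
        {
            // Port with libc: forward large copies to it.
            import core.stdc.string : memcpy;
            memcpy(&dst, &src, T.sizeof);
        }
    }
}

unittest
{
    int a = 42, b;
    copyObject(b, a);          // small path
    assert(b == 42);

    ubyte[256] big, bigCopy;
    foreach (i, ref x; big)
        x = cast(ubyte) i;
    copyObject(bigCopy, big);  // large path
    assert(bigCopy == big);
}
```

The caller only ever sees `copyObject`; whether libc is involved is 
decided inside the port, which is the sense in which libc becomes 
an implementation detail.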

That being said, you might be wondering why we are bothering to 
implement an entire memcpy in D for the x86_64 architecture:
1) because DMD's implementation is suboptimal
2) to help motivate the entire project
3) to demonstrate D as a first-class systems programming language
4) to set an example and precedent for other platforms to 
potentially follow

Please keep in mind we're trying to expand D to more platforms, 
including resource-constrained embedded systems, OS programming, 
bare-metal applications, and new platforms such as WASM.  We want 
D to be more easily portable, and that is partially achieved by 
making a platform abstraction, independent of libc.  libc is a 
platform implementation detail.

> (2) is quite specious and really needs some evidence. Is cruft 
> in memcpy really an issue? I looked memcpy() implementations a 
> while ago but didn't save bookmarks. Did a google search just 
> now and found 
> https://github.com/gcc-mirror/gcc/blob/master/libgcc/memcpy.c, 
> which is very far from cruft-ridden.

That is not the memcpy that is actually on your machine.  You can 
find the more elaborate implementations here:  
https://sourceware.org/git/?p=glibc.git;a=tree;f=sysdeps/x86_64/multiarch;h=14ec2285c0f82b570bf872c5b9ff0a7f25724dfd;hb=HEAD

Another from Intel:  
https://github.com/DPDK/dpdk/blob/master/lib/librte_eal/common/include/arch/x86/rte_memcpy.h

> I do remember elaborate implementations of memcpy but so are 
> (somewhat ironically) the 512 lines of the proposed 
> implementation. I found one here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/lib/memcpy_64.S?id=HEAD
>
> No idea of its level of cruftiness, where it's used etc. The 
> right way to argue (2) is to provide links to implementations 
> that people can look at and decide without doubt, "yep, crufty".

The more elaborate C implementations are typically written in 
assembly.  They are difficult to follow due to all of the various 
techniques to handle misalignment and the cleverness typically 
required to achieve the best performance.

It is my hope that this project will explore how D can improve 
such implementations by reducing the cleverness to small isolated 
inline assembly blocks surrounded by D to make it easier to see 
the flow control.  I think D can do that.
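
A rough sketch of the shape I have in mind, not code from the 
project: the only assembly is one 16-byte move, and the looping 
and tail handling stay in ordinary D.  The names `copy16` and 
`copyBlocks` are hypothetical, the asm is DMD-style x86_64 inline 
assembly, and other targets take the plain-D fallback.

```d
// Copy exactly 16 bytes.  The "clever" part is isolated here.
void copy16(void* dst, const(void)* src)
{
    version (D_InlineAsm_X86_64)
    {
        asm
        {
            mov RSI, src;        // load source address
            mov RDI, dst;        // load destination address
            movdqu XMM0, [RSI];  // unaligned 16-byte load
            movdqu [RDI], XMM0;  // unaligned 16-byte store
        }
    }
    else
    {
        // Fallback for compilers or targets without this asm dialect.
        (cast(ubyte*) dst)[0 .. 16] = (cast(const(ubyte)*) src)[0 .. 16];
    }
}

// Flow control is plain D: whole 16-byte blocks first, then the tail.
void copyBlocks(ubyte* dst, const(ubyte)* src, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16)
        copy16(dst + i, src + i);
    for (; i < n; ++i)
        dst[i] = src[i];
}

unittest
{
    ubyte[37] src, dst;
    foreach (i, ref x; src)
        x = cast(ubyte)(i * 3);
    copyBlocks(dst.ptr, src.ptr, src.length);
    assert(dst == src);
}
```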

> (3) is... odd. Doesn't every machine ever come with a C 
> implementation including a ready-to-link standard library? If 
> not, isn't that a rarity? Again, that should be argued 
> preemptively by the motivation section.

Yes, it's a rarity, but it is nevertheless an artificial dependency 
for druntime.

druntime does not sufficiently utilize libc to justify the hard 
dependency.  It just needs a few memory utilities and an 
allocator.  I think it's worthwhile to see if D can do just as 
well without libc.  In fact, if I had my druthers, I'd remove 
libc's malloc altogether today and just add jemalloc to the 
druntime repository.  Maybe it could even be mechanically 
translated to D.

> (4) brings again the wrapper argument

For some platforms, it may just be a wrapper.

> (5) is nice if and only if confirmed by benchmarks

We've already demonstrated this with benchmarks.  I'll work with 
Stefanos to make them available, but 
https://github.com/JinShil/memcpyD already shows the benefit.

> (6) is also nice under the same conditions as (5)

Yep, see my response to (5).

> (7) again... what's wrong with a wrapper that does if (__ctfe)

I think Stefanos is probably arguing in general about the 
design-by-introspection features of D, which include CTFE and 
other metaprogramming features.  That makes (7) more or less the 
same as (5).  Those benefits have been demonstrated, and we'll 
work to make them more apparent in the near future.

That being said, there's nothing ruling out an `if (__ctfe)` 
block in the implementation if that's what is determined to be 
best.
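
For reference, such a block could look roughly like the following. 
This is only a sketch: `copyBytes` and `makeCopy` are hypothetical 
names, and the run-time branch simply forwards to libc's memcpy.

```d
import core.stdc.string : memcpy;

void copyBytes(ubyte[] dst, const(ubyte)[] src) @trusted
{
    assert(dst.length == src.length);
    if (__ctfe)
    {
        // Compile-time evaluation cannot call into libc,
        // so use a plain loop the interpreter can execute.
        foreach (i; 0 .. src.length)
            dst[i] = src[i];
    }
    else
    {
        // Run time: forward to the platform's memcpy.
        memcpy(dst.ptr, src.ptr, src.length);
    }
}

ubyte[4] makeCopy()
{
    ubyte[4] src = [1, 2, 3, 4];
    ubyte[4] dst;
    copyBytes(dst[], src[]);
    return dst;
}

// Forces evaluation at compile time, exercising the __ctfe branch.
enum ubyte[4] atCompileTime = makeCopy();
static assert(atCompileTime[0] == 1 && atCompileTime[3] == 4);

unittest
{
    // The same function at run time takes the memcpy branch.
    ubyte[4] r = makeCopy();
    assert(r[0] == 1 && r[1] == 2 && r[2] == 3 && r[3] == 4);
}
```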

> With malloc() we're looking at a completely different ballgame. 
> Implementing malloc() from scratch is a very serious project 
> that needs almost overwhelming motivation. The goal of 
> std.experimental.allocator was to offer a flexible framework 
> for implementing general and specialized allocators, but simply 
> replacing malloc() is more difficult to argue. Also, achieving 
> comparable performance will be difficult.

I agree with all of that, but we're going to try it anyway and see 
how it does.  If all we achieve in the end is just a wrapper that 
forwards to libc's malloc and friends, it will still be better 
than what we have now, because libc will then be simply an 
implementation detail.

Mike