System programming in D (Was: The God Language)

Thu Dec 29 03:19:57 PST 2011

On Thursday, 29 December 2011 at 09:16:23 UTC, Walter Bright 
wrote:
> Are you a ridiculous hacker? Inline x86 assembly that the 
> compiler actually understands in 32 AND 64 bit code, hex string 
> literals like x"DE ADB EEF" where spacing doesn't matter, the 
> ability to set data alignment cross-platform with type.alignof 
> = 16, load your shellcode verbatim into a string like so: auto 
> str = import("shellcode.txt");

I would like to talk about this for a bit. Personally, I think 
D's system programming abilities are only half-way there. Note 
that I am not talking about use cases in high-level application 
code, but rather low-level, widely-used framework code, where 
every bit of performance matters (for example: memory copy 
routines, string builders, garbage collectors).

In-line assembler as part of the language is certainly neat, and 
in fact coming from Delphi to C++ I was surprised to learn that 
C++ implementations adopted different syntax for asm blocks. 
However, compared to what some C++ compilers offer, D's inline 
assembler has severe limitations, and it is D's only trick in 
this area.

For one thing, there is no way to force the compiler to inline a 
function (like __forceinline / __attribute__((always_inline))). 
This is fine for high-level code (where users are best left with 
PGO and "the compiler knows best"), but sucks if you need a 
guarantee that the function will be inlined. The guarantee isn't 
just about inlining heuristics, but also implementation 
capabilities. For example, some implementations might not be able 
to inline functions that use certain language features, and your 
code's performance could demand that such a short function must 
be inlined. One example of this is inlining functions containing 
asm blocks - IIRC DMD does not support this. The compiler should 
fail the build if it can't inline a function tagged with 
@forceinline, instead of shrugging it off and failing silently, 
forcing users to check the disassembly every time.

You may have noticed that GCC has some ridiculously complicated 
assembler facilities. However, they also make it possible to 
write optimal code - for example, creating 
custom calling conventions, or inlining assembler functions 
without restricting the caller's register allocation with a 
predetermined calling convention. In contrast, DMD is very 
conservative when it comes to mixing D and assembler. One time I 
found that putting an asm block in a function turned what were 
single instructions into blocks of 6 instructions each.

D's shortcomings in this area make it impossible to create 
library features that perform on the level of D's compiler 
built-ins. For example, I have tested three memcpy implementations 
recently, but none of them could beat DMD's standard array slice 
copy (even though in release mode it compiles to a simple memcpy 
call). Why? Because the overhead of using a custom memcpy routine 
negated its performance gains.

This might have been alleviated if D had sane macros, 
but no such luck. String mixins are not the answer: trying to 
translate macro-heavy C code to D using string mixins is string 
escape hell, and we're back to the level of shell scripts.

We've discussed this topic on IRC recently. From what I 
understood, Andrei thinks improvements in this area are not 
"impactful" enough, which I find worrisome.

Personally, I don't think D qualifies as a true "system 
programming language" in light of the above. It's more of a 
compiled language with pointers and assembler. Before you 
disagree with any of the above, I'd like to invite you, for 
starters, to translate Daniel Vik's C memcpy implementation to 
D: http://www.danielvik.com/2010/02/fast-memcpy-in-c.html . It 
doesn't even use inline assembler or compiler intrinsics.