Ready for review: new std.uni

Fri Jan 11 21:17:13 PST 2013

On Friday, 11 January 2013 at 20:57:57 UTC, Dmitry Olshansky 
wrote:
> You can print total counts after each bench, there is a TLS 
> variable written at the end of it. But anyway I like your 
> numbers! :)

Okay, I couldn't resist having a short look at the results, 
specifically the benchmark of the new isSymbol implementation, 
where LDC beats DMD by roughly 10x. The reason for the nice 
performance results is mainly that LDC optimizes the classifyCall 
loop containing the trie lookup down to the following fairly 
optimal piece of code (eax is the overall counter that gets 
stored to lastCount):

---
   40bc90:       8b 55 00                mov    edx,DWORD PTR [rbp+0x0]
   40bc93:       89 d6                   mov    esi,edx
   40bc95:       c1 ee 0d                shr    esi,0xd
   40bc98:       40 0f b6 f6             movzx  esi,sil
   40bc9c:       0f b6 34 31             movzx  esi,BYTE PTR [rcx+rsi*1]
   40bca0:       48 83 c5 04             add    rbp,0x4
   40bca4:       0f b6 da                movzx  ebx,dl
   40bca7:       c1 e6 05                shl    esi,0x5
   40bcaa:       c1 ea 08                shr    edx,0x8
   40bcad:       83 e2 1f                and    edx,0x1f
   40bcb0:       09 f2                   or     edx,esi
   40bcb2:       41 0f b7 14 50          movzx  edx,WORD PTR [r8+rdx*2]
   40bcb7:       c1 e2 08                shl    edx,0x8
   40bcba:       09 da                   or     edx,ebx
   40bcbc:       48 c1 ea 06             shr    rdx,0x6
   40bcc0:       4c 01 ca                add    rdx,r9
   40bcc3:       48 8b 14 d1             mov    rdx,QWORD PTR [rcx+rdx*8]
   40bcc7:       48 0f a3 da             bt     rdx,rbx
   40bccb:       83 d0 00                adc    eax,0x0
   40bcce:       48 ff cf                dec    rdi
   40bcd1:       75 bd                   jne    40bc90
---

The code DMD generates for the lookup, on the other hand, is 
pretty ugly: several values get spilled to the stack, and the 
lookup doesn't get inlined either.
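
For those who don't read x86-64 disassembly every day, here is 
roughly what the LDC loop computes, transcribed by hand into C. 
Treat it as a sketch: the names (Trie, trie_lookup, count_matches, 
toy_*) are made up, the 8/5/8-bit split is read off the shifts and 
masks in the listing, and the toy tables are mine, not the real 
std.uni data. The disassembly also addresses the first and last 
level off a single base pointer plus an offset, which I have 
simplified to separate arrays here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, inferred from the disassembly only. */
typedef struct {
    const uint8_t  *level0; /* indexed by bits 13..20 of the code point */
    const uint16_t *level1; /* indexed by (level0 entry << 5) | bits 8..12 */
    const uint64_t *level2; /* packed bit pages, 256 bits (4 words) each */
} Trie;

/* Three-level lookup: returns 1 if ch is in the set, 0 otherwise. */
unsigned trie_lookup(const Trie *t, uint32_t ch)
{
    uint32_t i0  = (ch >> 13) & 0xff;                 /* shr 0xd; movzx sil */
    uint32_t i1  = ((uint32_t)t->level0[i0] << 5) | ((ch >> 8) & 0x1f);
    uint32_t bit = ((uint32_t)t->level1[i1] << 8) | (ch & 0xff);
    return (unsigned)((t->level2[bit >> 6] >> (bit & 63)) & 1); /* bt */
}

/* The benchmark loop: count how many code points are in the set. */
size_t count_matches(const Trie *t, const uint32_t *text, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += trie_lookup(t, text[i]);             /* adc eax,0x0 */
    return count;
}

/* Toy tables (assumption, for illustration): only '+' (U+002B) and
   U+20AC are members. Page 0 of level2 is the shared all-zero page. */
static const uint8_t  toy_level0[256] = { [1] = 1 };
static const uint16_t toy_level1[64]  = { [0] = 1, [32] = 2 };
static const uint64_t toy_level2[12]  = {
    [4 + (0x2B >> 6)] = 1ull << (0x2B & 63),  /* page 1: U+0000..U+00FF */
    [8 + (0xAC >> 6)] = 1ull << (0xAC & 63),  /* page 2: U+2000..U+20FF */
};
static const Trie toy = { toy_level0, toy_level1, toy_level2 };

static const uint32_t sample[5] = { 'a', '+', 0x20AC, '+', 0x10FFFF };
```

With these toy tables, count_matches(&toy, sample, 5) counts the 
two '+' and U+20AC, so it returns 3. The point is that the whole 
thing is two dependent table loads plus a bit test per code point, 
with no branches besides the loop itself, and that is what LDC 
manages to emit.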

This is, of course, just a microbenchmark, but it is cases like 
this which make me wish that we would just use LLVM (or GCC, for 
that matter) for the reference compiler – and I'm not talking 
about the slightly Frankensteinian endeavor that LDC is here. 
Walter, my intention is not at all to doubt your ability as a 
compiler writer; we all know the stories of how you used to annoy 
the team leads at the big companies by beating their performance 
numbers single-handedly, and I'm sure you could e.g. fix your 
backend to match the performance of the LDC-generated code for 
Dmitry's benchmark in no time. The question is just: Are we as a 
community big and resourceful enough to justify spending time on 
that?

Sure, there would still be things we will have to fix ourselves 
when using another backend, such as SEH support in LLVM. But 
performance will always be a central selling point of a language 
like D, and do we really want to take on the burden of keeping up 
with the competition ourselves, when we can just draw on the work 
of full-time backend developers at Intel, AMD, Apple and others 
for free? Given the current developments in microprocessors and 
given that applications such as graphics and scientific computing 
are naturally a good fit for D, what's next? You taking a year 
off from active language development to implement an 
auto-vectorizer for your backend?

I know this question has been brought up before (if never really 
answered), and I don't want to start another futile discussion, 
but given the developments in the compiler/languages landscape 
over the last few years, it strikes me as an increasingly bad 
decision to stick with an obscure, poorly documented backend 
which nobody knows how to use – and nobody wants to learn how to 
use either, because, oops, they couldn't even redistribute their 
own work.

Let's put aside all the other arguments (most of which I didn't 
even mention) for a moment, even the performance aspect; I think 
that the productivity aspect alone, both regarding duplicated 
work and accessibility of the project to new developers, makes it 
hard to justify forgoing the momentum of an established backend 
project like LLVM. [1]

Maybe it is naïve to think that the situation could ever change 
for DMD. But I sincerely hope that the instant a promising 
self-hosted (as far as the frontend goes) compiler project shows 
up on the horizon, it will gain the necessary amount of official 
endorsement – and manpower, especially in the form of your 
(Walter's) expertise – to make that final, laborious stretch to 
release quality. If we just sit there and wait for somebody to 
come along with a new production-ready compiler which is better, 
faster and shinier than DMD, we will wait for a long, long time – 
this might happen for a Lisp dialect, but not for D.

Sorry for the rant, [2]
David

[1] The reason I'm focusing on LLVM here is not so much its 
technical qualities as its liberal BSD-like license – if it is 
good enough for Apple, Intel (also a compiler vendor) and their 
lawyer teams, it is probably good enough for us, too. The code could 
even be integrated into commercial products such as DMC without 
problems.

[2] And for any typos which might undermine my credibility – it 
is way too early in the morning here.