Ready for review: new std.uni
Dmitry Olshansky
dmitry.olsh at gmail.com
Sat Jan 12 00:57:07 PST 2013
12-Jan-2013 09:17, David Nadlinger wrote:
> On Friday, 11 January 2013 at 20:57:57 UTC, Dmitry Olshansky wrote:
>> You can print total counts after each bench, there is a TLS variable
>> written at the end of it. But anyway I like your numbers! :)
>
> Okay, I couldn't resist having a short look at the results, specifically
> the benchmark of the new isSymbol implementation, where LDC beats DMD by
> roughly 10x. The reason for the nice performance results is mainly that
> LDC optimizes the classifyCall loop containing the trie lookup down to
> the following fairly optimal piece of code (eax is the overall counter
> that gets stored to lastCount):
So these are legit? Coooooool!
BTW, I'm getting about 2-3 times better numbers from DMD in 32-bit mode on
an oldish AMD K10. Can you test the 32-bit versions as well? Could it be
some glitch in the 64-bit codegen?
>
> ---
> 40bc90: 8b 55 00 mov edx,DWORD PTR [rbp+0x0]
> 40bc93: 89 d6 mov esi,edx
> 40bc95: c1 ee 0d shr esi,0xd
> 40bc98: 40 0f b6 f6 movzx esi,sil
> 40bc9c: 0f b6 34 31 movzx esi,BYTE PTR [rcx+rsi*1]
> 40bca0: 48 83 c5 04 add rbp,0x4
> 40bca4: 0f b6 da movzx ebx,dl
> 40bca7: c1 e6 05 shl esi,0x5
> 40bcaa: c1 ea 08 shr edx,0x8
> 40bcad: 83 e2 1f and edx,0x1f
> 40bcb0: 09 f2 or edx,esi
> 40bcb2: 41 0f b7 14 50 movzx edx,WORD PTR [r8+rdx*2]
> 40bcb7: c1 e2 08 shl edx,0x8
> 40bcba: 09 da or edx,ebx
> 40bcbc: 48 c1 ea 06 shr rdx,0x6
> 40bcc0: 4c 01 ca add rdx,r9
> 40bcc3: 48 8b 14 d1 mov rdx,QWORD PTR [rcx+rdx*8]
> 40bcc7: 48 0f a3 da bt rdx,rbx
> 40bccb: 83 d0 00 adc eax,0x0
> 40bcce: 48 ff cf dec rdi
> 40bcd1: 75 bd jne 40bc90
> ---
This looks quite nice indeed.
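
For readers who do not want to decode the listing by hand, here is a rough C sketch of what the loop body appears to compute: a three-level lookup that splits a 21-bit code point into 8, 5 and 8 bits (the `shr 0xd`, `and 0x1f` and `movzx ..., dl` steps) and ends in a bit test on a 64-bit word. All names (`Trie`, `trie_build`, `trie_contains`, `demo_is_symbol`) are made up for illustration and are not the std.uni API, the predicate is a toy rather than the real Symbol category, and the levels live in separate arrays here, whereas the listing seems to keep them in one array and add an offset (`r9`).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 8/5/8-bit trie over code points; not the std.uni layout. */
typedef struct {
    uint8_t   l1[256];       /* bits 13..20 -> level-2 page number */
    uint16_t  l2[256 * 32];  /* (page << 5 | bits 8..12) -> level-3 page number */
    uint64_t *l3;            /* level-3 pages, 256 bits (4 words) each */
    size_t    n2, n3;        /* number of distinct pages stored per level */
} Trie;

/* Membership test: two table loads plus one bit test, as in the listing. */
static int trie_contains(const Trie *t, uint32_t c)
{
    uint32_t p2  = t->l1[(c >> 13) & 0xFF];
    uint32_t p3  = t->l2[(p2 << 5) | ((c >> 8) & 0x1F)];
    uint32_t bit = (p3 << 8) | (c & 0xFF);
    return (int)((t->l3[bit >> 6] >> (bit & 63)) & 1);
}

/* Find a page identical to `page`, or append it; returns its page number. */
static size_t intern_page(void *pages, size_t *count,
                          const void *page, size_t size)
{
    unsigned char *base = pages;
    for (size_t i = 0; i < *count; i++)
        if (memcmp(base + i * size, page, size) == 0)
            return i;
    memcpy(base + *count * size, page, size);
    return (*count)++;
}

/* Build the trie from a predicate; identical pages are stored only once.
   Returns 0 on success, -1 if allocation fails. Caller frees t->l3. */
static int trie_build(Trie *t, int (*pred)(uint32_t))
{
    memset(t, 0, sizeof *t);
    t->l3 = malloc(256 * 32 * 4 * sizeof(uint64_t)); /* worst case, no sharing */
    if (!t->l3)
        return -1;
    for (uint32_t hi = 0; hi < 256; hi++) {
        uint16_t page2[32];
        for (uint32_t mid = 0; mid < 32; mid++) {
            uint64_t page3[4] = {0};
            for (uint32_t lo = 0; lo < 256; lo++)
                if (pred((hi << 13) | (mid << 8) | lo))
                    page3[lo >> 6] |= (uint64_t)1 << (lo & 63);
            page2[mid] = (uint16_t)intern_page(t->l3, &t->n3,
                                               page3, sizeof page3);
        }
        t->l1[hi] = (uint8_t)intern_page(t->l2, &t->n2, page2, sizeof page2);
    }
    return 0;
}

/* Toy predicate (an assumption for the demo): a few ASCII operators plus
   the Mathematical Operators block U+2200..U+22FF. */
static int demo_is_symbol(uint32_t c)
{
    return c == '+' || c == '<' || c == '=' || c == '>' ||
           (c >= 0x2200 && c <= 0x22FF);
}

/* Counting loop in the spirit of the benchmark: one lookup per code point. */
static size_t count_matches(const Trie *t, const uint32_t *text, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)trie_contains(t, text[i]);
    return count;
}
```

Because the lookup has no branches, a compiler can turn the final bit test into a `bt`/`adc` pair once the call is inlined into the counting loop, which matches what the listing shows.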
>
> The code DMD generates for the lookup, on the other hand, is pretty
> ugly, including several values being spilled to the stack, and also
> doesn't get inlined.
To be honest, one of the major problems I see with DMD is the lack of a
principled, reliable inliner. Currently it may or may not inline two
equivalent pieces of code just because one of them has an early return, a
switch statement, or whatever. And it's about time to start inlining
functions with loops, as it's not the '90s anymore.
> [1] The reasons for which I'm focusing on LLVM here are not so much its
> technical qualities as its liberal BSD-like license – if it is good
> enough for Apple, Intel (also a compiler vendor) and their lawyer teams,
> it is probably also for us. The code could even be integrated into
> commercial products such as DMC without problems.
>
I like LLVM, and nearly everybody in the industry likes it. Another example
is AMD: they are building their compiler infrastructure for GPUs on top
of LLVM.
> [2] And for any typos which might undermine my credibility – it is way
> too early in the morning here.
--
Dmitry Olshansky