Article: Increasing the D Compiler Speed by Over 75%
Walter Bright
newshound2 at digitalmars.com
Fri Aug 2 09:47:25 PDT 2013
On 8/2/2013 6:16 AM, Dmitry Olshansky wrote:
> 31-Jul-2013 22:20, Walter Bright wrote:
>> On 7/31/2013 8:26 AM, Dmitry Olshansky wrote:
>>> Ouch... to boot it's always aligned by word size, so
>>> key % sizeof(size_t) == 0
>>> ...
>>> rendering the lower 2-3 bits useless; that would make the straight
>>> slice-the-lower-bits approach rather weak :)
>>
>> Yeah, I realized that, too. Gotta shift it right 3 or 4 bits.
>
> And that helped a bit... Anyhow, after applying a somewhat more thorough
> integer hash, power-of-2 tables live up to their promise.
>
> The pull that reaps the minor speed benefit over the original (~2% speed gain!):
> https://github.com/D-Programming-Language/dmd/pull/2436
2% is worth taking.
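
To make the alignment point above concrete, here is a minimal sketch of why masking the low bits of a pointer-valued key works poorly with a power-of-2 table, and what shifting right first buys. This is not the code from the pull request; the helper names (bucketNaive, bucketShifted, distinctBuckets) are hypothetical, and the shift of 4 assumes 16-byte-aligned keys.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: these helpers are hypothetical and not taken from DMD.

// True when n is a nonzero power of two, so (n - 1) is a valid bit mask.
inline bool isPowerOfTwo(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Straight slice of the low bits of a pointer-valued key.
inline std::size_t bucketNaive(std::uintptr_t key, std::size_t tableSize) {
    assert(isPowerOfTwo(tableSize));
    return static_cast<std::size_t>(key) & (tableSize - 1);
}

// Same mask, but first drop the low bits that alignment forces to zero.
// The shift of 4 assumes 16-byte-aligned keys; 3 would suit 8-byte alignment.
inline std::size_t bucketShifted(std::uintptr_t key, std::size_t tableSize) {
    assert(isPowerOfTwo(tableSize));
    return static_cast<std::size_t>(key >> 4) & (tableSize - 1);
}

// Count the distinct buckets hit by `count` keys that start at `base`
// and are spaced `stride` bytes apart.
inline std::size_t distinctBuckets(bool shifted, std::size_t tableSize,
                                   std::uintptr_t base, std::uintptr_t stride,
                                   std::size_t count) {
    std::vector<bool> seen(tableSize, false);
    std::size_t distinct = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::uintptr_t key = base + stride * i;
        const std::size_t b = shifted ? bucketShifted(key, tableSize)
                                      : bucketNaive(key, tableSize);
        if (!seen[b]) {
            seen[b] = true;
            ++distinct;
        }
    }
    return distinct;
}
```

With 16 keys spaced 16 bytes apart and a 16-bucket table, the naive mask puts every key in the same bucket, while the shifted version spreads them across all 16. A shift alone does nothing for other patterns in the keys, which is presumably where a fuller integer hash comes in.
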
> Not bad given that _aaGetRvalue takes only a fraction of time itself.
>
> I failed to see much of any improvement on Win32 though, allocations are
> dominating the picture.
>
> And sharing the joy of having a nice sampling profiler, here is what AMD
> CodeAnalyst has to say (top X functions by CPU clocks not halted).
>
> Original DMD:
>
> Function                    CPU clocks  DC accesses  DC misses
> RTLHeap::Alloc                   49410          520       3624
> Obj::ledata                      10300         1308       3166
> Obj::fltused                      6464         3218          6
> cgcs_term                         4018         1328        626
> TemplateInstance::semantic        3362         2396         26
> Obj::byte                         3212          506        692
> vsprintf                          3030         3060          2
> ScopeDsymbol::search              2780         1592        244
> _pformat                          2506         2772         16
> _aaGetRvalue                      2134          806        304
> memmove                           1904         1084         28
> strlen                            1804          486         36
> malloc                            1282          786         40
> Parameter::foreach                1240          778         34
> StringTable::search                952          220         42
> MD5Final                           918          318
>
> Variation of DMD with pow-2 tables:
>
> Function                    CPU clocks  DC accesses  DC misses
> RTLHeap::Alloc                   51638          552       3538
> Obj::ledata                       9936         1346       3290
> Obj::fltused                      7392         2948          6
> cgcs_term                         3892         1292        638
> TemplateInstance::semantic        3724         2346         20
> Obj::byte                         3280          548        676
> vsprintf                          3056         3006          4
> ScopeDsymbol::search              2648         1706        220
> _pformat                          2560         2718         26
> memcpy                            2014         1122         46
> strlen                            1694          494         32
> _aaGetRvalue                      1588          658        278
> Parameter::foreach                1266          658         38
> malloc                            1198          758         44
> StringTable::search                970          214         24
> MD5Final                           866          274          2
>
>
> This underlines the point that the DMC RTL allocator is the biggest speed
> detractor. It is "followed" by ledata (could it be due to linear search
> inside?) and, surprisingly, the tiny Obj::fltused is draining lots of cycles
> (is it called that often?).
It's not fltused() that is taking up time, it is the static function following
it. The sampling profiler you're using is unaware of non-global function names.
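
One plausible mechanism for this misattribution, sketched below under assumptions: a profiler that resolves each sampled address to the nearest preceding symbol it knows about will charge samples from an unnamed static function to whichever global function sits before it. The Symbol struct, the attribute function, and all addresses here are made up for illustration; this is not how AMD CodeAnalyst is documented to work.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// Illustrative only: a hypothetical symbol table entry.
struct Symbol {
    std::uint64_t address;  // start address of the function
    std::string name;
};

// Resolve a sampled address to the closest symbol starting at or before it.
// `symbols` must be sorted by address. Returns "<unknown>" if the sample
// lies below every known symbol.
inline std::string attribute(const std::vector<Symbol>& symbols,
                             std::uint64_t sampleAddress) {
    // First symbol whose start address is strictly greater than the sample.
    auto it = std::upper_bound(
        symbols.begin(), symbols.end(), sampleAddress,
        [](std::uint64_t addr, const Symbol& s) { return addr < s.address; });
    if (it == symbols.begin()) {
        return "<unknown>";
    }
    return std::prev(it)->name;
}
```

Given a table holding only globals, say Obj::fltused at 0x1000 and Obj::byte at 0x1400, a sample at 0x1200 resolves to Obj::fltused. Add a hypothetical static function at 0x1010 to the table and the same sample resolves to that function instead.
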