Software Assurance Reference Dataset

Dmitry Olshansky via Digitalmars-d digitalmars-d at puremagic.com
Wed Jul 23 13:06:37 PDT 2014


On 21-Jul-2014 02:29, Walter Bright wrote:
> On 7/20/2014 3:10 PM, Dmitry Olshansky wrote:
>>> The computed goto is faster for two reasons, according to the article:
>>>
>>> 1. The switch does a bit more per iteration because of bounds checking.
>>
>> Now let's consider a proper implementation of a threaded-code
>> interpreter, where the *code pointer points into an array of addresses.
>> We've been through this before, and it turns out the switch is slower
>> because of an extra load.
>>
>> a) Switch does 1 load for the opcode, 1 load for the jump table, and 1
>> indirect jump to advance (not even counting bounds checking of the
>> switch).
>>
>> b) Threaded-code via (say) computed goto does 1 load of the opcode and
>> 1 indirect jump, because the opcode is already an address (so there is
>> no separate jump table).
>
> True, but I'd like to find a way that this can be done as an optimization.
>
I found a way, but it relies on tail-call optimization; otherwise it 
would overflow the stack. I would rather find some way that works 
without the -O flag.

In fact this brings up another, unrelated problem with Phobos: any 
template-heavy library has amazingly awful performance without inlining 
and optimization enabled _by the client_. It should be the same with 
C++, though.
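
To make that concrete (again a toy, not a benchmark; sumOfOddSquares is 
invented for illustration): everything in a pipeline like the one below 
is a template instantiated in the client's module, so whether it 
collapses into a tight loop depends on the client's -O/-inline flags, 
not on how Phobos itself was compiled:

import std.algorithm : filter, map;
import std.range : iota;
import std.stdio : writeln;

// Every call below is a template instantiated *here*, in client code;
// without optimization each element goes through several non-inlined
// calls (empty/front/popFront of filter and map).
long sumOfOddSquares(long n)
{
    long total = 0;
    foreach (x; iota(n).filter!(a => a % 2 == 1).map!(a => a * a))
        total += x;
    return total;
}

void main()
{
    writeln(sumOfOddSquares(1_000_000));
}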

>> I'm certain that a forced tail call would work just fine instead of a
>> computed goto for this scenario. In fact I've measured this with LDC
>> and the results are promising, but only with -O2/-O3 (where the tail
>> call is optimized). I'd gladly dig them up if you are interested.
>
> I'm pretty reluctant to add language features that can be done as
> optimizations.

The point is that software which only works in a release build is hard 
to develop, even more so with libraries. That is why I object to 
labeling such things as optimizations when they, in fact, change 
semantics.

-- 
Dmitry Olshansky
