Sun Jul 20 15:29:54 PDT 2014

On 7/20/2014 3:10 PM, Dmitry Olshansky wrote:
>> The computed goto is faster for two reasons, according to the article:
>>
>> 1. The switch does a bit more per iteration because of bounds checking.
>
> Now let's consider a proper implementation of a threaded-code interpreter,
> where the *code pointer points to an array of addresses. We've been through this
> before, and it turns out switch is slower because of an extra load.
>
> a) Switch does 1 load for opcode, 1 load for the jump table, 1 indirect jump to
> advance
> (not even counting bounds checking of the switch)
>
> b) Threaded-code via (say) computed goto does 1 load of opcode and 1 indirect
> jump, because opcode is an address already (so there is no separate jump table).
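
For reference, the two dispatch schemes in a) and b) can be sketched in C roughly as follows. This is only an illustration, not code from the article or from either poster: the three-opcode VM, the names run_switch and run_threaded, and the MAX_CODE limit are all made up for the example, and the second function relies on the GCC/Clang "labels as values" extension (&&label, goto *ptr), which is not standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-instruction VM, for illustration only. */
enum { OP_INC, OP_DEC, OP_HALT };
#define MAX_CODE 64

/* a) switch dispatch: load the opcode, (bounds check), load the
 *    jump-table entry, then jump indirectly. */
int run_switch(const unsigned char *code)
{
    int acc = 0;
    for (;;) {
        switch (*code++) {
        case OP_INC:  acc++; break;
        case OP_DEC:  acc--; break;
        case OP_HALT: return acc;
        default:      return -1;   /* unknown opcode */
        }
    }
}

/* b) threaded code via computed goto: the opcodes are translated once
 *    into handler addresses, so each dispatch is one load and one
 *    indirect jump. Assumes the program ends with OP_HALT. */
int run_threaded(const unsigned char *bytecode, size_t n)
{
    static void *const labels[] = { &&do_inc, &&do_dec, &&do_halt };
    void *threaded[MAX_CODE];
    void **ip = threaded;
    int acc = 0;
    size_t i;

    assert(n > 0 && n <= MAX_CODE);
    for (i = 0; i < n; i++) {
        if (bytecode[i] > OP_HALT)
            return -1;             /* unknown opcode */
        threaded[i] = labels[bytecode[i]];
    }

    goto **ip++;                   /* dispatch the first instruction */
do_inc:  acc++; goto **ip++;
do_dec:  acc--; goto **ip++;
do_halt: return acc;
}
```

Both functions return 1 for the program INC, INC, DEC, HALT. The translation loop in run_threaded runs once per program, so it does not count against the per-instruction dispatch cost.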

True, but I'd like to find a way to do this as an optimization.

> I'm certain that a forced tail call would work just fine instead of computed goto
> for this scenario. In fact I've measured this with LDC and the results are
> promising, but it only works with -O2/-O3 (where tail calls are optimized). I'd
> gladly dig them up if you are interested.
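
The tail-call variant might look something like the sketch below, written in C rather than D and not taken from the measurements mentioned above. The struct insn layout and the handler names are hypothetical. Each handler finishes by calling the next instruction's handler in tail position; if the compiler turns that call into a jump, dispatch is again one load plus one indirect jump. Plain C does not guarantee that transformation, which matches the observation that it only pays off with optimization enabled (some compilers offer a way to force it, such as Clang's musttail attribute).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threaded-code layout: each instruction is simply the
 * address of its handler. */
struct insn;
typedef int (*handler)(const struct insn *ip, int acc);
struct insn { handler fn; };

/* Each handler does its work, then calls the next handler in tail
 * position. Without tail-call elimination the stack grows by one
 * frame per executed instruction. */
int op_inc(const struct insn *ip, int acc)
{
    return ip[1].fn(ip + 1, acc + 1);
}

int op_dec(const struct insn *ip, int acc)
{
    return ip[1].fn(ip + 1, acc - 1);
}

int op_halt(const struct insn *ip, int acc)
{
    (void)ip;                      /* nothing follows a halt */
    return acc;
}

/* Start execution at the first instruction with an accumulator of 0.
 * Assumes the program ends with op_halt. */
int run_tailcall(const struct insn *code)
{
    assert(code != NULL);
    return code->fn(code, 0);
}
```

For a program built as { {op_inc}, {op_inc}, {op_dec}, {op_halt} }, run_tailcall returns 1.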

I'm pretty reluctant to add language features that can be done as optimizations.