GDC review process.

Manu turkeyman at gmail.com
Wed Jun 20 03:35:09 PDT 2012


On 20 June 2012 03:58, Walter Bright <newshound2 at digitalmars.com> wrote:

>>> Do a grep for "asm" across the druntime library sources. Can you justify
>>> all of that with some other scheme?
>>
>> I think almost all the blocks I just browsed through could be easily
>> written with nothing more than the register alias feature I suggested,
>> and perhaps a couple of opcode intrinsics.
>
> But I see nothing gained by that.


The gain is that, by not using IA, the compiler could optimise and inline
your code much better. Your code is also likely to be readable by more people.
And since Iain is proposing removing the inline assembler from GDC, it's
clearly hard to maintain across different compilers. A higher-level,
language-defined construct may be simpler to support.


>> And as a bonus, they would also be readable.
>
> I don't agree. The point of IA to me is so I can specify exactly what I
> want. If I wanted to do it at a higher level, I'd use normal D syntax.


In many cases, you need to write a big block of asm to do one single
operation that's not expressible at the higher level... and in my
experience, most of the time, that operation is addressing a register
directly; most commonly, dealing with the stack pointer or the argument
registers.


>> I can imagine cases where the optimiser would have more freedom too.
>
> But if I'm writing IA, I want to do it my way. Not the optimizer's way,
> which may or may not be able to give me what I want.
>

I think you typically want to do one very small detail your way, and let
the optimiser make the best of the rest of the function.
The result is very much comparable to the use of intrinsics in high level
code.


Yes. C has a register keyword, and nobody uses it anymore. The troubles are
>> many, starting with people always "register"ed the wrong variables, and it
>> really didn't work out too well when compilers started doing live range
>> register assignments. It's ignored by modern C compilers, and hasn't been
>> carried forward into other languages.
>>
>
You miss the point of the suggestion: it is a mechanism to directly address
particular registers in high level code, allowing you to eliminate many
small asm blocks. C's failing is unrelated; the goal was totally different.


>> Really? I've never seen that. What about it was fail?
>
> It's actually in DMC, believe it or not. It was a giant failure because
> nobody used it. It was in Borland's TurboC, too. It pretty much just throws
> a wrench into the gears of more sophisticated code generators.
>

I'm not surprised nobody used it in a niche compiler like DMC, especially
when it's not supported by major compilers like GCC or MSC... It's not a
feature of C, so most people wouldn't ever consider it, or even realise
it's possible.

Of course it throws a wrench in the works; it's a reasonably complex feature,
but IA blocks themselves throw an equally large (and rather similar) wrench
in the works. The most naive implementation could probably do precisely
what IA does, that is, stop reordering across the block.
That should be just as safe when using intrinsics or explicit register
aliasing as it is with inline asm. And that's only a start, I think the
compiler could do better with time.
The compiler doesn't have much opportunity for improvement with IA, unless
the compiler attempts to understand the IA block, which is in a totally
different language, and architecture specific. Well defined high-level
constructs help the compiler with the understanding it needs to do a
good/safe job.
It's the same logic that supports opcode intrinsics, which became almost
universally preferred to IA in appropriate situations, and are an
undeniable success.
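
To make the comparison with opcode intrinsics concrete, here is a minimal
sketch in C, assuming GCC or Clang. The builtins `__builtin_bswap32` and
`__builtin_ctz` are real compiler builtins; the wrapper names `swap32` and
`lowest_set_bit` are hypothetical and only for illustration. Each wrapper
typically compiles to a single instruction where the target has one, yet the
compiler still understands the operation and can inline or constant-fold it.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value using a compiler builtin
   instead of an inline asm block. */
static uint32_t swap32(uint32_t x)
{
    return __builtin_bswap32(x);
}

/* Index of the least significant set bit (0 = lowest bit).
   __builtin_ctz is undefined for 0, so that case is rejected up front. */
static unsigned lowest_set_bit(uint32_t x)
{
    assert(x != 0);
    return (unsigned)__builtin_ctz(x);
}
```

Because these are ordinary function calls as far as the language is concerned,
register allocation and scheduling around them are left to the compiler.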


>>> I really don't understand preferring all these rather convoluted
>>> enhancements to avoid something simple and straightforward like the
>>> inline assembler. The use of IA in the D runtime library, for example,
>>> has been quite successful.
>>
>> I agree, IA is useful and has been successful, but it has drawbacks too.
>>   * IA ruins optimisation around the IA block
>
> dmd's optimizer is not so sensitive to that.


How can you safely reorder across an IA block? Is there a well defined
mechanism to determine that it's safe?
GCC has been failing at that forever. It takes a very conservative approach.
I guess the main problem is that GCC doesn't attempt to understand the
asm block, it just pastes it into the output.


>> This one seems trivial, you just need one intrinsic:
>>
>>   size_t reqsize = size * newcapacity;
>>   __jc(&Loverflow);
>
> That's highly risky. The optimizer knows nothing at all about the state of
> the flags register, and does not take into account a dependency on the C
> flag when doing code motion. Nor would the compiler guarantee that the C
> flag is even set by however it chose to do the previous multiply (for
> example, the LEA instruction is often used to do multiplies, which leaves
> the C flag untouched. Oops!). Nothing connects the __jc intrinsic to that
> multiply operation.


True, but you could also perform the multiply explicitly with another
intrinsic.
This reordering problem is perhaps the most difficult issue, but not
necessarily insurmountable. And it's only really relevant where explicit
interaction with the flags is involved.
I suspect it wouldn't be too much trouble to make that intrinsic encode
some information that fuses it with the preceding operation as written in
the source.
Alternatively, use a __noreorder {} scope block or something similar
surrounding the mul and jc.
Another possibility might be to make the intrinsic combine both operations
as a compound, which also eliminates the need to take the address of a label:
  if(__mul_getc(T a, T b, ref T res)) goto blah;
There are lots of different approaches, I'm sure an elegant solution is
possible.
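
As a rough illustration of the compound approach, here is a minimal sketch in
C, assuming GCC 5 or later, or Clang. `__builtin_mul_overflow` is a real
builtin in those compilers; `mul_getc` and `required_size` are hypothetical
names standing in for the intrinsic and the calling code discussed above. The
multiply and the overflow test are a single operation, so nothing can be
scheduled between them and no flags dependency is exposed to the programmer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the compound intrinsic: multiplies a by b,
   stores the (possibly wrapped) product in *res, and returns true if the
   multiplication overflowed size_t. */
static bool mul_getc(size_t a, size_t b, size_t *res)
{
    return __builtin_mul_overflow(a, b, res);
}

/* Computes size * newcapacity into *reqsize.
   Returns true on success; on overflow sets *reqsize to 0 and returns false. */
static bool required_size(size_t size, size_t newcapacity, size_t *reqsize)
{
    assert(reqsize != NULL);

    if (mul_getc(size, newcapacity, reqsize))
        goto Loverflow;
    return true;

Loverflow:
    *reqsize = 0;
    return false;
}
```

Note that this form needs no label address at all: the branch is an ordinary
`if`, so the compiler's usual control-flow analysis still applies.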


>> Although it depends on a '&codeLabel' mechanism to get the label address
>> (GCC supports this in C, I'd love to see this in D too).
>
> Note that supporting such will wind up disabling a lot of the data flow
> analysis, which is not set up to handle unknown edges between basic blocks.
>

No doubt, but it only affects code where that operation appears, which
would be rather rare.
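
For reference, this is roughly what the GCC feature looks like in C: a minimal
sketch, assuming GCC or Clang with the "labels as values" extension (`&&label`
and `goto *ptr`). The tiny bytecode (0 = increment, 1 = double, 2 = halt) and
the function name `run` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Interprets a hypothetical bytecode: 0 = increment, 1 = double, 2 = halt.
   Returns the accumulator when the program halts, runs off the end, or
   hits an unknown opcode. */
static int run(const unsigned char *code, size_t len)
{
    /* Table of label addresses, indexed by opcode. */
    static void *const dispatch[] = { &&op_inc, &&op_dbl, &&op_halt };
    int acc = 0;
    size_t pc = 0;

    assert(code != NULL);

next:
    if (pc >= len || code[pc] > 2)
        return acc;
    goto *dispatch[code[pc++]]; /* indirect jump: target known only at run time */

op_inc:
    acc += 1;
    goto next;
op_dbl:
    acc *= 2;
    goto next;
op_halt:
    return acc;
}
```

The indirect `goto` is the kind of edge that complicates data flow analysis,
but it is confined to the one function that uses it.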


> To summarize, I see a lot of complex new features, a significant rewrite of
> the optimizer, and a rewrite of a lot of existing code, and at the end of
> all that we're pretty much at the same state we are at now.
>

I agree, it's not trivial. It was just something to think about.
But we wouldn't end up in quite the same place. The examples that have come
up here are relatively trivial, so it doesn't add much to those. It would add
an awful lot to larger uses of asm, where it's really nice to be able to mix
explicit pseudo-asm code with regular automatic register assignment and
standard control structures (if/for/etc).