GDC review process.

Wed Jun 20 05:51:40 PDT 2012

On 20 June 2012 14:44, Don Clugston <dac at nospam.com> wrote:

> On 20/06/12 13:04, Manu wrote:
>
>> On 20 June 2012 13:51, Don Clugston <dac at nospam.com> wrote:
>>
>>    On 19/06/12 20:19, Iain Buclaw wrote:
>>
>>        Hi,
>>
>>        Had round one of the code review process, so I'm going to post
>>        the main issues here that most affect D users / the platforms
>>        they want to run on / the compiler version they want to use.
>>
>>
>>
>>        1) D Inline Asm and naked function support is raising far too
>>        many alarm bells. So it would just be easier to remove it and
>>        avoid all the other comments on why we need middle-end and
>>        backend headers in gdc.
>>
>>
>>    You seem to be conflating a couple of unrelated issues here.
>>    One is the calling convention. The other is inline asm.
>>
>>    Comments in the thread about "asm is mostly used for short things
>>    which get inlined" leave me completely baffled, as it is completely
>>    wrong.
>>
>>    There are two uses for asm, and they are very different:
>>    (1) Functionality. This happens when there are gaps in the language,
>>    and you get an abstraction inversion. You can address these with
>>    intrinsics.
>>    (2) Speed. High-speed, all-asm functions. These _always_ include a
>>    loop.
>>
>>
>>    You seem to be focusing on (1), but case (2) is completely different.
>>
>>    Case (2) cannot be replaced with intrinsics. For example, you can't
>>    write asm code using MSVC intrinsics (because the compiler rewrites
>>    your code).
>>    Currently, D is the best way to write (2). It is much, much better
>>    than an external assembler.
>>
>>
>> Case 1 has no alternative to inline asm. I've thrown out some crazy
>> ideas to think about (but nobody seems to like them). I still think it
>> could be addressed though.
>>
>> Case 2; I'm not convinced. Such long functions are the type I'm
>> generally interested in as well, and have the most experience with. But
>> in my experience, they're almost always best written with intrinsics.
>> If they're small enough to be inlined, then you can't afford not to use
>> intrinsics. If they are truly big functions, then you begin to sacrifice
>> readability and maintainability, and certainly limit the number of
>> programmers that can maintain the code.
>>
>
> I don't agree with that. In the situations I'm used to, using intrinsics
> would not make it easier to read, and would definitely not make it easier
> to maintain. I find it inconceivable that somebody could understand the
> processor well enough to maintain the code, and yet not understand asm.

These functions of yours are 100% asm, that's not really what I would
usually call 'inline asm'. That's really just 'asm' :)
I think you've just illustrated one of my key points actually; that is that
you can't just insert small inline asm blocks within regular code, the
optimiser can't deal with it in most cases, so inevitably, the entire
function becomes asm from start to end.

I find I can typically produce equivalent code using carefully crafted
intrinsics within regular C language structures. Also, often enough, the
code outside the hot loop can be written in normal C for readability, since
it barely affects performance, and trivial setup code will usually optimise
perfectly anyway.
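
As a rough illustration of that split (a sketch only, not code from this
thread): the kernel below keeps setup and the tail in ordinary C and uses
SSE intrinsics only in the hot loop. The function name scale_add and the
HAVE_SSE macro are made up for the example, and the plain-C fallback is
there only so it builds on non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   /* SSE intrinsics */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* dst[i] = a[i] * scale + b[i]  (hypothetical kernel, for illustration) */
void scale_add(float *dst, const float *a, const float *b,
               float scale, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));

#if HAVE_SSE
    /* Setup is ordinary C; it runs once, so readability wins here. */
    const __m128 vscale = _mm_set1_ps(scale);

    /* Hot loop: four floats per iteration, written with intrinsics.
       Unaligned loads/stores are used so callers need no alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(_mm_mul_ps(va, vscale), vb));
    }
#endif

    /* Tail (and the whole loop on non-SSE targets) in plain C. */
    for (; i < n; ++i)
        dst[i] = a[i] * scale + b[i];
}
```

Register allocation and instruction ordering are left to the compiler
here, which is exactly the trade being argued about.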

You're correct that a person 'maintaining' such code, who doesn't have such
a thorough understanding of the codegen, may ruin its perfectly tuned
efficiency. This may be the case, but in a commercial coding environment,
where a build MUST be delivered yesterday, the guy that understands it is
on holiday, and you need to tweak the behaviour immediately, this is a much
safer position to be in.
This is a very real scenario. I can't afford to ignore this practical
reality.

I might have a go at compiling the regular D code tonight, and seeing if I
can produce identical assembly. I haven't tried this so much with x86 as I
have with RISC architectures, which have much more predictable codegen.

>> I rarely fail to produce identical code with intrinsics to that which I
>> would write with hand written asm. The flags are always the biggest
>> challenge, as discussed prior in this thread. I think that could be
>> addressed with better intrinsics.
>>
>
> Again, look at std.internal.math.BiguintX86. There are many cases there
> where you can swap two instructions, and the code will still produce the
> correct result, but it will be 30% slower.
>

But that's precisely the sort of thing optimisers/schedulers are best at.
Can you point at a particular example where that is the case, that the
scheduler would get it wrong if left to its own ordering algorithm?
The opcode tables should have thorough information about the opcode timings
and latencies. The only thing that I find usually trips it up is not having
knowledge of the probability of the data being in nearby cache. If it has 2
loads, and one is less likely to be in cache, it should be scheduled
earlier.
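
A small sketch of supplying that cache knowledge by hand, assuming GCC or
Clang for __builtin_prefetch (elsewhere the PREFETCH macro compiles to
nothing). The names gather_sum, PREFETCH and AHEAD are hypothetical, and
the prefetch distance is a guess that would need tuning per CPU.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(p) __builtin_prefetch(p)
#else
#define PREFETCH(p) ((void)0)
#endif

/* Sums table[idx[i]] for i in [0, n).
   idx is read sequentially, so it is probably in cache; table is hit at
   scattered positions, so those loads are the ones likely to miss. */
long gather_sum(const int *table, const size_t *idx, size_t n)
{
    enum { AHEAD = 8 };  /* prefetch distance: an assumption, not a tuned value */
    long sum = 0;
    assert(n == 0 || (table != NULL && idx != NULL));

    for (size_t i = 0; i < n; ++i) {
        /* Start the probably-cold load early, several iterations ahead. */
        if (i + AHEAD < n)
            PREFETCH(&table[idx[i + AHEAD]]);
        sum += table[idx[i]];
    }
    return sum;
}
```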

As a side question, x86 architectures perform wildly differently from each
other. How do you reliably say some block of hand written x86 code is the
best possible code on all available processors?
Do you just benchmark on a suite of common processors available at the
time? I can imagine the opcode timing tables, which are presumably rather
different for every cpu, could easily feed wrong data to the codegen...

> I think that the SIMD case gives you a misleading impression, because on
> x86 they are very easy to schedule (they nearly all take the same number of
> cycles, etc). So it's not hard for the compiler to do a good job of it.
>

True, but it's one of the most common usage scenarios, so it can't be
ignored. Some other case studies I feel close to are hardware emulation,
software rasterisation, particles, fluid dynamics, rigid body dynamics,
FFTs, and audio signal processing. In each, I rarely need inline asm, and
usually only when there is a hole in the high-level language, as you said
earlier. I find this typically surfaces when needing to interact with the
flags regs directly.
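
To make the flags case concrete, here is a minimal sketch in portable C of
multi-word addition. C has no way to name the carry flag, so the carry has
to be reconstructed with comparisons, where asm would just use an
add-with-carry instruction. multi_add is a hypothetical name for this
example and is not taken from std.internal.math.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds two n-word numbers stored least significant word first.
   Writes the n-word sum to dst and returns the final carry (0 or 1). */
uint32_t multi_add(uint32_t *dst, const uint32_t *a, const uint32_t *b,
                   size_t n)
{
    uint32_t carry = 0;
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));

    for (size_t i = 0; i < n; ++i) {
        uint32_t s  = a[i] + carry;   /* add the incoming carry          */
        uint32_t c1 = s < carry;      /* 1 if that addition wrapped      */
        uint32_t t  = s + b[i];       /* add the second operand          */
        uint32_t c2 = t < s;          /* 1 if that addition wrapped      */
        dst[i] = t;
        carry = c1 | c2;              /* at most one of the two can wrap */
    }
    return carry;
}
```

Whether a given compiler turns the comparisons back into a carry chain is
not guaranteed, which is where a better intrinsic would help.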