Should you be able to initialize a float with a char?

max haughton maxhaton at gmail.com
Thu May 19 22:51:07 UTC 2022


On Thursday, 19 May 2022 at 21:55:59 UTC, Walter Bright wrote:
> On 5/19/2022 1:24 PM, H. S. Teoh wrote:
>> IME, gcc and ldc2 are well able to convert the above ?:
>> expression into the latter, without uglifying the code. Why are
>> we promoting (or even allowing) this kind of ugly code just
>> because dmd's optimizer is so lackluster you have to manually
>> spell things out this way?
>
> See my reply to Steven.
>
> BTW, consider auto-vectorizing compilers. A common 
> characteristic of them is that sometimes a loop looks like it 
> should be vectorized, but the compiler didn't, for reasons that 
> are opaque to users. The compiler then substitutes a slow 
> emulation to give the *appearance* of being vectorized.

Good compilers can actually print a report explaining why they 
didn't vectorize a loop. These days, when they can't, it's usually 
because the compiler was right: the programmer wrote a loop that 
the compiler can't reasonably assume is free of dependencies.

https://d.godbolt.org/z/djhMhMj31 has these reports enabled for 
GCC and LLVM.
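For reference, the relevant flags look something like the 
following (spellings current as of recent GCC, Clang, and LDC 
releases; check your version's documentation):

```shell
# GCC: report loops it failed to vectorize (and why)
gcc -O3 -fopt-info-vec-missed -c loop.c

# Clang (LLVM): analysis remarks from the loop vectorizer
clang -O3 -Rpass-analysis=loop-vectorize -c loop.c

# LDC exposes the equivalent LLVM remark options
ldc2 -O3 --pass-remarks-analysis=loop-vectorize loop.d
```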

Intel were the cutting edge for these reports, but Intel C++ is 
now basically dead.

These reports aren't that good for instruction selection issues, 
granted.

As an addendum, I would contend that most optimizers are actually 
far too aggressive when performing loop optimizations.

See this example: https://d.godbolt.org/z/Y99zs9feh. Unless you 
give the compiler a nudge in the right direction (e.g. by making 
sure you never try to compute the factorial of 100), it will 
generate reams and reams of code.

Unless you are compiling with profile-guided optimization, 
everything the compiler does is blind. This isn't just a question 
of locality but affects the very basics of the compiler's 
optimizations, e.g. register allocation and spill placement.

> The only way to tell what is happening is to dump the generated 
> assembler. This is especially troublesome when you're attempting 
> to write vector code that is portable among various SIMD 
> instruction sets. It doesn't scale, at all.

If you're writing SIMD code without dumping the assembler anyway, 
you're not paying enough attention. If you're going to go to all 
that effort, you're going to be profiling the code, and any good 
profiler will show you the disassembly alongside it. Maybe it 
doesn't scale in some minute sense, but in practice I don't think 
it makes much difference: either you have to do the work anyway, 
or it doesn't matter.

This is still ignoring that instruction sets don't mean all that 
much; it's all about the microarchitecture, which once again will 
probably require different code. For example, AMD processors past 
and present(-ish) have emulated the wider SIMD widths using more 
numerous, narrower execution units.

> Hence D's approach is different. You can write vector code in 
> D. If it won't compile to the target instruction set, it 
> doesn't replace it with emulation. It signals an error. Thus, 
> the user knows if he writes vector code, he gets vector code. 
> It makes it easy for him to use versioning to adjust the shape 
> of the expressions to line up with the vector capabilities of 
> each target.

LDC doesn't do this; GCC does. I don't think it actually matters, 
whereas if you're consuming a library from someone who didn't do 
the SIMD parts properly, it will at the very least compile with 
LDC.
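To make the "vector code or a compile error" model concrete, here 
is a minimal sketch using core.simd (this is my illustration, not 
a quote of anyone's actual library code); on a target without the 
needed SIMD support, dmd rejects the operation at compile time 
rather than substituting emulation:

```d
import core.simd;

// Multiply each lane of a float4 by a scalar. The vector multiply
// below only compiles if the target actually supports it.
float4 scale(float4 v, float s)
{
    float4 sv = s;  // broadcast the scalar into all four lanes
    return v * sv;
}
```

One can then use `version (D_SIMD)` (or target-specific versions) 
to provide an explicit scalar path on targets where this refuses 
to compile, which is the versioning approach Walter describes.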

> To sum up, if you want a particular instruction mix in the 
> output stream, a systems programming language must enable 
> expression of that desired mix. It must not rely on 
> undocumented and inconsistent compiler transformations.

I agree, although D is getting massively out of sync with the 
interesting instructions even on x86. The fun stuff is not really 
available unless you use inline asm (or Guillaume's intrinsics 
library).

For the non-x86 world (i.e. the vast majority of all processors 
sold), ARM has NEON, but the future will be SVE2; these are 
variable-width vector instructions. This isn't impossible to fit 
into the D_SIMD paradigm, but it will require, for example, types 
that have only a lower bound on their size.

The RISC-V vector ISA is going in a similar direction.
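Purely as a sketch of what "a lower bound on size" could look 
like (every name here is hypothetical, not an existing D_SIMD 
feature or proposal):

```d
// Hypothetical: a scalable vector type that guarantees only a
// minimum lane count at compile time, with the actual width left
// to the target, as SVE2's vector-length-agnostic model requires.
struct ScalableVec(T, size_t minLanes)
{
    // Compile-time lower bound on the vector's size in bytes.
    enum size_t minBytes = T.sizeof * minLanes;
    // Real storage would be target-defined; operations would be
    // written per-lane and lowered to whatever width the hardware
    // actually provides at run time.
}
```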

If I can actually get my hands on some variable-width hardware I 
will write D code for it ("because it's there"), but I haven't 
found anything cheap enough yet.


More information about the Digitalmars-d mailing list