Should you be able to initialize a float with a char?

max haughton maxhaton at gmail.com
Fri May 20 04:34:28 UTC 2022


On Friday, 20 May 2022 at 03:42:06 UTC, Walter Bright wrote:
> Manu asked that a report be given in the form of an error 
> message. Since it's what he did all day, I gave that a lot of 
> weight.
>
> Also, the point was Manu could then adjust the code with 
> version statements to write loops that worked best on each 
> target, rather than suffer unacceptable degradation from the 
> fallback emulations.
>

I think you're talking about writing SIMD code, not 
autovectorization. The report is *not* an error message, neither 
literally in this case nor in spirit; it's telling you what the 
compiler was able to infer from your code. Automatic 
vectorization is *not* the same thing as writing code that uses 
SIMD instructions directly; they're two different beasts.
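
To make the distinction concrete, here's a minimal D sketch 
(illustrative only: the function names are made up, and the 
core.simd path assumes a target where the D_SIMD version 
identifier is set):

    // Autovectorization: plain scalar source. The optimizer *may* lower
    // this loop to SIMD instructions, but nothing in the code asks for them.
    void scaleScalar(float[] a, float s)
    {
        foreach (ref x; a)
            x *= s;
    }

    // Direct SIMD: the source is written in terms of vector types, so the
    // programmer, not the vectorizer, decides that SIMD gets used.
    import core.simd;

    version (D_SIMD)
    void scaleVectors(float4[] a, float s)
    {
        float4 sv = s;   // broadcast the scalar into all four lanes
        foreach (ref v; a)
            v *= sv;     // one vector multiply per four floats
    }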

Typically the direct-SIMD algorithm is much faster, at the 
expense of being orders of magnitude slower to write: the 
instruction selection algorithms GCC and LLVM use simply aren't 
good enough to exploit all 15 billion instructions Intel have in 
their ISA, but they're almost literally hand-beaten to be good at 
SPEC benchmarks, so many common patterns are recognized and 
optimized just fine.

>> If you're writing SIMD code without dumping the assembler 
>> anyway you're not paying enough attention. If you're going to 
>> go to all that effort you're going to be profiling the code, 
>> and any good profiler will show you the disassembly alongside. 
>> Maybe it doesn't scale in some minute sense but in practice I 
>> don't think it makes that much difference because you have to 
>> either do the work anyway, or it doesn't matter.
>
> Manu did this all day and I gave a lot of weight to what he 
> said would work best for him. If you're writing vector 
> operations, for a vector instruction set, the compiler should 
> give errors if it cannot do it. Emulation code is not 
> acceptable.

It's not an unreasonable thing to do, I just don't think it's 
that much of a showstopper either way. If I *really* care about 
being right per platform I'm probably going to be checking CPUID 
at runtime anyway.
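
Something along these lines (a sketch only: it assumes druntime's 
core.cpuid avx2/sse42 flags, and the kernel functions are 
hypothetical):

    import core.cpuid : avx2, sse42;

    void processAVX2(float[] data)   { /* 256-bit path */ }
    void processSSE(float[] data)    { /* 128-bit path */ }
    void processScalar(float[] data) { /* portable fallback */ }

    // Pick the best implementation once the actual CPU is known at runtime.
    void process(float[] data)
    {
        if (avx2)
            processAVX2(data);
        else if (sse42)
            processSSE(data);
        else
            processScalar(data);
    }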

LDC is the compiler that people who actually ship performant D 
code use, and I've never actually seen anyone complain about this.

> I advocate disassembling, too, (remember the -vasm switch?) but 
> disassembling and inspecting manually does not scale at all.

You *have* to do it or you are lying to yourself, even if 
compilers were perfect, which they often aren't. When I use VTune 
I see a complete breakdown of the disassembly, source code, 
pipeline state, memory hierarchy, how much power the CPU used, 
temperature (cat blocking the computer's conveniently warm 
exhaust?), and so on.

This isn't so much about the actual instructions/intrinsics you 
end up with, that's just a means to an end, but rather that if 
you aren't keeping an eye on the performance effects of each line 
you add and on where the time is actually going, then you aren't 
being a good engineer: e.g. you can spend too much time working 
on the SIMD parts of an algorithm and get distracted from the 
parts that have become the new bottleneck (often the memory 
hierarchy; see the aside below).

Despite this, I do think it's still a huge failure of programming 
as an industry that a site like Compiler Explorer, or a flag like 
-vasm, actually needs to exist. This should be something much 
more deeply ingrained into our workflows; programming lags behind 
more serious forms of engineering when it comes to correlating 
what we think things do with what they actually do.

Aside for anyone reading:
See Richard Sites's classic article/note "It's the Memory, 
Stupid!": https://www.ardent-tool.com/CPU/docs/MPR/101006.pdf 
DEC died, but he was right.


>> LDC doesn't do this, GCC does. I don't think it actually 
>> matters, whereas if you're consuming a library from someone 
>> who didn't do the SIMD parts properly, it will at very least 
>> compile with LDC.
>
> At least compiling is not good enough if you're expecting 
> vector speed.

You still get "vector speed" in a sense. The emulated SIMD is 
still good, it's just not optimal. As I was saying previously, 
there are targets where, even though you *have* (say) 256-bit 
registers, you actually might want to use 128-bit ones in some 
places, because the newer instructions tend to be emulated (in a 
sense) by the hardware and so might not actually be worth the 
port pressure inside the processor.
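
For illustration, a sketch (whether the wide type below is 
accepted natively, split into two 128-bit operations, or rejected 
outright depends on the compiler and the -mcpu target, which is 
exactly the DMD/LDC difference being discussed):

    import core.simd;

    alias F4 = __vector(float[4]); // 128-bit: native on any x86-64 (SSE2 baseline)
    alias F8 = __vector(float[8]); // 256-bit: native only with AVX enabled,
                                   // otherwise typically lowered to 128-bit pairs

    F8 addWide(F8 a, F8 b)
    {
        return a + b; // one instruction with AVX, an emulated pair without it
    }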

Basically everything has (a lot of) SIMD units these days, so 
even this emulated computation will still be pretty fast. You see 
SIMD instruction sets included in basically anything that costs 
more than a pint of beer (Sneaky DConf Plug...): e.g. the 
Allwinner D1, a cheap RISC-V core from China, comes with a 
reasonably standards-compliant vector instruction set 
implementation. Even microcontrollers have them.

For anyone interested, the core inside the D1 is open source: 
https://github.com/T-head-Semi/openc906

