Should you be able to initialize a float with a char?
max haughton
maxhaton at gmail.com
Fri May 20 04:34:28 UTC 2022
On Friday, 20 May 2022 at 03:42:06 UTC, Walter Bright wrote:
> Manu asked that a report be given in the form of an error
> message. Since it's what he did all day, I gave that a lot of
> weight.
>
> Also, the point was Manu could then adjust the code with
> version statements to write loops that worked best on each
> target, rather than suffer unacceptable degradation from the
> fallback emulations.
>
I think you're talking about writing SIMD code, not
autovectorization. The report is *not* an error message, neither
literally in this case nor spiritually: it's telling you what the
compiler was able to infer from your code. Automatic
vectorization is *not* writing code that uses SIMD instructions
directly; they're two different beasts.
Typically the direct-SIMD algorithm is much faster, at the
expense of being orders of magnitude slower to write: the
instruction selection algorithms GCC and LLVM use simply aren't
good enough to exploit all 15 billion instructions Intel have in
their ISA, but they're almost literally hand-beaten to be good at
SPEC benchmarks, so many patterns are recognized and optimized
just fine.
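To make the distinction concrete, here's a minimal D sketch (the
function names are mine, and I'm assuming core.simd's float4 and
its scalar-broadcast initialization): the first loop is plain
scalar code the auto-vectorizer is free to transform or not,
while the second spells out the vector width itself.

    import core.simd;

    // Plain scalar loop: whether this becomes SIMD is entirely up
    // to the optimizer, and the vectorization report tells you
    // what it inferred.
    void scaleScalar(float[] a, float s)
    {
        foreach (ref x; a)
            x *= s;
    }

    // Explicit SIMD: we pick the vector width (float4 = 128 bits)
    // ourselves, so there is no inference step for the compiler
    // to get wrong.
    void scaleVector(float4[] a, float s)
    {
        float4 sv = s;       // broadcast the scalar into all four lanes
        foreach (ref v; a)
            v = v * sv;      // one vector multiply per four floats
    }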
>> If you're writing SIMD code without dumping the assembler
>> anyway you're not paying enough attention. If you're going to
>> go to all that effort you're going to be profiling the code,
>> and any good profiler will show you the disassembly alongside.
>> Maybe it doesn't scale in some minute sense but in practice I
>> don't think it makes that much difference because you have to
>> either do the work anyway, or it doesn't matter.
>
> Manu did this all day and I gave a lot of weight to what he
> said would work best for him. If you're writing vector
> operations, for a vector instruction set, the compiler should
> give errors if it cannot do it. Emulation code is not
> acceptable.
It's not an unreasonable thing to do, I just don't think it's
that much of a showstopper either way. If I *really* care about
being right per platform I'm probably going to be checking CPUID
at runtime anyway.
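For the runtime check, a rough sketch of what I mean using
druntime's core.cpuid (the process* functions here are
hypothetical placeholders for real kernels):

    import core.cpuid : avx, sse42;

    void processAvx(float[] data)    { /* hypothetical AVX kernel */ }
    void processSse(float[] data)    { /* hypothetical SSE4.2 kernel */ }
    void processScalar(float[] data) { /* hypothetical portable fallback */ }

    void process(float[] data)
    {
        // Dispatch on what the host CPU actually supports (in
        // practice you'd cache this choice) instead of baking one
        // ISA in at compile time.
        if (avx)
            processAvx(data);
        else if (sse42)
            processSse(data);
        else
            processScalar(data);
    }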
LDC is the compiler that people who actually ship performant D
code use, and I've never seen anyone complain about this.
> I advocate disassembling, too, (remember the -vasm switch?) but
> disassembling and inspecting manually does not scale at all.
You *have* to do it or you are lying to yourself - even if
compilers were perfect, which they often aren't. When I use VTune
I see a complete breakdown of the disassembly, source code,
pipeline state, memory hierarchy, how much power the CPU used,
temperature (cat blocking the computer's conveniently warm
exhaust?), etc.
This isn't so much about the actual instructions/intrinsics you
end up with - those are just a means to an end - but rather that
if you aren't keeping an eye on the performance effect of each
line you add, and on where the time is actually going, then you
aren't being a good engineer. E.g. you can spend too much time
working on the SIMD parts of an algorithm and get distracted from
the parts that are the new bottleneck (often the memory
hierarchy; see the aside below).
Despite this I do think it's still a huge failure of programming
as an industry that a site like Compiler Explorer, or a flag
like -vasm, actually needs to exist. This should be something
much more deeply ingrained into our workflows; programming lags
behind more serious forms of engineering when it comes to
correlating what we think things do with what they actually do.
Aside for anyone reading:
See Sites's classic article/note "It's the Memory, Stupid!"
https://www.ardent-tool.com/CPU/docs/MPR/101006.pdf - DEC died
but he was right.
>> LDC doesn't do this, GCC does. I don't think it actually
>> matters, whereas if you're consuming a library from someone
>> who didn't do the SIMD parts properly, it will at very least
>> compile with LDC.
>
> At least compiling is not good enough if you're expecting
> vector speed.
You still have "vector speed" in a sense. The emulated SIMD is
still good, it's just not optimal. As I was saying previously,
there are targets where even though you *have* (say) 256-bit
registers, you might actually want to use 128-bit ones in some
places, because newer instructions tend to be emulated (in a
sense) and so might not actually be worth the port pressure
inside the processor.
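That per-target choice is exactly the kind of thing version
blocks are good for. A sketch along those lines (assuming the
D_AVX2 predefined version and core.simd's float8 exist on the
target):

    import core.simd;

    // Pick the working vector width per target. Having 256-bit
    // registers available doesn't automatically make them the
    // right choice everywhere.
    version (D_AVX2)
        alias Vec = float8;   // 256-bit lanes
    else
        alias Vec = float4;   // 128-bit baseline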
Basically everything has (a lot of) SIMD units these days, so
even this emulated computation will still be pretty fast. You see
SIMD instruction sets included in basically anything that costs
more than a pint of beer (Sneaky DConf Plug...), e.g. the
Allwinner D1 is a cheapo RISC-V core from China that comes with a
reasonably standard-compliant vector instruction set
implementation. Even microcontrollers have them.
For anyone interested, the core inside the D1 is open source:
https://github.com/T-head-Semi/openc906