M1 10x faster than Intel at integral division, throughput one 64-bit divide in two cycles

Ola Fosheim Grøstad ola.fosheim.grostad at gmail.com
Thu May 13 22:40:06 UTC 2021


On Thursday, 13 May 2021 at 12:06:01 UTC, Witold Baryluk wrote:
> Next time, exercise more critical thinking when reading 
> "benchmark" claims.

Indeed, proper benchmarks use application suites, not shoehorned 
synthetic garble... Besides, most performance-sensitive code does 
not use division much if the programmers know what they are 
doing. And in this "benchmark" the division could have been moved 
out of the inner loop by a less-than-braindead compiler.
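
To illustrate what moving a division out of the inner loop means, 
here is a made-up kernel in C. It is not the benchmark from the 
thread: the function names, the loop shape and the test values are 
all assumptions for the sake of the sketch. The quotient depends 
only on the outer index, so it can be computed once per outer 
iteration, either by hand or by a compiler's loop-invariant code 
motion when optimization is enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel, not the benchmark from the thread:
   a[i] / d does not depend on j, yet it sits in the inner loop. */
uint64_t sum_naive(const uint64_t *a, size_t n, size_t m, uint64_t d)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            sum += a[i] / d + j;       /* n * m divisions as written */
    return sum;
}

/* Same result with the invariant quotient hoisted by hand. */
uint64_t sum_hoisted(const uint64_t *a, size_t n, size_t m, uint64_t d)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        const uint64_t q = a[i] / d;   /* n divisions in total */
        for (size_t j = 0; j < m; j++)
            sum += q + j;
    }
    return sum;
}

/* Sanity check that both versions agree on a small input. */
void check_hoisting(void)
{
    const uint64_t a[] = { 10, 25, 99, 1000 };
    assert(sum_naive(a, 4, 8, 7) == sum_hoisted(a, 4, 8, 7));
}
```

If the compiler performs the hoist itself, timing sum_naive says 
more about the optimizer than about the divider, which is one more 
reason to be wary of a loop like this as a hardware benchmark.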

Looks like Intel is releasing a Clang-based C++ compiler with 
OpenMP offload to Intel GPUs... Wonder if anyone knows anything 
about it?

More information about the Digitalmars-d mailing list