M1 10x faster than Intel at integral division, throughput one 64-bit divide in two cycles

Max Haughton maxhaton at gmail.com
Thu May 13 04:52:07 UTC 2021


On Thursday, 13 May 2021 at 01:59:15 UTC, Andrei Alexandrescu 
wrote:
> https://www.reddit.com/r/programming/comments/nawerv/benchmarking_division_and_libdivide_on_apple_m1/
>
> Integral division is the strongest arithmetic operation.
>
> I have a friend who knows some M1 internals. He said it's 
> really Star Trek stuff.
>
> This will seriously challenge other CPU producers.
>
> What perspectives do we have to run the compiler on M1 and 
> produce M1 code?

It's already winning, not merely challenging, although consider
just how fucking enormous the transistor budget is on the M1 on a
per-core basis. From what is known publicly, the M1 doesn't have
that much magic to it; rather, it is an extremely wide (where it
really matters) iteration of what already works elsewhere in the
industry, combined with no x86 tax on the desktop for the first
time. Intel's process engineers completely dropped the ball, so
the M1 is on a process something like 4-5x denser than Intel's
14nm.

Someone mentioned on Hacker News that Intel has also improved
integer division on the ThisXeon + 1, which would be worth
benchmarking, although expecting monster SPECint numbers from a
28-core Xeon is probably missing the point.

Someone on the Discord has an M1 and D apparently already works
fine on it; I'm aiming to get a blog post out of it.

The GCC project has M1 hardware and should apparently be getting
support soon-ish. From what I can tell Apple don't like
upstreaming their backends, so it could be a while before the
upstream compilers are tuned much for the M1.

Apple also haven't published anything along the lines of an
optimization manual for the M1, so I guess we'll find out what it's
really capable of by osmosis as time goes on. I think it's more
likely that Apple get the Microsoft hidden-API treatment than that
they actually go public on the extensions they have made to the
ARM ISA, both in the form of new instructions and in the form of
an old trick SPARC had, which basically turns on TSO (total store
ordering) underneath a program to aid x86 emulation.


More information about the Digitalmars-d mailing list