stride in slices
Ethan
gooberman at gmail.com
Tue Jun 5 20:07:06 UTC 2018
On Tuesday, 5 June 2018 at 19:05:27 UTC, DigitalDesigns wrote:
> For loops HAVE a direct cpu semantic! Do you doubt this?
...
Right. If you're gonna keep running your mouth off, how about
looking at some disassembly then?
for(auto i=0; i<a.length; i+=strideAmount)
Using ldc -O4 -release for x86_64 processors, the initialiser
translates to:
mov byte ptr [rbp + rcx], 0
The comparison translates to:
cmp r13, rcx
ja .LBB0_2
And the increment and store translate to:
mov byte ptr [rbp + rcx], 0
movsxd rcx, eax
add eax, 3
So. It uses four of the most basic instructions you can think
of: mov, cmp, j<cond>, add.
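For anyone who wants to reproduce this, here is a minimal sketch of the kind of function the loop header above could live in. Only the loop header comes from the code I compiled; the function name zeroStrided, the ubyte[] element type and the stride of 3 are assumptions, picked to line up with the byte store and the "add eax, 3" in the listing.

```d
// Hypothetical reconstruction: only the loop header appears in the post.
// zeroStrided, the ubyte[] element type and the stride value are assumptions.
void zeroStrided(ubyte[] a, int strideAmount)
{
    // `auto i = 0` infers int, which is why the listing sign-extends
    // the index (movsxd) before using it as an offset.
    for (auto i = 0; i < a.length; i += strideAmount)
        a[i] = 0; // the byte store: mov byte ptr [base + index], 0
}

void main()
{
    auto buf = new ubyte[](10);
    buf[] = 1;               // fill with ones so the zeroed slots stand out
    zeroStrided(buf, 3);     // touches indices 0, 3, 6, 9

    ubyte[] expected = [0, 1, 1, 0, 1, 1, 0, 1, 1, 0];
    assert(buf == expected);
}
```

Your exact register allocation will differ with compiler version and surrounding code, so treat the listing as representative rather than something you will get byte for byte.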
Now, what, you might ask, are the instructions that a range
compiles down to when everything is properly inlined?
The initialisation, since it's a function, pulls from the stack.
mov rax, qword ptr [rsp + 16]
movsxd rcx, dword ptr [rsp + 32]
But the comparison looks virtually identical.
cmp rax, rcx
jb .LBB2_4
But how does it do the add? With some register magic.
movsxd rcx, edx
lea edx, [rcx + r9]
Now, what that looks like it's doing to me is combining the pointer
load and index increment into those two instructions. One
instruction fewer than the flat for loop.
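The range code itself isn't shown above, so here is a minimal sketch of what a range-based equivalent might look like, assuming std.range.stride over a slice and a foreach by reference. The function name zeroStridedRange and the ubyte[] element type are hypothetical, and this is not necessarily the exact code the disassembly came from.

```d
import std.range : stride;

// Hypothetical range-based counterpart to a flat strided for loop.
// zeroStridedRange and the ubyte[] element type are assumptions.
void zeroStridedRange(ubyte[] a, size_t strideAmount)
{
    // stride(n) yields every n-th element of the slice; iterating by ref
    // lets the loop body write back into the underlying array.
    foreach (ref elem; a.stride(strideAmount))
        elem = 0;
}

void main()
{
    auto buf = new ubyte[](10);
    buf[] = 1;                  // fill with ones so the zeroed slots stand out
    zeroStridedRange(buf, 3);   // touches indices 0, 3, 6, 9

    ubyte[] expected = [0, 1, 1, 0, 1, 1, 0, 1, 1, 0];
    assert(buf == expected);
}
```

The point of writing it this way is that the loop no longer spells out an index variable at all, which leaves the compiler free to pick whatever induction scheme it likes once the range primitives are inlined.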
In conclusion: the semantics you talk about are literally some of
the most basic instructions in computing, and escaping the
confines of a for loop for a foreach loop can let the compiler
generate more efficient code than 50-year-old compsci concepts
can.