stride in slices

Timon Gehr timon.gehr at gmx.ch
Tue Jun 5 23:56:40 UTC 2018


On 05.06.2018 21:05, DigitalDesigns wrote:
> On Tuesday, 5 June 2018 at 18:46:41 UTC, Timon Gehr wrote:
>> On 05.06.2018 18:50, DigitalDesigns wrote:
>>> With a for loop, it is pretty much a wrapper on internal CPU logic so 
>>> it will be nearly as fast as possible.
>>
>> This is not even close to being true for modern CPUs. There are a lot 
>> of architectural and micro-architectural details that affect 
>> performance but are not visible or accessible in your for loop. If you 
>> care about performance, you will need to test anyway, as even rather 
>> sophisticated models of CPU performance don't get everything right.
> 
> Those optimizations are not part of the instruction set, so they are 
> irrelevant. They will occur with ranges too.
> ...

I was responding to claims that for loops are basically a wrapper on 
internal CPU logic and nearly as fast as possible. Both of those claims 
were wrong.

> For loops HAVE a direct CPU semantic! Do you doubt this?
> ...

You'd have to define what that means. (E.g., Google currently shows no 
hits for "direct CPU semantics".)

> 
> CPUs do not have range semantics. Ranges are layers on top of compiler 
> semantics... You act like they are equivalent; they are not!

I don't understand why you bring this up nor what you think it means.

The compiler takes a program and produces some machine code that has the 
right behavior. Performance is usually not formally specified. In terms 
of resulting behavior, code with explicit for loops and range-based code 
may have identical semantics. Which one executes faster depends on 
internal details of the compiler and the target architecture, and it may 
change over time, e.g. between compiler releases.
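For illustration, a minimal D sketch in the spirit of this thread's subject (striding through a slice). The function names and the step value are hypothetical. Both functions compute the same sum, once with an explicit for loop and once with std.range.stride and std.algorithm.iteration.sum, so their observable behavior is identical; which one runs faster depends on the compiler, its flags and the target, and has to be measured rather than assumed.

```d
import std.algorithm.iteration : sum;
import std.range : stride;

// Explicit loop: add every `step`-th element, starting at index 0.
// (Hypothetical helper, for illustration only.)
long strideSumLoop(const(int)[] a, size_t step)
{
    long total = 0;
    for (size_t i = 0; i < a.length; i += step)
        total += a[i];
    return total;
}

// Range-based version: stride yields a[0], a[step], a[2 * step], ...
// and sum adds them up, starting from a long seed of 0.
long strideSumRange(const(int)[] a, size_t step)
{
    return a.stride(step).sum(0L);
}
```

For example, with the slice [1, 2, 3, 4, 5, 6, 7] and a step of 3, both versions add 1, 4 and 7 and return 12. The range version states the intent in one line; whether it compiles to the same machine code as the loop is up to the optimizer.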

> All range 
> semantics must go through the library code, then to the compiler, then to 
> the CPU. For loops in all major systems languages go almost directly to CPU 
> instructions.
> 
> for(int i = 0; i < N; i++)
> 
> translates into either increment and loop instructions or jump instructions.
> ...

Sure, or whatever else the compiler decides to do. It might even be 
translated into a memcpy call. Even if you want to restrict yourself to 
using only for loops, my point stands. Write maintainable code by default 
and let the compiler do what it does. Then optimize further in those 
cases where the resulting code is actually too slow. Test for 
performance regressions.
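A hedged sketch of such a check in D. It compares an explicit copy loop (a pattern that optimizing backends, e.g. LLVM as used by LDC or GCC as used by GDC, may recognize and replace with a memcpy call) against D's slice assignment, verifies that both give the same result, and times them with std.datetime.stopwatch.benchmark. The function names, array size and repetition count are illustrative assumptions; rerunning a check like this after a compiler upgrade is one way to notice regressions.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Explicit element-by-element copy. Depending on compiler and flags,
// the optimizer may turn this loop into a memcpy call.
void copyLoop(int[] dst, const(int)[] src)
{
    assert(dst.length == src.length);
    for (size_t i = 0; i < src.length; i++)
        dst[i] = src[i];
}

// Slice assignment: the language-level way to say "copy these elements".
void copySlice(int[] dst, const(int)[] src)
{
    dst[] = src[];
}

// Time both variants over `reps` runs on an array of `n` ints
// and check that they produce the same result.
void compareCopies(size_t n, uint reps)
{
    auto src = new int[n];
    foreach (i, ref x; src)
        x = cast(int) i;
    auto a = new int[n];
    auto b = new int[n];

    auto times = benchmark!(() => copyLoop(a, src),
                            () => copySlice(b, src))(reps);
    assert(a == src && b == src); // correctness first, then speed
    writefln("loop: %s  slice: %s", times[0], times[1]);
}
```

The timings printed are only meaningful for the compiler, flags and machine they were measured on, which is the point: measure the actual build instead of reasoning from the shape of the source.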

> There is absolutely no reason why any decent compiler would not use what 
> the CPU has to offer. For loops are language semantics; ranges are 
> library semantics.

Not really. Also, irrelevant.

> To pretend they are equivalent is wrong and no amount 
> of justifying will make them the same.

Again, I don't think this point is part of this discussion.

> I actually do not know of any 
> commercially viable CPU that exists without loop semantics.

What does it mean for a CPU to have "loop semantics"? CPUs typically 
have an instruction pointer register and possibly some built-in 
instructions to manipulate said instruction pointer. x86 has some 
built-in loop instructions, but I think they are just there for legacy 
support and not actually something you want to use in performant code.

> I also know of no 
> commercially viable compiler that does not wrap those instructions in a 
> for-loop-like (or while, or whatever) syntax that maps almost directly to 
> the CPU instructions.
> ...

The compiler takes your for loop and generates some machine code. I 
don't think there is a "commercially viable" compiler that does not 
sometimes do things that are not direct. And even then, there is no very 
simple mapping from CPU instructions to observed performance, so the 
entire point is a bit moot.

>> Also, it is often not necessary to be "as fast as possible". It is 
>> usually more helpful to figure out where the bottleneck is for your 
>> code and concentrate optimization effort there, which you can do more 
>> effectively if you can save time and effort for the remaining parts of 
>> your program by writing simple and obviously correct range-based code, 
>> which often will be fast as well.
> 
> It's also often not necessary to be "as slow as possible".

This seems to be quoting an imaginary person. My point is that to get 
even faster code, you need to spend effort and often get lower 
maintainability. This is not always a good trade-off, in particular if 
the optimization does not improve performance a lot and/or the code in 
question is not executed very often.

> I'm not 
> asking about generalities but about specifics. It's great to make 
> generalizations about how things should be but I would like to know how 
> they are.

That's a bit unspecific.

> Maybe in theory ranges could be more optimal than other 
> semantics, but theory never equals practice.
> 

I don't know who this is addressed to. My point was entirely practical.


More information about the Digitalmars-d mailing list