Time to move std.experimental.checkedint to std.checkedint?

Walter Bright newshound2 at digitalmars.com
Wed Mar 31 07:13:07 UTC 2021


On 3/30/2021 11:54 PM, Vladimir Panteleev wrote:
> On Wednesday, 31 March 2021 at 06:34:04 UTC, Walter Bright wrote:
>> On 3/30/2021 10:30 PM, Vladimir Panteleev wrote:
>>> On Wednesday, 31 March 2021 at 05:25:48 UTC, Walter Bright wrote:
>>>> It's a win because it uses the address decoder logic which is separate from 
>>>> the arithmetic logic unit. This enables it to be done in parallel with the ALU.
>>>
>>> Is this still true for modern CPUs?
>>
>> See https://www.agner.org/optimize/optimizing_assembly.pdf page 135.
> 
> Thanks!
> 
> It also says that LEA may be slower than ADD on some CPUs.

Slower than a single ADD, but not slower than multiple ADDs, and DMD does not 
replace a mere ADD with LEA. If you look at how LEA is used in the various 
examples of optimized code in that PDF, Agner uses it a lot.
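To make that concrete, here is a minimal sketch (mine, not code from the 
thread; the asm in the comment is typical x86-64 output, not DMD's literal 
code gen):

// Multiplying by a small constant is the textbook LEA candidate:
// x*5 = x + x*4 matches LEA's base + index*scale addressing form, so a
// backend can emit roughly
//     lea eax, [rdi + rdi*4]
// as one instruction instead of a shift-and-add or a chain of ADDs.
int times5(int x)
{
    return x * 5;
}

void main()
{
    assert(times5(7) == 35);
}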

> some CPUs

Code gen generally targets code that works well on most machines.

>> If you use a register that needs to be saved on the stack, it's going to cost.
> Sure, but why would you do that?

To map as many locals into registers as possible.

> If I'm reading the ABI spec correctly, almost 
> all registers belong to the callee, and don't need to be saved/restored, and 
> there's probably little reason to call a function in the middle of such a 
> computation and therefore save the interim value on the stack.

All I can say is code gen is never that simple. There are just too many rules 
that conflict. The combinatorial explosion means the compiler relies on 
heuristics that produce better results most of the time. I suppose a good AI 
research project would be to train an AI to produce better overall patterns.

But, in general,

1. LEA is faster when it replaces more than one operation
2. using fewer registers is better
3. getting locals into registers is better
4. generating fewer instructions is better
5. generating shorter instructions is better
6. jumpless (branch-free) code is better (see the sketch below)

None of these are *always* true. And Intel/AMD change the rules slightly with 
every new processor.
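On rule 6, a minimal sketch (mine; the CMOV mapping is typical for x86-64 
backends, not guaranteed):

// A conditional expression like this is a good candidate for a jumpless
// conditional move (cmp + cmov) instead of a compare-and-branch.
int maxInt(int a, int b)
{
    return a > b ? a : b;
}

void main()
{
    assert(maxInt(3, 9) == 9);
}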

As for overflow checks, I am not going to post benchmarks because everyone picks 
at them. Every benchmark posted here by check proponents shows that overflow 
checks are slower. The Rust team apparently poured a lot of effort into overflow 
checks and ultimately gave up: the checks are turned off in release builds. 
I don't see much hope in replicating their efforts.
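For reference, this is the shape of an explicit runtime check in D today, as a 
minimal sketch using druntime's core.checkedint. The flag test after every 
operation is the cost being debated:

import core.checkedint : adds;

// adds() performs the addition and sets the flag on signed overflow
// instead of silently wrapping.
int checkedSum(int a, int b)
{
    bool overflow = false;
    immutable r = adds(a, b, overflow);
    if (overflow)
        assert(0, "integer overflow");
    return r;
}

void main()
{
    assert(checkedSum(2, 3) == 5);
}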

And, once again, I reiterate that D *does* have some overflow checks that are 
done at compile time (i.e. they are free) in the form of integral promotions and 
Value Range Propagation, neither of which is part of Zig or Rust.
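A small sketch of both mechanisms (my illustration):

void main()
{
    // Integral promotion: byte + byte is computed as int, so narrowing
    // the result back to byte requires an explicit cast. The potential
    // overflow is surfaced at compile time, at no runtime cost.
    byte a = 100, b = 100;
    // byte c = a + b;          // error: cannot implicitly convert int to byte
    byte c = cast(byte)(a + b); // the narrowing must be acknowledged

    // Value Range Propagation: the compiler proves x & 0x7F fits in a
    // byte, so this narrowing compiles without a cast or runtime check.
    int x = 1000;
    byte d = x & 0x7F;
}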

