[OT] The Usual Arithmetic Confusions

Sat Jan 29 12:23:28 UTC 2022

On Saturday, 29 January 2022 at 10:39:10 UTC, Ola Fosheim Grøstad 
wrote:
> As for practical problems: In C++ you can still enable signed 
> overflow checks. In D you cannot even do that, because in D all 
> integer operations are defined to be modular.

Yes, this is a problem and C++ is doing a somewhat better job 
here.

Imagine an alternative reality, where processors supported 
efficient modular indexing in arrays and had special instructions 
for this (specialized DSP processors might have this even now in 
some form). So that `a[i % a.length]` has no performance 
disadvantages compared to `a[i]`. My guess is that the D language 
design in this alternative reality would define arrays indexing 
as a modular operation and consider the "memory safety" goal 
perfectly achieved (no out of bounds accesses, yay!). Ignoring 
the fact that landing on a wrong array index is a source of all 
kind of hard to debug problems, leading to wrong computations or 
even security vulnerabilities.

> Interestingly, the main complaint about modular arithmetics in 
> C++ is not about correctness however, but about poor 
> optimizations.

I'm not sure if I can fully agree with that. Correctness is a 
major concern in C++.

> As a result, thoughtfully designed C++ libraries avoid using 
> unsigned integers in interfaces.

My understanding is that the primary source of unsigned types in 
applications is (or at least used to be) the `size_t` type. Which 
exists, because a memory buffer may technically span over more 
than half of the address space, at least on a 32-bit system. API 
functions, which process memory buffers, have to deal with 
`size_t` and this is a headache. But now with 64-bit processors, 
this is less of an issue and buffer sizes can become signed with 
no practical loss of functionality.

> Why are modular arithmetics performing worse than regular 
> arithmetics? It is often much more difficult (in some cases 
> impossible) to optimize computations that are mapped onto a 
> circle than computations that are mapped onto a straight line!

Yes, things are a bit tricky here. Modular arithmetics has some 
really nice math properties: 
https://math.stackexchange.com/questions/27336/associativity-commutativity-and-distributivity-of-modulo-arithmetic

When doing modular arithmetics, an expression `a + b - c` can be 
always safely transformed into `a - c + b`. But if we are dealing 
with potential integer overflow traps, then reordering operations 
in an expression can't be done safely anymore (will `a + b` 
overflow and cause an exception? or will `a - c` underflow and 
cause an exception?).

> This is something that D should fix!!

Do you have a good suggestion?

>> I think that it's only a matter of time until processors start 
>> adding the missing instructions to make this fast.
>
> No. It isn't crazy slow because of the check. It is crazy slow 
> because it prevents optimizations in complex expressions.
>
> In theory you could compute the overflow as a separate 
> expression and do speculative computations, then switch to a 
> slow path on overflow, but that would be more of a high level 
> approach than a system level approach. In low level programming 
> the programmer wants the code to map to machine language 
> instructions without blowing up the code size in ways that are 
> hard to predict. You want some transparency in how the code you 
> write maps to the hardware instructions.

We lost this transparency a long time ago. Compilers are allowed 
to optimize out big parts of expressions. Integer divisions by a 
constant are replaced by multiplications and shifts, etc. 
Functions are inlined, loops are unrolled and/or vectorized.

> To make overflow checks really cast you need a much more 
> advanced type system with constraints, so that the compiler can 
> know what values an integer picked up from the heap can have.

Well, the reality is that this is not just a theoretical 
discussion anymore. Trapping of arithmetic overflows already 
exists in the existing programming languages. And programming 
languages will keep evolving to handle it even better in the 
future.