[OT] The Usual Arithmetic Confusions

Fri Feb 4 14:12:05 UTC 2022

On Friday, 4 February 2022 at 04:29:21 UTC, Walter Bright wrote:
> No, then the VRP will emit an error.

No, because you cast it away.

Consider the old code being:

---
struct Thing {
   short a;
}

// somewhere very different

Thing calculate(int a, int b) {
     return Thing(a + b);
}
---

The current rules would require that you put an explicit cast in
that constructor call. Then, later, `Thing.a` gets refactored into
`int`. It will still compile, with the explicit cast still there,
now chopping off bits.
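A minimal sketch of that failure mode, assuming the field has already been widened to `int` and the old `cast(short)` was left behind (the input values are made up for illustration):

```d
struct Thing {
    int a; // was `short a;` before the refactor
}

Thing calculate(int a, int b) {
    // This cast was written to satisfy the compiler back when the
    // field was a short. Nothing forces anyone to remove it now.
    return Thing(cast(short)(a + b));
}

void main() {
    auto t = calculate(40_000, 30_000);
    // 70_000 does not fit in 16 bits, so only the low word survives,
    // even though the int field could have held the full value.
    assert(t.a == 4464);
}
```

No error, no warning, just a wrong number in a field that had room for the right one.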

The problem with anything requiring explicit casts is once
they're written, they rarely get unwritten. I tell new users that
`cast` is a code smell - sometimes you need it, but it is
usually an indication that you're doing something wrong.

But then you do:

short a;
short b = a + 1;

And suddenly the language requires one.
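To make that concrete, here is a small check of the current behavior as I understand it: the addition is promoted to `int`, so the assignment back to `short` is rejected unless you cast.

```d
void main() {
    short a = 1;

    // The sum is promoted to int, so it does not implicitly fit a short.
    static assert(is(typeof(a + 1) == int));
    static assert(!__traits(compiles, { short x; short y = x + 1; }));

    // With the explicit cast it compiles.
    short b = cast(short)(a + 1);
    assert(b == 2);
}
```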

Yes, I know, there's a carry bit that might get truncated. But
when you're using all `short`, there's probably an understanding
that this is how it works. It's not really that hard - it's about
two or three sentences. As long as one understands two's-complement
arithmetic.
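Those two or three sentences boil down to something like this sketch of what truncating back to 16 bits does at the edges (the values are picked just to hit the boundary):

```d
void main() {
    short a = short.max;             // 32767
    short b = cast(short)(a + 1);    // int 32768, truncated to 16 bits
    assert(b == short.min);          // wraps around to -32768

    short c = cast(short)(a * 2);    // int 65534, low 16 bits are 0xFFFE
    assert(c == -2);
}
```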

On the other hand, there might be real loss if there's an `int` in
there, say in some kinds of generic code.

I think a reasonable compromise would be to allow implicit 
conversions down to the biggest type of the input. The VRP can 
apply here on any literals present. Meaning:

short a;
short b = a + 1;

It checks the input:

a = type short
1 = VRP'd down to byte (or bool even)

Biggest type there? short. So it allows implicit conversion down
to short. Then VRP can run to further make it smaller:

byte c = (a & 0x7e) + 1; // ok, the VRP can see it still fits there,
                         // so it goes even smaller.

But since the biggest original input fits in a `short`, it allows 
the output to go to `short`, even if there's a carry bit it might 
lose.
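The VRP half of this is, as far as I know, already how current compilers behave. A small sketch, assuming the existing range propagation handles `&` with a constant mask the way I expect:

```d
void main() {
    short a = 1000;

    // (a & 0x7e) is in 0 .. 126, so adding 1 gives 1 .. 127: fits a byte.
    byte c = (a & 0x7e) + 1;
    assert(c == 105);

    // Widening the mask by one bit allows 128, which no longer fits.
    static assert(!__traits(compiles, { short x; byte y = (x & 0x7f) + 1; }));
}
```

The only new part in the proposal is the step before that: letting the result fall back to `short` because no input was bigger than `short`.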

On the other hand:

ushort b = a + 65535 + 3;

Nope, the compiler can constant fold that literal and VRP will 
size it to `int` given its value, so explicit cast required there 
to ensure none of the *actual* input is lost.
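A rough check of the literal side of that, using only the folded constant (the variable names are just for illustration):

```d
void main() {
    // The folded constant has type int and value 65538.
    static assert(is(typeof(65535 + 3) == int));
    static assert(65535 + 3 > ushort.max);

    // A literal that fits converts implicitly; one that does not is rejected.
    static assert( __traits(compiles, { ushort ok = 65535; }));
    static assert(!__traits(compiles, { ushort bad = 65535 + 3; }));

    // Forcing it through with a cast keeps only the low 16 bits.
    ushort b = cast(ushort)(65535 + 3);
    assert(b == 2);
}
```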

short a;
short b;
short c = a * b;

I'd allow that. The input is a and b, they're both short, so let 
the output truncate back to short implicitly too. Just like with 
int, there's some understanding that yes, there is a high word 
produced by the multiply, but it might not fit and I don't need 
the compiler nagging me like I'm some kind of ignoramus.
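You can approximate the rule in library code today. This is only a sketch: `mulNarrow` is a hypothetical helper I made up for illustration, not anything in Phobos, and it works on types alone, so it cannot do the VRP-on-literals part. It does use the real `std.traits.Largest` to pick the biggest input type.

```d
import std.traits : Largest;

// Hypothetical helper: the result type is the biggest input type,
// and the product is truncated back down to it.
Largest!(A, B) mulNarrow(A, B)(A a, B b) {
    return cast(Largest!(A, B))(a * b);
}

void main() {
    short a = 300, b = 200;

    // Both inputs are short, so the result goes back to short.
    static assert(is(typeof(mulNarrow(a, b)) == short));
    short c = mulNarrow(a, b);
    assert(c == -5536); // 60_000 truncated to 16 bits

    // Refactor one input to int and the result type grows with it,
    // so assigning to a short is an error again.
    int wide = 200;
    static assert(is(typeof(mulNarrow(a, wide)) == int));
    static assert(!__traits(compiles, { short s; int i; short d = mulNarrow(s, i); }));
    assert(mulNarrow(a, wide) == 60_000);
}
```

The last two checks are the refactoring case: once an input grows, the compiler gets to complain again.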

I think this compromise would balance the legitimate safety
concerns about accidental loss or refactoring changing things (if
you refactor to ints, now the input type grows and the compiler
can issue an error again) against the annoyance of casts almost
everywhere.

And by removing most of the casts, it makes the ones that remain
stand out more as the potential problems they are.