Deprecate implicit conversion between signed and unsigned integers

Quirin Schroll qs.il.paperinik at gmail.com
Fri Feb 14 00:00:47 UTC 2025


On Thursday, 6 February 2025 at 09:10:41 UTC, Walter Bright wrote:
> [I'm not sure why a new thread was created?]
>
> This comes up now and then. It's an attractive idea, and seems 
> obvious. But I've always been against it for multiple reasons.
>
> 1. Pascal solved this issue by not allowing any implicit 
> conversions. The result was casts everywhere, which made the 
> code ugly. I hate ugly code.

Let me guess: Pascal has no value-range propagation?

> 2. Java solved this by not having an unsigned type. People went 
> to great lengths to emulate unsigned behavior. Eventually, the 
> Java people gave up and added it.

Java 23 still does not have unsigned types, though. It only has 
static methods that reinterpret the bits of the signed integer 
types as unsigned and operate on those. Signed and unsigned 
division, modulo, and comparison are genuinely different 
operations (truncated addition, subtraction, and multiplication 
agree bit for bit).
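
To illustrate in D rather than Java (a minimal sketch; the 
numbers are arbitrary): the same bit pattern gives the same 
result under addition, but not under comparison or division.

```d
void main()
{
    int  s = -1;
    uint u = cast(uint) s;        // same 32 bits: 0xFFFF_FFFF

    assert(cast(uint)(s + 1) == u + 1); // addition agrees bit for bit

    assert(s < 0);                // signed comparison
    assert(u > 0);                // unsigned comparison on the same bits
    assert(s / 2 == 0);           // signed division: -1 / 2 == 0
    assert(u / 2 == 0x7FFF_FFFF); // unsigned division: uint.max / 2
}
```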

> 3. Is `1` a signed int or an unsigned int?

Ideally, it has its own type that implicitly converts to anything 
that can be initialized by the constant. Of course, `typeof()` 
must return something; there are three options:
- `typeof(1)` is `typeof(1)`, similar to `typeof(null)`
- `typeof(1)` is `__static_integer` (cf. Zig’s `comptime_int`)
- `typeof(1)` is `int`, which makes it indistinguishable from a 
runtime expression.

D chooses the last of these. None of them is a bad choice; there 
are tradeoffs everywhere.
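
For reference, D’s choice is easy to observe; VRP is what still 
lets the literal initialize narrower types (a minimal sketch):

```d
void main()
{
    static assert(is(typeof(1) == int)); // the literal has no special type
    ubyte u = 1;                         // fine: VRP proves 1 fits in ubyte
    // byte b = 128;                     // error: 128 exceeds byte.max
}
```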

> 4. What happens with `p[i]`? If `p` is the beginning of a 
> memory object, we want `i` to be unsigned. If `p` points to the 
> middle, we want `i` to be signed. What should be the type of `p 
> - q`? signed or unsigned?

Two questions, two answers.

> What happens with `p[i]`?

That’s a vague question. If `p` is a slice, a signed, negative 
`i` triggers a range error. If `p` is a pointer, `p[i]` is 
`*(p + i)`, and if `i` is signed and negative, so be it. 
`typeof(p + i)` is `typeof(p)`, so there shouldn’t be a problem.
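
Both cases in a sketch (the pointer indexing is `@system`; the 
commented-out line compiles but fails the bounds check):

```d
void main()
{
    int[5] a = [10, 20, 30, 40, 50];

    int* p = &a[2];      // points into the middle
    assert(p[-2] == 10); // *(p + -2): negative index, well-defined here

    int[] s = a[];
    int i = -1;
    // auto x = s[i];    // compiles (i converts to size_t), but the
    //                   // bounds check throws a RangeError at run time
}
```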

> What should be the type of `p - q`? signed or unsigned?

Signed. If `p` and `q` are compile-time constants, so is `p - q`, 
and if it’s nonnegative, it converts to unsigned types.

While it would be annoying for sure, it does make sense to use a 
function for pointer subtraction when one assumes the difference 
to be nonnegative: `unsignedDifference(p, q)`. It would assert 
that the result is in fact positive or zero and return a 
`size_t`. The nice thing about it is that if you expect an 
unsigned result and happen to be wrong, you’ll find out quicker 
than otherwise.
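
A minimal sketch of what I mean; `unsignedDifference` is a 
made-up name, not an existing library function:

```d
size_t unsignedDifference(T)(const(T)* p, const(T)* q)
{
    assert(p >= q, "unsignedDifference: difference would be negative");
    return cast(size_t)(p - q);
}

void main()
{
    int[4] a;
    assert(unsignedDifference(&a[3], &a[0]) == 3);
}
```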

> 5. We rely on 2's complement overflow semantics to get the same 
> behavior if `i` is signed or unsigned, most of the time.

As I see it, 2’s complement wrapping for both signed and unsigned 
arithmetic is a straightforward choice D made to keep `@safe` 
useful. If D made either overflow UB, it would exclude part of 
basic arithmetic from `@safe`, because `@safe` bans every 
operation that *can* introduce UB. That’s essentially why pointer 
arithmetic is banned in `@safe`: `++p` might push `p` outside an 
array, which is UB. D offers slices as a safe (because 
bounds-checked) alternative to pointers.
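
Both overflows are defined, wrapping behavior in D and therefore 
usable in `@safe`, while pointer arithmetic is not (a quick 
check):

```d
@safe void main()
{
    int  i = int.max;
    uint u = uint.max;
    assert(i + 1 == int.min); // signed overflow wraps: defined, not UB
    assert(u + 1 == 0);       // unsigned overflow wraps as well
    // int* p; ++p;           // error: pointer arithmetic is not @safe
}
```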

> 6. Casts are a blunt instrument that impair readability and can 
> cause unexpected behavior when changing a type in a 
> refactoring. High quality code avoids the use of explicit casts 
> as much as possible.

In my experience, mixing signed and unsigned points to a design 
issue. I ran into this a couple of times working on an older C++ 
codebase.

> 7. C behavior on this is extremely well known.

Making something valid in C do something it can’t do in C is a 
bad idea and invites bugs; that much is true. Making questionable 
C constructs errors, however, isn’t *prima facie* a bad idea.

AFAICT, D for the most part sticks to: If it looks like C, it 
behaves like C or doesn’t compile. Banning signed-to-unsigned 
conversions (unless VRP proves it’s okay) simply falls into the 
latter box.
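
D already does this for narrowing conversions; a ban on 
signed-to-unsigned would work the same way (a sketch; the 
commented-out line is legal C):

```d
void main()
{
    int x = 300;
    // byte b = x;         // fine in C, but D rejects the narrowing
    byte b = cast(byte) x; // D makes the lossy truncation explicit
}
```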

> 8. The Value Range Propagation feature was a brilliant 
> solution that resolved most issues with implicit signed and 
> unsigned conversions, without causing any problems.

Of course VRP is great. For the most part, it means that if an 
implicit conversion compiles, nothing weird happens and no data 
can be lost. Signed-to-unsigned conversion breaks the very 
expectation that VRP helped create.
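
Concretely, in today’s D (a sketch; the first conversion is 
exactly the kind the proposal would reject):

```d
void main()
{
    int i = -1;

    uint u = i;         // compiles today and silently yields uint.max
    assert(u == uint.max);

    ubyte b = i & 0xFF; // fine either way: VRP proves the result
                        // lies in [0, 255]
}
```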

> 9. Array bounds checking tends to catch the usual bugs with 
> conflating signed with unsigned. Array bounds checking is a 
> total winner of a feature.

It’s generally good. Almost no one complains about it.

> Andrei and I went around and around on this, pointing out the 
> contradictions. There was no solution. There is no "correct" 
> answer for integer 2's complement arithmetic.

I don’t really know what that means. Integer types in C and most 
languages derived from it (D included) have this oddity: 
addition, subtraction, and truncated multiplication are 2’s 
complement, but division and modulo are not (`cast(uint)(-10 / 
3)` and `cast(uint)-10 / 3` give different results). 
Mathematically speaking, integers in D are neither values modulo 
2ⁿ nor a segment of ℤ.
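
Evaluating those two expressions shows the difference, and that 
addition does not share it:

```d
void main()
{
    assert(cast(uint)(-10 / 3) == 4_294_967_293); // signed divide, then cast
    assert(cast(uint)-10 / 3 == 1_431_655_762);   // cast, then unsigned divide

    // Addition gives the same bits either way:
    assert(cast(uint)(-10 + 3) == cast(uint)-10 + 3);
}
```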

> Here's what I do:
>
> 1. use unsigned if the declaration should never be negative.
>
> 2. use size_t for all pointer offsets
>
> 3. use ptrdiff_t for deltas of size_t that could go negative
>
> 4. otherwise, use signed
>
> Stick with those and most of the problems will be avoided.

Sounds reasonable.
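
For concreteness, those four rules as declarations (a sketch; 
the variable names are made up):

```d
size_t    length; // 1. never negative: unsigned
size_t    offset; // 2. pointer offsets are size_t
ptrdiff_t delta;  // 3. difference of size_t values that may go negative
int       amount; // 4. everything else: signed
```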

