Mixing operations with signed and unsigned types

Sun Jul 4 10:24:09 PDT 2010

Stewart Gordon:

Sorry for the late reply, I was quite busy. Thank you for your comments, even if I don't agree with some of them :-)

>If it's logical and the program works, it isn't objectively wrong.<

Right. But bug-prone means that often enough people write code that doesn't work.

>Some of us prefer to use unsigned types where the value is semantically unsigned, and know what we're doing.  So any measures to stronghold programmers against using them are going to be a nuisance.<

I have not asked to remove the unsigned types, so you can relax. And replacing lengths/indexes with signed values isn't a way to forbid you from using unsigned values in your programs. It's quite the opposite: it's a way to avoid forcing me (and many other programmers who want to write simple non-system D programs) to use unsigned values in my code.

D, like every other language besides assembly, tries to push programmers toward safer ways of writing code. Even types can be seen as restrictions, but a wise programmer knows they are there to help create less buggy programs.

> I can also imagine promoting your mindset leading to edit wars between
> developers declaring an int and then putting
>      assert (qwert >= 0);
> in the class invariant, and those who see this and think it's brain-damaged.

This is quite interesting. Do you think that using an unsigned type in D today is the same thing as using a signed value plus an assert that it is not negative? In the beginning, when I was used to Delphi programming, I did the same, but I soon found out that it was unsafe. Today D unsigned values don't give you a nice overflow error (as I have asked Walter for many times) when you try to assign them a number outside their range. They happily wrap around, and this causes bugs in programs. So using an unsigned number to denote a value that can't be negative is dangerous, and it can be stupid too. In D you need to take a signed value from outside and assign it to an unsigned value only after you have tested it to be nonnegative.
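A minimal sketch of the wrap-around described above, and of testing a signed value before converting it. The helper name toUnsignedChecked is hypothetical and only serves the illustration.

```d
import std.stdio;

// Hypothetical helper (not from the post): refuse negative inputs
// instead of letting them wrap around.
uint toUnsignedChecked(int value)
{
    if (value < 0)
        throw new Exception("negative value cannot become a uint");
    return cast(uint) value;
}

void main()
{
    int fromOutside = -1;

    // The implicit int -> uint conversion compiles and silently wraps.
    uint wrapped = fromOutside;
    writeln(wrapped); // prints 4294967295

    // Testing the signed value first turns the same mistake into an error.
    try
    {
        uint checked = toUnsignedChecked(fromOutside);
        writeln(checked);
    }
    catch (Exception e)
    {
        writeln("rejected: ", e.msg);
    }
}
```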

> True, but that doesn't mean that we should force programmers to use
> signed values for nearly everything.

D wants to be a system language, and I presume system programmers are able to use unsigned values. But D can also be used as an application language (like C#), and I presume most uses of D will be of this kind. In my experience a good number of 'application programmers' have problems with unsigned numbers. Lengths and array indexes are not something used only by system programmers (as opBinary operator overloading is). They are used often in every kind of program, even small ones, so making them unsigned will be a trap for many programmers.

I don't care if you use unsigned values in your programs, and I don't want to force you to use signed values in your programs, but I want to be able to avoid unsigned values when I write small non-system D programs, because they introduce complexities and bugs that I can live without.

> But it is all the more reason to fix unsigned op signed to be signed, if
> it is to be allowed at all.  The way it is at the moment, a single
> unsigned value in a formula can force the whole result to be unsigned,
> thereby leading to unexpected results.

I think Walter will not change this, because then D code that looks like C code would behave differently from C (there are a few exceptions to this D rule, such as fixed-size arrays being passed by value in D and by pointer in C).
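The C-style conversion rule under discussion can be sketched as follows. The helper lessThanLength is hypothetical; it stands for the kind of comparison against a length that a newcomer might write.

```d
import std.stdio;

// Hypothetical helper: compare a signed index against an array length.
bool lessThanLength(int i, const int[] a)
{
    // i is converted to size_t before the comparison,
    // so -1 becomes a huge positive value.
    return i < a.length;
}

void main()
{
    uint u = 1;
    int s = -2;

    // One unsigned operand makes the whole expression unsigned.
    static assert(is(typeof(u + s) == uint));
    writeln(u + s); // prints 4294967295, not -1

    int[] arr = [1, 2, 3];
    writeln(lessThanLength(-1, arr)); // prints false
}
```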

So given that this will not change, other solutions need to be found. I have suggested two solutions that can be used at the same time:
- Introducing run-time integral overflow checks (as in Delphi and C#, but I think in D two separate switches can be useful, one for signed overflows and one for unsigned overflows);
- and removing a very common source of unsigned values in simple D programs (lengths/indexes).
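The first suggestion asks for compiler switches, which this sketch does not provide. As an approximation in library code, it assumes a recent druntime that ships core.checkedint (which postdates this discussion); the wrappers checkedAdd and checkedSub are hypothetical names.

```d
import core.checkedint : adds, subu;
import std.stdio;

// Hypothetical wrapper: signed addition that throws instead of wrapping.
int checkedAdd(int a, int b)
{
    bool overflow = false;
    immutable result = adds(a, b, overflow);
    if (overflow)
        throw new Exception("signed overflow in addition");
    return result;
}

// Hypothetical wrapper: unsigned subtraction that throws instead of wrapping.
uint checkedSub(uint a, uint b)
{
    bool overflow = false;
    immutable result = subu(a, b, overflow);
    if (overflow)
        throw new Exception("unsigned overflow in subtraction");
    return result;
}

void main()
{
    writeln(checkedAdd(2, 3)); // prints 5

    try
    {
        checkedSub(0, 1); // would wrap to uint.max without the check
    }
    catch (Exception e)
    {
        writeln("caught: ", e.msg);
    }
}
```

Keeping the signed and unsigned checks in separate functions mirrors the idea of two independent switches.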

> You could make a similar argument the same about integer types
> generally.  People coming from BASIC backgrounds, or new to programming
> generally, are sooner or later going to have some work to do when they
> find that 1/4 != 0.25.  

Some languages, like Scheme, are indeed able to represent fractions natively. A "good" high-level language, designed for humans and not for CPUs, deserves to act more correctly.

So I agree that's a possible source of problems for newbies. But having just one possible source of "problems" is better than having two possible sources of problems :-)

And in my experience, somewhat more experienced programmers quickly learn to cope with the lack of native fractions in a language (and I prefer to have two operators to perform divisions, like / and div in Delphi and / and // in Python3, to denote float or integer divisions), but they keep having bugs caused by unsigned values combined with C conversion rules. So I think unsigned values cause worse trouble.
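For reference, D uses the single / operator for both kinds of division, and the operand types decide which one is performed. A small sketch:

```d
import std.stdio;

void main()
{
    writeln(1 / 4);   // both operands are int: integer division, prints 0
    writeln(1.0 / 4); // one operand is double: prints 0.25
    writeln(7 / 2, " ", 7 % 2); // quotient and remainder, prints 3 1
}
```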

> Add to that the surprise that is silent overflow....

Adding optional run-time integral overflow checks to D is something that I really want. My experience with Delphi has shown me many times that they are able to catch some of my bugs. Walter is Just Wrong [TM] about not appreciating them. C# developers are right on this.

>Interfacing file formats.  Simplifying certain conditional expressions. Making code self-documenting.  Maybe others....<

Simplifying certain conditional expressions with unsigned values is cool, but you want to do it only in performance-critical spots of your programs, because they can be tricky and in every other part of your program they are bug-prone premature optimization :-)
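One well-known instance of such a simplification (an assumption about what was meant, since the quoted message gives no example) is folding a two-sided range test into a single unsigned comparison. The function names are hypothetical, and the trick is only valid when n is not negative.

```d
import std.stdio;

// The plain, readable form.
bool inRangeTwoTests(int i, int n)
{
    return i >= 0 && i < n;
}

// The unsigned trick: a negative i wraps to a value above int.max,
// so one comparison covers both bounds (requires n >= 0).
bool inRangeOneTest(int i, int n)
{
    return cast(uint) i < cast(uint) n;
}

void main()
{
    foreach (i; [-1, 0, 9, 10])
        writeln(i, ": ", inRangeTwoTests(i, 10), " ", inRangeOneTest(i, 10));
}
```

The second form saves a branch but hides its intent, which is why it belongs only in hot spots.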

Regarding unsigned values being self-documenting, I have explained that this is true in a language that actually enforces their unsigned nature, but in D they are just traps :-) In a language like Ada you can actually say what you mean and denote their non-negative nature. This is an example:

http://ideone.com/ViiOB

with Ada.Integer_Text_Io, Ada.Text_Io;
use Ada.Integer_Text_Io, Ada.Text_Io;
procedure Test is
   subtype Small is Integer range 0..99;
   Input : Small;
begin

   loop
      Get(Input);
      if Input = 42 then
         exit;
      else
         Put (Input);  New_Line;
      end if;
   end loop;

end;

The Small type is user-defined and it can't be negative (or more than 99, in Ada ranges are closed on the right), so if you try to assign 100 or a negative value (as in that example), you receive a run-time error like:

raised CONSTRAINT_ERROR : prog.adb:9 range check failed

This is the right way to enforce a nonnegative number. In D I will try to create a ranged integer (with run-time overflow errors), and if you don't like to use such ranged values, then it's better to add things like that assert(qwert >= 0); to your class invariant.
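A rough sketch of what such a ranged integer could look like in D. This is not an existing library type: the names Ranged, lo, hi and Small are hypothetical, and only construction and assignment are checked here, not arithmetic.

```d
import std.conv : to;
import std.stdio;

// Hypothetical ranged integer: every stored value lies in lo .. hi inclusive.
struct Ranged(int lo, int hi)
{
    static assert(lo <= hi, "empty range");

    private int value = lo;

    this(int v)
    {
        opAssign(v);
    }

    // Every assignment from an int goes through the range check.
    void opAssign(int v)
    {
        if (v < lo || v > hi)
            throw new Exception("range check failed: " ~ v.to!string);
        value = v;
    }

    int get() const
    {
        return value;
    }

    alias get this; // lets a Ranged value be read wherever an int is expected
}

alias Small = Ranged!(0, 99); // mirrors the Ada subtype in the example

void main()
{
    Small input = 42;
    int plain = input; // reading back needs no check
    writeln(plain);    // prints 42

    try
    {
        input = 100; // outside 0 .. 99
    }
    catch (Exception e)
    {
        writeln(e.msg); // prints range check failed: 100
    }
}
```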

Bye,
bearophile