Treating the abusive unsigned syndrome
Denis Koroskin
2korden at gmail.com
Wed Nov 26 18:20:50 PST 2008
On 27.11.08 at 03:46, Sean Kelly wrote:
> Andrei Alexandrescu wrote:
>> Sean Kelly wrote:
>>> Don wrote:
>>>>
>>>> Although it would be nice to have a type which was range-limited,
>>>> 'uint' doesn't do it. Instead, it guarantees the number is between 0
>>>> and int.max*2+1 inclusive. Allowing mixed operations encourages
>>>> programmers to focus on the benefit of 'the lower bound is zero!' while
>>>> forgetting that there is an enormous downside ('I'm saying that this
>>>> could be larger than int.max!')
>>>
>>> This inspired me to think about where I use uint and I realized that I
>>> don't. I use size_t for size/length representations (largely because
>>> sizes can theoretically be >2GB on a 32-bit system), and ubyte for
>>> bit-level stuff, but that's it.
>> For the record, I use unsigned types wherever there's a non-negative
>> number involved (e.g. a count). So I'd be helped by better unsigned
>> operations.
>
> To be fair, I generally use unsigned numbers for values that are
> logically always positive. These just tend to be sizes and counts in my
> code.
>
>> I wonder how often these super-large arrays do occur on 32-bit systems.
>> I do have programs that try to allocate as large a contiguous matrix as
>> possible, but never sat down and tested whether a >2GB chunk was
>> allocated on the Linux cluster I work on. I'm quite annoyed by this
>> >2GB issue because it's a very practical and very rare issue in a weird
>> contrast with a very principled issue (modeling natural numbers).
>
> Yeah, I have no idea how common they are, though my guess would be that
> they are rather uncommon. As a library programmer, I simply must assume
> that they are in use, which is why I use size_t as a matter of course.
>
>
> Sean
If they can be more than 2GB, why can't they be more than 4GB? It is
dangerous to assume that they won't be, and that's why uint is dangerous.
You trade safety for one additional bit of information, and that is the
wrong trade. Soon enough we won't use uints, the same way we don't use
ushorts (I should have asked first whether anyone uses ushort these days,
but there is so little gain in using ushort as opposed to short or int
that I consider it impractical). The 64-bit era will give us 64-bit
pointers and 64-bit counters. Do you think you will prefer ulong over
long for one additional bit? You really shouldn't.
My proposal
Short summary:
- Disallow bitwise operations on both signed and unsigned types, but
allow arithmetic operations on them
- Discourage usage of unsigned types. Introduce bits8, bits16, bits32 and
bits64 as a replacement
- Disallow arithmetic operations on the bits* types, but allow bitwise
operations on them
- Disallow mixed-type operations (compare, add, sub, mul and div)
- Disallow implicit casts between all types
- Use int and long (or ranged types) for lengths and indices, with
runtime checks (a.length-- is always dangerous no matter what
compile-time checks you make)
- Add type constructors for int/uint/etc.: "auto x = int(int.max + 1);"
throws at run time (see the sketch below)
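
As a rough illustration of the last point only, a hypothetical checked
helper (the name and signature are mine, not part of the proposal) can
approximate the run-time-checked constructor in today's D by doing the
arithmetic in a wider type and narrowing through std.conv.to, which
throws ConvOverflowException when the value does not fit:

import std.conv : to; // to!T range-checks and throws ConvOverflowException

// Hypothetical helper: widen the arithmetic, then narrow with a run-time check.
T checked(T)(long value)
{
    return value.to!T;
}

void main()
{
    auto ok  = checked!int(int.max);      // fits, no exception
    auto bad = checked!int(int.max + 1L); // out of range, throws at run time
}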
The two most common uses of uints are:
0) Bitfields or masks, packed values and hexadecimal constants (more on
bitfields below)
1) Numbers that can't be negative (counters, sizes/lengths, etc.)
Bitfields
Bitfields are handy, and using an unsigned type over a signed one is
surely preferable. The most common operations on bitfields are bitwise
AND, OR, XOR and left/right shifts. You shouldn't subtract from or add to
them; it is an error in most cases. This is what the new bits8, bits16,
bits32 and bits64 types should be used for:
bits32 argbColor;
int alphaShift = 24; // any type here, actually

// shift
bits32 alphaMask = (0xFF << alphaShift); // 0xFF is of type bits8
auto value2 = value1 & mask; // all 3 are of type bits*

// you can only shift bits, and the result is in bits, too,
// i.e. the following is incorrect:
int i = -42;
int x = (i << 8); // an error:
// 1) can't shift a value of type int
// 2) can't assign a value of type bits32 to a variable of type int

// ubyte is still handy sometimes (a color channel belongs to the [0..255] range)
auto alpha = (argbColor & alphaMask) >> alphaShift; // result is in bits32
// use an explicit cast to convert it to the target data type:
ubyte alpha = cast(ubyte)((argbColor & alphaMask) >> alphaShift);
// Alternatively:
ubyte alpha = ubyte((argbColor & alphaMask) >> alphaShift);
The type constructor throws an error if the source value (which is of
type bits32 in this example) can't be stored in a ubyte. This might be a
replacement for the signed/unsigned methods.
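
For comparison, today's D already offers this kind of run-time-checked
narrowing through std.conv.to; a small sketch of mine, using a plain uint
in place of the proposed bits32:

import std.conv : to;

void main()
{
    uint argbColor  = 0xAABBCCDD;
    uint alphaMask  = 0xFF000000;
    int  alphaShift = 24;

    // to!ubyte range-checks at run time and throws ConvOverflowException
    // if the shifted value does not fit into a ubyte
    ubyte alpha = ((argbColor & alphaMask) >> alphaShift).to!ubyte;
}

The proposal goes further and applies checks to hexadecimal literals and
mixed signed/unsigned expressions as well: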
int i = 0xFFFFFFFF; // an error, can't convert a value of type bits32 to a variable of type int
int i = int.max + 1; // ok
int i = int(int.max + 1); // an exception is raised at runtime
int i = 0xABCD - 0xDCBA; // not allowed, add explicit casts:
auto u = cast(uint)0xABCD - cast(uint)0xDCBA; // result type is uint, no checks for overflow
auto i = cast(int)0xABCD - cast(int)0xDCBA; // result type is int, no checks for overflow
auto e = cast(uint)0xABCD - cast(int)0xDCBA; // an error, can't subtract int from uint

// type ctors in action:
auto i = int(cast(int)0xABCD - cast(int)0xDCBA); // result type is int, an exception on overflow
auto u = uint(cast(uint)0xABCD - cast(uint)0xDCBA); // same here for uint
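
Although the proposal asks for built-in bits* types, a rough library
approximation is possible in today's D. The Bits32 sketch below is mine
and purely illustrative: it wraps a uint and defines only bitwise and
shift operators, so arithmetic on it simply does not compile:

struct Bits32
{
    uint payload;

    // only bitwise operators are defined; +, -, * and friends won't compile
    Bits32 opBinary(string op)(Bits32 rhs)
        if (op == "&" || op == "|" || op == "^")
    {
        return Bits32(mixin("payload " ~ op ~ " rhs.payload"));
    }

    // shifting bits yields bits
    Bits32 opBinary(string op)(int amount)
        if (op == "<<" || op == ">>")
    {
        return Bits32(mixin("payload " ~ op ~ " amount"));
    }
}

void main()
{
    auto color = Bits32(0xAABBCCDD);
    auto mask  = Bits32(0xFF) << 24;
    auto alpha = (color & mask) >> 24; // still Bits32
    // auto bad = color + 1;           // error: no '+' defined for Bits32
}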
Non-negative values
Just use int/long. Or some ranged type ([0..short.max], [0..int.max],
[0..long.max]) could be used as well; a library type, perhaps. Let's call
it nshort/nint/nlong. It should have the same set of operations as
short/int/long, but make additional checks and throw on underflow and
overflow.
int x = 42;
nint nx = x; // ok
nx = -x; // throws
nx = int.max; // ok
++nx; // throws
nx = 0;
--nx; // throws
nx = 0;
nint ny = 42;
nx = ny; // no checking is done
int y = ny; // no checking is done, either
short s = ny; // error, cast needed
short s = cast(short)ny; // never throws
short s = short(ny); // might throw
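
A minimal library sketch of such an nint in today's D (mine, illustrative
only; it checks construction, assignment and increment/decrement rather
than the full set of operations):

struct nint
{
    private int value;

    this(long v) { opAssign(v); }

    // assignment from any integer does a run-time range check
    void opAssign(long v)
    {
        if (v < 0 || v > int.max)
            throw new Exception("nint out of range");
        value = cast(int)v;
    }

    // ++ and -- reuse the same check
    void opUnary(string op)() if (op == "++" || op == "--")
    {
        opAssign(op == "++" ? value + 1L : value - 1L);
    }

    // reading back as int needs no check
    int get() const { return value; }
    alias get this;
}

void main()
{
    nint nx = 42;   // ok
    nx = int.max;   // ok
    // ++nx;        // would throw: int.max + 1 is out of range
    nx = 0;
    // --nx;        // would throw: -1 is negative
    int y = nx;     // fine, no check needed
}

With this sketch, "short s = ny;" is rejected at compile time and
cast(short)ny never throws, matching the example above; the checked
short(ny) constructor itself is what the proposal would add to the
language.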