Treating the abusive unsigned syndrome
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Thu Nov 27 08:49:02 PST 2008
Don wrote:
> Andrei Alexandrescu wrote:
>> Don wrote:
>>> Andrei Alexandrescu wrote:
>>>> One fear of mine is the reaction of throwing of hands in the air
>>>> "how many integral types are enough???". However, if we're to judge
>>>> by the addition of long long and a slew of typedefs to C99 and
>>>> C++0x, the answer is "plenty". I'd be interested in gauging how
>>>> people feel about adding two (bits64, bits32) or even four (bits64,
>>>> bits32, bits16, and bits8) types as basic types. They'd be bitbags
>>>> with undecided sign ready to be converted to their counterparts of
>>>> decided sign.
>>>
>>> Here I think we have a fundamental disagreement: what is an 'unsigned
>>> int'? There are two disparate ideas:
>>>
>>> (A) You think that it is an approximation to a natural number, ie, a
>>> 'positive int'.
>>> (B) I think that it is a 'number with NO sign'; that is, the sign
>>> depends on context. It may, for example, be part of a larger number.
>>> Thus, I largely agree with the C behaviour -- once you have an
>>> unsigned in a calculation, it's up to the programmer to provide an
>>> interpretation.
>>>
>>> Unfortunately, the two concepts are mashed together in C-family
>>> languages. (B) is the concept supported by the language typing rules,
>>> but usage of (A) is widespread in practice.
>>
>> In fact we are in agreement. C tries to make it usable as both, and
>> partially succeeds by having very lax conversions in all directions.
>> This leads to the occasional puzzling behaviors. I do *want* uint to
>> be an approximation of a natural number, while acknowledging that
>> today it isn't much of that.
>>
>>> If we were going to introduce a slew of new types, I'd want them to
>>> be for 'positive int'/'natural int', 'positive byte', etc.
>>>
>>> Natural int can always be implicitly converted to either int or uint,
>>> with perfect safety. No other conversions are possible without a cast.
>>> Non-negative literals and manifest constants are naturals.
>>>
>>> The rules are:
>>> 1. Anything involving unsigned is unsigned (same as C).
>>> 2. Else if it contains an integer, it is an integer.
>>> 3. (Now we know all quantities are natural):
>>> If it contains a subtraction, it is an integer [Probably allow
>>> subtraction of compile-time quantities to remain natural, if the
>>> values stay in range; flag an error if an overflow occurs].
>>> 4. Else it is a natural.
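As an illustration, here is how a few expressions would come out under
rules 1-4, using a hypothetical "natural" type. This is purely my
reading of the rules; nothing below compiles with any current compiler.

   natural n;  int i;  uint u;
   auto a = n + 2;     // rule 4: all operands natural, no subtraction
   auto b = n + i;     // rule 2: an int operand makes the result int
   auto c = n + u;     // rule 1: an unsigned operand makes the result uint
   auto d = n - 2;     // rule 3: run-time subtraction yields int
   enum e = 10 - 3;    // rule 3 aside: compile-time, in range, stays natural
   int  x = n;         // natural converts implicitly to int...
   uint y = n;         // ...and to uint; anything else needs a cast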
>>>
>>>
>>> The reason I think literals and manifest constants are so important
>>> is that they are a significant fraction of the natural numbers in a
>>> program.
>>>
>>> [Just before posting I've discovered that other people have posted
>>> some similar ideas].
>>
>> That sounds encouraging. One problem is that your approach leaves the
>> unsigned mess as it is, so although natural types are a nice addition,
>> they don't bring a complete solution to the table.
>>
>>
>> Andrei
>
> Well, it does make unsigned numbers (case (B)) quite obscure and
> low-level. They could be renamed with uglier names to make this clearer.
> But since in this proposal there are no implicit conversions from uint
> to anything, it's hard to do any damage with the unsigned type which
> results.
> Basically, with any use of unsigned, the compiler says "I don't know if
> this thing even has a meaningful sign!".
>
> Alternatively, we could add rule 0: mixing int and unsigned is illegal.
> But it's OK to mix natural with int, or natural with unsigned.
> I don't like this as much, since it would make most usage of unsigned
> ugly; but maybe that's justified.
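For comparison, a quick sketch of what the rule 0 variant would allow
and disallow (equally hypothetical, with "natural" a made-up type):

   natural n;  int i;  uint u;
   auto x = n + i;            // fine: natural with int -> int
   auto y = n + u;            // fine: natural with unsigned -> uint
   // auto z = u + i;         // error under rule 0: int and unsigned don't mix
   auto w = u + cast(uint) i; // OK once the mixing is made explicit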
I think we're heading towards an impasse. We wouldn't want to make
things much harder for systems-level programs that mix arithmetic and
bit-level operations.
I'm glad there is interest and that quite a few ideas were brought up.
Unfortunately, it looks like all of them have significant
disadvantages.
One compromise solution Walter and I discussed in the past is to sever
only one of the dangerous implicit conversions: int -> uint. Other than
that, it's much like C (everything involving one unsigned operand is
unsigned, and unsigned -> signed stays implicit). Let's see where that
takes us.
(a) There are fewer situations in which a small, reasonable number
implicitly becomes a large, weird number.
(b) An exception to (a) is that u1 - u2 is still uint, and that's for
the sake of C compatibility. I'd gladly drop it if I could and have
operations such as u1 - u2 return a signed number. That assumes the
least and works correctly for the small, usual values.
(c) Unlike in C, arithmetic and logical operations always return the
tightest type possible, not a 32- or 64-bit value. For example,
byte / int yields byte, and so on. (A sketch of all three rules
follows.)
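In code, the compromise would shake out roughly like this (a sketch
only, not what any compiler currently does):

   uint u1, u2;  int i;  byte b1;
   int  a = u1;           // unsigned -> signed stays implicit, as in C
   // uint x = i;         // error: int -> uint is the one conversion severed
   uint y = cast(uint) i; // the intent now has to be spelled out
   uint m = u1 + i;       // mixing signed and unsigned still yields unsigned
   uint d = u1 - u2;      // (b) stays uint for C compatibility, even if u2 > u1
   byte q = b1 / i;       // (c) tightest possible result type: byte, not int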
What do you think?
Andrei