Integer conversions too pedantic in 64-bit

Kevin Bealer kevindangerbealer at removedanger.gmail.com
Wed Feb 16 20:19:18 PST 2011


== Quote from spir (denis.spir at gmail.com)'s article
> On 02/16/2011 03:07 AM, Jonathan M Davis wrote:
> > On Tuesday, February 15, 2011 15:13:33 spir wrote:
> >> On 02/15/2011 11:24 PM, Jonathan M Davis wrote:
> >>> Is there some low level reason why size_t should be signed or something
> >>> I'm completely missing?
> >>
> >> My personal issue with unsigned ints in general as implemented in C-like
> >> languages is that the range of non-negative signed integers is half of the
> >> range of corresponding unsigned integers (for same size).
> >> * practically: known issues, and bugs if not checked by the language
> >> * conceptually: contradicts the "obvious" idea that unsigned (aka naturals)
> >> is a subset of signed (aka integers)
> >
> > It's inevitable in any systems language. What are you going to do, throw away a
> > bit for unsigned integers? That's not acceptable for a systems language. On some
> > level, you must live with the fact that you're running code on a specific machine
> > with a specific set of constraints. Trying to do otherwise will pretty much
> > always harm efficiency. True, there are common bugs that might be better
> > prevented, but part of it ultimately comes down to the programmer having some
> > clue as to what they're doing. On some level, we want to prevent common bugs,
> > but the programmer can't have their hand held all the time either.
> I cannot prove it, but I really think you're wrong on that.
> First, the question of 1 bit. Think about this, speaking of 64-bit sizes:
> * 99.999% of all uses of unsigned fit under 2^63
> * To benefit from the last bit, you must have the need to store a value 2^63 <=
> v < 2^64
> * Not only this, you must step on a case where /any/ possible value for v
> (depending on execution data) could be >= 2^63, but /all/ possible values for v
> are guaranteed < 2^64
> This can only be a very small fraction of the cases where your value does not fit
> in 63 bits, don't you think? Has it ever happened to you (even in 32 bits)?
> Something like: "What luck! This value would not (always) fit in 31 bits, but
> (due to this constraint) I can be sure it will fit in 32 bits (always,
> whatever input data it depends on)."
> In fact, n bits do the job because (1) nearly all unsigned values are very
> small, and (2) the integer size in use on a machine also covers that machine's
> memory range.
> As for efficiency, if unsigned is not a subset of signed, then at a low level you
> may be forced to add checks in numerous utility routines, the kind that are
> constantly used, everywhere one type may interact with the other. I'm not sure
> where the gain is.
> As for correctness, my intuition (just a wild guess, admittedly) is that if
> unsigned values form a subset of signed ones, programmers will find it easier to
> reason correctly about them.
> Now, I perfectly understand the "sacrifice" of one bit sounds like a sacrilege ;-)
> (*)
> Denis
> (*) But you know, when as a young guy you have coded for 8 & 16-bit machines,
> having 63 or 64...

If you write low-level code, it happens all the time.  For example, on some
machines you can copy memory areas quickly by treating them as arrays of "long"
and copying the values, which requires the upper bit to be preserved.
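A minimal C sketch of that word-at-a-time copy, assuming non-overlapping buffers whose length is a multiple of 8 bytes; the function name `copy_words` is hypothetical. Moving each word through a `uint64_t` means every bit pattern, including words whose top bit is set, is an ordinary value that is loaded and stored unchanged. The per-word `memcpy` avoids alignment and strict-aliasing problems, and compilers typically reduce a fixed 8-byte `memcpy` to a single load or store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy nwords 64-bit words from src to dst.
 * Assumes the two regions do not overlap. Each word passes through a
 * uint64_t, so all 64 bits (including the top one) are carried over. */
void copy_words(void *dst, const void *src, size_t nwords)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < nwords; i++) {
        uint64_t w;
        memcpy(&w, s + i * sizeof w, sizeof w); /* alignment-safe load */
        memcpy(d + i * sizeof w, &w, sizeof w); /* alignment-safe store */
    }
}
```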

Or you compute a 64-bit hash value using an algorithm that is part of some
standard protocol.  Oops: it requires an unsigned 64-bit number, and the signed
version would produce the wrong result.  And since the standard expects normally
behaving 64-bit integers, you are stuck: you'd have to write a little class to
simulate unsigned 64-bit math.  A library that computes MD5 sums is one example.
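To make the hash case concrete, here is a sketch of the 64-bit FNV-1a hash in C. FNV-1a is a stand-in chosen for brevity; it is not MD5 and is not named in the post. Its specification defines the arithmetic modulo 2^64, which C's `uint64_t` provides directly. Doing the same multiply with `int64_t` would overflow, and signed overflow is undefined behaviour in C; even where it happens to wrap, reading the result as signed changes comparisons, division and remainders (for example when reducing the hash to a bucket index).

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a parameters: offset basis and prime. */
#define FNV64_OFFSET_BASIS UINT64_C(0xcbf29ce484222325)
#define FNV64_PRIME        UINT64_C(0x100000001b3)

/* FNV-1a over len bytes. All arithmetic is on uint64_t, so the
 * multiplication wraps modulo 2^64 exactly as the algorithm expects. */
uint64_t fnv1a64(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = FNV64_OFFSET_BASIS;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];       /* xor in the next byte */
        h *= FNV64_PRIME; /* multiply, wrapping mod 2^64 */
    }
    return h;
}
```

For example, `fnv1a64("a", 1)` yields `0xaf63dc4c8601ec8c`, a value with the top bit set; held in an `int64_t` it would be negative.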

Not to mention all the code that uses 64-bit numbers as bit fields, where
individual bits or groups of bits are really subfields of the total range of values.
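A sketch of such a packed word, assuming a hypothetical layout with a 16-bit tag in bits 48 to 63 and a 48-bit payload in bits 0 to 47; `pack`, `tag_of` and `payload_of` are illustrative names only. Because the tag occupies the top bit, the word has to be unsigned: with `uint64_t`, `w >> 48` is a logical shift that yields the tag directly, whereas right-shifting a negative `int64_t` is implementation-defined in C (typically an arithmetic shift that copies the sign bit downward).

```c
#include <stdint.h>

/* Hypothetical layout: tag in bits 48..63, payload in bits 0..47. */
#define PAYLOAD_BITS 48
#define PAYLOAD_MASK ((UINT64_C(1) << PAYLOAD_BITS) - 1)

/* Combine a tag and a payload into one 64-bit word. */
uint64_t pack(uint16_t tag, uint64_t payload)
{
    return ((uint64_t)tag << PAYLOAD_BITS) | (payload & PAYLOAD_MASK);
}

/* Extract the tag; logical shift on an unsigned word, no sign smear. */
uint16_t tag_of(uint64_t w)
{
    return (uint16_t)(w >> PAYLOAD_BITS);
}

/* Extract the low 48-bit payload. */
uint64_t payload_of(uint64_t w)
{
    return w & PAYLOAD_MASK;
}
```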

What you are saying is true of high-level code that models real life: if the
value is someone's salary or the number of toasters they are buying from a store,
you are probably fine.  But a lot of low-level software (IPv4 stacks, video
encoders, databases, etc.) is based on designs that require numbers to behave a
certain way, and losing a bit is going to be a pain.
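As one small illustration of the IPv4 case: addresses are 32-bit quantities, and everything from 128.0.0.0 upward has the top bit set. Treated as `uint32_t` they order and mask as expected; stored in a signed 32-bit type such as Java's `int`, 192.168.0.1 would be negative and would compare below 10.0.0.1. The helper names `ipv4` and `in_subnet` are hypothetical, and addresses are assumed to be in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a host-order IPv4 address from its four octets. */
uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* True if addr lies in net/prefix_len (prefix_len in 0..32).
 * A shift by 32 would be undefined, so /0 is handled separately. */
bool in_subnet(uint32_t addr, uint32_t net, unsigned prefix_len)
{
    uint32_t mask = prefix_len == 0 ? 0 : UINT32_MAX << (32 - prefix_len);
    return (addr & mask) == (net & mask);
}
```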

I've run into this with Java, which lacks unsigned types, and once you hit a
case that needs that extra bit, it gets annoying right quick.

Kevin


More information about the Digitalmars-d mailing list