Integer conversions too pedantic in 64-bit

Don nospam at nospam.com
Thu Feb 17 04:28:47 PST 2011


Russel Winder wrote:
> <minor-rant>
> 
> On Thu, 2011-02-17 at 10:13 +0100, Don wrote:
> [ . . . ]
>> Me too. A word is two bytes. Any other definition seems to be pretty 
>> useless.
> 
> Sounds like people have been living with 8- and 16-bit processors for
> too long.
> 
> A word is the natural length of an integer item in the processor.  It is
> necessarily machine specific.  cf. DEC-10 had 9-bit bytes and 36-bit
> word, IBM 370 has an 8-bit byte and a 32-bit word, though addresses were
> 24-bit.  ix86 follows IBM 8-bit byte and 32-bit word.

Yes, I know. It's true, but I think it's rather useless.
We need a name for an 8 bit quantity, and a 16 bit quantity, and higher 
powers of two. 'byte' is an established name for the first one, even 
though historically there were 9-bit bytes. IMHO 'word' wasn't such a 
bad name for the second one, even though its etymology comes from the 
machine word size of some specific early processors. But the equally 
arbitrary name 'short' has become widely accepted.

> The really interesting question is whether on x86_64 the word is 32-bit
> or 64-bit.

With the rising importance of the SIMD instruction set, you could even 
argue that it is 128 bits in many cases...


>> The whole concept of "machine word" seems very archaic and incorrect to 
>> me anyway. It assumes that the data registers and address registers are 
>> the same size, which is very often not true.
> 
> Machine words are far from archaic, even on the JVM, if you don't know
> the length of the word on the machine you are executing on, how do you
> know the set of values that can be represented?  In floating point
> numbers, if you don't know the length of the word, how do you know the
> accuracy of the computation?

Yes, but they're not necessarily the same number. There is a native size 
for every type of operation, but it's not universal across all operations.

I don't think there's a way you can define "machine word" in a way which 
is terribly useful. By the time you've got something unambiguous and 
well-defined, it doesn't have many interesting properties. It's valid in 
such limited cases that you'd be better off with a clearer name.

> Clearly data registers and address registers can be different lengths,
> it is not the job of a programming language that compiles to native code
> to ignore this and attempt to homogenize things beyond what is
> reasonable.

Agreed, and this is, I think, what makes the concept of "machine word" not 
very helpful.

> 
> If you are working in native code then word length is a crucial property
> since it can change depending on which processor you compile for.
> 
>> For example, on an 8-bit machine (eg, 6502 or Z80), the accumulator was 
>> only 8 bits, yet size_t was definitely 16 bits.
> 
> The 8051 was only surpassed a couple of years ago by ARMs as the most
> numerous processor on the planet.  8-bit processors may only have had
> 8-bit ALUs -- leading to an hypothesis that the word was 8-bits -- but
> the word length was effectively 16-bit due to the hardware support for
> multi-byte integer operations.

The 6502 was restricted to 8 bits in almost every way. About half of the 
instructions that involved 16 bit quantities would wrap on page 
boundaries. jmp (0x7FF) would do an indirect jump, getting the low byte 
from address 0x7FF and the high byte from 0x700 !!


>> It's quite plausible that at some time in the future we'll get a machine 
>> with 128-bit registers and data bus, but retaining the 64 bit address 
>> bus. So we could get a size_t which is smaller than the machine word.
>>
>> In summary: size_t is not the machine word.
> 
> Agreed !
> 
> As long as the address bus is less wide than an integer, there are no
> apparent problems using integers as addresses.  The problem comes when
> addresses are wider than integers.  A good statically-typed programming
> language should manage this by having integers and addresses as distinct
> sets.  C and C++ have led people astray.  There should be an appropriate
> set of integer types and an appropriate set of address types and using
> one from the other without active conversion is always going to lead to
> problems.

Indeed.

> 
> Do not be afraid of the word.  Fear leads to anger.  Anger leads to
> hate.  Hate leads to suffering. (*)
> 
> </minor-rant>
> 
> (*) With apologies to Master Yoda (**) for any misquote.
> 
> (**) Or more likely whoever his script writer was.

