byte and short data types use cases

Cecil Ward cecil at cecilward.com
Fri Jun 9 12:56:20 UTC 2023


On Friday, 9 June 2023 at 11:24:38 UTC, Murloc wrote:
> Hi, I was interested why, for example, `byte` and `short` 
> literals do not have their own unique suffixes (like `L` for 
> `long` or `u` for `unsigned int` literals) and found the 
> following explanation:
>
> - "I guess short literal is not supported solely due to the 
> fact that anything less than `int` will be "promoted" to `int` 
> during evaluation. `int` has the most natural size. This is 
> called integer promotion in C++."
>
> Which raised another question: since objects of types smaller 
> than `int` are promoted to `int` to use integer arithmetic on 
> them anyway, is there any point in using anything of integer 
> type less than `int` other than to limit the range of values 
> that can be assigned to a variable at compile time? Are these 
> data types there because of some historical reasons (maybe 
> `byte` and/or `short` were "natural" for some architectures 
> before)?
>
> People say that there is no advantage for using `byte`/`short` 
> type for integer objects over an int for a single variable, 
> however, as they say, this is not true for arrays, where you 
> can save some memory space by using `byte`/`short` instead of 
> `int`. But won't any further manipulation of these array 
> objects produce results of type `int` anyway? Don't you have 
> to cast these objects over and over again after manipulating 
> them to write them back into that array, or for some other 
> manipulation of objects of these smaller types? Or 
> is this only useful if you're storing some array of constants 
> for reading purposes?
>
> Some people say that these promoting and casting operations in 
> summary may have an even slower overall effect than simply 
> using int, so I'm kind of confused about the use cases of these 
> data types... (I think that my misunderstanding comes from not 
> knowing how things happen at a slightly lower level of 
> abstractions, like which operations require memory allocation, 
> which do not, etc. Maybe some resource recommendations on 
> that?) Thanks!

For me there are two use cases for using byte and short, ubyte 
and ushort.

The first is simply to save memory in a large array, or to fit 
neatly into a ‘hole’ in a struct, say next to a bool, which also 
occupies one byte. If you have four ubyte fields in a struct and 
then an array of those structs, each element takes four bytes 
rather than the sixteen that four uints would need. On x86, for 
example, widening a ubyte to a uint uses a zero-extending load 
(movzx), which costs about the same as a normal uint fetch, and 
narrowing to a ubyte typically generates no extra code at all, 
since the store just writes the low byte. So the total cost of 
the casts is effectively zero.
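
As a rough sketch of the memory point above (the struct names 
Rgba and RgbaWide and the helper brighten are made up for 
illustration, not taken from any library), the following shows 
the size difference and where the promotion to int and the cast 
back to ubyte actually appear:

```d
import std.stdio : writeln;

// Hypothetical struct: four ubyte fields pack into 4 bytes.
struct Rgba
{
    ubyte r, g, b, a;
}

// The same fields stored as uint take four times the space.
struct RgbaWide
{
    uint r, g, b, a;
}

static assert(Rgba.sizeof == 4);
static assert(RgbaWide.sizeof == 16);

// Brighten one channel; the arithmetic is done in int after promotion.
ubyte brighten(ubyte c, ubyte amount)
{
    int sum = c + amount;       // ubyte + ubyte is promoted to int
    if (sum > ubyte.max)
        sum = ubyte.max;        // saturate rather than wrap around
    return cast(ubyte) sum;     // explicit narrowing cast on the way back
}

void main()
{
    auto pixels = new Rgba[](1000);
    foreach (ref p; pixels)
    {
        p.r = brighten(p.r, 40);
        p.g += 1;               // op-assign narrows implicitly, no cast needed
    }
    writeln(pixels.length * Rgba.sizeof);     // bytes used by the packed array
    writeln(pixels.length * RgbaWide.sizeof); // bytes a uint version would use
}
```

The only explicit cast is the one in brighten, where a computed 
int is written back into a ubyte; reads from the array need no 
cast at all.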

The second use case is where you need to interface to external 
specifications that demand uint8_t (ubyte) or uint16_t (ushort), 
where I am using the standard definitions from std.stdint. These 
types are the same as those in C. One example is interfacing to 
externally defined structs, whether in data structures in RAM or 
in messages. A second example is where you need to interface to 
machine code that has registers or operands of 8-bit or 16-bit 
types. I like to use the stdint types for the purposes of 
documentation, as it rams home the point that these are truly 
fixed-width types and cannot change. (And I do know that in D, 
unlike C, int, long etc. are of defined fixed widths. C doesn’t 
have those guarantees, which is why stdint.h is needed in C.) As 
well as machine code, we could add other high-level languages, 
where interfaces are defined in the other language and you have 
to hope that the other language’s type widths don’t change.
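
A minimal sketch of the externally defined struct case, assuming 
a made-up message header (MessageHeader and its field names are 
hypothetical, not from any real protocol). The static asserts 
document the layout the external specification would require:

```d
import std.stdio : writeln;
import std.stdint : uint8_t, uint16_t, uint32_t;

// Hypothetical wire-format header, invented for illustration only.
struct MessageHeader
{
    uint8_t  kind;           // offset 0
    uint8_t  flags;          // offset 1
    uint16_t payloadLength;  // offset 2
    uint32_t sequence;       // offset 4
}

// In D the stdint names are aliases for the built-in fixed-width types.
static assert(is(uint8_t == ubyte));
static assert(is(uint16_t == ushort));
static assert(is(uint32_t == uint));

// Check the layout at compile time against the (assumed) specification.
static assert(MessageHeader.payloadLength.offsetof == 2);
static assert(MessageHeader.sequence.offsetof == 4);
static assert(MessageHeader.sizeof == 8);

void main()
{
    MessageHeader h;
    h.kind = 1;
    h.payloadLength = 512;
    writeln(MessageHeader.sizeof); // total header size in bytes
}
```

Using the stdint names here changes nothing for the compiler, 
but it signals to the reader that the widths are dictated from 
outside.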

