byte and short data types use cases

H. S. Teoh hsteoh at qfbox.info
Sun Jun 11 00:05:52 UTC 2023


On Sat, Jun 10, 2023 at 09:58:12PM +0000, Cecil Ward via Digitalmars-d-learn wrote:
> On Friday, 9 June 2023 at 15:07:54 UTC, Murloc wrote:
[...]
> > So you can optimize memory usage by using arrays of things smaller
> > than `int` if these are enough for your purposes, but what about
> > using these instead of single variables, for example as an iterator
> > in a loop, if the range of such a data type is enough for me? Are
> > there any advantages to doing that?
> 
> A couple of other important use-cases came to me. The first one is
> unicode, which has three main representations: utf-8, which is a
> stream of bytes where each character can take several bytes; utf-16,
> where a character can be one or, rarely, two 16-bit words; and utf32,
> a stream of 32-bit words, one per character. The simplicity of the
> latter is a huge deal for speed, but utf32 takes up almost four times
> as much memory as utf-8 for western european languages like english
> or french. The four-to-one ratio means that the processor has to pull
> in four times the amount of memory, so that’s a slowdown, but on the
> other hand it is processing the same number of characters whichever
> way you look at it, and in utf8 the cpu is having to parse more bytes
> than characters unless the text is entirely ASCII-like.
[...]

On contemporary machines, the CPU is so fast that memory access is a
much bigger bottleneck than processing speed. So unless an operation is
being run hundreds of thousands of times, you're not likely to notice
the difference in processing cost. OTOH, accessing memory is slow
(that's why the memory cache hierarchy exists). So utf8 is actually
advantageous here: it fits in a smaller space, so it's faster to fetch
from memory, and more of it fits in the CPU cache, so fewer DRAM
roundtrips are needed, which makes the whole operation faster. Yes, you
need extra processing because of the variable-width
encoding, but it happens mostly inside the CPU, which is fast enough
that it generally outstrips the memory roundtrip overhead. So unless
you're doing something *really* complex with the utf8 data, it's an
overall win in terms of performance. The CPU gets to do what it's good
at -- running complex code -- and the memory cache gets to do what it's
good at: minimizing the number of slow DRAM roundtrips.
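
To put rough numbers on the size difference, here is a minimal sketch
(not from the original thread). The helper names `byteSize` and
`codePoints` are made up for illustration, and it only shows memory
footprint; it says nothing about actual cache behaviour, which you'd
have to measure on your own data.

```d
import std.conv : to;
import std.range : walkLength;

/// Bytes occupied by the code units of a string of any UTF width.
/// (Hypothetical helper, for illustration only.)
size_t byteSize(S)(S s)
{
    return s.length * typeof(s[0]).sizeof;
}

/// Number of code points. For `string` this decodes the UTF-8 on the
/// fly; for `dstring` it is simply the array length.
/// (Hypothetical helper, for illustration only.)
size_t codePoints(S)(S s)
{
    return s.walkLength;
}

unittest
{
    // "naïve café": 10 code points, two of them outside ASCII.
    string  u8  = "na\u00EFve caf\u00E9";
    dstring u32 = u8.to!dstring;

    assert(u8.codePoints == u32.codePoints); // same text either way
    assert(u8.byteSize == 12);   // 8 ASCII bytes + 2 * 2 bytes
    assert(u32.byteSize == 40);  // 10 code points * 4 bytes each
}
```

Same ten characters, but the utf32 copy is more than three times the
size, so that much more has to travel through the cache to process it.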


T

-- 
It said to install Windows 2000 or better, so I installed Linux instead.

