Why do you decode ? (Seriously)

Dmitry Olshansky dmitry.olsh at gmail.com
Thu Aug 2 14:07:00 PDT 2012


On 03-Aug-12 00:40, Artur Skawina wrote:
> On 08/02/12 18:47, Dmitry Olshansky wrote:
>> char[] input = ...;
>> size_t idx = ...;
>> size_t len = stride(input, idx);
>> uint u8word = *cast(uint*)(input.ptr+idx);
>
>> So why do we use dchar and not UTF-8 word, as it's as good as dchar and faster to obtain?
>
> Iff unaligned accesses happen to be legal on the platform _and_ iff doing
> them is faster than the (not that complex) decoding.
>

You read memory either way. Compare reading it byte by byte with reading
"1 or 2 words (if unaligned)" at once.

And take a look at std.utf, I'd say it is rather involved.

In any case there is a minimum of:
mask out the upper control bits, shift to the proper position, OR into the
result register [repeat per byte]
return result
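
To make that minimum concrete, here is a rough sketch of the loop. The
name decodeUnchecked is made up for illustration, this is not what
std.utf.decode does, and it assumes the input is already well-formed
UTF-8 (no validation at all):

```d
// Hypothetical helper, NOT std.utf.decode.
// Assumes well-formed UTF-8 and performs no validation whatsoever.
dchar decodeUnchecked(const(char)[] s, ref size_t idx)
{
    uint c = s[idx++];
    if (c < 0x80)
        return cast(dchar) c;           // ASCII: nothing to mask or shift

    // Number of continuation bytes, taken from the lead byte's control bits.
    immutable uint extra = c >= 0xF0 ? 3 : c >= 0xE0 ? 2 : 1;

    // Mask out the lead byte's upper control bits (0x1F, 0x0F or 0x07).
    uint result = c & (0x3F >> extra);

    foreach (_; 0 .. extra)
    {
        // Per byte: mask out the upper control bits, shift, OR into the result.
        result = (result << 6) | (s[idx++] & 0x3F);
    }
    return cast(dchar) result;
}
```

Even this stripped-down version does a mask, a shift and an OR for every
continuation byte, and a validating decoder has to add its checks on top.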


Of course, I'm biased by x86, but it is my understanding that unaligned
access support is more or less understood to be a good feature. ARMv6+
seems to have it. And I suspect there is a way to recode the above to be
more word-aligned friendly (e.g. by adding an explicit leftover word).
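
One possible reading of the "leftover word" idea, as a sketch only: fetch
the two aligned words that cover the position and splice them together.
The name loadU8Word is hypothetical. The code assumes a little-endian
target, 32-bit words, and a buffer padded so that the aligned word after
the one containing p is readable whenever p is not itself aligned:

```d
// Hypothetical sketch: get the 4 bytes starting at p using only aligned loads.
// Assumptions: little-endian, and the aligned word following the one that
// contains p is readable (e.g. the buffer is padded to a word boundary).
uint loadU8Word(const(char)* p)
{
    immutable size_t addr = cast(size_t) p;
    immutable uint shift = cast(uint)(addr & 3) * 8;    // misalignment in bits
    auto w = cast(const(uint)*)(addr & ~cast(size_t) 3); // round down to a word

    immutable uint lo = w[0];
    if (shift == 0)
        return lo;                      // already aligned, one load is enough

    immutable uint hi = w[1];           // the explicit leftover word
    return (lo >> shift) | (hi << (32 - shift));
}
```

When walking a string front to back, hi from one step can be kept as lo
for the next, so each aligned word is loaded only once.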

-- 
Dmitry Olshansky

