Top 5

Sascha Katzner sorry.no at spam.invalid
Sat Oct 11 12:32:25 PDT 2008


Benji Smith wrote:
> And, btw, you *can't* scan bytewise through a D string to find space
> characters, because the value '32' can occur as the
> least-significant-byte in a multi-byte non-whitespace character. Any
> code that iterates bytewise through a char[] array is fundamentally
> broken.

And here you're wrong. In fact, you can do this with every ASCII
character: its code point is below 128, so its most significant bit is
always clear and it is always encoded as a single byte. *Every* other
code point is encoded in UTF-8 as a sequence of two or more bytes, and
*every* byte of that sequence has its most significant bit set. So the
byte value 32 can only ever be a real space, never part of a multi-byte
character. If you don't believe me, here is a very good overview of the
Unicode standard:

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IWS-AppendixA#sec3
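
To make that concrete, here is a minimal sketch in C (the same byte-level
reasoning applies to a D char[]). The function names count_spaces and
is_ascii_byte are made up for this example. It scans a UTF-8 buffer one
byte at a time and counts the bytes equal to 32:

```c
#include <stddef.h>

/* Hypothetical helper: true if the byte is a complete ASCII character,
 * i.e. its most significant bit is clear. Every byte of a multi-byte
 * UTF-8 sequence (lead or continuation) has that bit set. */
int is_ascii_byte(unsigned char b)
{
    return (b & 0x80) == 0;
}

/* Hypothetical helper: count ASCII spaces (byte value 32) by walking the
 * buffer bytewise, without decoding any code points. */
size_t count_spaces(const char *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if ((unsigned char)s[i] == 32)
            n++;
    }
    return n;
}
```

For instance, U+0120 has 0x20 as the low byte of its code point, but its
UTF-8 encoding is the two bytes C4 A0, so count_spaces("\xC4\xA0", 2)
finds no space. Neither byte passes is_ascii_byte.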

LLAP,
Sascha



More information about the Digitalmars-d mailing list