Top 5
Benji Smith
dlanguage at benjismith.net
Sat Oct 11 13:14:26 PDT 2008
Sascha Katzner wrote:
> Benji Smith wrote:
>> And, btw, you *can't* scan bytewise through a D string to find space
>> characters, because the value '32' can occur as the
>> least-significant-byte in a multi-byte non-whitespace character. Any
>> code that iterates bytewise through a char[] array is fundamentally
>> broken.
>
> And here you're wrong. In fact you can do this with every ASCII
> character, because its code point is below 128, so its most
> significant bit is always clear and it is always represented with only
> one byte. *Every* other UTF-8 code point is represented by a byte
> sequence of more than one byte in which *every* byte has its most
> significant bit set. If you don't believe me, here is a very good
> description of the Unicode standard:
>
> http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IWS-AppendixA#sec3
>
>
> LLAP,
> Sascha
Yeah. I have been schooled :(
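
For the record, here is a minimal sketch of the point above. It assumes a
current D compiler, where foreach with a char loop variable walks the raw
UTF-8 code units and foreach with a dchar loop variable decodes code
points. The function names and the sample string are made up for
illustration.

```d
/// Count ASCII spaces by looking at each UTF-8 code unit (byte) in turn.
/// No decoding happens here.
size_t countSpacesBytewise(const(char)[] s)
{
    size_t n = 0;
    foreach (char c; s)      // one iteration per byte
    {
        if (c == ' ')
            ++n;
    }
    return n;
}

/// Count spaces by decoding the string into code points first.
size_t countSpacesDecoded(const(char)[] s)
{
    size_t n = 0;
    foreach (dchar c; s)     // one iteration per decoded code point
    {
        if (c == ' ')
            ++n;
    }
    return n;
}

void main()
{
    // U+0120 and U+2020 both have 0x20 as the low byte of their code
    // point, but their UTF-8 encodings are C4 A0 and E2 80 A0, so the
    // byte 0x20 never appears inside either sequence.
    string s = "a \u0120 b\u2020c";

    // Both scans find the same two real spaces.
    assert(countSpacesBytewise(s) == 2);
    assert(countSpacesDecoded(s) == 2);

    // Every byte at or above 0x80 belongs to a multi-byte sequence, and
    // none of them can be mistaken for an ASCII character.
    foreach (char c; s)
    {
        if (c >= 0x80)
            assert(c != ' ');
    }
}
```

The bytewise version only stays correct when the character being searched
for is ASCII. Searching for a non-ASCII character still needs the decoding
loop or a substring search.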
--benji
More information about the Digitalmars-d
mailing list