Top 5

Benji Smith dlanguage at benjismith.net
Sat Oct 11 13:14:26 PDT 2008


Sascha Katzner wrote:
> Benji Smith wrote:
>> And, btw, you *can't* scan bytewise through a D string to find space
>>  characters, because the value '32' can occur as the 
>> least-significant-byte in a multi-byte non-whitespace character. Any
>>  code that iterates bytewise through a char[] array is fundamentally
>> broken.
> 
> And here you're wrong. In fact you can do this with every ASCII
> character, because its code point is below 128, so its most
> significant bit is always clear and it is always represented with only
> one byte. *Every* other UTF-8 code point is represented by a sequence
> of more than one byte in which *every* byte has its most significant
> bit set. If you don't believe me, here is a very good description of
> the Unicode standard:
> 
> http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IWS-AppendixA#sec3 
> 
> 
> LLAP,
> Sascha

Yeah. I have been schooled :(
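
For anyone following along, here is a minimal sketch of Sascha's point,
assuming a reasonably current D compiler. The function name countSpaces
is made up for illustration and is not from Phobos. The test string
contains U+0120, whose low byte would be 0x20 in UTF-16, but whose UTF-8
encoding is the two bytes 0xC4 0xA0, so a bytewise scan for ' ' never
sees a false match.

```d
import std.stdio : writeln;

// Hypothetical helper: count ASCII spaces by walking the UTF-8 code
// units of a string one byte at a time. This is safe because every byte
// of a multi-byte UTF-8 sequence has its most significant bit set
// (0x80..0xFF), so the value 0x20 can only ever be a real space.
size_t countSpaces(const(char)[] s)
{
    size_t n = 0;
    foreach (char c; s)   // iterating as char visits code units, not code points
    {
        if (c == ' ')
            ++n;
    }
    return n;
}

void main()
{
    // "a", space, U+0120 (two bytes: 0xC4 0xA0), space, "b"
    const(char)[] text = "a \u0120 b";
    writeln(countSpaces(text));   // prints 2
}
```

The same reasoning should hold for any other ASCII delimiter, but not
for searching for a non-ASCII character one byte at a time.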

--benji


