Top 5

Benji Smith dlanguage at benjismith.net
Sat Oct 11 12:26:19 PDT 2008


Sergey Gromov wrote:
> Sat, 11 Oct 2008 14:46:55 -0400,
> Benji Smith wrote:
>> And, btw, you *can't* scan bytewise through a D string to find space 
>> characters, because the value '32' can occur as the 
>> least-significant-byte in a multi-byte non-whitespace character. Any 
>> code that iterates bytewise through a char[] array is fundamentally broken.
> 
> You're wrong.  char[] is not MBCS, it's UTF-8.  In UTF-8 any byte which 
> is part of a multi-byte sequence always has its most significant bit 
> set.  You can safely search for any ASCII in UTF-8 sequence as if it 
> were an array of bytes.

Oh yeah. I totally forgot about that. Good point.
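
For anyone following along, here is a rough sketch of the property Sergey describes, written in Python rather than D purely for illustration. The helper name find_ascii_byte and the sample string are made up for this example. It scans UTF-8 bytes one at a time for the space byte (0x20) and checks that no byte of a multi-byte sequence can be mistaken for it. My original worry does apply to UTF-16, where U+0120 has 0x20 as its low byte, but not to UTF-8.

```python
def find_ascii_byte(data: bytes, target: int) -> list[int]:
    """Return every index at which the ASCII byte `target` occurs in UTF-8 `data`."""
    if target >= 0x80:
        raise ValueError("target must be an ASCII code unit (< 0x80)")
    # A plain bytewise scan: no decoding of multi-byte sequences is needed.
    return [i for i, b in enumerate(data) if b == target]


# Hypothetical sample mixing 1-, 2- and 3-byte UTF-8 sequences.
text = "naïve café 日本語 text"
encoded = text.encode("utf-8")

# Collect every byte that belongs to a multi-byte sequence.
multibyte_bytes = [
    b
    for ch in text
    if len(ch.encode("utf-8")) > 1
    for b in ch.encode("utf-8")
]

# Each of those bytes has its most significant bit set, so none can equal 0x20.
high_bit_always_set = all(b & 0x80 for b in multibyte_bytes)

space_offsets = find_ascii_byte(encoded, 0x20)

# U+0120 contains the byte 0x20 in UTF-16, but not in UTF-8.
g_dot_utf16 = "\u0120".encode("utf-16-be")
g_dot_utf8 = "\u0120".encode("utf-8")
```

Note that the offsets returned are byte offsets into the encoded data, not character indices, which is also what indexing a D char[] gives you.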

--benji



More information about the Digitalmars-d mailing list