ASCII matters

Wed Aug 22 20:03:48 PDT 2012

On Wednesday, August 22, 2012 19:52:10 Sean Kelly wrote:
> I'm clearly missing something. ASCII and UTF-8 are compatible. What's
> stopping you from just processing these as if they were UTF-8 strings?

Range-based functions treat arrays of char or wchar as forward ranges of
dchar. Because each code point is encoded as a variable number of code units,
such arrays aren't considered to have length, be random access, or have
slicing, and they will not generally work with range-based functions which
require any of those operations (though some range-based functions do
specialize on strings and use those operations where they can, based on a
proper understanding of Unicode).
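
To make that concrete, here is a small sketch of how the range traits in
std.range report a string. It assumes a Phobos version in which narrow strings
are still auto-decoded to dchar, and that the source file is saved as UTF-8;
the sample text "héllo" is only an illustration.

```d
import std.range;

// Narrow strings are forward ranges of dchar, not of char.
static assert(isForwardRange!string);
static assert(is(ElementType!string == dchar));

// The range traits deliberately report no length, random access or slicing,
// even though the underlying array supports all three.
static assert(!hasLength!string);
static assert(!isRandomAccessRange!string);
static assert(!hasSlicing!string);

void main()
{
    string s = "héllo";

    // The array property counts code units; 'é' takes two in UTF-8.
    assert(s.length == 6);

    // Walking the range decodes as it goes and counts code points.
    assert(s.walkLength == 5);
}
```

The mismatch between s.length and s.walkLength is the reason the traits refuse
to treat the array length as the range length.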

On the other hand, if you have a string that specifically holds ASCII and you 
know that it only holds ASCII, you know that you can safely use length, random 
access, and slicing as if each code unit were a full code point. But the 
range-based functions don't know that your string is guaranteed to be ASCII-
only, so they continue to treat it as a range of dchar rather than char. The 
solution is to either create a wrapper range whose element type is char or to 
cast the char[] to ubyte[]. And Bearophile wants such a wrapper range to be 
added to Phobos.
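
As a rough sketch of the cast approach: the helper name asAsciiBytes below is
made up for this example and is not part of Phobos, and the ASCII check it
performs is my own addition rather than anything the cast itself requires.
Once the array is viewed as ubyte[], functions that need random access (sort,
in this case) accept it.

```d
import std.algorithm : all, sort;
import std.exception : enforce;
import std.range;

// Hypothetical helper (not in Phobos): reinterpret a mutable char[] as
// ubyte[] after verifying that every code unit is ASCII (below 0x80).
ubyte[] asAsciiBytes(char[] text)
{
    auto bytes = cast(ubyte[]) text;
    enforce(bytes.all!(b => b < 0x80), "input contains non-ASCII code units");
    return bytes;
}

// char[] is not a random-access range as far as the traits are concerned,
// but ubyte[] is, and it also has length and slicing.
static assert(!isRandomAccessRange!(char[]));
static assert(isRandomAccessRange!(ubyte[]));
static assert(hasLength!(ubyte[]) && hasSlicing!(ubyte[]));

void main()
{
    char[] buf = "dcba".dup;

    // The ubyte[] view shares memory with buf, so sorting it sorts buf.
    auto bytes = asAsciiBytes(buf);
    sort(bytes);

    assert(buf == "abcd");
    assert(bytes.length == 4);
    assert(bytes[1] == 'b');
}
```

The cast is only safe because the data really is ASCII. On text containing
multi-byte sequences, sorting the bytes would produce invalid UTF-8, which is
why the sketch checks first.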

- Jonathan M Davis