Ascii matters

Sean Kelly sean at invisibleduck.org
Wed Aug 22 21:13:44 PDT 2012


On Aug 22, 2012, at 8:03 PM, Jonathan M Davis <jmdavisProg at gmx.com> wrote:

> On Wednesday, August 22, 2012 19:52:10 Sean Kelly wrote:
>> I'm clearly missing something. ASCII and UTF-8 are compatible. What's
>> stopping you from just processing these as if they were UTF-8 strings?
> 
> Range-based functions will treat arrays of char or wchar as forward ranges of
> dchar. Because a code point occupies a variable number of code units, such
> arrays aren't considered to have length, be random access, or have slicing,
> and will not generally work with range-based functions which require any of
> those operations (though some range-based functions do specialize on strings
> and use those operations where they can, based on a proper understanding of
> Unicode).

Yeah.  I understand why the range-based functions use dchar, but for my own use I generally want to work directly with a char string of UTF-8 so I can slice buffers.  Typing these as ubyte buffers isn't ideal, but it does work.
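
To make the distinction concrete, here is a minimal sketch of how the Phobos range traits classify a string, as opposed to what the underlying array supports.  It assumes a Phobos with the string-decoding behaviour described above; codePointCount is a hypothetical helper added only for illustration.

```d
import std.range : ElementType, hasLength, hasSlicing,
                   isRandomAccessRange, walkLength;

// Hypothetical helper: counts code points by walking the string as a range.
size_t codePointCount(string s)
{
    return s.walkLength;
}

void main()
{
    // The range traits see a string as a range of dchar ...
    static assert(is(ElementType!string == dchar));
    // ... without length, random access, or slicing.
    static assert(!hasLength!string);
    static assert(!isRandomAccessRange!string);
    static assert(!hasSlicing!string);

    // The array itself still offers all three, counted in code units.
    string s = "héllo";              // 'é' takes two UTF-8 code units
    assert(s.length == 6);           // code units
    assert(codePointCount(s) == 5);  // code points
    assert(s[0 .. 1] == "h");        // slicing by code unit index
}
```

Slicing the array directly is fine as long as the indices fall on code point boundaries, which is always the case for pure ASCII.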

> On the other hand, if you have a string that specifically holds ASCII and you 
> know that it only holds ASCII, you know that you can safely use length, random 
> access, and slicing as if each code unit were a full code point. But the 
> range-based functions don't know that your string is guaranteed to be ASCII-
> only, so they continue to treat it as a range of dchar rather than char. The 
> solution is to either create a wrapper range whose element type is char or to 
> cast the char[] to ubyte[]. And Bearophile wants such a wrapper range to be 
> added to Phobos.

Gotcha.  Despite it being something I'd use regularly, I wouldn't want this in Phobos because it seems like it could cause maintenance problems.  I'd rather explicitly cast to ubyte as a way to flag that I was doing something potentially unsafe.
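
For illustration, a minimal sketch of the explicit-cast approach, assuming the input really is ASCII-only.  asAsciiBytes is a hypothetical helper, not a Phobos function; it casts to immutable(ubyte)[] rather than ubyte[] so that the immutability of a string is not cast away.  The last few lines show std.utf.byCodeUnit, which later Phobos releases provide as a wrapper range over code units.

```d
import std.algorithm : equal;
import std.range : hasLength, isRandomAccessRange;
import std.string : representation;
import std.utf : byCodeUnit;

// Hypothetical helper: reinterprets a string as raw bytes and checks the
// ASCII assumption.  The check is an assert, so it vanishes in release builds.
immutable(ubyte)[] asAsciiBytes(string s)
{
    auto bytes = cast(immutable(ubyte)[]) s;  // the cast flags the intent
    foreach (b; bytes)
        assert(b < 0x80, "input is not pure ASCII");
    return bytes;
}

void main()
{
    string line = "GET /index.html";  // assumed ASCII-only
    auto bytes = asAsciiBytes(line);

    // An array of ubyte is an ordinary random-access range with length.
    static assert(isRandomAccessRange!(typeof(bytes)));
    static assert(hasLength!(typeof(bytes)));
    assert(bytes[0] == 'G');
    assert(bytes[0 .. 3].equal("GET"));

    // std.string.representation yields the same view without a cast.
    assert(line.representation is bytes);

    // A wrapper range over code units, as shipped in later Phobos releases.
    auto units = line.byCodeUnit;
    static assert(isRandomAccessRange!(typeof(units)));
    assert(units.length == line.length);
    assert(units[0] == 'G');
}
```

The cast and the wrapper give the same capabilities; the cast simply makes the "I know this is ASCII" assumption visible at the call site.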

