Ascii matters
Sean Kelly
sean at invisibleduck.org
Wed Aug 22 21:13:44 PDT 2012
On Aug 22, 2012, at 8:03 PM, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
> On Wednesday, August 22, 2012 19:52:10 Sean Kelly wrote:
>> I'm clearly missing something. ASCII and UTF-8 are compatible. What's
>> stopping you from just processing these as if they were UTF-8 strings?
>
> Range-based functions will treat arrays of char or wchar as forward ranges of
> dchar. Because each code point takes a variable number of code units, they
> aren't considered to have length, be random access, or have slicing, and will
> not generally work with range-based functions which require any of those
> operations (though some range-based functions do specialize on strings and use
> those operations where they can, based on a proper understanding of Unicode).
Yeah. I understand why the range-based functions use dchar, but for my own use I generally want to work directly with a char string of UTF-8 so I can slice buffers. Typing these as ubyte buffers isn't ideal, but it does work.
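To make the distinction concrete, here is a minimal sketch of how the range traits see a string versus the same data typed as ubyte[]. It assumes a Phobos in which narrow strings are decoded to dchar by the range primitives, as described in the quoted text; the sample string is only an illustration.

```d
import std.range;

void main()
{
    // Range-based code sees a narrow string as a range of dchar,
    // without length, random access, or slicing.
    static assert(is(ElementType!string == dchar));
    static assert(!hasLength!string);
    static assert(!isRandomAccessRange!string);
    static assert(!hasSlicing!string);

    // The same bytes typed as ubyte[] are an ordinary random-access range.
    static assert(is(ElementType!(ubyte[]) == ubyte));
    static assert(isRandomAccessRange!(ubyte[]));
    static assert(hasSlicing!(ubyte[]));

    // The built-in array operations still work on code units directly.
    string s = "h\u00E9llo";    // U+00E9 takes two UTF-8 code units
    assert(s.length == 6);      // length counts code units, not code points
    assert(s[0 .. 1] == "h");   // slicing on a code unit boundary is fine
}
```

So slicing a char buffer by hand keeps working; it is only the generic range functions that refuse to treat it as sliceable.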
> On the other hand, if you have a string that specifically holds ASCII and you
> know that it only holds ASCII, you know that you can safely use length, random
> access, and slicing as if each code unit were a full code point. But the
> range-based functions don't know that your string is guaranteed to be ASCII-
> only, so they continue to treat it as a range of dchar rather than char. The
> solution is to either create a wrapper range whose element type is char or to
> cast the char[] to ubyte[]. And Bearophile wants such a wrapper range to be
> added to Phobos.
Gotcha. Despite it being something I'd use regularly, I wouldn't want this in Phobos because it seems like it could cause maintenance problems. I'd rather explicitly cast to ubyte as a way to flag that I was doing something potentially unsafe.
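As a rough sketch of the explicit-cast approach: asAsciiBytes below is a hypothetical helper, not anything in Phobos, and the request string is made-up sample data. It checks that the input really is ASCII and then returns a ubyte view, so the cast stays visible at the one place where the assumption is made.

```d
import std.exception : enforce;
import std.range : isRandomAccessRange;

// Hypothetical helper: verify the text is pure ASCII, then view it as bytes.
const(ubyte)[] asAsciiBytes(const(char)[] text)
{
    foreach (char c; text)   // iterates code units, no decoding
        enforce(c < 0x80, "input is not pure ASCII");
    return cast(const(ubyte)[]) text;
}

void main()
{
    auto bytes = asAsciiBytes("GET /index.html");

    // The byte view has length, indexing, and slicing as far as
    // range-based code is concerned.
    static assert(isRandomAccessRange!(typeof(bytes)));
    assert(bytes.length == 15);
    assert(bytes[0] == 'G');

    // Cast back to char only when the slice is needed as text again.
    assert(cast(const(char)[]) bytes[4 .. $] == "/index.html");
}
```

The check costs one pass over the buffer; skipping it and casting directly is exactly the "potentially unsafe" shortcut the cast is meant to flag.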