Unicode handling comparison
H. S. Teoh
hsteoh at quickfur.ath.cx
Thu Nov 28 10:19:47 PST 2013
On Thu, Nov 28, 2013 at 09:52:08AM -0800, Walter Bright wrote:
> On 11/28/2013 5:24 AM, monarch_dodra wrote:
> >Which operations are you thinking of in std.array that decode
> >when they shouldn't?
>
> front() in std.array looks like:
>
>   @property dchar front(T)(T[] a) @safe pure if (isNarrowString!(T[]))
>   {
>       assert(a.length, "Attempting to fetch the front of an empty array of " ~
>              T.stringof);
>       size_t i = 0;
>       return decode(a, i);
>   }
>
> So anytime I write a generic algorithm using empty, front, and
> popFront(), it decodes the strings, which is a large pessimization.
OTOH, it is actually correct by default. If it *didn't* decode, things
like std.algorithm.sort and std.range.retro would mangle all your
multibyte UTF-8 characters.
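
To make the trade-off concrete, here is a minimal sketch of what decoding buys you. The sample string and the comparison against raw bytes are my own illustration, not something from the thread; `representation` comes from std.string and gives the undecoded code units.

```d
import std.algorithm : equal;
import std.range;                    // front, retro for arrays
import std.string : representation;

void main()
{
    string s = "héllo";              // 'é' takes two UTF-8 code units

    // The range primitives decode narrow strings, so front is a dchar.
    static assert(is(typeof(s.front) == dchar));
    assert(s.length == 6);           // length still counts code units

    // Reversing by code point keeps 'é' intact.
    assert(equal(s.retro, "olléh"));

    // Reversing the raw code units swaps the two bytes of 'é', so the
    // result is not the UTF-8 encoding of the reversed string.
    auto bytes = s.representation;   // immutable(ubyte)[]
    assert(!equal(bytes.retro, "olléh".representation));
}
```

The last assertion is the mangling case: the byte-wise reversal is no longer valid UTF-8 at all.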
Having said that, though, it would be nice if there were a standard
ASCII string type that didn't decode by default. Always decoding strings
*is* slow, esp. when you already know that they contain only ASCII
characters. Maybe we want something like this:
	struct AsciiString {
		immutable(ubyte)[] impl;
		alias impl this;

		// This is so that .front returns char instead of ubyte
		@property char front() { return cast(char) impl[0]; }
		char opIndex(size_t idx) { ... /* ditto */ }
		... // other range methods here
	}

	AsciiString assumeAscii(string s)
	{
		return AsciiString(cast(immutable(ubyte)[]) s);
	}
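
For anyone who wants to try the idea, below is one possible way to fill in the elided range methods so that the type compiles and passes the Phobos range checks. Treat it as a sketch under assumptions: the choice of primitives (empty, popFront, back, popBack, save, length, opSlice, opDollar) and the debug-time ASCII check in `assumeAscii` are my guesses at what "other range methods" would cover, and I left out `alias impl this` here so that it is unambiguous which members the range templates pick up.

```d
import std.algorithm : equal;
import std.range;   // ElementType, isRandomAccessRange, retro

struct AsciiString
{
    immutable(ubyte)[] impl;

    // Input/forward range primitives; elements are char, never decoded.
    @property bool empty() const { return impl.length == 0; }
    @property char front() const { return cast(char) impl[0]; }
    void popFront() { impl = impl[1 .. $]; }
    @property AsciiString save() { return this; }

    // Bidirectional and random-access primitives.
    @property char back() const { return cast(char) impl[$ - 1]; }
    void popBack() { impl = impl[0 .. $ - 1]; }
    @property size_t length() const { return impl.length; }
    size_t opDollar() const { return impl.length; }
    char opIndex(size_t idx) const { return cast(char) impl[idx]; }
    AsciiString opSlice(size_t lo, size_t hi)
    {
        return AsciiString(impl[lo .. hi]);
    }
}

AsciiString assumeAscii(string s)
{
    // Assumption: a debug-only sanity check; release builds trust the caller.
    foreach (char c; s)
        assert(c < 0x80, "assumeAscii: non-ASCII code unit");
    return AsciiString(cast(immutable(ubyte)[]) s);
}

void main()
{
    auto a = assumeAscii("hello");

    // string decodes to dchar; AsciiString yields plain char.
    static assert(is(ElementType!string == dchar));
    static assert(is(ElementType!AsciiString == char));

    // Unlike string, AsciiString is a random-access range.
    static assert(!isRandomAccessRange!string);
    static assert(isRandomAccessRange!AsciiString);

    assert(a.front == 'h');
    assert(a[1] == 'e');
    assert(equal(a.retro, "olleh"));
}
```

Because the wrapper counts as random-access, generic algorithms can index and slice it directly instead of walking it one decoded code point at a time, which is the whole point of the exercise.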
T
--
"640K ought to be enough" -- Bill G., 1984.
"The Internet is not a primary goal for PC usage" -- Bill G., 1995.
"Linux has no impact on Microsoft's strategy" -- Bill G., 1999.