Unicode handling comparison

Thu Nov 28 10:19:47 PST 2013

On Thu, Nov 28, 2013 at 09:52:08AM -0800, Walter Bright wrote:
> On 11/28/2013 5:24 AM, monarch_dodra wrote:
> >Which operations are you thinking of in std.array that decode
> >when they shouldn't?
> 
> front() in std.array looks like:
> 
> @property dchar front(T)(T[] a) @safe pure if (isNarrowString!(T[]))
> {
>     assert(a.length,
>         "Attempting to fetch the front of an empty array of " ~ T.stringof);
>     size_t i = 0;
>     return decode(a, i);
> }
> 
> So anytime I write a generic algorithm using empty, front, and
> popFront(), it decodes the strings, which is a large pessimization.

OTOH, decoding is what makes the default correct. If front *didn't*
decode, things like std.algorithm.sort and std.range.retro would operate
on individual code units and mangle all your multibyte UTF-8 characters.
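
To make that concrete, here is a small sketch, assuming a current Phobos
in which narrow strings still auto-decode. The helper reverseCodeUnits is
hypothetical (it is not a Phobos function); it stands in for what a
non-decoding retro would effectively do to the underlying bytes.

```d
import std.algorithm : equal, reverse;
import std.exception : assertThrown;
import std.range : ElementType, retro, walkLength;
import std.string : representation;
import std.utf : UTFException, validate;

// Hypothetical helper (not in Phobos): reverse a string one code unit at
// a time, ignoring UTF-8 sequence boundaries.
char[] reverseCodeUnits(string s)
{
	ubyte[] bytes = s.representation.dup;
	reverse(bytes);
	return cast(char[]) bytes;
}

void main()
{
	string s = "añb"; // 'ñ' is two UTF-8 code units: 0xC3 0xB1

	// As a range, a string yields decoded dchars, not chars.
	static assert(is(ElementType!string == dchar));
	assert(s.length == 4);     // code units
	assert(s.walkLength == 3); // code points, counted by decoding

	// The decoding retro keeps the multibyte character intact.
	assert(equal(s.retro, "bña"));

	// Reversing raw code units gives 'b', 0xB1, 0xC3, 'a', which is
	// no longer valid UTF-8.
	assertThrown!UTFException(validate(reverseCodeUnits(s)));
}
```

The walkLength call is where the cost shows up: it has to decode every
character just to count them, whereas .length is a plain array property.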

Having said that, though, it would be nice if there were a standard
ASCII string type that didn't decode by default. Always decoding strings
*is* slow, esp. when you already know that they contain only ASCII
characters. Maybe we want something like this:

	struct AsciiString {
		immutable(ubyte)[] impl;
		alias impl this;

		// This is so that .front returns char instead of ubyte
		@property char front() { return cast(char) impl[0]; }

		// Ditto for indexing
		char opIndex(size_t idx) { return cast(char) impl[idx]; }

		// ... other range methods here
	}

	AsciiString assumeAscii(string s)
	{
		return AsciiString(cast(immutable(ubyte)[]) s);
	}
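
For what it's worth, a fuller version of the idea might look like the
sketch below. This is only an illustration under a few assumptions of
mine: the empty, popFront, save and length members and the ASCII check
in assumeAscii are additions that were not in the outline above, and I
left out the alias this so that nothing silently falls back to the raw
ubyte slice.

```d
import std.algorithm : all, equal, map;
import std.ascii : toUpper;
import std.range : ElementType, isForwardRange;
import std.string : representation;

struct AsciiString
{
	immutable(ubyte)[] impl;

	// Range primitives that never decode; every element is one byte.
	@property bool empty() const { return impl.length == 0; }
	@property char front() const { return cast(char) impl[0]; }
	void popFront() { impl = impl[1 .. $]; }
	@property AsciiString save() { return this; }

	@property size_t length() const { return impl.length; }
	char opIndex(size_t idx) const { return cast(char) impl[idx]; }
}

AsciiString assumeAscii(string s)
{
	auto bytes = s.representation; // immutable(ubyte)[], no copy
	// Assumption: a debug-time check that the caller's claim holds.
	assert(bytes.all!(b => b < 0x80), "assumeAscii: non-ASCII input");
	return AsciiString(bytes);
}

void main()
{
	// A plain string decodes to dchar; AsciiString yields char directly.
	static assert(is(ElementType!string == dchar));
	static assert(is(ElementType!AsciiString == char));
	static assert(isForwardRange!AsciiString);

	auto s = assumeAscii("hello");
	assert(s.length == 5);
	assert(s[1] == 'e');

	// Generic algorithms now run over chars without calling decode.
	assert(equal(s.map!toUpper, "HELLO"));
}
```

The point of the wrapper is just that generic code written against
empty/front/popFront sees char elements, so the decode call in
std.array's front is never instantiated for it.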

T

-- 
"640K ought to be enough" -- Bill G., 1984.
"The Internet is not a primary goal for PC usage" -- Bill G., 1995.
"Linux has no impact on Microsoft's strategy" -- Bill G., 1999.