Major performance problem with std.array.front()

Sun Mar 9 06:08:04 PDT 2014

On Friday, 7 March 2014 at 04:11:15 UTC, Nick Sabalausky wrote:
> What about this?:
>
> Anywhere we currently have a front() that decodes, such as your 
> example:
>
>>   @property dchar front(T)(T[] a) @safe pure
>>       if (isNarrowString!(T[]))
>>   {
>>     assert(a.length,
>>            "Attempting to fetch the front of an empty array of " ~
>>            T.stringof);
>>     size_t i = 0;
>>     return decode(a, i);
>>   }
>>
>
> We rip out that front() entirely. The result is *not* 
> technically a range...yet! We could call it a protorange.
>
> Then we provide two functions:
>
> auto decode(someStringProtoRange) {...}
> auto raw(someStringProtoRange) {...}
>
> These convert the protoranges into actual ranges by adding the 
> missing front() function. The 'decode' adds a front() which 
> decodes into dchar, while the 'raw' adds a front() which simply 
> returns the raw underlying type.
>
> I imagine the decode/raw would probably also handle any 
> "length" property (if it exists in the protorange) accordingly.
>
> This way, the user is forced to specify "myStringRange.decode" 
> or "myStringRange.raw" as appropriate, otherwise myStringRange 
> can't be used since it isn't technically a range, only a 
> protorange.
>
> (Naturally, ranges of dchar would always have front, since no 
> decoding is ever needed for them anyway. For these ranges, the 
> decode/raw funcs above would simply be no-ops.)

Strings can be iterated over by code unit, code point, grapheme, 
grapheme cluster (?), word, sentence, line, paragraph, and 
potentially other units. Therefore, it makes sense to require 
the same explicit choice for ranges of dchar, too.

Also, `byCodeUnit` and `byCodePoint` would probably be better 
names than `raw` and `decode`, to match the already existing 
`byGrapheme` in std.uni.
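
To illustrate why the unit matters, here is a minimal sketch that counts the same string three ways. It assumes a Phobos version that provides `std.utf.byCodeUnit` and `std.utf.byDchar` alongside `std.uni.byGrapheme`; `byDchar` stands in here for the proposed `byCodePoint` name, and the three `count*` helpers are hypothetical names made up for this example.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helpers: each one names the iteration unit explicitly.
size_t countCodeUnits(string s)  { return s.byCodeUnit.walkLength; }
size_t countCodePoints(string s) { return s.byDchar.walkLength; }
size_t countGraphemes(string s)  { return s.byGrapheme.walkLength; }

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT:
    // a single user-perceived character built from two code points.
    string s = "e\u0301";

    assert(countCodeUnits(s)  == 3); // UTF-8: 1 byte for 'e', 2 for U+0301
    assert(countCodePoints(s) == 2); // 'e' and the combining accent
    assert(countGraphemes(s)  == 1); // one grapheme cluster
}
```

None of the three answers is "the" length of the string, which is the argument for making the caller pick one instead of having front() silently decode.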