Major performance problem with std.array.front()

Fri Mar 7 15:12:16 PST 2014

On Fri, Mar 07, 2014 at 10:35:46PM +0000, Sarath Kodali wrote:
> On Friday, 7 March 2014 at 20:43:45 UTC, Vladimir Panteleev wrote:
> >On Friday, 7 March 2014 at 19:57:38 UTC, Andrei Alexandrescu
> >wrote:
[...]
> >>Clearly one might argue that their app has no business dealing
> >>with diacriticals or Asian characters. But that's the typical
> >>provincial view that marred many languages' approach to UTF and
> >>internationalization.
> >
> >So is yours, if you think that making everything magically a dchar
> >is going to solve all problems.
> >
> >The TDPL example only showcases the problem. Yes, it works with
> >Swedish. Now try it again with Sanskrit.
> 
> +1
> In Indian languages, a character consists of one or more UNICODE
> code points. For example, in Sanskrit "ddhrya"
> http://en.wikipedia.org/wiki/File:JanaSanskritSans_ddhrya.svg
> consists of 7 UNICODE code points. So to search for this char I have
> to use string search.
[...]

That's what I've been arguing for. The most general form of character
searching in Unicode requires substring searching, and similarly many
character-based operations on Unicode strings are effectively
substring-based operations, because said "character" may be a code point
encoded as several code units, or, in your case, several code points.
Since that's the case, we might as well forget about the distinction between
"character" and "string", and treat all such operations as substring
operations (even if the operand is supposedly "just 1 character long").

This would allow us to get rid of the hackish auto-decoding of narrow
strings, and thus eliminate the needless overhead of always decoding.
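
To make that concrete, here is a minimal sketch of searching for a
"character" as a plain substring over raw UTF-8 code units, with no decoding
at all. The helper name indexOfUnits is hypothetical (not a Phobos function),
and the code point sequence used for "ddhrya" is my assumption about how that
conjunct is spelled (da, virama, dha, virama, ra, virama, ya).

```d
import std.algorithm.searching : find;
import std.string : representation;

/// Hypothetical helper: byte offset of the first occurrence of `needle`
/// in `haystack`, or -1 if absent. Compares raw UTF-8 code units only.
ptrdiff_t indexOfUnits(string haystack, string needle)
{
    // representation() views the string as immutable(ubyte)[], so range
    // algorithms see code units and nothing gets decoded to dchar.
    auto h = haystack.representation;
    auto n = needle.representation;
    auto rest = h.find(n);

    // find() returns an empty range when there is no match.
    if (rest.length < n.length)
        return -1;
    return h.length - rest.length;
}

unittest
{
    // Assumed spelling of the "ddhrya" conjunct: 7 code points,
    // each 3 bytes long in UTF-8.
    enum cluster = "\u0926\u094D\u0927\u094D\u0930\u094D\u092F";
    assert(cluster.length == 21);

    assert(indexOfUnits("abc" ~ cluster ~ "xyz", cluster) == 3);
    assert(indexOfUnits("abc", cluster) == -1);
}
```

Because UTF-8 is self-synchronizing, a match on code units always starts on a
code point boundary as long as both strings are valid UTF-8. What this sketch
does not do is check that the match ends on a grapheme boundary (the needle
could be followed by a further combining mark), nor does it normalize; both
would need handling on top of the substring search.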

T

-- 
All men are mortal. Socrates is mortal. Therefore all men are Socrates.