Major performance problem with std.array.front()

Sat Mar 8 12:51:06 PST 2014

On Sat, Mar 08, 2014 at 08:38:40PM +0000, Vladimir Panteleev wrote:
> On Saturday, 8 March 2014 at 20:05:36 UTC, Andrei Alexandrescu
> wrote:
> >Searching for characters in strings would be difficult to deem
> >inappropriate.
> 
> The notion of "character" exists only in certain writing systems. It
> is thus a flawed practice, and I think it should not be encouraged,
> as it will only make writing truly-international software more
> difficult. A more correct approach is searching for a certain
> substring. If non-exact matching is needed (normalization, case
> insensitivity etc.), then the appropriate solution is to use the
> Unicode algorithms.

+1. Most "character"-based Unicode string operations are actually
*substring* operations, because the notion of "character" is not
universal to every writing system, and doesn't map 1-to-1 to Unicode
code points anyway. I would argue that most instances of code that
perform character-based operations on strings are incorrect, in the
sense that they will fail to correctly process strings in certain
languages.
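
To make the point concrete, here is a minimal sketch, written in Python
only so that it runs anywhere; it is not D library code, and the helper
names are made up for illustration. It assumes nothing beyond Python's
standard unicodedata module. The same visible word can be spelled with
different code point sequences, so a search for a single "character"
misses one spelling, while a substring search over normalized text
finds both.

```python
import unicodedata

# Two spellings of the same visible word.
composed = "no\u00ebl"     # U+00EB LATIN SMALL LETTER E WITH DIAERESIS
decomposed = "noe\u0308l"  # 'e' followed by U+0308 COMBINING DIAERESIS


def contains_code_point(text, code_point):
    """Naive "character" search: looks for one exact code point."""
    return code_point in text


def contains_normalized(text, needle, form="NFC"):
    """Substring search after bringing both sides to the same normal form."""
    return unicodedata.normalize(form, needle) in unicodedata.normalize(form, text)


if __name__ == "__main__":
    for word in (composed, decomposed):
        print(len(word),
              contains_code_point(word, "\u00eb"),
              contains_normalized(word, "\u00eb"))
```

Normalization alone does not solve everything (a substring match can
still begin or end in the middle of a combining sequence), but it shows
why the unit of search is a substring rather than a single code point.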

[...]
> >From experience with C++ I knew (1) had a bad track record, and
> >(2) "generically conservative, specialize for speed" was a
> >successful pattern.
> >
> >What would you have chosen given that context?
> 
> Ideally, we would have the Unicode algorithms in the standard
> library from day 1, and advocated their use throughout the
> documentation.

+1. I came to D expecting this to be the case... and was a little let
down when I discovered the actual state of affairs in std.uni at the
time.  Thankfully, things have improved since, and all those who worked
on that have my gratitude. But it's still not quite there yet.

[...]
> >>So the problem to me is that we're stuck not fixing something that's
> >>horribly broken just because it's broken in a way that people
> >>presumably now expect.
> >
> >Clearly I'm being subjective here but again I'd find it difficult to
> >get convinced we have something horribly broken from the evidence I
> >gathered inside and outside Facebook.
> 
> Have you or anyone you personally know tried to process text in D
> containing a writing system such as Sanskrit's?
[...]

Or more to the point, do you know of any experience that you can share
about code that attempts to process these sorts of strings on a
per-character basis? My suspicion is that any code that operates on such
strings, if it has any claim to correctness at all, must be
substring-based, rather than character-based.
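
As a rough illustration of what I mean (again a Python sketch, not D
code; the describe helper and the sample word are my own choices for
the example): a single Devanagari conjunct syllable is written with
several code points, so indexing or reversing per code point tears it
apart, while treating it as a substring keeps it whole.

```python
import unicodedata

# One written syllable: KA + VIRAMA + SSA + VOWEL SIGN I.
syllable = "\u0915\u094d\u0937\u093f"

# A longer sample string that begins with that syllable.
word = "\u0915\u094d\u0937\u093f\u092a\u094d\u0930"


def describe(text):
    """List each code point with its Unicode general category."""
    return [(f"U+{ord(c):04X}", unicodedata.category(c)) for c in text]


if __name__ == "__main__":
    print(len(syllable))          # number of code points in the syllable
    print(describe(syllable))     # letters mixed with combining marks
    print(syllable[0])            # only the bare consonant KA
    print(word.count(syllable))   # substring search keeps the unit intact
```

The first "character" here is just the bare consonant, which is not
what a reader of the script would call the first unit of the text.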

T

-- 
I think Debian's doing something wrong, `apt-get install pesticide',
doesn't seem to remove the bugs on my system! -- Mike Dresser