Major performance problem with std.array.front()
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Sun Mar 9 10:16:00 PDT 2014
On 3/9/14, 4:34 AM, Peter Alexander wrote:
> I think this is the main confusion: the belief that iterating by code
> point has utility.
>
> If you care about normalization then neither by code unit, by code
> point, nor by grapheme are correct (except in certain language subsets).
I suspect that code unit iteration is the worst, as it works only with
ASCII and perchance with ASCII single-byte extensions. Then we have code
point iteration, which works with a larger spectrum of languages. One
question would be how large that spectrum is. If it's larger than
English, then that would be nice because we would've made progress.
I don't know about normalization beyond discussions in this group, but
as far as I understand from
http://www.unicode.org/faq/normalization.html, normalization would be a
one-step process, after which code point iteration would cover still
more human languages. No? I'm pretty sure it's more complicated than
that, so please illuminate me :o).
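To make the three granularities concrete, here is a minimal sketch in D. It assumes a Phobos recent enough to provide std.utf.byCodeUnit and std.uni.byGrapheme, and the helper names (codeUnitCount, codePointCount, graphemeCount) are made up for illustration. It counts the same visible character, "é", in its precomposed and decomposed spellings, then checks that NFC normalization maps one onto the other.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme, normalize, NFC;
import std.utf : byCodeUnit;

// Hypothetical helpers, named here only for illustration.

// Counts UTF-8 code units (bytes of the encoded string).
size_t codeUnitCount(string s)
{
    return s.byCodeUnit.walkLength;
}

// Counts code points: iterating a string as a range decodes to dchar.
size_t codePointCount(string s)
{
    return s.walkLength;
}

// Counts grapheme clusters (user-perceived characters).
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    string composed   = "\u00E9";  // é as one precomposed code point
    string decomposed = "e\u0301"; // 'e' followed by U+0301 COMBINING ACUTE ACCENT

    writeln(codeUnitCount(composed), " ", codePointCount(composed), " ",
            graphemeCount(composed));
    writeln(codeUnitCount(decomposed), " ", codePointCount(decomposed), " ",
            graphemeCount(decomposed));

    // NFC composes the two-code-point spelling into the single code point.
    writeln(normalize!NFC(decomposed) == composed);
}
```

In this one case, code point counts agree across the two spellings only after normalizing, while the grapheme count agrees either way. The sketch does not show how far that carries across other scripts.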
> If you don't care about normalization then by code unit is just as good
> as by code point, but you don't need to specialise everywhere in Phobos.
>
> AFAIK, there is only one exception, stuff like s.all!(c => c == 'é'),
> but as Vladimir correctly points out: (a) by code point, this is still
> broken in the face of normalization, and (b) are there any real
> applications that search a string for a specific non-ASCII character?
What happened to counting characters and such?
> To those that think the status quo is better, can you give an example of
> a real-life use case that demonstrates this?
split(ter) comes to mind.
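As a small illustration of the splitter case (a sketch only, with a hypothetical helper name, splitOnCodePoint), splitting a string on a single non-ASCII separator relies on the range yielding decoded code points, so the separator can be compared as one dchar:

```d
import std.algorithm : splitter;
import std.array : array;
import std.stdio : writeln;

// Hypothetical helper: splits `s` on one code point, relying on the
// string being iterated by decoded code point.
string[] splitOnCodePoint(string s, dchar sep)
{
    return s.splitter(sep).array;
}

void main()
{
    // U+00B7 MIDDLE DOT occupies two UTF-8 code units.
    writeln(splitOnCodePoint("un\u00B7deux\u00B7trois", '\u00B7'));
}
```

Whether this counts as a compelling real-life use case is exactly the question being debated; splitting on a string separator rather than a single character would not need decoding.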
> I do think it's probably too late to change this, but I think there is
> value in at least getting everyone on the same page.
Awesome.
Andrei