Major performance problem with std.array.front()
Sarath Kodali
sarath at dummy.com
Fri Mar 7 15:13:50 PST 2014
On Friday, 7 March 2014 at 22:35:47 UTC, Sarath Kodali wrote:
>
> +1
> In Indian languages, a character consists of one or more
> UNICODE code points. For example, in Sanskrit "ddhrya"
> http://en.wikipedia.org/wiki/File:JanaSanskritSans_ddhrya.svg
> consists of 7 UNICODE code points. So to search for this char I
> have to use string search.
>
> - Sarath
Oops, incomplete reply ...
Since a single letter (one user-perceived character) in Indian
languages can consist of multiple code points, iterating over
individual code points is like iterating over char[] code units
for non-English European languages: neither gives you whole
characters. So decoding buys nothing here and only costs
performance. A raw char[] comparison is much faster.
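
To make that concrete, here is a minimal sketch, in Python rather than D just to keep it short. It spells out the "ddhrya" conjunct from the quoted message as 7 code points and then searches for it on the raw UTF-8 bytes, with no decoding step. The helper name find_raw is made up for this illustration, and the sketch assumes both strings are valid UTF-8 in the same normalization form.

```python
# The Devanagari conjunct "ddhrya" written out code point by code point:
# da, virama, dha, virama, ra, virama, ya.
DDHRYA = "\u0926\u094d\u0927\u094d\u0930\u094d\u092f"


def find_raw(haystack: str, needle: str) -> int:
    """Hypothetical helper: byte offset of needle in haystack, or -1.

    The search runs on UTF-8 code units only. Because UTF-8 is
    self-synchronizing, a valid needle cannot match in the middle of
    another character's byte sequence.
    """
    return haystack.encode("utf-8").find(needle.encode("utf-8"))


text = "abc " + DDHRYA + " xyz"

code_points = len(DDHRYA)                  # one visible character, 7 code points
code_units = len(DDHRYA.encode("utf-8"))   # 3 bytes each in UTF-8, 21 in total
offset = find_raw(text, DDHRYA)            # found after the 4 ASCII bytes "abc "
```

Stepping through text one code point at a time would not have found the character any sooner: the match is a 7-code-point substring either way, so the search is a substring search whether or not it decodes first.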
And then there is Unicode normalization: the same text can be
encoded as different code-point sequences, which makes string
searches and comparisons very difficult.
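
A small sketch of the normalization problem, again in Python and using a Latin example because it is the easiest one to verify. The same issue applies to any script that has both precomposed and combining forms. The helper name nfc_equal is hypothetical. unicodedata.normalize is from the standard library.

```python
import unicodedata

# The same word encoded two ways: precomposed U+00E9 versus
# "e" followed by U+0301 (combining acute accent).
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# A raw comparison sees two different code-point sequences.
raw_equal = precomposed == decomposed


def nfc_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare after normalizing both sides to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


normalized_equal = nfc_equal(precomposed, decomposed)
```

Here raw_equal is False and normalized_equal is True, so a raw char[] comparison is only reliable when both sides are already known to be in the same normalization form.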
- Sarath