Major performance problem with std.array.front()

Sarath Kodali sarath at dummy.com
Fri Mar 7 15:13:50 PST 2014


On Friday, 7 March 2014 at 22:35:47 UTC, Sarath Kodali wrote:
>
> +1
> In Indian languages, a character consists of one or more 
> UNICODE code points. For example, in Sanskrit "ddhrya" 
> http://en.wikipedia.org/wiki/File:JanaSanskritSans_ddhrya.svg 
> consists of 7 UNICODE code points. So to search for this char I 
> have to use string search.
>
> - Sarath

Oops, incomplete reply ...

Since a single "letter" in Indian languages can consist of 
multiple code points, iterating over individual code points is 
like iterating over char[] for non-English European languages: 
either way, one step does not give you one whole character. So 
decoding gains nothing here and only costs performance. A raw 
char[] comparison is much faster.
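
To make that concrete, here is a rough sketch in Python (not D, just for illustration). It assumes "ddhrya" is spelled as the seven code points da, virama, dha, virama, ra, virama, ya, and the sample text around it is made up. It counts code points and UTF-8 code units for that one character, then searches for it both on decoded text and on the raw bytes.

```python
# "ddhrya" written out code point by code point (assumed spelling):
# da, virama, dha, virama, ra, virama, ya
ddhrya = "\u0926\u094D\u0927\u094D\u0930\u094D\u092F"

# One visible character, but many units at either level.
code_points = len(ddhrya)                  # decoded code points
code_units = len(ddhrya.encode("utf-8"))   # raw UTF-8 bytes

# Hypothetical sample text with the character in the middle.
text = "abc " + ddhrya + " xyz"

# Search on decoded code points.
str_index = text.find(ddhrya)

# Search on the raw UTF-8 bytes, with no decoding step at all.
byte_index = text.encode("utf-8").find(ddhrya.encode("utf-8"))
```

Both searches find the same match. Because UTF-8 is self-synchronizing, a byte-level match of a valid UTF-8 needle in a valid UTF-8 haystack always starts on a code point boundary, so the raw search is not less correct than the decoded one.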

And then there is Unicode normalization, which makes string 
searches and comparisons very difficult: the same visible text 
can be encoded as different sequences of code points.
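
A small sketch of the normalization problem, again in Python and only as an illustration. It uses "é" rather than an Indic character because its two canonical spellings are easy to show; the variable names are made up.

```python
import unicodedata

composed = "\u00E9"      # "é" as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# The two strings look the same but compare unequal as raw sequences.
raw_equal = composed == decomposed

# Normalizing both sides to the same form (NFC here) makes them equal.
nfc_equal = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))

# Neither spelling is found inside the other without normalizing first.
found_without_normalizing = composed in ("caf" + decomposed)
```

So a raw comparison or search is only reliable if both strings are already known to be in the same normalization form.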

- Sarath

