Major performance problem with std.array.front()

Abdulhaq alynch4047 at gmail.com
Mon Mar 10 14:03:19 PDT 2014


On Monday, 10 March 2014 at 18:54:26 UTC, Marc Schütz wrote:
> On Monday, 10 March 2014 at 13:48:44 UTC, Abdulhaq wrote:
>> My app deals with unicode arabic text that is 'out there', and 
>> the Unicode(TM) support for Arabic is not that well thought out, 
>> so the data is often (always) inconsistent in terms of 
>> sequencing diacritics etc. Even the code page can vary. 
>> Therefore my code has to cater to various ways that other 
>> developers have sequenced the code points.
>>
>> So, my needs as a 'user' are:
>> * I want to encode all incoming data immediately into unicode, 
>> usually UTF8, if it isn't already.
>> * I want to iterate over code points. I don't care about the 
>> raw data.
>> * When I get the length of my string it should be the number 
>> of code points.
>> * When I index my string it should return the nth code point.
>> * When I manipulate my strings I want to work with code points
>> ... you get the drift.
>
> Are you sure that code points is what you want? AFAIK there are 
> lots of diacritics in Arabic, and I believe they are not 
> precomposed with their carrying letters...

I checked the terminology before posting, so I'm pretty sure. 
Unicode has a block for the logical Arabic characters 
(U+0600-U+06FF), with one code point for each letter of the 
alphabet and others for the various diacritics.

Because of the 'shaping', each logical character has several 
positional glyphs (isolated, initial, medial, final), and those 
are found in other blocks, the Arabic Presentation Forms.
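
A small sketch of the shaping point, in Python (the thread has no 
code of its own, so the language and the helper name 
`fold_presentation_forms` are assumptions for illustration only). 
The positional glyphs carry compatibility decompositions, so NFKC 
normalization maps them back to the logical letter. Note that NFKC 
also folds other compatibility characters, which may or may not be 
wanted in a given application:

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the logical character

# The four positional glyphs of BEH in Arabic Presentation Forms-B.
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]


def fold_presentation_forms(text: str) -> str:
    """Hypothetical helper: map compatibility characters, including
    Arabic presentation forms, to their logical letters via NFKC."""
    return unicodedata.normalize("NFKC", text)


for glyph in BEH_FORMS:
    folded = fold_presentation_forms(glyph)
    # Print the Unicode name of each glyph and of what it folds to.
    print(unicodedata.name(glyph), "->", unicodedata.name(folded))
```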

Text editing programs tend to store typed Arabic as the user 
entered it, and because a letter can carry more than one 
diacritic, the order of the diacritics in the stored text depends 
on the order in which the user typed them.
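
To make the ordering problem concrete, here is a minimal Python 
sketch (again an illustration, not code from my app; the names 
`typed_a`, `typed_b` and `canonical` are made up). It builds the 
same letter with shadda and fatha typed in the two possible orders, 
and shows that the raw code point sequences differ while their 
canonical normalizations agree. This only works for marks with 
different canonical combining classes, so it should not be read as 
a cure for every inconsistency found in real data:

```python
import unicodedata

BEH = "\u0628"     # ARABIC LETTER BEH
FATHA = "\u064E"   # ARABIC FATHA  (canonical combining class 30)
SHADDA = "\u0651"  # ARABIC SHADDA (canonical combining class 33)

# The same visible text, typed with the diacritics in different orders.
typed_a = BEH + SHADDA + FATHA
typed_b = BEH + FATHA + SHADDA


def canonical(text: str) -> str:
    """Normalize to NFC, which sorts adjacent combining marks by
    their canonical combining class."""
    return unicodedata.normalize("NFC", text)


print(typed_a == typed_b)                        # False
print(canonical(typed_a) == canonical(typed_b))  # True

# Python strings index and count by code point, not by UTF-8 byte.
print(len(typed_a), len(typed_a.encode("utf-8")))  # 3 6
```

The last line is the distinction I care about as a user: three code 
points, six bytes of UTF-8.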

