Major performance problem with std.array.front()

Abdulhaq alynch4047 at gmail.com
Mon Mar 10 06:48:43 PDT 2014


On Monday, 10 March 2014 at 10:52:02 UTC, Andrea Fontana wrote:
> I'm not sure I understood the point of this (long) thread.
> The main problem is that decode() is called also if not needed?
>

I'd like to offer one D 'user' perspective. It's just a single 
data point, but perhaps a useful one. I write applications that 
process Arabic, and I'm thinking about converting one of those 
apps from Python to D for performance reasons.

My app deals with Unicode Arabic text that is 'out there', and 
the Unicode(TM) support for Arabic is not that well thought out, so 
the data is often (always) inconsistent in how diacritics etc. are 
sequenced. Even the code page can vary. Therefore my code 
has to cater to the various ways that other developers have 
sequenced the code points.
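
One way to cope with differently ordered diacritics, sketched here as an assumption rather than as what the app actually does, is to normalize both sides before comparing. The helper name sameText is hypothetical; normalize and NFC come from Phobos' std.uni. Normalization only covers canonical equivalence, so it will not smooth over every inconsistency found in real data.

```d
import std.uni : normalize, NFC;

/// Hypothetical helper: two texts count as equal if their NFC forms match.
bool sameText(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

unittest
{
    // BEH followed by SHADDA and FATHA, typed in either order.
    string shaddaFirst = "\u0628\u0651\u064E";
    string fathaFirst  = "\u0628\u064E\u0651";

    assert(shaddaFirst != fathaFirst);         // different code point sequences
    assert(sameText(shaddaFirst, fathaFirst)); // equal after canonical reordering
}
```

The unittest block runs when the file is compiled with -unittest.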

So, my needs as a 'user' are:
* I want to encode all incoming data immediately into Unicode, 
usually UTF-8, if it isn't already.
* I want to iterate over code points. I don't care about the raw 
data.
* When I get the length of my string it should be the number of 
code points.
* When I index my string it should return the nth code point.
* When I manipulate my strings I want to work with code points.
... you get the drift.
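
For illustration only: in D, a dstring (UTF-32) already behaves roughly as this list asks, while a plain string counts and indexes UTF-8 code units. The sketch below assumes the text is converted once with std.conv.to; toCodePoints is a hypothetical helper name, and the conversion costs four bytes per code point.

```d
import std.conv : to;
import std.range : walkLength;

/// Hypothetical helper: UTF-8 in, UTF-32 out (one element per code point).
dstring toCodePoints(string utf8)
{
    return utf8.to!dstring;
}

unittest
{
    // SEEN, LAM, ALEF, MEEM: four Arabic letters, two UTF-8 bytes each.
    string s = "\u0633\u0644\u0627\u0645";
    assert(s.length == 8);      // string.length counts UTF-8 code units
    assert(s.walkLength == 4);  // walking the string as a range decodes it

    dstring d = toCodePoints(s);
    assert(d.length == 4);      // dstring.length counts code points
    assert(d[1] == '\u0644');   // d[n] is the nth code point (LAM)

    size_t n;
    foreach (dchar c; s) ++n;   // foreach over dchar decodes on the fly
    assert(n == 4);
}
```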

If I want to access the raw data, which I don't, then I'm very 
happy to cast to ubyte etc.
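
A minimal sketch of what that raw access could look like in D, assuming std.string.representation is acceptable in place of a hand-written cast; rawBytes is a hypothetical wrapper name. Neither form copies or decodes anything.

```d
import std.string : representation;

/// Hypothetical wrapper: view a string's UTF-8 bytes without copying.
immutable(ubyte)[] rawBytes(string s)
{
    return s.representation;
}

unittest
{
    string s = "\u0628";                      // BEH, a single code point
    auto raw = rawBytes(s);
    assert(raw.length == 2);
    assert(raw[0] == 0xD8 && raw[1] == 0xA8); // its UTF-8 encoding

    // An explicit cast gives the same view of the same memory.
    auto viaCast = cast(immutable(ubyte)[]) s;
    assert(viaCast is raw);
}
```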

If encode/decode is a performance issue then perhaps there could 
be a cache for recently used strings where the code point 
representation is held.
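
The cache idea could be sketched roughly as follows. This is an assumption about how such a cache might look, not an existing Phobos facility: CodePointCache is a hypothetical type, and this version never evicts entries, so a real 'recently used' cache would still need a size limit.

```d
import std.conv : to;

/// Hypothetical cache: remembers the UTF-32 form of strings seen before.
struct CodePointCache
{
    private dstring[string] decoded;

    /// Returns the code point array for s, decoding only on a miss.
    dstring get(string s)
    {
        if (auto hit = s in decoded)
            return *hit;
        auto d = s.to!dstring;
        decoded[s] = d;
        return d;
    }
}

unittest
{
    CodePointCache cache;
    string s = "\u0643\u062A\u0627\u0628"; // KAF, TEH, ALEF, BEH
    auto first = cache.get(s);
    auto second = cache.get(s);
    assert(first.length == 4);
    assert(first is second); // the second call reused the cached array
}
```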

BTW to answer a question in the thread, yes the data is 
left-to-right and visualised right-to-left.




