Major performance problem with std.array.front()
Andrea Fontana
nospam at example.com
Mon Mar 10 07:05:03 PDT 2014
In Italian we need Unicode too. We have several accented letters,
and programming languages often don't handle UTF-8 and other
encodings so well...
In D I never had any problem with this, and I work a lot on text
processing.
So my question: is there any problem I'm missing in D with
Unicode support, or is it just a performance problem in algorithms?
If the problem is the performance of algorithms that use .front() but
don't need to understand its data, why don't we add a .rawFront()
property, implemented only where it makes sense, plus a "fallback"
like:
    auto rawFront(R)(R range)
        if ( ... isrange ... && !__traits(compiles, range.rawFront))
    {
        return range.front;
    }
This way copy() and other algorithms can use rawFront(),
and it stays backward compatible with other ranges too.
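
To make the idea concrete, here is a minimal sketch, not a worked-out
proposal. rawFront is a hypothetical name that does not exist in Phobos.
I'm assuming isInputRange is the check meant by "isrange", and that for
narrow strings "raw" means the first code unit. The sketch drops the
explicit __traits(compiles, ...) test and relies instead on the fact that
a member always wins over a free function called through UFCS, so a range
that defines its own rawFront would still be picked first.

```d
import std.range;                    // isInputRange, front for arrays
import std.traits : isNarrowString;

/// Hypothetical generic fallback: no raw access is known for this
/// range type, so just forward to front.
auto rawFront(R)(R range)
if (isInputRange!R && !isNarrowString!R)
{
    return range.front;
}

/// Hypothetical overload for narrow strings (char[] and wchar[] based):
/// return the first code unit without decoding.
auto rawFront(S)(S str)
if (isNarrowString!S)
{
    return str[0];
}

void main()
{
    string s = "\u00E8";             // "è", stored as two UTF-8 code units
    assert(s.front == '\u00E8');     // front decodes a whole code point
    assert(s.rawFront == 0xC3);      // rawFront returns the first code unit
    assert([1, 2, 3].rawFront == 1); // other ranges use the fallback
}
```

An algorithm like copy() that doesn't look at the data could then call
rawFront instead of front and skip the decoding step on strings.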
But I guess I'm missing the point :)
On Monday, 10 March 2014 at 13:48:44 UTC, Abdulhaq wrote:
> On Monday, 10 March 2014 at 10:52:02 UTC, Andrea Fontana wrote:
>> I'm not sure I understood the point of this (long) thread.
>> Is the main problem that decode() is called even when it is not needed?
>>
>
> I'd like to offer up one D 'user' perspective. It's just a
> single data point, but perhaps useful. I write applications that
> process Arabic, and I'm thinking about converting one of those
> apps from Python to D, for performance reasons.
>
> My app deals with Unicode Arabic text that is 'out there', and
> the Unicode™ support for Arabic is not that well thought out,
> so the data is often (always) inconsistent in terms of
> sequencing diacritics etc. Even the code page can vary.
> Therefore my code has to cater to the various ways that other
> developers have sequenced the code points.
>
> So, my needs as a 'user' are:
> * I want to encode all incoming data immediately into Unicode,
> usually UTF-8, if it isn't already.
> * I want to iterate over code points. I don't care about the
> raw data.
> * When I get the length of my string it should be the number of
> code points.
> * When I index my string it should return the nth code point.
> * When I manipulate my strings I want to work with code points
> ... you get the drift.
>
> If I want to access the raw data, which I don't, then I'm very
> happy to cast to ubyte etc.
>
> If encode/decode is a performance issue then perhaps there
> could be a cache for recently used strings where the code point
> representation is held.
>
> BTW to answer a question in the thread, yes the data is
> left-to-right and visualised right-to-left.
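
For reference, a small sketch of how the code point operations in the
list above can be written in D today. It is only an illustration with a
made-up sample string, assuming the current behaviour where range
functions decode strings: .length on a string counts code units, while
walkLength counts code points, and converting to dstring gives direct
indexing by code point.

```d
import std.conv : to;
import std.range;                        // walkLength, drop, front

void main()
{
    string s = "per\u00F2";              // "però": 4 code points, 5 UTF-8 code units

    assert(s.length == 5);               // .length counts code units
    assert(s.walkLength == 4);           // walkLength counts code points
    assert(s.drop(3).front == '\u00F2'); // code point at index 3, found by walking

    dstring d = s.to!dstring;            // one array element per code point
    assert(d.length == 4);
    assert(d[3] == '\u00F2');            // direct indexing by code point
}
```

Walking to the nth code point of a string is linear in n, while indexing
the dstring is constant time but uses four bytes per code point.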