Major performance problem with std.array.front()

monarch_dodra monarchdodra at gmail.com
Sun Mar 9 00:32:08 PST 2014


On Saturday, 8 March 2014 at 20:05:36 UTC, Andrei Alexandrescu 
wrote:
>> The current approach is a cut above treating strings as arrays
>> of bytes for some languages, and still utterly broken for
>> others. If I'm operating on a right to left language like
>> Hebrew, what would I expect the result to be from something
>> like countUntil?
>
> The entire string processing paraphernalia is left to right. I 
> figure RTL languages are under-supported, but 
> s.retro.countUntil comes to mind.
>
> Andrei

I'm pretty sure that all string operations are actually "front to 
back". If I recall correctly, even languages that "read" right to 
left are stored in a front to back manner, e.g. string[0] would 
be the right-most character. It is only a question of "display", 
and changes nothing in the code. As for "countUntil", it would 
still work perfectly fine, as an RTL reader would expect the 
counting to start at the "beginning", i.e. the "right" side.
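
To illustrate the "front to back" point, here is a minimal sketch, 
assuming a reasonably recent Phobos. The Hebrew word is written with 
escapes so its logical (storage) order is explicit; countFromFront 
and countFromBack are hypothetical helper names, not library 
functions.

```d
import std.algorithm.searching : countUntil;
import std.range : front, retro;

// Hypothetical helper: code points skipped before `needle`,
// counting from the front (the start of the text in reading order).
ptrdiff_t countFromFront(string s, dchar needle)
{
    return s.countUntil(needle);
}

// Hypothetical helper: the same count, starting from the back.
ptrdiff_t countFromBack(string s, dchar needle)
{
    return s.retro.countUntil(needle);
}

void main()
{
    // "shalom" in logical order: shin, lamed, vav, final mem.
    string shalom = "\u05E9\u05DC\u05D5\u05DD";

    // The front is the first letter a reader encounters (shin),
    // even though it is displayed at the right-hand edge.
    assert(shalom.front == '\u05E9');

    // Two letters (shin, lamed) come before vav in reading order.
    assert(countFromFront(shalom, '\u05D5') == 2);

    // From the back, only final mem comes before vav.
    assert(countFromBack(shalom, '\u05D5') == 1);
}
```

Nothing here depends on which side of the screen the text is drawn 
on; only the storage order matters.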

I'm pretty confident RTL is 100% supported. The only issue is the 
"front"/"left" ambiguity, and the only instance I know of is the 
oddly named "stripLeft" function, which actually does a 
"stripFront" anyway.
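
A small sketch of that naming quirk, assuming std.string's stripLeft 
and stripRight behave as documented (they strip whitespace from the 
front and the back of the string respectively):

```d
import std.string : stripLeft, stripRight;

void main()
{
    // Two spaces, then the Hebrew word shin-lamed-vav-final mem,
    // then two more spaces, all in storage order.
    string padded = "  \u05E9\u05DC\u05D5\u05DD  ";

    // stripLeft removes whitespace at the front (index 0 onwards),
    // which an RTL renderer would draw on the right.
    assert(padded.stripLeft == "\u05E9\u05DC\u05D5\u05DD  ");

    // stripRight removes whitespace at the back.
    assert(padded.stripRight == "  \u05E9\u05DC\u05D5\u05DD");
}
```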

So I wouldn't worry about RTL.

But as mentioned, it is Indian languages, which have complex 
graphemes, or languages with accented characters, e.g. most 
European ones, that can have problems, such as canFind("cassé", 
'e').
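
A minimal sketch of the canFind problem, assuming the accent may be 
stored either as a combining mark or as a single precomposed code 
point (which of the two you get depends on where the text came 
from). graphemeCount is a hypothetical helper built on std.uni's 
byGrapheme, not an existing library function.

```d
import std.algorithm.searching : canFind;
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helper: counts user-perceived characters
// (grapheme clusters) rather than code points.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // "cassé" with 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    string decomposed = "casse\u0301";
    // The same word with the precomposed code point U+00E9.
    string precomposed = "cass\u00E9";

    // Decoding to dchar sees a bare 'e' in the decomposed form,
    // so the search "finds" a letter the reader does not see.
    assert(decomposed.canFind('e'));
    assert(!precomposed.canFind('e'));

    // Three different notions of length for the decomposed form.
    assert(decomposed.length == 7);      // UTF-8 code units
    assert(decomposed.walkLength == 6);  // code points (dchar)
    assert(graphemeCount(decomposed) == 5);

    // Both spellings are five graphemes to a reader.
    assert(graphemeCount(precomposed) == 5);
}
```

The two strings display identically, yet give different answers at 
the dchar level; only the grapheme view agrees with what a reader 
sees.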

On topic, I think D's implicit default decode to dchar is 
*infinitely* better than C++'s char-based strings. While 
imperfect in terms of graphemes, it was still a design decision 
made of win.

I'd be tempted not to ask "how do we back out", but rather "how 
can we take this further"? I'd love to ditch the whole 
"char"/"dchar" thing altogether, and work with graphemes. But 
that would be a massive undertaking.

