Major performance problem with std.array.front()

Sean Kelly sean at invisibleduck.org
Sun Mar 9 08:30:11 PDT 2014


On Sunday, 9 March 2014 at 08:32:09 UTC, monarch_dodra wrote:
> On Saturday, 8 March 2014 at 20:05:36 UTC, Andrei Alexandrescu 
> wrote:
>>> The current approach is a cut above treating strings as 
>>> arrays of bytes
>>> for some languages, and still utterly broken for others. If 
>>> I'm
>>> operating on a right to left language like Hebrew, what would 
>>> I expect
>>> the result to be from something like countUntil?
>>
>> The entire string processing paraphernalia is left to right. I 
>> figure RTL languages are under-supported, but 
>> s.retro.countUntil comes to mind.
>>
>> Andrei
>
> I'm pretty sure that all string operations are actually "front 
> to back". If I recall correctly, even languages that "read" 
> right to left are stored in a front to back manner: e.g. 
> string[0] would be the right-most character. It is only a 
> question of "display", and changes nothing in the code. As for 
> "countUntil", it would still work perfectly fine, as an RTL 
> reader would expect the counting to start at the "beginning", 
> i.e. the "right" side.
>
> I'm pretty confident RTL is 100% supported. The only issue is 
> the "front"/"left" ambiguity, and the only one I know of is the 
> oddly named "stripLeft" function, which actually does a 
> "stripFront" anyways.
>
> So I wouldn't worry about RTL.

Yeah, I think RTL strings are preceded by a code point that 
indicates RTL display. It was just something I mentioned because 
some operations might be confusing to the programmer.
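
As a rough illustration of the quoted point that counting follows 
storage order rather than display order, here is a minimal sketch. 
The Hebrew word (written with escapes so the source itself is not 
affected by bidi rendering) is just an assumed example and does not 
come from the thread. It relies on Phobos decoding the string to 
code points.

```d
import std.algorithm : countUntil;
import std.range : retro;

void main()
{
    // "shalom" in storage order: shin, lamed, vav, final mem.
    // A bidi-aware renderer shows the first code point on the right.
    string s = "\u05E9\u05DC\u05D5\u05DD";
    dchar lamed = '\u05DC';

    // Counting starts at the first stored code point.
    assert(s.countUntil(lamed) == 1);

    // s.retro walks the storage back to front instead.
    assert(s.retro.countUntil(lamed) == 2);
}
```

Neither call knows anything about display direction; both only see 
the order in which the code points are stored.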


> But as mentioned, it is Indian languages, which have 
> complex graphemes, or languages with accented characters, 
> e.g. most European ones, that can have problems, such as 
> canFind("cassé", 'e').

True. I still question why anyone would want to do 
character-based operations on Unicode strings. I guess substring 
searches could hit the same problem in some cases if they aren't 
implemented specifically for Unicode, but those cases should be 
far less common.
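
To make the canFind example concrete, here is a small sketch of how 
the result can depend on whether the accented character is stored 
precomposed (U+00E9) or as 'e' followed by a combining accent 
(U+0301). The variable names are made up for illustration, and which 
form a given "cassé" literal actually contains depends on how the 
text was entered, so treat this as a sketch rather than a statement 
about any particular input.

```d
import std.algorithm : canFind;
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

void main()
{
    string composed   = "cass\u00E9";   // precomposed e-acute
    string decomposed = "casse\u0301";  // 'e' + combining acute

    // Code-point search: the answer depends on the encoding form.
    assert(!composed.canFind('e'));
    assert(decomposed.canFind('e'));

    // A plain substring search shows the same split.
    assert(!composed.canFind("casse"));
    assert(decomposed.canFind("casse"));

    // Code-point counts differ, grapheme counts agree.
    assert(composed.walkLength == 5);
    assert(decomposed.walkLength == 6);
    assert(decomposed.byGrapheme.walkLength == 5);

    // Normalizing to NFC first makes both forms behave alike.
    assert(normalize!NFC(decomposed) == composed);
    assert(!normalize!NFC(decomposed).canFind('e'));
}
```

Normalizing both sides before searching, or iterating with 
byGrapheme, are two possible ways around this. Either one adds cost 
compared with a plain code-point scan.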

