D's confusing strings (was Re: D on hackernews)

Wed Sep 21 13:26:22 PDT 2011

Andrei Alexandrescu, in message (digitalmars.D:144936), wrote:
> On 9/21/11 1:20 PM, Christophe Travert wrote:
>> Dealing with UTF-encoded strings is less efficient, but there are a number
>> of algorithms that can be optimized for UTF-encoded strings, like copying
>> or finding an ASCII char in a string. Unfortunately, there is no
>> practical way to do this with the current range API.
> 
> I'd love to hear more about that. The standard library does optimize 
> certain algorithms for UTF strings.

Well, in that other thread called "Re: toUTFz and WinAPI
GetTextExtentPoint32W/" in D.learn (what is the proper way to refer to
a message here?), I showed how to improve walkLength and utf.stride for
strings.
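The code from that thread is not reproduced here. As a rough illustration of the general idea only, here is a sketch in C (used just to keep the byte-level logic explicit; the helper names are hypothetical, not Phobos functions), assuming valid UTF-8 input: code points can be counted without decoding by counting the bytes that are not continuation bytes, and the length of a sequence can be read off its lead byte.

```c
#include <stddef.h>

/* Hypothetical helper (not a Phobos function): number of code units in
   the UTF-8 sequence whose lead byte is b, in the spirit of std.utf.stride.
   It only inspects the bit pattern; it does not reject overlong leads such
   as 0xC0 or out-of-range leads such as 0xF5. Returns 0 for a continuation
   byte (10xxxxxx) or a byte of the form 11111xxx. */
static size_t utf8_stride(unsigned char b)
{
    if (b < 0x80)           return 1; /* 0xxxxxxx: ASCII        */
    if ((b & 0xE0) == 0xC0) return 2; /* 110xxxxx               */
    if ((b & 0xF0) == 0xE0) return 3; /* 1110xxxx               */
    if ((b & 0xF8) == 0xF0) return 4; /* 11110xxx               */
    return 0;                         /* 10xxxxxx or 11111xxx   */
}

/* Hypothetical helper: code point count of a UTF-8 buffer of n bytes,
   computed without decoding. Every code point has exactly one byte that
   is not a continuation byte, so counting those bytes gives the length.
   Assumes valid UTF-8. The loop advances by a fixed step of one byte,
   which makes it easy for a compiler to unroll. */
static size_t utf8_walk_length(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        count += (s[i] & 0xC0) != 0x80; /* adds 1 unless byte is 10xxxxxx */
    return count;
}
```

Compared with a loop that calls popFront (which decodes each code point and then advances by a variable amount), the counting loop touches each byte exactly once with a constant step.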

As for finding a character in a string: rather than relying on
string.popFront, which prevents the loop from being unrolled, we could
search directly, code unit by code unit. This is obviously better for
an ASCII character, and I'll be looking for a good approach for other
code points (besides using find(Range, Range)).
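A sketch of the code-unit-by-code-unit search, again in C with hypothetical helper names and assuming valid UTF-8. An ASCII byte (below 0x80) never appears inside a multibyte UTF-8 sequence, so a plain byte scan finds the right position without decoding. For other code points, the sketch uses the fallback mentioned above, in the manner of find(Range, Range): encode the code point once and search for its byte sequence. Because that sequence starts with a byte that cannot be a continuation byte, a match in valid UTF-8 can only begin on a code point boundary.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte offset of the first ASCII character c in a
   UTF-8 buffer, or -1. No decoding is needed because bytes below 0x80
   never occur inside a multibyte sequence. memchr is used since many C
   libraries implement it with word-at-a-time or SIMD scanning. */
static ptrdiff_t utf8_find_ascii(const unsigned char *s, size_t n, unsigned char c)
{
    if (c >= 0x80) return -1; /* only valid for ASCII */
    const void *p = memchr(s, c, n);
    return p ? (const unsigned char *)p - s : -1;
}

/* Hypothetical helper: encode code point cp as UTF-8 into out; returns the
   number of bytes written, or 0 for surrogates and values above 0x10FFFF. */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: byte offset of code point cp, or -1. Encodes cp
   once, jumps to each candidate lead byte with memchr, then compares the
   remaining bytes. This is a naive substring search, not a tuned one. */
static ptrdiff_t utf8_find_codepoint(const unsigned char *s, size_t n, unsigned long cp)
{
    unsigned char enc[4];
    size_t len = utf8_encode(cp, enc);
    if (len == 0) return -1;
    size_t i = 0;
    while (i + len <= n) {
        /* only positions i .. n - len can start a full match */
        const unsigned char *p = memchr(s + i, enc[0], n - i - len + 1);
        if (!p) return -1;
        i = (size_t)(p - s);
        if (memcmp(p, enc, len) == 0) return (ptrdiff_t)i;
        ++i;
    }
    return -1;
}
```

The ASCII path is the case the post calls obviously better: the loop inside memchr never decodes anything. The non-ASCII path only shows that decoding can still be avoided; it is not the "nicer idea" the post is still looking for.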

I didn't review Phobos with that idea in mind, and didn't do any
benchmark except the one for walkLength, but using string.popFront is a
bad idea in terms of performance, so workarounds are often better, and
they are not that hard to find. I may do that when I have more time to
give to D.

-- 
Christophe