D's confusing strings (was Re: D on hackernews)

Wed Sep 21 13:53:01 PDT 2011

On 9/21/11 3:26 PM, Christophe Travert wrote:
> Andrei Alexandrescu, in message (digitalmars.D:144936), wrote:
>> On 9/21/11 1:20 PM, Christophe Travert wrote:
>>> Dealing with UTF-encoded strings is less efficient, but there are a number
>>> of algorithms that can be optimized for UTF-encoded strings, like copying
>>> or finding an ASCII char in a string. Unfortunately, there is no
>>> practical way to do this with the current range API.
>>
>> I'd love to hear more about that. The standard library does optimize
>> certain algorithms for UTF strings.
>
>
> Well, in that other thread called "Re: toUTFz and WinAPI
> GetTextExtentPoint32W/" in D.learn (what is the proper way to refer to
> a message here?), I showed how to improve walkLength for strings and
> utf.stride.

Interesting, thanks.
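
For readers without that thread at hand, here is a minimal sketch of the
general kind of shortcut under discussion. It is an assumption about the
technique, not the code that was posted in D.learn: countCodePoints is a
hypothetical helper (not a Phobos function) that counts code points in
valid UTF-8 by skipping continuation bytes instead of decoding.

```d
// Hypothetical helper, not part of Phobos.
// Assumes s is valid UTF-8. Every code point contributes exactly one
// code unit that is NOT a continuation byte (pattern 10xxxxxx), so
// counting those code units gives the number of code points.
size_t countCodePoints(const(char)[] s) pure nothrow @safe
{
    size_t n = 0;
    foreach (char c; s)            // iterates code units, no decoding
    {
        if ((c & 0xC0) != 0x80)    // skip continuation bytes
            ++n;
    }
    return n;
}
```

For valid input this should agree with walkLength, e.g.
countCodePoints("écrit") is 5 although the string holds 6 code units.
Whether it is actually faster would need a benchmark.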

> About finding a character in a string, rather than relying
> on string.popFront, which makes the loop un-unrollable,
> we could search code unit by code unit directly. This is obviously
> better for ASCII chars, and I'll be looking for a nice idea for other
> code points (besides using find(Range, Range)).
>
> I didn't review Phobos with that idea in mind, and didn't do any
> benchmark except the one for walkLength, but using string.popFront is a
> bad idea in terms of performance, so workarounds are often better, and
> they are not that hard to find. I may do that when I have more time to
> give to D.

That sounds great. Looking forward to your pull requests!
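
To make the ASCII case concrete, a minimal sketch of a code-unit search
follows. findAscii is a hypothetical name, not an existing Phobos
function, and this is only one way the idea could look. It relies on the
fact that in UTF-8 a code unit below 0x80 never occurs inside a
multi-byte sequence, so comparing code units directly cannot produce a
false match for an ASCII needle.

```d
// Hypothetical helper, not part of Phobos.
// Returns the slice of s starting at the first occurrence of the ASCII
// character c, or an empty slice if c does not occur.
const(char)[] findAscii(const(char)[] s, char c) pure nothrow @safe
{
    assert(c < 0x80, "findAscii is only valid for ASCII needles");
    foreach (i, char u; s)         // i is a code unit index, no decoding
    {
        if (u == c)
            return s[i .. $];
    }
    return s[$ .. $];              // empty slice: not found
}
```

For example, findAscii("écrit: ok", ':') yields ": ok", and searching
"abc" for 'z' yields an empty slice. Non-ASCII needles are deliberately
out of scope here.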

Andrei