D's confusing strings (was Re: D on hackernews)
Christophe Travert
travert at phare.normalesup.org
Wed Sep 21 13:26:22 PDT 2011
Andrei Alexandrescu, in message (digitalmars.D:144936), wrote:
> On 9/21/11 1:20 PM, Christophe Travert wrote:
>> Dealing with UTF-encoded strings is less efficient, but there are a number
>> of algorithms that can be optimized for UTF-encoded strings, like copying
>> or finding an ASCII char in a string. Unfortunately, there is no
>> practical way to do this with the current range API.
>
> I'd love to hear more about that. The standard library does optimize
> certain algorithms for UTF strings.
Well, in the other thread, "Re: toUTFz and WinAPI
GetTextExtentPoint32W/" in D.learn (what is the proper way to refer to
a message here?), I showed how to improve walkLength for strings and
utf.stride.
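For illustration only, here is one well-known way to count the code points of a UTF-8 string without decoding it: count every code unit that is not a continuation byte. This is a sketch that assumes valid UTF-8 input, and it is not necessarily the code posted in the D.learn thread; the name countCodePoints is hypothetical and not part of Phobos.

```d
// Hypothetical helper (not part of Phobos): counts the code points of a
// UTF-8 string without decoding. Assumes the input is valid UTF-8.
size_t countCodePoints(const(char)[] s)
{
    size_t n = 0;
    // Declaring the loop variable as char iterates over code units,
    // so no decoding takes place.
    foreach (char unit; s)
    {
        // Continuation bytes have the bit pattern 10xxxxxx; every other
        // code unit starts a new code point.
        if ((unit & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

void main()
{
    // 'é' takes two code units in UTF-8, so the array is longer than
    // the number of code points.
    assert("héllo".length == 6);
    assert(countCodePoints("héllo") == 5);
}
```

The loop body is a single test per code unit with no data-dependent stride, which is the kind of loop a compiler has a chance to unroll.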
As for finding a character in a string: rather than relying on
string.popFront, which prevents the loop from being unrolled, we could
search code unit by code unit directly. This is obviously better for an
ASCII char, and I'll be looking for a nice idea for other code points
(besides using find(Range, Range)).
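A minimal sketch of the code-unit search for the ASCII case could look like the following. It relies on the fact that, in UTF-8, a code unit below 0x80 never occurs inside a multi-byte sequence, so a byte-wise match is always a real match. The name indexOfAscii is hypothetical and not part of Phobos.

```d
import std.stdio : writeln;

// Hypothetical helper (not part of Phobos): returns the index, in code
// units, of the first occurrence of the ASCII character c in s, or -1
// if c does not occur.
ptrdiff_t indexOfAscii(const(char)[] s, char c)
{
    assert(c < 0x80, "only valid for ASCII needles");
    // A char loop variable makes foreach visit code units directly,
    // bypassing the decoding done by front/popFront on strings.
    foreach (i, char unit; s)
    {
        if (unit == c)
            return cast(ptrdiff_t) i;
    }
    return -1;
}

void main()
{
    string s = "héllo, wörld";
    // 'é' occupies two code units, so 'w' sits at code-unit index 8.
    assert(indexOfAscii(s, 'w') == 8);
    assert(indexOfAscii(s, 'z') == -1);
    writeln(s[indexOfAscii(s, 'w') .. $]);
}
```

The returned index can be used to slice the original string directly, since it is already expressed in code units.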
I haven't reviewed Phobos with that idea in mind, and haven't run any
benchmark except the one for walkLength, but using string.popFront is a
bad idea in terms of performance, so workarounds are often better, and
they are not that hard to find. I may do that when I have more time to
give to D.
--
Christophe