toUTFz and WinAPI GetTextExtentPoint32W

Timon Gehr timon.gehr at gmx.ch
Wed Sep 21 05:46:48 PDT 2011


On 09/21/2011 12:37 PM, Dmitry Olshansky wrote:
> On 21.09.2011 4:04, Timon Gehr wrote:
>> On 09/21/2011 01:57 AM, Christophe wrote:
>>> "Jonathan M Davis", in message (digitalmars.D.learn:29637),
>>> wrote:
>>>> On Tuesday, September 20, 2011 14:43 Andrej Mitrovic wrote:
>>>>> On 9/20/11, Jonathan M Davis<jmdavisProg at gmx.com> wrote:
>>>>>> Or std.range.walkLength. I don't know why we really have
>>>>>> std.utf.count. It just calls walkLength anyway. I suspect that it's
>>>>>> a function that predates walkLength and was made to use walkLength
>>>>>> after walkLength was introduced. But it's kind of pointless now.
>>>>>>
>>>>>> - Jonathan M Davis
>>>>>
>>>>> I don't think having better-named aliases is a bad thing. Although now
>>>>> I'm seeing it's not just an alias but a function.
>>>>
>>>
>>> std.utf.count has one advantage: someone looking for the function will
>>> find it. The programmer might not look in std.range to find a function
>>> about UTF strings, and even if he did, the walkLength documentation does
>>> not say that it works with (narrow) strings the way it does. To know you
>>> can use walkLength, you must know:
>>> - that popFront works differently on strings;
>>> - that hasLength is not true for strings;
>>> - what walkLength is.
>>>
>>> So yes, you experienced programmers don't need std.utf.count, but newbies
>>> do.
>>>
>>> Last point: walkLength is not optimized for strings.
>>> std.utf.count should be.
>>>
>>> This short implementation of count was 3 to 8 times faster than
>>> walkLength in a simple benchmark:
>>>
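>>> size_t myCount(string text)
>>> {
>>>     // Start from the byte count and subtract one for every UTF-8
>>>     // continuation byte (bit pattern 10xxxxxx).
>>>     size_t n = text.length;
>>>     for (size_t i = 0; i < text.length; ++i)
>>>     {
>>>         auto s = text[i] >> 6;           // top two bits: 0..3
>>>         n -= (s >> 1) - ((s + 1) >> 2);  // 1 only when s == 2
>>>     }
>>>     return n;
>>> }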
>>>
>>> (compiled with gdc on 64-bit; the sample text was the introduction of
>>> the French Wikipedia article on UTF-8, down to the table of contents
>>> ("sommaire") - http://fr.wikipedia.org/wiki/UTF-8 ).
>>>
>>> The reason is that the loop can be unrolled by the compiler.
>>
>> Very good point; you might want to file an enhancement request. It would
>> make the functionality different enough to keep count from being
>> removed: walkLength throws on an invalid UTF sequence.
>
> Actually, I don't buy it. I guess the reason it's faster is that it
> doesn't check if the codepoint is valid. In fact you can easily get
> ridiculous overflowed "negative" lengths.

Most of these could be caught by a final check. I think having the 
option of a version that is so much faster would be nice. Chances are 
pretty high that code actually manipulating the string will throw 
eventually if it is invalid.
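To make the bit trick in myCount easier to follow: `text[i] >> 6` keeps the top two bits of a byte, and `(s >> 1) - ((s + 1) >> 2)` is 1 only when those bits are `10`, i.e. for a UTF-8 continuation byte. So myCount returns the number of bytes that start a code point. The sketch below spells that out in D and pairs it with an opt-in `std.utf.validate` call as one possible form of the "final check" mentioned above; that pairing is an assumption, not something proposed in the thread, and the names `fastCount` and `checkedCount` are hypothetical. Whether the compiler unrolls this loop as it did myCount's was not measured.

```d
import std.exception : assertThrown;
import std.range : walkLength;
import std.utf : validate, UTFException;

/// Hypothetical helper: counts code points in UTF-8 text by counting the
/// bytes that are NOT continuation bytes (continuation bytes look like
/// 10xxxxxx, i.e. (b & 0xC0) == 0x80). Same result as myCount.
/// No validation: malformed input silently yields a meaningless count.
size_t fastCount(const(char)[] text)
{
    size_t n = 0;
    foreach (char c; text)      // explicit `char` iterates code units, no decoding
    {
        if ((c & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/// Hypothetical checked variant: validate first (throws UTFException on
/// malformed input, as walkLength would), then use the fast loop.
size_t checkedCount(const(char)[] text)
{
    validate(text);
    return fastCount(text);
}

unittest
{
    string s = "Déjà vu, €5";   // 11 code points, 15 bytes
    assert(fastCount(s) == 11);
    assert(fastCount(s) == s.walkLength);
    assert(checkedCount(s) == 11);

    // A lone continuation byte is invalid UTF-8.
    char[] bad = ['a', cast(char) 0x80];
    assertThrown!UTFException(checkedCount(bad));
}
```

Keeping validation as a separate pass leaves the hot loop branch-light while still letting callers opt into walkLength's guarantee.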

> Maybe we can put it here as an
> unsafe and fast version, though.
> Also check std.utf.stride to see if you can get it better, it's the
> beast behind narrow string popFront.
>
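For comparison, a stride-based count in the spirit of the std.utf.stride suggestion might look like the following sketch. It assumes the Phobos `std.utf.stride(str, index)` overload, which returns the number of code units in the sequence starting at `str[index]` and throws `UTFException` if that byte is not a valid lead byte. `strideCount` is a hypothetical name, and no performance claim is made for it.

```d
import std.exception : assertThrown;
import std.range : walkLength;
import std.utf : stride, UTFException;

/// Hypothetical helper: counts code points by jumping from one lead byte
/// to the next. The step size depends on the data, so this loop is
/// probably harder for the compiler to unroll than myCount's.
size_t strideCount(string s)
{
    size_t n = 0;
    for (size_t i = 0; i < s.length; i += stride(s, i))
        ++n;
    return n;
}

unittest
{
    string s = "Déjà vu, €5";
    assert(strideCount(s) == 11);
    assert(strideCount(s) == s.walkLength);

    // A stray continuation byte is not a valid lead byte, so stride throws.
    string bad = ['a', cast(char) 0x80, 'b'].idup;
    assertThrown!UTFException(strideCount(bad));
}
```

Unlike myCount, this version rejects a malformed lead byte as soon as it reaches it, at the cost of a data-dependent loop step.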



More information about the Digitalmars-d-learn mailing list