toUTFz and WinAPI GetTextExtentPoint32W

Dmitry Olshansky dmitry.olsh at gmail.com
Wed Sep 21 03:37:33 PDT 2011


On 21.09.2011 4:04, Timon Gehr wrote:
> On 09/21/2011 01:57 AM, Christophe wrote:
>> "Jonathan M Davis", in message (digitalmars.D.learn:29637), wrote:
>>> On Tuesday, September 20, 2011 14:43 Andrej Mitrovic wrote:
>>>> On 9/20/11, Jonathan M Davis<jmdavisProg at gmx.com> wrote:
>>>>> Or std.range.walkLength. I don't know why we really have
>>>>> std.utf.count. It just calls walkLength anyway. I suspect that it's
>>>>> a function that predates walkLength and was made to use walkLength
>>>>> after walkLength was introduced. But it's kind of pointless now.
>>>>>
>>>>> - Jonathan M Davis
>>>>
>>>> I don't think having better-named aliases is a bad thing. Although now
>>>> I'm seeing it's not just an alias but a function.
>>>
>>
>> std.utf.count has one advantage: someone looking for the function will
>> find it. The programmer might not look in std.range to find a function
>> about UTF strings, and even if he did, the walkLength documentation does
>> not say that it works with (narrow) strings the way it does. To know you
>> can use walkLength, you must know:
>> - that popFront works differently for strings;
>> - that hasLength is not true for strings;
>> - what walkLength is.
>>
>> So yes, experienced programmers don't need std.utf.count, but newbies
>> do.
>>
>> Last point: walkLength is not optimized for strings.
>> std.utf.count should be.
>>
>> This short implementation of count was 3 to 8 times faster than
>> walkLength in a simple benchmark:
>>
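>> size_t myCount(string text)
>> {
>>     size_t n = text.length;
>>     for (uint i = 0; i < text.length; ++i)
>>     {
>>         // Top two bits: 0 or 1 = ASCII, 2 = continuation, 3 = lead byte.
>>         auto s = text[i] >> 6;
>>         // Subtracts 1 only for continuation bytes (s == 2).
>>         n -= (s >> 1) - ((s + 1) >> 2);
>>     }
>>     return n;
>> }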
>>
>> (compiled with gdc on 64 bits; the sample text was the introduction of
>> the French Wikipedia UTF-8 article, down to the table of contents -
>> http://fr.wikipedia.org/wiki/UTF-8 ).
>>
>> The reason is that the loop can be unrolled by the compiler.
>
> Very good point, you might want to file an enhancement request. It would
> make the functionality different enough to prevent count from being
> removed: walkLength throws on an invalid UTF sequence.

Actually, I don't buy it. I guess the reason it's faster is that it
doesn't check whether the code point is valid. In fact, you can easily get
ridiculous overflowed "negative" lengths. Maybe we can put it here as an
unsafe but fast version, though.
Also check std.utf.stride to see if you can make it better; it's the
beast behind narrow string popFront.
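
To make the validation point concrete, here is a rough sketch (not Phobos
code). It restates myCount from the quoted message and puts it next to a
hypothetical strideCount helper built on std.utf.stride. The helper name
and the sample inputs are assumptions made up for this example. As far as
I can tell, stride only inspects the lead byte of each sequence, so this
is not full validation either; std.utf.validate is the call for that.

```d
import std.exception : assertThrown;
import std.range : walkLength;
import std.utf : stride, validate, UTFException;

// Restatement of the quoted myCount: counts every byte that is not a
// UTF-8 continuation byte (10xxxxxx). It never checks the input.
size_t myCount(string text)
{
    size_t n = text.length;
    foreach (char c; text)
    {
        immutable s = c >> 6;            // 0, 1 = ASCII; 2 = continuation; 3 = lead
        n -= (s >> 1) - ((s + 1) >> 2);  // subtracts 1 only when s == 2
    }
    return n;
}

// Hypothetical helper (name is an assumption): steps through the string
// with std.utf.stride, which throws UTFException on an invalid lead byte.
size_t strideCount(string text)
{
    size_t n = 0;
    for (size_t i = 0; i < text.length; i += stride(text, i))
        ++n;
    return n;
}

void main()
{
    // Valid UTF-8: all three approaches agree.
    string ok = "écrit";   // 6 bytes, 5 code points
    assert(myCount(ok) == 5);
    assert(strideCount(ok) == 5);
    assert(walkLength(ok) == 5);

    // Invalid UTF-8: three continuation bytes with no lead byte.
    immutable(ubyte)[] raw = [0x80, 0x80, 0x80];
    string bad = cast(string) raw;

    // The unchecked version silently returns a number...
    assert(myCount(bad) == 0);
    // ...while the stride-based loop and validate both report the problem.
    assertThrown!UTFException(strideCount(bad));
    assertThrown!UTFException(validate(bad));
}
```

The unchecked loop is branch-free, which is presumably where the speed
comes from; the price is that garbage input produces a plausible-looking
count instead of an error.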

-- 
Dmitry Olshansky

