toUTFz and WinAPI GetTextExtentPoint32W

Timon Gehr timon.gehr at gmx.ch
Tue Sep 20 17:04:47 PDT 2011


On 09/21/2011 01:57 AM, Christophe wrote:
> "Jonathan M Davis" wrote in message (digitalmars.D.learn:29637):
>> On Tuesday, September 20, 2011 14:43 Andrej Mitrovic wrote:
>>> On 9/20/11, Jonathan M Davis<jmdavisProg at gmx.com>  wrote:
>>>> Or std.range.walkLength. I don't know why we really have std.utf.count.
>>>> It just calls walkLength anyway. I suspect that it's a function that
>>>> predates walkLength and was made to use walkLength after walkLength was
>>>> introduced. But it's kind of pointless now.
>>>>
>>>> - Jonathan M Davis
>>>
>>> I don't think having better-named aliases is a bad thing. Although now
>>> I'm seeing it's not just an alias but a function.
>>
>
> std.utf.count has one advantage: someone looking for the function will
> find it. The programmer might not look in std.range to find a function
> about UTF strings, and even if he did, it is not indicated in walkLength
> that it works with (narrow) strings the way it does. To know you can use
> walkLength, you must know:
> -that popFront works differently on strings.
> -that hasLength is not true for strings.
> -what walkLength is.
>
> So yes, experienced programmers don't need std.utf.count, but newbies
> do.
>
> Last point: walkLength is not optimized for strings.
> std.utf.count should be.
>
> This short implementation of count was 3 to 8 times faster than
> walkLength in a simple benchmark:
>
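> size_t myCount(string text)
> {
>    // Start from the number of code units, then subtract one for every
>    // UTF-8 continuation byte (10xxxxxx), leaving the code point count.
>    size_t n = text.length;
>    for (size_t i=0; i<text.length; ++i)
>      {
>        auto s = text[i]>>6;            // top two bits of the byte: 0..3
>        n -= (s>>1) - ((s+1)>>2);       // 1 only when s == 2 (0b10)
>      }
>    return n;
> }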
>
> (compiled with gdc on 64-bit; the sample text was the introduction of
> the French Wikipedia UTF-8 article, down to the table of contents -
> http://fr.wikipedia.org/wiki/UTF-8 ).
>
> The reason is that the loop can be unrolled by the compiler.

Very good point, you might want to file an enhancement request. It would 
make the functionality different enough to prevent count from being 
removed: walkLength throws on an invalid UTF sequence.
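
For readers following the thread, here is a minimal sketch of the difference being discussed. It assumes a hypothetical helper, countCodePoints (not a Phobos function), that counts every byte except UTF-8 continuation bytes, which is the same idea as the branch-free myCount quoted above. It performs no validation, so malformed input is counted silently instead of being reported.

```d
import std.range : hasLength, walkLength;

// Hypothetical helper (not part of Phobos): counts code points in a
// UTF-8 string by skipping continuation bytes (bit pattern 10xxxxxx).
size_t countCodePoints(const(char)[] text) pure nothrow @nogc @safe
{
    size_t n = 0;
    foreach (char c; text)          // iterate code units, no decoding
    {
        // Every byte that is not a continuation byte starts a code point.
        if ((c & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

void main()
{
    string s = "été UTF-8";         // each 'é' is encoded as 2 bytes

    // Narrow strings do not advertise a length to range algorithms,
    // which is why walkLength has to walk them.
    static assert(!hasLength!string);

    assert(s.length == 11);             // code units (bytes)
    assert(s.walkLength == 9);          // code points
    assert(countCodePoints(s) == 9);    // same answer on valid input

    // No validation: a stray continuation byte is simply not counted.
    const(char)[] bad = ['a', '\x80', 'b'];
    assert(countCodePoints(bad) == 2);
}
```

Whether skipping validation is acceptable depends on the caller; a version meant for std.utf would have to decide how malformed sequences are handled.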

