VLERange: a range in between BidirectionalRange and RandomAccessRange

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Mon Jan 17 09:36:47 PST 2011


On 1/17/11 10:55 AM, spir wrote:
> On 01/15/2011 12:21 AM, Michel Fortin wrote:
>> Also, it'd really help this discussion to have some hard numbers about
>> the cost of decoding graphemes.
>
> Text has a perf module that provides such numbers for the different stages
> of Text object construction. (The measured algorithms are not yet
> stabilised, so the numbers change regularly, but in the right
> direction ;-)
> You can try the current version at
> https://bitbucket.org/denispir/denispir-d/src (the perf module is called
> chrono.d)
>
> For reference, the recent cost of full text construction (decoding,
> normalisation including both decomposition and canonical ordering, and
> piling) was about 5 times that of decoding alone, the heavy part (~70%)
> being piling. But Stephan just informed me of a new gain in piling that I
> have not yet tested.
> This places our library between Windows native tools and ICU in terms
> of speed, which IMO is rather good for a brand-new tool written in a
> still unstable language.
>
> I have carefully read your argument that Text's approach of systematically
> "piling" and normalising source texts is not the right one from an
> efficiency point of view, even for strict use cases of universal text
> manipulation (because the relative space cost would indirectly cause a
> time cost through cache effects). Instead, you argue we should "pile"
> and/or normalise on the fly. But, like you, I am rather doubtful on this
> point without any numbers available.
> So, let us produce some benchmark results on both approaches if you like.
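To make the two approaches concrete, here is a minimal Python sketch. It is not Text's or Phobos's actual code, and all function names are hypothetical. "Piling" is approximated by attaching each code point with a non-zero canonical combining class to the preceding base character. That is much simpler than full UAX #29 grapheme segmentation. `eager_text` decodes, normalises to NFD and piles the whole input up front. `lazy_graphemes` stores nothing and does the same work one pile at a time as the caller iterates.

```python
import codecs
import timeit
import unicodedata
from typing import Iterable, Iterator, List


def pile(code_points: Iterable[str]) -> Iterator[str]:
    """Group each base code point with the combining marks that follow it.

    Simplification (assumption): a code point whose canonical combining
    class is non-zero attaches to the previous pile. This is NOT full
    UAX #29 grapheme segmentation (no Hangul, ZWJ or emoji rules).
    """
    current = ""
    for cp in code_points:
        if current and unicodedata.combining(cp):
            current += cp  # combining mark: extend the current pile
        else:
            if current:
                yield current  # new base character: emit the finished pile
            current = cp
    if current:
        yield current


def eager_text(raw: bytes) -> List[str]:
    """Eager approach: decode, normalise (NFD) and pile the whole text up front."""
    decoded = raw.decode("utf-8")
    normalised = unicodedata.normalize("NFD", decoded)
    return list(pile(normalised))


def decode_on_the_fly(raw: bytes) -> Iterator[str]:
    """Yield code points as soon as the incremental UTF-8 decoder completes them."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for i in range(len(raw)):
        yield from decoder.decode(raw[i:i + 1])
    yield from decoder.decode(b"", final=True)


def lazy_graphemes(raw: bytes) -> Iterator[str]:
    """On-the-fly approach: nothing is stored; each pile is normalised only
    when the caller asks for it. Per-pile NFD matches whole-text NFD for
    ordinary text such as the sample below, but not for every edge case."""
    for p in pile(decode_on_the_fly(raw)):
        yield unicodedata.normalize("NFD", p)


if __name__ == "__main__":
    sample = ("Cafe\u0301 na\u00efve " * 1000).encode("utf-8")
    assert eager_text(sample) == list(lazy_graphemes(sample))
    for name, fn in [("eager", lambda: eager_text(sample)),
                     ("lazy", lambda: list(lazy_graphemes(sample)))]:
        print(name, timeit.timeit(fn, number=10))
```

The printed timings mostly measure CPython overhead, since the incremental decoder is fed one byte at a time. They say little about how a D implementation of either approach would compare. The harness only shows the shape of such a benchmark.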

Congrats on this great work. The initial numbers are in keeping with my 
expectation: for certain primitives UTF adds up to 3x overhead compared 
to ASCII, and I expect combining character handling to add about as 
much on top of that.
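One source of that overhead is that primitives which are trivial for ASCII become scans in UTF-8. The following is a minimal Python sketch; the thread concerns D, but the byte-level logic is language-independent. The helper names are hypothetical, and the code assumes valid UTF-8 input. Stepping one code point forward or backward only skips continuation bytes. Finding the n-th code point or counting code points, however, requires examining every byte up to that point, whereas for ASCII both are O(1).

```python
def is_continuation(byte: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def next_boundary(buf: bytes, i: int) -> int:
    """Byte offset just past the code point that starts at offset i."""
    i += 1
    while i < len(buf) and is_continuation(buf[i]):
        i += 1
    return i


def prev_boundary(buf: bytes, i: int) -> int:
    """Byte offset of the code point that ends just before offset i (i > 0)."""
    i -= 1
    while i > 0 and is_continuation(buf[i]):
        i -= 1
    return i


def nth_code_point_offset(buf: bytes, n: int) -> int:
    """Byte offset of the n-th code point: an O(n) scan, where ASCII is just n."""
    i = 0
    for _ in range(n):
        i = next_boundary(buf, i)
    return i


def count_code_points(buf: bytes) -> int:
    """Every byte must be inspected; for ASCII the answer is simply len(buf)."""
    return sum(1 for b in buf if not is_continuation(b))


if __name__ == "__main__":
    # One code point each of 1, 2, 3 and 4 bytes.
    text = "a\u00e9\u20ac\U0001F600".encode("utf-8")
    print([nth_code_point_offset(text, k) for k in range(5)])  # [0, 1, 3, 6, 10]
    print(count_code_points(text))  # 4
```

Combining-character handling would sit on top of this. It groups several such code points into one user-perceived character, which adds another layer of per-element work.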

Your work and Steve's won't go to waste; one way or another we need to 
add grapheme-based processing to D. I think it would be great if a 
Phobos submission were made later on.


Andrei


More information about the Digitalmars-d mailing list