VLERange: a range in between BidirectionalRange and RandomAccessRange

Tue Jan 18 05:25:40 PST 2011

On 01/18/2011 03:52 AM, Andrei Alexandrescu wrote:
> On 1/17/11 5:13 PM, spir wrote:
>> On 01/17/2011 07:57 PM, Andrei Alexandrescu wrote:
>>> * Line 130: representing a text as a dchar[][] has its advantages but
>>> major efficiency issues. To be frank I think it's a disaster. I think a
>>> representation building on UTF strings directly is bound to be vastly
>>> better.
>>
>> I don't understand your point. Where is the difference with D's builtin
>> types, then?
>
> Unfortunately I won't have much time to discuss all these points, but
> this is a simple one: using dchar[][] wastes memory and time. You need
> to build on a flatter representation. Don't confuse the abstraction you
> are building with its underlying representation. The difference between
> your abstraction and char[]/wchar[]/dchar[] (which I strongly recommend
> you to build on) is that the abstractions offer different, higher-level
> primitives that the representation doesn't.

I think it is worth repeating the following: Text in my view (or
whatever variant solution for working correctly with universal text) is
_not_ intended as a basic string type, even less as the default one.
If programmers can guarantee that all their app's input will only ever
hold single-codepoint characters, _or_ if they just pass pieces of text
around without manipulating them, then such a tool is big overkill.

It has a time cost at Text construction time, which I consider an
investment. It also has some space & time cost for operations, which
should be only slightly relevant compared to the speed gained from the
simple fact that routines can then operate just (actually, nearly) like
with historic charsets.
Indexing is just normal O(1) indexing, possibly plus producing the
result. Not O(n) across the source, building piles along the way.
(1000X slower, 1000000X slower?)
Counting is just O(n) with mini-array compares, not building &
normalising piles across the whole code sequence. (10X, 100X slower?)
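
To make the trade-off concrete, here is a minimal sketch of the idea,
written in Python only for brevity (it is not D, and not Text's actual
code). The names build_piles and count_pile are hypothetical. The
grouping rule is a simplification: a pile is a base code point plus the
combining marks that follow it, which is not full grapheme segmentation.

```python
import unicodedata

def build_piles(s):
    """Pay once at construction: normalise (NFD), then group each base
    code point with the combining marks that follow it (simplified rule)."""
    piles = []
    for ch in unicodedata.normalize("NFD", s):
        if piles and unicodedata.combining(ch) != 0:
            piles[-1] += ch      # combining mark: attach to the current pile
        else:
            piles.append(ch)     # base code point: start a new pile
    return piles

def count_pile(piles, needle):
    """O(n) count using plain compares of small normalised sequences."""
    key = unicodedata.normalize("NFD", needle)
    return sum(1 for p in piles if p == key)

# "noël été", with mixed precomposed and decomposed forms in the source.
text = "noe\u0308l \u00e9te\u0301"
piles = build_piles(text)

third = piles[2]                       # plain O(1) indexing, no rescanning
n_e_acute = count_pile(piles, "\u00e9")
```

Here len(text) counts 10 code points, while piles holds 8 characters;
piles[2] is the whole "e" + U+0308 sequence, and both spellings of "é"
are counted because they were normalised once, at construction time.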

> Let me repeat again: if anyone in this community wants to put work in a
> forward range that iterates one grapheme at a time, that work would be
> very valuable because it will allow us to experiment with graphemes in a
> non-disruptive way while benefiting of a host of algorithms. ByGrapheme
> and friends will help more than defining new string types.
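
For comparison, a forward iteration of the kind suggested above could be
sketched as follows, again in Python purely for illustration. The name
by_grapheme is hypothetical, and the boundary rule (a new grapheme
starts at every non-combining code point) is the same simplification,
not the full Unicode segmentation rules. It works directly on the flat
string and builds nothing up front, but reaching the n-th grapheme means
walking from the start.

```python
import itertools
import unicodedata

def by_grapheme(s):
    """Lazily yield slices of s, one (simplified) grapheme at a time."""
    start = 0
    for i in range(1, len(s)):
        if unicodedata.combining(s[i]) == 0:   # next base code point found
            yield s[start:i]
            start = i
    if s:
        yield s[start:]                        # last grapheme, if any

s = "noe\u0308l"

# Counting and indexing both require a walk over the source.
grapheme_count = sum(1 for _ in by_grapheme(s))
third = next(itertools.islice(by_grapheme(s), 2, None))
```

No normalisation is done here, so comparing graphemes from differently
encoded sources would still need an extra step on each comparison.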

Right. I understand your point of view, esp. "non-disruptive".
But then, how to avoid the possibly huge inefficiency mentioned above?
We have no true perf numbers yet, right, for any alternative to Text's
approach. But for this reason we should also not speak loosely of this
approach's space & time costs. Compared to what?

Denis
_________________
vita es estrany
spir.wikidot.com