Request for comments: std.d.lexer

Fri Feb 8 02:29:20 PST 2013

08-Feb-2013 13:40, Jonathan M Davis wrote:
> On Friday, February 08, 2013 12:12:30 Dmitry Olshansky wrote:
>> 08-Feb-2013 12:01, Jonathan M Davis wrote:
>>> On Tuesday, February 05, 2013 22:51:32 Andrei Alexandrescu wrote:
>>>> I think it would be reasonable for a lexer to require a range of ubyte
>>>> as input, and carry its own decoding. In the first approximation it may
>>>> even require a random-access range of ubyte.
>>>
>>> Another big issue is the fact that in some ways, using a pointer like
>>> dmd's
>>> lexer does is actually superior to using a range. In particular, it's
>>> trivial to determine where in the text a token is, because you can simply
>>> subtract the pointer in the token from the initial pointer. Strings would
>>> be okay too, because you can subtract their ptr properties. But the
>>> closest that you'll get with ranges is to subtract their lengths, and the
>>> only ranges that are likely to define length are random-access ranges.
>>
>> Not true, certain ranges know length but can't be random access as
>> indexing is O(lgN) or worse. Including a stripe of chunks as taken from
>> file.
>
> I said that the only ones which are "likely" to define length are random-access
> range. There _are_ other ranges which can, but in most cases, if you can know
> the length, you can do random access as well.

Well, I honestly disagree that knowing the length implies being able
to index. "Most ranges" here means arrays and wrappers on top of them.
Given the current realities of D and Phobos, I'm afraid you are right, though.

> Regardless, the main issue still
> stands in that dealing with keeping track of the index of the code unit of a
> token is more complicated and generally more expensive with ranges than it is
> with a pointer.

If the target is a random-access range, just use an offset throughout. It
basically becomes base + offset vs. base + pointer, i.e. a non-issue.
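
To make the offset idea concrete, here is a minimal sketch, not a proposed std.d.lexer API. `Token`, `lexIdentifier` and `isIdentChar` are hypothetical names, and the input is assumed to be a random-access range of ubyte, as Andrei suggested:

```d
import std.range;

// Hypothetical token type: stores an offset and a length, not a pointer.
struct Token
{
    size_t offset; // index of the first code unit in the source
    size_t length; // number of code units in the token
}

// Hypothetical helper: ASCII letters and underscore only.
bool isIdentChar(ubyte c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

// Scans a run of identifier characters starting at index `pos`.
Token lexIdentifier(R)(R src, size_t pos)
    if (isRandomAccessRange!R && hasLength!R && is(ElementType!R : ubyte))
{
    immutable start = pos;
    while (pos < src.length && isIdentChar(src[pos]))
        ++pos;
    return Token(start, pos - start);
}

void main()
{
    import std.string : representation;

    auto src = "foo bar".representation; // immutable(ubyte)[]
    auto tok = lexIdentifier(src, 4);

    // The token's position is simply the stored offset.
    assert(tok.offset == 4 && tok.length == 3);
    assert(src[tok.offset .. tok.offset + tok.length] == "bar".representation);
}
```

The position of a token is then just the stored offset; no subtraction against a base pointer is needed.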

If not, then the pointer argument no longer applies, and you can just as well
keep a separate counter incremented on each popFront. It's not that costly in
this case, and flexibility trumps other concerns with forward ranges in any case.
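
A rough sketch of such a counter, assuming a hypothetical wrapper called `Counted` (nothing of this name exists in Phobos as far as this post is concerned). The input is a `joiner` over chunks, which is a forward range with neither length nor indexing:

```d
import std.range;

// Hypothetical wrapper: counts how many elements have been popped so far.
struct Counted(R) if (isInputRange!R)
{
    R source;
    size_t consumed; // number of popFront calls made on this range

    @property bool empty() { return source.empty; }
    @property auto front() { return source.front; }
    void popFront() { source.popFront(); ++consumed; }

    static if (isForwardRange!R)
        @property Counted save() { return Counted(source.save, consumed); }
}

// Convenience constructor (also hypothetical).
Counted!R counted(R)(R r) { return Counted!R(r); }

void main()
{
    import std.algorithm : joiner;
    import std.string : representation;

    // Two chunks, as if read from a file piece by piece.
    auto chunks = ["foo ".representation, "bar".representation];
    auto r = counted(chunks.joiner);

    while (!r.empty && r.front != ' ')
        r.popFront();
    assert(r.consumed == 3); // index of the space

    r.popFront(); // step over the space, crossing into the second chunk
    assert(r.consumed == 4 && r.front == 'b');
}
```

The extra cost is one increment per popFront, which is what the paragraph above means by "not that costly".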

> Some range types will do better than others, but short of
> using a string's ptr property, there's always going to be some additional
> overhead in comparison to pointers to keep track of the indices or to keep a
> range or slice of one as part of a token. The pointer's just more lightweight.
> That doesn't make ranges unacceptable by any means. It just means that they're
> going to take at least a slight performance hit in comparison to pointers.

See above. A pointer to something inside a buffer == an index into the buffer;
typically, even with a pointer you can't drop the 'buffer' reference itself.
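
A tiny illustration of that equivalence, using a plain string as the buffer (the variable names are only for this example):

```d
void main()
{
    string src = "int x;";

    const(char)* base = src.ptr; // the 'buffer' reference
    const(char)* p = base + 4;   // pointer to 'x' inside the buffer

    // Recovering the position from the pointer still needs `base`.
    size_t idx = p - base;
    assert(idx == 4);

    // The index and the pointer name the same code unit.
    assert(src[idx] == *p);
}
```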

-- 
Dmitry Olshansky