ByToken Range

Sat Dec 11 23:04:22 PST 2010

On 12/11/10 22:41, Matthias Walter wrote:
> Hi all,
> 
> I wrote a ByToken tokenizer that models a Range, i.e. it can be used in a
> foreach loop to read from a std.stdio.File. For it to work, one has to
> supply it with a delegate that takes the current buffer and a controller
> class instance. The delegate is called to extract a token from the
> unprocessed part of the buffer and, by calling methods on the controller,
> can act in one of the following ways:
> 
> - It can skip some bytes.
> - It can succeed, by eating some bytes and setting the token to be read
> by the front() property.
> - It can request more data.
> - It can indicate that the data is invalid, in which case further
> processing is stopped and a user-supplied delegate is invoked that may
> or may not handle this failure.
> 
> 
> It is efficient, because it reuses the same buffer every time and just
> supplies the user with a slice of unprocessed data. If more data is
> requested, the remaining unprocessed part is copied to the beginning and
> more data is read. If there is no such unprocessed data, the buffer is
> enlarged, i.e. length doubled.
> 
> The ByToken class has the type of a token as a template parameter.
> 
> Does this behavior make sense? Any further suggestions?
> Is there any interest in having this functionality, i.e. should I create
> a dsource project, or does everybody use parser-generators for everything?
> 
> Matthias

I write lexers/parsers relatively often -- and I don't use generators...
because I'm masochistic like that!  And because there aren't many
options for D.  There was Enki for D1 a while back, which might still
work pretty well, and there's GOLD, although I don't know what state its
D support is in right now.  I might be forgetting another.

So I, for one, like the idea of it at the very least.  I'd have to see
it in action, though, to say much beyond that.
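
To make the discussion concrete, here is a rough sketch of how I read
the description.  It is not Matthias's code: every name in it (Action,
Step, the words extractor) is made up, and it assumes a reasonably
current D2 compiler.  Two simplifications to note: the delegate returns
its decision as a value instead of calling methods on a controller
object, and invalid input just throws instead of invoking a
user-supplied failure delegate.  I am also assuming that "the buffer is
enlarged" means the case where the unprocessed data already fills the
whole buffer, so compacting frees nothing.

```d
import std.ascii : isWhite;
import std.functional : toDelegate;
import std.stdio;

// All names here are hypothetical; this is only a sketch of the idea.
enum Action { skip, token, needMore, invalid }

struct Step(Token)
{
    Action action;
    size_t consumed; // bytes eaten; meaningful for skip and token
    Token token;     // meaningful for token only
}

struct ByToken(Token)
{
    private File file;
    private Step!Token delegate(const(ubyte)[], bool) extract;
    private ubyte[] buffer;
    private size_t start, end; // unprocessed bytes are buffer[start .. end]
    private Token current;
    private bool done, atEof;

    this(File file, Step!Token delegate(const(ubyte)[], bool) extract,
         size_t initialSize = 4096)
    {
        this.file = file;
        this.extract = extract;
        buffer = new ubyte[initialSize]; // must be non-empty
        popFront();                      // prime the first token
    }

    @property bool empty() const { return done; }
    @property Token front() { return current; }

    void popFront()
    {
        for (;;)
        {
            auto step = extract(buffer[start .. end], atEof);
            final switch (step.action)
            {
            case Action.skip:
                start += step.consumed;
                break;
            case Action.token:
                start += step.consumed;
                current = step.token;
                return;
            case Action.needMore:
                if (atEof) { done = true; return; }
                refill();
                break;
            case Action.invalid:
                done = true;
                throw new Exception("ByToken: invalid input");
            }
        }
    }

    private void refill()
    {
        import core.stdc.string : memmove;
        immutable pending = end - start;
        if (start > 0)
        {
            // Reuse the buffer: move the unprocessed tail to the front.
            if (pending) memmove(buffer.ptr, buffer.ptr + start, pending);
        }
        else if (pending == buffer.length)
            buffer.length = buffer.length * 2; // nothing to discard: grow
        start = 0;
        end = pending;
        auto got = file.rawRead(buffer[end .. $]);
        end += got.length;
        if (got.length == 0) atEof = true;
    }
}

// Example extractor: whitespace-separated words.
Step!string words(const(ubyte)[] data, bool atEof)
{
    size_t i = 0;
    while (i < data.length && isWhite(data[i])) ++i;
    if (i > 0) return Step!string(Action.skip, i);
    while (i < data.length && !isWhite(data[i])) ++i;
    // A word touching the end of the buffer may continue in the next read.
    if (i == data.length && !atEof) return Step!string(Action.needMore);
    if (i == 0) return Step!string(Action.needMore); // nothing left at EOF
    return Step!string(Action.token, i, cast(string) data[0 .. i].idup);
}

void main()
{
    foreach (word; ByToken!string(stdin, toDelegate(&words)))
        writeln(word);
}
```

The extractor copies each token with idup because the slice it is
handed points into the reused buffer and is overwritten by the next
refill.  An extractor that returns skip with zero bytes consumed would
loop forever, so a real implementation would want to guard against
that.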

-- Chris N-S