ByToken Range

Sat Dec 11 20:41:43 PST 2010

Hi all,

I wrote a ByToken tokenizer that models a range, i.e. it can be used in
a foreach loop to read tokens from a std.stdio.File. To use it, one has
to supply a delegate that takes the current buffer and a controller
class instance. The delegate is called to extract a token from the
unprocessed part of the buffer, and it responds in one of the following
ways (by calling methods on the controller):

- It can skip some bytes.
- It can succeed, by consuming some bytes and setting the token that the
front() property returns.
- It can request more data.
- It can indicate that the data is invalid, in which case further
processing is stopped and a user-supplied delegate is invoked that may
or may not handle this failure.
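
To make the protocol above concrete, here is a minimal sketch in D of
what a controller and a delegate could look like. All names (Controller,
Action, skip, succeed, requestMore, fail, extractWord) are hypothetical
and chosen for illustration only; they are not the actual ByToken
interface. The example splits the input into whitespace-separated words.

```d
// Hypothetical controller: it only records what the delegate decided.
enum Action { none, skip, succeed, needMore, invalid }

class Controller(Token)
{
    Action action = Action.none;
    size_t consumed;   // bytes to drop from the front of the buffer
    Token token;       // meaningful only after succeed()

    void skip(size_t n)             { action = Action.skip; consumed = n; }
    void succeed(size_t n, Token t) { action = Action.succeed; consumed = n; token = t; }
    void requestMore()              { action = Action.needMore; consumed = 0; }
    void fail()                     { action = Action.invalid; consumed = 0; }
}

bool isSpace(ubyte b)
{
    return b == ' ' || b == '\t' || b == '\n' || b == '\r';
}

// Example token extractor, written as a plain function here; a delegate
// with the same signature would work the same way.
void extractWord(const(ubyte)[] buf, Controller!string c)
{
    size_t i = 0;
    while (i < buf.length && isSpace(buf[i])) ++i;
    if (i > 0) { c.skip(i); return; }            // drop leading whitespace

    while (i < buf.length && !isSpace(buf[i])) ++i;
    if (i == buf.length) { c.requestMore(); return; } // word may continue

    c.succeed(i, cast(string) buf[0 .. i].idup); // copy: the buffer is reused
}

void main()
{
    auto c = new Controller!string;

    extractWord(cast(const(ubyte)[]) "foo bar", c);
    assert(c.action == Action.succeed && c.token == "foo" && c.consumed == 3);

    extractWord(cast(const(ubyte)[]) "  x", c);
    assert(c.action == Action.skip && c.consumed == 2);

    extractWord(cast(const(ubyte)[]) "bar", c);
    assert(c.action == Action.needMore);
}
```

Note that this extractor asks for more data whenever a word reaches the
end of the buffer, so the caller would have to treat end of file
separately; how that is signalled is left open in this sketch.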

It is efficient because it reuses the same buffer every time and just
hands the user a slice of the unprocessed data. If more data is
requested, the remaining unprocessed part is copied to the beginning of
the buffer and more data is read behind it. If there is no such
unprocessed data, the buffer is enlarged, i.e. its length is doubled.
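
A minimal sketch of such a refill step in D, under one assumption that
should be stated loudly: the buffer is doubled when the unprocessed data
already fills it completely, so that moving data to the front cannot
free any space. The function name refill and its parameters are
hypothetical; only File.rawRead, File.rawWrite, File.tmpfile, rewind and
memmove are real library calls.

```d
import std.stdio : File;
import core.stdc.string : memmove;

// The unprocessed data is buf[start .. end]. Moves it to the front,
// grows the buffer if needed (assumed growth rule, see above), reads
// more data behind it and returns the new end of the valid data.
size_t refill(ref ubyte[] buf, size_t start, size_t end, File f)
{
    assert(buf.length > 0 && start <= end && end <= buf.length);
    immutable rest = end - start;

    if (start > 0)
    {
        // Source and destination may overlap, hence memmove.
        memmove(buf.ptr, buf.ptr + start, rest);
    }
    else if (rest == buf.length)
    {
        // Nothing can be discarded: double the buffer length.
        buf.length = buf.length * 2;
    }

    // buf[rest .. $] is never empty here, which rawRead requires.
    auto got = f.rawRead(buf[rest .. $]);
    return rest + got.length;
}

void main()
{
    auto f = File.tmpfile();
    f.rawWrite("abcdefghij");
    f.rewind();

    auto buf = new ubyte[4];
    size_t end = refill(buf, 0, 0, f);      // reads "abcd"
    assert(end == 4);

    end = refill(buf, 3, end, f);           // keeps "d", reads "efg"
    assert(cast(char[]) buf[0 .. end] == "defg");

    end = refill(buf, 0, end, f);           // buffer full: doubled to 8
    assert(buf.length == 8);
    assert(cast(char[]) buf[0 .. end] == "defghij"); // short read at EOF
}
```

A return value equal to end - start means that rawRead delivered nothing
more, which a caller could take as end of file.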

The ByToken class has the type of a token as a template parameter.

Does this behavior make sense? Any further suggestions?
Is there any interest in having this functionality, i.e. should I create
a dsource project, or does everybody use parser generators for
everything?

Matthias