ByToken Range

Sun Dec 12 09:16:03 PST 2010

On 12/12/2010 02:04 AM, Christopher Nicholson-Sauls wrote:
> On 12/11/10 22:41, Matthias Walter wrote:
>> Hi all,
>>
>> I wrote a ByToken tokenizer that models a Range, i.e. it can be used in
>> a foreach loop to read from a std.stdio.File. For it to work, one has to
>> supply it with a delegate that takes the current buffer and a controller
>> class instance. The delegate is called to extract a token from the
>> unprocessed part of the buffer and can act as follows (by calling
>> methods on the controller):
>>
>> - It can skip some bytes.
>> - It can succeed, by eating some bytes and setting the token to be read
>> by the front() property.
>> - It can request more data.
>> - It can indicate that the data is invalid, in which case further
>> processing is stopped and a user-supplied delegate is invoked that may
>> or may not handle this failure.
>>
>>
>> It is efficient, because it reuses the same buffer every time and just
>> supplies the user with a slice of unprocessed data. If more data is
>> requested, the remaining unprocessed part is copied to the beginning and
>> more data is read. If there is no such unprocessed data, the buffer is
>> enlarged, i.e. length doubled.
>>
>> The ByToken class has the type of a token as a template parameter.
>>
>> Does this behavior make sense? Any further suggestions?
>> Is there any interest in having this functionality, i.e. should I create
>> a dsource project, or does everybody use parser-generators for
>> everything?
>>
>> Matthias
> I write lexers/parsers relatively often -- and I don't use generators...
> because I'm masochistic like that!  And because there aren't many
> options for D.  There was Enki for D1 a while back, which might still
> work pretty well, and there's GOLD although I'm not aware of how their D
> support is right now.  I might be forgetting another.
>
> So I, for one, like the idea of it at the very least.  I'd have to see
> it in action, though, to say much beyond that.
My current version can be used as follows to yield a simple word-tokenizer:

http://pastebin.com/qjH6y0Mf
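
In case the paste is not reachable, the following is a rough sketch of
what a delegate along the lines of the quoted description could look
like. It is not the code from the paste: Action, Controller, extractWord
and tokenize are hypothetical names chosen for illustration, and the
driver walks an in-memory string instead of a std.stdio.File to keep the
example short.

```d
import std.ascii : isWhite;

// Hypothetical outcome of one delegate call (names are not from the paste).
enum Action { skip, token, needMore, invalid }

// Hypothetical stand-in for the controller class: it only records what
// the delegate decided, so the driver can act on it afterwards.
final class Controller(Token)
{
    Action action = Action.needMore;
    size_t consumed;   // bytes to drop from the front of the buffer
    Token token;       // valid only when action == Action.token

    void skip(size_t n)             { action = Action.skip; consumed = n; }
    void succeed(size_t n, Token t) { action = Action.token; consumed = n; token = t; }
    void requestMore()              { action = Action.needMore; consumed = 0; }
    void fail()                     { action = Action.invalid; consumed = 0; }
}

// Word extractor: words are maximal runs of non-whitespace characters.
void extractWord(const(char)[] buffer, Controller!string ctl)
{
    size_t start = 0;
    while (start < buffer.length && isWhite(buffer[start])) ++start;
    if (start > 0) { ctl.skip(start); return; }   // drop leading whitespace

    size_t end = 0;
    while (end < buffer.length && !isWhite(buffer[end])) ++end;
    // A word touching the end of the buffer may continue in the next chunk.
    if (end == buffer.length) { ctl.requestMore(); return; }

    ctl.succeed(end, buffer[0 .. end].idup);
}

// Minimal driver over an in-memory string. A file-backed driver would
// read more data on needMore; here the rest is taken as the last word.
string[] tokenize(const(char)[] input)
{
    auto ctl = new Controller!string;
    string[] words;
    while (input.length)
    {
        extractWord(input, ctl);
        final switch (ctl.action)
        {
            case Action.skip:
                input = input[ctl.consumed .. $];
                break;
            case Action.token:
                words ~= ctl.token;
                input = input[ctl.consumed .. $];
                break;
            case Action.needMore:
                words ~= input.idup;
                input = null;
                break;
            case Action.invalid:
                throw new Exception("invalid input");
        }
    }
    return words;
}

unittest
{
    assert(tokenize("  hello range  world") == ["hello", "range", "world"]);
}
```

The token is copied with idup because the buffer is reused between
calls; handing out a plain slice would only be safe until the next
refill.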

As I'm going to use it for one or two real-world file formats, I might
change some things, but for now I like it. If you have any suggestions
for improvements, please let me know.
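
The buffer handling described in the quoted message can be sketched as
below. This is an illustration under assumptions, not the code from the
paste: RefillBuffer and its members are hypothetical names, and the
sketch assumes the buffer is doubled when the unprocessed part already
fills it completely, so that a refill always has room to read into.

```d
import std.stdio : File;
import core.stdc.string : memmove;

// Hypothetical buffer wrapper (names are not from the paste).
struct RefillBuffer
{
    ubyte[] data;   // backing storage, reused across reads
    size_t begin;   // index of the first unprocessed byte
    size_t end;     // one past the last valid byte

    this(size_t capacity)
    {
        assert(capacity > 0);
        data = new ubyte[capacity];
    }

    // Slice of the unprocessed bytes; no copy is made.
    const(ubyte)[] unprocessed() const { return data[begin .. end]; }

    // Mark the first n unprocessed bytes as eaten or skipped.
    void consume(size_t n)
    {
        assert(n <= end - begin);
        begin += n;
    }

    // Read more data; returns false if the file delivered nothing (EOF).
    bool refill(File file)
    {
        immutable pending = end - begin;
        if (begin > 0)
        {
            // Move the unprocessed tail to the front (regions may overlap).
            memmove(data.ptr, data.ptr + begin, pending);
            begin = 0;
            end = pending;
        }
        // Assumption: grow only when there is no free space left.
        if (end == data.length)
            data.length = data.length * 2;

        auto got = file.rawRead(data[end .. $]);
        end += got.length;
        return got.length > 0;
    }
}
```

With this layout the delegate only ever sees unprocessed(), and a
request for more data maps to a single refill call.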

Matthias