D2 byChunk

Fri Dec 10 20:36:16 PST 2010

On 12/10/2010 09:57 PM, Matthias Walter wrote:
> Hi all,
>
> I currently work on a parser for some file format. I wanted to use the
> std.stdio.ByChunk Range to read from a file and extract tokens from the
> chunks. Obviously it can happen that the current chunk ends before a
> token can be extracted, in which case I can ask for the next chunk from
> the Range. In order to keep the already-read part in mind, I need to dup
> at least the unprocessed part of the older chunk and concatenate it in
> front of the next part or at least write the code that works like they
> were concatenated. This looks like a stupid approach to me.
>
> Here is a small example:
>
> file contents: "Hello world"
> chunks: "Hello w" "orld"
>
> First I read the token "Hello" from the first chunk and maybe skip the
> whitespace. Then I have the "w" (which I need to move away from the
> buffer, because ByChunk will overwrite it) and get "orld".
>
> My idea was to have a ByChunk-related Object, which the user can tell
> how much of the buffer he/she actually used, such that it can move this
> data to the beginning of the buffer and append the next chunk. This
> wouldn't need further allocations and give the user contiguous data
> he/she can work with.

I coded something that works like this:

foreach (ref ubyte[] data; byBuffer(file, 12))
{
  writefln("[%s]", cast(string) data);  // kept bytes followed by newly read ones
  data = data[$-2 .. $];                // hand back the last two (needs data.length >= 2)
}

The 2nd line in the loop tells ByBuffer that we didn't process the last
two chars and would like to get them again along with newly read data.
And as long as we do process something, the internal buffer does not get
reallocated.
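
To make the mechanism concrete, here is a minimal sketch of how such a range might be written. It is an illustration only, not necessarily the code behind the loop above. The names ByBuffer, byBuffer, window and demo are hypothetical, none of this is part of Phobos, and the sketch assumes that the loop body only ever re-slices the data it was handed. It also assumes a non-zero buffer size, and it simply drops whatever is still unprocessed when the file ends.

```d
import core.stdc.string : memmove;
import std.stdio : File, writeln;

// Hypothetical sketch, not Phobos API: an input range whose front is a
// mutable slice. Whatever slice the loop body leaves in front is treated as
// "not yet processed" and is kept at the start of the buffer for the next round.
struct ByBuffer
{
    private File file;
    private ubyte[] buffer; // storage, grown only if nothing was consumed
    private ubyte[] window; // what front exposes; assumed to slice buffer
    private bool done;

    this(File file, size_t size)
    {
        this.file = file;
        buffer = new ubyte[](size); // assumption: size > 0
        fill(0);
    }

    @property bool empty() const { return done; }

    // Returned by reference so that "data = data[a .. b]" is visible here.
    @property ref ubyte[] front() return { return window; }

    void popFront()
    {
        immutable kept = window.length;
        // Move the unprocessed tail to the front; the regions may overlap.
        if (kept && window.ptr !is buffer.ptr)
            memmove(buffer.ptr, window.ptr, kept);
        // Nothing was consumed and the buffer is full: grow to make progress.
        if (kept == buffer.length)
            buffer.length = buffer.length * 2;
        fill(kept);
    }

    private void fill(size_t kept)
    {
        auto got = file.rawRead(buffer[kept .. $]);
        if (got.length == 0)
            done = true; // end of file; leftover bytes are dropped in this sketch
        else
            window = buffer[0 .. kept + got.length];
    }
}

ByBuffer byBuffer(File file, size_t size) { return ByBuffer(file, size); }

// Reads the file in 7-byte rounds and keeps everything after the last space.
string[] demo(string path)
{
    string[] seen;
    foreach (ref ubyte[] data; byBuffer(File(path, "rb"), 7))
    {
        seen ~= (cast(const(char)[]) data).idup;
        size_t cut = 0;
        foreach (i, b; data)
            if (b == ' ')
                cut = i + 1;
        data = data[cut .. $]; // unprocessed remainder
    }
    return seen;
}

void main()
{
    import std.file : remove, write;

    enum path = "bybuffer_demo.txt";
    write(path, "Hello world");
    scope (exit) remove(path);
    writeln(demo(path));
}
```

With the file contents "Hello world" and a 7-byte buffer, the loop body should see "Hello w" first and then "world", because the unprocessed "w" is moved to the front before "orld" is read in behind it. A real version would probably want to hand the leftover bytes to the loop once more at end of file instead of discarding them.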

It works and meets the formal requirements of ranges. Whether it
respects the intended semantics is open to discussion. Any comments on
whether the above makes sense, or whether it is an evil exploit of the
provided syntactic sugar?