D2 byChunk

Christopher Nicholson-Sauls ibisbasenji at gmail.com
Fri Dec 10 22:00:17 PST 2010


On 12/10/10 22:36, Matthias Walter wrote:
> On 12/10/2010 09:57 PM, Matthias Walter wrote:
>> Hi all,
>>
>> I currently work on a parser for some file format. I wanted to use the
>> std.stdio.ByChunk Range to read from a file and extract tokens from the
>> chunks. Obviously it can happen that the current chunk ends before a
>> token can be extracted, in which case I can ask for the next chunk from
>> the Range. In order to keep the already-read part around, I need to dup
>> at least the unprocessed part of the older chunk and concatenate it in
>> front of the next part, or at least write code that works as if they
>> were concatenated. This looks like a stupid approach to me.
>>
>> Here is a small example:
>>
>> file contents: "Hello world"
>> chunks: "Hello w" "orld"
>>
>> First I read the token "Hello" from the first chunk and maybe skip the
>> whitespace. Then I have the "w" (which I need to move away from the
>> buffer, because ByChunk will overwrite it) and get "orld".
>>
>> My idea was to have a ByChunk-related Object, which the user can tell
>> how much of the buffer he/she actually used, such that it can move this
>> data to the beginning of the buffer and append the next chunk. This
>> wouldn't need further allocations and give the user contiguous data
>> he/she can work with.
> I coded something that works like this:
> 
> foreach (ref ubyte[] data; byBuffer(file, 12))
> {
>   writefln("[%s]", cast(string) data);
>   data = data[$-2 .. $];
> }
> 
> The 2nd line in the loop tells ByBuffer that we didn't process the last
> two chars and would like to get them again along with newly read data.
> And as long as we do process something, the internal buffer does not get
> reallocated.
> 
> It works and respects the formal requirements of ranges. Whether it
> respects the intended semantics is open to discussion. Any comments on
> whether the above makes sense, or whether it is an evil exploit of the
> provided syntax sugar?

I don't think it's a bad approach, but I have a suggestion.

It leaves a lot of room for abuse or misuse if you require the user code
to modify the data[] array in order to send this "protect some
characters" message.  I think it would be better to provide an explicit
function/method that means precisely that.  Maybe return a transparent
struct wrapping a view of the buffer's data, which also provides a
method for exactly this purpose.

foreach( data; byBuffer( file, 12 )) {
  // do things with data, decide we need to keep 2 chars
  data.save( 2 );
}

Or something like it.  With regard to this, you may want to allow the
internal buffer to grow as needed (if you aren't already).  Imagine what
would otherwise happen if you needed to 'save' the entire current buffer:
a fixed-size buffer would have no room left for newly read data.
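
To make that a bit more concrete, here is a rough sketch of the shape I
have in mind.  Everything in it is hypothetical: the ByBuffer struct, its
keep() method, the collectViews() helper and the scratch file name are
made up for illustration and are not part of Phobos.  To keep it short,
keep() lives on the range itself instead of on a wrapper struct, so the
loop is written by hand rather than with foreach.  Only File.rawRead and
memmove are real library calls.

```d
import std.stdio : File, writefln;
import core.stdc.string : memmove;

/// Hypothetical sketch (not a Phobos type): a chunked reader whose caller
/// says explicitly how many trailing bytes of the view are unprocessed.
struct ByBuffer
{
    private File file;
    private ubyte[] buffer;   // backing storage, allowed to grow
    private size_t filled;    // number of valid bytes in buffer
    private size_t kept;      // trailing bytes to carry into the next view
    private size_t chunkSize; // new bytes requested per read

    this(File file, size_t chunkSize)
    {
        assert(chunkSize > 0);
        this.file = file;
        this.chunkSize = chunkSize;
        buffer = new ubyte[chunkSize];
        popFront(); // load the first view
    }

    @property bool empty() const { return filled == 0; }

    @property ubyte[] front() { return buffer[0 .. filled]; }

    /// Ask for the last n bytes of the current view to be presented again,
    /// followed by freshly read data.
    void keep(size_t n)
    {
        assert(n <= filled);
        kept = n;
    }

    void popFront()
    {
        // Move the kept tail to the start; memmove tolerates overlap.
        memmove(buffer.ptr, buffer.ptr + (filled - kept), kept);
        // Grow so a full chunk always fits behind the kept bytes,
        // even when the whole previous view was kept.
        if (buffer.length < kept + chunkSize)
            buffer.length = kept + chunkSize;
        auto got = file.rawRead(buffer[kept .. kept + chunkSize]);
        // Assumption: once no new data arrives the range ends, and any
        // bytes still kept are dropped.  Real code would have to decide
        // how to report such an incomplete trailing token.
        filled = got.length == 0 ? 0 : kept + got.length;
        kept = 0;
    }
}

/// Collects every view the reader presents, keeping the partial token
/// that follows the last space of each view.
string[] collectViews(string path, size_t chunkSize)
{
    string[] views;
    for (auto r = ByBuffer(File(path, "rb"), chunkSize); !r.empty; r.popFront())
    {
        auto data = r.front;
        views ~= (cast(const(char)[]) data).idup;
        auto i = data.length;
        while (i > 0 && data[i - 1] != ' ')
            --i; // i ends up just past the last space, or 0 if none
        r.keep(i == 0 ? 0 : data.length - i);
    }
    return views;
}

void main()
{
    import std.file : write, remove;
    enum path = "bybuffer_demo.txt"; // hypothetical scratch file
    write(path, "Hello world");
    scope (exit) remove(path);
    foreach (v; collectViews(path, 7))
        writefln("[%s]", v);
}
```

With a chunk size of 7 on "Hello world", the first view is "Hello w" and
the second is "world", so the token arrives contiguous without the user
code touching the slice.  The buffer grew from 7 to 8 bytes to make that
possible.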

-- Chris N-S

