[phobos] CSVRange: RFC

Mon Jan 31 08:52:52 PST 2011

On Sun, Jan 30, 2011 at 10:52 PM, Andrei Alexandrescu <andrei at erdani.com> wrote:
> Without having studied the code closely, I could say that asking for an
> input range with slicing is quite a tall order that virtually restricts you
> to random-access ranges.

I agree. The two benefits I saw were returning the original content for
probably most data, and an easier implementation for separators that are
more than one character.

> An input range only allows you to move one character forward and never save
> your position or go back. A range with slicing in this context means that we
> can confidently calculate how much of the range we need to take, and that
> automatically requires the range to be able to go forward and then restart
> from a previous position.

True, it would need a ForwardRange with slicing and appending.
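
To make the multi-character separator point concrete, here is a minimal
sketch that assumes only a forward range. skipSeparator is a hypothetical
helper, not code from either implementation: it probes ahead on a saved
copy and commits only if the whole separator matches, which is exactly the
step a plain InputRange cannot take.

```d
import std.range.primitives;

// Hypothetical helper for illustration only.
// Advances `input` past `sep` if `input` starts with it and returns true;
// otherwise leaves `input` untouched and returns false.
bool skipSeparator(R, S)(ref R input, S sep)
    if (isForwardRange!R && isForwardRange!S)
{
    auto probe = input.save;   // remember where we were
    for (auto s = sep.save; !s.empty; s.popFront(), probe.popFront())
    {
        if (probe.empty || probe.front != s.front)
            return false;      // mismatch: input is still at its old position
    }
    input = probe;             // full match: commit the advanced position
    return true;
}
```

With only empty/front/popFront there is no way to back out after a partial
match, so the characters consumed so far would have to be buffered by hand.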

> Regarding overall design and user-level API, it may be reasonable to assume
> that:
>
> 1. CSV readers are usually often for reading an entire file through the end,
> so optimizations that are mostly applicable to reading one single line are
> unnecessary. At the same time, optimizations for repeated use of
> empty/front/popFront are likely to be beneficial.

I could see streaming an infinite amount of data too, though CSV is
probably not the way to do that.

I think optimizing for repeated use of empty/front will not depend on
the approach taken.

> 2. An entire line's representation as strings must fit in memory as a
> requirement.

I don't think either implementation requires the entire record to be
in memory in string form. Both will operate on each field value and
stop processing before the entire record is read.

> As such, David's implementation that works on a character stream is the most
> general and the theoretical perfect one because one character of lookahead
> is all CSV needs. At the same time, if an implementation assuming (1) and
> (2) above has considerable advantages (speed, convenience) then it might
> trump the theoretically perfect one.

If we find benefits to my second approach, I think it would be worth
having both implementations. The InputRange version would be restricted
to CSV text that doesn't use custom separators.
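
For reference, a rough sketch of the one-character-lookahead idea over a
plain InputRange. readField is a hypothetical name, the separator and quote
are hard-coded to ',' and '"', and this is not David's code or mine, just an
illustration that empty/front/popFront are enough for this case.

```d
import std.array : appender;
import std.range.primitives;

// Hypothetical helper for illustration only.
// Reads one field from `input`, consuming the trailing ',' if present.
// Stops in front of '\n' so the caller can detect the end of the record.
string readField(R)(ref R input)
    if (isInputRange!R && is(ElementType!R : dchar))
{
    auto field = appender!string();
    bool quoted = false;
    if (!input.empty && input.front == '"')
    {
        quoted = true;
        input.popFront();
    }
    while (!input.empty)
    {
        dchar c = input.front;
        if (quoted)
        {
            input.popFront();
            if (c != '"')
                field.put(c);
            else if (!input.empty && input.front == '"')
            {
                // One character of lookahead: "" inside quotes is a literal quote.
                field.put('"');
                input.popFront();
            }
            else
                quoted = false; // closing quote
        }
        else
        {
            if (c == ',') { input.popFront(); break; }
            if (c == '\n') break;
            field.put(c);
            input.popFront();
        }
    }
    return field.data;
}
```

Note that every field is copied into a new string here, which is the cost
the slicing approach was meant to avoid.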

I don't have my heart set on which one should be placed in Phobos. I
just want to make it clear why I changed directions, especially since
I think it will be most common to just read in an entire file anyway. But
I agree it is very restrictive if an implementation that handles
InputRange isn't available.