[phobos] CSVRange: RFC

Jesse Phillips jesse.k.phillips at gmail.com
Mon Jan 31 08:52:52 PST 2011


On Sun, Jan 30, 2011 at 10:52 PM, Andrei Alexandrescu <andrei at erdani.com> wrote:
> Without having studied the code closely, I could say that asking for an
> input range with slicing is quite a tall order that virtually restricts you
> to random-access ranges.

I agree. The two benefits I saw were being able to return the original
content for probably most of the data, and an easier implementation for
separators which are more than one character.
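
As a rough illustration of those two benefits (written in Python, not the
D code under discussion; the name split_fields and the "||" separator are
made up for this sketch, and quoting is ignored entirely): with slicing,
a field is just a sub-range of the input, and a separator of any length
is handled by searching ahead and skipping its full width.

```python
def split_fields(text: str, sep: str = "||"):
    """Yield each field of `text`, split on a separator of any length."""
    start = 0
    while True:
        # Searching ahead for the separator needs more than one-pass access:
        # we must be able to come back and take text[start:end] afterwards.
        end = text.find(sep, start)
        if end == -1:
            yield text[start:]   # last field: everything that remains
            return
        yield text[start:end]    # the field is a slice of the original input
        start = end + len(sep)   # skip the whole separator, however long


print(list(split_fields("a||b||c")))
```

Python copies when it slices a str, whereas a D slice is a view into the
original array, so the "return the original content" benefit is only
approximated here.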

> An input range only allows you to move one character forward and never save
> your position or go back. A range with slicing in this context means that we
> can confidently calculate how much of the range we need to take, and that
> automatically requires the range to be able to go forward and then restart
> from a previous position.

True, it would need to be a ForwardRange with slicing and appending.

> Regarding overall design and user-level API, it may be reasonable to assume
> that:
>
> 1. CSV readers are usually often for reading an entire file through the end,
> so optimizations that are mostly applicable to reading one single line are
> unnecessary. At the same time, optimizations for repeated use of
> empty/front/popFront are likely to be beneficial.

I could see streaming an infinite amount of data too, though CSV is
probably not the way to do that.

I think optimizing for repeated use of empty/front will not depend on
the approach taken.

> 2. An entire line's representation as strings must fit in memory as a
> requirement.

I don't think either implementation requires the entire record to be
in memory in string form. Both will operate on each field value and
stop processing before the entire record is read.

> As such, David's implementation that works on a character stream is the most
> general and the theoretical perfect one because one character of lookahead
> is all CSV needs. At the same time, if an implementation assuming (1) and
> (2) above has considerable advantages (speed, convenience) then it might
> trump the theoretically perfect one.

If we find benefits to my second approach, I think it would be worth
having an implementation of both. The InputRange version would be
restricted to CSV text which doesn't use custom separators.
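
To make the one-character-of-lookahead idea concrete, here is a minimal
sketch (Python again, not David's code or mine; read_fields is a
hypothetical name). It assumes a fixed ',' separator, '"' quoting with ""
as the escape, and '\n' as the record end. The only lookahead is the
single character read after a quote inside a quoted field.

```python
from typing import Iterable, Iterator


def read_fields(chars: Iterable[str]) -> Iterator[str]:
    """Lazily yield the fields of one CSV record from a one-pass stream."""
    it = iter(chars)
    buf = []
    in_quotes = False
    c = next(it, None)
    while c is not None:
        if in_quotes:
            if c == '"':
                nxt = next(it, None)      # the single character of lookahead
                if nxt == '"':
                    buf.append('"')       # doubled quote is a literal quote
                    c = next(it, None)
                else:
                    in_quotes = False     # closing quote
                    c = nxt               # re-process the character we read
                continue
            buf.append(c)
        elif c == '"':
            in_quotes = True
        elif c == ',':
            yield ''.join(buf)            # field complete; nothing read past it
            buf = []
        elif c == '\n':
            yield ''.join(buf)            # last field of this record
            return
        else:
            buf.append(c)
        c = next(it, None)
    yield ''.join(buf)                    # input ended without a newline


stream = iter('a,"b ""x"", c",d\n1,2\n')
print(list(read_fields(stream)))   # first record
print(list(read_fields(stream)))   # second record, same stream
```

Because each field is yielded as soon as its terminator is seen, a caller
can stop after the first field without the rest of the record ever being
read, and the stream is never rewound.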

I don't have my heart set on which one should be placed in Phobos; I
just want to make it clear why I changed directions, especially since
I think it will be most common to just read in an entire file anyway.
But I agree it is very restrictive if an implementation that handles
InputRange isn't available.

