Range for files by character
Jonathan M Davis
jmdavisProg at gmx.com
Mon May 20 14:47:02 PDT 2013
On Monday, May 20, 2013 23:36:39 Stephan Schiffels wrote:
> Hi,
>
> I need an Input Range that iterates a file character by
> character. In bioinformatics this is often important, and having
> a D-range is of course preferable to any foreach-byLine
> combination, since we can apply filters and other goodies from
> std.algorithm. In this implementation, I am simply filtering out
> new-lines, as an example.
>
> import std.stdio;
> import std.conv;
> import std.algorithm;
>
> void main() {
>     auto f = File("someFile.txt", "r");
>     foreach(c; f.byChunk(1).filter!(a => to!char(a[0]) != '\n'))
>         write(to!char(c[0]));
> }
>
> Is this the right way to do it? I was a bit surprised that
> std.stdio doesn't provide a "byChar" or "byByte" range. Is there
> a reason for this, or is this a too special need?
The reality is that what you're doing is horribly inefficient. You never really
want to read a file a byte at a time. You want to read more along the lines of
kilobytes at a time and then process it byte by byte. And for that, you
basically want streams, and work has been done in that area, but it's not
complete yet.
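Reading kilobytes at a time while still processing byte by byte can be sketched with Phobos by flattening `byChunk` with `std.algorithm`'s `joiner`. This is a minimal sketch, not code from the thread: the function name `bytesWithoutNewlines` and the 4096-byte chunk size are illustrative assumptions, and the file is opened in binary mode so the bytes pass through unchanged.

```d
import std.algorithm : filter, joiner;
import std.array : array;
import std.stdio : File;

/// Hypothetical helper: reads `path` in 4096-byte chunks (an arbitrary
/// illustrative size) and returns every byte except '\n'.
ubyte[] bytesWithoutNewlines(string path)
{
    auto f = File(path, "rb");
    // byChunk reads one buffer's worth per call; joiner flattens the
    // chunks so the rest of the pipeline sees one ubyte at a time.
    return f.byChunk(4096)
            .joiner
            .filter!(b => b != '\n')
            .array;
}

void main()
{
    import std.file : remove, tempDir, write;
    import std.path : buildPath;
    import std.stdio : writeln;

    auto path = buildPath(tempDir, "bychar_demo.txt");
    write(path, "ACGT\nTTGA\n");
    scope(exit) remove(path);

    writeln(cast(char[]) bytesWithoutNewlines(path)); // prints ACGTTTGA
}
```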
What you will probably need to do is create a range that wraps ByChunk so that
the outer range returns a byte (or char) at a time, but the file gets read
kilobytes at a time (it iterates over ByChunk's buffer until it hits the end,
then pops ByChunk's front and starts at the front of the buffer again).
And if you're stripping out newlines, you might as well just wrap ByLine
instead of ByChunk, since ByLine strips the newlines for you.
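The ByChunk wrapper described above can be sketched as a small input range. This is an illustrative sketch, not an existing Phobos API: the names `ByByte` and `byByte` and the default 4096-byte chunk size are hypothetical. The range keeps the unread tail of the current `byChunk` buffer and only pops `byChunk` (which refills that buffer) once the tail is used up.

```d
import std.algorithm : filter;
import std.array : array;
import std.range.primitives : isInputRange;
import std.stdio : File;

/// Hypothetical input range yielding one ubyte at a time from a
/// byChunk-style range of ubyte[] buffers.
struct ByByte(Chunks)
{
    private Chunks chunks;  // underlying chunk range (refills its buffer on popFront)
    private ubyte[] buffer; // unread remainder of chunks.front

    this(Chunks chunks)
    {
        this.chunks = chunks;
        if (!this.chunks.empty)
            buffer = this.chunks.front;
        skipExhausted();
    }

    @property bool empty() { return buffer.length == 0; }
    @property ubyte front() { return buffer[0]; }

    void popFront()
    {
        buffer = buffer[1 .. $];
        skipExhausted();
    }

    // When the current buffer is used up, pop the chunk range and start
    // again at the front of the (refilled) buffer; skip any empty chunks.
    private void skipExhausted()
    {
        while (buffer.length == 0 && !chunks.empty)
        {
            chunks.popFront();
            if (!chunks.empty)
                buffer = chunks.front;
        }
    }
}

/// Hypothetical convenience constructor: reads `chunkSize` bytes per call.
auto byByte(File f, size_t chunkSize = 4096)
{
    auto chunks = f.byChunk(chunkSize);
    auto r = ByByte!(typeof(chunks))(chunks);
    static assert(isInputRange!(typeof(r)));
    return r;
}

void main()
{
    import std.file : remove, tempDir, write;
    import std.path : buildPath;
    import std.stdio : writeln;

    auto path = buildPath(tempDir, "bybyte_demo.txt");
    write(path, "ACGT\nTTGA\n");
    scope(exit) remove(path);

    // A tiny chunk size forces several refills across chunk boundaries.
    auto bases = File(path, "rb").byByte(3)
                    .filter!(b => b != '\n')
                    .array;
    writeln(cast(char[]) bases); // prints ACGTTTGA
}
```

Because the range yields `ubyte`, the filter compares against `'\n'` directly; converting to `char` is only needed for printing. For the ByLine variant, `File(path).byLine.joiner` (with `joiner` from `std.algorithm`) should give the same newline-free sequence, since `byLine` drops the line terminators by default; note that Phobos auto-decodes `char[]`, so its elements are `dchar`.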
- Jonathan M Davis
More information about the Digitalmars-d mailing list