What's the simplest way to read a file token by token?

Jonathan M Davis jmdavisProg at gmx.com
Sat Aug 10 17:12:55 PDT 2013


On Saturday, August 10, 2013 19:34:20 Carl Sturtivant wrote:
> On Saturday, 10 August 2013 at 17:09:29 UTC, Carl Sturtivant wrote:
> > What's the simplest way in D to read a file token by token,
> > where the read tokens are D strings, and they are separated in
> > the file by arbitrary non-zero amounts of white space
> > (including spaces, tabs and newlines at a minimum)?
> 
> I couldn't find a function that did just this, and various ways I
> implemented it seemed too complex. Is there such a function in a
> D library?

If you have a string (or any range of dchar) already, you can use
std.algorithm.splitter:

import std.algorithm;

void main()
{
    auto str = "hello world    goodbye charlie.";
    // With no separator argument, splitter splits a string on runs of whitespace.
    assert(equal(splitter(str),
           ["hello", "world", "goodbye", "charlie."]));
}

However, reading from a file is quite a bit more problematic, as we don't have
proper stream support yet (we're still waiting on std.io to be finished so that
we can have that). That means that what we have for reading files is a lot
less flexible. In general, you're probably going to be reading it in line by
line with std.stdio.File.byLine, in chunks of bytes with
std.stdio.File.byChunk, or all at once with std.file.readText.
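
As a rough sketch of the line-by-line option (this is not something Phobos
provides as a ready-made function), tokens can be pulled out of each line that
byLine yields by running splitter over it. The helper name eachToken and the
command-line handling are made up for illustration. Since a newline counts as
white space, no token can span two lines, so splitting each line separately
gives the same tokens as splitting the whole file.

```d
import std.algorithm : splitter;
import std.stdio : File, writeln;

// Hypothetical helper: calls `handle` once for every whitespace-separated
// token in the file at `path`, reading the file one line at a time.
void eachToken(string path, scope void delegate(const(char)[] token) handle)
{
    foreach (line; File(path).byLine())
    {
        // splitter with no separator splits on runs of whitespace,
        // so blank lines simply produce no tokens.
        foreach (token; splitter(line))
            handle(token);
    }
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: pass the name of the file to tokenize");
        return;
    }

    eachToken(args[1], delegate(const(char)[] token) { writeln(token); });
}
```

Note that byLine reuses its buffer between iterations, so a token that needs
to outlive the callback should be copied first (for example with token.idup).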

Something that does what you want could certainly be built on top of either 
byLine or byChunk without a lot of effort, but it obviously doesn't work right 
out of the box. readText will work great (since you can just use splitter on 
its result), but it does mean reading the entire file in at once. Still, in 
most cases, that's what I'd do. It's only going to be a problem if the file is 
going to be particularly large, and since splitter is just slicing the string 
that you give it (rather than copying it), you shouldn't end up with the file 
in memory more than once.
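
A minimal sketch of the readText approach, assuming the file is small enough
to hold in memory and is valid UTF-8 (readText throws otherwise). The helper
name readTokens is hypothetical, and the file name is taken from the command
line purely for illustration.

```d
import std.algorithm : splitter;
import std.array : array;
import std.file : readText;
import std.stdio : writeln;

// Hypothetical helper: reads the whole file at `path` and returns its
// whitespace-separated tokens.
string[] readTokens(string path)
{
    // readText loads the entire file as a string; splitter then lazily
    // yields slices of that string, and array collects the slices.
    return splitter(readText(path)).array;
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: pass the name of the file to tokenize");
        return;
    }

    foreach (token; readTokens(args[1]))
        writeln(token);
}
```

The array call only stores the slices, not copies of the text, so the file
contents still exist in memory once. If the tokens are consumed in a single
pass, the array call can be dropped and the splitter result iterated directly.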

At some point, we will have full, range-compatible stream support in Phobos, 
and the situation will definitely improve, but for now, those are probably your 
best options.

- Jonathan M Davis

