block file reads and lazy utf-8 decoding
Jon D via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed Dec 9 16:36:27 PST 2015
I want to combine block reads with lazy decoding of UTF-8 text
into dchars. The solution I came up with is in the program
below. It works fine and has good performance.
My question is whether there is a better way to do this, for
example a different way to construct the lazy 'decodeUTF8Range'
rather than writing it out by hand as I did. There is quite a bit
of power in the library and I'm still learning it, so I'm wondering
if I overlooked a useful alternative.
--Jon
Program:
-----------
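import std.algorithm: each, joiner, map;
import std.conv;
import std.range;
import std.stdio;
import std.traits;
import std.utf: decodeFront;

/// Lazily decodes an input range of UTF-8 code units (char) into dchars.
auto decodeUTF8Range(Range)(Range charSource)
    if (isInputRange!Range && is(Unqual!(ElementType!Range) == char))
{
    static struct Result
    {
        private Range source;
        private dchar next;     // the current, already decoded code point
        bool empty = false;     // set once the source is exhausted

        dchar front() @property { return next; }

        void popFront()
        {
            if (source.empty)
            {
                empty = true;
                next = dchar.init;
            }
            else
            {
                // Consumes one complete UTF-8 sequence from source;
                // throws a UTFException on invalid input.
                next = source.decodeFront;
            }
        }
    }

    auto r = Result(charSource);
    r.popFront;     // prime the range so front is valid right away
    return r;
}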
void main(string[] args)
{
    if (args.length != 2)
    {
        writeln("Provide one file name.");
        return;
    }

    ubyte[1024*1024] rawbuf;
    auto inputStream = args[1].File();
    inputStream
        .byChunk(rawbuf)         // Read in blocks
        .joiner                  // Join the blocks into a single ubyte input range
        .map!(a => to!char(a))   // Convert ubyte to char for decodeFront. Any better ways?
        .decodeUTF8Range         // UTF-8 to dchar conversion.
        .each;                   // Real work goes here.
    writeln("done");
}
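For comparison, here is a minimal sketch of the same pipeline with the hand-written range replaced by Phobos' `std.utf.byDchar`, which lazily decodes an input range of chars into dchars. It also spells the ubyte-to-char step as `cast(char)` and uses the `byChunk(size_t)` overload, which allocates its own buffer. The helper `countCodePoints` and "counting code points" as the real work are hypothetical stand-ins, not part of the original program. One behavioral difference to be aware of, assuming default template arguments: `byDchar` replaces invalid UTF-8 sequences with U+FFFD, while `decodeFront` as used above throws a `UTFException`.

```d
// Sketch, not the original program: decodeUTF8Range is replaced by
// Phobos' std.utf.byDchar. countCodePoints is a hypothetical helper
// standing in for the "real work".
import std.algorithm : joiner, map;
import std.stdio : File, writeln;
import std.utf : byDchar;

/// Hypothetical helper: lazily decodes a range of ubyte[] chunks as UTF-8
/// and returns the number of decoded code points.
size_t countCodePoints(R)(R chunks)
{
    size_t count = 0;
    auto decoded = chunks
        .joiner                    // single input range of ubyte
        .map!(a => cast(char) a)   // reinterpret each byte as a UTF-8 code unit
        .byDchar;                  // lazy UTF-8 -> dchar decoding
    foreach (dchar c; decoded)
        ++count;                   // real work would use c here
    return count;
}

void main(string[] args)
{
    if (args.length != 2)
    {
        writeln("Provide one file name.");
        return;
    }
    auto inputStream = File(args[1]);
    // byChunk(size_t) allocates one 1 MB buffer and reuses it for every block.
    writeln(countCodePoints(inputStream.byChunk(1024 * 1024)), " code points");
    writeln("done");
}
```

Because `joiner` presents the blocks as one continuous range, a multi-byte sequence that straddles two blocks is still decoded correctly, and only one block buffer is resident at a time regardless of file size.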