Lazily parse a JSON text file using stdx.data.json?

Jonathan M Davis newsgroup.d at jmdavisprog.com
Sun Dec 17 09:44:21 UTC 2017


On Saturday, December 16, 2017 21:34:22 David Gileadi via Digitalmars-d 
wrote:
> I'm a longtime fan of dlang, but haven't had a chance to do much
> in-depth dlang programming, and especially not range programming. Today
> I thought I'd use stdx.data.json to read from a text file. Since it's a
> somewhat large file, I thought I'd create a text range from the file and
> parse it that way. stdx.data.json has a great interface for lazily
> parsing text into JSON values, so all I had to do was turn a text file
> into a lazy range of UTF-8 chars that stdx.data.json's lexer could use.
> (In my best Clarkson voice:) How hard could it be?
>
> Several hours later, I've finally given up and am just reading the whole
> file into a string. There may be a magic incantation I could use to make
> it work, but I can't find it, and frankly I can't see why I should need
> an incantation in the first place. It really ought to just be a method
> of std.stdio.File.
>
> Apparently some of the complexity is caused by autodecoding (e.g. joiner
> returns a range of dchar from char ranges), and some of the fault may be
> in stdx.data.json, but either way I'm surprised that I couldn't do it.
> This is the kind of thing I expected to be ground level stuff.

I don't know what problems specifically you were hitting, but a lot of
range-based code (especially parsing) requires forward ranges so that it can
do some amount of lookahead (having only a basic input range can be
incredibly restrictive). Forward ranges and lazily reading from a file don't
go together very well, because save has to produce an independent copy of
the range, which tends to mean allocating buffers that then have to be
copied on save. That gets rather difficult to do efficiently. std.stdio.File
does support reading a file lazily (e.g. via byLine or byChunk, which are
input ranges that reuse a single buffer), and that works well with foreach,
but if you're trying to process the entire file as a range, it's usually far
easier to read in the entire file at once and operate on it as a dynamic
array. The halfway option is to use std.mmfile so that the file gets treated
as a dynamic array while the OS reads it in piecemeal for you. If I were
seriously looking at reading in a file lazily as a forward range, I'd look
at http://code.dlang.org/packages/iopipe, though as I understand it, it's
very much a work in progress.
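As a rough illustration of the three options above, here is a minimal D sketch. It assumes a small hypothetical file (`example.json` in the temp directory), and the helper names `readWhole`, `withMapped`, and `countBytesLazily` are made up for this example. It only shows how the bytes are obtained; it does not call stdx.data.json. The string from readText (or the mapped slice) is the kind of array you would then hand to a parser that wants random access or a cheap save().

```d
import std.file : readText, remove, tempDir, write;
import std.mmfile : MmFile;
import std.path : buildPath;
import std.stdio : File, writeln;

/// Option 1: read the whole file into memory as one string.
string readWhole(string path)
{
    return readText(path);
}

/// Option 2: memory-map the file and pass it to dg as a plain array.
/// The OS pages the data in on demand; the slice is only valid inside dg.
void withMapped(string path, scope void delegate(const(char)[] text) dg)
{
    auto mm = new MmFile(path);
    scope(exit) destroy(mm); // unmap and close the file afterwards
    dg(cast(const(char)[]) mm[]);
}

/// Option 3: lazy chunked reading. byChunk is an input range that reuses
/// one buffer, so it works with foreach but is not a forward range.
size_t countBytesLazily(string path, size_t chunkSize = 4096)
{
    size_t total;
    foreach (ubyte[] chunk; File(path).byChunk(chunkSize))
        total += chunk.length;
    return total;
}

void main()
{
    // Hypothetical sample file standing in for the large JSON input.
    auto path = buildPath(tempDir(), "example.json");
    write(path, `{"name": "dlang", "tags": ["fast", "safe"]}`);
    scope(exit) remove(path);

    auto whole = readWhole(path);
    withMapped(path, (const(char)[] text) { assert(text == whole); });
    assert(countBytesLazily(path, 16) == whole.length);
    writeln(whole.length, " bytes");
}
```

The design difference is that byChunk hands out the same buffer on every iteration, so it cannot offer save() without copying, whereas the readText and MmFile versions give you a plain array, which is trivially a random-access range.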

As for auto-decoding (Phobos treating arrays of char and wchar as ranges of
dchar, decoding code points as it iterates), yeah, it sucks. You can work
around it with things like std.utf.byCodeUnit, but auto-decoding is a
problem all around, and it's one that we're likely stuck with, because
unfortunately, we haven't found a way to remove it without breaking
everything.
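A minimal sketch of what auto-decoding looks like in practice and how std.utf.byCodeUnit sidesteps it, including the joiner case from the original question. The helper names decodedLength and codeUnitLength are hypothetical, and the element types are compared with Unqual so the checks don't depend on whether a given Phobos version reports them as char or immutable(char).

```d
import std.algorithm.iteration : joiner, map;
import std.range.primitives : ElementType, walkLength;
import std.traits : Unqual;
import std.utf : byCodeUnit;

/// Number of elements range code sees when iterating s directly (decoded).
size_t decodedLength(string s) { return s.walkLength; }

/// Number of elements seen once decoding is switched off with byCodeUnit.
size_t codeUnitLength(string s) { return s.byCodeUnit.walkLength; }

void main()
{
    string s = "héllo"; // 'é' (U+00E9) takes two UTF-8 code units

    // Auto-decoding: range code sees a string as a range of dchar.
    static assert(is(ElementType!string == dchar));
    assert(s.length == 6);         // code units stored in the array
    assert(decodedLength(s) == 5); // code points produced by front/popFront

    // byCodeUnit opts out: elements are the raw UTF-8 code units.
    static assert(is(Unqual!(ElementType!(typeof(s.byCodeUnit))) == char));
    assert(codeUnitLength(s) == 6);

    // Same story with joiner: joining strings yields dchar, while
    // joining byCodeUnit ranges keeps the elements as char.
    auto parts = ["hé", "llo"];
    static assert(is(ElementType!(typeof(parts.joiner)) == dchar));
    auto raw = parts.map!byCodeUnit.joiner;
    static assert(is(Unqual!(ElementType!(typeof(raw))) == char));
    assert(raw.walkLength == 6);
}
```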

- Jonathan M Davis



More information about the Digitalmars-d mailing list