What is the best way to use requests and iopipe on gzipped JSON file
Andrew Edwards
edwards.ac at gmail.com
Fri Oct 13 22:24:01 UTC 2017
On Friday, 13 October 2017 at 21:53:12 UTC, Steven Schveighoffer
wrote:
> On 10/13/17 4:27 PM, Andrew Edwards wrote:
>> On Friday, 13 October 2017 at 19:17:54 UTC, Steven
>> Schveighoffer wrote:
>>> On 10/13/17 2:47 PM, Andrew Edwards wrote:
>>>> A bit of advice, please. I'm trying to parse a gzipped JSON
>>>> file retrieved from the internet. The following naive
>>>> implementation accomplishes the task:
>>>>
>>>> auto url =
>>>> "http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
>>>> getContent(url)
>>>> .data
>>>> .unzip
>>>> .runEncoded!((input) {
>>>> ubyte[] content;
>>>> foreach (line; input.byLineRange!true) {
>>>> content ~= cast(ubyte[])line;
>>>> }
>>>> auto json = (cast(string)content).parseJSON;
>>>
>>> input is an iopipe of char, wchar, or dchar. There is no need
>>> to cast it around.
>>
>> In this particular case, all three types (char[], wchar[], and
>> dchar[]) are being returned at different points in the loop. I
>> don't know of any other way to generate a unified buffer than
>> casting it to ubyte[].
>
> This has to be a misunderstanding. The point of runEncoded is
> to figure out the correct type (based on the BOM), and run your
> lambda function with the correct type for the whole thing.
Maybe I'm just not finding the correct words to express my
thoughts. This is what I mean:
// ===========
void main()
{
    auto url =
        "http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
    getContent(url)
        .data
        .unzip
        .runEncoded!((input) {
            char[] content; // Line 20
            foreach (line; input.byLineRange!true) {
                content ~= line;
            }
        });
}
output:
source/app.d(20,13): Error: cannot append type wchar[] to type char[]
Changing line 20 to wchar[] yields:
source/app.d(20,13): Error: cannot append type char[] to type wchar[]
And changing it to dchar[] yields:
source/app.d(20,13): Error: cannot append type char[] to type dchar[]
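These three errors are consistent with the lambda being a template that gets compiled once per character width, so a hard-coded buffer type can only match one of the three instantiations. The sketch below shows one way around that: derive the buffer's element type from the data instead of naming it. It uses only Phobos. forEachWidth is a hypothetical stand-in for runEncoded, and collect is a made-up helper; neither is part of iopipe.

```d
import std.range.primitives : ElementEncodingType;
import std.stdio : writeln;
import std.traits : Unqual;

// Hypothetical stand-in for runEncoded: it instantiates the template lambda
// once per character width, so the lambda body has to compile for all three.
void forEachWidth(alias fn)()
{
    fn("abc".dup);  // char[]
    fn("abc"w.dup); // wchar[]
    fn("abc"d.dup); // dchar[]
}

// Hypothetical helper: appends every line to a buffer whose element type
// follows the input (char, wchar, or dchar) instead of being hard-coded.
auto collect(Line)(Line[] lines)
{
    alias Char = Unqual!(ElementEncodingType!Line);
    Char[] content;
    foreach (line; lines)
        content ~= line;
    return content;
}

void main()
{
    forEachWidth!((line) {
        // The same body compiles for every width because no type is named.
        auto content = collect([line, line]);
        writeln(typeof(content).stringof, " ", content.length);
    });
}
```

Inside the real lambda the same idea would presumably be applied to the element type of input.window, but that part is an assumption and is not tested here.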
> I'm not sure actually this is even needed, as the data could be
> coming through without a BOM. Without a BOM, it assumes UTF8.
>
>>> Note also that getContent returns a complete body, but unzip
>>> may not be so forgiving. But there definitely isn't a reason
>>> to create your own buffer here.
>>>
>>> this should work (something like this really should be in
>>> iopipe):
>>>
>>> while(input.extend(0) != 0) {} // get data until EOF
>>
>> This!!! This is what I was looking for. Thank you. I
>> incorrectly assumed that if I didn't process the content of
>> input.window, it would be overwritten on each .extend() so my
>> implementation was:
>>
>> ubyte[] json;
>> while(input.extend(0) != 0) {
>> json ~= input.window;
>> }
>>
>> This didn't work because it invalidated the Unicode data so I
>> ended up splitting by line instead.
>>
>> Sure enough, this is trivial once one knows how to use it
>> correctly, but I think it would be better to put this in the
>> library as extendAll().
>
> ensureElems(size_t.max) should be equivalent, though I see you
> responded cryptically with something about JSON there :)
:) I'll have to blame it on my Security+ training. Replacing the
while loop with ensureElems() in the following results in an
error:
void main()
{
    auto url =
        "http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
    getContent(url)
        .data
        .unzip
        .runEncoded!((input) {
            // while(input.extend(0) != 0){} // this works
            input.ensureElems(size_t.max); // this doesn't
            auto json = input.window.parseJSON;
            foreach (size_t ndx, _; json) {
                if (ndx == 0) continue;
                auto title = json[ndx]["title"].str;
                auto author = json[ndx]["writer"].str;
                writefln("title: %s", title);
                writefln("author: %s\n", author);
            }
        });
}
output:
Running ./uhost
std.json.JSONException@std/json.d(1400): Unexpected end of data. (Line 1:8192)
----------------
4 uhost 0x000000010b671112 pure @safe void std.json.parseJSON!(char[]).parseJSON(char[], int, std.json.JSONOptions).error(immutable(char)[]) + 86
[etc]
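
The position reported in the exception (Line 1:8192) suggests the window held only 8192 characters when parseJSON ran, i.e. a partial document. The loop that works could be wrapped in a helper like the extendAll() proposed earlier in the thread. Below is a minimal sketch of that idea. ToyPipe is a made-up stand-in that only mimics the window/extend behaviour described above (already-buffered data stays in the window across extend calls); it is not an iopipe type. extendAll is hypothetical and not part of iopipe, and the JSON text is invented sample data.

```d
import std.algorithm.comparison : min;
import std.json : parseJSON;
import std.stdio : writeln;

// Toy stand-in for an iopipe (an assumption for illustration only): each
// extend() exposes up to `chunk` more elements, and the window keeps
// everything exposed so far.
struct ToyPipe
{
    string source;
    size_t visible;
    size_t chunk = 8;

    @property string window() { return source[0 .. visible]; }

    // Mimics extend as used in the thread: returns the number of elements
    // added, and 0 once the source is exhausted. The argument is ignored here.
    size_t extend(size_t elements)
    {
        immutable added = min(chunk, source.length - visible);
        visible += added;
        return added;
    }
}

// Hypothetical helper: keep extending until EOF, then report the window size.
size_t extendAll(Pipe)(ref Pipe pipe)
{
    while (pipe.extend(0) != 0) {}
    return pipe.window.length;
}

void main()
{
    auto pipe = ToyPipe(`[{"count":1},{"title":"t","writer":"w"}]`);

    pipe.extend(0);
    // Only the first chunk is visible here; parsing now would hit
    // "Unexpected end of data", much like the failure above.
    writeln("after one extend: ", pipe.window.length);

    writeln("after extendAll: ", extendAll(pipe));
    auto json = parseJSON(pipe.window);
    writeln("title: ", json[1]["title"].str);
}
```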