What is the best way to use requests and iopipe on a gzipped JSON file

Andrew Edwards edwards.ac at gmail.com
Fri Oct 13 22:24:01 UTC 2017


On Friday, 13 October 2017 at 21:53:12 UTC, Steven Schveighoffer 
wrote:
> On 10/13/17 4:27 PM, Andrew Edwards wrote:
>> On Friday, 13 October 2017 at 19:17:54 UTC, Steven 
>> Schveighoffer wrote:
>>> On 10/13/17 2:47 PM, Andrew Edwards wrote:
>>>> A bit of advice, please. I'm trying to parse a gzipped JSON 
>>>> file retrieved from the internet. The following naive 
>>>> implementation accomplishes the task:
>>>>
>>>>      auto url = 
>>>> "http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
>>>>      getContent(url)
>>>>          .data
>>>>          .unzip
>>>>          .runEncoded!((input) {
>>>>              ubyte[] content;
>>>>              foreach (line; input.byLineRange!true) {
>>>>                  content ~= cast(ubyte[])line;
>>>>              }
>>>>              auto json = (cast(string)content).parseJSON;
>>>
>>> input is an iopipe of char, wchar, or dchar. There is no need 
>>> to cast it around.
>> 
>> In this particular case, all three types (char[], wchar[], and 
>> dchar[]) are being returned at different points in the loop. I 
>> don't know of any other way to generate a unified buffer than 
>> casting it to ubyte[].
>
> This has to be a misunderstanding. The point of runEncoded is 
> to figure out the correct type (based on the BOM), and run your 
> lambda function with the correct type for the whole thing.

Maybe I'm just not finding the correct words to express my 
thoughts. This is what I mean:

// ===========

void main()
{
	auto url = "http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
	getContent(url)
		.data
		.unzip
		.runEncoded!((input) {
			char[] content; // Line 20
			foreach (line; input.byLineRange!true) {
				content ~= line;
			}
		});
}

output:
source/app.d(20,13): Error: cannot append type wchar[] to type char[]

Changing line 20 to wchar[] yields:
source/app.d(20,13): Error: cannot append type char[] to type wchar[]

And changing it to dchar[] yields:
source/app.d(20,13): Error: cannot append type char[] to type dchar[]
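A hedged reading of these three errors, as a minimal D sketch: if the character type is chosen at run time (runEncoded picks it from the BOM), the compiler must instantiate the lambda once for every candidate type, so a buffer hard-coded as char[] breaks the wchar and dchar instantiations even though only one of them ever runs. `withEncoding` and `joinLines` are hypothetical stand-ins, not iopipe code; the sketch only shows that declaring the buffer with `typeof` lets every instantiation compile.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical stand-in for runEncoded (not iopipe's code): the character
// width is picked at run time, so `fn` is compiled for char, wchar AND dchar.
string withEncoding(alias fn)(int bits)
{
    switch (bits)
    {
    case 8:
        return fn(["Hello, ".dup, "world".dup]);   // char[][]
    case 16:
        return fn(["Hello, "w.dup, "world"w.dup]); // wchar[][]
    default:
        return fn(["Hello, "d.dup, "world"d.dup]); // dchar[][]
    }
}

string joinLines(int bits)
{
    return withEncoding!((lines) {
        // `char[] content;` would not compile for the wchar and dchar
        // instantiations ("cannot append type wchar[] to type char[]").
        typeof(lines[0]) content; // char[], wchar[] or dchar[]
        foreach (line; lines)
            content ~= line;
        return content.to!string; // one result type for every instantiation
    })(bits);
}

void main()
{
    foreach (bits; [8, 16, 32])
        writeln(joinLines(bits)); // prints "Hello, world" three times
}
```

Converting to a single result type at the end is one way to keep the dispatcher's return type fixed; whether that is desirable in the real program depends on what the lambda needs to hand back.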

> I'm not sure actually this is even needed, as the data could be 
> coming through without a BOM. Without a BOM, it assumes UTF8.
>
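The BOM rule described above can be sketched as follows. `detectEncoding` and the `Encoding` enum are hypothetical helpers written for illustration, not iopipe's implementation; they only encode the standard byte-order marks plus the stated fallback that data without a BOM is treated as UTF-8.

```d
import std.stdio : writeln;
import std.string : representation;

// Hypothetical enum for this sketch: the encodings a BOM can announce.
enum Encoding { utf8, utf16le, utf16be, utf32le, utf32be }

// Sniff the byte-order mark at the start of `data`; no BOM means UTF-8.
Encoding detectEncoding(const(ubyte)[] data)
{
    bool hasPrefix(const(ubyte)[] bom)
    {
        return data.length >= bom.length && data[0 .. bom.length] == bom;
    }

    // UTF-32LE must be checked before UTF-16LE: both start with FF FE.
    if (hasPrefix([0xFF, 0xFE, 0x00, 0x00])) return Encoding.utf32le;
    if (hasPrefix([0x00, 0x00, 0xFE, 0xFF])) return Encoding.utf32be;
    if (hasPrefix([0xFF, 0xFE])) return Encoding.utf16le;
    if (hasPrefix([0xFE, 0xFF])) return Encoding.utf16be;
    return Encoding.utf8; // EF BB BF, or no BOM at all
}

void main()
{
    writeln(detectEncoding(`{"a":1}`.representation)); // utf8
    writeln(detectEncoding([0xEF, 0xBB, 0xBF, 0x7B]));  // utf8
    writeln(detectEncoding([0xFF, 0xFE, 0x7B, 0x00]));  // utf16le
}
```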
>>> Note also that getContent returns a complete body, but unzip 
>>> may not be so forgiving. But there definitely isn't a reason 
>>> to create your own buffer here.
>>>
>>> this should work (something like this really should be in 
>>> iopipe):
>>>
>>> while(input.extend(0) != 0) {} // get data until EOF
>> 
>> This!!! This is what I was looking for. Thank you. I 
>> incorrectly assumed that if I didn't process the content of 
>> input.window, it would be overwritten on each .extend() so my 
>> implementation was:
>> 
>> ubyte[] json;
>> while(input.extend(0) != 0) {
>>      json ~= input.window;
>> }
>> 
>> This didn't work because it invalidated the Unicode data so I 
>> ended up splitting by line instead.
>> 
>> Sure enough, this is trivial once one knows how to use it 
>> correctly, but I think it would be better to put this in the 
>> library as extendAll().
>
> ensureElems(size_t.max) should be equivalent, though I see you 
> responded cryptically with something about JSON there :)
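To make the buffering behaviour concrete, here is a toy in-memory model under the assumptions described in the thread: `window` holds everything read so far (nothing is released in this model), and `extend(0)` reads whatever amount is convenient, returning 0 at end of input. `ToyPipe` is hypothetical and is not iopipe's code; `extendAll` is a sketch of the helper proposed above, not an existing library function. `appendEachTime` shows one way the append-after-every-extend loop goes wrong in this model: each pass copies the already-buffered prefix again.

```d
import std.algorithm.comparison : min;
import std.stdio : writeln;

// Hypothetical toy model of an iopipe-style pipe (not iopipe's code).
// `window` is everything read so far; nothing is ever released here, so
// calling `extend` only makes the window longer, it never overwrites it.
struct ToyPipe
{
    const(char)[] source;  // pretend output of getContent + unzip
    size_t end;            // how much of `source` is in the window
    enum size_t chunk = 4; // tiny read size so several reads are needed

    const(char)[] window() const { return source[0 .. end]; }

    // In this model, 0 means "read a convenient amount". Returns the
    // number of elements added; 0 signals end of input.
    size_t extend(size_t elements)
    {
        immutable size_t want = elements == 0 ? chunk : elements;
        immutable size_t added = min(want, source.length - end);
        end += added;
        return added;
    }
}

// Sketch of the proposed extendAll(): extend until EOF, return the total.
size_t extendAll(P)(ref P pipe)
{
    size_t total, n;
    while ((n = pipe.extend(0)) != 0)
        total += n;
    return total;
}

// The pitfall: appending the whole window after every extend copies the
// already-buffered prefix again on each pass.
string appendEachTime(const(char)[] text)
{
    auto pipe = ToyPipe(text);
    char[] json;
    while (pipe.extend(0) != 0)
        json ~= pipe.window;
    return json.idup;
}

void main()
{
    enum text = `{"k":[1,2]}`;
    auto pipe = ToyPipe(text);
    writeln(extendAll(pipe));      // 11
    writeln(pipe.window);          // {"k":[1,2]}
    writeln(appendEachTime(text)); // {"k"{"k":[1,{"k":[1,2]}
}
```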

:) I'll have to blame it on my Security+ training. Replacing the
while loop with ensureElems() in the following results in an
error:

void main()
{
	auto url = "http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
	getContent(url)
		.data
		.unzip
		.runEncoded!((input) {
			// while(input.extend(0) != 0){} // this works
			input.ensureElems(size_t.max); // this doesn't
			auto json = input.window.parseJSON;
			foreach (size_t ndx, _; json) {
				if (ndx == 0) continue;
				auto title = json[ndx]["title"].str;
				auto author = json[ndx]["writer"].str;
				writefln("title: %s", title);
				writefln("author: %s\n", author);
			}
		});
}
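The JSON walk in main can also be exercised offline with std.json alone. The sketch below assumes, as the original loop implies by skipping index 0, that the first array element is a metadata record and that each later element has `title` and `writer` string fields. The `Novel` struct, the `novels` function and the sample text are made up for illustration and are not real API output.

```d
import std.json : parseJSON;
import std.stdio : writefln;

// Hypothetical record type for this sketch.
struct Novel
{
    string title;
    string writer;
}

// Assumes element 0 is a metadata record (the original loop skips it) and
// every later element is an object with "title" and "writer" strings.
Novel[] novels(string text)
{
    auto json = parseJSON(text);
    Novel[] result;
    foreach (entry; json.array[1 .. $])
        result ~= Novel(entry["title"].str, entry["writer"].str);
    return result;
}

void main()
{
    // Made-up sample shaped like the response in the thread.
    enum sample = `[{"count":2},`
        ~ `{"title":"First","writer":"A"},`
        ~ `{"title":"Second","writer":"B"}]`;
    foreach (novel; novels(sample))
    {
        writefln("title: %s", novel.title);
        writefln("author: %s\n", novel.writer);
    }
}
```

Slicing off the first element up front avoids the `if (ndx == 0) continue;` check inside the loop; the output lines match the format used in main above.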

output:

Running ./uhost
std.json.JSONException@std/json.d(1400): Unexpected end of data. (Line 1:8192)
----------------
4   uhost                               0x000000010b671112 pure @safe void std.json.parseJSON!(char[]).parseJSON(char[], int, std.json.JSONOptions).error(immutable(char)[]) + 86

[etc]




More information about the Digitalmars-d-learn mailing list