What is the best way to use requests and iopipe on gzipped JSON file

Steven Schveighoffer schveiguy at yahoo.com
Tue Oct 17 13:17:42 UTC 2017


On 10/17/17 4:33 AM, ikod wrote:
> Hello, Steve
> 
> On Friday, 13 October 2017 at 22:22:54 UTC, Steven Schveighoffer wrote:
>> On 10/13/17 6:18 PM, ikod wrote:
>>> On Friday, 13 October 2017 at 19:17:54 UTC, Steven Schveighoffer wrote:
>>>>
>>>> Eventually, something like this will be possible with jsoniopipe (I 
>>>> need to update and release this too, it's probably broken with some 
>>>> of the changes I just put into iopipe). Hopefully combined with some 
>>>> sort of networking library you could process a JSON stream without 
>>>> reading the whole thing into memory.
>>>
>>> This can be done with requests. You can ask it not to load the whole 
>>> content into memory, but instead to produce an input range, which will 
>>> continue to load data from the server as you consume it:
>>>
>>>      auto rq = Request();
>>>      rq.useStreaming = true;
>>>      auto rs = rq.get("http://httpbin.org/image/jpeg");
>>>      auto stream = rs.receiveAsRange();
>>>      while(!stream.empty) {
>>>          // stream.front contains the next portion of data
>>>          writefln("Received %d bytes, total received %d from document length %d",
>>>                   stream.front.length, rq.contentReceived, rq.contentLength);
>>>          stream.popFront; // continue to load from server
>>>      }
>>
>> Very nice, I will add a component to iopipe that converts a 
>> "chunk-like" range like this into an iopipe source, as this is going 
>> to be needed to interface with existing libraries. I still will want 
>> to skip the middle man buffer at some point though :)
>>
>> Thanks!
>>
> 
> Just to complete the picture here: getContent returns not just ubyte[], 
> but a richer structure (which can be converted to ubyte[] if needed). 
> Basically it is an immutable(immutable(ubyte)[]), and almost all of the 
> data in it is just the data received from the network, without any copying.

Right, iopipe can use it just fine, without copying, as all arrays are 
also iopipes. In that case, it skips allocating a buffer, because there 
is no need.

However, I would rather avoid allocating the whole thing in memory, 
which is why I prefer the range interface. In that case, though, iopipe 
needs to copy each chunk into its own buffer.
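
The per-chunk copy described above can be sketched roughly as follows, in 
plain D with no iopipe or requests dependency. ChunkBuffer and its members 
(window, extend, release) are hypothetical names chosen for illustration, 
not iopipe's actual API; the only point is that each chunk handed out by a 
range has to be copied into a buffer the consumer owns.

```d
import std.range.primitives;
import core.stdc.string : memmove;

// Hypothetical adapter (an assumption, not an iopipe component): pulls
// byte chunks from an input range and copies them into its own buffer.
struct ChunkBuffer(R)
    if (isInputRange!R && is(ElementType!R : const(ubyte)[]))
{
    private R source;
    private ubyte[] buffer; // storage owned by the adapter
    private size_t start;   // index of the first unreleased byte
    private size_t end;     // one past the last valid byte

    this(R source) { this.source = source; }

    // Data that has been buffered but not yet released.
    const(ubyte)[] window() const { return buffer[start .. end]; }

    // Pull one more chunk from the source.
    // Returns the number of bytes added (0 once the source is exhausted).
    size_t extend()
    {
        if (source.empty)
            return 0;
        auto chunk = source.front;
        // Move unreleased data to the front so released space is reused.
        if (start > 0)
        {
            memmove(buffer.ptr, buffer.ptr + start, end - start);
            end -= start;
            start = 0;
        }
        if (buffer.length < end + chunk.length)
            buffer.length = end + chunk.length;
        // This is the copy that a chunk-range interface cannot avoid.
        buffer[end .. end + chunk.length] = chunk[];
        end += chunk.length;
        source.popFront();
        return chunk.length;
    }

    // Mark the first n bytes of the window as consumed.
    void release(size_t n)
    {
        assert(n <= end - start);
        start += n;
    }
}

unittest
{
    ubyte[] a = [1, 2, 3];
    ubyte[] b = [4, 5];
    auto buf = ChunkBuffer!(ubyte[][])([a, b]);
    buf.extend();   // window now holds the first chunk
    buf.release(2); // consumer is done with two bytes
    buf.extend();   // second chunk is appended after the leftover byte
    assert(buf.window().length == 3);
}
```

With a whole array as the source there is nothing to copy, since the array 
already is the window; the extra buffer only appears once the data arrives 
piece by piece.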

In terms of being the most useful with the least copying, direct access 
to the stream itself would be best, which is why I said "skip the middle 
man". I feel like this won't be possible directly with requests and 
iopipe, because you need buffering to deal with parsing the headers. I 
think the optimal solution is probably going to be a system built on top 
of iopipe, using its buffers.

> There are more details and docs at 
> https://github.com/ikod/nbuff/blob/master/source/nbuff/buffer.d. The main 
> goal behind Buffer is to minimize data movement, but it also supports 
> many range properties, as well as some internally optimized methods.

I will take a look when I get a chance, thanks.

-Steve

