iopipe v0.0.4 - RingBuffers!

Fri May 11 23:46:16 UTC 2018

On Friday, 11 May 2018 at 13:28:58 UTC, Steven Schveighoffer 
wrote:
> On 5/11/18 1:30 AM, Dmitry Olshansky wrote:
>> On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer 
>> wrote:
>>> OK, so at DConf I spoke with a few very smart guys about how 
>>> I can use mmap to make a zero-copy buffer. And I implemented 
>>> this on the plane ride home.
>>>
>>> However, I am struggling to find a use case for this that 
>>> showcases why you would want to use it. While it does work, 
>>> and works beautifully, it doesn't show any measurable 
>>> difference vs. the array allocated buffer that copies data 
>>> when it needs to extend.
>> 
>> I’d start with something clinically synthetic.
>> Say your record size is exactly half of the buffer plus 1 byte. 
>> If you were to extend the size of the buffer, it would amortize.
>
> Hm.. this wouldn't work, because the idea is to keep some of 
> the buffer full. What will happen here is that the buffer will 
> extend to be able to accommodate the extra byte, and then you 
> are back to having less of the buffer full at once. Iopipe is 
> not afraid to increase the buffer :)

Then you can't test it that way.
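To put a rough number on the clinically synthetic case, here is a small Python model (a hypothetical helper, not part of iopipe) of a buffer whose capacity is fixed, i.e. one that is not allowed to grow the way iopipe does. Whenever less than one record is buffered, it slides the unconsumed tail to the front and refills, and the function counts the bytes moved. A double-mapped ring buffer skips that move entirely.

```python
def bytes_copied_linear(capacity: int, record: int, n_records: int) -> int:
    """Bytes moved by a fixed-size linear buffer that, whenever fewer than
    `record` bytes are buffered, slides the leftover tail to offset 0 and
    refills up to `capacity` (assumes an endless input source)."""
    if not 0 < record <= capacity:
        raise ValueError("record must fit in the buffer")
    start = end = 0  # buffered, unconsumed data is buf[start:end]
    copied = 0
    for _ in range(n_records):
        if end - start < record:
            copied += end - start      # memmove of the tail to the front
            start, end = 0, capacity   # refill the rest of the buffer
        start += record                # consume one record
    return copied
```

With a 16-byte buffer, 9-byte records (half plus one) move 7 bytes on every refill after the first, while 8-byte records move nothing. Scaled to 16 MB and 8M+1, that is close to 8 MB of copying per record, which is why the sizes need to exceed the CPU cache for the difference to show.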

>
>> 
>> Basically:
>> 16 Mb buffer fixed
>> vs
>> 16 Mb mmap-ed ring
>> 
>> Where you read pieces in 8M+1 blocks. Yes, we are aiming to 
>> blow the CPU cache there. Otherwise the CPU cache is so fast that 
>> an occasional copy is zilch; once we hit primary memory, it’s not. 
>> Adjust sizes for your CPU.
>
> This isn't how it will work. The system looks at the buffer and 
> says "oh, I can just read 8MB - 1 byte," which gives you 2 
> bytes less than you need. Then you need the extra 2 bytes, so 
> it will increase the buffer to hold at least 2 records.
>
> I do get the point of having to go outside the cache. I'll look 
> and see if maybe specifying a 1000-line context helps ;)

Nope. Consider reading binary records where you know the length in 
advance and can skip over each record without touching every byte. 
That's where it might help. If you touch every byte and do something 
with it, the cost of copying the tail is zilch.
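Where a ring buffer helps is exactly that tail copy. A plain ring buffer avoids it too, but unread data can wrap past the end of the array, so a reader gets it in two pieces or has to copy it into one; that is a big part of why normal ring buffers are slow to use. The mmap trick maps the same pages twice, back to back, so the unread window is always one contiguous slice. The Python sketch below (a hypothetical class, not iopipe's implementation) emulates that mirror in software by writing every byte to both halves of a doubled array.

```python
class MirroredRing:
    """Software emulation of a double-mapped ring buffer (illustration only).
    Storage is 2 * capacity bytes and every write lands at both i and
    i + capacity, so all unread bytes form one contiguous slice."""

    def __init__(self, capacity: int) -> None:
        self.cap = capacity
        self.buf = bytearray(2 * capacity)
        self.head = 0   # offset of the oldest unread byte, in [0, cap)
        self.size = 0   # number of unread bytes

    def write(self, data: bytes) -> None:
        if len(data) > self.cap - self.size:
            raise BufferError("ring buffer full")
        pos = (self.head + self.size) % self.cap
        for b in data:
            self.buf[pos] = b              # primary copy
            self.buf[pos + self.cap] = b   # mirror (free with the mmap trick)
            pos = (pos + 1) % self.cap
        self.size += len(data)

    def window(self) -> memoryview:
        # head < cap and size <= cap, so this never runs off the array,
        # and bytes past cap are mirrors of the wrapped-around data.
        return memoryview(self.buf)[self.head:self.head + self.size]

    def consume(self, n: int) -> None:
        if n > self.size:
            raise ValueError("consuming more than is buffered")
        self.head = (self.head + n) % self.cap
        self.size -= n
```

Consuming only moves `head`; nothing is ever slid to the front. The price in this emulation is the doubled write, which the real mmap version avoids because both halves are the same physical pages.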

One example is the netstring, which looks like this:

13:Hello, world!,

Basically: the length in ASCII digits, a ':', exactly that many 
bytes (UTF-8 code units), then a trailing ','. No decoding necessary.

Torrent files (bencode) use the same length-prefix idea for strings, 
and maybe other formats do too. It's a nice example that avoids 
scanning to find delimiters.
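A minimal Python sketch of reading back-to-back netstrings (the function name is hypothetical). Only the few digits of the length prefix are searched; each payload is sliced out by its length rather than scanned for a delimiter, which is the access pattern where a copy-free buffer could matter. In a streaming reader, hitting the end of the buffer before the terminator would mean "extend the window and try again" rather than an error.

```python
from typing import List


def parse_netstrings(data: bytes) -> List[bytes]:
    """Split a buffer of back-to-back netstrings of the form <len>:<bytes>,"""
    out = []
    pos = 0
    while pos < len(data):
        colon = data.index(b":", pos)  # only the short length prefix is searched
        digits = data[pos:colon]
        if not digits.isdigit():
            raise ValueError(f"bad length prefix at offset {pos}")
        start = colon + 1
        end = start + int(digits)
        if data[end:end + 1] != b",":
            raise ValueError(f"missing ',' terminator at offset {end}")
        out.append(data[start:end])    # payload located by length, not by scanning
        pos = end + 1
    return out
```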

>
> Update: nope, still pretty much the same.
>
>> The amount of work done per byte though has to be minimal to 
>> actually see anything.
>
> Right, this is another part of the problem -- if copying is so 
> rare compared to the other operations, then the difference is 
> going to be lost in the noise.
>
> What I have learned here is:
>
> 1. Ring buffers are really cool (I still love how it works) and 
> perform as well as normal buffers

This is also good. Normal ring buffers usually suck in the speed 
department.

> 2. The use cases are much smaller than I thought
> 3. In most real-world applications, they are a wash, and not 
> worth the OS tricks needed to use them.
> 4. iopipe makes testing with a different kind of buffer really 
> easy, which was one of my original goals. So I'm glad that 
> works!
>
> I'm going to (obviously) leave them there, hoping that someone 
> finds a good use case, but I can say that my extreme excitement 
> at getting it to work was depressed quite a bit when I found it 
> didn't really gain much in terms of performance for the use 
> cases I have been doing.
>
>> Should be mostly trivial, in fact. I mean, back in our first 
>> designs for iopipe I already wanted regex to work with it.
>> 
>> Basically: if we've started a match, extend the window until we 
>> either complete it or lose it. Then release everything up to the 
>> next point where a match could start.
>
> I'm thinking it's even simpler than that. No match can span a 
> line break (that's how grep normally works), so you simply have 
> to split the input into lines and run the regex on each one. What 
> I don't know is how much it costs regex to start up and run on an 
> individual line.

For each call it's a malloc/free plus addRange/removeRange. In 2.080 
I optimized it to reuse the most recently used engine without these 
costs, but I'll have to check whether that covers all cases.
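The line-at-a-time approach Steve describes, sketched in Python rather than D (the function name is hypothetical). The pattern is compiled once and the same compiled object is reused for every line, which is the Python analogue of reusing the regex engine instead of setting one up per call. It says nothing about std.regex's actual per-call costs.

```python
import re
from typing import Iterable, Iterator, Tuple


def grep_lines(pattern: str, lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for each line the pattern matches.
    Matches can't span a newline, so each line is searched on its own."""
    rx = re.compile(pattern)  # compiled once, reused for every line
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if rx.search(line):
            yield lineno, line
```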

>
> One thing I could do to amortize is keep 2N lines in the 
> buffer, and run the regex on a whole context's worth of lines, 
> then dump them all.

I believe integrating iopipe awareness into regex will easily make 
it 50% faster. That's a guesstimate, though.
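Steve's amortization idea (keep a batch of lines, run the regex over the whole batch at once, then dump it) could look roughly like this in Python. The function name and the default batch of 1000 lines are illustrative assumptions, and the sketch assumes the pattern can never match a newline, so every match falls inside a single line; each match is mapped back to its line by offset.

```python
import re
from bisect import bisect_right
from typing import Iterable, Iterator, List


def grep_batched(pattern: str, lines: Iterable[str], batch: int = 1000) -> Iterator[str]:
    """Yield each matching line, running one regex scan per `batch` lines
    instead of one per line. Assumes the pattern never matches a newline."""
    rx = re.compile(pattern, re.MULTILINE)  # ^ and $ still anchor at line edges
    buf: List[str] = []

    def flush() -> Iterator[str]:
        chunk = "\n".join(buf)
        starts = [0]  # offset of each line within chunk
        for line in buf[:-1]:
            starts.append(starts[-1] + len(line) + 1)
        last = -1
        for m in rx.finditer(chunk):
            idx = bisect_right(starts, m.start()) - 1  # line containing the match
            if idx != last:  # report each line at most once
                last = idx
                yield buf[idx]
        buf.clear()  # dump the whole batch at once

    for line in lines:
        buf.append(line.rstrip("\n"))
        if len(buf) == batch:
            yield from flush()
    if buf:
        yield from flush()
```

The trade-off is one regex invocation per batch instead of per line, at the cost of joining the lines into one string; with a buffer that already holds the lines contiguously, that join would not be needed.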

>
> I don't get why grep is so bad at this, since it is supposedly

grep on the Mac is a piece of junk, sadly, and I don’t know exactly 
why (too old?). Use a third-party tool like ‘sift’, which is written 
in Go.

>
> -Steve