iopipe alpha 0.0.1 version

Fri Oct 13 18:39:46 UTC 2017

On 10/13/17 11:59 AM, Martin Nowak wrote:
> On Thursday, 12 October 2017 at 04:22:01 UTC, Steven Schveighoffer wrote:
>> I added a tag for iopipe and added it to the dub registry so people 
>> can try it out.
>>
>> I didn't want to add it until I had fully documented and unittested it.
>>
>> http://code.dlang.org/packages/iopipe
>> https://github.com/schveiguy/iopipe
> 
> Great news to see continued work on this.
> 
> I'll just use this thread to get started on design discussions. If there 
> is a better place for that, let me know ;).

This is as good a place as any :) I may create some issue reports on 
github to track things better.

> Questions/Ideas
> 
> - You can move docs out of the repo to fix search, e.g. by pushing them 
> to a `gh-pages` branch of your repo.

When I tried the search it seemed to work...

> See 
> https://github.com/MartinNowak/bloom/blob/736dc7a7ffcd2bbca7997f273a09e272e0484596/travis.sh#L13 
> for an automated setup using Travis-CI and ddox/scod.

I admit complete ignorance on this, I need to look into it, but at the 
moment, I'm OK with committing the generated docs directly as an ugly 
extra step. When I looked at the options for adding a "pages" site to 
the project, I saw that if I put things under a "docs" directory, it 
could use that, so that's what I went with.

> - Standard device implementation?
> 
>    Your library already has the notion of devices as thin abstractions 
> over file/socket handles.
>    Should we start with such an unbuffered IO library as a foundation, 
> including support hooks for Fiber-based event loops? Something along the 
> lines of https://code.dlang.org/packages/io? Without a standard device 
> lib, IOPipe could not be used in APIs.

I absolutely think this would be a great idea. In fact, you could use 
Jason White's io package with iopipes directly, as his low-level types 
have the necessary read function: 
https://github.com/jasonwhite/io/blob/master/source/io/file/stream.d#L335

Perhaps we could coax the basic types out of that library to provide a 
base for both iopipe and his high-level stuff. The stream portion of my 
library is really just a throwaway piece that is not a focus of the 
library. Indeed, I created it because unbuffered stream types didn't 
exist anywhere (the IODev type predates iopipe, as it was part of my 
original attempt to rewrite Phobos io).

> - What's the plan for @safe buffer/window invalidation, right now you're 
> handing out raw access to internal buffers with an inherent memory 
> safety problem.

I don't plan to put any restrictions on this. In fact the core purpose 
of iopipe is to give raw buffer access to aid in writing higher-level 
routines around it. As I said here: 
https://github.com/schveiguy/iopipe/blob/master/source/iopipe/buffer.d#L217

If the Allocator supports deallocation I call it, but it may not be the 
correct thing to do. There is a sticky point in 
std.experimental.allocator: the GC allocator defines deallocate, 
because it's available, but the *presence* of that member may be taken 
to mean you have to call it to deallocate. There is no member saying 
whether deallocation is optional.

In my wrapper GCNoPointerAllocator (which I needed to support allocating 
ubyte buffers without having to scan them), I leave out the deallocate 
function, so technically it's @safe with that allocator.
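
A minimal sketch of that idea, assuming only druntime's core.memory API. 
This is not the actual GCNoPointerAllocator source: the name 
NoScanGCAllocator is hypothetical, and a complete allocator for 
std.experimental.allocator would also need members such as alignment. 
The point is just that the memory is flagged NO_SCAN and that no 
deallocate member exists:

```d
import core.memory : GC;
import std.traits : hasMember;

// Hypothetical allocator: hands out GC memory that the collector will
// not scan for pointers, and deliberately has no deallocate member.
struct NoScanGCAllocator
{
    void[] allocate(size_t size)
    {
        if (size == 0)
            return null;
        // NO_SCAN tells the GC this block contains no pointers.
        auto p = GC.malloc(size, GC.BlkAttr.NO_SCAN);
        return p is null ? null : p[0 .. size];
    }
}

void main()
{
    NoScanGCAllocator a;
    auto buf = cast(ubyte[]) a.allocate(4096);
    assert(buf.length == 4096);
    // Code that frees only "if the allocator supports it" would test
    // for the member like this and skip the call here.
    static assert(!hasMember!(NoScanGCAllocator, "deallocate"));
}
```

Since nothing ever frees the block explicitly, an old slice into it 
stays valid for as long as something references it.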

I will say though, at some point, I'm going to focus on making @safe as 
much as possible in iopipe. That may require using the GC for buffering.

> 
>    ```d
>    auto w = f.window();
>    f.extend(random());
>    w[0]; // ⚡ dangling pointer ⚡
>    ```
> 
>    I can see how the compiler could catch that if we'd go with 
> compile-time enforced safety for RC and friends. But that's still 
> unclear atm. and we might end up with a runtime RC/weak ptr mechanism 
> instead, which wouldn't be too good a fit for that window mechanism.

What would be nice is a mechanism to detect this situation, since the 
above is both un-@safe and incorrect code.

Possibly you could instrument a window with a mechanism to check to see 
if it's still correct on every access, to be used when compiled in 
non-release mode for checking program correctness.
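
One way such instrumentation could look, as a rough sketch only: the 
buffer keeps a generation counter that is bumped whenever its storage 
may move, and each window remembers the generation it was created in. 
CheckedBuffer, CheckedWindow and valid are hypothetical names, not part 
of iopipe, and the window holds a raw pointer to its buffer, so the 
buffer must not be moved or destroyed while windows exist.

```d
struct CheckedBuffer
{
    private ubyte[] data;
    private size_t generation; // bumped whenever data may move

    CheckedWindow window()
    {
        return CheckedWindow(&this, generation, data);
    }

    void extend(size_t n)
    {
        data.length += n; // may reallocate
        ++generation;     // every existing window is now stale
    }
}

struct CheckedWindow
{
    private CheckedBuffer* owner;
    private size_t generation;
    private ubyte[] slice;

    bool valid() const { return owner.generation == generation; }

    ref ubyte opIndex(size_t i)
    {
        // assert is compiled out with -release, so the check costs
        // nothing in release builds.
        assert(valid, "window used after extend");
        return slice[i];
    }
}

void main()
{
    CheckedBuffer f;
    f.extend(16);
    auto w = f.window();
    w[0] = 42;        // fine: the window is current
    assert(w.valid);
    f.extend(1024);   // storage may have moved
    assert(!w.valid); // w[0] would now trip the assert in opIndex
}
```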

But in terms of @safe code in release mode, I think the only option is 
really to rely on the GC or reference counting to allow the window to 
still exist.

> 
> - What about the principle that the caller should choose 
> allocation/ownership?

It can, BufferManager takes an Allocator compile-time option.

It's also possible to create your own ownership or allocation scheme as 
long as you implement the required iopipe methods.

>    Having an extend method means the IOPipe is responsible for 
> growing/allocating buffers, so you'll end up with IOPipeMalloc, 
> IOPipeGC, IOPipeAllocatorGrowExp (or their template alternatives), not 
> very nice for APIs.

extend is a core part of the iopipe system. The point of the library is 
that your higher-level code doesn't have to manage buffering, memory 
ownership, or allocation. I've used so many buffered streams where I 
still had to create my own buffer because the way I needed to process 
the data didn't fit the API of the stream. This mitigates that by 
giving you direct control over 
how much data should be buffered, but not burdening you with the details 
of managing that memory. The mechanism was clear to me in Dmitry 
Olshansky's simple back-reference toy library that he made a while back 
(and actually was the inspiration for making iopipe instead of what I 
was doing before).

I can't find his library any more, but here is the post he made:

https://forum.dlang.org/post/l9q66g$2he3$1@digitalmars.com

> 
> - Why contiguous memory? The current implementation reallocs and, even 
> weirder, memmoves data in extend.
> https://github.com/schveiguy/iopipe/blob/3589a4c9fc72b844eb4efd3ae718773faf9ab9ed/source/iopipe/buffer.d#L171 
> 
>    Shouldn't a modern IO library be as zero-copy as possible?
>    The docs say random access, that should be supported by ringbuffers 
> or lists/arrays of buffers. Any plans towards that direction?

Yes and no :)

My original idea was that once I got simple array buffers working, I 
would move on to circular buffers, and linked lists of buffers, etc, 
with all the details hidden by the range itself. I still might implement 
this. Windows and Posix support the notion of scatter read so you can 
easily implement a way for streams to fit perfectly on top of these things.

But what I realized is that in practice (and especially when battling to 
beat Phobos byLine and libc's getline), avoiding copying may not be as 
important as I thought. For one thing, the focused data (the data you 
care about currently) is generally much smaller than the real buffer 
size. So when it is calling memmove, you are generally only moving a 
tiny piece of the buffer.
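
To illustrate the cost being described, here is a small sketch of 
moving only the still-needed bytes to the front of a buffer. The 
function name compact and its parameters are hypothetical, not iopipe's 
actual extend code:

```d
import core.stdc.string : memmove;

// Hypothetical helper: move the live bytes buf[start .. end] to the
// front of buf and return how many bytes were kept.
size_t compact(ubyte[] buf, size_t start, size_t end)
{
    assert(start <= end && end <= buf.length);
    immutable live = end - start;
    if (start != 0 && live != 0)
        memmove(buf.ptr, buf.ptr + start, live); // regions may overlap
    return live;
}

void main()
{
    ubyte[] buf = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    // Only the last three bytes are still needed, so only three bytes
    // are copied, regardless of how large the buffer is.
    immutable live = compact(buf, 7, 10);
    ubyte[] expected = [7, 8, 9];
    assert(live == 3);
    assert(buf[0 .. live] == expected);
}
```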

Second, the CPU is really good at dealing with arrays (and searching 
through arrays), especially when dereferencing data.

Third, every single access to a non-array is going to have to go through 
some mechanism to check which actual array the index falls into. When 
implementing iopipe's byline, I got a SIGNIFICANT speedup by copying 
members of the ByLine struct (e.g. the dchar being searched for) into a 
local variable. If you have a custom range for a circular buffer whose 
division point has to be read on every element index, the penalties are 
going to add up.
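
As a sketch of the per-access cost in question, assuming a simple 
circular layout (RingView and its fields are hypothetical, not an 
iopipe type): every index has to be compared against the division point 
before the real array can be touched, which a plain array access 
avoids.

```d
struct RingView
{
    ubyte[] storage;
    size_t head;   // position in storage of logical element 0
    size_t length; // number of logical elements

    ubyte opIndex(size_t i) const
    {
        assert(i < length);
        // The division point has to be consulted on every access.
        immutable split = storage.length - head;
        return i < split ? storage[head + i] : storage[i - split];
    }
}

void main()
{
    // Logical contents 0, 1, 2, 3, 4 stored wrapped around the end.
    ubyte[] storage = [3, 4, 0, 1, 2];
    auto r = RingView(storage, 2, 5);
    assert(r[0] == 0 && r[2] == 2); // before the split
    assert(r[3] == 3 && r[4] == 4); // after the wrap
}
```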

The trade-offs might still be worth it. For instance if your focused 
data is a larger percentage of the total buffer (like 70%), moving it to 
the front of the buffer is going to hurt performance. I don't know 
whether it would overcome slower access per element. The good news is, I 
can implement it and see how it fares, since the higher-level code is 
abstracted from the buffer type.

And of course, any existing (non-infinite) random-access range can be 
hooked as a non-extendable iopipe (see how arrays are hooked).
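
Roughly, hooking an array means giving it the iopipe primitives as free 
functions called through UFCS. The sketch below is an illustration 
written from memory, not a copy of the library's code; in particular 
the release primitive and the exact signatures are assumptions.

```d
// The whole array is always the current window.
T[] window(T)(T[] arr)
{
    return arr;
}

// An array can never pull in more data, so extend always reports 0.
size_t extend(T)(T[] arr, size_t elements)
{
    return 0;
}

// Releasing data drops elements from the front of the slice.
size_t release(T)(ref T[] arr, size_t elements)
{
    assert(elements <= arr.length);
    arr = arr[elements .. $];
    return elements;
}

void main()
{
    auto pipe = "hello world".dup;
    assert(pipe.extend(100) == 0);        // non-extendable
    assert(pipe.window == "hello world"); // everything is visible
    pipe.release(6);                      // done with "hello "
    assert(pipe.window == "world");
}
```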

Thanks for all your thoughts on this, Martin!

-Steve