randomIO, std.file, core.stdc.stdio

Tue Jul 26 13:01:17 PDT 2016

On 7/26/16 3:30 PM, Charles Hixson via Digitalmars-d-learn wrote:
> On 07/26/2016 11:31 AM, Steven Schveighoffer via Digitalmars-d-learn wrote:

>> Now, C i/o's buffering may not suit your exact needs. So I don't know
>> how it will perform. You may want to consider mmap which tells the
>> kernel to link pages of memory directly to disk access. Then the
>> kernel is doing all the buffering for you. Phobos has support for it,
>> but it's pretty minimal from what I can see:
>> http://dlang.org/phobos/std_mmfile.html
>>
> I've considered mmapfile often, but when I read the documentation I end
> up realizing that I don't understand it.  So I look up memory mapped
> files in other places, and I still don't understand it.  It looks as if
> the entire file is stored in memory, which is not at all what I want,
> but I also can't really believe that's what's going on.

Of course that isn't what is happening :)

What happens is that the kernel records that memory page 0x12345 (or
whatever) is mapped to the file. Then when you access a mapped page that
isn't loaded yet, the memory management unit raises a page fault, which
triggers the kernel to load that page. The kernel sees that the page is
really mapped to that file, and loads it from the file. As you write to
the memory location, the page is marked dirty, and at some point the
kernel flushes that page back to disk.

Everything is done behind the scenes and is in tune with the filesystem 
itself, so you get a little extra benefit from that.
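
A minimal C sketch of that sequence, assuming a POSIX system and calling
mmap directly rather than going through std.mmfile. The helper name
patch_via_mmap is made up for illustration, and error handling is kept to
the bare minimum:

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: overwrite `len` bytes at `offset` in an existing
   file by storing through a shared mapping instead of calling write().
   Returns 0 on success, -1 on failure. */
int patch_via_mmap(const char *path, size_t offset, const char *data, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0 ||
        offset > (size_t)st.st_size || len > (size_t)st.st_size - offset) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;

    /* Sets up the page-to-file association only; nothing is read yet. */
    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) return -1;

    /* The first touch of a page faults it in from the file; the store
       marks that page dirty. */
    memcpy(p + offset, data, len);

    /* Optional: ask for the dirty pages to be written back now rather
       than whenever the kernel gets around to it. */
    int rc = msync(p, size, MS_SYNC);
    if (munmap(p, size) != 0) rc = -1;
    return rc;
}
```

Calling patch_via_mmap("data.bin", 6, "WORLD", 5) on a file that is at
least 11 bytes long would change bytes 6..10 without any explicit
read/write call; "data.bin" is just a placeholder name.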

> I know that
> there was an early form of this in a version of BASIC (the version that
> RISS was written in, but I don't remember which version that was) and in
> *that* version array elements were read in as needed.  (It wasn't
> spectacularly efficient.)  But memory mapped files don't seem to work
> that way, because people keep talking about how efficient they are.  Do
> you know a good introductory tutorial?  I'm guessing that "window size"
> might refer to the number of bytes available, but what if you need to
> append to the file?  Etc.

To be honest, I'm not super familiar with actually using them; I just
have a rough idea of how they work. You will have to look up the actual
usage.

> A part of the problem is that I don't want this to be a process with an
> arbitrarily high memory use.

You should know that you can allocate as much memory as you want, as
long as you have address space for it, and the kernel won't actually back
it with physical memory until you use it. So the management of the memory
is done lazily, all supported by the MMU hardware. This is true for
ordinary (non-file) memory too!

Note that the only "memory" you are using for the mmapped file is the
page buffers in the kernel, which are likely already being used to buffer
the disk reads. It's not like it's loading the entire file into memory,
and it probably doesn't even load all sequential pages into memory. It
only loads the ones you use.
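
For random access you also don't have to map the whole file. A sketch in
C, again assuming POSIX mmap and using an invented helper name
(read_window), that maps only the pages covering one record:

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes starting at file offset `pos`
   into `out`, mapping only the pages that cover that range.
   Returns 0 on success, -1 on failure or if the range is outside the file. */
int read_window(const char *path, off_t pos, void *out, size_t len)
{
    long psz = sysconf(_SC_PAGESIZE);
    if (psz <= 0 || pos < 0 || len == 0) return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || pos > st.st_size ||
        (off_t)len > st.st_size - pos) {
        close(fd);
        return -1;
    }

    /* mmap offsets must be page-aligned: round down, remember the slack. */
    off_t base = pos - (pos % psz);
    size_t slack = (size_t)(pos - base);
    size_t span = slack + len;

    const char *p = mmap(NULL, span, PROT_READ, MAP_PRIVATE, fd, base);
    close(fd);
    if (p == MAP_FAILED) return -1;

    memcpy(out, p + slack, len);
    munmap((void *)p, span);
    return 0;
}
```

Mapping and unmapping per record like this is only to show the offset
arithmetic. Whether it beats one long-lived mapping, or plain buffered
reads, is something to measure for your access pattern.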

I'm pretty much at my limit for knowledge of this subject (and maybe I
have a few things incorrect); I'm sure others here know much more. I
suggest you play a bit with it to see what the performance is like. I
have also heard that it's very fast.

-Steve