randomIO, std.file, core.stdc.stdio

Charles Hixson via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Tue Jul 26 12:30:35 PDT 2016


On 07/26/2016 11:31 AM, Steven Schveighoffer via Digitalmars-d-learn wrote:
> On 7/26/16 1:57 PM, Charles Hixson via Digitalmars-d-learn wrote:
>
>> Thanks.  Since there isn't any excess overhead, I guess I'll use stdio.
>> Buffering, however, isn't going to help at all since I'm doing
>> random I/O.  I know that most of the data the system reads from disk is
>> going to end up getting thrown away, since my records will generally be
>> smaller than 8K, but there's no help for that.
>>
>
> Even for random I/O, buffering is helpful. It depends on the size
> of your items.
>
> Essentially, reading 10 bytes from a file probably costs the same as
> reading 100,000 bytes. So you may as well buffer the extra data in
> case you need it.
>
> Now, C I/O's buffering may not suit your exact needs, so I don't know
> how it will perform. You may want to consider mmap, which tells the
> kernel to link pages of memory directly to disk. The kernel then does
> all the buffering for you. Phobos has support for it, but it's pretty
> minimal from what I can see:
> http://dlang.org/phobos/std_mmfile.html
>
> -Steve
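
[To make the std.mmfile suggestion above concrete, a minimal read-only
sketch; the file name and offsets are invented for illustration:]

import std.mmfile;

void main()
{
    // Map an existing file read-only.  Nothing is read up front; the
    // kernel faults individual pages in from disk on first access.
    auto mmf = new MmFile("data.bin");

    ubyte first = mmf[0];                          // touches one page
    auto chunk = cast(ubyte[]) mmf[1024 .. 2048];  // view of one 1KB record
}
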
I've considered std.mmfile often, but when I read the documentation I
end up realizing that I don't understand it.  So I look up memory-mapped
files in other places, and I still don't understand them.  It looks as
if the entire file is stored in memory, which is not at all what I want,
but I also can't really believe that's what's going on.  I know there
was an early form of this in a version of BASIC (the version RISS was
written in, though I don't remember which version that was), and in
*that* version array elements were read in as needed.  (It wasn't
spectacularly efficient.)  But memory-mapped files don't seem to work
that way, because people keep talking about how efficient they are.  Do
you know a good introductory tutorial?  I'm guessing that "window size"
might refer to the number of bytes available at one time, but what if
you need to append to the file?  Etc.
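
[For reference, std.mmfile's extended constructor does take a window
argument.  A hedged sketch follows; all sizes here are invented, and the
file-growth behavior is worth double-checking against the documentation:]

import std.mmfile;

void main()
{
    // Extended constructor: (name, mode, size, address, window).
    // A nonzero window maps at most that much of the file at a time
    // (it must be a multiple of the page size; 0 maps the whole file).
    // Passing a size larger than the existing file grows the file,
    // which is one way to handle "appending".
    auto mmf = new MmFile("data.bin", MmFile.Mode.readWrite,
                          1024 * 1024,   // ensure the file is 1MB long
                          null,          // let the OS pick the address
                          64 * 1024);    // 64KB window

    mmf[0] = 42;   // pages are still loaded and written back on demand
}
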

Part of the problem is that I don't want this to be a process with
arbitrarily high memory use.  Buffering would be fine, if I could use
it, but for my purposes sequential access is likely to be rare, and the
working layout of the data in RAM doesn't (and can't reasonably) match
the layout on disk.  IIUC (my information is a few decades old) the
system buffer size is about 8K.  I expect never to need to read that
large a chunk, but I'm going to keep the chunks in multiples of 1024
bytes, and, where reasonable, at exactly 1024 bytes.  So I should never
need two reads or writes for a chunk.  To be sure of this, I'd better
make the file header 1024 bytes as well.  (I'm guessing that seeking to
a position results in the appropriate block being read into the system
buffer, so if my header were 512 bytes I might occasionally need to do
double reads or writes.)
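
[A sketch of that layout using core.stdc.stdio; the record size, header
size, and file name are assumptions taken from the description above:]

import core.stdc.stdio;
import core.stdc.config : c_long;

enum recSize = 1024;        // fixed record size, per the plan above
enum headerSize = recSize;  // a 1024-byte header keeps records aligned

// Read record n (0-based, counting from just after the header) into buf.
bool readRecord(FILE* f, size_t n, ref ubyte[recSize] buf)
{
    if (fseek(f, cast(c_long)(headerSize + n * recSize), SEEK_SET) != 0)
        return false;
    return fread(buf.ptr, recSize, 1, f) == 1;
}

void main()
{
    FILE* f = fopen("data.bin", "rb");
    if (f is null) return;
    scope(exit) fclose(f);

    // Give stdio a buffer exactly one record wide, so each seek+read
    // should fill exactly one buffered block (worth measuring).
    static ubyte[recSize] ioBuf;
    setvbuf(f, cast(char*) ioBuf.ptr, _IOFBF, ioBuf.length);

    ubyte[recSize] rec;
    if (readRecord(f, 7, rec))  // fetch the eighth record
    {
        // ... unpack rec into the in-RAM working layout ...
    }
}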

I'm guessing that memory-mapped files trade off memory use against
speed of access, and for my purposes that's probably a bad trade, even
though databases are doing it more and more.  I'm likely to need all
the memory I can lay my hands on, and even then thrashing wouldn't
surprise me.  So a fixed buffer size seems a huge advantage.

