randomIO, std.file, core.stdc.stdio
Charles Hixson via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Tue Jul 26 12:30:35 PDT 2016
On 07/26/2016 11:31 AM, Steven Schveighoffer via Digitalmars-d-learn wrote:
> On 7/26/16 1:57 PM, Charles Hixson via Digitalmars-d-learn wrote:
>
>> Thanks. Since there isn't any excess overhead I guess I'll use stdio.
>> Buffering, however, isn't going to help at all since I'm doing
>> randomIO. I know that most of the data the system reads from disk is
>> going to end up getting thrown away, since my records will generally be
>> smaller than 8K, but there's no help for that.
>>
>
> Even for doing random I/O buffering is helpful. It depends on the size
> of your items.
>
> Essentially, to read 10 bytes from a file probably costs the same as
> reading 100,000 bytes from a file. So may as well buffer that in case
> you need it.
>
> Now, C i/o's buffering may not suit your exact needs. So I don't know
> how it will perform. You may want to consider mmap which tells the
> kernel to link pages of memory directly to disk access. Then the
> kernel is doing all the buffering for you. Phobos has support for it,
> but it's pretty minimal from what I can see:
> http://dlang.org/phobos/std_mmfile.html
>
> -Steve
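A minimal sketch of the std.mmfile interface mentioned above (the file name is made up; size 0 in the constructor means "use the existing file's size"). The point is that the kernel pages data in on demand, so mapping a file does not read the whole thing into memory up front:

```d
import std.mmfile : MmFile;
import std.file : write, remove;

void main()
{
    // Small scratch file (hypothetical name).
    write("records.dat", new ubyte[4096]);
    scope(exit) remove("records.dat");

    // Map it read/write; the kernel loads pages lazily as they
    // are touched, not the entire file at once.
    auto mm = new MmFile("records.dat", MmFile.Mode.readWrite, 0, null);
    auto data = cast(ubyte[]) mm[];
    data[100] = 42;                 // touching a byte faults in one page
    assert(mm.length == 4096);
    assert(data[100] == 42);
}
```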
I've often considered MmFile, but when I read the documentation I end
up realizing that I don't understand it. So I look up memory-mapped
files in other places, and I still don't understand them. It looks as
if the entire file is stored in memory, which is not at all what I
want, but I also can't really believe that's what's going on. I know
that there was an early form of this in a version of BASIC (the
version that RISS was written in, though I don't remember which
version that was), and in *that* version array elements were read in
as needed. (It wasn't spectacularly efficient.) But memory-mapped
files don't seem to work that way, because people keep talking about
how efficient they are. Do you know a good introductory tutorial? I'm
guessing that "window size" might refer to the number of bytes
available at one time, but what if you need to append to the file? Etc.
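If I read the std.mmfile docs right, the window argument of MmFile's extended constructor is the amount of the file mapped at any one time, which would bound memory use for large files. A sketch under that assumption (file name and sizes are made up):

```d
import std.mmfile : MmFile;
import std.file : write, remove;

void main()
{
    write("big.dat", new ubyte[1 << 20]);    // 1 MiB scratch file
    scope(exit) remove("big.dat");

    // window = 64 KiB: only a 64 KiB view is mapped at a time, and
    // MmFile remaps when an index falls outside the current window,
    // so memory use stays bounded regardless of file size.
    auto mm = new MmFile("big.dat", MmFile.Mode.readWrite, 0, null,
                         64 * 1024);
    mm[1 << 19] = 7;    // indexing far into the file remaps the window
    assert(mm[1 << 19] == 7);
}
```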
A part of the problem is that I don't want this to be a process with
arbitrarily high memory use. Buffering would be fine if I could use
it, but for my purposes sequential access is likely to be rare, and
the working layout of the data in RAM can't reasonably match the
layout on disk. IIUC (my information is a few decades old), the
system buffer size is about 8K. I expect never to need to read that
large a chunk, but I'm going to keep the chunks in multiples of 1024
bytes, and, if reasonable, at exactly 1024 bytes. So I should never
need two reads or writes for a chunk. I guess to be sure of this I'd
better make the file header 1024 bytes as well. (I'm guessing that a
seek to a position results in the appropriate block being read into
the system buffer, so if my header were 512 bytes I might
occasionally need to do double reads or writes.)
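That scheme, a 1024-byte header followed by fixed 1024-byte records so that every access is one seek plus one read or write, can be sketched with std.stdio (the file name and record contents here are made up):

```d
import std.stdio : File;
import std.file : remove;

enum recordSize = 1024;
enum headerSize = 1024;   // header padded to one full record

// Read record n: one seek, one read, never straddling two blocks.
ubyte[recordSize] readRecord(File f, size_t n)
{
    ubyte[recordSize] buf;
    f.seek(headerSize + n * recordSize);   // default origin is SEEK_SET
    f.rawRead(buf[]);
    return buf;
}

void main()
{
    auto f = File("store.dat", "w+b");     // hypothetical file name

    ubyte[headerSize] header;
    f.rawWrite(header[]);                  // 1024-byte header

    ubyte[recordSize] rec;
    rec[0] = 1; f.rawWrite(rec[]);         // record 0
    rec[0] = 2; f.rawWrite(rec[]);         // record 1

    auto r = readRecord(f, 1);
    assert(r[0] == 2);

    f.close();
    remove("store.dat");
}
```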
I'm guessing that memory-mapped files trade off memory use against
speed of access, and for my purposes that's probably a bad trade,
even though databases are doing it more and more. I'm likely to need
all the memory I can lay my hands on, and even then thrashing
wouldn't surprise me. So a fixed buffer size seems a huge advantage.