Reading a structured binary file?
monarch_dodra
monarchdodra at gmail.com
Sat Aug 3 14:29:01 PDT 2013
On Friday, 2 August 2013 at 23:51:27 UTC, H. S. Teoh wrote:
> On Fri, Aug 02, 2013 at 06:38:20PM -0500, captaindet wrote:
> [...]
>> FWIW, I have to deal with big data files that can be a few GB. For
>> some data analysis software I wrote in C a while back, I did some
>> testing with caching and such. It turns out that for Win7-64 the
>> automatic caching done by the OS is really good, and any attempt to
>> speed things up actually slowed it down. No kidding: I have seen
>> more than 2GB of data being automatically cached. Of course the
>> system RAM must be larger than the file size (if I remember my
>> tests correctly, by a factor of ~2, but this is maybe not a linear
>> relationship; I did not actually change the RAM, just the size of
>> the data file), and it will hold it in the cache only as long as
>> there are no concurrent applications requiring RAM or caching. I
>> guess my point is, if your target is Win7 and your files are >5x
>> smaller than the installed RAM, I would not bother at all trying to
>> optimize file access. I suppose *nix machines will do a similarly
>> good job these days.
> [...]
>
> IIRC, Linux has been caching files (or disk blocks, rather) in
> memory since the days of Win95. Of course, memory in those days was
> much scarcer, but file sizes were smaller too. :) There's still a
> cost to copy the kernel buffers into userspace, though, which should
> not be disregarded. But if you use mmap, then you're essentially
> accessing that memory cache directly, which is as good as it gets.
>
> I don't know how well mmap works on Windows, though; IIRC it doesn't
> have the same semantics as Posix, so you could accidentally run into
> performance issues by using it the wrong way on Windows.
>
>
> T
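Regarding mmap: from D, std.mmfile wraps both the Posix mmap and the
Win32 file-mapping APIs behind one type, so the platform differences
are at least hidden. A minimal read-only sketch (the file name is just
a placeholder):

    import std.mmfile;
    import std.stdio;

    void main()
    {
        // Map the file read-only: reads go straight to the OS page
        // cache, with no copy from kernel buffers into userspace.
        auto mmf = new MmFile("data.bin"); // placeholder file name
        auto bytes = cast(const(ubyte)[]) mmf[];
        writeln("mapped ", bytes.length, " bytes");
    }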
I did some benchmarking a while back with user bioinfornatics. He had
to do some pretty large file reads, preferably in very little time.
Observations showed my algorithm was *much* faster under Windows than
Linux.

What we observed is that under Windows, as soon as you open a file
for reading, Windows starts buffering the file in a parallel thread.

What we did was create two threads. The first did nothing but read
the file, store it into chunks of memory, and then pass them to a
worker thread. The worker thread did the parsing proper.

Doing this *halved* the Linux runtime, tying it with the
"monothreaded" Windows runtime. Windows saw no change.
FYI, the full thread is here:
forum.dlang.org/thread/gmfqwzgtjfnqiajghmsx at forum.dlang.org