Reading a structured binary file?

H. S. Teoh hsteoh at quickfur.ath.cx
Sat Aug 3 14:37:17 PDT 2013


On Sat, Aug 03, 2013 at 11:29:01PM +0200, monarch_dodra wrote:
> On Friday, 2 August 2013 at 23:51:27 UTC, H. S. Teoh wrote:
> >On Fri, Aug 02, 2013 at 06:38:20PM -0500, captaindet wrote:
> >[...]
> >>FWIW
> >>i have to deal with big data files that can be a few GB. for some
> >>data analysis software i wrote in C a while back i did some testing
> >>with caching and such. turns out that for Win7-64 the automatic
> >>caching done by the OS is really good and any attempt to speed
> >>things up actually slowed it down. no kidding, i have seen more than
> >>2GB of data being automatically cached. of course the system RAM
> >>must be larger than the file size (if i remember my tests correctly
> >>by a factor of ~2, but this is maybe not a linear relationship, i
> >>did not actually change the RAM just the size of the data file) and
> >>it will hold it in the cache only as long as there are no concurrent
> >>applications requiring RAM or caching. i guess my point is, if your
> >>target is Win7 and your files are >5x smaller than the installed RAM
> >>i would not bother at all trying to optimize file access. i suppose
> >>*nix machines will do a similarly good job these days.
> >[...]
> >
> >IIRC, Linux has been caching files (or disk blocks, rather) in memory
> >since the days of Win95. Of course, memory in those days was much
> >scarcer, but file sizes were smaller too. :) There's still a cost to
> >copy the kernel buffers into userspace, though, which should not be
> >disregarded. But if you use mmap, then you're essentially accessing
> >that memory cache directly, which is as good as it gets.
> >
> >I don't know how well mmap works on Windows, though. IIRC it doesn't
> >have the same semantics as POSIX, so you could accidentally run into
> >performance issues by using it the wrong way on Windows.
[...]
> I did some benchmarking a while back with user bioinfornatics. He had
> to do some pretty large file reads, preferably in very little time.
> Observations showed my algo was *much* faster under Windows than
> Linux.

Sorry, I lost the context of this discussion. What algo are you
referring to?


> What we observed is that under Windows, as soon as you open a file
> for reading, Windows starts buffering the file in a parallel thread.
> 
> What we did was create two threads. The first did nothing but read
> the file, store it into chunks of memory, and then pass it to a
> worker thread. The worker thread did the parsing proper.
> 
> Doing this *halved* the Linux runtime, tying it with the
> "monothreaded" Windows run time. Windows saw no change.
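
For reference, the two-thread arrangement described above might look
roughly like the sketch below. It is written in Python only for
brevity and is not the code from the benchmark: the names `reader`,
`worker`, `count_lines_pipelined` and the 1 MiB chunk size are made up
for illustration, and counting newlines stands in for the real parser.

```python
import queue
import threading

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for illustration


def reader(path, chunks):
    """Do nothing but read the file and hand each chunk to the worker."""
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(CHUNK_SIZE)
                if not chunk:
                    break
                chunks.put(chunk)
    finally:
        chunks.put(None)  # sentinel: tells the worker there is no more data


def worker(chunks, result):
    """Stand-in for the parsing proper: count newline bytes per chunk."""
    total = 0
    while True:
        chunk = chunks.get()
        if chunk is None:
            break
        total += chunk.count(b"\n")
    result.append(total)


def count_lines_pipelined(path):
    # Bounded queue, so the reader cannot run arbitrarily far ahead.
    chunks = queue.Queue(maxsize=8)
    result = []
    t_read = threading.Thread(target=reader, args=(path, chunks))
    t_work = threading.Thread(target=worker, args=(chunks, result))
    t_read.start()
    t_work.start()
    t_read.join()
    t_work.join()
    return result[0]
```

Whether this gives the same speedup outside the original benchmark
depends on the OS, the disk, and how expensive the parsing is.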

Interesting. I wonder if you could, under Linux, mmap a file then have
one thread access the first byte of each file block while another thread
does the real work with the data.
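
A rough, untested-for-speed sketch of that idea, in Python for
brevity. Everything here is an assumption for illustration: the names
`touch_pages` and `count_newlines_mmap`, the use of the page size as
the "file block" size, and newline counting as the "real work". It
only shows the shape of the approach; in CPython in particular the
interpreter lock probably prevents much real overlap between the two
threads.

```python
import mmap
import threading

WORK_CHUNK = 1 << 20  # size of each slice the working thread processes


def touch_pages(mm, step=mmap.PAGESIZE):
    """Read the first byte of each page so the kernel faults it in."""
    for offset in range(0, len(mm), step):
        mm[offset]  # value is discarded; the access is the point


def count_newlines_mmap(path):
    """Count newlines via mmap while a helper thread touches pages ahead."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; this raises ValueError on an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            toucher = threading.Thread(target=touch_pages, args=(mm,))
            toucher.start()
            try:
                total = 0
                for start in range(0, len(mm), WORK_CHUNK):
                    total += mm[start:start + WORK_CHUNK].count(b"\n")
            finally:
                # The mapping must stay open until the helper has finished.
                toucher.join()
            return total
```

Only a measurement on the actual data would show whether the helper
thread buys anything over letting the kernel's own readahead do the
job.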


> FYI, the full thread is here:
> forum.dlang.org/thread/gmfqwzgtjfnqiajghmsx@forum.dlang.org

I'll take a look, thanks.


T

-- 
The diminished 7th chord is the most flexible and fear-instilling chord. Use it often, use it unsparingly, to subdue your listeners into submission!


More information about the Digitalmars-d-learn mailing list