[OT] Re: How to read fastly files ( I/O operation)
Jay Norwood
jayn at prismnet.com
Wed Dec 18 14:17:40 PST 2013
On Friday, 8 February 2013 at 06:22:18 UTC, Denis Shelomovskij
wrote:
> On 06.02.2013 19:40, bioinfornatics wrote:
>> On Wednesday, 6 February 2013 at 13:20:58 UTC, bioinfornatics
>> wrote:
>> I agree the spec format is really bad, but it is heavily used
>> in biology, so I would like a fast parser so that I can develop
>> D applications instead of using C++.
>
> Yes, let's also create 1 GiB XML files and ask for fast
> encoding/decoding!
>
> The situation can be improved only if:
> 1. We will find and kill every text format creator;
> 2. We will create a really good binary format for each such
> task and support it in every application we create. So after
> some time text formats will just die because of evolution as
> everything will support better formats.
>
> (the second proposal is a real recommendation)
There is a binary resource format for EMF models, which normally
use XML files, and some timing improvements are reported at the
link below. It might be worth a look if you are thinking about
writing your own binary format.
http://www.slideshare.net/kenn.hussey/performance-and-extensibility-with-emf
There is also a fast binary compression library named Blosc,
used in some Python utilities. The measurements presented at the
link below show it can be faster than a plain memcpy when
multiple cores are available.
http://blosc.pytables.org/trac
On the sequential accesses ... I found that Windows writes blocks
of data all over the place, but the best way to get it to write
to more contiguous locations is to modify the file output
routines to specify write-through. Even so, the more sequential
layout didn't improve read times on an SSD.
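As a rough illustration of write-through output (not the exact routines changed in the experiment above), a POSIX-flavoured Python sketch might look like the following; on Windows the analogous knob is the FILE_FLAG_WRITE_THROUGH flag passed to CreateFile. The function name and file path here are made up for illustration.

```python
import os

def write_through(path, data):
    # O_SYNC asks the OS to complete each write to the device before
    # the call returns, which discourages scattered, lazily flushed
    # blocks. (On Windows, pass FILE_FLAG_WRITE_THROUGH to CreateFile
    # for a similar effect.)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC)
    try:
        return os.write(fd, data)  # returns the number of bytes written
    finally:
        os.close(fd)
```

The trade-off is that each write call now waits for the device, so throughput per call drops even if the resulting on-disk layout is more contiguous.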
Most decent SSDs can read big files at 300 MB/sec or more now,
and you can RAID 0 a few of them and read 800 MB/sec.
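For anyone wanting to check their own drive against numbers like these, a minimal Python sketch that times large sequential chunked reads could look like this (the chunk size and helper name are arbitrary choices, not from the thread):

```python
import time

def sequential_read(path, chunk_size=1 << 20):
    # Read the file front to back in 1 MiB chunks, returning the
    # total bytes read and the observed throughput in MB/s.
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, total / elapsed / 1e6
```

Note that the OS page cache makes a second run of this much faster than the first, so drop caches (or use a file larger than RAM) before trusting the number.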
More information about the Digitalmars-d-learn mailing list