unzip parallel, 3x faster than 7zip

Jay Norwood jayn at prismnet.com
Sat Apr 7 10:08:31 PDT 2012


On Saturday, 7 April 2012 at 11:41:41 UTC, Rainer Schuetze wrote:
>
> Maybe it is the trim command being executed on the sectors 
> previously occupied by the file.
>

No, perhaps I didn't make it clear that the rmdir slowness is 
only an issue on hard drives.  I can unzip the 2GB archive in 
about 17.5 sec on the SSD, and delete it using the rmd 
multi-threaded delete example program in less than 17 secs, also 
on the SSD.  The same operations on a hard drive take around 60 
seconds to extract, but 1.5 to 3 minutes to delete.

H:\>uzp tzip.zip tz
unzipping: .\tzip.zip
finished! time: 17405 ms

H:\>rmd tz
removing: .\tz
finished! time:16671 ms
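
For anyone who wants to try the same measurement, here is a minimal sketch of a multi-threaded delete with timing. It is not the rmd program itself, just an illustration in Python; the name parallel_rmtree and the default of 8 workers are made up for the example, and it assumes the tree contains only regular files and directories (no symlinks or read-only files).

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def parallel_rmtree(root, workers=8):
    """Delete the tree under root; return the elapsed time in milliseconds.

    Hypothetical helper for illustration only. Files are removed by a
    thread pool; directories are removed afterwards, children first.
    """
    start = time.perf_counter()
    files = []
    dirs = []
    # topdown=False yields child directories before their parents,
    # so removing 'dirs' in this order never hits a non-empty directory.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        files.extend(os.path.join(dirpath, name) for name in filenames)
        dirs.append(dirpath)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any error from a worker.
        list(pool.map(os.remove, files))
    for d in dirs:
        os.rmdir(d)
    return (time.perf_counter() - start) * 1000.0
```

Calling parallel_rmtree("tz") would correspond roughly to the rmd run above, though the timings will of course depend on the drive.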


I've been doing some reading on the web and studying the procmon 
logs. I am convinced the slow hard drive delete is an issue with 
seek times, since it is not an issue on the SSD.  It may be 
caused by fragmentation of the stored data or of the MFT itself, 
or else it could be that NTFS is doing some journaling bookkeeping. 
You are right that it could also be sending delete notifications 
to any application watching the disk activity.  I've already 
turned off the virus checker and the indexing, but I'm going to 
try the tweaks in the second link below and also try the mydefrag 
program in the third link to see if anything improves the hard 
drive delete times.


http://ixbtlabs.com/articles/ntfs/index3.html
http://www.gilsmethod.com/speed-up-vista-with-these-simple-ntfs-tweaks
http://www.mydefrag.com/index.html


That mydefrag has some interesting ideas about sorting folders by 
full pathname on the disk as one of the defrag algorithms.  
Perhaps using it, and also using unzip and zip algorithms that 
match the defrag algorithm, would be a nice combination.  In 
other words, if the zip algorithm processes the files in 
sorted-by-pathname order, and if the defrag algorithm has laid 
out the folders on disk in the same order, then you would 
expect optimally short seeks while processing the files in the 
order they are stored.
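
The sorted-by-pathname idea could be sketched like this on the unzip side. This is only an illustration in Python, not the uzp program: extract_sorted_by_path is a made-up name, and a plain string sort of the member names is an assumption that may not match the exact ordering mydefrag uses.

```python
import zipfile


def extract_sorted_by_path(archive_path, dest_dir):
    """Extract archive members in full-pathname order.

    Hypothetical helper for illustration. Returns the list of member
    names in the order they were extracted.
    """
    with zipfile.ZipFile(archive_path) as zf:
        # Plain lexicographic sort of the stored member paths; an
        # assumption standing in for whatever order the defrag tool uses.
        names = sorted(zf.namelist())
        for name in names:
            zf.extract(name, dest_dir)
    return names
```

Whether this actually shortens seeks depends on the disk layout matching the sort order, which is the part the defrag pass would have to provide.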

The mydefrag program uses the NTFS defrag API.  There is an 
article at the following link showing how to access it to get the 
Logical Cluster Numbers on disk for a file.  I suppose you could 
sort your file operations by each file's starting LCN, for example 
during compression, and that might reduce the seek-related delays.

http://blogs.msdn.com/b/jeffrey_wall/archive/2004/09/13/229137.aspx
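
Sorting by start LCN might look something like the sketch below. The LCN lookup itself is deliberately left out: start_lcn is a hypothetical caller-supplied function (on Windows it would have to wrap the defrag API described in the article above), and the sketch assumes it may return None when no starting cluster is available for a file.

```python
def order_by_start_lcn(paths, start_lcn):
    """Return paths ordered by starting Logical Cluster Number.

    start_lcn is a hypothetical callable mapping a path to an integer
    LCN, or to None when no cluster information is available. Paths
    without an LCN keep their relative order and go last.
    """
    with_lcn = []
    without_lcn = []
    for path in paths:
        lcn = start_lcn(path)
        if lcn is None:
            without_lcn.append(path)
        else:
            with_lcn.append((lcn, path))
    # Ascending LCN, so the work sweeps across the disk in one direction.
    with_lcn.sort(key=lambda pair: pair[0])
    return [path for _lcn, path in with_lcn] + without_lcn
```

The compression or delete loop would then walk the returned list instead of the directory order.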





More information about the Digitalmars-d-announce mailing list