A few measurements of stat()'s speed

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Tue Mar 26 18:06:08 UTC 2019


The current process of searching for imports goes through the following 
directories:

* the current directory
* each of the paths specified in the cmdline with -I, in that order
* each of the paths specified in DFLAGS, in that order

For each of these paths, first the ".di" extension is tried, then the 
".d" extension. The function used is stat(). In the majority of cases 
the ".di" file doesn't exist, so at least 50% of stat() calls fail. The 
number of failed stat() calls grows with the number of -I flags, i.e. 
with the size of the project. (For std imports, that means each module 
will be looked up twice in each of the project directories.)
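
To make the probing order concrete, here is a hypothetical C sketch of 
the lookup; try_import and its parameters are names I made up for 
illustration, not DMD's actual identifiers:

    /* Probe "mod.di" first, then "mod.d", under 'dir'.  Returns 1 and
     * leaves the matching path in 'found' on success, 0 otherwise. */
    #include <stdio.h>
    #include <sys/stat.h>

    static int try_import(const char *dir, const char *mod,
                          char *found, size_t cap)
    {
        static const char *const exts[] = { ".di", ".d" };
        struct stat st;
        for (int i = 0; i < 2; i++) {
            snprintf(found, cap, "%s/%s%s", dir, mod, exts[i]);
            if (stat(found, &st) == 0)  /* fails for most ".di" probes */
                return 1;
        }
        return 0;
    }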

One alternative would be to call opendir()/readdir()/closedir() once 
for each directory searched and cache the directory's contents. 
Subsequent lookups could then consult the in-memory cache and avoid 
stat() calls in directories that have already been visited. This 
approach would accelerate imports if stat() is slow "enough".
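
For concreteness, a minimal C sketch of such a cache, assuming a 
linear scan over the entries (a real implementation would likely use a 
hash table); all names are made up and error handling is omitted:

    #include <dirent.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char **names;   /* entries seen in the directory */
        size_t count;
    } DirCache;

    /* One opendir()/readdir()/closedir() pass per directory. */
    static int dircache_load(DirCache *c, const char *path)
    {
        DIR *d = opendir(path);
        if (!d)
            return -1;
        c->names = NULL;
        c->count = 0;
        for (struct dirent *e; (e = readdir(d)) != NULL; ) {
            c->names = realloc(c->names,
                               (c->count + 1) * sizeof *c->names);
            c->names[c->count++] = strdup(e->d_name);
        }
        closedir(d);
        return 0;
    }

    /* Later probes scan the cached listing instead of calling stat(). */
    static int dircache_contains(const DirCache *c, const char *name)
    {
        for (size_t i = 0; i < c->count; i++)
            if (strcmp(c->names[i], name) == 0)
                return 1;
        return 0;
    }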

On Linux, in a moderately loaded local directory (146 files) mounted 
from an SSD, one failed stat() takes only about 0.5 microseconds. That 
means that if, e.g., a module imports std.all (which incurs 142 failed 
lookups), the overhead attributable to failed stat() calls is about 70 
microseconds, i.e. negligible.
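
For those who want to check the numbers on their own setup, a figure 
like that can be reproduced with a small C loop that stat()s a 
nonexistent path many times; the path and iteration count below are 
arbitrary:

    #include <stdio.h>
    #include <sys/stat.h>
    #include <time.h>

    int main(void)
    {
        enum { N = 1000000 };
        struct stat st;
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++)
            stat("./no_such_module.di", &st);  /* fails with ENOENT */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ns = (t1.tv_sec - t0.tv_sec) * 1e9
                  + (t1.tv_nsec - t0.tv_nsec);
        printf("%.3f us per failed stat()\n", ns / N / 1e3);
        return 0;
    }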

The results change drastically when network mounts are tested. For sftp 
and sshfs mounts over a high-speed local connection, one failed stat() 
takes 6-7 milliseconds, so the same 142 failed lookups would cost on 
the order of 0.9 seconds. An import like std.all (and many other 
imports liable to transitively pull in others) would therefore cause 
significant overhead.

So the question is whether many projects are likely to import files over 
network mounts, which would motivate the optimization. Please share your 
thoughts, thanks.


Andrei

