A few measurements of stat()'s speed
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Tue Mar 26 18:06:08 UTC 2019
The current process of searching for imports spans the following
directories:
* the current directory
* each of the paths specified in the cmdline with -I, in that order
* each of the paths specified in DFLAGS, in that order
For each of these paths, first the ".di" extension is tried, then the
".d" extension. The function used is stat(). In the majority of cases
the ".di" file doesn't exist, so at least 50% of stat() calls fail. The
number of failed stat() calls grows with the number of -I flags, i.e.
with the size of the project. (For std imports, that means each will be
looked up twice in each of the project directories.)
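The lookup described above can be sketched roughly as follows. This is
a Python illustration only (the compiler itself is not Python), and the
function name find_import is mine:

```python
import os

def find_import(module_path, search_dirs):
    """Resolve a module's file by probing each search directory.

    For each directory, the ".di" (interface) extension is tried
    first, then ".d". os.stat() stands in for the C stat() call; it
    raises FileNotFoundError when the probe fails.
    """
    for directory in search_dirs:
        for ext in (".di", ".d"):
            candidate = os.path.join(directory, module_path + ext)
            try:
                os.stat(candidate)   # one stat() per probe
                return candidate     # found: stop searching
            except FileNotFoundError:
                pass                 # failed stat(); keep probing
    return None                      # not found anywhere
```

Note that a miss in every directory costs two failed stat() calls per
directory searched.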
One alternative would be to use opendir()/readdir()/closedir() once for
each directory searched and cache the directory's contents. Subsequent
attempts can then consult the local cache and avoid stat() calls for
directories that have already been visited. This approach would
accelerate imports if stat() is slow "enough".
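A minimal sketch of that caching idea, again in Python for illustration
(os.listdir plays the role of opendir()/readdir()/closedir(), and the
class name DirCache is mine):

```python
import os

class DirCache:
    """Cache each directory's listing so repeated probes avoid stat()."""

    def __init__(self):
        self._cache = {}  # directory path -> set of entry names

    def exists(self, directory, filename):
        entries = self._cache.get(directory)
        if entries is None:
            try:
                # One readdir-style scan per directory, done only once.
                entries = set(os.listdir(directory))
            except OSError:
                entries = set()      # missing or unreadable directory
            self._cache[directory] = entries
        return filename in entries   # in-memory lookup from here on
```

One caveat of this scheme is staleness: files created after a directory
has been cached won't be seen, which is acceptable within a single
compiler run.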
On Linux, in a moderately loaded local directory (146 files) on an SSD,
one failed stat() takes only about 0.5 microseconds. That means that
if, e.g., a module imports std.all (which fails 142 times), the
overhead attributable to failed stat() calls is about 70 microseconds,
i.e. negligible.
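For reference, that kind of measurement can be reproduced with a short
loop. A Python sketch follows; note that Python call overhead inflates
the absolute numbers somewhat compared with a C loop calling stat()
directly:

```python
import os, time

def time_failed_stats(directory, n=10000):
    """Return the average cost in seconds of a failed stat() call,
    measured over n probes of a nonexistent entry."""
    target = os.path.join(directory, "no-such-file.di")
    start = time.perf_counter()
    for _ in range(n):
        try:
            os.stat(target)          # always fails: file doesn't exist
        except FileNotFoundError:
            pass
    return (time.perf_counter() - start) / n
```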
The results change drastically when network mounts are tested. For sftp
and sshfs mounts over a high-speed local connection, one failed stat()
takes 6-7 milliseconds, so an import like std.all (and many other
imports liable to transitively pull in others) would cause significant
overhead.
So the question is whether many projects are likely to import files over
network mounts, which would motivate the optimization. Please share your
thoughts, thanks.
Andrei