A few measurements of stat()'s speed
Vladimir Panteleev
thecybershadow.lists at gmail.com
Tue Mar 26 22:04:05 UTC 2019
On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu
wrote:
> On a Linux moderately-loaded local directory (146 files)
> mounted from an SSD drive, one failed stat() takes only about
> 0.5 microseconds. That means e.g. if a module imports std.all
> (which fails 142 times), the overhead accountable to failed
> stat() calls is about 70 microseconds, i.e. negligible.
I have some related experience with this:
- The eternal battle of keeping The Server's load levels down
involves a fair amount of I/O profiling. The pertinent
observation was that opening a file by name can be much faster
than enumerating the files in a directory. The reason is that
many filesystems implement directories as some variant of a hash
table: accessing a file by name is a single hash-table lookup,
while enumerating all files means reading the whole structure
(see the sketch after this list).
- stat() is slow. It fetches a lot of information, and many
filesystems do not keep all of it as readily accessible as the
file name. This is observable through a simple test: on Ubuntu,
drop the caches, then, in a big directory, compare the execution
time of `ls | cat` vs. `ls`. Explanation: when ls's output is a
terminal, it fetches extra information to colorize entries
depending on their properties; that information comes from
stat(), which is skipped when the output is piped into a file or
another program. I had to take this into account when
implementing a fast directory iterator [1] (stat only when
necessary; see the sketch after this list). dirEntries from
std.file does some of this too, but not to the full extent.
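To make both points concrete, here is a minimal D sketch
(illustrative names, not the code from [1]): answering "is this
file here?" by a direct lookup versus by scanning the directory,
where the scan touches only entry names so that no per-entry
stat() is forced.

import std.file : exists, dirEntries, SpanMode;
import std.path : buildPath, baseName;

// Direct lookup: a single probe into the directory's on-disk
// structure.
bool hasFileDirect(string dir, string fileName)
{
    return exists(buildPath(dir, fileName));
}

// Enumeration: reads every directory entry. Only .name is used
// here; asking for e.size or e.timeLastModified would in
// addition force a stat() per entry, which is the expensive part
// a lazy iterator tries to avoid.
bool hasFileByScan(string dir, string fileName)
{
    foreach (e; dirEntries(dir, SpanMode.shallow))
        if (e.name.baseName == fileName)
            return true;
    return false;
}

On a cold cache and a large directory, the direct lookup should
win by a wide margin.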
My suggestion is: if we are going to read the file anyway when
it exists, don't even stat(), just open it. That might be faster
overall.
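A rough sketch of that idea (tryReadModule is a made-up name for
illustration, not existing compiler code): let the open itself
report a missing file instead of asking stat() first. The
failure path costs about one syscall either way; the success
path saves the separate stat(), and there is no window between
the check and the open.

import std.file : readText, FileException;

// Returns the module source if a file exists at this path,
// null otherwise.
string tryReadModule(string path)
{
    try
        return readText(path);
    catch (FileException)  // typically ENOENT: not found here
        return null;
}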
I would not recommend tricks like readdir() plus caching the
results. That kind of caching ought to happen at the filesystem
layer, and it smells of problems like TOCTOU races and cache
invalidation. In any case, I would not suggest spending time on
it unless someone encounters a specific, real-life situation
where the additional complexity would make it worthwhile to
research workarounds.
> So the question is whether many projects are likely to import
> files over network mounts, which would motivate the
> optimization. Please share your thoughts, thanks.
Honestly, this sounds like you have a solution in search of a
problem.
[1]:
https://github.com/CyberShadow/ae/blob/25850209e03ee97640a9b0715efe7e25b1fcc62d/sys/file.d#L740