Caching D compiler - preview version
Dmitry Olshansky
dmitry.olsh at gmail.com
Tue Oct 24 13:19:15 UTC 2017
What is dcache?
It's a patch for dmd that enables a *persistent* shared-memory
hash-map, protected from races by a spin-lock. dmd processes run
with the -cache flag detect the following pattern:
enum/static variable = func(args..);
If the mangled name of func indicates that it comes from std.*,
the cache stores the result of the function call, as produced by
CTFE, in D source code form (a literal).
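To make the pattern concrete, here is a minimal sketch checked
entirely at compile time. The helper isFromStd and the "_D3app..."
mangled name are hypothetical illustrations, not code from the
patch; the check assumes the usual D mangling scheme in which
package std shows up as "_D3std".

```d
import std.ascii : isDigit;
import std.conv : text;

// The shape the patch looks for: enum/static variable = func(args...);
// Here func is std.conv.text, evaluated by CTFE into a string literal.
enum label = text("field", 3);

// Hypothetical helper, not dmd's code: a D mangled name starts with
// "_D" followed by length-prefixed identifiers, so package std
// appears as "3std" right after the prefix.
bool isFromStd(string mangled)
{
    return mangled.length >= 6 && mangled[0 .. 6] == "_D3std";
}

static assert(label == "field3");
static assert(isFromStd(isDigit.mangleof));
static assert(!isFromStd("_D3app4mainFZv")); // made-up user-code mangle
```

With the cache enabled, the literal behind label is the kind of
value that would be stored and reused on the next compilation.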
In action:
https://github.com/dlang/dmd/pull/7239
(Watch as 2.8s - 4.4s to compile various ctRegex programs becomes
constant ~1.0s.)
Caching is done per expression so it stays active even after you
change various parts of your files.
Broadening the scope to 3rd-party libraries is planned, but cache
invalidation is going to be tricky. Likewise, there is a trove of
things aside from CTFE that can be easily cached and shared
across both parallel and sequential compiler invocations.
Why a caching compiler?
It became apparent that CTFE computations can be quite
time-consuming and memory-intensive. Because each CTFE
invocation depends on a set of constant arguments, it is a
perfect candidate for caching.
The motivating example is ctRegex: patterns hardly ever change
and the standard library changes only on a compiler upgrade, yet
each change to a file causes complete re-evaluation of all
patterns in a module.
With a persistent per-expression cache we can precompile all of
the CTFE evaluations for regexes, so we get to use ctRegex and
maintain sane compile times.
----
How to use
Pass the new option to dmd:
-cache=mmap
This enables a persistent cache backed by a memory-mapped file.
Future backends would take a form such as:
-cache=memcache:memcached.my.network:11211
----
Implementation
Caveat emptor: this is an alpha version, use at your own risk!
https://github.com/DmitryOlshansky/dmd/tree/dcache
Keeping things simple - it's a patch of around 200 SLOC.
I envision it growing by a hundred lines or so if we get to do
things cleanly.
Instead of going with the strangely popular idea of compilation
servers, I opted for a simple distributed cache, as it doesn't
require changing any of the build systems.
The shared memory mapping is split into 3 sections: Metadata
(spinlock) + ToC (hash-table index) + Data (chunks).
For now it's an immutable cache w/o eviction.
A ToC entry is as follows:
hash(64-bit), data index, data size, last_recent_use
Indexes point into the Data section of the memory map.
Data itself is a linked list of blocks, where a header contains:
(isFree, next, 0-terminated key, padding to 16 bytes)
last_recent_use is the timestamp of the start of the respective
compilation. An entry with last_recent_use < now - 24h is
considered unutilized and may be reused.
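The layout above could be sketched in D roughly as follows. The
field widths, the names TocEntry, BlockHeader, paddedHeaderSize and
isUnutilized, and the use of seconds for timestamps are all
assumptions made for illustration; the post only names the fields.

```d
// One ToC entry: hash(64-bit), data index, data size, last_recent_use.
struct TocEntry
{
    ulong hash;          // 64-bit hash of the key
    uint  dataIndex;     // position of the first block in the Data section
    uint  dataSize;      // size of the stored literal, in bytes
    long  lastRecentUse; // start time of the compilation that last used it
}

// Fixed part of a Data block header. In memory it is followed by the
// 0-terminated key and then padding up to a 16-byte boundary.
struct BlockHeader
{
    bool isFree;
    uint next; // index of the next block in the linked list
}

// Bytes occupied by header plus key, rounded up to a multiple of 16.
size_t paddedHeaderSize(size_t keyLength)
{
    immutable raw = BlockHeader.sizeof + keyLength + 1; // +1 for the 0 terminator
    return (raw + 15) & ~cast(size_t) 15;
}

enum long day = 24 * 60 * 60; // seconds, assuming Unix-style timestamps

// True when an entry was last used more than 24h before `now`.
bool isUnutilized(long lastRecentUse, long now)
{
    return lastRecentUse < now - day;
}
```

Keeping headers on 16-byte boundaries means every block starts at
an aligned offset, whatever the key length.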
In theory we can cache the result of any compilation step given
a proper key and invalidation strategy.
1. Lexing - key is compiler-version + abs path + timestamp, store
as is. Lexing from cache is simply taking slices of memory.
2. Parsing to AST - key is compiler-version + abs path +
timestamp + version/debug flags
3. CTFE invocations - key is tricky, for now only enabled for
std.* as follows:
enum/static varname = func(args...);
Use compiler-version + compiler-flags + mangleof + stringof args.
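As a rough sketch of how such a CTFE key might be assembled and
reduced to the 64-bit hash kept in the ToC: the '|' separator, the
function names ctfeKey and fnv1a64, and the choice of FNV-1a are
assumptions for illustration; the post does not say which hash or
key encoding the patch uses.

```d
// Joins the key parts named above. '|' is an arbitrary separator;
// it avoids '\0', which terminates keys in the Data section.
string ctfeKey(string compilerVersion, string flags,
               string mangled, string[] args)
{
    string key = compilerVersion ~ "|" ~ flags ~ "|" ~ mangled;
    foreach (arg; args)
        key ~= "|" ~ arg; // stringof form of each argument
    return key;
}

// 64-bit FNV-1a over the UTF-8 code units of the key.
ulong fnv1a64(string s)
{
    ulong h = 0xcbf29ce484222325UL; // FNV offset basis
    foreach (char c; s)
    {
        h ^= c;
        h *= 0x100000001b3UL;       // FNV prime, wraps modulo 2^64
    }
    return h;
}
```

Two calls that differ in any component (compiler version, flags,
function, or argument text) produce different keys, so an upgrade
of the compiler naturally misses the old entries.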