[Dlang-internal] multi threading in dmd
Sebastian Wilzbach
seb at wilzba.ch
Fri Oct 11 18:52:59 UTC 2019
On 11/10/2019 14.49, Robert Schadek via Dlang-internal wrote:
> Compiling is IMHO getting painfully slow with growing projects.
> One thing I'm working on is up to 30+ seconds, for 20k lines of somewhat
> heavy code.
> But let's not argue whether or not I'm doing it wrong; for the sake of
> the argument, let's assume compiling is slow.
Are you aware that the official release of dmd is built with dmd?
Compiling dmd with LDC made it about 2x faster in my last tests (and this
was without LTO, without PGO, and with an older LLVM backend).
> One thing I see is that dub passes many files at once to dmd.
Dub has the same problem (it is built with dmd), but the semi-official
binaries that you can grab here are built with LDC:
https://github.com/dlang/dub/releases (as are, of course, the ones shipped
with LDC).
> And dmd runs one thread on that input.
>
> I think there is some opportunity to start multiple threads to do at
> least some of
> the work in parallel.
Yes, but I don't think lexing is an important part here. It's too cheap.
> 1. Has anybody done any work on doing work in dmd with threads?
https://blog.thecybershadow.net/2018/11/18/d-compilation-is-too-slow-and-i-am-forking-the-compiler/
> 2. Am I correct that in theory dmd should be able to lex all passed
> files in
> parallel (given enough cpu cores).
Yes, but lexing is __very__ cheap. Your performance problems come from
code with heavy template + CTFE usage and other expensive semantic
checks. Benchmark before you optimize!
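
To illustrate "measure first", here is a minimal Python sketch that times a few phases and reports each one's share of the total. The phase names and the stand-in workloads are made up for the example; they are not dmd's real phases or real numbers.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase."""

    def __init__(self):
        self.totals = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def shares(self):
        """Return {phase: fraction of total time}; empty if nothing was timed."""
        total = sum(self.totals.values())
        if total == 0:
            return {}
        return {name: t / total for name, t in self.totals.items()}


def demo():
    timer = PhaseTimer()
    # Hypothetical stand-ins for compiler phases; replace with real work.
    with timer.phase("lex"):
        sum(range(10_000))
    with timer.phase("semantic"):
        sum(range(1_000_000))
    return timer


if __name__ == "__main__":
    ranked = sorted(demo().shares().items(), key=lambda kv: -kv[1])
    for name, share in ranked:
        print(f"{name:10s} {share:6.1%}")
```

If one phase only accounts for a few percent of the total, parallelizing it cannot buy much, however well it scales.
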
> 3. Is it correct that currently one token is created at a time on
> request by the
> parser.
The parser generally calls nextToken(), but it can also look further
ahead, e.g. with peekNext2() or peekPastParen(tk). Note, though, that the
entire file is loaded into one buffer
(https://github.com/dlang/dmd/blob/7c90cf18cf2ff8bea7eb9aa372b09fc4870efe9e/src/dmd/dmodule.d#L560).
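
As a rough picture of that arrangement, here is a toy Python sketch of a lexer that hands out one token per request from a single in-memory buffer and can also look ahead without consuming. This is not dmd's code: the class, its next_token() and peek() methods, and the token pattern are hypothetical and only mirror the roles of nextToken() and the peek functions.

```python
from collections import deque
import re

# Toy token pattern: identifiers, integers, or any single non-space character.
_TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|\S)")


class Lexer:
    """Produces tokens on demand from one in-memory buffer."""

    def __init__(self, source):
        self.source = source      # the whole file, already loaded
        self.pos = 0              # scan position in the buffer
        self.lookahead = deque()  # tokens scanned ahead but not yet consumed

    def _scan(self):
        """Scan one token from the buffer; returns None at end of input."""
        m = _TOKEN.match(self.source, self.pos)
        if m is None:
            self.pos = len(self.source)
            return None
        self.pos = m.end()
        return m.group(1)

    def next_token(self):
        """Consume and return the next token."""
        if self.lookahead:
            return self.lookahead.popleft()
        return self._scan()

    def peek(self, n=1):
        """Return the n-th upcoming token without consuming anything."""
        while len(self.lookahead) < n:
            self.lookahead.append(self._scan())
        return self.lookahead[n - 1]
```

A parser built on this would call next_token() in its main loop and peek(2) only where the grammar needs more than one token to decide.
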
> 4. This would currently require the classes Identifier and StringTable
> be made
> thread safe
Lexing doesn't touch Identifier or StringTable. It simply slices the
string from the fully allocated blob (see e.g.
https://github.com/dlang/dmd/blob/7c90cf18cf2ff8bea7eb9aa372b09fc4870efe9e/src/dmd/lexer.d#L1657)
and new allocations are malloc'ed and then copied (see e.g.
https://github.com/dlang/dmd/blob/7c90cf18cf2ff8bea7eb9aa372b09fc4870efe9e/src/dmd/tokens.d#L736).
> 5. AsyncRead in mars.d is dead code?
Yes.
> 6. Is there any way to test all the different version statements and
> static if's
> used for the same purpose?
No.
> 7. Is there a chance to parse all the initially given files in parallel?
No. I think the Async changes were abandoned when it became apparent that
the benefit was low relative to the work.
> 8. Any other ideas on how to do threading in dmd?
Do not focus on lexing. Focus on CTFE + templates.
You want to do the following:
- cache (e.g. https://github.com/dlang/dmd/pull/7843)
  - even entire modules could be cached and loaded for subsequent runs
- be lazier (i.e. DMD could be a lot more conservative about doing work
  up front)
- reduce DMD's memory consumption (there is still a lot of low-hanging
  fruit)
  - example: https://github.com/dlang/dmd/pull/10396#issuecomment-531454363
    or https://github.com/dlang/dmd/pull/10427
- optimize DMD's CTFE + template code (there is still a lot of low-hanging
  fruit)
  - example: https://github.com/dlang/dmd/pull/10395,
    https://github.com/dlang/dmd/pull/10394 or even things like
    https://github.com/dlang/dmd/pull/10391
- focus on running semantic analysis in parallel (hard, but should be
  easier when working on independent modules)
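
To make the caching item more concrete, here is a minimal Python sketch of memoizing an expensive analysis step, keyed by a hash of the source text. It is only an illustration of the idea, not how the linked PR works: ResultCache and expensive_analysis are hypothetical names, the cache lives in memory only, and a real compiler cache would presumably also have to key on things like compiler flags and imported modules.

```python
import hashlib


class ResultCache:
    """Memoizes an expensive function of source text, keyed by content hash."""

    def __init__(self, compute):
        self.compute = compute  # the expensive step to avoid repeating
        self.store = {}         # content digest -> cached result
        self.hits = 0
        self.misses = 0

    def get(self, source):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = self.compute(source)
        return self.store[key]


def expensive_analysis(source):
    # Hypothetical stand-in for costly work such as template expansion or CTFE.
    return len(source.split())


if __name__ == "__main__":
    cache = ResultCache(expensive_analysis)
    for text in ["a b c", "a b c", "a b"]:
        cache.get(text)
    print(f"hits={cache.hits} misses={cache.misses}")
```

Unchanged input is served from the cache and only changed input is recomputed. Caching entire modules across runs would additionally need the store to be written to disk.
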
Also, I recommend looking for the real culprits (dub comes with real
overhead too) or easy low-hanging fruit. For example, on Linux DMD
could use mmapped files to speed up file reading.
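
For reference, this is roughly what reading through a memory mapping looks like, sketched in Python with the standard mmap module rather than in dmd's own code. The function name count_lines_mapped is made up, and counting newlines is just a stand-in for scanning the buffer. Whether this beats a plain read depends on file sizes and the OS, so it should be measured.

```python
import mmap
import os
import tempfile


def count_lines_mapped(path):
    """Count newline bytes by scanning a read-only mapping of the file."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return 0  # a zero-length file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            count = 0
            pos = m.find(b"\n")
            while pos != -1:
                count += 1
                pos = m.find(b"\n", pos + 1)
            return count


if __name__ == "__main__":
    # Write a small throwaway file and scan it through the mapping.
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"module a;\nvoid main() {}\n")
    try:
        print(count_lines_mapped(tmp.name))
    finally:
        os.remove(tmp.name)
```

The scan works directly on the mapped pages, so the file contents are never copied into a separate buffer first.
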