xdc: A hypothetical D cross-compiler and AST manipulation tool.
Etienne
etcimon at gmail.com
Fri Nov 8 20:46:08 PST 2013
Many vendors would have their processors supported in D if we had
a D-to-C compiler. I feel like it would be simpler than going for
native code directly. Did this idea see any follow-through?
On Thursday, 18 July 2013 at 01:21:44 UTC, Chad Joan wrote:
> I'd like to present my vision for a new D compiler. I call it
> xdc, a loose abbreviation for "Cross D Compiler" (if confused,
> see
> http://english.stackexchange.com/questions/37394/why-do-some-words-have-x-as-a-substitute).
> It could also mean other fun things like "Crossbone D
> Compiler" (I imagine a logo with some crossbones having a metal
> D atop where the skull normally goes), "eXperimental D
> Compiler", or something sounding like "ecstasy" ;)
>
> We usually think of language features as being rewritten into
> simpler features. The simple features eventually get rewritten
> into machine instructions. Compilers are, fundamentally,
> responsible for performing "lowering" operations.
>
> It makes sense to me, then, to make a compiler whose internals
> /look/ like a bunch of these rewrites and lowering operations.
> There should be a bunch of input patterns matched to the
> desired results. This has the happy side-effect of giving us a
> pleasant way to do AST manipulations from within D code.
>
> I've also had a long-standing desire to see D on many more
> platforms. It should make an appearance on console platforms
> and on smartphones. I've tried doing this with a retargetable
> compiler like GCC before, and the work was surprisingly large.
> Even if the compiler already emits code for the target system's
> CPU, there are still a large number of details involving
> calling conventions, executable format, and any number of
> CPU/OS specific tweaks to object file output. It makes a lot
> more sense to me to just output C/C++ code and feed that to a
> native toolchain. That would skip a lot of the
> platform-specific nonsense that creates a barrier to entry for
> people who, say, just want to write a simple game for
> Android/iPhone/PS(3|4)/etc in D, and don't want to become
> compiler experts first. Ideally, some day, this compiler would
> also emit code or bytecode for Javascript, AS3/Flash, Java, and
> any other popular targets that the market desires. This can
> probably be done with DMD, but I'd like to make the process
> more approachable, and make backend authoring as easy as
> possible. It should be possible (and easy) to tell the
> compiler exactly what lowerings should be applied before the
> AST hits the backend.
>
> xdc should bring all of that cross-platform targeting together
> with a compiler infrastructure that can blow everything else
> away (I hope!).
>
> xdc is my dream for a D compiler that gives us our first step
> (of few) towards having what haXe has already
> (http://haxe.org/) : a compiler that can land code just about
> anywhere.
>
> What follows is a collection of my thoughts written down as
> notes.
>
> == Ideal Outcomes ==
>
> .- D to C/C++ compiler that can easily reach target platforms
> .  that are currently either unsupported or poorly supported by
> .  current D compilers.
> .  - Useful for game developers wishing to write D code on the
> .    next big console platform.
> .  - Useful for embedded systems developers wishing to write D
> .    code on obscure or potentially proprietary microcontrollers.
>
> .- Other backends (ex: llvm, Java bytecode, AS3/Flash bytecode,
> .  etc) possible in the future. Community desires considered
> .  when selecting new backend targets.
>
> .- Interpreter backend: a notable backend that would be
> .  implemented as a necessity for making CTFE work. A dedicated
> .  interpreter backend would hopefully be much faster and easier
> .  on memory than DMD's souped-up constant folder. (Even though
> .  DMD has become considerably better at this in the past year
> .  or two.)
>
> .- Abstract Syntax Tree (AST) manipulation toolchain, possibly
> .  usable in CTFE. It would work like this:
> .  (1) Pass some D code or Domain Specific Language (DSL) of
> .      your choice (as text) into xdc at compile-time.
> .  (2) xdc returns an AST.
> .  (3) Use xdc's pattern-matching and substitution DSL to
> .      manipulate the AST.
> .  (4) xdc consumes the AST and emits modified D code.
> .  (5) mixin(...) the result.
> .  - If xdc is the compiler used to execute itself in CTFE,
> .    then it might be possible to optimize this by having it
> .    expose itself as a set of intrinsics.
>
> .- Reference counting available by default on all platforms.
> .  - Gets you into the action with minimal effort and little or
> .    no compiler hacking. (More complete GC tends to require
> .    platform-specific ASM and/or operating system API support.)
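> To make the idea concrete: "reference counting by default" just
> means the compiler brackets every assignment with incref/decref
> calls that it emits itself. Here is a toy model of those emitted
> calls (illustrative Python, with invented names, since no xdc
> runtime exists):

```python
class RCObject:
    """Toy model of an object managed by compiler-emitted refcounts."""
    def __init__(self, payload):
        self.payload = payload
        self.rc = 1          # the creating reference
        self.freed = False

def incref(obj):
    obj.rc += 1

def decref(obj):
    obj.rc -= 1
    if obj.rc == 0:
        obj.freed = True     # stands in for free()/destructor call

# What the compiler might emit for:  a = new T; b = a; a = null; b = null;
a = RCObject("T")
b = a; incref(b)
decref(a); a = None
decref(b); b = None          # last reference gone -> object freed
```

> The same calls are trivial to emit as plain C, which is why this
> scheme needs no platform-specific ASM or OS support.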
>
> .- Full garbage collection available if supported.
> .  - Ex: The C backends would default to ref-counting until the
> .    ASM and OS-level code is written to support full GC.
> .  - Ex: A Java backend would probably use the JVM's garbage
> .    collector by default.
>
> .- Threading model determined by compiler configuration or
> .  available platform hints.
> .  - Ex: The user may have a posix-threads implementation
> .    available, but know few other details about the target
> .    system. It should be possible for xdc to use pthreads to
> .    emulate the TLS and synchronization mechanisms needed to
> .    make D tick. (Or at least emulate as many features as
> .    possible.)
> .  - Ex: Possible "no threading" target for people who don't
> .    need threading but DO need other D features to be available
> .    NOW on an alien platform. Errors when the source code
> .    passed into xdc assumes that threading features are present.
>
> .- D compiler that is easy to hack on.
> .  - "Looks like the problem it solves."
> .    (To quote Walter's DConf2013 keynote.)
> .  - Made of a bunch of patterns that describe code
> .    rewrites/lowerings.
> .  - Few or no null value checks necessary.
> .    - null checks don't look like pattern matching or lowering.
> .  - Few or no convoluted if-while-if-for-etc nests.
> .    - These also don't look like pattern matching or lowering.
> .  - It should be largely made of "pattern handlers" (see below).
> .  - Each pattern handler will have one part that closely
> .    resembles the AST fragment for the D code that it
> .    recognizes, and another part that resembles the lowered
> .    form that it outputs.
> .  - Dependency analysis that prevents your AST manipulation
> .    from happening either too early or too late.
> .  - Because the code that actually does lowerings is generated
> .    from a DSL, it is possible to make it automate a lot of
> .    tedious tasks, like updating the symbol table when nodes
> .    are added or removed from the AST.
> .    - This makes it easier to test experimental features.
>
> .- A step-by-step view of what the compiler is doing to your
> .  code.
> .  - Since the semantic analysis of xdc would be composed of
> .    "pattern handlers" (see below), each time one of them
> .    completes, the compiler could output the result of calling
> .    .toString() (or .toDCode() or whatever) on the entire AST.
> .  - This could be attached to an ncurses interface that would
> .    be activated by passing a flag to the compiler, which would
> .    then proceed to show the AST at every stage of compilation.
> .    Press ENTER to see the next step, etc.
> .  - This could also be exposed as API functionality that IDEs
> .    could use to show developers how the compiler sees their
> .    code.
>
> .- D code analysis engine that might be usable to automatically
> .  translate D1 code into D2 code, or maybe D2 into D3 in the
> .  far future.
>
> == Architectural Overview ==
>
> .- xdc will largely consist of "pattern handlers" that recognize
> .  patterns in its AST and replace them with AST fragments that
> .  contain successively fewer high-level features (lowering).
> .  - These pattern handlers would feature a DSL that should make
> .    the whole task fairly easy.
> .  - The proposed DSL would be similar to regular expressions in
> .    semantics but different in syntax.
> .    - It will have operators for choice, repetition, optional
> .      matches, capturing, and so on.
> .    - The DSL must support nested structures well.
> .    - The DSL must support vertical layout of patterns well.
> .    - Because of the vertical patterns, most operators will
> .      either be prefix or will be written in block style:
> .        some_block_header { block_stmt1; block_stmt2; etc; }
> .    - Actions like entering and leaving nodes are given their
> .      own special syntax. The machine will treat them like
> .      tokens that can be matched the same as any AST node.
> .      Notably, node-entry and node-exit do not require
> .      introducing non-regular elements to the DSL; node-entry
> .      and node-exit may be subsumed into Deterministic Finite
> .      Automata (DFAs).
> .  - An example pattern handler might look like this:
>
> const lowerWhileStatement =
> {
>     // Apologies in advance if this isn't actually valid D code:
>     // this is a design sketch and I currently don't have a way
>     // to compile it.
>     //
>     // The Pattern template, PatternMatch template, and
>     // PatternHandler class have not yet been written. This is
>     // an example of how I might expect them to be used.
>     //
>
>     auto consumes = ["while_statement"];
>     auto produces = ["if_statement", "goto", "label"];
>
>     auto recognizer = Pattern!
>     "WhileStatement has
>     {
>         // Capture the conditional expression (call it \"expr\") and
>         // capture the loop body (call it \"statement\").
>         .expression $expr;
>         .statement $statement has
>         {
>             // Capture any continue/break statements.
>             any_amount_of {
>                 any_amount_of .; // Same as .* in regexes.
>                 one_of
>                 {
>                     ContinueStatement $continues;
>                     BreakStatement $breaks;
>                 }
>             }
>             any_amount_of .;
>         }
>     }";
>
>     auto action = (PatternMatch!(recognizer) m)
>     {
>         m.captures.add("uniqLoopAgain",
>             getUniqueLabel(syntaxNode.enclosingScope));
>         m.captures.add("uniqExitLoop",
>             getUniqueLabel(syntaxNode.enclosingScope));
>
>         // The recognizer defines m.getCapture!"continues" with:
>         //   "ContinueStatement $continues;"
>         // That line appears in a repetition context
>         // ("any_amount_of") and is therefore typed as an array.
>         foreach( ref node; m.getCapture!"continues" )
>             node.replaceWith( m, "GotoStatement has $uniqLoopAgain" );
>
>         // Ditto for m.getCapture!"breaks" and "BreakStatement $breaks;".
>         foreach( ref node; m.getCapture!"breaks" )
>             node.replaceWith( m, "GotoStatement has $uniqExitLoop" );
>     };
>
>     auto synthesizer = Pattern!
>     "Label has $uniqLoopAgain
>     IfStatement has
>     {
>         OpNegate has $expr
>         GotoStatement has $uniqExitLoop
>     }
>     $statement
>     GotoStatement has $uniqLoopAgain
>     Label has $uniqExitLoop
>     ";
>
>     return new PatternHandler(consumes, produces, recognizer,
>         action, synthesizer);
> };
>
> (Also available at: http://pastebin.com/0mBQxhLs )
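> For readers who want to see the lowering itself in action, here
> is a toy rendition of the while -> label/if/goto rewrite that the
> synthesizer above describes. This is illustrative Python, not
> xdc's DSL; the node shapes and names are invented:

```python
import itertools

_counter = itertools.count()

def unique_label(prefix):
    """Stand-in for getUniqueLabel(): returns a fresh label name."""
    return f"{prefix}_{next(_counter)}"

def lower_while(cond, body):
    """Rewrite `while (cond) body` into the label/if/goto shape
    produced by the synthesizer pattern. Nodes are tuples for
    brevity; `body` is a list of statement nodes."""
    again = unique_label("uniqLoopAgain")
    exit_ = unique_label("uniqExitLoop")
    lowered_body = []
    for stmt in body:
        if stmt == ("continue",):           # continue -> goto loop head
            lowered_body.append(("goto", again))
        elif stmt == ("break",):            # break -> goto loop exit
            lowered_body.append(("goto", exit_))
        else:
            lowered_body.append(stmt)
    return [
        ("label", again),
        ("if", ("not", cond), [("goto", exit_)]),
        *lowered_body,
        ("goto", again),
        ("label", exit_),
    ]
```

> For example, `lower_while("cond", [("call", "f"), ("break",)])`
> yields a label, a negated-condition exit test, the body with the
> break rewritten to a goto, a back-edge goto, and the exit label.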
>
> .- Dispatch to pattern handlers is performed by the execution of
> .  a DFA/Packrat hybrid instead of the traditional OOP
> .  inheritance with method calls.
> .  - Each pattern handler's recognizer gets treated like a regex
> .    or Parsing Expression Grammar (PEG) fragment.
> .  - All of the recognizers in the same semantic pass are pasted
> .    together in an ordered-choice expression. The ordering is
> .    determined by dependency analysis.
> .  - A recognizer's pattern handler is invoked when the
> .    recognizer's AST expression is matched.
> .  - Once any substitutions are completed, the machine executing
> .    the pattern engine will set its cursor to the beginning of
> .    the newly substituted AST nodes and continue running.
> .  - Executing potentially hundreds of pattern handlers in a
> .    single ordered-choice expression would be obnoxious for a
> .    packrat parser (packrat machine?). Thankfully, ordered
> .    choice is possible in regular grammars, so it can be
> .    lowered into regex operations and the whole thing turned
> .    into a DFA.
> .  - If pattern recognizers end up needing recursive elements,
> .    those will probably not appear at the very beginning of the
> .    pattern. Patterns with enough regular elements at the start
> .    will be able to merge those regular elements into the DFA
> .    with the rest of the pattern recognizers, and it all
> .    becomes very fast table lookups in small tables.
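> A rough sketch of that dispatch style, using Python's `re`
> module to stand in for the combined machine (its left-to-right
> alternation approximates ordered choice, though `re` backtracks
> rather than running a true table-driven DFA). The recognizer
> fragments and handler names below are invented for illustration:

```python
import re

# Each "pattern handler" contributes one recognizer fragment.
recognizers = {
    "while_stmt": r"WHILE\(.*?\)\{.*?\}",
    "if_stmt":    r"IF\(.*?\)\{.*?\}",
    "expr_stmt":  r"[a-z]+=[0-9]+;",
}

# Paste all recognizers into a single ordered-choice expression;
# the name of the winning alternative tells us which handler fires.
dispatch = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in recognizers.items()))

def run_pass(source):
    """Scan the input, reporting which handler each match dispatches to."""
    return [m.lastgroup for m in dispatch.finditer(source)]
```

> For example, `run_pass("WHILE(x){a=1;}b=2;")` dispatches first
> to the while handler, then to the expression handler.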
>
> .- This compiler would involve the creation of a
> .  parser-generator API that allows code to programmatically
> .  create grammars, and to do so without a bunch of clumsy
> .  string formatting and string concatenation.
> .  - These grammars could be written such that things like AST
> .    nodes are seen as terminals. This expands possibilities and
> .    allows all of the pattern handlers to be coalesced into a
> .    grammar that operates on ASTs and fires off semantic
> .    actions whenever one of the recognizer patterns gets
> .    tickled by the right AST fragment.
> .  - Using strings as terminals is still cool, and necessary for
> .    xdc's text/D-code parser.
> .  - A simple parser-generator API example:
>
> ---------------------------------------
> string makeParser()
> {
> auto builder = new ParserBuilder!char;
> builder.pushSequence();
> builder.literal('x');
> builder.pushMaybe();
> builder.literal('y');
> builder.pop();
> builder.pop();
> return builder.toDCode("callMe");
> }
>
> const foo = makeParser();
>
> pragma(msg, foo);
> ---------------------------------------
> Current output:
> http://pastebin.com/V3E0Ubbc
> ---------------------------------------
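> Conceptually, the recognizer generated for the grammar above
> ('x' followed by an optional 'y') boils down to a small
> transition table. A hand-written Python sketch of such a
> table-driven DFA (this is not the actual generated output):

```python
# States: 0 = start, 1 = seen 'x' (accepting), 2 = seen 'xy' (accepting).
# Missing entries mean "no transition", i.e. reject.
TRANSITIONS = {
    (0, "x"): 1,
    (1, "y"): 2,
}
ACCEPTING = {1, 2}

def call_me(s):
    """Table-driven DFA recognizing the sequence x y? ('x' or 'xy')."""
    state = 0
    for ch in s:
        key = (state, ch)
        if key not in TRANSITIONS:
            return False
        state = TRANSITIONS[key]
    return state in ACCEPTING
```

> Matching is a single dictionary lookup per input symbol, which
> is the "very fast table lookups in small tables" property the
> DFA conversion is after.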
>
> .  - Humans would probably never directly write grammars using
> .    this API; it is intended for use by code that needs to
> .    write grammars. xdc would be such code: it's given a bunch
> .    of pattern handlers and needs to turn them into a grammar.
> .  - This API could also make it easier to write the parser
> .    generators that humans /would/ use. For example, it could
> .    be used as an experimental backend for a regular expression
> .    engine that can handle limited recursion.
> .  - The packrats usually generated from PEGs are nice and all,
> .    but I'd really like to generate DFAs whenever possible,
> .    because those seem to be regarded as being /very fast/.
> .  - DFAs can't handle the recursive elements of PEGs, but they
> .    should be able to handle everything non-recursive that
> .    precedes or follows the recursive elements.
> .  - The parser-generator API would be responsible for
> .    aggressively converting PEG-like elements into regex/DFA
> .    elements whenever possible.
> .  - Regular expressions can be embedded in PEGs as long as you
> .    tell them how much text to match. You have to give them
> .    concrete success/failure conditions that can be determined
> .    without help from the rest of the PEG: things like "match
> .    as many characters as possible" or "match as few characters
> .    as possible". Without that, the regex's backtracking (DFA'd
> .    or otherwise) won't mesh with the PEG. Give it a concrete
> .    win/fail condition, however, and the embedded regex becomes
> .    just another PEG building block that chews through some
> .    source material and yields a yes/no result. Such regular
> .    expressions allow DFAs to be introduced into a recursive
> .    descent or packrat parser.
> .  - Many PEG elements can be converted into these well-behaved
> .    regular expressions.
> .    - PEG repetition is just regular expression repetition with
> .      a wrapper around it that says "match as many characters
> .      as possible".
> .    - PEG ordered choice can be lowered into regular expression
> .      unordered choice, which can then be converted into DFAs.
> .      I suspect that this is true:
> .        (uv/xy)c == (uv|(^(uv)&xy))c
> .      (or, by De Morgan's law:
> .        (uv/xy)c == (uv|(^(uv|^(xy))))c )
> .      & is intersection.
> .      ^ is negation.
> .      Each letter (u,v,x,y,c) can be a subexpression
> .      (non-recursive).
> .    - PEG label matching can be inlined up to the point where
> .      recursion occurs, thus allowing more elements to be
> .      considered for DFA conversion.
> .    - etc.
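> The "match as many characters as possible" point can be shown
> with a few tiny PEG-style combinators (illustrative Python, not
> xdc code). PEG repetition is possessive: it never gives
> characters back, so the PEG `a* 'a'` can never match, while the
> backtracking regex /a*a/ would succeed:

```python
def lit(c):
    """Match one literal character; return the new position, or None."""
    def parse(s, i):
        return i + 1 if i < len(s) and s[i] == c else None
    return parse

def star(p):
    """PEG repetition: consume as much as possible, never backtrack."""
    def parse(s, i):
        while True:
            j = p(s, i)
            if j is None:
                return i        # zero or more matches always succeed
            i = j
    return parse

def seq(*ps):
    """Match each sub-parser in order; fail if any fails."""
    def parse(s, i):
        for p in ps:
            i = p(s, i)
            if i is None:
                return None
        return i
    return parse

# PEG "a* 'a'": the star possessively eats every 'a', so the
# trailing literal always fails -- on any input.
peg = seq(star(lit("a")), lit("a"))
```

> Here `peg("aaa", 0)` returns None, whereas `seq(star(lit("a")),
> lit("b"))` happily matches "aaab": the embedded repetition works
> fine as long as its win/fail condition doesn't depend on what
> follows.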
>
> .- The parser would be defined using a PEG (most likely using
> .  Pegged specifically).
> .  - Although Pegged is an awesome achievement, I suspect its
> .    output could be improved considerably. The templated code
> .    it generates is slow to compile and ALWAYS allocates parse
> .    tree nodes at every symbol.
> .  - I want to experiment with making Pegged (or a branch of it)
> .    emit DFA/Packrat parser hybrids. This could be done by
> .    making a version of Pegged that uses the aforementioned
> .    parser-generator API to create its parsers.
> .  - Design principle: avoid memory allocations like the plague.
> .    The output should be a well-pruned AST, and not just a
> .    parse tree that causes a bunch of allocations and needs
> .    massaging to become useful.
> .  - I really like Pegged and would contribute this stuff
> .    upward, if accepted.
>
> .- When hacking on xdc, you don't need to be aware of WHEN your
> .  code gets executed in semantic analysis. The dependency
> .  analysis will guarantee that it always gets performed both
> .  (a) when it's needed, and (b) when it has what it needs.
> .  - This is what the "consumes" and "produces" variables are
> .    all about in the above example.
>
> .- Successfully lowering a D AST into the target backend's input
> .  will almost certainly require multiple passes. xdc's
> .  dependency analyzer would automatically minimize the number
> .  of passes by looking for patterns that are "siblings" in the
> .  dependency graph (eg. neither depends on the other) and
> .  bunching as many such patterns as possible into each pass.
> .  - It really shouldn't generate very many more than the number
> .    of passes that DMD has coded into it. Ideally: no more than
> .    DMD, if not fewer.
> .  - I'd like to make the dependency analyzer output a graph
> .    that can be used to track which patterns cause which passes
> .    to exist, and show which patterns are in which passes.
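> A toy sketch of that pass-minimizing dependency analysis
> (Python for illustration; the handler names and their
> consumes/produces sets are invented). Handlers whose patterns
> don't feed each other are "siblings" and share a pass:

```python
def schedule_passes(handlers):
    """Group pattern handlers into a minimal sequence of passes.
    `handlers` maps name -> (consumes, produces). Handler B
    depends on handler A when B consumes something A produces;
    independent handlers are bunched into the same pass."""
    def depends_on(b, a):
        return bool(set(handlers[b][0]) & set(handlers[a][1]))

    remaining = set(handlers)
    passes = []
    while remaining:
        # A handler is ready once no still-pending handler feeds it.
        ready = {h for h in remaining
                 if not any(depends_on(h, other)
                            for other in remaining if other != h)}
        if not ready:
            raise ValueError("cyclic dependency between handlers")
        passes.append(sorted(ready))
        remaining -= ready
    return passes

handlers = {
    "lower_foreach": (["foreach"], ["for"]),
    "lower_for":     (["for"],     ["if", "goto", "label"]),
    "lower_while":   (["while"],   ["if", "goto", "label"]),
}
```

> With these three handlers, `schedule_passes(handlers)` puts the
> foreach and while lowerings in one pass (they are siblings) and
> the for lowering in a second pass, since it consumes the "for"
> nodes that the first pass produces.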
>
> .- Planned availability of backends.
> .  - My first pick for a backend would be an ANSI C89 target. I
> .    feel that this would give it the most reach.
> .  - The interpreter backend is along for the ride, as mentioned.
> .  - Because the semantic analysis is composed of distinct and
> .    loosely-coupled patterns, it is possible for xdc to
> .    generate an analysis chain with the minimum number of
> .    lowerings needed for a given backend.
> .    - The interpreter backend would benefit from having the
> .      most lowerings. By requiring a lot of lowering, the
> .      interpreter would only need to support a small number of
> .      constructs:
> .      - if statements
> .      - gotos
> .      - function calls
> .      - arithmetic expression evaluation
> .      - builtin types (byte, short, int, long, float, double,
> .        etc)
> .      - pointers
> .      - Even structs are unnecessary: they can be seen as typed
> .        dereferencing of untyped pointers.
> .    - The C backend would benefit from slightly less lowering
> .      than the interpreter backend. It is useful for debugging
> .      if you can mostly-sorta read the resulting C code, and
> .      your C compiler will appreciate the extra optimization
> .      opportunities.
> .      - Looping constructs like while and for are welcome here.
> .      - structs would be more readable.
> .    - In the future, a Java or C# backend might use an entirely
> .      different set of lowerings in later passes.
> .      - Pointers are no longer considered "low".
> .      - Classes should be kept as long as possible; I'm pretty
> .        sure the bytecode (at least for Java) has opcodes
> .        dedicated to classes. Removing them may cause
> .        pessimisation.
> .    - The backend writer should not have to worry about
> .      rewriting the semantic analysis to suit their needs. They
> .      just define some features and say which ones they need
> .      available in the AST, and xdc's
> .      semantic-analysis-generator will handle the rest.
> .  - Notably, a backend should just be more lowerings, with the
> .    result being text or binary code instead of AST nodes.
> .  - Backends are essentially defined by the set of AST/language
> .    features that they consume and any special lowerings needed
> .    to convert generic AST/language features into
> .    backend-specific AST/language features.
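> The "structs are typed dereferencing of untyped pointers"
> lowering has a close analogue in Python's struct module, which
> reads typed fields at byte offsets out of a raw, untyped buffer.
> A sketch (the field layout is a hypothetical example, assuming a
> packed little-endian layout with no padding):

```python
import struct

# A hypothetical struct:  struct Point { int x; int y; double w; }
# After lowering, `p.y` is just "read 4 bytes at (base + 4) as int32".
LAYOUT = {            # field -> (byte offset, struct format code)
    "x": (0, "<i"),
    "y": (4, "<i"),
    "w": (8, "<d"),
}

def field_read(buf, field):
    """Typed dereference of an untyped byte buffer."""
    offset, fmt = LAYOUT[field]
    return struct.unpack_from(fmt, buf, offset)[0]
```

> For example, packing (3, 7, 2.5) with format "<iid" and then
> calling `field_read(raw, "y")` yields 7, without any struct type
> existing at all; this is all the interpreter backend would need.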
>
>
> == Closing Thoughts ==
>
> I am realizing that there are multiple reasons that compel me
> to write this document:
> - To share my ideas with others, on the off-chance that someone
> else might see this vision too and be better equipped to
> deliver.
> - To suggest capabilities that any community-endorsed compiler
> tool (ex: compiler-as-a-ctfe-library) should have.
> - To see if I might be able to get the help I need to make it a
> reality.
>
> I just can't decide which reasons are more important. But
> there is a common thread: I want this vision to become reality
> and do really cool things while filling a bunch of missing
> links in D's ecosystem.
>
> I have to ask:
>
> Would you pay for this?
> If so, then I might be able to do a kickstarter at some point.
> I am not independently wealthy or retired (or both?) like
> Walter, nor am I able to survive on zero hours of sleep each
> night like Andrei, and this would be a big project. I think it
> would need full-time attention or it would never become useful
> in a reasonable timeframe.
>
> Also, assuming you understand the design, are there any gaping
> holes in this?
> This is my first attempt to share these ideas with a larger
> group, and thus an opportunity to anticipate troubles.
>
> ...
>
> Well, I'm anxious to see how well the venerable D community
> receives this bundle of ideas. Be chatty. I'll try to keep up.
>
> Thank you for reading.