[SAOC 2023] dfmt rewrite - Weekly update #1
deadalnix
deadalnix at gmail.com
Sun Sep 24 21:11:34 UTC 2023
On Friday, 22 September 2023 at 07:12:02 UTC, Prajwal S N wrote:
> Hi everyone,
>
> For SAOC 2023, I'm working on refactoring
> [dfmt](https://github.com/dlang-community/dfmt) to use the AST
> from DMD-as-a-library instead of libdparse.
>
> The past week has been very interesting. I got up to speed with
> the dfmt codebase, and managed to do a 1-to-1 port of the lexer
> dependency from libdparse to DMD-as-a-library. Most parts were
> pretty straightforward, and the bulk of the work was replacing
> every `tok!"<token>"` instance with `TOK.<token>` and making
> sure the token coming from DMD was the same as what was
> previously being used. So far so good!
>
> You can see the draft PR tracking the work
> [here](https://github.com/dlang-community/dfmt/pull/589).
>
> Going forward, my mentor and I have decided that it would be
> impractical to try and replace the parser directly, for
> multiple reasons:
>
> - It's a lot of work to replace the parser and use the DMD AST
> instead of libdparse's, and all of this work will happen
> without a working version of dfmt. If, at the end of this, dfmt
> is broken or refuses to compile, it could very well mean that
> all that effort went down the drain.
> - Doing a brute force replacement of the parser will prevent us
> from testing the transformation passes in dfmt individually,
> and also brings us back to the point above.
>
> Hence, we've decided to do an incremental rewrite of the files
> that use the parser, initially with no passes (just to ensure
> the AST is being built in the first place), and then adding
> each pass along with relevant unit tests.
I don't want to sound alarmist, but an AST is not
really what you want to work with in a formatter.
The main reason is that you need to carry around a lot of
information that an AST generally doesn't care about (comments,
layout information, etc.). Consider the following
example:
```d
int a; // this is an int.
int b;
```
We immediately recognize that the comment refers to a. However:
```d
int a;
// this is an int.
int b;
```
Now we recognize that the comment refers to b.
There are a lot of subtle semantics in there that are very hard to
convey through an AST and very hard to work with in that form.
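To make the two examples above concrete, here is a minimal sketch of the kind of layout information a formatter has to keep. It is a toy, line-based heuristic, not how sdfmt, dfmt, or DMD actually work: the `Decl` struct and the `attachComments` function are hypothetical names made up for illustration, and the code assumes only `//` comments and one declaration per line.

```d
import std.algorithm.searching : findSplit, startsWith;
import std.string : lineSplitter, strip;

/// Hypothetical record: a declaration plus the comments that belong to it.
struct Decl
{
    string code;      // the declaration text, without comments
    string[] leading; // comments on their own lines above the declaration
    string trailing;  // comment sharing the declaration's line, if any
}

/// Toy heuristic (an assumption for illustration, not a real formatter pass):
/// a comment on the same line attaches to that declaration, while a comment
/// on its own line attaches to the next declaration.
Decl[] attachComments(string source)
{
    Decl[] decls;
    string[] pending; // own-line comments waiting for the next declaration

    foreach (raw; source.lineSplitter)
    {
        auto line = raw.strip;
        if (line.length == 0)
            continue; // blank lines are ignored in this sketch

        if (line.startsWith("//"))
        {
            pending ~= line;
            continue;
        }

        // Split "int a; // comment" into the code and the trailing comment.
        auto parts = line.findSplit("//");
        Decl d;
        d.code = parts[0].strip;
        d.leading = pending;
        if (parts[1].length)
            d.trailing = ("//" ~ parts[2]).strip;
        pending = null;
        decls ~= d;
    }
    // Comments after the last declaration are dropped in this sketch.
    return decls;
}

unittest
{
    // Same tokens, different layout, different attachment.
    auto first = attachComments("int a; // this is an int.\nint b;");
    assert(first[0].trailing == "// this is an int.");
    assert(first[1].leading.length == 0);

    auto second = attachComments("int a;\n// this is an int.\nint b;");
    assert(second[0].trailing.length == 0);
    assert(second[1].leading == ["// this is an int."]);
}
```

Saved to a file, this can be checked with `dmd -unittest -main -run file.d`. The only thing that distinguishes the two inputs is where the line break falls, which is exactly the kind of information a typical AST throws away.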
There is a lot of prior art on the matter of code formatting, and
the best explanation is probably the one from dartfmt's author:
https://journal.stuffwithstuff.com/2015/09/08/the-hardest-program-ive-ever-written/
clang-format and many other formatters use the approach described there.
Shameless plug: sdfmt uses that approach too. You can get it here:
https://code.dlang.org/packages/sdc%3Asdfmt
I understand it is probably too late to turn things around
at this point, but holy hell, do we really need, as a community,
to repeat all the mistakes other communities have made instead of
learning from them and, to add insult to injury, involve junior
devs in that madness?