[SAOC 2023] dfmt rewrite - Weekly update #1

RazvanN razvan.nitu1305 at gmail.com
Mon Sep 25 08:26:33 UTC 2023


On Sunday, 24 September 2023 at 21:11:34 UTC, deadalnix wrote:
> On Friday, 22 September 2023 at 07:12:02 UTC, Prajwal S N wrote:
>> Hi everyone,
>>
>> For SAOC 2023, I'm working on refactoring 
>> [dfmt](https://github.com/dlang-community/dfmt) to use the AST 
>> from DMD-as-a-library instead of libdparse.
>>
>> The past week has been very interesting. I got up to speed 
>> with the dfmt codebase, and managed to do a 1-to-1 port of the 
>> lexer dependency from libdparse to DMD-as-a-library. Most 
>> parts were pretty straightforward, and the bulk of the work 
>> was replacing every `tok!"<token>"` instance with 
>> `TOK.<token>` and making sure the token coming from DMD was 
>> the same as what was previously being used. So far so good!
>>
>> You can see the draft PR tracking the work 
>> [here](https://github.com/dlang-community/dfmt/pull/589).
>>
>> Going forward, my mentor and I have decided that it would be 
>> impractical to try and replace the parser directly, for 
>> multiple reasons:
>>
>> - It's a lot of work to replace the parser and use the DMD AST 
>> instead of libdparse's, and all of this work will happen 
>> without a working version of dfmt. If, at the end of this, 
>> dfmt is broken or refuses to compile, it could very well mean 
>> that all that effort went down the drain.
>> - Doing a brute-force replacement of the parser will prevent 
>> us from testing the transformation passes in dfmt 
>> individually, and also brings us back to the point above.
>>
>> Hence, we've decided to do an incremental rewrite of the files 
>> that use the parser, initially with no passes (just to ensure 
>> the AST is being built in the first place), and then adding 
>> each pass along with relevant unit tests.
>
> I don't want to sound alarmist or anything, but an AST is not 
> really what you want to work with as a formatter.
>
> The main reason is that you want to carry around a lot of 
> information that the AST generally doesn't care about 
> (comments, layout information, etc.). Consider the 
> following example:
>
> ```d
> int a; // this is an int.
> int b;
> ```
>
> We immediately recognize that the comment refers to `a`. However:
>
> ```d
> int a;
>
> // this is an int.
> int b;
> ```
>
> Now we recognize that the comment refers to `b`.
>
> There are a lot of subtle semantics in there that are very hard 
> to convey through an AST and very hard to work with in that 
> form.
>

But dmd attaches comments to declarations in the parsing phase. 
You can get the comments from the AST. Why is that not a good way 
to represent the data?

> There is a lot of prior art on the matter of code formatting, 
> and the best explanation is probably the one from dartfmt's 
> author: 
> https://journal.stuffwithstuff.com/2015/09/08/the-hardest-program-ive-ever-written/ . clang-format and many others do use this approach.
>

The article states the following:

"At this point, you’re probably thinking, “Wait. What’s so hard 
about formatting?” After you’ve parsed, can’t you just walk the 
AST and pretty-print it with some whitespace?

If every statement fit within the column limit of the page, yup. 
It’s a piece of cake. (I think that’s what gofmt does.) But our 
formatter also keeps your code within the line length limit. That 
means adding line breaks (or “splits” as the formatter calls 
them), and determining the best place to add those is famously 
hard."

So the fundamental problem comes from the fact that the author 
wants to do smart stuff with line/column limits. dfmt does not 
even have a line limit option, so that problem does not apply 
here. Even if we were to add it, that could be viewed as a 
separate project of transitioning dfmt to a better representation.

> Shameless plug: sdfmt uses that approach. You can get it there: 
> https://code.dlang.org/packages/sdc%3Asdfmt .
>
> I understand it is probably out of scope to turn things 
> around at this point, but holy hell, do we really need, as a 
> community, to repeat all the mistakes other communities have 
> made instead of learning from them, and, to add insult to 
> injury, involve junior devs in that madness?

As things stand, dfmt already exists and is implemented using 
an AST to output the formatted code. Maybe that is not the ideal 
approach and could be improved, but that has nothing to do with 
this project. This project is all about replacing one dependency 
(libdparse) with another (dmd-as-a-lib) so that, ideally, dfmt 
would not be affected when dmd is upgraded.

Re-engineering the formatter such that it uses a different 
technique represents a separate project that is orthogonal to the 
current effort.
