AST files instead of DI interface files for faster compilation and easier distribution

Adam Wilson flyboynw at gmail.com
Tue Jun 12 11:47:12 PDT 2012


On Tue, 12 Jun 2012 05:23:16 -0700, Dmitry Olshansky  
<dmitry.olsh at gmail.com> wrote:

> On 12.06.2012 16:09, foobar wrote:
>> On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:
>>> On 12/06/12 11:07, timotheecour wrote:
>>>> There's a current pull request to improve di file generation
>>>> (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
>>>> suggest further ideas.
>>>> As far as I understand, di interface files try to achieve these
>>>> conflicting goals:
>>>>
>>>> 1) speed up compilation by avoiding having to reparse large files over
>>>> and over.
>>>> 2) hide implementation details for proprietary reasons
>>>> 3) still maintain source code in some form to allow inlining
>>>> and CTFE
>>>> 4) be human readable
>>>
>>> Is that actually true? My recollection is that the original motivation
>>> was only goal (2), but I was fairly new to D at the time (2005).
>>>
>>> Here's the original post where it was implemented:
>>> http://www.digitalmars.com/d/archives/digitalmars/D/29883.html
>>> and it got partially merged into DMD 0.141 (Dec 4 2005), first usable
>>> in DMD0.142
>>>
>>> Personally I believe that .di files are *totally* the wrong approach
>>> for goal (1). I don't think goal (1) and (2) have anything in common
>>> at all with each other, except that C tried to achieve both of them
>>> using header files. It's an OK solution for (1) in C, it's a failure
>>> in C++, and a complete failure in D.
>>>
>>> IMHO: If we want goal (1), we should try to achieve goal (1), and stop
>>> pretending it's in any way related to goal (2).
>>
>> I absolutely agree with the above and would also add that goal (4) is an
>> anti-feature. In order to get a human readable version of the API the
>> programmer should use *documentation*. D claims that one of its goals is
>> to make it a breeze to provide documentation by bundling a standard tool
>> - DDoc. There's no need to duplicate this just to provide another format
>> when DDoc itself is supposed to be format agnostic.
>>
> Absolutely. DDoc being built-in didn't sound right to me at first, but  
> it allows us to say that APIs are covered in the DDoc-generated files,  
> not in header files etc.
>
>> This is a solved problem since the 80's (E.g. Pascal units).
>
> Right, seeing yet another newbie hit it every day is a clear indication  
> of a simple fact: people would like to think & work in terms of modules  
> rather than seeing the guts of old and crappy OBJ file technology.  
> Linking with C != using C tools everywhere.
>

I completely agree with this. The interactions between the D module system  
and the D toolchain are utterly confusing to newcomers, especially those  
coming from other C-like languages. There are better ways; see .NET  
assemblies and Pascal units. These problems were solved decades ago. Why  
are we still using 40-year-old paradigms?

>> Per Adam's
>> post, the issue is tied to DMD's use of OMF/optlink which we all would
>> like to get rid of anyway. Once we're in proper COFF land, couldn't we
>> just store the required metadata (binary AST?) in special sections in
>> the object files themselves?
>>
> Seconded. At least the lexed form could be very compact; I recall early  
> compressors tried doing the Huffman thing on source-code tokens with  
> some success.
>

I don't see the value of compression. Lexing would already reduce the size  
significantly, and compression would only add to processing times. Disk is  
cheap.
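To illustrate the claim that lexing alone shrinks source considerably, here is a rough sketch using Python's own tokenizer as a stand-in for a D lexer (purely for demonstration; a real D token stream would differ): strip comments and layout tokens, re-serialize the remaining tokens compactly, and compare sizes.

```python
import io
import tokenize

# A small source snippet with comments and layout, standing in for a module.
source = """\
# compute the n-th Fibonacci number (naive recursion)
def fib(n):
    # base cases
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
"""

tokens = []
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    # Drop tokens that carry no semantic content: comments, newlines,
    # indentation bookkeeping, and the end marker.
    if tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
                    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER):
        continue
    tokens.append(tok.string)

lexed = " ".join(tokens)
print(len(source), "->", len(lexed))
```

Even this naive space-joined serialization comes out well under the raw source size; a binary token encoding would shrink it further, which is the point about compression buying little on top.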

Beyond that though, this is absolutely the direction D must head in. In my  
mind the DI generation patch was mostly a stop-gap to bring DI-gen  
up-to-date with the current system, thereby giving us enough time to  
tackle the (admittedly huge) task of building COFF into the backend,  
emitting the lexed source into a special section, and then giving the  
compiler *AND* linker the ability to read out the source. Giving the  
linker that ability essentially requires a brand-new linker. It is my  
personal opinion that the linker should be integrated with the compiler  
and run as one step; this way the linker could have intimate knowledge of  
the source, which would enable some spectacular LTO options. If only DMD  
were written in D, then we could really open the compile-speed throttles  
with an MT build model...
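The scheme being discussed can be sketched as a toy model: an object file is a set of named sections, and the compiler emits the lexed source into a custom section that later compilations (and an AST-aware linker) read back instead of reparsing a .d/.di file. Everything here is invented for illustration; the section name ".dsrc" and both functions are hypothetical, and a real implementation would use actual COFF section headers.

```python
def emit_object(code_bytes: bytes, lexed_source: str) -> dict:
    """'Compile' a module: ordinary machine code plus the lexed source
    stashed in a hypothetical metadata section."""
    return {
        ".text": code_bytes,                    # compiled code, as usual
        ".dsrc": lexed_source.encode("utf-8"),  # invented metadata section
    }

def import_module(obj: dict) -> str:
    """What 'import' would do under this scheme: pull the interface (and
    CTFE/inlining-capable source) straight out of the object file."""
    return obj[".dsrc"].decode("utf-8")

# Round-trip: the token stream survives compilation and comes back intact.
obj = emit_object(b"\x90\x90",
                  "int add ( int a , int b ) { return a + b ; }")
print(import_module(obj))
```

The design point is that the metadata travels *with* the object code, so the compiler, the linker, and any LTO pass all see one artifact per module rather than a source/header/object triple.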

>> Another related question - AFAIK the LLVM folks did/are doing work to
>> make their implementation less platform-dependent. Could we leverage this
>> in ldc to store LLVM bitcode as D libs which still retain enough info
>> for the compiler to replace header files?


-- 
Adam Wilson
IRC: LightBender
Project Coordinator
The Horizon Project
http://www.thehorizonproject.org/

