Compilation strategy

foobar foo at bar.com
Mon Dec 17 12:09:21 PST 2012


On Monday, 17 December 2012 at 04:49:46 UTC, Michel Fortin wrote:
> On 2012-12-17 03:18:45 +0000, Walter Bright 
> <newshound2 at digitalmars.com> said:
>
>> Whether the file format is text or binary does not make any 
>> fundamental difference.
>
> I too expect the difference in performance to be negligible in 
> binary form if you maintain the same structure. But if you're 
> translating it to another format you can improve the structure 
> to make it faster.
>
> If the file had a table of contents (TOC) of publicly visible 
> symbols right at the start, you could read that table of 
> contents alone to fill symbol tables while lazy-loading symbol 
> definitions from the file only when needed.
>
> Often, most of the file beyond the TOC wouldn't be needed at 
> all. Having to parse and construct the syntax tree for the 
> whole file incurs many memory allocations in the compiler, 
> which you could avoid if the file was structured for 
> lazy-loading. With a TOC you have very little to read from disk 
> and very little to allocate in memory and that'll make 
> compilation faster.
>
> More importantly, if you use only fully-qualified symbol names 
> in the translated form, then you'll be able to load lazily 
> privately imported modules because they'll only be needed when 
> you need the actual definition of a symbol. (Template 
> instantiation might require loading privately imported modules 
> too.)
>
> And then you could structure it so a whole library could fit in 
> one file, putting all the TOCs at the start of the same file so 
> it loads from disk in a single read operation (or a couple of 
> *sequential* reads).
>
> I'm not sure of the speedup all this would provide, but I'd 
> hazard a guess that it wouldn't be so negligible when compiling 
> a large project incrementally.
>
> Implementing any of this in the current front end would be a 
> *lot* of work however.

Precisely. That is the correct solution, and it is also how [Turbo?] 
Pascal units (== libs) were implemented *decades ago*.

I'd also like to emphasize the importance of using a *single* 
encapsulated file. This prevents the synchronization hazards D 
inherited from the broken C/C++ model, where an interface file 
can drift out of sync with the compiled code it describes.
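To make the TOC idea concrete, here is a minimal Python sketch of a single-file library: one header, a table of contents of fully-qualified symbol names with offsets, then all definitions. The reader does one sequential read of the TOC to fill its symbol table and seeks to a definition only when it is first requested. The format (the `DLIB` magic, the `write_library` and `LazyLibrary` names, the field layout) is entirely hypothetical and illustrative; it is not DMD's format or the actual Turbo Pascal unit format.

```python
import io
import struct
from typing import BinaryIO, Dict, KeysView, Tuple

MAGIC = b"DLIB"                 # hypothetical magic number for this sketch
HEADER = struct.Struct("<4sI")  # magic, TOC size in bytes
ENTRY = struct.Struct("<HQI")   # name length, offset into data section, data length


def write_library(out: BinaryIO, symbols: Dict[str, str]) -> None:
    """Write the whole TOC first, then every definition, into one file."""
    toc = bytearray()
    data = bytearray()
    for name, definition in symbols.items():
        encoded_name = name.encode("utf-8")
        encoded_def = definition.encode("utf-8")
        toc += ENTRY.pack(len(encoded_name), len(data), len(encoded_def))
        toc += encoded_name
        data += encoded_def
    out.write(HEADER.pack(MAGIC, len(toc)))
    out.write(toc)
    out.write(data)


class LazyLibrary:
    """Reads only the header and TOC up front; definitions load on first use.

    Assumes the file object is positioned at the start of the library.
    """

    def __init__(self, f: BinaryIO) -> None:
        self._f = f
        magic, toc_size = HEADER.unpack(f.read(HEADER.size))
        if magic != MAGIC:
            raise ValueError("not a library file")
        toc = f.read(toc_size)  # one sequential read fills the symbol table
        self._data_start = HEADER.size + toc_size
        self._index: Dict[str, Tuple[int, int]] = {}
        pos = 0
        while pos < len(toc):
            name_len, offset, length = ENTRY.unpack_from(toc, pos)
            pos += ENTRY.size
            name = toc[pos:pos + name_len].decode("utf-8")
            pos += name_len
            self._index[name] = (offset, length)
        self._cache: Dict[str, str] = {}
        self.loads = 0  # how many definitions were actually read from the file

    def symbols(self) -> KeysView[str]:
        """Fully-qualified names known from the TOC alone."""
        return self._index.keys()

    def definition(self, name: str) -> str:
        """Seek to and decode a definition only the first time it is needed."""
        if name not in self._cache:
            offset, length = self._index[name]  # KeyError for unknown symbols
            self._f.seek(self._data_start + offset)
            self._cache[name] = self._f.read(length).decode("utf-8")
            self.loads += 1
        return self._cache[name]
```

For example, after `write_library(buf, {...})` on an `io.BytesIO` and `buf.seek(0)`, constructing `LazyLibrary(buf)` lists every symbol while `loads` stays at 0; only calling `definition(...)` touches the data section. Because all TOCs share one file, a whole library's symbol table comes from a single read, and there is no separate interface file to fall out of sync.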

