Compilation strategy

Dmitry Olshansky dmitry.olsh at gmail.com
Mon Dec 17 14:08:55 PST 2012


12/18/2012 12:34 AM, Paulo Pinto writes:
> Am 17.12.2012 21:09, schrieb foobar:
>> On Monday, 17 December 2012 at 04:49:46 UTC, Michel Fortin wrote:
>>> On 2012-12-17 03:18:45 +0000, Walter Bright
>>> <newshound2 at digitalmars.com> said:
>>>
>>>> Whether the file format is text or binary does not make any
>>>> fundamental difference.
>>>
>>> I too expect the difference in performance to be negligible in binary
>>> form if you maintain the same structure. But if you're translating it
>>> to another format you can improve the structure to make it faster.
>>>
>>> If the file had a table of contents (TOC) of publicly visible symbols
>>> right at the start, you could read that table of content alone to fill
>>> symbol tables while lazy-loading symbol definitions from the file only
>>> when needed.
>>>
>>> Often, most of the file beyond the TOC wouldn't be needed at all.
>>> Having to parse and construct the syntax tree for the whole file
>>> incurs many memory allocations in the compiler, which you could avoid
>>> if the file was structured for lazy-loading. With a TOC you have very
>>> little to read from disk and very little to allocate in memory and
>>> that'll make compilation faster.
>>>
>>> More importantly, if you use only fully-qualified symbol names in the
>>> translated form, then you'll be able to load lazily privately imported
>>> modules because they'll only be needed when you need the actual
>>> definition of a symbol. (Template instantiation might require loading
>>> privately imported modules too.)
>>>
>>> And then you could structure it so a whole library could fit in one
>>> file, putting all the TOCs at the start of the same file so it loads
>>> from disk in a single read operation (or a couple of *sequential*
>>> reads).
>>>
>>> I'm not sure of the speedup all this would provide, but I'd hazard a
>>> guess that it wouldn't be so negligible when compiling a large project
>>> incrementally.
>>>
>>> Implementing any of this in the current front end would be a *lot* of
>>> work however.
>>
>> Precisely. That is the correct solution and is also how [turbo?] pascal
>> units (==libs) were implemented *decades ago*.
>>
>> I'd like to also emphasize the importance of using a *single*
>> encapsulated file. This prevents synchronization hazards that D
>> inherited from the broken c/c++ model.
>

I really loved the way Turbo Pascal units were made. I wish D went the 
same route. Object files would then be seen as a minimal, dumb variation 
of a module, where symbols are identified by mangling (rather than by 
plain metadata, as they would be in a module) and no source for templates 
is emitted.
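
As a rough sketch of the loading side of Michel's TOC idea (the type and 
field names here are purely illustrative, not any actual DCU/.di format):

import std.stdio;

// One entry of the table of contents placed at the start of the module file.
// The TOC lists every publicly visible symbol by its fully-qualified name,
// so symbol tables can be filled without parsing any definitions.
struct TocEntry
{
    string fqName;  // e.g. "std.algorithm.sort"
    ulong  offset;  // where the serialized definition starts in the file
    ulong  length;  // how many bytes it occupies
}

struct ModuleFile
{
    File file;
    TocEntry[] toc;          // read eagerly, in one sequential read
    ubyte[][string] loaded;  // definitions pulled in lazily, on demand

    // Load a definition only when the compiler actually needs it.
    ubyte[] definition(string fqName)
    {
        if (auto p = fqName in loaded)
            return *p;
        foreach (e; toc)
            if (e.fqName == fqName)
            {
                file.seek(e.offset);
                auto buf = new ubyte[cast(size_t) e.length];
                file.rawRead(buf);
                return loaded[fqName] = buf;
            }
        throw new Exception("no such symbol: " ~ fqName);
    }
}

Most of the file past the TOC would never be touched for a typical import, 
which is exactly where the savings would come from.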

AFAIK Delphi is able to produce both DCU and OBJ files (and link with 
either). Dunno what it does with generics (and which kind those are), or how.

> I really miss it, but at least it has been picked up by Go as well.
>
> Still find it strange that many C and C++ developers are unaware that we
> have had modules since the early 80's.
>
+1

I suspect it's one of the prime examples where the UNIX philosophy of 
combining a bunch of simple (~ dumb) programs in place of one more 
complex program was taken *far* beyond reasonable lengths.

Having a pipeline:
preprocessor -> compiler -> (still?) assembler -> linker

where every program tries hard to know nothing about the previous ones 
(and to be as simple as it possibly can be) is bound to give inadequate 
results on many fronts:
- efficiency & scalability
- cross-border error reporting and detection (linker errors? errors for 
expanded macro magic?)
- cross-file manipulations (e.g. optimization, see _how_ LTO is done in GCC)
- multiple problems from a loss of information across the pipeline*

*Semantic info on the interdependency of symbols in a source file is 
destroyed right before the linker, so each .obj file is included as a 
whole or not at all. That is why every C run-time I've seen _sidesteps_ 
this by writing each function in its own file(!). Even this alone should 
have been a clear indication.
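
A tiny D example of that granularity problem (file and function names 
made up):

// util.d: both functions end up in the same util.o
module util;

import std.stdio;

string leftPad(string s, size_t width)
{
    while (s.length < width)
        s = " " ~ s;
    return s;
}

void heavyReport()
{
    // imagine this dragging in formatting, I/O, locale code, ...
    writeln("report");
}

A program that calls only leftPad() still links in the whole of util.o, 
heavyReport() included, because the classic static linker works at 
object-file granularity, not symbol granularity. Splitting every function 
into its own file (or the -ffunction-sections / --gc-sections workaround) 
is just patching around that lost information.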

While simplicity (and, correspondingly, small memory footprint) of 
programs was king in the '70s, that time is well past. Nowadays I think 
it's all about getting the highest throughput and more powerful features.

> --
> Paulo


-- 
Dmitry Olshansky

