Proposed improvements to the separate compilation model

Roald Ribe rr at pogostick.net
Sun Jul 24 10:57:11 PDT 2011


On Sat, 23 Jul 2011 21:14:27 -0300, Walter Bright  
<newshound2 at digitalmars.com> wrote:

> On 7/23/2011 3:50 PM, Andrei Alexandrescu wrote:
>> On 7/23/11 5:39 PM, bearophile wrote:
>>> I have suggested some fine-grained hashing. Compute a hash from a
>>> class definition, and later quickly compare this value with a value
>>> stored elsewhere (like automatically written in the .di file).
>>
>> I discussed four options with Walter, and this was one of them. It has  
>> issues.
>> The proposal as in this thread is the simplest and most effective I  
>> could find.
>
> The only way the linker can detect mismatches is by embedding the hash  
> into the name, i.e. more name mangling. This has serious issues:
>
> 1. The hashing cannot be reversed. Hence, the user will be faced with  
> really, really ugly error messages from the linker that will make  
> today's mangled names look like a marvel of clarity. Consider all the  
> users today, who have a really hard time with things like:
>
>      undefined symbol: _foo
>
> from the linker. Now imagine it's:
>
>      undefined symbol: _foo12345WQERTYHBVCFDERTYHGFRTYHGFTYUHGTYUHGTYUJHGTYU
>
> They'll run screaming, and I would, too.

A simplistic suggestion:

This could be made better by specifying a hash introduction character,
known by or specifiable in all tools. That could give
_foo^12345WQERTYHBVCFDERTYHGFRTYHGFTYUHGTYUHGTYUJHGTYU
in tools not yet aware of the hash intro character, and just
_foo
in tools that have been adapted to take advantage of it.
Both cases are easier to read IMHO, and the system enables easy
implementation of the second case in various tools.
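
To illustrate how little an adapted tool would need to do, here is a
minimal sketch in Python. It assumes '^' as the hash introduction
character, as in the example above; the names HASH_INTRO, split_symbol
and display_name are made up for illustration and do not come from any
existing linker or debugger.

```python
# Hypothetical hash introduction character; a real toolchain would have
# to agree on one, or let each tool be configured with it.
HASH_INTRO = "^"


def split_symbol(symbol, intro=HASH_INTRO):
    """Split a symbol into (readable name, hash or None).

    Everything after the first occurrence of the introduction character
    is treated as the hash. Symbols without the character are returned
    unchanged, with None for the hash.
    """
    name, sep, digest = symbol.partition(intro)
    return name, (digest if sep else None)


def display_name(symbol, intro=HASH_INTRO):
    """Return the symbol the way a hash-aware tool might show it."""
    name, _ = split_symbol(symbol, intro)
    return name


if __name__ == "__main__":
    mangled = "_foo^12345WQERTYHBVCFDERTYHGFRTYHGFTYUHGTYUHGTYUJHGTYU"
    print("undefined symbol:", display_name(mangled))
```

The tool only has to cut at the first introduction character, so the
length of the hash does not affect what the user sees. The character
must of course be one that cannot otherwise appear in a mangled name.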

> 2. This hash will get added to all struct/class names, so there will be  
> an explosion in the length of names the linker sees. This can make tools  
> that deal with symbolic names in the executable (like debuggers,  
> disassemblers, profilers, etc.) much more messy to deal with.
> 3. Hashes aren't perfect, they can have collisions, unless you want to  
> go with really long ones like MD5.

The system above would make the length of the hash almost irrelevant,
because it would simplify the adaptation of tools to not display the
symbol's hash, while also making the symbol easier to read in old tools
not yet adapted.

I do not know if other compiled languages have the same problem, but if they
do, such a convention might be nice for them as well.

Roald

