Why C++ compiles slowly

Steven Schveighoffer schveiguy at yahoo.com
Tue Aug 24 13:56:01 PDT 2010


On Tue, 24 Aug 2010 14:31:26 -0400, Walter Bright  
<newshound2 at digitalmars.com> wrote:

> Steven Schveighoffer wrote:
>> On Tue, 24 Aug 2010 03:58:57 -0400, Walter Bright  
>> <newshound2 at digitalmars.com> wrote:
>>
>>> Steven Schveighoffer wrote:
>>>> With profiling enabled, gprof outputs this as the top hitters:
>>>>   Flat profile:
>>>>  Each sample counts as 0.01 seconds.
>>>>   %   cumulative   self              self     total
>>>>  time   seconds   seconds    calls  ms/call  ms/call  name
>>>>  77.76      6.68     6.68     2952     2.26     2.26   
>>>> elf_findstr(Outbuffer*, char const*, char const*)
>>>>   2.10      6.86     0.18     4342     0.04     0.04  searchfixlist
>>>
>>> elf_findstr definitely looks like a problem area. I can't look at it  
>>> right now, so can you post this to bugzilla please?
>>  http://d.puremagic.com/issues/show_bug.cgi?id=4721
>
> Also, putting a printf in elf_findstr to print its arguments will be  
> helpful.

Through some more work with printf, I have to agree with bearophile, this  
lookup function is horrid.

I think it's supposed to look for a symbol in the symbol table, but it  
uses a linear search through all symbols to find it.  Not only that, but  
the table is stored in one giant buffer, so once it finds the current  
symbol it's checking against doesn't match, it has to still loop through  
the remaining characters of the unmatched symbol to find the next 0 byte.

I added a simple running printout of how many times the function has been  
called, along with how large the symbol table has grown.

The code is as follows:


static IDXSTR elf_findstr(Outbuffer *strtab, const char *str, const char  
*suffix)
{
+    static int ncalls = 0;
+    ncalls++;
+    printf("\r%d\t%d", ncalls, strtab->size());
+    fflush(stdout);
     const char *ent = (char *)strtab->buf+1;
     const char *pend = ent+strtab->size() - 1;


At the end, the symbol table is over 4 million characters and the number  
of calls is 12677.  You can watch it slow down noticeably.

I also added some code to count the number of times a symbol is matched --  
648, so about 5% of the time.  This means that 95% of the time, the whole  
table is searched.

If you multiply those factors together, and take into account the nature  
of how it grows, you have probably 20 billion loop iterations.  Whereas, a  
hash table would probably be much faster.  I'm thinking a correct  
compilation time should be on the order of 3-4 seconds vs. 67 seconds it  
now takes.

I am not sure how to fix it, but that's the gist of it.  I think the  
symbol table is so large because of the template proliferation of  
dcollections, and the verbosity of D symbol names.

-Steve


More information about the Digitalmars-d mailing list