tooling quality and some random rant

Mon Feb 14 14:22:25 PST 2011

retard wrote:
> Mon, 14 Feb 2011 13:00:00 -0800, Walter Bright wrote:
> 
>> In particular, instruction scheduling no longer seems to matter, except
>> for the Intel Atom, which benefits very much from Pentium style
>> instruction scheduling. Ironically, dmc++ is the only available current
>> compiler which supports that.
> 
> I can't see how dmc++ is the only available current compiler which 
> supports that. For example, this article (April 15, 2010) [1] says:
> 
> "The GCC 4.5 announcement was made at GNU.org. Changes from GCC 4.4, 
> which was released almost one year ago, include the
>  * use of the MPC library to evaluate complex arithmetic at compile time
>  * C++0x improvements
>  * automatic parallelization as part of Graphite
>  * support for new ARM processors
>  * Intel Atom optimizations and tuning support, and
>  * AMD Orochi optimizations too"
> 
> GCC has supported i586 scheduling as long as I can remember.

"Optimizations and tuning support" is not necessarily scheduling. dmc 
specifically schedules instructions for the U and V pipes on the Pentium, and 
does a near-perfect job of it (better than any other compiler of the time that 
I checked, most of which didn't even attempt it).
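
To illustrate why instruction order matters on a dual-issue, in-order machine, 
here is a toy model in Python. It is only a sketch under loud assumptions: every 
instruction takes one cycle, and two adjacent instructions pair whenever the 
second neither reads nor writes the first one's destination register. The real 
Pentium pairing rules are stricter (only certain instructions are pairable at 
all), and this is not how dmc's scheduler is implemented. The names Instr, 
can_pair and count_cycles are made up for the example.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    text: str               # assembly text, for display only
    dst: str                # register written
    srcs: Tuple[str, ...]   # registers read


def can_pair(first: Instr, second: Instr) -> bool:
    # Toy rule (an assumption, not the full Pentium rule set): the second
    # instruction can issue alongside the first only if it neither reads
    # nor writes the first one's destination register.
    return first.dst not in second.srcs and first.dst != second.dst


def count_cycles(instrs) -> int:
    # In-order issue: each cycle takes one instruction, plus the next one
    # if the two are independent. Every instruction is assumed to take
    # exactly one cycle.
    cycles = 0
    i = 0
    while i < len(instrs):
        if i + 1 < len(instrs) and can_pair(instrs[i], instrs[i + 1]):
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles


load_a = Instr("mov eax, [a]", "eax", ())
inc_a = Instr("add eax, 1", "eax", ("eax",))
load_b = Instr("mov ebx, [b]", "ebx", ())
inc_b = Instr("add ebx, 2", "ebx", ("ebx",))

naive = [load_a, inc_a, load_b, inc_b]       # dependent instructions adjacent
scheduled = [load_a, load_b, inc_a, inc_b]   # independent instructions interleaved
```

In this model count_cycles(naive) is 3 and count_cycles(scheduled) is 2: the 
same four instructions, one cycle saved purely by reordering them.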

The only way to tell if a compiler does it is by trying it and examining the 
emitted instructions. Reading the marketing literature isn't good enough.

> [1] http://www.phoronix.com/scan.php?page=news_item&px=ODE1Ng
> 
>>  > or whole program
>>
>> I looked into that, there's not a lot of oil in that well.
> 
> How about [2]:
> 
> "LTO is quite promising.  Actually it is in line or even better with
> improvement got from other compilers (pathscale is the most convenient
> compiler to check lto separately: lto gave there upto 5% improvement
> on SPECFP2000 and 3.5% for SPECInt2000 making compiler about 50%
> slower and generated code size upto 30% bigger).  LTO in GCC actually
> results in significant code reduction which is quite different from
> pathscale.  That is one of rare cases on my mind when a specific
> optimization works actually better in gcc than in other optimizing
> compilers."
> 
> [2] http://gcc.gnu.org/ml/gcc/2009-10/msg00155.html

LTO (link-time optimization) is different from whole program analysis.

BTW, you can sometimes get dramatic speedups by running the dmc profiler, and 
then feeding the .def file it generates back into the linker. This will reorder 
the code for optimum speed. That is LTO, but is not whole program optimization.
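
As a rough illustration of how profile data can drive code ordering, the Python 
sketch below takes caller/callee call counts and places the functions that call 
each other most often first and next to each other. This is an assumed greedy 
heuristic chosen for the example; it is not dmc's actual algorithm, and it does 
not attempt to reproduce the .def file format. The function names in the sample 
profile are invented.

```python
def order_functions(call_counts):
    # call_counts maps (caller, callee) -> number of calls seen by a profiler.
    # Walk the call pairs from hottest to coldest and emit each function the
    # first time it appears, so hot pairs end up adjacent and near the front.
    order = []
    hottest_first = sorted(call_counts.items(), key=lambda kv: -kv[1])
    for (caller, callee), _count in hottest_first:
        for name in (caller, callee):
            if name not in order:
                order.append(name)
    return order


# Hypothetical profile of a small tokenizer-style program.
profile = {
    ("main", "parse"): 10,
    ("parse", "next_token"): 5000,
    ("main", "report"): 1,
    ("next_token", "read_char"): 4000,
}

layout = order_functions(profile)
```

Here layout comes out as parse, next_token, read_char, main, report: the hot 
inner functions are grouped together and the rarely called report goes last.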

C++'s compilation model thwarts true whole program analysis at every step. D, on 
the other hand, is designed to support it. dmd has some initial support for 
that, as it will inline code from across any modules you hand it the source for.

> In my opinion the up to 5% improvement is pretty good compared to 
> advances in typical minor compiler version upgrades. For example [3]:
> 
> "The Fortran-written NAS Parallel Benchmarks from NASA with the LU.A test 
> is running significantly faster with GCC 4.5. This new compiler is 
> causing NAS LU.A to run 15% better than the other tested GCC releases."

Yes, 5% is a decent improvement. You'd have to look closer to see where the 
improvement is coming from, though, to draw any useful conclusions. It could be 
(and this happens) one single tweak of one expression node that was crappily 
written in the first place.

> [3] http://www.phoronix.com/scan.php?page=article&item=gcc_45_benchmarks&num=6
> 
>>  > and instruction level optimizations the very latest GCC and LLVM are
>>  > now slowly adopting.
>>
>> Huh? Every compiler in existence has done, and always has done,
>> instruction level optimizations.
> 
> I don't know this area well enough, but here is a list of optimizations 
> it does http://llvm.org/docs/Passes.html - from what I've read, GNU GCC 
> doesn't implement all of these.

Every compiler implements a list of those, and those lists vary a lot from 
compiler to compiler. dmc probably has a thousand of those patterns embedded in 
it that it specifically recognizes.
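
For a sense of what one such pattern looks like, here is a small Python sketch 
of a rewriter over integer expression trees. The tree encoding (tuples such as 
("mul", left, right)) and the three patterns are assumptions picked for the 
example; they are not taken from dmc or any other compiler's source.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def simplify(node):
    # Leaves are ("const", value) or ("var", name); inner nodes are
    # (op, left, right). Children are simplified first, then this node is
    # checked against each pattern in turn.
    if node[0] in ("const", "var"):
        return node
    op = node[0]
    left = simplify(node[1])
    right = simplify(node[2])

    if op == "add" and right == ("const", 0):
        return left                      # x + 0  ->  x
    if op == "mul" and right == ("const", 1):
        return left                      # x * 1  ->  x
    if op == "mul" and right[0] == "const" and is_power_of_two(right[1]):
        shift = right[1].bit_length() - 1
        return ("shl", left, ("const", shift))   # x * 2**k  ->  x << k
    return (op, left, right)


# (x * 8) + 0
expr = ("add", ("mul", ("var", "x"), ("const", 8)), ("const", 0))
result = simplify(expr)
```

The result is ("shl", ("var", "x"), ("const", 3)), i.e. x << 3. A production 
compiler carries a great many rules of this kind, and which ones it has differs 
from compiler to compiler.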

>> Note: a lot of modern compilers expend tremendous effort optimizing
>> access to global variables (often screwing up multithreaded code in the
>> process). I've always viewed this as a crock, since modern programming
>> style eschews globals as much as possible.
> 
> I only know that modern C/C++ compilers are doing more and more things 
> automatically. And that might soon include automatic vectorization + 
> multithreading of some computationally intensive code via OpenMP.

D is actually far friendlier to vectorization than C/C++ are.