Potential of a compiler that creates the executable at once

max haughton maxhaton at gmail.com
Fri Feb 11 06:36:50 UTC 2022


On Friday, 11 February 2022 at 04:18:42 UTC, Era Scarecrow wrote:
> On Thursday, 10 February 2022 at 09:41:12 UTC, rempas wrote:
>> A couple of months ago, I found out about a language called 
>> [Vox](https://github.com/MrSmith33/vox) which uses a design 
>> that I haven't seen before by any other compiler which is to 
>> not create object files and then link them together but 
>> instead, always create an executable at once.
>
>  TCC (*Tiny C Compiler*) did this like 10 years ago. TCC was 
> originally written as an entry in an obfuscated C code contest, 
> and then got updated to be more complete.
>
>  https://www.bellard.org/tcc/
>
>  I believe most of a compiler's codebase involves optimization 
> for various architectures and CPU versions, along with 
> cross-compiling. GNU/GCC has tons of legacy code in the back 
> end that it still uses, I believe.
>
>  To note, back in 1996 or thereabouts I wrote an assembler that 
> took x86 and could compile itself. But it wasn't compatible with 
> any other code and couldn't use object files or anything (*as it 
> was all made from scratch when I was 12-14*). However, it did 
> compile directly to a COM file. I'll just say from experience: 
> there are advantages, but they don't outweigh the disadvantages. 
> That's my flat opinion.

Optimizations are slow, and optimizations that aren't a total 
mess when implemented require abstraction. Making those 
abstractions cheap is difficult, so you end up with LLVM and GCC 
being slower even on debug builds because they have more layers 
of abstraction (or rather take fewer shortcuts). It's probably 
possible to equalise this performance with a more niche compiler, 
but it would require an immense effort and probably mean starting 
from scratch around a new concept (a la LLVM).

As for legacy code, there probably are branches for old 
processors still being tested in places, but for the most part 
GCC's algorithms only look a bit crude because of their C 
heritage (i.e. some of GCC's development practices are very 1980s 
compared to LLVM, and will probably scare off new money and minds 
and kill the project in the long run); they are still the 
benchmark to beat. To be clear, the Itanium scheduler won't be 
running on an x86 target.

I'm also not convinced that the compiler assembling code itself 
is all that useful. It probably is marginally faster, but on a 
modern system I couldn't measure it as significant on basically 
any workload. It's performance theatre: the performance of the 
semantic analysis, or of moving bytes around before the object 
code is emitted (however it's emitted), matters much more.
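
For what it's worth, this is easy to sanity-check on your own 
machine by timing clang with and without its integrated 
assembler. A rough Python sketch along these lines (test.c is 
just a placeholder for any decent-sized translation unit, and 
clang is assumed to be on PATH) would do it:

#!/usr/bin/env python3
# Rough sketch: does the compiler assembling the code itself buy
# anything measurable? Compares clang's integrated assembler
# against spawning an external one.
import subprocess, time

SRC = "test.c"  # placeholder translation unit

def bench(label, cmd, runs=20):
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(cmd, check=True)
    avg = (time.perf_counter() - start) / runs
    print(f"{label}: {avg:.3f}s per compile")

# Compiler writes the object file itself (clang's default).
bench("integrated assembler", ["clang", "-c", SRC, "-o", "a.o"])
# Compiler emits assembly and hands it to an external assembler.
bench("external assembler  ",
      ["clang", "-c", "-fno-integrated-as", SRC, "-o", "b.o"])

In my experience the difference disappears into the noise next to 
the time spent in the front end.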

The dmd backend gets a 6/10 from me when it comes to performance. 
The algorithms are very simple, so it should really be faster 
than it is. The parts that actually emit the object code are 
particularly slow.
