Thoughts about "Compile-time types" talk

Alex AJ at gmail.com
Thu May 16 03:37:21 UTC 2019


On Wednesday, 15 May 2019 at 19:28:21 UTC, H. S. Teoh wrote:
> On Wed, May 15, 2019 at 06:57:02PM +0000, Alex via 
> Digitalmars-d wrote: [...]
>> D already does all this but the language clearly was not 
>> designed with the intention to unify them.
>> 
>> For example,
>> 
>> int x = 3;
>> if (x == 3) fart;
>> 
>> Here the compiler should be able to reason that x is 3 and 
>> optimize it all to
>> 
>> fart;
>> 
>> 
>> This requires flow analysis but if everything is 100% analyzed 
>> correctly the compiler could in theory determine if any 
>> statement is reducible and cascade everything if necessary 
>> until all that's left are things that are not reducible.
>> 
>> It's pretty simple in theory but probably very difficult to 
>> modify a pre-existing compiler that was designed without it to 
>> use it.
>
> Good news: this is not just theory, and you don't have to 
> modify any compiler to achieve this.  Just compile the above 
> code with LDC (ldc2 -O3) and look at the disassembly.  :-)
>

Yeah, I was just using it as an example... most modern compilers 
can already do some mixture of this, and in fact do.
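
To make my snippet above actually compilable (using writeln as a 
stand-in for the made-up "fart" call), roughly what I mean is:

	import std.stdio;

	void main()
	{
	    int x = 3;
	    if (x == 3)          // x is provably 3 here
	        writeln("fart"); // so the test can be folded away
	}

With ldc2 -O3 the branch disappears and only the writeln call 
remains, which is exactly the kind of "reduce everything that is 
reducible" I'm describing.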

> In fact, the LDC optimizer is capable of far more than you 
> might think. I've seen it optimize an entire benchmark, 
> complete with functions, user-defined types, etc., into a 
> single return instruction because it determined that the 
> program's output does not depend on any of it.

Yes, this is because the evolution of compilers is moving towards 
what we are talking about. We are not the first to think about 
things this way; in fact, all things evolve in such ways. The 
"first compiler" (the abacus?) was meta-programming at the time.

> Writing the output into a file (cause a visible effect to 
> prevent the optimizer from eliding the entire program) doesn't 
> always help either, because the optimizer would then run the 
> entire program at compile-time then emit the equivalent of:
>
> 	enum answer = 123;
> 	outputfile.writeln(answer);
>
> The only way to get an accurate benchmark is to make it do 
> non-trivial work -- non-trivial meaning every operation the 
> program does contributes to its output, and said operations 
> cannot be (easily) simplified into a precomputed result.  The 
> easiest way to do this is for the program to read some input 
> that can only be known at runtime, then perform the calculation 
> on this input.  (Of course, there's a certain complexity limit 
> to LDC's aggressive optimizer; I don't think it'd optimize away 
> an NP-complete problem with fixed input at compile-time, for 
> example.  But it's much easier to read an integer at runtime 
> than to implement a SAT solver just so LDC won't optimize it 
> all away at compile-time. :-D  Well actually, I'm sure the LDC 
> optimizer will give up long before you give it an NP-complete 
> problem, but in theory it *could* just run the entire program 
> at compile-time if all inputs are already known.)

A program is, in a sense, a very complicated mathematical 
equation. A compiler "simplifies" the equation so that it is 
easier to work with (faster, less space, etc.). What's more 
mind-blowing is that this seems literally true... that is, the 
universe seems to be one giant mathematical processing machine. 
The men working on the foundations of computing 200 years ago had 
no idea about this stuff, or that there would be such deep 
relationships between math, computers, and life itself. I think 
humanity is just scratching the surface, though.

In any case, a program is just an equation and a compiler a 
simplifier. A compiler attempts to compile everything down to a 
final result; certain inputs are not known at compile time, so 
they are determined at "run time".
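
The runtime-input point you made is the key one: as soon as part 
of the "equation" depends on a value that only exists at run 
time, the compiler has to leave that part unreduced. A toy 
sketch:

	import std.conv : to;
	import std.stdio;
	import std.string : strip;

	void main()
	{
	    int n = readln.strip.to!int; // known only at run time
	    writeln(n * n + 1);          // so this cannot be precomputed
	}

Replace the first line with "int n = 7;" and LDC is free to fold 
the whole thing down to the equivalent of writeln(50).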

Imagine you have some complex program, say a PC video game. What 
is the purpose of this program? Is it to run it and experience 
it? NAY! Ultimately it is a final result! If the compiler could, 
hypothetically, compile it down to a final value, that would be 
ideal. What is that final value, though? Well, it is the 
experience of the game in the human mind. Imagine you could 
experience it without having to spend hours and hours... that 
would be the ideal compiler.

Obviously here I'm taking a very general definition of 
compiler... but again, this is where things are headed. The 
universe has time and space... the only way to make more time is 
to reduce the costs and increase the space. Eventually humans 
will have microcontrollers in their brains with which they can 
experience things much more quickly, interface with the 
"compiler" much more quickly, etc... [probably thousands of years 
off, if humanity makes it].


> The DMD optimizer, by comparison, is a sore loser in this game. 
> That's why these days I don't even bother considering it where 
> performance is concerned.
>
>
> But the other point to all this, is that while LDC *can* do all 
> of this at compile-time, meaning the LLVM backend can do all of 
> this, for other languages like C and C++ too, what it *cannot* 
> do without language support is to use the result of such a 
> computation to influence program structure.  That's where D's 
> CTFE + AST manipulation becomes such a powerful tool.  And 
> that's where further unification of RT/CT concepts in the 
> language will give us an even more powerfully-expressive 
> language.
>

It may well be able to do such things as you describe. I'm not 
claiming anything is impossible, quite the contrary. I'm mainly 
talking about what is and what could be. LDC may do quite a bit 
more work in this area. Ultimately syntax is an abstraction and 
it is a hurdle. Ideally we would have one symbol per program, a 
unique numbering for all programs (Gödel numbering). Then it 
would be very easy to write a program! ;) Of course, looking up 
the right symbol would take forever. In fact, one could describe 
programming as precisely looking up this code, and it is quite 
complex and easy to get the wrong code (e.g., all programs are 
correct, we just choose the wrong one).


My main point about D's language is that it has separate CT and 
RT constructs. enum is CT. This complicates things. If D had been 
designed around an optimizer like LDC's and could simply reduce 
all "RT" code that is reducible, the distinction wouldn't be 
needed... although the separation does make things easier to 
reason about... everyone knows an enum is CT.
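
Just as a rough sketch of what I mean by the distinction:

	void main()
	{
	    enum ctValue = 10 * 10; // the language *guarantees* this is CT
	    int  rtValue = 10 * 10; // nominally "RT", yet any decent
	                            // optimizer folds it to 100 as well
	}

With the enum, the programmer declares up front that the 
computation is CT; with the int, we merely hope the optimizer 
notices.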

My way of thinking is this:

All programming should be thought of as being "CT". That is, all 
data is known; it just may only become known in the future. A 
compiler cannot reduce future state since it is "unknown" in the 
present, so it delays that part of the compilation until the 
future (when you click the button, press a key, or insert the USB 
device).

This is of course just thinking of CT and RT slightly 
differently. It is more functional in flavor. But what it does is 
shift the emphasis in the right direction.

Why?

Because if one writes code as if it were all RT, then one tends 
to prevent optimizations from occurring (as with DMD). If one 
thinks in CT, one usually has the implicit idea that things are 
going to be reduced (as with LDC). It's more of a mindset, but it 
has repercussions.

E.g., if one always uses static if by default, then one is 
thinking in CT. If one always defaults to if, then one is 
thinking in RT. The difference is that the compiler can always 
resolve the static if, while it may or may not (LDC vs. DMD) 
optimize the standard if.

Which, as you have pointed out, usually makes no real difference: 
the if either can or cannot be optimized, depending on the state.
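
A rough sketch of the two mindsets (just toy code, nothing more):

	import std.stdio;

	void describe(T)(T x)
	{
	    // static if: resolved at compile time; the untaken branch
	    // is not even compiled into this instantiation.
	    static if (is(T == int))
	        writeln("an int: ", x);
	    else
	        writeln("something else: ", x);
	}

	void main()
	{
	    describe(42);      // only the int branch exists here
	    describe("hi");    // only the other branch exists here

	    // A plain if on a value the compiler can see is constant:
	    // LDC will typically fold the branch away, DMD may not.
	    int mode = 1;
	    if (mode == 1)
	        writeln("mode 1");
	    else
	        writeln("other mode");
	}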

LDC is simply a more powerful compiler that "reasons" about the 
program. That is, it understands what it is doing more than DMD 
does. DMD does things blindly and makes a lot of assumptions.

Until there is a massive paradigm shift in programming (on the 
order of punch cards to assembly), the only way to optimize code 
is going to be to design languages and compilers that are more 
optimal. That is the progression of compilers, as we see with LDC 
vs. DMD.

Programming is getting more complex, not less. But it is also 
becoming more optimal... that is the progression of all things... 
even compilers evolve (in the real sense of evolution).

I think my overall point is that D's language design itself has 
made an artificial distinction between CT and RT, and now we have 
to live with it... The distinction was made up front (in the 
language) rather than deferred to the last minute (in the 
compiler). Of course, this problem started way back with the 
"first" programming language. At some point in the future someone 
will be saying the same sorts of things about LDC and some other 
"advanced" concept.

The more I program in D, the more I find meta-programming a 
burden. Not because it isn't powerful, but because I have to 
think with two hats on at the same time. It's not difficult until 
I stop programming in D for a while and then have to find the 
other hat and get good at balancing both of them on my head 
again.

D's meta-programming is powerful, but it is not natural. Ideally 
we would have a language that is both powerful and natural. I 
think Haskell might be like this, but it is unnatural in other 
ways.

Of course, at the end of the day, it is what it is...






