Thoughts about "Compile-time types" talk
Alex
AJ at gmail.com
Thu May 16 03:37:21 UTC 2019
On Wednesday, 15 May 2019 at 19:28:21 UTC, H. S. Teoh wrote:
> On Wed, May 15, 2019 at 06:57:02PM +0000, Alex via
> Digitalmars-d wrote: [...]
>> D already does all this but the language clearly was not
>> designed with the intention to unify them.
>>
>> For example,
>>
>> int x = 3;
>> if (x == 3) fart;
>>
>> Hear the compiler should be able to reason that x = 3 and to
>> optimize it all to
>>
>> fart;
>>
>>
>> This requires flow analysis but if everything is 100% analyzed
>> correctly the compiler could in theory determine if any
>> statement is reducible and cascade everything if necessary
>> until all that's left are things that are not reducible.
>>
>> It's pretty simple in theory but probably very difficult to
>> modify a pre-existing compiler that was designed without it to
>> use it.
>
> Good news: this is not just theory, and you don't have to
> modify any compiler to achieve this. Just compile the above
> code with LDC (ldc2 -O3) and look at the disassembly. :-)
>
Yeah, I was just using it as an example.... most modern compilers
can do some mixture and in fact do.
> In fact, the LDC optimizer is capable of far more than you
> might think. I've seen it optimize an entire benchmark,
> complete with functions, user-defined types, etc., into a
> single return instruction because it determined that the
> program's output does not depend on any of it.
Yes, this is because the evolution of compilers is moving towards
what we are talking about. We are not the first to think about
things this way, in fact, all things evolve in such ways. The
"first compiler"(the abacus?) was meta programming at the time.
> Writing the output into a file (cause a visible effect to
> prevent the optimizer from eliding the entire program) doesn't
> always help either, because the optimizer would then run the
> entire program at compile-time then emit the equivalent of:
>
> enum answer = 123;
> outputfile.writeln(answer);
>
> The only way to get an accurate benchmark is to make it do
> non-trivial work -- non-trivial meaning every operation the
> program does contributes to its output, and said operations
> cannot be (easily) simplified into a precomputed result. The
> easiest way to do this is for the program to read some input
> that can only be known at runtime, then perform the calculation
> on this input. (Of course, there's a certain complexity limit
> to LDC's aggressive optimizer; I don't think it'd optimize away
> an NP-complete problem with fixed input at compile-time, for
> example. But it's much easier to read an integer at runtime
> than to implement a SAT solver just so LDC won't optimize it
> all away at compile-time. :-D Well actually, I'm sure the LDC
> optimizer will give up long before you give it an NP-complete
> problem, but in theory it *could* just run the entire program
> at compile-time if all inputs are already known.)
A programming is, is a very complicated mathematical equation. A
compiler "simplifies" the equation so that it is easier to work
with(faster, less space, etc). What's more mind blowing is that
this is actually true... that is, the universe seems to be one
giant mathematical processing machine. Men 200 years ago working
on the foundations of computing had no idea about this stuff and
that there would be these deep relationships between math,
computers, and life itself. I think humanity is just scratching
the surface though.
In any case, a program is just an equation, a compiler a
simplifier. A compiler attempts to compile everything down to a
final result, certain inputs are not known at compile time so
they are determined at "run time".
Imagine this: Imagine you have some complex program, say a pc
video game. What is the purpose of this program? Is it to run it
and experience? NAY! It is a final result ultimately! If the
compiler could, hypothetically, compile it down to a final value,
that would be ideal. What is the final value though? Well, it is
the experience of game in to the human mind. Imagine you could
experience it without having to waste hours and hours... that
would be the ideal compiler.
Obviously here I'm taking a very general definition of
compiler... but again, this is where things are headed. The
universe has time and space... the only way to make more time is
to reduce the costs and increase the space. Eventually humans
will have uC in their brains where they can experience things
much quicker, interface with the "compiler" much quicker, etc...
[probably thousands of years off if humanity makes it].
> The DMD optimizer, by comparison, is a sore loser in this game.
> That's why these days I don't even bother considering it where
> performance is concerned.
>
>
> But the other point to all this, is that while LDC *can* do all
> of this at compile-time, meaning the LLVM backend can do all of
> this, for other languages like C and C++ too, what it *cannot*
> do without language support is to use the result of such a
> computation to influence program structure. That's where D's
> CTFE + AST manipulation becomes such a powerful tool. And
> that's where further unification of RT/CT concepts in the
> language will give us an even more powerfully-expressive
> language.
>
It may be able to do such things as you describe. I'm not
claiming anything is impossible, quite the contrary. I'm mainly
talking about what is and what could be. LDC may do quite more
work in this area. Ultimately syntax is an abstraction and it is
a hurdle. Ideally we would have one symbol for one code, a unique
hashing for all programs(Godel theory). Then it would be very
easy to write a program! ;) Of course looking up the right symbol
would take for ever. In fact, one could describe programming as
precisely looking up this code, and it is quite complex and easy
to get the wrong code(e.g., all programs are correct, we just
choose the wrong one).
My main point with D's language is that it has separate CT and RT
constructs. enum is CT. This complicates things. If D was
designed with the concept of LDC and was able to simply optimize
all "RT" code that could be optimized then the distinction isn't
needed... although the separation does make it easier to reason
about... everyone knows an enum is CT.
My way of thinking is this:
All programming should be thought of as in "CT". That is, all
data is known, it just may be specifically known in the future
only. A compiler cannot reduce the future state since it is
"unknown" at the present. So it delays the compilation until the
future(when you click the button or press a key or insert the USB
device).
This is of course just thinking of CT and RT slightly different.
It is more functional. But what it does is shift the emphasis in
the right direction.
Why?
Because if one writes code as if it were all RT then one tends to
prevent optimizations from occurring(as DMD does). If one thinks
of CT one usually has the implicit idea that things are going to
be reduced(as LDC does). It's more of a mindset but it has
repercussions.
e.g., if one always use static if as default then one is thinking
in CT. If one always defaults to if then one is thinking of RT.
The difference is that the compiler always can optimize the
static if while it may or may not(LDC or DMD) the standard if.
Which, as you have pointed out, usually there is no true
difference. Either the if can or cannot be optimized depending on
the state.
LDC is simply a more powerful compiler that "reasons" about the
program. That is, it understands what it is doing more than DMD.
DMD does things blind. Makes a lot of assumptions.
Until there is a massive paradigm shift in programming(e.g., from
punch cards to assembly) the only way to optimize code is going
to be to design languages and compilers that are optimal. That is
the progression of compilers as we see with LDC vs DMD.
Programming is getting more complex, not less. But it is also
becoming more optimal... that is the progression of all things...
even compilers evolve(in the real sense of evolution).
I think my overall point is that D's language design itself has
made the artificial distinction between CT and RT and now we have
to live with it... The distinction was made up front(in the
language) rather than waiting to the last minute(in the
compiler). Of course, this problem started way back with the
"first" programming language. Some point in the future someone
will be saying the same types of things about LDC and some other
"advanced" concept.
The more I program in D, the more I find meta programming a
burden. Not because it is not powerful but that I have to think
with two hats on at the same time. It's not difficult until I
stop programming in D for a while and have to find the other hat
and get good balancing both of them on my head again.
D's meta programming is powerful but it is not natural. Ideally
we would have a language that is both powerful and natural. I
think Haskell might be like this but it is unnatural in other
ways.
Of course, at the end of the day, it is what it is...
More information about the Digitalmars-d
mailing list