DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

Sat Jun 14 08:05:59 PDT 2014

On Thursday, 12 June 2014 at 16:42:38 UTC, Dmitry Olshansky wrote:
> It's always nice to ask something on D NG, so many good answers 
> I can hardly choose whom to reply ;) So this is kind of 
> broadcast.
>
> Yes, the answer seems spot on - reflection! But allow me to 
> retort.
>
> I'm not talking about completely stand-alone generator. Just as 
> well generator tool could be written in D using the same exact 
> sources as your D program does. Including the static 
> introspection and type-awareness. Then generator itself is a 
> library + "an invocation script" in D.
>
> The Q is specifically of CTFE in this scenario, including not 
> only obvious shortcomings of design, but fundamental ones of 
> compilation inside of compilation. Unlike proper compilation it 
> has nothing persistent to back it up. It feels backwards, a bit 
> like C++ TMP but, of course, much-much better.
>
>> 1)
>>
>> Reflection. It is less of an issue for pure DSL solutions because
>> those don't provide any good reflection capabilities anyway, but
>> other code generation approaches have very similar problems.
>>
>> By doing all code generation in separate build step you potentially
>> lose many of guarantees of keeping various parts of your application
>> in sync.
>>
>
> Use the same sources for the generator. In essence all is the 
> same, just relying on separate runs and linkage, not mixin. 
> Necessary "hooks" to link to later could indeed be generated 
> with a tiny bit of CTFE.
>
> Yes, deeply embedded stuff might not be that easy. The scope 
> and damage is smaller though.
>
>> 2)
>>
>> Moving forward. You use traditional reasoning of DSL generally being
>> something rare and normally stable. This fits most common DSL usage,
>> but the tight in-language integration D makes possible brings new
>> opportunities of using DSL and code generation casually all over
>> your program.
>>
>
> Well, I'm biased by heavy-handed ones. Say I have a (no longer) 
> secret plan of doing a next-gen parser generator in D. Needless 
> to say swaths of non-trivial code generation. I'm all for 
> embedding nicely but I see very little _practical_ gains in 
> CTFE+mixin here EVEN if CTFE wouldn't suck. See the point above 
> about using the same metadata and types as the user application 
> would.

Consider something like the REST API generator I described at 
DConf. Different code is generated in different contexts, for 
both server and client, from the same declarative description. 
Right now the simple fact that you import the very same module 
from both gives a solid 100% guarantee that API usage between 
those two programs stays in sync.
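
A minimal sketch of that idea, not the generator from the talk. 
The `Api` interface, `dispatch`, `stubs`, `Client` and `Server` 
are hypothetical names, and the "transport" is just a direct 
in-process call with arguments passed as strings. Both the 
server-side dispatcher and the client-side stubs are derived from 
the one shared description:

```d
import std.conv : to;
import std.traits : Parameters, ReturnType;

// Hypothetical declarative description, imported by server and client alike.
interface Api
{
    int add(int a, int b);
    int negate(int x);
}

// Server side: a dispatcher unrolled at compile time from T's members.
string dispatch(T)(T impl, string method, string[] args)
{
    switch (method)
    {
        static foreach (name; __traits(allMembers, T))
        {
            case name:
            {
                // Decode each textual argument into the declared parameter type.
                Parameters!(__traits(getMember, T, name)) params;
                foreach (i, ref p; params)
                    p = args[i].to!(typeof(p));
                return __traits(getMember, impl, name)(params).to!string;
            }
        }
        default:
            throw new Exception("unknown method: " ~ method);
    }
}

// Client side: CTFE builds the source of one stub per method of T;
// the mixin in Client compiles that source.
string stubs(T)()
{
    string code;
    foreach (name; __traits(allMembers, T))
        code ~= "ReturnType!(T." ~ name ~ ") " ~ name
            ~ "(Parameters!(T." ~ name ~ ") a) { return send!(ReturnType!(T."
            ~ name ~ "))(\"" ~ name ~ "\", a); }\n";
    return code;
}

final class Client(T) : T
{
    private T remote; // assumption: stands in for a real network transport

    this(T remote) { this.remote = remote; }

    // Encode arguments as strings, "send" them, decode the reply.
    private R send(R, A...)(string method, A a)
    {
        string[] wire;
        foreach (x; a)
            wire ~= x.to!string;
        return dispatch(remote, method, wire).to!R;
    }

    mixin(stubs!T());
}

// Hypothetical server-side implementation of the description.
class Server : Api
{
    int add(int a, int b) { return a + b; }
    int negate(int x) { return -x; }
}

void main()
{
    auto client = new Client!Api(new Server);
    assert(client.add(2, 3) == 5);
    assert(client.negate(4) == -4);
}
```

Renaming or re-typing a method in `Api` changes both generated 
sides at once, and a `Server` that no longer matches the interface 
fails to compile instead of misbehaving at run time.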

In your proposed scenario there will be two different generated 
files imported by server and client respectively. A tiny typo in 
your build script will result in a hard-to-detect run-time bug 
while the code itself still happily compiles.

You may keep the convenience, but losing the guarantees hurts a 
lot. To be able to verify the static correctness of your program 
or group of programs, the type system needs to be aware of how 
generated code relates to the original source.

Also, this approach does not scale. I can totally imagine you 
doing it for two or three DSLs in a single program, probably even 
a dozen. But something like 100+? A huge mess to maintain. In my 
experience all build systems are incredibly fragile beasts; 
trusting them with something that impacts program correctness and 
won't be detected at compile time is just too dangerous.

>> I totally expect programming culture to evolve to the point where
>> something like 90% of all application code is being generated in
>> typical project. D has good base for promoting such paradigm switch
>> and reducing any unnecessary mental context switches is very
>> important here.
>>
>> This was pretty much the point I was trying to make with my DConf
>> talk (and have probably failed :) )
>
> I liked the talk, but you know ... 4th or 5th talk with 
> CTFE/mixin I think I might have been distracted :)
>
> More specifically this bright future of 90%+ concise DSL driven 
> programs is undermined by the simple truth - no amount of 
> improvement in CTFE would make generators run faster than 
> optimized standalone tool invocation. The tool (library written 
> in D) may read D metadata just fine.
>
> I heard D build times are an important part of its adoption so...

Adoption - yes. Production usage - less so (though still 
important). The difference between 1 second and 5 seconds is very 
important. Between 10 seconds and 1 minute - not so much.

JIT will probably be slower than stand-alone generators, but not 
that much slower.

> It might solve most of _current_ problems, but I foresee 
> fundamental issues of "no global state" in CTFE that in say 10 
> years from now would look a lot like `#include` in C++.

I hope 10 years from now we will consider having global state in 
RTFE a stone-age relic :P

> A major one is there is no way for compiler to not recompile 
> generated code as it has no knowledge of how it might have 
> changed from the previous run.

Why can't we merge basic build system functionality akin to rdmd 
into the compiler itself? It makes perfect sense to me, as the 
build process can benefit a lot from being semantically aware.