DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

Sat Jun 14 09:34:34 PDT 2014

14-Jun-2014 19:05, Dicebot wrote:
> On Thursday, 12 June 2014 at 16:42:38 UTC, Dmitry Olshansky wrote:
[snip]
>> Well, I'm biased by heavy-handed ones. Say I have a (no longer) secret
>> plan of doing a next-gen parser generator in D. Needless to say swaths
>> of non-trivial code generation. I'm all for embedding nicely but I see
>> very little _practical_ gains in CTFE+mixin here EVEN if CTFE wouldn't
>> suck. See the point above about using the same metadata and types as
>> the user application would.
>
> Consider something like REST API generator I have described during
> DConf. There is different code generated in different contexts from same
> declarative description - both for server and client. Right now simple
> fact that you import very same module from both gives solid 100%
> guarantee that API usage between those two programs stays in sync.

But let's face it - it's a one-time job to get it right in your favorite
build tool. After that you have a fast, cached (re)build. By comparison,
the costs of CTFE generation are paid in full during _each_ build.
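
To make the "cached (re)build" point concrete, here is a minimal sketch
of the staleness check a build tool performs before re-running a
stand-alone generator. The function name and both file names are
hypothetical, picked only for illustration; real build tools usually
track more than timestamps.

```d
import std.file : exists, timeLastModified;
import std.stdio : writeln;

/// True when `generated` is missing or older than `source`.
/// Throws a FileException if `generated` exists but `source` does not.
bool needsRegeneration(string source, string generated)
{
    return !exists(generated)
        || timeLastModified(source) > timeLastModified(generated);
}

void main()
{
    // Hypothetical file names: a declarative description and its output.
    if (needsRegeneration("api.d", "client_generated.d"))
        writeln("regenerating client_generated.d");
    else
        writeln("client_generated.d is up to date");
}
```

The generator runs only when the check returns true, whereas CTFE has
no such record of the previous run and redoes the work every time.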

> In your proposed scenario there will be two different generated files
> imported by server and client respectively. Tiny typo in writing your
> build script will result in hard to detect run-time bug while code
> itself still happily compiles.

Or a link error if we go a hybrid path where the imported module
emits declarations/hooks via CTFE, to be linked against the proper
generated code. This is something I think could be a practical
solution.

I.e. currently to get around wasting cycles again and again:

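module a;
import std.regex;
bool verify(string s){
   // ctRegex is instantiated here, inside a's function body
   static re = ctRegex!"....";
   // match returns a range of matches, not a bool
   return !match(s, re).empty;
}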
//
module b;
import a;
void foo(){
	...
	verify("blah");
	...
}

vs would-be hybrid approach:

module gen_re;

void main() //or wrap it in tiny template mixin
{
generateCtRegex(
	//all patterns
);
}

module b;
import std.regex;
//notice no import of a

void foo(){
	...
	static re = ctRegex!(...); //
	...
}
with ctRegex used as usual in b, but any pattern missing from the
compiled cache would lead to a link error.

In fact it might be the best of both worlds if there is a switch to
choose between full CTFE and the link-time external option.
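
A minimal sketch of what such a switch could look like with today's
language features, using a version identifier. Everything named here is
an assumption made for illustration: the ExternalRegex version, the
verifyGenerated symbol and the pattern itself are hypothetical, and
std.regex offers no such mode.

```d
import std.regex;
import std.stdio : writeln;

version (ExternalRegex)
{
    // Declaration only. The body is expected to come from a separately
    // generated object file (hypothetical); if that file is not linked
    // in, the build fails with an undefined-symbol link error.
    extern(C) bool verifyGenerated(const(char)* ptr, size_t len);

    bool verify(string s)
    {
        return verifyGenerated(s.ptr, s.length);
    }
}
else
{
    // Default path: full CTFE, paid on every compilation of this module.
    bool verify(string s)
    {
        auto re = ctRegex!`^[a-z]+$`; // illustrative pattern
        return !matchFirst(s, re).empty;
    }
}

void main()
{
    writeln(verify("blah"));
}
```

Compiling with -version=ExternalRegex selects the link-time path, and
compiling without it falls back to plain ctRegex.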

>
> You may keep convenience but losing guarantees hurts a lot. To be able
> to verify static correctness of your program / group of programs type
> system needs to be aware how generated code relates to original source.

The build system does it. We have this problem with all external deps
anyway (i.e. who verifies that the right version of libXYZ is linked and
not some other?)

> Also this approach does not scale. I can totally imagine you doing it
> for two or three DSL in single program, probably even dozen. But
> something like 100+?

Not everything is suitable, of course. Some stuff is good only inline
and on the spot. But it does use the same sources; in the case of REST
generators it may look a lot like this:

import everything;

void main(){
	foreach(m; module){
	//... generate client code from meta-data
	}
}
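
A slightly fuller sketch of the same idea, assuming the API is described
by a D interface. UserApi and generateClient are hypothetical names used
only for illustration. The generator is an ordinary program that imports
the very same declarations the server would, walks them with
compile-time reflection, and prints client stubs to stdout.

```d
import std.stdio : write;
import std.traits : Parameters, ReturnType;

// Hypothetical declarative API description shared by server and client.
interface UserApi
{
    string getUser(int id);
    void deleteUser(int id);
}

/// Builds the source text of a client class with one bodyless
/// declaration per method of interface I.
string generateClient(I)()
{
    string code = "class " ~ I.stringof ~ "Client\n{\n";
    foreach (name; __traits(allMembers, I))
    {
        alias member = __traits(getMember, I, name);
        static if (is(typeof(member) == function))
        {
            code ~= "    " ~ ReturnType!member.stringof ~ " " ~ name
                  ~ Parameters!member.stringof ~ ";\n";
        }
    }
    return code ~ "}\n";
}

void main()
{
    // The build tool would redirect this into a generated source file.
    write(generateClient!UserApi());
}
```

The reflection itself is still resolved at compile time, but only when
the generator is built. The generated text can then be cached by the
build tool like any other source file.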

Waiting for 100+ DSLs to be compiled in a JIT interpreter that can't
optimize a thing (pretty much by definition, or would it use separate
flags for that?) is not going to be fun either.

> Huge mess to maintain. According to my experience
> all builds systems are incredibly fragile beasts, trusting them
> something that impacts program correctness and won't be detected at
> compile time is just too dangerous.

Could be, but we have dub, which should be simple and nice.
I've had a very positive experience with SCons and half-generated sources.

>>
>> I heard D builds times are important part of its adoption so...
>
> Adoption - yes. Production usage - less so (though still important).
> Difference between 1 second and 5 seconds is very important. Between 10
> seconds and 1 minute - not so much.
>
> JIT will be probably slower than stand-alone generators but not that
> slower.
>
>> It might solve most of _current_ problems, but I foresee fundamental
>> issues of "no global state" in CTFE that in say 10 years from now
>> would look a lot like `#include` in C++.
>
> I hope 10 years ago from now we will consider having global state in
> RTFE stone age relict :P

Well, no amount of purity dismisses the point that a cache is a cache. 
When I say global in D I mean thread/fiber local.

>
>> A major one is there is no way for compiler to not recompile generated
>> code as it has no knowledge of how it might have changed from the
>> previous run.
>
> Why can't we merge basic build system functionality akin to rdmd into
> compiler itself? It makes perfect sense to me as build process can
> benefit a lot from being semantically aware.

I wouldn't cross my fingers, but yes, ideally it would need to have the
powers of a build system, making it that much more complicated. Then it
could cache results, including template instantiations, across modules
and separate invocations of the tool. It's a distant dream though.

The caching currently available, at the level of object files, is very
coarse-grained and not really helpful for the problem at hand.

-- 
Dmitry Olshansky