DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

Dmitry Olshansky via Digitalmars-d-announce digitalmars-d-announce at puremagic.com
Sun Jun 15 14:38:12 PDT 2014


15-Jun-2014 20:21, Dicebot wrote:
> On Saturday, 14 June 2014 at 16:34:35 UTC, Dmitry Olshansky wrote:
>> But let's face it - it's a one-time job to get it right in your
>> favorite build tool. Then you have fast and cached (re)build.
>> Comparatively costs of CTFE generation are paid in full during _each_
>> build.
>
> There is no such thing as a one-time job in programming unless you work
> alone and abandon any long-term maintenance. As time goes on, any mistake
> that can possibly happen will inevitably happen.

The frequency of such an event is orders of magnitude smaller. Let's not 
take the argument to its extreme, as then doing anything is futile due to 
the potential for mistakes it introduces sooner or later.

>>> In your proposed scenario there will be two different generated files
>>> imported by the server and the client respectively. A tiny typo in your
>>> build script will result in a hard-to-detect run-time bug while the code
>>> itself still happily compiles.
>>
>> Or a link error, if we go a hybrid path where the imported module
>> emits declarations/hooks via CTFE, to be linked against by the actual
>> generated code. This is something I think could be a practical
>> solution.
>>
>> <snip>
>
> What is the benefit of this approach over simply keeping all ctRegex
> bodies in a separate package, compiling it as a static library and
> referring to it from the actual app by its own unique symbol? This is
> something that does not need any changes in the compiler or Phobos, just
> a matter of project layout.

Automation. Dumping the body of a ctRegex is manual work after all, 
including putting it under the right symbol. In the proposed scheme it's 
just a matter of copy-pasting a pattern after the initial setup has been done.
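
For context, here is a rough sketch of the layout under discussion, with 
made-up module and symbol names (nothing here is from the actual projects 
involved). The ctRegex instantiation lives in a separately built module, 
so its CTFE/template cost is paid when that library is compiled rather 
than on every application build:

// patterns.d -- compiled once into a static library; the expensive
// ctRegex instantiation happens while building this module.
module patterns;
import std.regex;

auto ipv4()
{
    // Compile-time generated matcher for a dotted-quad address.
    return ctRegex!`(\d{1,3}\.){3}\d{1,3}`;
}

// app.d -- just refers to the symbol exposed by the library.
import std.regex, std.stdio;
import patterns;

void main()
{
    if (matchFirst("host 192.168.0.1 is up", ipv4()))
        writeln("matched");
}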

> It does not work for more complicated cases where you actually need
> access to the generated sources (generating templates, for example).

Indeed, this is a limitation, and there an import of the generated source 
would be required.

>>> You may keep the convenience, but losing guarantees hurts a lot. To be
>>> able to verify the static correctness of your program / group of
>>> programs, the type system needs to be aware of how the generated code
>>> relates to the original source.
>>
>> The build system does it. We have this problem with all external deps
>> anyway (i.e. who verifies that the right version of libXYZ is linked and
>> not some other?).
>
> It is somewhat worse because you don't routinely change external
> libraries, as opposed to local sources.
>

But surely we have libraries that are built as separate projects and are 
"external" dependencies, right? There is nothing new here except that 
"D file --> obj --> lib file" is changed to "generator --> generated D 
file --> obj file".

>>> A huge mess to maintain. In my experience all build systems are
>>> incredibly fragile beasts; trusting them with something that impacts
>>> program correctness and won't be detected at compile time is just too
>>> dangerous.
>>
>> Could be, but we have dub, which should be simple and nice.
>> I had a very positive experience with scons and half-generated sources.
>
> dub is terrible at defining any complicated build models. Pretty much
> anything that is not a single-step compile-them-all approach can only be
> done by calling an external shell script.

I'm not going to like dub then ;)

> If using external generators is
> necessary I will take make over anything else :)

Then I understand your point about inevitable mistakes; it's all in the 
tool.

>> <snip>
>
> tl;dr: I believe that we should improve compiler technology to achieve
> the same results instead of promoting temporary hacks as the true way to
> do things. Relying on the build system is likely the most practical
> solution today, but it is not a solution I am satisfied with and hardly
> one I can accept as the end goal.
> An imaginary compiler that continuously runs as a daemon/service, is
> capable of JIT-ing, and provides basic dependency tracking as part of the
> compilation step should behave as well as any external solution, with
> much better correctness guarantees and out-of-the-box user experience.

What I want to point out is that we should not mistake the goal for the 
means to an end. No matter what we call it, CTFE code generation is just 
a means to an end, with serious limitations (especially as it stands 
today, in the real world).

Seamless integration is not about packing everything into a single 
compiler invocation:

dmd src/*.d

Generation is generation; as long as it's fast and automatic, it solves 
the problem(s) metaprogramming was established to solve.

For instance, if the D compiler allowed external tools as plugins (just 
an example to show the means-vs-ends distinction) with some form of the 
following construct:

mixin(call_external_tool("args", 3, 14, 15, .92));

it would make any generation totally practical *today*. This was 
proposed before and dismissed out of fear of security risks, without ever 
identifying a proper set of restrictions. After all, we already have 
textual mixins with their own potential security risks, no problem.

Let's focus on the fact that this has the benefits of:
- sane debugging of the plug-in (it's just a program with the usual symbols)
- speed, as the tool could be built with full optimization flags or run 
as a service
- trivial caching of results across builds, and even per AST node
- ease of implementation (as in, the next release)
- the ability to include things inexpressible in CTFE, like calling into 
external systems and vendor-specific tools

That would, for instance, give us the ability to have practical, 
transparent C-->D header inclusion, say:

extern mixin(htod("some_header.h"));
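
For comparison, a sketch of what this looks like today, with the binding 
generator run as a separate build step (the generated module name is 
illustrative):

// Run by the build system, not by the compiler, roughly:
//   htod some_header.h some_header.d
// The D code then simply imports the generated module:
import some_header;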

How long till the C preprocessor works in CTFE? How long till it's 
practical to do:

mixin(htod(import("some_header.h")));

and have it run optimally fast in CTFE?

My answer is: no amount of JITing CTFE and compiler architecture 
improvements in the foreseeable future will make it better than 
standalone tool(s), due to the mentioned _fundamental_ limitations.

There are real practical boundaries on where an internal interpreter can 
stay competitive.


-- 
Dmitry Olshansky

