Would there be interest in a SERIOUS compile-time regex parser?

Reiner Pope reiner.pope at REMOVE.THIS.gmail.com
Tue Oct 17 23:40:58 PDT 2006


Knud Sørensen wrote:
> On Tue, 17 Oct 2006 12:23:33 -0700, Walter Bright wrote:
> 
>> Don Clugston wrote:
>>> There's a potential piece of syntax sugar that IMHO would be a killer 
>>> feature. If it were possible to overload a function based on whether a 
>>> parameter is a compile-time literal, or a variable, the use of template 
>>> metaprogramming could become completely invisible to the user.
>> That is a good idea.
> 
> That put a crazy idea into my head.
> 
> disclaimer: My education is in physics not computer science, 
> I have no idea of how to write a compiler so my view may be very wrong ;-)
> 
> I have previously suggested that D handle functions on compile-time
> literal by making them inline and let the optimizer evaluate them. 
>  
> Would it be possible to combine the template-engine and the optimizer
> such that one of them could be refactored away ?? 
This is a very interesting question and I've thought a fair bit about this.

Three short answers:

1. This would be easy to do (ie it is only heavy syntax sugar) for a 
limited subset of the language (this subset would exclude reference 
semantics).

2. It would be possible to do for almost the entire language, but would 
require a real interpreter at compile time, which means a fair bit more 
work (but still possible and very appealing).

3. Neither of these two mechanisms would be enough to do away with the 
primary motivation for templates -- parameterised types. However, they 
would remove the need for a Turing-complete template system.


<rant>
Long answers:

1. Value semantics are inherently easier for a compiler to handle, 
because evaluation of them is just done by inlining, const folding and 
const propagation. Right now, an automated conversion of normal D 
functions to template form is pretty much possible, using the following 
rules:
    - convert every function to a template
    - convert every assignment to a variable into static single 
assignment form:
         a = 5;
         a = 6;
             becomes
         const a__1 = 5;
         const a__2 = 6;
    - convert function calls into template instantiations
         a = foo();
             becomes
         const a__1 = foo!().__val;
    - convert returns into
        const __val = whatever;
    - now that everything is constant, we can use static if instead of 
if everywhere. So, just convert all ifs into static ifs.
    - convert loops into recursive functions:
       while (condition)
       {  ... }
         becomes
       template __Loop1(local_variables....)
       {
          // Do stuff
          static if (condition) __val = _Loop10!(new_local_variables).__val;
       }

This basically does it but, since the template system only allows value 
semantics, this conversion must only allow them, so it means:
   - no pointers
   - no classes
   - no index-setting/slice-setting of arrays
   - no inout/out parameters
and also
   - no assembly
   - no C code
because the template system can't be expected to handle them.


2. By implementing a real interpreter in the compiler, you can handle 
reference semantics. The only things you can't handle are
   - assembly
   - other languages
   - global non-const variables (which should generally be avoided anyway)




Both of these systems would incorporate a way of saying, 'evaluate this 
at compile time', even if it were just as simple as declaring the 
variable const, using it as a template parameter or in a static if:

   version (good) { const int N = 80; }
   else { const int N = 16 }
   const foo = sqrt(N); // foo is 4
   static if (sqrt(foo) > 3) // that expression must be a compile-time 
constant
     alias ubyte MyByte;
   else
     alias byte MyByte;
   List!(foo, MyByte) myList;

The main motivations for this are, (in decreasing order of importance):
   - ability to use runtime functions for static if and template 
parameters (where static if is important because of
   - early catching of errors (eg incorrect regexes)
   - efficiency (eg not having to compile the regex at runtime; also, 
having cheap reflection at compile time)



In my opinion, fully supporting an interpreter at compile time simply 
turns metaprogramming into normal D with the requirement that it be 
optimized away by the compiler, and the extensibility and opportunities 
that this could allow is a huge motivation. However, being a very large 
amount of work with a lot of details, there is no hope of it being 
completed any time soon, but I think the motivation is so much that it 
is worth devoting an entire release of D (maybe 2.0 or 3.0) to being a 
'metaprogramming release.'

</rant>

Cheers,

Reiner



More information about the Digitalmars-d mailing list