compile-time regex redux

Andrei Alexandrescu (See Website For Email) SeeWebsiteForEmail at erdani.org
Wed Feb 7 16:07:33 PST 2007


kris wrote:
> Walter Bright wrote:
>> String mixins, in order to be useful, need an ability to manipulate 
>> strings at compile time. Currently, the core operations on strings 
>> that can be done are:
>>
>> 1) indexed access
>> 2) slicing
>> 3) comparison
>> 4) getting the length
>> 5) concatenation
>>
>> Any other functionality can be built up from these using template 
>> metaprogramming.
>>
>> The problem is that parsing strings using templates generates a large 
>> number of template instantiations, is (relatively) very slow, and 
>> consumes a lot of memory (at compile time, not runtime). For example, 
>> ParseInteger would need 4 template instantiations to parse 5678, and 
>> each template instantiation would also include the rest of the input 
>> as part of the template instantiation's mangled name.
>>
>> At some point, this will prove a barrier to large scale use of this 
>> feature.
>>
>> Andrei suggested using compile time regular expressions to shoulder 
>> much of the burden, reducing parsing of any particular token to one 
>> instantiation.
>>
>> The last time I introduced core regular expressions into D, it was 
>> soundly rejected by the community and was withdrawn, and for good 
>> reasons.
>>
>> But I think we now have good reasons to revisit this, at least for 
>> compile time use only. For example:
>>
>>     ("aa|b" ~~ "ababb") would evaluate to "ab"
>>
>> I expect one would generally only see this kind of thing inside 
>> templates, not user code.
> 
> compile-time regex is only part of the picture. A small one too. I 
> rather expect we'd wind up finding the manner it was exposed was just 
> too limiting in one way or another. Exposing, as was apparently 
> suggested, the full API of RegExp inside the compiler sounds a tad 
> distasteful.

Au contraire, I think it's a definite step in the right direction. 
Writing programs that write programs is a great way of doing more with 
less effort. Various languages can do that to various extents, and it's 
very heartening that D is taking steps in that direction. Allowing the 
programmer to manipulate strings during compilation is definitely a good 
step.
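To make the cost described in the quoted message concrete, here is a minimal sketch of template-based integer parsing that uses only indexing, slicing, and length. The name ParseInteger comes from the quoted text; the body is an assumption written for a current D compiler (it relies on the `string` alias and on `enum` manifest constants), not code from this thread. Each digit consumed triggers a further instantiation, and the unparsed remainder of the input is part of every instantiation's arguments.

```d
import std.stdio : writeln;

// Hypothetical implementation: parses a leading run of decimal digits.
// `acc` carries the value accumulated so far.
template ParseInteger(string s, long acc = 0)
{
    static if (s.length > 0 && s[0] >= '0' && s[0] <= '9')
        // Recurse on the rest of the input: one more instantiation,
        // whose arguments include the remaining string s[1 .. $].
        enum long ParseInteger =
            ParseInteger!(s[1 .. $], acc * 10 + (s[0] - '0'));
    else
        // No digit left at the front: the accumulated value is the result.
        enum long ParseInteger = acc;
}

// Evaluated entirely during compilation.
static assert(ParseInteger!("5678") == 5678);
static assert(ParseInteger!("12ab") == 12);

void main()
{
    writeln(ParseInteger!("5678")); // prints 5678
}
```

A primitive that matches a whole token at once would replace this per-character recursion with a single instantiation per token, which is the saving under discussion.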

> You'll perhaps forgive me if I question whether this is driven primarily 
> from an academic interest?  What I mean is this: if and when D goes 
> mainstream, perhaps just one in ten-thousand developers will actually 
> use this kind of feature more than 5 times (and still find themselves 
> limited). Perhaps I'm being generous with those numbers also?

Perhaps, just like me, you simply aren't in a position to evaluate 
them. I will note, however, a few historical trends. C++ got a shot in 
the arm from the STL. STL = advanced programming. Interesting. The STL 
did much to educate the C++ community towards code generation, which 
continues to be the reason why many influential gurus hang out with C++.

Java tried to radically simplify things. It did get many complicated 
things right (safety, security), particularly those that were in the 
requirements early on. As for the features that Java initially stayed 
away from, a pattern I noticed in the Java circles is that pundits 
condemn, ridicule, or demean a feature or technique until Java 
implements it. Of course, implementing it while the language already has 
immovable parts is less clean. The net result is that now Java does have 
many of the advanced features that once were deemed uninteresting, and a 
history-based prediction is that it will continue to move in that direction.

C# also started simple, only to add features even more advanced and (it 
would appear) more exotic than Java's. Again, it's natural to predict 
that the language will move towards recognizing and integrating advanced 
features.

To survive, D must compensate for its relative lack of clout and 
publicity by offering above and beyond what more mainstream languages 
offer.

> What is wrong with runtime execution anyway? It sure is easier to write 
> and maintain clean D code than (for many ppl) complex concepts that are, 
> what amount to, nothing more than runtime optimizations. Isn't that true?

No. Accommodating DSLs and generating code has more to do with 
correctness and avoiding duplication of source code than anything else.

> It would seem that adding such features does not address the type of 
> things that would be useful to 80% of developers? Surely that should be 
> far more important?

No. You are missing a key point - that some code is more influential 
than other code. 2% of programmers may write libraries that work for 
90% of programmers.

> And, no ... I'm not just pooh poohing the idea ... I'm really serious 
> about D getting some realistic market traction, and I don't see how 
> adding more compile-time 'specialities' can help in any way other than 
> generating a little bit of 'novelty' interest. Isn't this a good example 
> of "premature optimization" ?

No. As I said above, optimization has exceedingly little to do with it.

Consider as an example the "white hole" and "black hole" pattern. Given 
an interface:

interface A
{
   int foo();
   void bar(int);
   float baz(char[]);
}

a "white hole" class is an implementation of A that implements all 
methods to throw, and a "black hole" class is an implementation of A 
that implements all methods to return the default value of the return type.

This pattern is very useful either as a quick starting point for writing 
true classes implementing A, or as a standalone degenerate implementation.

To some programmers, black and white holes might not even raise a 
"duplicated code" flag. They sit down and write:

class WhiteHoleA : A
{
   // every method throws
   int foo()
   {
     throw new Exception("foo not implemented");
   }
   void bar(int)
   {
     throw new Exception("bar(int) not implemented");
   }
   float baz(char[])
   {
     throw new Exception("baz(char[]) not implemented");
   }
}

and

class BlackHoleA : A
{
   // every method returns the default value of its return type
   int foo()
   {
     return int.init;
   }
   void bar(int)
   {
   }
   float baz(char[])
   {
     return float.init;
   }
}

But if the language is advanced enough, it readily offers such rapid 
development goodies as library elements:

alias black_hole!(A) BlackHoleA;
alias white_hole!(A) WhiteHoleA;

This has nothing to do with optimization. It is all about abstraction, 
saving duplication, and allowing expressive code.
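For readers on a current D toolchain: the standard library now ships templates of this kind as std.typecons.BlackHole and std.typecons.WhiteHole. They postdate this message, so the following is a sketch of how the two aliases could be written today, not what black_hole and white_hole referred to at the time. It assumes those templates accept an interface as the base type. Note also that WhiteHole signals failure with NotImplementedError (an Error) where the hand-written class would throw an Exception.

```d
import std.exception : assertThrown;
import std.math : isNaN;
import std.typecons : BlackHole, WhiteHole, NotImplementedError;

interface A
{
    int foo();
    void bar(int);
    float baz(char[]);
}

// Library-generated counterparts of the hand-written classes.
alias BlackHoleA = BlackHole!A;
alias WhiteHoleA = WhiteHole!A;

void main()
{
    // Black hole: every method does nothing and returns the default value.
    A b = new BlackHoleA;
    assert(b.foo() == int.init);
    b.bar(42);
    assert(isNaN(b.baz(null))); // float.init is NaN

    // White hole: every method fails when called.
    A w = new WhiteHoleA;
    assertThrown!NotImplementedError(w.foo());
}
```

Adding a method to A requires no change to either alias, which is the duplication being saved.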

> Surely some of the other long-term concerns, such as solid debugging 
> support, simmering code/dataseg bloat, lib support for templates, etc, 
> etc, should deserve full attention instead? Surely that is a more 
> successful approach to getting D adopted in the marketplace?
>
> Lots of questions, and I hope you can give them serious consideration, 
> Walter.

I think it's good to be sure only when there's a solid basis.


Andrei


