compile-time regex redux

Wed Feb 7 18:54:11 PST 2007

Bill Baxter wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> kenny wrote:
>>> Walter, I don't hate regex -- I just don't use it. It seems to me 
>>> that to figure out regex syntax takes longer than writing quick 
>>> for/while statements, and I usually forget cases in regex too...
>>
>> I think this is an age-old issue: if you don't know something, you 
>> find it harder to do things that way. The telling sign is that people 
>> who know _both_ simple loops and regexes do use regexes, and as a 
>> consequence are way more productive at a certain category of tasks.
> 
> Hmm.  More productive, probably.   Writing better code?  Not clear.  I 
> would guess that in many cases the results are not as easy to maintain 
> as non-regexp code.

I don't think that guess is right. Following the logic of hand-written 
loops for even a simple parsing task (e.g. a floating-point number in all 
of its splendor) is horrendous. For somebody who knows regexes, the 
pattern is obvious in a second.
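
To make the comparison concrete, here is a rough sketch using the 
std.regex module found in later D standard libraries. The pattern is just 
one reasonable reading of "floating-point number", not a definitive 
grammar, and looksLikeFloat is a made-up name for illustration.

```d
import std.regex : regex, matchFirst;

// Hypothetical helper: true if s is a decimal floating-point literal
// under the (assumed) grammar spelled out in the pattern below.
bool looksLikeFloat(string s)
{
    // optional sign, then "digits[.digits]" or ".digits",
    // then an optional exponent with its own optional sign
    auto re = regex(`^[+\-]?(\d+(\.\d*)?|\.\d+)([eE][+\-]?\d+)?$`);
    return !matchFirst(s, re).empty;
}

void main()
{
    assert(looksLikeFloat("-1.5e+10"));
    assert(looksLikeFloat(".5"));
    assert(looksLikeFloat("42"));
    assert(!looksLikeFloat("1e"));   // exponent without digits
    assert(!looksLikeFloat("."));    // no digits at all
    assert(!looksLikeFloat("abc"));
}
```

The loop-based equivalent needs a small state machine for the sign, the 
integer part, the fraction, and the exponent, and each one is a place to 
forget a case.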

I do agree that code written by somebody who knows regexes is hard to 
maintain for somebody who does not, but that's pretty much self-evident 
and holds for any other technique.

All I can say is that I got significantly enriched and more effective as 
a programmer at large after I sat down and understood Perl's regex 
bestiary. I now see my previous arguments against them as 
rationalizations of my resistance to going through the effort of learning. 
Again comparing myself with my former self, I understand it's hard to 
discuss relative advantages and disadvantages with someone who doesn't 
know them because of a bootstrap problem: I say they make code much 
simpler and easier to comprehend, while my former self would say exactly 
the opposite. It's pretty much like math notation, eating vegetables, or 
classical music: it's hard to bootstrap oneself into appreciating it.

> Anyway, I think the question is whether compile-time regexp is really 
> the right level of abstraction to be targeting.  Wouldn't it be 
> infinitely better to have the compile-time code facilities be so good 
> that you could just write a regexp parser as a compile-time D library?

This is possible in today's D. The problem is that it would be a Pyrrhic 
victory: the resulting engine would be very slow and big.

I do agree that it would be nice to look into creating compile-time 
amenities that make such an engine fast and small.

> I mean what is regexp, but a particular DSL?  If the new facilities are 
> trying to make DSL's easier to create, regexp is a great target DSL.  So 
> what compile-time language facilities do you need to implement an 
> efficient and clean compile-time regexp library?

Conceptually, you'd need the following: (1) compile-time functions, (2) 
compile-time mutable variables, and (3) compile-time loops. We already 
have the rest. Then you can write compile-time code as comfortably as 
writing run-of-the-mill run-time code. D is heading that way, but in 
small steps.
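
As a rough sketch of how the three ingredients read together, assume a 
compiler that can evaluate ordinary functions during compilation (later D 
releases do this). countDigits and the sample string are made up for 
illustration.

```d
// Hypothetical helper: an ordinary function that is also usable
// at compile time.
size_t countDigits(string s)
{
    size_t n = 0;                      // (2) a mutable variable
    foreach (c; s)                     // (3) a loop
        if (c >= '0' && c <= '9')
            ++n;
    return n;
}

// (1) Initializing an enum forces the call to run during compilation.
enum digits = countDigits("a1b22c333");
static assert(digits == 6);

void main()
{
    // The very same function still works at run time.
    assert(countDigits("x9y") == 1);
}
```

The point of the sketch is that nothing in the function body looks 
special: it is plain imperative code that happens to run in the compiler.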

Implementation-wise, string-based templates must be made cheaper. If we 
get compile-time mutation, this is probably not going to be much of a 
problem, because much code that is now written in functional style could 
be written using mutation instead. I personally enjoy functional-style 
code, but it's not really needed during compilation and is a bit foreign 
to the rest of D, which remains largely imperative.

> It would be nice if we could write more-or-less generic D code with a 
> few compile time restrictions.  For instance you can write any function 
> you want that takes only const values as arguments and returns a const 
> value, and refers to only global const values and other such const-only 
> functions.

Templates already do that, albeit with a slightly odd syntax. But stay 
tuned, Walter is eyeing $ as the prefix to denote compile-time 
variables, and sure enough, compile-time functions will then emerge 
naturally :o).
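
For instance, a compile-time "function" from a constant to a constant can 
be spelled as a recursive template. This is a minimal sketch: Factorial is 
only an illustrative name, and the code uses the syntax of later D 
compilers, which declare manifest constants with enum.

```d
// A compile-time "function" written as a template: the argument is a
// constant, and so is the result.
template Factorial(uint n)
{
    static if (n <= 1)
        enum uint Factorial = 1;
    else
        enum uint Factorial = n * Factorial!(n - 1);
}

// Checked entirely by the compiler.
static assert(Factorial!(5) == 120);

void main()
{
    // The result is a constant, so it can size a static array.
    int[Factorial!(3)] buffer;
    assert(buffer.length == 6);
}
```

The odd part is that recursion and static if stand in for the loop and 
the mutable accumulator you would write in ordinary code.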

Andrei