Compile time regex matching

Philippe Sigaud via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Mon Jul 14 04:42:51 PDT 2014


> I am trying to write some code that uses and matches regular expressions
> at compile time, but the compiler won't let me because matchFirst and
> matchAll make use of malloc().
>
> Is there an alternative that I can use that can be run at compile time?

You can try Pegged, a parser generator that works at compile-time
(both the generator and the generated parser).

https://github.com/PhilippeSigaud/Pegged

docs:

https://github.com/PhilippeSigaud/Pegged/wiki/Pegged-Tutorial

It's also on dub:

http://code.dlang.org/packages/pegged

It takes a grammar as input, not a single regular expression, but the
syntax is not too different.


  import pegged.grammar;

  mixin(grammar(`
  MyRegex:
      foo <- "abc"* "def"?
  `));

  void main()
  {
      enum result = MyRegex("abcabcdefFOOBAR"); // compile-time parsing

      // everything can be queried and tested at compile-time, if need be.
      static assert(result.matches == ["abc", "abc", "def"]);
      static assert(result.begin == 0);
      static assert(result.end == 9);

      pragma(msg, result.toString()); // parse tree
  }


It probably does not implement all those nifty regex features, but it
has all the usual powers of Parsing Expression Grammars. It gives you an
entire parse result, though: matches, children, subchildren, etc. As
you can see, matches are accessible at the top level.
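To picture what "an entire parse result" means, here is a minimal sketch in plain D, without Pegged. The `Node` struct, `leafMatches` and `toyTree` are hypothetical names made up for illustration. This is NOT Pegged's actual ParseTree type, only a toy with the same idea (a name, the matched strings, and children), where the root's matches are built by concatenating what its children matched:

```d
import std.stdio : writeln;

// Hypothetical, simplified tree node: NOT Pegged's ParseTree, just a toy
// with a rule name, the matched strings, and child nodes.
struct Node
{
    string name;
    string[] matches;
    Node[] children;
}

// Collects the matches of the leaves, left to right (depth-first).
string[] leafMatches(const Node n)
{
    if (n.children.length == 0)
        return n.matches.dup;
    string[] acc;
    foreach (c; n.children)
        acc ~= leafMatches(c);
    return acc;
}

// Builds a toy tree resembling a parse of "abcabcdef" by the rule foo.
// In this toy model the root's matches are the concatenated leaf matches.
Node toyTree()
{
    auto leaves = [Node("abc", ["abc"]), Node("abc", ["abc"]), Node("def", ["def"])];
    auto root = Node("foo", null, leaves);
    root.matches = leafMatches(root);
    return root;
}

// Built entirely at compile time through CTFE.
enum tree = toyTree();

// Top-level matches and subchildren can be checked at compile time.
static assert(tree.matches == ["abc", "abc", "def"]);
static assert(tree.children.length == 3);

void main()
{
    writeln(tree.children[2].name); // prints def
}
```

The real tree also records positions (like the `begin` and `end` used above), but the toy keeps only enough to show the shape.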

One thing to keep in mind, which comes from the language and not from this
library: in the previous code, since 'result' is an enum, it is
'pasted' in place every time it's used in code, so all those static
asserts get an entire copy of the parse tree. It's a bit wasteful.
Using 'immutable' directly does not work here, but this is OK:

    enum res = MyRegex("abcabcdefFOOBAR"); // compile-time parsing
    immutable result = res; // to avoid copying the enum value everywhere

The static asserts then work (the toString does not, though). Maybe
someone more knowledgeable than me about DMD internals could confirm that
this indeed avoids re-allocating those parse results.
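The same enum-versus-immutable behaviour can be seen without Pegged. Below is a minimal sketch that assumes a plain string array stands in for the parse tree. `toyParse` is a hypothetical CTFE function, not part of any library. One difference from the snippet above: the immutable copy lives at module level, where I'm confident static asserts can read it because its initializer is a compile-time constant:

```d
import std.stdio : writeln;

// Hypothetical CTFE function standing in for a compile-time parser:
// collects leading "abc" chunks, then an optional "def".
string[] toyParse(string s)
{
    string[] result;
    while (s.length >= 3 && s[0 .. 3] == "abc")
    {
        result ~= "abc";
        s = s[3 .. $];
    }
    if (s.length >= 3 && s[0 .. 3] == "def")
        result ~= "def";
    return result;
}

// Manifest constant: the array literal is re-created at every use site
// (and allocated anew each time it is used at runtime).
enum string[] res = toyParse("abcabcdefFOOBAR");

// Immutable variable: initialised once at compile time and stored once,
// but still readable by static assert because its initializer is constant.
immutable string[] result = res;

static assert(result.length == 3);
static assert(result == ["abc", "abc", "def"]);

void main()
{
    writeln(result); // prints ["abc", "abc", "def"]
}
```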

