Empty subexpressions captures in std.regex

PC petevik38 at yahoo.com.au
Mon Jul 12 14:36:07 PDT 2010


Sorry about the lack of clarity in my last post. What I actually
did was comment out the call to Regex.optimize in Regex.compile.

    auto r1 = regex( "(a*)b" );
    r1.printProgram();

With the optimize call left in, this prints:

printProgram()
  0: 	REtestbit 98, 13
 18: 	REparen len=15 n=0, pc=>42
 27: 	REnm  len=2, n=0, m=4294967295, pc=>42
 40: 	REchar 'a'
 42: 	REchar 'b'
 44: 	REend

With optimize(buf); commented out I get:

printProgram()
  0: 	REparen len=15 n=0, pc=>24
  9: 	REnm  len=2, n=0, m=4294967295, pc=>24
 22: 	REchar 'a'
 24: 	REchar 'b'
 26: 	REend

I don't understand why the optimize routine inserts REtestbit at
the start of the program, but with it in place the regex will not
match if there is no "a" at the start of the input (e.g. "b").
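
To make the concern concrete, here is a minimal sketch in D. It
assumes (this is not confirmed from the regex.d source) that
REtestbit acts as a first-character filter: a match is attempted
only if the first input character is in a precomputed set.
passesPrefilter and its firstSet parameter are hypothetical names,
not part of std.regex. Because a* can match the empty string, a
safe set for "(a*)b" has to contain both 'a' and 'b'.

```d
/// Hypothetical first-character prefilter (not the std.regex code).
/// Returns true when the first character of `input` is in `firstSet`,
/// i.e. when a full match attempt at this position is worth running.
bool passesPrefilter(string input, string firstSet)
{
    if (input.length == 0)
        return false;
    foreach (c; firstSet)
    {
        if (c == input[0])
            return true;
    }
    return false;
}
```

Under that assumption, a set holding only 'a' rejects "b" before
the matcher runs, while a set holding 'a' and 'b' lets it through.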

I think I need to spend some more time looking through the
regex.d source to understand it better.

- Pete


== Quote from Andrei Alexandrescu (SeeWebsiteForEmail at erdani.org)'s article
> Hi PC,
> Thanks for your kind words.
> Regarding regex, we need to get a report into bugzilla so we keep track
> of the problem. When you say "disable the call to optimize" are you
> referring to the -O compiler flag? In that case it's a compiler problem
> (otherwise it might be a library issue). Could you please clarify?
> Thanks,
> Andrei
> On 07/11/2010 06:29 AM, PC wrote:
> > Hi, I've been lurking in this group for a few months, have read
> > through TDPL (which is great, Andrei) and have started using D for
> > some small programs. So far it's been a joy to use (you may have a
> > C++ convert on your hands) and with the convenience of rdmd, I've
> > been using it where I'd normally use a scripting language.
> >
> > It's been pretty good for this, especially as Phobos has covered
> > almost everything I've wanted to do. I have run into some issues
> > with std.regex matching empty subexpressions though (dmd 2.047,
> > win32):
> >
> >      auto r1 = regex( "(a*)b" );
> >      auto m = match( "b", r1 );
> >      writefln( "captures = %s, empty = %s", m.captures.length, m.empty );
> >
> > =>  captures = 0, empty = true
> >
> > If I disable the call to optimize, it gives the expected results:
> >
> > =>  captures = 2, empty = false
> >
> > Also, with optimize disabled:
> >
> >      auto r = regex("([^,]*),([^,]*),([^,]*)");
> >      m = match( ",,", r );
> >      writefln( "captures = %s, empty = %s", m.captures.length, m.empty );
> >
> > =>  captures = 3, empty = false
> >
> > I noticed in Captures:
> >
> >          @property size_t length()
> >          {
> >              foreach (i; 0 .. matches.length)
> >              {
> >                  if (matches[i].startIdx >= input.length) return i;
> >              }
> >              return matches.length;
> >          }
> >
> > In this case matches[3].startIdx = 2 and matches[3].endIdx = 2.
> > Should this line be:
> >
> >       if (matches[i].startIdx > input.length) return i;
> >
> >
> > Anyway, kudos to everyone involved with D. I'm certainly going to
> > be using it a lot in the future.
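
The Captures.length question in the quoted message can be checked
with a small stand-alone model. This is a sketch, not the Phobos
code: Span, lengthAsQuoted and lengthProposed are hypothetical
names, and the offsets for groups 0 to 2 are what the pattern
"([^,]*),([^,]*),([^,]*)" should produce on ",," (only matches[3]
is stated in the post).

```d
/// Hypothetical stand-in for the per-group offsets held by Captures.
struct Span
{
    size_t startIdx;
    size_t endIdx;
}

/// Counting rule as quoted in the post: stop at the first group whose
/// start is at or past the end of the input.
size_t lengthAsQuoted(const Span[] matches, size_t inputLength)
{
    foreach (i; 0 .. matches.length)
    {
        if (matches[i].startIdx >= inputLength) return i;
    }
    return matches.length;
}

/// Proposed rule: stop only when a group starts strictly past the end,
/// so an empty capture sitting exactly at the end is still counted.
size_t lengthProposed(const Span[] matches, size_t inputLength)
{
    foreach (i; 0 .. matches.length)
    {
        if (matches[i].startIdx > inputLength) return i;
    }
    return matches.length;
}
```

For the spans [0,2], [0,0], [1,1], [2,2] and an input length of 2,
the quoted comparison yields 3 and the proposed one yields 4. The
sketch does not model how unmatched groups are stored, so it does
not show whether the change is safe in general.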


