Questions about builtin RegExp

Chris Sauls ibisbasenji at gmail.com
Sun Feb 19 13:42:35 PST 2006


Andrew Fedoniouk wrote:
> "Walter Bright" <newshound at digitalmars.com> wrote in message 
> news:dt9ho8$20e4$3 at digitaldaemon.com...
> 
> 
>>>>Writing a real lexer takes a lot of effort. That's why people invented 
>>>>regex, it'll handle most jobs without having to write a lexer. C's 
>>>>strtok() is embarrassingly inadequate.
>>>
>>>Why?
>>
>>I'd like to see strtok() parse an email address out of a body of text.
>>
> 
> 
> I don't really understand "parse an email address out of a body of text."
> 
> Do you mean something like this:
> 
> // strtok() takes the buffer on the first call only; later calls pass NULL
> char* pw = strtok( text, " \t\n\r" );
> url u;
> 
> for( ; pw; pw = strtok( NULL, " \t\n\r" ) )
> {
>   if( !u.parse(pw) ) continue;
>   if( u.protocol() == url::MAILTO )
>      //found - do something here
>      ;
> }
> 
> ?
> 
> Andrew. 
> 
> 

I think he meant something more like (using MatchExpr, sorry):

# char[] text = ...;
# char[] addr, user, host, tld;
# if (`([_a-z0-9]*)@([_a-z0-9]*)\.([_a-z0-9]*)` ~~ text) {
#   addr = _match[0];
#   user = _match[1];
#   host = _match[2];
#   tld  = _match[3];
#
#   // do something
# }

Granted, I just tossed that together in five seconds flat, so it's probably not quite 
right.  I've only recently started to lean toward the RegExp camp myself.  It's made 
parsing of Lyra scripts a dream.
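
For anyone who wants to try the idea with a regex library that exists today, here is a 
rough sketch in C++ using std::regex.  The EmailParts struct and find_email() are names 
made up for this illustration, and the pattern escapes the dot and uses + instead of *, 
which is an assumption about what the five-second version was aiming for.  It is nowhere 
near a complete address matcher.

```cpp
#include <regex>
#include <string>

// Hypothetical helper type (not from the original post): the pieces that
// the capture groups pull out of one address.
struct EmailParts {
    std::string addr, user, host, tld;
    bool found = false;
};

// Finds the first thing in `text` that looks like user@host.tld.
// The pattern is deliberately simplistic: lowercase letters, digits and
// underscores only, and exactly one dot after the '@'.
EmailParts find_email(const std::string& text) {
    static const std::regex pattern(R"(([_a-z0-9]+)@([_a-z0-9]+)\.([_a-z0-9]+))");
    std::smatch m;
    EmailParts out;
    if (std::regex_search(text, m, pattern)) {
        out.addr  = m[0].str();  // whole match
        out.user  = m[1].str();  // first group
        out.host  = m[2].str();  // second group
        out.tld   = m[3].str();  // third group
        out.found = true;
    }
    return out;
}
```

Unlike the strtok() loop, this does not need the address to be separated from the rest 
of the text by whitespace, and it does not modify the input buffer.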

One thing I miss from scripting languages when doing the above is PHP's lovely list() 
construct.  Pretending we had this in D:

# char[] text = ...;
# char[] addr, user, host, tld;
# if (`([_a-z0-9]*)@([_a-z0-9]*)\.([_a-z0-9]*)` ~~ text) {
#   list(addr,user,host,tld) = _match;
#   // do something
# }
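
As a point of comparison, C++ can get close to list() with std::tie, which assigns the 
elements of a tuple to several variables in one statement.  The sketch below is only an 
illustration: match_email() and tld_of() are hypothetical names, and the pattern is the 
same simplified one (escaped dot, + instead of *) assumed above.

```cpp
#include <regex>
#include <string>
#include <tuple>

// Hypothetical helper: returns (whole match, user, host, tld).
// All four strings are empty when nothing in `text` matches.
std::tuple<std::string, std::string, std::string, std::string>
match_email(const std::string& text) {
    static const std::regex pattern(R"(([_a-z0-9]+)@([_a-z0-9]+)\.([_a-z0-9]+))");
    std::smatch m;
    if (!std::regex_search(text, m, pattern)) {
        return std::make_tuple(std::string(), std::string(),
                               std::string(), std::string());
    }
    return std::make_tuple(m[0].str(), m[1].str(), m[2].str(), m[3].str());
}

// Example use: unpack all four pieces in one statement, list()-style.
std::string tld_of(const std::string& text) {
    std::string addr, user, host, tld;
    std::tie(addr, user, host, tld) = match_email(text);
    return tld;
}
```

The tuple has a fixed size, so this only covers the case where the number of groups is 
known up front, which is also the case list() handles best.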

-- Chris Nicholson-Sauls


