regex problems

AsmMan via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sat Sep 20 12:43:17 PDT 2014


On Saturday, 20 September 2014 at 15:28:54 UTC, seany wrote:
> consider this:
>
>
> import std.conv, std.algorithm;
> import core.vararg;
> import std.stdio, std.regex;
>
> void main()
> {
>
>     string haystack = "ID : generateWorld;
> 						    Position : { &
> 										      {ID : \" absolute ; Coordinate : , NULL OMEGA;}
> 										      {ID : \" inclusion ; Coordinate : UNDEF;}
> 										      {ID : \" subarc; Coordinate : , NULL OMEGA;	}
> 								      }; ID : ";
> 								
>     // thus, something like *{B}* can not end here,
>     // but something like X can start here.
>
>     string needle = 
> "(?<!(([.\n\r])*(\\{)([.\n\r])*))(ID(\\p{White_Space})*:(\\p{White_Space})*)(?!(([.\n\r])*(\\})([.\n\r])*))";
>
>     auto r = regex(needle, "g");
>     auto m = matchAll(haystack, r);
>
>     foreach (c; m)
>       writeln(c.hit);
>
> }
>
>
> So let us break up needle:
>
> (
> ?<!
>   (
>     ([.\n\r])*(\\{)([.\n\r])*
>   )
> )
>
> Do not match something that may contain a "*{*" as a leading 
> match; * this time means any character, including \n and \r
>
> (ID(\\p{White_Space})*:(\\p{White_Space})*)
>
> However, look for the form: "ID" <a few blank spaces> ":" <more 
> blank spaces>
>
> (?!(([.\n\r])*(\\})([.\n\r])*))
>
> but no trailing "*}*" as a trailing match.
>
> In haystack, there are two such "ID :" occurrences: one at the 
> beginning (ID : generateWorld) and then the final, last ID.
>
> However, this returns all 5 IDs as matches.
>
> What am I doing wrong?

Is this string JSON? If so, why not use a proper JSON parsing 
library?
As others have already mentioned, this kind of data is not well 
suited to parsing with regex. Write small routines to parse the 
data instead. It is no harder than getting it to work with a 
regexp. Seriously.
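
To make "small routines" concrete, here is a minimal sketch of one 
such routine. It is only an illustration under assumptions: the 
function name topLevelIds is made up, the haystack is a shortened 
copy of the one in the question, and the format is assumed to nest 
only with { and }. It walks the string once, tracks the brace depth, 
and records an "ID :" only when the depth is zero.

```d
import std.ascii : isAlphaNum, isWhite;
import std.stdio : writeln;

// Hypothetical helper (not from the original post): returns the byte
// offsets of every "ID <spaces> :" that lies outside all { ... } groups.
size_t[] topLevelIds(string text)
{
    size_t[] offsets;
    int depth = 0; // current brace nesting level

    for (size_t i = 0; i < text.length; ++i)
    {
        immutable c = text[i];

        if (c == '{')
        {
            ++depth;
        }
        else if (c == '}' && depth > 0)
        {
            --depth;
        }
        else if (depth == 0 && c == 'I'
                 && i + 1 < text.length && text[i + 1] == 'D'
                 && (i == 0 || !isAlphaNum(text[i - 1])))
        {
            // Skip optional whitespace after "ID", then require a ':'.
            size_t j = i + 2;
            while (j < text.length && isWhite(text[j]))
                ++j;

            if (j < text.length && text[j] == ':')
            {
                offsets ~= i;
                i = j; // resume scanning after the colon
            }
        }
    }
    return offsets;
}

void main()
{
    // Shortened copy of the haystack from the question (an assumption,
    // whitespace simplified).
    string haystack = "ID : generateWorld;
Position : { &
  {ID : \" absolute ; Coordinate : , NULL OMEGA;}
  {ID : \" inclusion ; Coordinate : UNDEF;}
  {ID : \" subarc; Coordinate : , NULL OMEGA; }
}; ID : ";

    auto hits = topLevelIds(haystack);
    writeln(hits.length); // the three IDs inside braces are skipped
}
```

The depth counter does the job that the lookbehind and lookahead 
were meant to do, and it does not need to guess how far the 
surrounding braces extend. It does not handle braces inside quoted 
strings; if the real format allows those, the routine would need a 
quote state as well.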

