Limitation with current regex API

Jerry jlquinn at optonline.net
Mon Jan 16 11:28:40 PST 2012


Hi all,

In general, I'm enjoying the regex respin.  However, I ran into one
issue that seems to have no clean workaround.

Generally, I want to be able to get the start and end indices of
matches.  For the complete match, this info can be pieced together from
match.pre.length and match.hit.length.  However, I can't do this
for individual captures.

For example: I have a string and the regex .*(a).*(b).*(c).*.  I want
to find where a, b, and c are located when I match.  As far as I can
tell, the only way to do this would be to capture every chunk of text,
including the stretches between a, b, and c, then iterate over the
captures and sum their lengths to determine the offsets.  That seems
wasteful.
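
A minimal sketch of that workaround, written in Java rather than D purely
for illustration (the D code would have the same shape).  The class name
ChunkOffsets, the helper locate, and the sample input are hypothetical.
Every stretch of text gets its own group, and each offset is the running
sum of the lengths of the groups before it:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChunkOffsets {
    // Hypothetical helper: returns {start, end} pairs for a, b and c,
    // or null when the input does not match.
    static int[][] locate(String input) {
        // Groups 1, 3, 5, 7 exist only so their lengths can be summed.
        Matcher m = Pattern.compile("(.*)(a)(.*)(b)(.*)(c)(.*)").matcher(input);
        if (!m.matches()) {
            return null;
        }
        int[][] spans = new int[3][];
        int offset = 0; // matches() anchors at index 0, so the sum starts there
        for (int g = 1; g <= m.groupCount(); g++) {
            int len = m.group(g).length();
            if (g % 2 == 0) { // even-numbered groups are a, b and c
                spans[g / 2 - 1] = new int[] {offset, offset + len};
            }
            offset += len;
        }
        return spans;
    }

    public static void main(String[] args) {
        for (int[] s : locate("xxaxxbxxcxx")) {
            System.out.println(s[0] + ".." + s[1]);
        }
    }
}
```

The four extra groups are captured only to be measured and thrown away,
which is the waste in question.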

If you look at the ICU and Java regex APIs, you'll see that this
information is retrievable.  I believe it's available under the covers
of the D regex library API too.
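
For comparison, here is a small sketch of how the Java API exposes this,
using Matcher.start(int) and Matcher.end(int).  The class name
GroupOffsets, the helper locate, and the sample input are made up for
the example; only the java.util.regex calls are the real API:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOffsets {
    // Hypothetical helper: returns one {start, end} pair per capture group,
    // or null when the input does not match.
    static int[][] locate(String input) {
        Matcher m = Pattern.compile(".*(a).*(b).*(c).*").matcher(input);
        if (!m.matches()) {
            return null;
        }
        int[][] spans = new int[m.groupCount()][];
        for (int g = 1; g <= m.groupCount(); g++) {
            // start(g) is the index of the group's first character,
            // end(g) is one past its last character.
            spans[g - 1] = new int[] {m.start(g), m.end(g)};
        }
        return spans;
    }

    public static void main(String[] args) {
        for (int[] s : locate("xxaxxbxxcxx")) {
            System.out.println(s[0] + ".." + s[1]);
        }
    }
}
```

Only the three groups of interest are captured, and no text has to be
copied or measured to recover their positions.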

Can this please be exposed?  It's very helpful for doing text processing
where you need to be able to align the results of multiple
transformations to the input text.

Thanks
Jerry



