std.regex literal syntax (the \Q…\E escape sequence)

Dmitry Olshansky dmitry.olsh at gmail.com
Wed Dec 18 11:21:13 PST 2013


18-Dec-2013 22:33, Andrej Mitrovic wrote:
> I'm reading through http://www.regular-expressions.info, and there's a
> feature that's missing from std.regex,
> quoted:
>
> -----
> All the characters between the \Q and the \E are interpreted as
> literal characters. E.g. \Q*\d+*\E matches the literal text *\d+*. The
> \E may be omitted at the end of the regex, so \Q*\d+* is the same as
> \Q*\d+*\E.

[snip]
> Should this feature be added? I guess there's probably more regex
> features missing (I just began reading the page), I'm not sure how
> Dmitry feels about adding X number of features though.

All in all, I wanted to be principled about which set of features to 
support. The initial design was:
1. Choose a syntax flavor (ECMAScript).
2. Add some powerful stuff (e.g. unlimited lookbehind, full Unicode support).
3. Add some convenient stuff that is popular enough or easy to implement 
(named captures).
4. Avoid extensions that complicate the engine and preclude optimizations, 
or that depend heavily on the implementation. (So no recursion and similar madness.)

In that light, a 'missing' feature might be missing on purpose. For 
instance, std.regex doesn't provide 'atomic' (possessive) groups simply 
because they are a kludge invented to work around the poor performance 
of backtracking engines.

At the end of the day, any feature is interesting as long as we carefully weigh:

- how useful the feature is
- how widespread the syntax is and how many precedents it has in other libraries

against

- how difficult it is to implement
- whether it affects backwards compatibility
- any other hidden costs

I'd be glad to implement well-motivated enhancement requests.
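
To make the \Q...\E semantics quoted at the top concrete: one way to 
get that behaviour without touching the matching engine is to rewrite 
the pattern before compiling it, escaping everything between \Q and \E. 
The following is only a rough sketch of that idea, written in Python 
for brevity. It is not D and not std.regex code, and the helper name 
expand_quoted is made up for illustration.

```python
import re


def expand_quoted(pattern: str) -> str:
    """Rewrite \\Q...\\E spans into escaped literal text (hypothetical helper).

    A missing \\E quotes everything up to the end of the pattern.
    """
    out = []
    i = 0
    n = len(pattern)
    while i < n:
        if pattern.startswith("\\Q", i):
            # Start of a quoted span: look for the closing \E.
            end = pattern.find("\\E", i + 2)
            if end == -1:
                # No \E: the rest of the pattern is literal.
                literal = pattern[i + 2:]
                i = n
            else:
                literal = pattern[i + 2:end]
                i = end + 2
            out.append(re.escape(literal))
        elif pattern[i] == "\\" and i + 1 < n:
            # Copy any other escape as a unit, so an escaped backslash
            # followed by 'Q' is not mistaken for the start of a span.
            out.append(pattern[i:i + 2])
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


# The quoted example: \Q*\d+*\E should match the literal text *\d+*
rewritten = expand_quoted(r"\Q*\d+*\E")
matched = re.fullmatch(rewritten, r"*\d+*") is not None
```

A rewrite like this stays entirely in the pattern front end, so it 
should not interfere with any of the engine's optimizations. That puts 
it on the cheap side of the cost list above.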

P.S. This reminds me to put together a roadmap of sorts on where 
std.regex is going and what to expect.

-- 
Dmitry Olshansky


More information about the Digitalmars-d-learn mailing list