[draft] New std.regex walkthrough

Dmitry Olshansky dmitry.olsh at gmail.com
Wed Mar 14 02:02:06 PDT 2012


On 14.03.2012 0:32, Brad Anderson wrote:
> On Tue, Mar 13, 2012 at 1:27 PM, Dmitry Olshansky <dmitry.olsh at gmail.com> wrote:
>
>     For a couple of releases now we have had a new, revamped std.regex
>     that, as far as I'm concerned, works nicely, thanks to my GSOC
>     commitment last summer. Yet there was a certain dark trend around
>     std.regex/std.regexp, as both had severe bugs, missing documentation
>     and what not, enough for people to consider them unusable or dismiss
>     them prematurely.
>
>     It's about time to break this gloomy aura and show that std.regex
>     is actually easy to use, that it does the job and that it has some
>     nice extras.
>
>     Link: http://blackwhale.github.com/regular-expression.html
>
>     Comments are welcome from experts and newbies alike; in fact, the
>     article should encourage people to try out a few tricks ;)
>
>     This is intended as a replacement for the article on dlang.org
>     about the outdated (and soon to disappear) std.regexp:
>     http://dlang.org/regular-expression.html
>
>     [Spoiler] One example relies on a parser bug being fixed (blush):
>     https://github.com/D-Programming-Language/phobos/pull/481
>     Well, it was a specific lookahead inside a lookaround, so it's not a
>     severe bug ;)
>
>     P.S. I've been following up on a bunch of new bug reports
>     recently; thanks to everyone involved :)
>
>
>     --
>     Dmitry Olshansky
>
>
> Second paragraph:
> - "..,expressions, though one though one should..." has too many "though
> one"s
>
> Third paragraph:
> - "...keeping it's implementation..." should be "its"
> - "We'll see how close to built-ins one can get this way." was kind of
> confusing.  I'd consider just doing away with the distinction between
> built-in and non-built-in regex, since it's an implementation detail most
> programmers who use it don't even need to know about.  Maybe say that it
> is not built in and explain why that is a neat thing to have (meaning,
> the language itself is powerful enough to express it in user code).
>

Yeah, the point about built-in vs. library is kind of dangling in the air
for now. I'll see how to wrap it up.

> Fourth paragraph:
> - "...article you'd have..." should probably be "you'll" or, preferably,
> "you will".
> - "...utilize it's API..." should be "its"
> - "yet it's not required to get an understanding of the API." I'd
> probably change this to "...yet it's not required to understand the API"
>
> Lost track of which paragraph:
> - "... that allows writing a regex pattern in it's natural notation"
> another "its"
> - "trying to match special characters like" I'd write "trying to match
> special regex characters like" for clarity
> - "over input like e.g. search or simillar" I'd remove the e.g., write
> search as "search()" to show it's a function in other languages and fix
> the spelling of similar :P
> - "An element type is Captures for the string type being used, it is a
> random access range." I just found this confusing.  Not sure what it's
> trying to say.
> - "I won't go into full detail of the range conception, suffice to say,"
> I'd change "conception" to "concept" and remove "suffice to say". (It's
> a shame we don't have a range article we can link to.)
> - "At that time ancors like" misspelled "anchors"

All to the point and fixed.

> - "Needless to say, one need not" I'd remove the "Needless to say,"
> because I think it's actually important to say :P

It's not important, as it has no effect on matching if there are no
anchors. It's just cleaner for the reader, because otherwise it raises an
alarm along the way: "hm, this guy doesn't know what multi-line is, let's
stay sharp and watch out for other problems".
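
To illustrate the point about anchors, here is a minimal sketch against the present-day std.regex API (matchAll, with flags passed as the second argument to regex). The helper name allMatches is hypothetical. The "m" flag only changes where ^ and $ may match, so a pattern without anchors finds the same matches with or without it.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: collect the text of every match of `pattern` in `text`.
string[] allMatches(string text, string pattern, string flags)
{
    string[] found;
    foreach (c; matchAll(text, regex(pattern, flags)))
        found ~= c.hit;
    return found;
}

void main()
{
    auto text = "first line\nsecond line";

    // With an anchor, "m" matters: ^ may also match right after each newline.
    writeln(allMatches(text, `^\w+`, ""));  // ["first"]
    writeln(allMatches(text, `^\w+`, "m")); // ["first", "second"]

    // Without anchors, "m" changes nothing.
    writeln(allMatches(text, `\w+`, "") == allMatches(text, `\w+`, "m")); // true
}
```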

> - "replace(text, regex(r"([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})","g"),
> "--");" Is this code example correct?  It references $1, $2, etc. in the
> explanatory paragraph below, but they are nowhere to be found.

Damnable DDoc ate my dollars!
And that's inside a source code section; any ideas on how to avoid this mess?
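
For reference, a minimal sketch of what the date example is meant to do, written against the present-day std.regex API (replaceAll, which replaces every match, instead of replace with the "g" flag). The "$3-$1-$2" format string and the helper name reorderDates are assumptions, since the original replacement string was lost to DDoc; in the format string, $1, $2 and $3 stand for the first, second and third capture groups.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: move the 4-digit year of N/N/YYYY dates to the front.
string reorderDates(string text)
{
    // Same pattern as the article's example: two 1-2 digit groups, then the year.
    auto re = regex(r"([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})");
    // $3, $1 and $2 refer to the capture groups by number (assumed format).
    return replaceAll(text, re, "$3-$1-$2");
}

void main()
{
    writeln(reorderDates("Released 03/14/2012, patched 3/1/2012."));
    // Released 2012-03-14, patched 2012-3-1.
}
```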

> - When you are explaining named captures, it sounds like you are about to
> show them in the subsequent code example, but you are actually showing
> what it'd look like without them, which was a bit confusing.
> - Maybe some more words on what lookaround/lookahead do, as I was lost.
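
As a short answer to the lookahead question, a minimal sketch using std.regex's (?=...) and (?!...) syntax with matchAll. The helper names and sample input are hypothetical. A lookahead checks what comes next without consuming it, so the checked text is not part of the match.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: words directly followed by a colon.
string[] keysBeforeColon(string text)
{
    string[] keys;
    // (?=:) is a positive lookahead: a ':' must follow, but it is not consumed.
    foreach (c; matchAll(text, regex(`\w+(?=:)`)))
        keys ~= c.hit;
    return keys;
}

// Hypothetical helper: whole words NOT followed by a colon.
string[] wordsNotBeforeColon(string text)
{
    string[] words;
    // (?!:) is a negative lookahead. The \b anchors keep the engine from
    // backtracking into a shorter prefix such as "ke" of "key".
    foreach (c; matchAll(text, regex(`\b\w+\b(?!:)`)))
        words ~= c.hit;
    return words;
}

void main()
{
    auto text = "key: value, other: thing";
    writeln(keysBeforeColon(text));     // ["key", "other"]
    writeln(wordsNotBeforeColon(text)); // ["value", "thing"]
}
```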

> - "Amdittedly, barrage of ? and ! makes regex rather obscure, more then
> it's actually is. However" should be "Admittedly, the barrage of ? and !
> makes the regex rather obscure, more than it actually is.".  Maybe
> change "obscure" to a different adjective. Perhaps "complex looking" or
> "complicated". (Note I've removed the "However", as the upcoming sentence
> isn't contradicting what you just said.)
> - "Needless to say it's", again, I think it's rather important to say :P

Here I concur ;)

> - "Run-time version took around 10-20us on my machine, admittedly no
> statistics." here, borrow this "µ" :P.  Also, I'd get rid of "admittedly
> no statistics".
> - "meaningful tasks, it's features" another "its"
> - "together it's major" and another :P

Yeah, that's an "it's"-killing parade :)

> - "...flexible tools: match, replace, spliter" should be spelled "splitter"
>
>
> Great article.  I didn't even know about the replacement delegate
> feature which is something I've often wished I could use in other regex
> systems.  D and Phobos need more articles like this.  We should have a
> link to it from the std.regex documentation once this is added to the
> website.
>
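
For readers who haven't met the replacement delegate either, a minimal sketch of the callback form, written against the present-day std.regex API (replaceAll!fun). The helper doubleNumbers is hypothetical, and the article may spell the same call differently. For each match, the callback receives its Captures and returns the text to substitute.

```d
import std.conv : to;
import std.regex;
import std.stdio;

// Hypothetical helper: double every run of digits in the text.
string doubleNumbers(string text)
{
    // The lambda gets the Captures of each match; c.hit is the matched text.
    return replaceAll!(c => to!string(to!int(c.hit) * 2))(text, regex(`[0-9]+`));
}

void main()
{
    writeln(doubleNumbers("3 apples and 10 pears")); // 6 apples and 20 pears
}
```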

Thanks again.


-- 
Dmitry Olshansky

