[Issue 8725] segmentation fault with negative-lookahead in module-level regex
d-bugmail at puremagic.com
Wed Sep 26 06:46:19 PDT 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8725
Dmitry Olshansky <dmitry.olsh at gmail.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |dmitry.olsh at gmail.com
--- Comment #2 from Dmitry Olshansky <dmitry.olsh at gmail.com> 2012-09-26 06:46:49 PDT ---
I suspect this is a long-standing bug in compile-time evaluation: the
compiler parses the regex pattern wrongly at compile time (unlike at run time).
See also: http://d.puremagic.com/issues/show_bug.cgi?id=7810
The problem is that once the D compiler sees an initialized global variable, it
has to constant-fold the initializer:
int fact10 = factorial(10);
//will compute and hardcode the value of factorial(10)
then with regex ...:
auto italic = regex( ... );
// *parses* the pattern and *generates* the binary object for the compiled regex,
// with all the data structures needed for matching it
All of this happens *at compile time* via CTFE, which is described near the bottom of:
http://dlang.org/function.html
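The difference between a constant-folded global initializer and an ordinary run-time call can be sketched as follows. The body of factorial is an assumption (the comment above only names the function), and static assert is used merely to show that the call really can be evaluated at compile time.

```d
// Hypothetical helper; only the name `factorial` comes from the comment above.
int factorial(int n)
{
    return n <= 1 ? 1 : n * factorial(n - 1);
}

// Module-level initializer: the compiler must evaluate this via CTFE
// and store the resulting constant in the binary.
int fact10 = factorial(10);

// static assert forces compile-time evaluation as well, so this only
// compiles if factorial is CTFE-able and yields the expected value.
static assert(factorial(10) == 3_628_800);

void main()
{
    // Same function, but here it is an ordinary call executed at run time.
    int n = 10;
    int atRunTime = factorial(n);
    assert(atRunTime == fact10);
}
```

With regex( ... ) as the initializer, the same mechanism has to run the whole pattern parser and code generator inside the compiler, which is where the trouble starts.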
Previously, though, it only caused unexpectedly long compilation times (CTFE is
slow), and in a few select cases it failed with an assert *during compilation*;
it never segfaulted.
Probably some internal structure is subtly corrupted in a way that the
self-test failed to catch.
E.g. this one works because the italic regex is created at run time:
import std.stdio;
import std.regex;
void main() {
    // Whitespace inside the pattern is ignored because of the "x" flag.
    auto italic = regex( r"\*
        (?!\s+)
        (.*?)
        (?!\s+)
        \*", "gx" );
    string input = "this * is* interesting, *very* interesting";
    writeln( replace( input, italic, "<i>$1</i>" ) );
}
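If the regex needs to stay a module-level variable, one possible workaround is to declare it without an initializer and build it in a module constructor, so that nothing is left for CTFE to fold. This is only a sketch under that assumption: the helper name italicize is hypothetical, and the pattern is the reporter's, written on one line without the "x" flag.

```d
import std.regex;

// Declared without an initializer, so there is nothing for CTFE to evaluate.
Regex!char italic;

// Module constructor: runs at program start-up, before main,
// so the pattern is parsed at run time.
static this()
{
    italic = regex(r"\*(?!\s+)(.*?)(?!\s+)\*", "g");
}

// Hypothetical helper wrapping the replacement from the program above.
string italicize(string input)
{
    return replace(input, italic, "<i>$1</i>");
}

void main()
{
    import std.stdio : writeln;
    writeln(italicize("this * is* interesting, *very* interesting"));
}
```

Module-level variables are thread-local by default in D, and static this() runs once per thread, so each thread gets its own compiled regex.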
Also a tip: the second lookahead should be a lookbehind! As it stands, it only
tests that \* is not whitespace, which is always true. Also, both can be
just \s, because \s+ matches whenever \s matches. And since you don't capture
the contents of the lookahead/lookbehind, it'll be faster and simpler to use a
single \s.
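The tip above could look roughly like this, assuming the intent is "no whitespace right after the opening * and none right before the closing *". The helper name italicize is made up for illustration, and the regex is built at run time inside the function to stay clear of CTFE.

```d
import std.regex;

// Hypothetical helper; rebuilds the regex on every call for simplicity.
string italicize(string input)
{
    // (?!\s)  : the opening * must not be followed by whitespace
    // (?<!\s) : the closing * must not be preceded by whitespace
    auto italic = regex(r"\*(?!\s)(.*?)(?<!\s)\*", "g");
    return replace(input, italic, "<i>$1</i>");
}

void main()
{
    import std.stdio : writeln;
    writeln(italicize("this * is* interesting, *very* interesting"));
}
```

In real code the compiled regex would normally be created once and reused rather than rebuilt per call.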
About SafeD: it shouldn't segfault, but the program listed is @system (as this
is the default) :). Otherwise, since regex is @trusted, it's my responsibility
to verify that it is memory safe, so blame me (or rather the compiler).
To actually be in SafeD, try putting @safe: at the top of your code, or just
tag main and all functions with @safe.
AFAIK writeln in SafeD wouldn't work as it's still @system (obviously it
should be safe/trusted). To be honest SafeD hasn't been addressed properly in
the standard library yet.
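As a minimal sketch of what tagging functions with @safe looks like, here is a small program; the sum function is hypothetical and has nothing to do with the bug. It deliberately avoids writeln and std.regex, given the remarks above about their safety attributes.

```d
// Per-function tagging; writing `@safe:` once at the top of the module
// would instead apply the attribute to every declaration that follows.
@safe int sum(const int[] xs)
{
    int total = 0;
    foreach (x; xs)
        total += x;
    return total;
}

@safe void main()
{
    int[] data = [1, 2, 3];
    assert(sum(data) == 6);

    // Uncommenting the next line should be rejected by the compiler,
    // because pointer arithmetic is not allowed in @safe code:
    // auto p = data.ptr + 1;
}
```

Calling a @system function from main here would likewise be a compile-time error, which is how the attribute keeps unchecked code out.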