Java > Scala

Dmitry Olshansky dmitry.olsh at gmail.com
Fri Dec 2 12:28:36 PST 2011


On 02.12.2011 15:32, Marco Leise wrote:
> The import problem in std.file has been fixed on GitHub, but I couldn't
> get FReD to compile this regex:
>
> enum regex = ctRegex!r"relay=([\w\-\.]+[\w]+)[\.\,]*\s";
>
> Instead I'm using this one:
>
> enum regex = ctRegex!r"relay=([A-Za-z0-9_\-.]+[A-Za-z0-9_]+)[.,]*\s";
>
> Both \. and \w inside seem to cause problems. \- was also troublesome,
> but easy to add a case in the parser looking at how \r is handled.
>

First of all, sorry for the messy problems with escapes in character
classes. If we all agree to treat any non-special character after \ as
a literal, then I'll add that. Second, I might take a shot at optimizing
the engine once the OSX problem is figured out.

> Then I started optimizing with these steps:
>
> 1. Run a 64-bit build instead of a 32-bit build :D
> 30.2 s => 14.4 s
>
> 2. use "auto regex = ctRegex!..." instead of "enum regex = ctRegex!..."
> 14.4 s => 6.4 s
>

Well, another thing to try is gdc/ldc. The last time I succeeded in this
endeavor, -O3 yielded a small boost of ~5%.
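To illustrate step 2 quoted above, here is a minimal sketch, assuming a
current Phobos (matchFirst did not exist in 2011). An enum is a manifest
constant, so its value is likely rebuilt at every use, while an auto
variable is initialized once and reused. The helper name relayHosts and
the sample log line are made up for illustration, and the hyphen is moved
to the end of the character class so that no escapes are needed inside it.

```d
import std.regex;
import std.stdio;

// Same pattern as in the quoted mail, with '-' placed last in the class.
enum pattern = `relay=([A-Za-z0-9_.-]+[A-Za-z0-9_]+)[.,]*\s`;

// Hypothetical helper: collects the relay host from every matching line.
string[] relayHosts(string[] lines)
{
    // `auto`, not `enum`: one matcher object, created before the loop.
    auto re = ctRegex!pattern;
    string[] hosts;
    foreach (line; lines)
    {
        auto m = matchFirst(line, re);
        if (!m.empty)
            hosts ~= m[1]; // first capture group
    }
    return hosts;
}

void main()
{
    // Made-up sample input, not taken from the original log files.
    auto lines = [
        "to=<a@b.c>, relay=mail.example.com, delay=1 ",
        "no relay field on this line ",
    ];
    writeln(relayHosts(lines));
}
```

The pattern string itself can stay an enum; only the compiled matcher
needs to live in a variable.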

> For comparison: the Java version takes 5.3 s here.

Don't kill me ;)
Seriously... they must be doing no UTF decoding. Another possibility is
that they use Boyer-Moore on "relay=". It would be interesting to search
for something a little fuzzier, e.g. "r[eE]lay=", just to see if it has
any effect.
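The literal-prefix idea can be tried by hand. This is a rough sketch
only, not what Java or FReD actually does: it uses Phobos'
boyerMooreFinder to jump to the first "relay=" and runs the regex only
on the remainder. The function name relayHostPrefiltered and the sample
line are hypothetical, and matchFirst assumes a current Phobos.

```d
import std.algorithm.searching : boyerMooreFinder, find;
import std.regex;
import std.stdio;

// Hypothetical helper: literal prefilter first, regex second.
string relayHostPrefiltered(string line, Regex!char re)
{
    // Boyer-Moore works on raw code units here; that is safe because
    // the needle "relay=" is pure ASCII.
    auto rest = find(line, boyerMooreFinder("relay="));
    if (rest.length == 0)
        return null; // no literal prefix, so the regex cannot match

    // The regex starts at the first candidate and may still move on
    // to a later "relay=" if this one does not match.
    auto m = matchFirst(rest, re);
    return m.empty ? null : m[1];
}

void main()
{
    auto re = regex(`relay=([A-Za-z0-9_.-]+[A-Za-z0-9_]+)[.,]*\s`);
    // Made-up sample input.
    writeln(relayHostPrefiltered("to=<a@b.c>, relay=mail.example.com, x ", re));
}
```

With a pattern such as "r[eE]lay=" there is no single literal prefix
left, so a prefilter like this one would no longer apply directly.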

>
> That left me with the following profile chart of function calls taking
> more than 1% of the time. The percentages don't accumulate subroutine
> calls. So main() is
> fairly low in the list:
>

From this short list I'd say that opIndex could be sped up a bit.
Nothing else catches my eye, except for that 4% in enforceEx for the UTF
exception.
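One common way to cut decoding cost on mostly-ASCII input is a fast path
in front of the general decoder. The sketch below shows the idea only;
the function name nextCodePoint is made up, and whether std.utf.decode
or the regex engine already does something equivalent is not claimed
here.

```d
import std.stdio;
import std.utf : decode;

// Hypothetical helper: returns the code point at s[i] and advances i.
dchar nextCodePoint(const(char)[] s, ref size_t i)
{
    // ASCII fast path: a single code unit below 0x80 is the code point,
    // so no validation (and no exception machinery) is needed.
    if (s[i] < 0x80)
        return s[i++];
    // Everything else goes through the full, validating decoder.
    return decode(s, i);
}

void main()
{
    auto s = "a\u00e9"; // 'a' followed by a two-byte UTF-8 sequence
    size_t i = 0;
    while (i < s.length)
        writeln(cast(uint) nextCodePoint(s, i));
}
```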

> samples % source function
> 6934 16.7800 uni.d:601 const(@trusted bool function(dchar))
> std.internal.uni.CodepointTrie!(8).CodepointTrie.opIndex
> 4235 10.2485 (no location information) pure @safe dchar
> std.utf.decode(const(char[]), ref ulong)
> 3807 9.2128 regex.d:6395 @trusted bool
> std.regex.ctRegexImpl!("relay=([A-Za-z0-9_\-.]+[A-Za-z0-9_]+)[.,]*\s",
> []).func(ref
> std.regex.BacktrackingMatcher!(true).BacktrackingMatcher!(char).BacktrackingMatcher)
>
> 2240 5.4207 regex.d:3232 @property @trusted bool
> std.regex.BacktrackingMatcher!(true).BacktrackingMatcher!(char).BacktrackingMatcher.atEnd()
>
> 2151 5.2053 regex.d:2932 @safe bool
> std.regex.Input!(char).Input.nextChar(ref dchar, ref ulong)
> 1812 4.3850 exception.d:486 pure @safe bool
> std.exception.enforceEx!(std.utf.UTFException, bool).enforceEx(bool,
> lazy immutable(char)[], immutable(char)[], ulong)
> 1686 4.0801 regex.d:6490 @trusted bool
> std.regex.ctRegexImpl!("relay=([A-Za-z0-9_\-.]+[A-Za-z0-9_]+)[.,]*\s",
> []).func(ref
> std.regex.BacktrackingMatcher!(true).BacktrackingMatcher!(char).BacktrackingMatcher).int
> test_11()
> 1409 3.4097 regex.d:6450 @safe
> std.regex.__T10RegexMatchTAaS613std5regex28__T19BacktrackingMatcherVb1Z19BacktrackingMatcherZ.RegexMatch
> std.regex.match!(char[],
> std.regex.StaticRegex!(char).StaticRegex).match(char[],
> std.regex.StaticRegex!(char).StaticRegex)
> 1335 3.2306 regex.d:6272 @trusted
> std.regex.__T10RegexMatchTAaS613std5regex28__T19BacktrackingMatcherVb1Z19BacktrackingMatcherZ.RegexMatch
> std.regex.__T10RegexMatchTAaS613std5regex28__T19BacktrackingMatcherVb1Z19BacktrackingMatcherZ.RegexMatch.__ctor!(std.regex.StaticRegex!(char).StaticRegex).__ctor(std.regex.StaticRegex!(char).StaticRegex,
> char[])
> 1224 2.9620 regex.d:3234 @trusted void
> std.regex.BacktrackingMatcher!(true).BacktrackingMatcher!(char).BacktrackingMatcher.next()
>
> 1212 2.9330 regex.d:2951 @property @safe ulong
> std.regex.Input!(char).Input.lastIndex()
> 1202 2.9088 regex.d:2744 @trusted ulong
> std.regex.ShiftOr!(char).ShiftOr.search(const(char)[], ulong)
> 1051 2.5434 regex.d:3717 @trusted void
> std.regex.BacktrackingMatcher!(true).BacktrackingMatcher!(char).BacktrackingMatcher.stackPush!(int).stackPush(int)
>
> 973 2.3546 regex.d:3717 @trusted void
> std.regex.BacktrackingMatcher!(true).BacktrackingMatcher!(char).BacktrackingMatcher.stackPush!(ulong).stackPush(ulong)
>
> 884 2.1392 main.d:22 _Dmain
> 618 1.4955 regex.d:3726 @trusted void
> std.regex.BacktrackingMatcher!(true).BacktrackingMatcher!(char).BacktrackingMatcher.stackPush!(std.regex.Group!(ulong).Group).stackPush(std.regex.Group!(ulong).Group[])
>
> 466 1.1277 (no location information) _d_arraysetlengthiT
>
> These functions sum up to ~80%. And if it is correct, the garbage
> collector functions each take a low place in the table. At this point
> I'd probably recommend an ASCII regex, but I'd like to know how Java can
> still be substantially faster with library routines. :)
>



