Java > Scala

Fri Dec 2 13:44:33 PST 2011

On 03.12.2011 1:08, Marco Leise wrote:
> Cool, thx for your answers. The source code for OpenJDK can be
> downloaded if you want to take a look at it. You are probably right
> about them not decoding the characters lazily since their strings are
> UTF-16.
> The commented version of opIndex is a bit faster on my Core 2. This is
> the first time that I witnessed such speed differences between
> processors. :)

Wow. I knew something was wrong with the non-BT test code: from what I
heard it should have been faster, but it wasn't for me :)

> Also I found that the trie is usually queried twice for each matching
> character in the input string. You can't optimize opIndex any further
> (but try size_t in there instead of uint, it helped here) unless you
> make some changes on the larger scale. So if you should find out that
> the second query isn't required, that would help more than anything else.
> I said it on IRC today: This library will be my reference for compile
> time code generation in D. There is a lot of expertise in it, good work!
>

There I have two options to work through:
- separate negative and positive character classes; that would kill the
possible branching here.
- looking at test_11 in your profile output, I see the likely culprit:
I should rethink the lookahead tests; they used to reduce the number of
savepoints during matching.
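
To make the first option concrete, here is a minimal sketch, written in
Python for brevity rather than D. All names are hypothetical and not
taken from the regex library; it only shows the structural idea of
choosing a positive or negative class once at compile time instead of
testing a negation flag on every character. Any actual speed gain would
have to be measured in the real matcher.

```python
# Hypothetical sketch: names are illustrative, not the library's API.

class FlaggedClass:
    """One class with a negation flag: the flag is checked per character."""

    def __init__(self, chars, negated=False):
        self.chars = frozenset(chars)
        self.negated = negated

    def matches(self, ch):
        hit = ch in self.chars
        return not hit if self.negated else hit  # extra branch on each test


class PositiveClass:
    """Matches characters that are in the set."""

    def __init__(self, chars):
        self.chars = frozenset(chars)

    def matches(self, ch):
        return ch in self.chars


class NegativeClass:
    """Matches characters that are not in the set."""

    def __init__(self, chars):
        self.chars = frozenset(chars)

    def matches(self, ch):
        return ch not in self.chars


def make_class(chars, negated=False):
    """Decide the variant once, when the pattern is compiled."""
    return NegativeClass(chars) if negated else PositiveClass(chars)
```

Both variants answer the same questions as the flagged version; only the
place where the negation decision is made moves from match time to
compile time.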

> P.S.: I'm fine with treating anything that is escaped, but not special,
> as is. \w did cause an infinite loop though, so you may want to test

Hm, I can't reproduce that.

> with the original regex. For \. you can assert(false, "\. is not a valid
> escape sequence")

No, that was a bad idea ... and I planned to change that exception. Now
I'm more inclined to just ignore the backslash.
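
Roughly what ignoring the backslash would mean, as a minimal sketch in
Python rather than D. The function name, the token tuples and the small
escape table are all assumptions made for illustration, not the parser's
actual code: a known escape yields a class token, and any other escaped
character is taken as a literal.

```python
# Hypothetical sketch: the escape table is an illustrative subset only.
CLASS_ESCAPES = {"d", "w", "s"}


def parse_escape(pattern, i):
    """Parse the escape that starts at pattern[i], which must be a backslash.

    Returns (token, next_index). Unknown escapes are treated as the
    escaped character itself instead of raising an error.
    """
    if i + 1 >= len(pattern):
        raise ValueError("pattern ends with a lone backslash")
    ch = pattern[i + 1]
    if ch in CLASS_ESCAPES:
        return ("class", ch), i + 2
    # Not special: drop the backslash and keep the character as a literal.
    return ("literal", ch), i + 2
```

With this rule `\.` simply becomes a literal dot, while `\w` is still
recognized as a character class.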

> or just ignore the backslash. Personally I usually
> don't escape anything just to be on the safe side. :p

Worthy of a small community poll.