Dscanner - It exists
Jonathan M Davis
jmdavisProg at gmx.com
Wed Aug 1 13:20:49 PDT 2012
On Wednesday, August 01, 2012 19:58:46 Brian Schott wrote:
> On Wednesday, 1 August 2012 at 17:36:16 UTC, Walter Bright wrote:
> > I suggest proposing the D lexer as an addition to Phobos. But
> > if that is done, its interface would need to accept a range as
> > input, and its output should be a range of tokens.
>
> It used to be range-based, but the performance was terrible. The
> inability to use slicing on a forward-range of characters and the
> gigantic block on KCachegrind labeled "std.utf.decode" were the
> reasons that I chose this approach. I wish I had saved the
> measurements on this....
If you want really good performance out of a range-based solution operating on
ranges of dchar, then you need to special-case the built-in string types
all over the place, and if you have to wrap them in other range types
(generally because of calling another range-based function), then there's a
good chance that you will indeed get a performance hit. D's range-based
approach is really nice from the perspective of usability, but you have to
work at it a bit if you want it to be efficient when operating on strings. It
_can_ be done though.
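As a rough illustration (a made-up example function, not code from the lexer
under discussion), special-casing built-in strings inside a range-based
function might look something like this:

```d
import std.range.primitives : empty, front, isInputRange;
import std.traits : isSomeString;

// Hypothetical example: a range-based predicate that special-cases
// built-in string types so it can look at raw code units directly
// instead of auto-decoding to dchar via front.
bool startsWithLetterA(R)(R input)
    if (isInputRange!R)
{
    static if (isSomeString!R)
    {
        // Strings: 'A' is ASCII, so comparing the first code unit
        // is safe and requires no UTF decoding at all.
        return input.length != 0 && input[0] == 'A';
    }
    else
    {
        // Any other range of characters: fall back to the generic
        // front/popFront interface, which may decode.
        return !input.empty && input.front == 'A';
    }
}
```

With a plain string argument, the static if branch never touches
std.utf.decode; wrap the same string in a filter and the generic branch
(with its decoding front) is what runs instead.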
The D lexer that I'm currently writing special-cases strings pretty much
_everywhere_ (string mixins really help reduce the cost of that in terms of
code duplication). The result is that if I do it right, its performance for
strings should be very close to what dmd can do (it probably won't quite reach
dmd's performance simply because of some extra work it does to make it more
usable for things other than compilers - e.g. syntax highlighters). But you'll
still likely get a performance hit if you did something like
    string source = getSource();
    auto result = tokenRange(filter!"true"(source));

instead of

    string source = getSource();
    auto result = tokenRange(source);
It won't be quite as bad a performance hit with 2.060 thanks to some recent
optimizations to string's popFront, but you're going to lose out on some
performance regardless, because nothing can special-case for every possible
range type, and one of the keys to fast string processing is minimizing how
much you decode characters, which generally requires special-casing.
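A minimal sketch of that idea (a hypothetical takeIdentifier helper, assuming
a simplified notion of what counts as an identifier character): scan raw char
code units and only call std.utf.decode when a byte >= 0x80 shows up, since
ASCII code units never need decoding:

```d
import std.uni : isAlpha;
import std.utf : decode;

// Hypothetical sketch: take the leading identifier from a string,
// touching std.utf.decode only for non-ASCII bytes.
string takeIdentifier(string s)
{
    size_t i = 0;
    while (i < s.length)
    {
        immutable char c = s[i];
        if (c < 0x80)
        {
            // ASCII fast path: classify the code unit directly.
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                (c >= '0' && c <= '9') || c == '_')
                ++i;
            else
                break;
        }
        else
        {
            // Slow path: decode a full code point only when forced to.
            size_t j = i;
            if (isAlpha(decode(s, j)))
                i = j; // decode advanced j past the code point
            else
                break;
        }
    }
    return s[0 .. i];
}
```

On mostly-ASCII source text, the fast path does all the work, and the result
is a slice of the original string rather than anything newly allocated.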
- Jonathan M Davis