Request for comments: std.d.lexer
Timon Gehr
timon.gehr at gmx.ch
Mon Jan 28 13:03:21 PST 2013
On 01/28/2013 01:53 AM, Brian Schott wrote:
> ...
>
> On the topic of performance, I realized that the numbers posted
> previously were actually for a debug build. Fail.
>
> For whatever reason, the current version of the lexer code isn't
> triggering my heisenbug[1] and I was able to build with -release -inline
> -O.
>
> Here's what avgtime has to say:
>
> $ avgtime -q -h -r 200 dscanner --tokenCount ../phobos/std/datetime.d
>
> ------------------------
> Total time (ms): 51409.8
> Repetitions : 200
> Sample mode : 250 (169 occurrences)
> Median time : 255.57
> Avg time : 257.049
> Std dev. : 4.39338
> Minimum : 252.931
> Maximum : 278.658
> 95% conf.int. : [248.438, 265.66] e = 8.61087
> 99% conf.int. : [245.733, 268.366] e = 11.3166
> EstimatedAvg95%: [256.44, 257.658] e = 0.608881
> EstimatedAvg99%: [256.249, 257.849] e = 0.800205
> Histogram :
> msecs: count normalized bar
> 250: 169 ########################################
> 260: 22 #####
> 270: 9 ##
>
> Which works out to 1,327,784 tokens per second on my Ivy Bridge i7.
>
Better, but still slow.
> I created a small program that demangles the output of valgrind so that
> tools like KCachegrind can display profiling information more clearly.
> It's now on the wiki[2]
>
> The bottleneck in std.d.lexer as it stands is the appender instances
> that assemble Token.value during iteration and front() on the array of
> char[]. (As I'm sure everyone expected)
>
I see; there should probably be an option to build the token value by
slicing the input instead. Also, try to treat narrow strings in a way
that does not incur undue decoding overhead.
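Roughly something like this (just a sketch, not the actual std.d.lexer
API; SliceToken and lexIdentifier are made-up names), so that the value
is a slice of the source buffer and ASCII characters never go through
front()/decode():

import std.ascii : isAlphaNum;

struct SliceToken
{
    string value;   // slice of the original source, no copying
    size_t offset;  // start index in the source
}

// Lexes an identifier starting at 'index'. Works on the char[] per code
// unit, so no UTF decoding happens for ASCII identifier characters.
SliceToken lexIdentifier(string source, size_t index) pure nothrow
{
    immutable start = index;
    while (index < source.length
           && (isAlphaNum(source[index]) || source[index] == '_'))
        ++index;
    return SliceToken(source[start .. index], start);
}

unittest
{
    auto tok = lexIdentifier("foo42 + bar", 0);
    assert(tok.value == "foo42");
    assert(tok.offset == 0);
}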
I guess that at some point
pure nothrow TokenType lookupTokenType(const string input)
might become a bottleneck. (DMD does not generate near-optimal string
switches, I think.)
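For comparison, a rough sketch of what such a lookup could do by
dispatching on length before comparing any characters; the TokenType
members and function names here are made up, and whether this actually
beats the generated string switch would have to be measured:

enum TokenType { identifier, if_, in_, import_ /* etc. */ }

// What the string switch amounts to:
pure nothrow TokenType lookupTokenTypeSwitch(const string input)
{
    switch (input)
    {
        case "if":     return TokenType.if_;
        case "in":     return TokenType.in_;
        case "import": return TokenType.import_;
        default:       return TokenType.identifier;
    }
}

// Dispatching on length first narrows the candidate set before any
// character comparison is done.
pure nothrow TokenType lookupTokenTypeByLength(const string input)
{
    switch (input.length)
    {
        case 2:
            if (input == "if") return TokenType.if_;
            if (input == "in") return TokenType.in_;
            return TokenType.identifier;
        case 6:
            return input == "import" ? TokenType.import_
                                     : TokenType.identifier;
        default:
            return TokenType.identifier;
    }
}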