Request for comments: std.d.lexer
Brian Schott
briancschott at gmail.com
Sun Jan 27 16:53:02 PST 2013
On Sunday, 27 January 2013 at 23:49:11 UTC, Walter Bright wrote:
> On 1/27/2013 1:39 PM, Brian Schott wrote:
>> The file name is accepted for eventual error reporting
>> purposes.
>
> Use an OutputRange for that.
I think you misunderstand. The file name is so that if you pass
in "foo.d" the lexer can say "Error: unterminated string literal
beginning on line 123 of foo.d". It's not so that error messages
will be written to a file of that name.
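To be concrete, here's the sort of thing having the file name enables. This is just a sketch with made-up names, not the lexer's actual API:

import std.string : format;

// Hypothetical helper, not part of std.d.lexer: builds the kind of
// diagnostic message the file name parameter is there for.
string lexerError(string fileName, size_t line, string message)
{
    return format("Error: %s beginning on line %s of %s", message, line, fileName);
}

// lexerError("foo.d", 123, "unterminated string literal")
// => "Error: unterminated string literal beginning on line 123 of foo.d"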
On the topic of performance, I realized that the numbers posted
previously were actually for a debug build. Fail.
For whatever reason, the current version of the lexer code isn't
triggering my heisenbug[1] and I was able to build with -release
-inline -O.
Here's what avgtime has to say:
$ avgtime -q -h -r 200 dscanner --tokenCount
../phobos/std/datetime.d
------------------------
Total time (ms): 51409.8
Repetitions : 200
Sample mode : 250 (169 ocurrences)
Median time : 255.57
Avg time : 257.049
Std dev. : 4.39338
Minimum : 252.931
Maximum : 278.658
95% conf.int. : [248.438, 265.66] e = 8.61087
99% conf.int. : [245.733, 268.366] e = 11.3166
EstimatedAvg95%: [256.44, 257.658] e = 0.608881
EstimatedAvg99%: [256.249, 257.849] e = 0.800205
Histogram :
msecs: count normalized bar
250: 169 ########################################
260: 22 #####
270: 9 ##
Which works out to 1,327,784 tokens per second on my Ivy Bridge
i7.
I created a small program that demangles the output of valgrind
so that tools like KCachegrind can display profiling information
more clearly. It's now on the wiki[2].
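The idea is nothing more than running each line of the callgrind output through a demangler before handing it to KCachegrind. A rough sketch (not the actual wiki program; the regex and overall structure here are just assumptions):

import std.stdio;
import std.regex;
import std.demangle : demangle;

// Sketch only: read valgrind/callgrind output on stdin and rewrite
// mangled D symbols (which start with "_D") into readable names.
void main()
{
    auto mangledName = regex(r"_D[0-9a-zA-Z_]+");
    foreach (line; stdin.byLine())
    {
        // std.demangle returns its input unchanged if it isn't a
        // valid D mangle, so this is safe to apply to every match.
        writeln(replaceAll!(m => demangle(m.hit.idup))(line.idup, mangledName));
    }
}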
The bottleneck in std.d.lexer as it stands is the appender
instances that assemble Token.value during iteration, and front()
on the array of char[]. (As I'm sure everyone expected.)
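To make that concrete, the difference is roughly this (a sketch with made-up names, not the real lexer code): the first version copies every character of a token's value into a freshly allocated buffer through an Appender, while the second just slices the input and allocates nothing per token.

import std.array : appender;
import std.ascii : isAlphaNum;

struct Token { string value; }

// Appender-based: each token value is copied, character by character,
// into a newly allocated buffer.
Token lexIdentifierAppender(string source, ref size_t i)
{
    auto value = appender!string();
    while (i < source.length && (isAlphaNum(source[i]) || source[i] == '_'))
        value.put(source[i++]);
    return Token(value.data);
}

// Slice-based: the token value is just a view into the original source,
// so no per-token allocation or copying happens.
Token lexIdentifierSlice(string source, ref size_t i)
{
    immutable start = i;
    while (i < source.length && (isAlphaNum(source[i]) || source[i] == '_'))
        ++i;
    return Token(source[start .. i]);
}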
[1] http://forum.dlang.org/thread/bug-9353-3@http.d.puremagic.com%2Fissues%2F
[2] http://wiki.dlang.org/Other_Dev_Tools