Best practices for parsing files

Thu Jan 25 15:01:51 PST 2007

Reply to lurker,

> Both suggestions are very interesting and I'll be evaluating them; but
> what I was hoping for was something more along the lines of DMD's parser
> (being insanely fast): a hand-written parser. We also thought of
> translating it to D just as an exercise to learn how it works.
> 
> You see, one of my concerns (and the primary reason to use D) is
> parsing speed: I'm going to parse lots and lots of those files, and
> memory consumption almost isn't an issue since we have lots of it.
>

Ah, then I guess you won't want an LL parser. 

> 
> Also, the tasks will be executed on a thread pool and we don't want to
> face locking problems with code generated by some tool. At least if we
> write the code we'll know who to blame. :D
> 

Both should be thread safe (if you stick to one thread per file).

As far as slicing goes, I'm working on a parser that reads a file into memory 
(I guess it could mmap it in as well) and converts it to an array of token 
structs. A parser then walks the array. If you allocate a big array of 
structs in advance and have your lexer write directly into it (slicing 
out of the file where the text is important), that should be fairly fast. 
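
To make that concrete, here is a rough sketch of the idea. It is in C++ rather than D, with std::string_view standing in for a D slice, and it is not taken from my parser: the Token struct, the TokenKind values and the lex function are all made-up names, and the token rules (identifiers, integers, single-character punctuation) are only there to give the loop something to do. The point is that the array is reserved once up front and each token refers to the text in the file buffer instead of copying it.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical token kinds, for illustration only.
enum class TokenKind { Identifier, Number, Punct };

// A token holds a slice (view) into the file buffer, not a copy of the text.
struct Token {
    TokenKind kind;
    std::string_view text;
};

// Assumed helper names; the casts keep the <cctype> calls well defined
// for bytes above 127.
static bool isIdentStart(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
}
static bool isIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}
static bool isDigit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Lex `src` into a token array whose storage is reserved in advance.
// `src` must outlive the returned tokens, because they point into it.
std::vector<Token> lex(std::string_view src) {
    std::vector<Token> tokens;
    // Worst case is one token per byte, so this one allocation means the
    // array never has to grow while lexing. It trades memory for speed.
    tokens.reserve(src.size());

    std::size_t i = 0;
    while (i < src.size()) {
        if (std::isspace(static_cast<unsigned char>(src[i]))) {
            ++i;  // whitespace produces no token
            continue;
        }
        const std::size_t start = i;
        TokenKind kind;
        if (isIdentStart(src[i])) {
            kind = TokenKind::Identifier;
            while (i < src.size() && isIdentChar(src[i])) ++i;
        } else if (isDigit(src[i])) {
            kind = TokenKind::Number;
            while (i < src.size() && isDigit(src[i])) ++i;
        } else {
            kind = TokenKind::Punct;  // any other single character
            ++i;
        }
        // Slice the token text out of the buffer; nothing is copied.
        tokens.push_back({kind, src.substr(start, i - start)});
    }
    return tokens;
}
```

Reserving one slot per input byte is wasteful, but since you say memory isn't much of an issue it keeps the lexer's inner loop free of reallocation. The buffer passed in can be the whole file read into memory or an mmap'd region, as long as it stays alive while the parser walks the tokens.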

That's my 2 cents, I'm not sure how much help this will be (my parser is 
/not/ performance driven) but I hope it might help.