Let's stop parser Hell
Jonathan M Davis
jmdavisProg at gmx.com
Wed Aug 1 02:21:27 PDT 2012
On Wednesday, August 01, 2012 11:14:52 Jacob Carlborg wrote:
> On 2012-08-01 08:11, Jonathan M Davis wrote:
> > I'm not using regexes at all. It's using string mixins to reduce code
> > duplication, but it's effectively hand-written. If I do it right, it
> > should be _very_ difficult to make it any faster than it's going to be.
> > It even specifically avoids decoding Unicode characters and operates on
> > ASCII characters as much as possible.
>
> That's a good idea. Most code can be treated as ASCII (I assume most
> people code in English). It would basically only be string literals
> containing characters outside the ASCII table.
What's of particular importance is the fact that _all_ of the language
constructs are ASCII. So, Unicode comes in exclusively with identifiers, string
literals, char literals, and whitespace. And with those, ASCII is still going
to be far more common, so coding it in a way that makes ASCII faster at a slight
cost to performance for Unicode is perfectly acceptable.
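
To make that trade-off concrete, here is a rough sketch of the idea, not the
actual lexer code. It is written in Python rather than D purely for
illustration, and every name in it (scan_identifier, utf8_len, and so on) is
hypothetical. It scans an identifier over raw UTF-8 bytes, takes a cheap branch
for bytes below 0x80, and decodes a code point only when it hits a non-ASCII
lead byte. Using str.isalpha() to decide what counts as a Unicode identifier
character is a simplifying assumption; the language's real rule is different.

```python
def is_ascii_ident_start(b: int) -> bool:
    # '_' or an ASCII letter
    return b == 0x5F or 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A


def is_ascii_ident_char(b: int) -> bool:
    # identifier start character or an ASCII digit
    return is_ascii_ident_start(b) or 0x30 <= b <= 0x39


def utf8_len(lead: int) -> int:
    # Length of a multi-byte UTF-8 sequence, taken from its lead byte.
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("invalid UTF-8 lead byte")


def scan_identifier(src: bytes, pos: int) -> int:
    """Return the index one past the identifier starting at pos.

    Returns pos itself if no identifier starts there.
    """
    i = pos
    n = len(src)
    while i < n:
        b = src[i]
        if b < 0x80:
            # Fast path: plain ASCII, no decoding at all.
            if i == pos:
                ok = is_ascii_ident_start(b)
            else:
                ok = is_ascii_ident_char(b)
            if not ok:
                break
            i += 1
        else:
            # Slow path: decode exactly one code point.
            width = utf8_len(b)
            ch = src[i:i + width].decode("utf-8")
            # Assumption: isalpha() stands in for the real identifier rule.
            if not ch.isalpha():
                break
            i += width
    return i
```

For an input such as "foo_1 = naïve + 2" encoded as UTF-8, only the single "ï"
goes through the decoding branch; every other byte is handled by the ASCII
checks. That is the sense in which the common case stays fast while Unicode
pays a small extra cost.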
> BTW, have you seen this:
>
> http://woboq.com/blog/utf-8-processing-using-simd.html
No, I'll have to take a look. I know pretty much nothing about SIMD, though.
I've only heard of it because Walter implemented some SIMD stuff in dmd not
too long ago.
- Jonathan M Davis