std.d.lexer requirements

Jonathan M Davis jmdavisProg at gmx.com
Thu Aug 2 19:35:07 PDT 2012


On Thursday, August 02, 2012 19:52:35 Jonathan M Davis wrote:
> I suppose that we could make it operate on code units and just let ranges of
> dchar have UTF-32 as their code unit (since dchar is both a code unit and a
> code point), then ranges of dchar will still work but ranges of char and
> wchar will _also_ work. Hmmm. As I said, I'll have to think this through a
> bit.

LOL. It looks like taking this approach results in almost identical code to 
what I've been doing. The main difference is that if you're dealing with a 
range other than a string, you need to use decode instead of front, which 
means that decode is going to need to work with more than just strings 
(probably stride too). I'll have to create a pull request for that.
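
To make the front vs decode distinction concrete, here is a minimal sketch using the string overloads of std.utf.decode and std.utf.stride. The helper name codePoints is hypothetical and only for illustration; it is not code from the lexer.

```d
import std.utf : decode, stride;

// Hypothetical helper: walks a UTF-8 string by code unit index and
// collects the code points that decode produces.
dchar[] codePoints(string s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
    {
        // decode reads one code point starting at index i and
        // advances i past the code units it consumed.
        result ~= decode(s, i);
    }
    return result;
}

void main()
{
    string s = "a\u00E9";            // 3 UTF-8 code units, 2 code points
    assert(s.length == 3);
    assert(stride(s, 0) == 1);       // 'a' occupies one code unit
    assert(stride(s, 1) == 2);       // U+00E9 occupies two code units
    assert(codePoints(s) == "a\u00E9"d);
}
```

The point is that the index moves in code units while decode hands back code points, which is what a lexer working on code units would need from non-string ranges as well.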

But unless you restrict it to strings and ranges of code units which are 
random access, you still have to worry about stuff like using range[0] vs 
range.front depending on the type, so my mixin approach is still applicable, 
and it makes it very easy to switch what I'm doing, since there are very few 
lines that need to be tweaked.
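
As a rough illustration of the mixin idea (a sketch under my own assumptions, not the actual lexer code), the choice between range[0] and range.front can be made once, at compile time, and mixed in wherever the first code unit is read. The names firstUnit and countLeadingSpaces are hypothetical.

```d
import std.algorithm : filter;
import std.range;   // empty, front, popFront, isInputRange, ElementEncodingType
import std.traits : isSomeChar, isSomeString;

// Hypothetical helper: yields the source text of an expression that reads
// the first code unit of the variable called `name`, given its type R.
template firstUnit(R, string name)
{
    static if (isSomeString!R)
        enum firstUnit = name ~ "[0]";     // indexing a string gives a code unit
    else
        enum firstUnit = name ~ ".front";  // other ranges: front is the element
}

// Counts leading ASCII spaces, treating the input as a range of code units.
size_t countLeadingSpaces(R)(R range)
    if (isInputRange!R && isSomeChar!(ElementEncodingType!R))
{
    size_t count = 0;
    while (!range.empty && mixin(firstUnit!(R, "range")) == ' ')
    {
        static if (isSomeString!R)
            range = range[1 .. $];   // drop one code unit, no decoding
        else
            range.popFront();
        ++count;
    }
    return count;
}

void main()
{
    assert(countLeadingSpaces("  ab") == 2);                    // char
    assert(countLeadingSpaces("  ab"w) == 2);                   // wchar
    assert(countLeadingSpaces(filter!"true"("   ab")) == 3);    // non-string range
}
```

The loop body reads the same regardless of the range type; only the mixed-in expression and the advance step differ.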

So, I guess that I'll be taking ranges of char, wchar, and dchar and 
treating them all as ranges of code units. So, it'll work with 
everything that it worked with before but will now also work with ranges of 
char and wchar. There's still a performance hit if you do something like 
passing it filter!"true"(source), but there's no way to fix that without 
disallowing dchar ranges entirely, which would be unnecessarily restrictive. 
Once you allow arbitrary ranges of char rather than requiring strings, the 
extra code required to allow ranges of wchar and dchar is trivial. It's stuff 
like worrying about range[0] vs range.front which complicates things (even if 
front is a code unit rather than a code point), and using string mixins makes 
it so that the code with the logic is just as simple as it would be with 
strings. So, I think that I can continue almost exactly as I have been and 
still achieve what Walter wants. The main issue that I have (beyond finishing 
what I haven't gotten to yet) is changing how I handle errors and comments, 
since I currently have them as tokens, but that shouldn't be terribly hard to 
fix.
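
For reference, the filter!"true"(source) case can be checked at compile time. This sketch assumes Phobos' usual behavior that front on a narrow string decodes to dchar, so the wrapped range yields dchar even though the string's own code unit is char. The names stringUnitIsChar, Wrapped, and wrappedYieldsDchar are hypothetical.

```d
import std.algorithm : filter;
import std.range : ElementEncodingType, ElementType;
import std.traits : Unqual;

// The code unit type of a plain string is char.
enum stringUnitIsChar = is(Unqual!(ElementEncodingType!string) == char);

// Wrapping the string in filter goes through front, which decodes,
// so the resulting range's element type is dchar.
alias Wrapped = typeof(filter!"true"(string.init));
enum wrappedYieldsDchar = is(Unqual!(ElementType!Wrapped) == dchar);

static assert(stringUnitIsChar && wrappedYieldsDchar);

void main() {}
```

Such a range is still accepted as a range of dchar code units; it just pays for the decoding that the wrapper already did.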

- Jonathan M Davis

