std.d.lexer requirements
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Thu Aug 2 05:28:03 PDT 2012
On 8/2/12 6:07 AM, Walter Bright wrote:
> Why? I've never seen any UTF16 or UTF32 D source in the wild.
Here's a crazy idea that I'll hang on this one remark. No, two crazy ideas.
First, after having read the long back-and-forth between Jonathan and
Walter in one sitting, it's becoming obvious to me you'll never
understand each other on this nontrivial matter through this medium. I
suggest you set up a Skype/phone call. Once you get past the first 30
seconds of social awkwardness of hearing each other's voices, you'll
make fantastic progress in communicating.
Regarding the problem at hand, it's becoming painfully obvious to me
that the lexer MUST do its own decoding internally. Hence, a very simple
thing to do is have the entire lexer only deal with ranges of ubyte. If
someone passes a char[], the lexer's front end can simply call
s.representation and obtain the underlying ubyte[].
If someone passes some range of char, the lexer uses an adapter (e.g.
map()) that casts every char to ubyte, which is a zero-cost operation.
Then it uses the same core operating on ranges of ubyte.
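To make the shape of this concrete, here is a minimal sketch of such a front end, not an actual std.d.lexer design. The names lexCore and lex are hypothetical, and the "core" merely counts code points by skipping UTF-8 continuation bytes, as a stand-in for a lexer that does its own decoding on raw bytes.

```d
import std.algorithm : map;
import std.range : ElementType, isInputRange, only;
import std.string : representation;
import std.traits : Unqual;

// Hypothetical core: sees only ranges of ubyte and decodes by itself.
// Here "decoding" is reduced to counting bytes that start a code point.
size_t lexCore(R)(R bytes)
    if (isInputRange!R && is(ElementType!R : const(ubyte)))
{
    size_t codePoints;
    foreach (b; bytes)
        if ((b & 0xC0) != 0x80) // not a UTF-8 continuation byte
            ++codePoints;
    return codePoints;
}

// Front end for char arrays: reinterpret as ubyte[] without copying.
size_t lex(const(char)[] s)
{
    return lexCore(s.representation);
}

// Front end for other ranges of char: cast each element lazily.
size_t lex(R)(R r)
    if (isInputRange!R && is(Unqual!(ElementType!R) == char))
{
    return lexCore(r.map!(c => cast(ubyte) c));
}

void main()
{
    assert(lex("héllo") == 5);        // 6 bytes, 5 code points
    assert(lex(only('h', 'i')) == 2); // a non-array range of char
}
```

Both entry points funnel into the same lexCore instantiated over a range of ubyte, so the core never needs to know what kind of input the caller started with.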
In the first implementation, the lexer may actually refuse any range of
16-bit or 32-bit elements (wchar[], range of wchar, dchar[], range of
dchar). Later on the core may evolve to handle ranges of ushort and
ranges of dchar. The front end would again use representation() for
wchar[], a casting adapter for ranges of wchar, and would just pass
dchar[] and ranges of dchar through unchanged.
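As a rough illustration of the "refuse wider inputs for now" stage, the sketch below uses a hypothetical trait, isUtf8Input, that a first-version front end could put in its template constraint. It also shows what representation yields for a wchar array, which is what a later front end would hand to a 16-bit core.

```d
import std.range : ElementType, isInputRange;
import std.string : representation;
import std.traits : Unqual;

// Hypothetical trait: true only for inputs made of 8-bit code units.
enum isUtf8Input(R) = is(R : const(char)[]) ||
    (isInputRange!R && is(Unqual!(ElementType!R) == char));

static assert( isUtf8Input!string);
static assert(!isUtf8Input!wstring); // rejected in a first version
static assert(!isUtf8Input!dstring); // rejected in a first version

void main()
{
    // Later evolution: wchar[] can be viewed as ushort[] without copying.
    auto units = "hi"w.representation;
    static assert(is(typeof(units) == immutable(ushort)[]));
    assert(units.length == 2);
}
```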
This makes the core simple and efficient (I think Jonathan's use of
static if and mixins everywhere, while well-intended, complicates
matters without necessity).
And with that we have a lexer! It operates on ranges and just has a
simple front end that makes clear the lexer must do its own decoding.
Works?
Andrei