std.d.lexer requirements

Thu Aug 2 05:28:03 PDT 2012

On 8/2/12 6:07 AM, Walter Bright wrote:
> Why? I've never seen any UTF16 or UTF32 D source in the wild.

Here's a crazy idea that I'll hang on this one remark. No, two crazy ideas.

First, after having read the long back-and-forth between Jonathan and 
Walter in one sitting, it's becoming obvious to me you'll never 
understand each other on this nontrivial matter through this medium. I 
suggest you set up a Skype/phone call. Once you get past the first 30 
seconds of social awkwardness of hearing each other's voices, you'll 
make fantastic progress in communicating.

Regarding the problem at hand, it's becoming painfully obvious to me 
that the lexer MUST do its own decoding internally. Hence, a very simple 
thing to do is have the entire lexer only deal with ranges of ubyte. If 
someone passes a char[], the lexer's front end can simply call 
s.representation and obtain the underlying ubyte[].

If someone passes some range of char, the lexer uses an adapter (e.g. 
map()) that casts every char to ubyte, which is a zero-cost operation. 
Then it uses the same core operating on ranges of ubyte.
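
To make the shape concrete, here is a minimal sketch of that front end. 
The names lexCore and lex are hypothetical, and the "core" merely counts 
code units as a stand-in for real tokenizing; only representation, map, 
and the range/trait templates come from Phobos.

```d
import std.algorithm : map;
import std.range : ElementType, isInputRange;
import std.string : representation;
import std.traits : Unqual, isSomeString;

// Hypothetical core: accepts only ranges of ubyte. A real lexer would
// decode UTF-8 here itself; this stand-in just counts the bytes it sees.
size_t lexCore(R)(R bytes)
    if (isInputRange!R && is(Unqual!(ElementType!R) == ubyte))
{
    size_t count;
    foreach (b; bytes)
        ++count;
    return count;
}

// Front end for arrays of char: view the same memory as ubyte[], no copy.
size_t lex(const(char)[] source)
{
    return lexCore(source.representation);
}

// Front end for any other range of char: cast each code unit lazily.
size_t lex(R)(R source)
    if (isInputRange!R && !isSomeString!R
        && is(Unqual!(ElementType!R) == char))
{
    return lexCore(source.map!(c => cast(ubyte) c));
}

unittest
{
    // 6 code units, not 5 characters: U+00E9 takes two bytes in UTF-8.
    assert(lex("h\u00E9llo") == 6);

    // A non-array range of char, built here from raw bytes.
    ubyte[] raw = [104, 105];
    assert(lex(raw.map!(b => cast(char) b)) == 2);
}
```

Both entry points funnel into the one core, so nothing past the front 
end ever sees char or triggers decoding behind the lexer's back.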

In the first implementation, the lexer may actually refuse any range of 
16-bit or 32-bit elements (wchar[], range of wchar, dchar[], range of 
dchar). Later on the core may be evolved to handle range of ushort and 
range of dchar. The front end would again use representation() for 
wchar[] and a casting adapter for range of wchar, and would just pass 
dchar[] and range of dchar through unchanged.
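
One way the initial refusal might look, as a sketch only: a compile-time 
check on the code unit type. The trait name isNarrowSource is made up 
for illustration; ElementEncodingType, isInputRange, Unqual, and 
representation are existing Phobos facilities.

```d
import std.range : ElementEncodingType, isInputRange;
import std.string : representation;
import std.traits : Unqual;

// Hypothetical trait: true only for sources made of 8-bit code units.
// ElementEncodingType reports the stored code unit type of an array
// (char for string), unlike ElementType, which reports dchar for strings.
enum isNarrowSource(R) =
    isInputRange!R && is(Unqual!(ElementEncodingType!R) == char);

// Accepted by a first implementation.
static assert( isNarrowSource!string);
static assert( isNarrowSource!(char[]));

// Refused for now: 16-bit and 32-bit sources.
static assert(!isNarrowSource!wstring);
static assert(!isNarrowSource!(dchar[]));

// For the later evolution: representation on a wchar array yields a
// ushort array, which is what a 16-bit core would consume.
static assert(is(typeof("a"w.representation) == immutable(ushort)[]));
```

A front end could put this trait in its template constraint so that 
passing a wstring fails at compile time rather than misbehaving at run 
time.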

This makes the core simple and efficient (I think Jonathan's use of 
static if and mixins everywhere, while well-intended, complicates 
matters without necessity).

And with that we have a lexer! It operates on ranges and has a simple 
front end, which makes it clear that the lexer must do its own decoding.

Works?

Andrei