Non-ASCII in the future in the lexer

Thu Jun 1 13:49:42 UTC 2023

On 6/1/23 08:31, Walter Bright wrote:
> On 5/31/2023 8:13 AM, H. S. Teoh wrote:
>> This is all great, but as someone else has already said, the input
>> method could be a problem area.  On my PC, I've set up XKB input with a
>> compose key such that many of these symbols are relatively easily
>> accessible; for example, Compose + < + = produces ≤; and Compose + v + /
>> produces √.  However, some symbols are more tricky to input, and some
>> are not accessible this way.
> 
> I've struggled with that, too. On MicroEmacs, I fixed ^X-U to scroll 
> through the various incarnations of a letter. So, placing the cursor on 
> a, and hitting ^X-U, changes it to a with an umlaut, a with an accent, 
> etc. On a -, it scrolls through the various - variations. On ", it 
> scrolls through the quoting symbols.
> 
> Of course, this is pretty limited.
> 

I just use the Agda input mode in Emacs: typing "\to" produces "→", 
"\'a" produces "á", and so on. Many editors have similar plugins, and 
this also works perfectly over ssh. In any case, the approach I have 
taken in my own lexers is that Unicode is supported but never required. 
For example, people can write "->" instead of "→", and the same holds 
for every Unicode syntax element (except, I guess, when you have to 
match an identifier name that itself contains non-ASCII characters). 
After that, whether non-ASCII tokens are used at all becomes a question 
of code style and formatting.
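
To make "supported, but never required" concrete, here is a minimal 
sketch in Python. It is not taken from any of my lexers or from the D 
lexer; the token names, the operator table and the lex/kinds helpers are 
all made up for illustration. The only point is that both spellings of 
an operator map to the same token kind, so everything after the lexer 
never sees the difference.

```python
# Hypothetical token table: each token kind has an ASCII spelling and a
# Unicode spelling. The names and the choice of operators are assumptions
# made for this sketch, not part of any real lexer.
SPELLINGS = {
    "ARROW": ("->", "→"),
    "LE": ("<=", "≤"),
    "GE": (">=", "≥"),
    "NE": ("!=", "≠"),
}

# Map every spelling to its token kind, and try longer spellings first.
LOOKUP = {s: kind for kind, pair in SPELLINGS.items() for s in pair}
OPERATORS = sorted(LOOKUP, key=len, reverse=True)


def lex(src):
    """Return a list of (kind, text) pairs for a tiny toy language."""
    tokens = []
    i = 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
            continue
        for op in OPERATORS:
            if src.startswith(op, i):
                # ASCII and Unicode spellings yield the same token kind.
                tokens.append((LOOKUP[op], op))
                i += len(op)
                break
        else:
            if c.isalpha() or c == "_":
                # str.isalpha/isalnum accept non-ASCII letters, so an
                # identifier such as "á" is lexed like any other name.
                j = i + 1
                while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                    j += 1
                tokens.append(("IDENT", src[i:j]))
                i = j
            else:
                # Anything else is passed through as a single character.
                tokens.append(("CHAR", c))
                i += 1
    return tokens


def kinds(src):
    """Token kinds only, ignoring how each token was spelled."""
    return [kind for kind, _ in lex(src)]
```

With this, kinds("a -> b") and kinds("a → b") give the same list. The 
original spelling is still kept in each token, which is what a formatter 
or an error message would want to look at.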

In my experience, many programmers are too lazy (and/or ideologically 
opposed to Unicode) to set up simple Unicode input and still prefer to 
write ASCII, but I much prefer reading Unicode. Further down the road, I 
plan to address this disconnect with an automatic code formatter.
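
As a rough idea of what such a formatter pass could look like (again a 
hypothetical Python sketch, not an existing tool): rewrite the ASCII 
spellings to their Unicode counterparts, but leave string literals 
alone. The operator table and the prettify name are assumptions, and the 
sketch only knows about double-quoted strings, not comments or other 
literal forms.

```python
import re

# Hypothetical ASCII -> Unicode spellings; an assumption for this sketch.
TO_UNICODE = {"->": "→", "<=": "≤", ">=": "≥", "!=": "≠"}

# Match either a double-quoted string literal (with backslash escapes)
# or one of the ASCII operators. Matching string literals first means the
# operators inside them are consumed as part of the literal.
PATTERN = re.compile(
    r'"(?:\\.|[^"\\])*"|' + "|".join(re.escape(op) for op in TO_UNICODE)
)


def prettify(src):
    """Replace ASCII operators with Unicode ones outside string literals."""
    def repl(match):
        text = match.group(0)
        # String literals are not keys of TO_UNICODE, so they come back
        # unchanged; operators are swapped for their Unicode spelling.
        return TO_UNICODE.get(text, text)
    return PATTERN.sub(repl, src)
```

So people can keep typing "a -> b" and the formatter turns it into 
"a → b" for those who read the code. A real formatter would work on the 
lexer's token stream rather than on a regex.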