Non-ASCII in the future in the lexer
Timon Gehr
timon.gehr at gmx.ch
Thu Jun 1 13:49:42 UTC 2023
On 6/1/23 08:31, Walter Bright wrote:
> On 5/31/2023 8:13 AM, H. S. Teoh wrote:
>> This is all great, but as someone else has already said, the input
>> method could be a problem area. On my PC, I've set up XKB input with a
>> compose key such that many of these symbols are relatively easily
>> accessible; for example, Compose + < + = produces ≤; and Compose + v + /
>> produces √. However, some symbols are more tricky to input, and some
>> are not accessible this way.
>
> I've struggled with that, too. On MicroEmacs, I fixed ^X-U to scroll
> through the various incarnations of a letter. So, placing the cursor on
> a, and hitting ^X-U, changes it to a with an umlaut, a with an accent,
> etc. On a -, it scrolls through the various - variations. On ", it
> scrolls through the quoting symbols.
>
> Of course, this is pretty limited.
>
I just use the Agda input mode in Emacs: I type "\to" and get "→",
"\'a" and get "á", etc. Many editors have similar plugins. This also
works perfectly over ssh.

In any case, the approach I have taken with my own lexers is that
Unicode is supported, but never required. E.g., people can just write
"->" instead of "→", and the same holds for all Unicode syntax elements
(except if you have to match an identifier name that is itself
non-ASCII, I guess). After that, whether or not non-ASCII tokens are
used at all becomes a question of code style and formatting.
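
To illustrate the "supported, but never required" idea, here is a minimal
Python sketch. It is not the actual lexer discussed above; the token names
and the operator table are hypothetical and chosen only for the example.
Every token kind has an ASCII spelling and a Unicode spelling, and both lex
to the same kind, so the rest of the compiler never sees the difference.

```python
import re

# Hypothetical token table (an assumption for this sketch): each token
# kind lists its ASCII spelling and its Unicode spelling.
SPELLINGS = {
    "ARROW": ("->", "→"),
    "LE": ("<=", "≤"),
    "GE": (">=", "≥"),
    "NE": ("!=", "≠"),
}

# Map every accepted spelling, ASCII or Unicode, to its token kind.
KIND_OF = {s: kind for kind, pair in SPELLINGS.items() for s in pair}

# Try longer spellings first so a short operator never shadows a long one.
_ops = sorted(KIND_OF, key=len, reverse=True)
TOKEN_RE = re.compile(
    r"(?P<ws>\s+)"
    r"|(?P<ident>[^\W\d]\w*)"   # identifiers may contain non-ASCII letters
    r"|(?P<num>\d+)"
    r"|(?P<op>" + "|".join(map(re.escape, _ops)) + r")"
    r"|(?P<other>.)"            # any other single character
)

def lex(src):
    """Return (kind, text) pairs; both spellings yield the same kind."""
    tokens = []
    for m in TOKEN_RE.finditer(src):
        kind, text = m.lastgroup, m.group()
        if kind == "ws":
            continue
        if kind == "op":
            kind = KIND_OF[text]
        tokens.append((kind, text))
    return tokens

def kinds(src):
    """Token kinds only, ignoring how each token was spelled."""
    return [kind for kind, _ in lex(src)]
```

With this table, kinds("f: a -> b") and kinds("f: a → b") are equal; only
the stored token text differs, which is what a formatter would look at.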

In my experience, many programmers are too lazy (and/or ideologically
opposed to Unicode) to set up simple Unicode input and still prefer to
write ASCII, whereas I much prefer reading Unicode. Further down the
road, I plan to address this disconnect using an automatic code
formatter.
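
One way such a formatter could work, sketched in Python purely as an
assumption (the post does not describe the planned tool): rewrite each
ASCII operator to its Unicode counterpart while leaving string literals
alone. The operator pairs and the function name are hypothetical, and the
sketch does not handle comments or other literal forms.

```python
import re

# Hypothetical ASCII -> Unicode pairs; a real tool would take these from
# the language's own token table.
TO_UNICODE = {"->": "→", "<=": "≤", ">=": "≥", "!=": "≠"}

# Double-quoted strings are matched first so their contents stay untouched.
PIECE_RE = re.compile(
    r'(?P<string>"(?:\\.|[^"\\])*")'
    r"|(?P<op>"
    + "|".join(re.escape(s) for s in sorted(TO_UNICODE, key=len, reverse=True))
    + r")"
)

def to_unicode(src):
    """Replace ASCII operators outside string literals with Unicode ones."""
    def repl(m):
        if m.lastgroup == "op":
            return TO_UNICODE[m.group()]
        return m.group()  # string literal: keep as written
    return PIECE_RE.sub(repl, src)
```

Running the same table in the opposite direction would give the ASCII-only
view for people who prefer to write and read plain ASCII.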