Why not extend D to allow Unicode in IDs?
Bert
Bert at gmail.com
Fri Jul 5 01:01:25 UTC 2019
On Wednesday, 3 July 2019 at 23:21:19 UTC, XavierAP wrote:
> On Tuesday, 2 July 2019 at 04:34:42 UTC, Jonathan M Davis wrote:
>>
>> a character like ±
>
> A good example of a character that should not be allowed in
> identifiers, because it has a meaning of operator (and in
> general in theory we may want to reserve it for such future
> use).
>
> ISO or Unicode define what, not all, characters are letters or
> alphanumeric:
>
> https://dlang.org/spec/lex.html#identifiers
>
> https://docs.microsoft.com/en-us/dotnet/api/system.char.isletter#remarks
Maybe, maybe not. It could be useful in some contexts. It would
probably be more confusing, but -, +, and ± can be very useful as
subscripts or superscripts in special mathematical situations. I've
seen them used many times, for example to represent the even and odd
sets of things, or for lowering and raising operations that are
encoded in symbolic form (such as momentum operations that can be
computed by multiplication).
It may not be worth allowing, because s_-*s_++3 would be very
ambiguous, as would s±4+3, especially if ± is also defined as an
operator.
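To make the ambiguity concrete, here is a rough sketch of a toy lexer in Python. It is not how D's lexer works; tokenize and its ident_extra parameter are made up for illustration, and numbers are lumped in with identifier-like words to keep it short. The same text splits differently depending on whether ± counts as an identifier character.

```python
import re

def tokenize(source, ident_extra=""):
    """Toy lexer (hypothetical, not D's): runs of word characters become one
    token, every other non-space character is a one-character operator token.
    ident_extra lists extra characters to treat as identifier characters."""
    ident_chars = "A-Za-z0-9_" + re.escape(ident_extra)
    pattern = re.compile(rf"[{ident_chars}]+|\S")
    return pattern.findall(source)

# ± treated as an operator: five tokens
print(tokenize("s±4+3"))                   # ['s', '±', '4', '+', '3']
# ± treated as an identifier character: "s±4" is a single name
print(tokenize("s±4+3", ident_extra="±"))  # ['s±4', '+', '3']
```

A real language would have to pick one of the two readings, which is the argument for keeping ± on the operator side.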
But ± should be allowed to be used as an operator as that is the
most useful case.
4 ± 3
could be a mathematical object containing two values.
a ± b could be a mathematical object containing up to 2mn values,
where m and n are the numbers of values that a and b contain.
(4 ± 3)*(±6) contains 4 values: 42, -42, 6, -6.
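As a quick sanity check of that last example, here is a minimal Python sketch. Multi and pm are hypothetical names invented for this illustration (D has no ± operator today), and unary ±b is modelled as 0 ± b, which is an assumption.

```python
from itertools import product

class Multi:
    """A value that stands for several numbers at once (hypothetical type)."""
    def __init__(self, *values):
        self.values = frozenset(values)

    def __mul__(self, other):
        # Multiply every value on the left by every value on the right.
        return Multi(*(x * y for x, y in product(self.values, other.values)))

    def __repr__(self):
        return f"Multi({sorted(self.values)})"

def as_multi(x):
    """Wrap a plain number so it can be combined with Multi objects."""
    return x if isinstance(x, Multi) else Multi(x)

def pm(a, b=None):
    """pm(a, b) models a ± b; pm(b) models unary ±b (assumed to mean 0 ± b)."""
    if b is None:
        a, b = 0, a
    a, b = as_multi(a), as_multi(b)
    pairs = list(product(a.values, b.values))
    return Multi(*[x + y for x, y in pairs], *[x - y for x, y in pairs])

print(pm(4, 3))          # Multi([1, 7])
print(pm(4, 3) * pm(6))  # Multi([-42, -6, 6, 42])
```

Storing the values in a set means duplicates collapse, so the object can hold fewer values than the number of sign combinations.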
So D could go through the Unicode list and determine which
symbols are best suited for operators and which for identifiers,
and then enable their usage. Many symbols that are not
appropriate for IDs would be appropriate for operators: ▌╚█
These are ugly in some sense, but they could have good meaning in
relation to operations. █ could mean boxing: █a means box a.
But they could also be useful for IDs: █ could mean rectangle.
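A first pass over the Unicode list could lean on the general categories Unicode already assigns. The sketch below uses Python's standard unicodedata module; the policy in suggested_role (letters become identifier characters, math and other symbols become operator candidates) is only an assumption for illustration, and a real proposal would need to hand-review cases like █ that could plausibly go either way.

```python
import unicodedata

def suggested_role(ch):
    """Suggest a role for one character from its Unicode general category.
    The mapping is a hypothetical policy, not anything D specifies."""
    category = unicodedata.category(ch)
    if category.startswith("L"):      # Lu, Ll, Lt, Lm, Lo: letters
        return "identifier"
    if category in ("Sm", "So"):      # math symbols and other symbols
        return "operator"
    return "unclassified"

for ch in "Æ±▌╚█":
    print(ch, unicodedata.category(ch), suggested_role(ch))
```

With this policy Æ (category Lu) lands on the identifier side, while ± (Sm) and the block and box-drawing characters (So) land on the operator side.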
Symbols are arbitrary. We know millions of symbols. Our brains have
no issue decoding them once we learn the meaning. The only
problem is that it's nice to have consistency, so we don't have to
learn many different purposes for the same symbol (but we already
do; it's not a huge deal, and although it slows us down a little,
the context is usually clear).
I think having it more open-ended is better. It might require
people to exercise their neurons a little bit, but that is a good
thing in the long run. Obviously people could make it very difficult
by making code very terse, but I doubt that would happen much. People
don't code in D to make their lives more difficult; they do it to
make them easier. Virtually everyone will choose the symbols in a
logical way that makes sense.
What could be done is that any Unicode character in an ID could
have some ASCII equivalent.
someÆx is also
some::432::x
or whatever, if a good symbol could be found instead of ::. Then
IDEs could learn to support the syntax and convert between them.
A simple hotkey could switch between the two, and code pages could
be flipped to change the keyboard. A pragma(codepage, 43) could
inform the IDE to use a code page. These might have issues, but
without trying different things the optimal solution can't be
found.
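The conversion an IDE would need is small. Here is a rough Python sketch of the round trip, assuming the :: delimiter from the example above and assuming the number is the character's decimal Unicode code point (so Æ comes out as 198; the 432 above was just a placeholder). to_ascii and from_ascii are made-up names.

```python
import re

def to_ascii(identifier):
    """Replace each non-ASCII character with ::<decimal code point>::."""
    return "".join(
        ch if ord(ch) < 128 else f"::{ord(ch)}::" for ch in identifier
    )

def from_ascii(escaped):
    """Turn every ::<digits>:: escape back into the character it names."""
    return re.sub(r"::(\d+)::", lambda m: chr(int(m.group(1))), escaped)

print(to_ascii("someÆx"))           # some::198::x
print(from_ascii("some::198::x"))   # someÆx
```

One issue shows up immediately: an ASCII identifier that already contains text like ::65:: would be decoded into something else, so the delimiter has to be a sequence that cannot otherwise appear in an ID.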