Why not extend do to allow unicode in ID's?

Bert Bert at gmail.com
Fri Jul 5 01:01:25 UTC 2019


On Wednesday, 3 July 2019 at 23:21:19 UTC, XavierAP wrote:
> On Tuesday, 2 July 2019 at 04:34:42 UTC, Jonathan M Davis wrote:
>>
>> a character like ±
>
> A good example of a character that should not be allowed in 
> identifiers, because it has a meaning of operator (and in 
> general in theory we may want to reserve it for such future 
> use).
>
> ISO or Unicode define what, not all, characters are letters or 
> alphanumeric:
>
> https://dlang.org/spec/lex.html#identifiers
>
> https://docs.microsoft.com/en-us/dotnet/api/system.char.isletter#remarks

Maybe, maybe not. It could be useful in some contexts. It would 
probably be more confusing, but -, +, and ± can be very useful as 
sub- or superscripts in special mathematical situations. I've seen 
them used many times, for example to denote the even and odd sets 
of things, or for lowering and raising operations that are encoded 
in symbolic form (such as momentum operations that can be computed 
by multiplication).

It may not be worth allowing, because s_-*s_++3 would be very 
ambiguous, as would s±4+3. Especially if ± is also defined as an 
operator...

But ± should be allowed to be used as an operator as that is the 
most useful case.

4 ± 3

could be a mathematical object containing two values.

a ± b could be a mathematical object containing up to 2mn values, 
where a contains m values and b contains n values.

(4 ± 3)*(±6) contains 4 values = 42, -42, 6, -6.
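
A rough sketch of the idea, in Python rather than D just to keep it short. The names Multi, as_multi and pm are made up for illustration, and treating the unary ±x as 0 ± x is my own assumption:

```python
from itertools import product


class Multi:
    """Hypothetical multi-valued number: a set of possible values."""

    def __init__(self, *values):
        self.values = frozenset(values)

    def __mul__(self, other):
        # Multiply every value on the left by every value on the right.
        other = as_multi(other)
        return Multi(*(x * y for x, y in product(self.values, other.values)))

    def __repr__(self):
        return "Multi(" + ", ".join(map(str, sorted(self.values))) + ")"


def as_multi(x):
    """Wrap a plain number so it can be combined with a Multi."""
    return x if isinstance(x, Multi) else Multi(x)


def pm(a, b=None):
    """a ± b. With a single argument this is the unary ±a, taken as 0 ± a."""
    if b is None:
        a, b = 0, a
    a, b = as_multi(a), as_multi(b)
    return Multi(*(v for x, y in product(a.values, b.values)
                   for v in (x + y, x - y)))


result = pm(4, 3) * pm(6)
print(result)
```

Here pm(4, 3) holds 7 and 1, pm(6) holds 6 and -6, and the product holds the four values listed above. Duplicates collapse in the set, which is why 2mn is only an upper bound.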


So D could go through the unicode list and determine which 
symbols are best suited for operators and which for identifiers 
and then enable their usage. Many symbols that are not 
appropriate for id's would be appropriate for operators: ▌╚█
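
One possible way to make a first pass over the list is to use the Unicode general category of each character. The sketch below is Python, and the rule it applies (letters go to identifiers, symbols go to operators) is only my assumption about a reasonable default, not anything D does today. suggest_role is a made-up name:

```python
import unicodedata


def suggest_role(ch):
    """Suggest a default role for one character from its Unicode category.

    Assumed rule: letters (L*) and letter-like numbers (Nl) are identifier
    material, symbols (S*) are operator material, the rest is left alone.
    """
    cat = unicodedata.category(ch)
    if cat.startswith("L") or cat == "Nl":
        return "identifier"
    if cat.startswith("S"):
        return "operator"
    return "neither"


for ch in "Æ±▌╚█":
    print(ch, unicodedata.category(ch), suggest_role(ch))
```

A table like this would only be a starting point; individual symbols could still be moved from one group to the other by hand.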

These are ugly in some sense but they could have good meaning in 
relation to operations. █ could mean boxing: █a means box a.

But they could also be useful for id's... █ could mean rectangle.

Symbols are arbitrary. We know millions of symbols. Our brain has 
no issues decoding them after we learn the meaning. The only 
problem is that it's nice to have consistency, so we don't have to 
learn many different purposes for the same symbol (but we already 
do, and it's not a huge deal; it does slow us down a little, but 
usually the context is clear).

I think having it more open ended is better. It might require 
people to exercise their neurons a little bit, but that is a good 
thing in the long run. Obviously people could make it very 
difficult by making code very terse, but I doubt that would happen 
much. People don't code in D to make their life more difficult, 
they do it to make it easier. Virtually everyone will choose the 
symbols in a logical way that will make sense.



What could be done is to give every unicode character in an id 
some ascii equivalent.

someÆx is also

some::432::x

or whatever, if a better symbol than :: could be found. Then 
IDE's could learn to support the syntax and convert between the 
two forms. A simple hotkey could switch between them, and code 
pages could be flipped to change the keyboard. A 
pragma(codepage, 43) could inform the IDE to use a particular 
codepage. These ideas might have issues, but without trying 
different things the optimal solution can't be found.
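
A minimal sketch of the conversion an IDE might do, in Python. The :: delimiter is just the placeholder from above, and using the decimal code point as the number is my assumption (so Æ comes out as ::198:: here, not 432). to_ascii and from_ascii are made-up names:

```python
import re


def to_ascii(ident):
    """Replace each non-ascii character with ::<decimal code point>::."""
    return "".join(ch if ord(ch) < 128 else f"::{ord(ch)}::" for ch in ident)


def from_ascii(ident):
    """Turn every ::<number>:: escape back into the character it encodes."""
    return re.sub(r"::(\d+)::", lambda m: chr(int(m.group(1))), ident)


escaped = to_ascii("someÆx")
print(escaped)
print(from_ascii(escaped))
```

The two functions round-trip any id that does not already contain a ::number:: sequence of its own, which is one reason a delimiter that cannot otherwise appear in an id would be needed.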




More information about the Digitalmars-d mailing list