Adding Unicode operators to D

KennyTM~ kennytm at gmail.com
Thu Oct 23 05:36:59 PDT 2008


Andrei Alexandrescu wrote:
> Please vote up before the haters take it down, and discuss:
> 
> http://www.reddit.com/r/programming/comments/78rjk/allowing_unicode_operators_in_d_similarly_to/ 
> 
> 
> 
> Andrei

I suggest not. There are problems if you adopt Unicode as operators:

======

1) My editor supports Unicode, but my keyboard doesn't. So how do I type ∩ 
and ∪ for a set«T»?

1.1) What if the library writer forgets to provide an alternative, 
ASCII-only name? [This is also a problem with using Unicode in identifiers 
in general.]

1.2) Some suggested auto-correction in the IDE. Again, what if I use 
notepad/nano/TextEdit to code?



I have suggested this once before, but let me put it formally here. If you 
really want to support Unicode operators in source code,

  - Firstly, ditch the ability to replace \xxx with '\xxx' when it 
appears without the quotes (so “char x = \n;” won't compile).
  - Then, at the source level, replace \xxx with the character it 
represents, so

      Vector3D«real» τ = r × F;

    can be written as

      Vector3D!(real) \&tau; = r \&times; F;

  - You don't need to introduce a separate trigraph.
  - But this suggestion does trigger some people's trigraph-phobia. [Yell 
no! Now! :) ]
  - It may make the source code difficult to parse grammatically.
  - It will make the source code difficult to read, just look at the 
number of semicolons in the ASCII encoded version.
  - But at least you can compile your code.
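
To make the idea concrete, here is a rough sketch of the substitution as a 
preprocessing pass. It assumes the escapes are spelled as named character 
entities such as \&tau; and \&times;. The names lookupEntity and 
expandEntities are made up for illustration, the entity table is a tiny 
hand-picked subset, and string literals are tracked very naively (escaped 
quotes and comments are ignored). A real compiler would do this in the lexer.

```d
import std.array : appender;

/// Hypothetical lookup for a handful of named entities.
/// Returns dchar.init when the name is unknown.
dchar lookupEntity(string name)
{
    switch (name)
    {
        case "tau":   return 'τ';
        case "times": return '×';
        case "cap":   return '∩';
        case "cup":   return '∪';
        default:      return dchar.init;
    }
}

/// Replaces \&name; with the character it names, but only outside
/// double-quoted strings. Simplification: escaped quotes inside strings
/// and comments are not handled.
string expandEntities(string src)
{
    auto result = appender!string();
    bool inString = false;
    size_t i = 0;
    while (i < src.length)
    {
        immutable char c = src[i];
        if (c == '"')
        {
            inString = !inString;
        }
        else if (!inString && c == '\\'
                 && i + 1 < src.length && src[i + 1] == '&')
        {
            // Look for the terminating semicolon of the entity.
            size_t end = i + 2;
            while (end < src.length && src[end] != ';')
                ++end;
            if (end < src.length)
            {
                immutable dchar ch = lookupEntity(src[i + 2 .. end]);
                if (ch != dchar.init)
                {
                    result.put(ch); // appended as UTF-8
                    i = end + 1;
                    continue;
                }
            }
        }
        // Anything else (including unknown entities) is copied verbatim.
        result.put(c);
        ++i;
    }
    return result.data;
}
```

With this, the escaped Vector3D line would expand to the τ / × version, 
while the same escape inside a quoted string is left alone.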

======

2) This is regarding the rejection of « & » to be supported even if the 
emacs module goes official. Of course it turns out it is not, but let's 
think of these scenarios:

2.1) OK it turns out ∪ and ∩ and «T» were just .opUnion(x) and 
.opIntersect(x) and !(T) pretty-printed in emacs; the compiler won't 
accept these characters anyway. But sometimes I forget and just copy a 
portion of this code to nano/geany/whatever and then it stops compiling!

2.2) Well, this copy&paste problem has been solved at the IDE level by 
inverting the pretty printing while copying. But now I publish my 
fantastic, pretty-printed D program in a web page/PDF/whatever, and 
people just complain that the compiler won't accept it!
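
For reference, this is roughly the ASCII-only code I imagine sitting 
underneath the pretty-printing in 2.1. Everything here is hypothetical: 
Set and makeSet are throwaway names, and opUnion/opIntersect are plain 
methods using the names above, not operator overloads the compiler knows 
about.

```d
import std.algorithm.searching : canFind;

/// Hypothetical minimal set type, for illustration only.
struct Set(T)
{
    T[] items; // kept duplicate-free by add()

    void add(T x)
    {
        if (!items.canFind(x))
            items ~= x;
    }

    bool contains(T x)
    {
        return items.canFind(x);
    }

    /// What an editor might display as "a ∪ b".
    Set opUnion(Set other)
    {
        Set result;
        foreach (x; items) result.add(x);
        foreach (x; other.items) result.add(x);
        return result;
    }

    /// What an editor might display as "a ∩ b".
    Set opIntersect(Set other)
    {
        Set result;
        foreach (x; items)
            if (other.contains(x))
                result.add(x);
        return result;
    }
}

/// Hypothetical helper that builds a set from an array.
Set!T makeSet(T)(T[] values)
{
    Set!T s;
    foreach (v; values) s.add(v);
    return s;
}

unittest
{
    auto a = makeSet([1, 2, 3]);
    auto b = makeSet([2, 3, 4]);
    // The compiler only ever sees these spellings:
    assert(a.opUnion(b).items == [1, 2, 3, 4]);
    assert(a.opIntersect(b).items == [2, 3]);
}
```

Paste "a ∪ b" from a pretty-printed view into a plain editor and none of 
the above helps; only the method-call spelling compiles.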



I still believe that if you're going to transform D code to Unicode 
visually, the compiler must accept these visual replacements as well.

May I also take Mathematica as an example. The programming language 
itself uses a heavy load of non-ASCII characters, and the IDE also 
pretty-prints them as nice mathematical formulas, but at the “source 
code” level they are just escape sequences. So on screen you see

    E^(I π) + 1

but in the source code you'll see

    E^(I \[Pi]) + 1

However, if you type in “E^(I π) + 1” in a plain .nb file and open it with 
the Get[] function (think of it as “import xx.d”) it can still correctly 
display the result “0”.

======

3) There are over 800 unary or binary operators in Unicode[1]. How are 
you going to opXXX all of them? (Assuming your blog entry doesn't just 
mean the simple “!=” ↦ “≠” transformation.)


Use the C++/C# approach? But I heard that's no good.

======

4) If you are going to support overloading for all these 800 operators, 
how do you define:

4.1) [Big problem] Operator precedence? (One person may want ∧ to mean 
the wedge product (so it has higher precedence than + and -) but 
another wants it to mean logical AND (so lower than + and -).)

4.2) Associativity? How to determine if an operator is left-associative, 
right-associative or both? (∧ as wedge product is both, while ∧ as a 
power function pow(a,b) is right-assoc.)

4.3) [Minor problem] Commutativity? Or we'll need to write opXXX and 
opXXX_r all the time?
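
To show why 4.1 and 4.2 matter, here is a toy precedence-climbing parser 
whose operator table is supplied by the caller. It is only a sketch: 
OpInfo, Parser and parenthesize are names I made up, operands are single 
characters, and the precedence numbers are arbitrary. The point is that 
the same text parses differently depending on what each person declares 
for ∧.

```d
import std.conv : to;

/// Hypothetical per-operator metadata a user would have to declare.
struct OpInfo
{
    int precedence;  // higher binds tighter
    bool rightAssoc; // true for right-associative operators
}

/// Toy precedence-climbing parser over single-character operands.
/// Returns the expression fully parenthesized.
struct Parser
{
    dstring src;
    OpInfo[dchar] ops;
    size_t pos;

    void skipSpaces()
    {
        while (pos < src.length && src[pos] == ' ')
            ++pos;
    }

    dstring parseOperand()
    {
        skipSpaces();
        assert(pos < src.length, "operand expected");
        auto operand = src[pos .. pos + 1];
        ++pos;
        return operand;
    }

    dstring parseExpr(int minPrec)
    {
        dstring lhs = parseOperand();
        while (true)
        {
            skipSpaces();
            if (pos >= src.length)
                break;
            auto info = src[pos] in ops;
            if (info is null || info.precedence < minPrec)
                break;
            immutable dchar op = src[pos];
            ++pos;
            // Left-associative operators require a strictly tighter
            // operator on the right; right-associative ones do not.
            immutable int next = info.rightAssoc
                ? info.precedence : info.precedence + 1;
            dstring rhs = parseExpr(next);
            lhs = "("d ~ lhs ~ " "d ~ op ~ " "d ~ rhs ~ ")"d;
        }
        return lhs;
    }
}

/// Parses expr with the given operator table.
string parenthesize(dstring expr, OpInfo[dchar] ops)
{
    auto p = Parser(expr, ops);
    return p.parseExpr(0).to!string;
}

unittest
{
    OpInfo[dchar] wedge;   // ∧ as wedge product: tighter than +
    wedge['+'] = OpInfo(1, false);
    wedge['∧'] = OpInfo(2, false);

    OpInfo[dchar] logical; // ∧ as logical AND: looser than +
    logical['+'] = OpInfo(1, false);
    logical['∧'] = OpInfo(0, false);

    assert(parenthesize("a + b ∧ c"d, wedge) == "(a + (b ∧ c))");
    assert(parenthesize("a + b ∧ c"d, logical) == "((a + b) ∧ c)");
}
```

So two libraries that disagree on ∧ would need either a per-scope 
operator table like this or a fixed table baked into the grammar.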


I don't have solutions for D on these. For 4.2 & 4.3 in C# we can 
introduce some attributes like

   [Associative, Commutative]
   FuzzyBool operator∧ (FuzzyBool x, FuzzyBool y) { return min(x,y); }

   (Not actual C# code.)

but it's not D. :)

Or predefine the meaning, precedence and associativity for each 
operator, so e.g. ∧ always means the wedge product and not logical AND, 
just like now ^ always means XOR and not power function.

Or just require the programmer to always put the parentheses.




Ref: [1] A rough word count in 
http://www.unicode.org/Public/math/revision-11/MathClass-11.txt. The 
actual number is higher than this.

