Adding Unicode operators to D
KennyTM~
kennytm at gmail.com
Thu Oct 23 05:36:59 PDT 2008
Andrei Alexandrescu wrote:
> Please vote up before the haters take it down, and discuss:
>
> http://www.reddit.com/r/programming/comments/78rjk/allowing_unicode_operators_in_d_similarly_to/
>
>
>
> Andrei
I suggest not. There are problems if you adopt Unicode as operators:
======
1) My editor supports Unicode, but my keyboard doesn't. So how do I type
∩ and ∪ for a set«T»?
1.1) What if the library writer forgets to provide an alternative,
ASCII-only name? [This is also a problem with using Unicode in
identifiers in general.]
1.2) Some suggested auto-correction in the IDE. But again, what if I use
notepad/nano/TextEdit to code?
I have suggested this once before, but let me put it formally here. If
you really want to support Unicode operators in source code,
- Firstly, ditch the rule that replaces \xxx with '\xxx' when it
appears without the quotes (so “char x = \n;” won't compile).
- Then, replace \xxx with the character it represents at the source
level, so
Vector3D«real» τ = r × F;
can be written as
Vector3D!(real) \&tau; = r \&times; F;
- You don't need to introduce a separate trigraph.
- But this suggestion does trigger some people's trigraph-phobia. [Yell
no! Now! :) ]
- It may make the source code difficult to parse grammatically.
- It will make the source code difficult to read; just look at the
number of semicolons in the ASCII-encoded version.
- But at least you can compile your code.
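
The replacement step above can be sketched as a tiny preprocessor. This
is only an illustration in Python, not a proposal for how dmd should do
it: it assumes the ASCII spelling is a backslash followed by a named
character entity (such as \&tau;), the function name expand_entities is
made up, and it ignores string literals and comments entirely.

```python
import re
from html.entities import name2codepoint

# A backslash followed by a named character entity, e.g. \&tau; or \&times;
_ESCAPED_ENTITY = re.compile(r"\\&([A-Za-z][A-Za-z0-9]*);")


def expand_entities(source: str) -> str:
    """Replace each \\&name; with the character it names.

    Hypothetical helper: unknown names are left untouched so that the
    compiler, not this pass, gets to report them.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])

    return _ESCAPED_ENTITY.sub(replace, source)


if __name__ == "__main__":
    print(expand_entities(r"Vector3D!(real) \&tau; = r \&times; F;"))
```

The point is just that the ASCII and the Unicode spellings collapse to
the same text before parsing, so either one compiles.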
======
2) This concerns the rejection of support for « and » even if the emacs
module goes official. Of course it turns out it is not, but let's think
about these scenarios:
2.1) OK, it turns out ∪ and ∩ and «T» were just .opUnion(x) and
.opIntersect(x) and !(T) pretty-printed in emacs; the compiler won't
accept these characters anyway. But sometimes I forget and just copy a
portion of this code to nano/geany/whatever, and then it stops compiling!
2.2) Well, this copy&paste problem has been solved at the IDE level by
inverting the pretty-printing while copying. But now I publish my
fantastic, pretty-printed D program on a web page/PDF/whatever, and
people just complain that the compiler won't accept it!
I still believe that if you're going to transform D code to Unicode
visually, the compiler must accept these visual replacements as well.
May I also take Mathematica as an example. The programming language
itself uses a heavy load of non-ASCII characters, and the IDE also
pretty-prints them as nice mathematical formulas, but at the “source
code” level they are just escape sequences. So on screen you see
E^(I π) + 1
but in the source code you'll see
E^(I \[Pi]) + 1
However, if you type “E^(I π) + 1” into a plain .nb file and open it
with the Get[] function (think of it as “import xx.d”), it still
correctly displays the result “0”.
======
3) There are over 800 unary or binary operators in Unicode[1]. How are
you going to opXXX all of them? (I assume your blog entry doesn't just
mean the simple “!=” ↦ “≠” transformation.)
Use the C++/C# approach? But I heard that's no good.
======
4) If you are going to support overloading for all these 800 operators,
how do you define:
4.1) [Big problem] Operator precedence? (One person may want ∧ to mean
the wedge product (so it has higher precedence than + and -), but
another wants it to mean logical AND (so lower than + and -).)
4.2) Associativity? How to determine whether an operator is
left-associative, right-associative or both? (∧ as wedge product is
both, while ∧ as a power function pow(a,b) is right-assoc.)
4.3) [Minor problem] Commutativity? Or will we need to write opXXX and
opXXX_r all the time?
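
To make 4.1 and 4.2 concrete, here is a small precedence-climbing
parser in Python. It is only a sketch: the tables WEDGE, LOGICAL and
POWER and their numeric levels are invented for illustration and are
not taken from D or any other language. The same token stream yields a
different tree depending on which table the parser is given.

```python
import re


def tokenize(text):
    # Identifiers, integers, or any single non-space character (operators, parens).
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", text)


def parse(tokens, table, min_prec=0):
    """Precedence climbing. `table` maps operator -> (precedence, "left" | "right").

    Consumes `tokens` (a list) from the front and returns nested
    (op, lhs, rhs) tuples.
    """
    lhs = parse_atom(tokens, table)
    while tokens and tokens[0] in table and table[tokens[0]][0] >= min_prec:
        op = tokens.pop(0)
        prec, assoc = table[op]
        # A left-associative operator must not absorb another operator of
        # the same level on its right; a right-associative one may.
        next_min = prec + 1 if assoc == "left" else prec
        rhs = parse(tokens, table, next_min)
        lhs = (op, lhs, rhs)
    return lhs


def parse_atom(tokens, table):
    tok = tokens.pop(0)
    if tok == "(":
        inner = parse(tokens, table)
        tokens.pop(0)  # drop the closing ")"
        return inner
    return tok


# Hypothetical tables: the same glyph with three different meanings.
WEDGE = {"+": (1, "left"), "-": (1, "left"), "∧": (2, "left")}    # binds tighter than +
LOGICAL = {"+": (2, "left"), "-": (2, "left"), "∧": (1, "left")}  # binds looser than +
POWER = {"+": (1, "left"), "-": (1, "left"), "∧": (3, "right")}   # right-associative

if __name__ == "__main__":
    print(parse(tokenize("a + b ∧ c"), WEDGE))
    print(parse(tokenize("a + b ∧ c"), LOGICAL))
    print(parse(tokenize("a ∧ b ∧ c"), POWER))
```

With a single fixed grammar the compiler has to pick one of these
tables for everybody, which is exactly the problem.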
I don't have solutions for D on these. For 4.2 & 4.3, in C# we could
introduce some attributes like
[Associative, Commutative]
FuzzyBool operator∧ (FuzzyBool x, FuzzyBool y) { return min(x,y); }
(Not actual C# code.)
but it's not D. :)
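
For what it's worth, the commutativity half of that idea can be
imitated in a language that already has reflected operator methods.
The following Python sketch is an analogy only: Python has no ∧
operator, so & stands in for it, and the decorator name commutative is
made up. It generates the reflected method from the forward one so that
it need not be written by hand.

```python
def commutative(*names):
    """Class decorator (hypothetical): reuse __op__ as the reflected __rop__.

    Only valid when the operation really is commutative, because
    other OP self is computed as self OP other.
    """
    def decorate(cls):
        for name in names:
            forward = getattr(cls, f"__{name}__")
            setattr(cls, f"__r{name}__", forward)
        return cls
    return decorate


@commutative("and")
class FuzzyBool:
    def __init__(self, value):
        self.value = float(value)

    def __and__(self, other):
        # Fuzzy AND is the minimum of the two truth values.
        other_value = other.value if isinstance(other, FuzzyBool) else float(other)
        return FuzzyBool(min(self.value, other_value))


if __name__ == "__main__":
    print((FuzzyBool(0.3) & 0.7).value)  # forward method
    print((0.7 & FuzzyBool(0.3)).value)  # generated reflected method
```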
Or predefine the meaning, precedence and associativity for each
operator, so e.g. ∧ always means the wedge product and not logical AND,
just like ^ now always means XOR and not the power function.
Or just require the programmer to always add parentheses.
Ref: [1] A rough word count in
http://www.unicode.org/Public/math/revision-11/MathClass-11.txt. The
actual number is higher than this.
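
As a rough cross-check of the scale, the sketch below counts the code
points in Unicode general category Sm (Math Symbol) with Python's
unicodedata module. This is a different and coarser measure than the
unary/binary classes in MathClass-11.txt, and the result depends on the
Unicode version bundled with the interpreter, so no figure is quoted
here.

```python
import sys
import unicodedata


def is_math_symbol(ch: str) -> bool:
    """True if `ch` is in general category Sm (Symbol, math)."""
    return unicodedata.category(ch) == "Sm"


def count_math_symbols() -> int:
    """Count every code point whose general category is Sm."""
    return sum(1 for cp in range(sys.maxunicode + 1) if is_math_symbol(chr(cp)))


if __name__ == "__main__":
    print(unicodedata.unidata_version, count_math_symbols())
```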