Adding Unicode operators to D

Sun Oct 26 10:53:03 PDT 2008

Andrei Alexandrescu wrote:
> Bruno Medeiros wrote:
>> Andrei Alexandrescu wrote:
>>> Spacen Jasset wrote:
>>>> Bill Baxter wrote:
>>>>> On Thu, Oct 23, 2008 at 7:27 AM, Andrei Alexandrescu
>>>>> <SeeWebsiteForEmail at erdani.org> wrote:
>>>>>> Please vote up before the haters take it down, and discuss:
>>>>>>
>>>>>> http://www.reddit.com/r/programming/comments/78rjk/allowing_unicode_operators_in_d_similarly_to/ 
>>>>>>
>>>>>>
>>>>>
>>>>> (My comment cross posted here from reddit)
>>>>>
>>>>> I think the right way to do it is not to make everything Unicode. All
>>>>> the pressure on the existing symbols would be dramatically relieved by
>>>>> the addition of just a handful of new symbols.
>>>>>
>>>>> The truth is keyboards aren't very good for inputting Unicode. That
>>>>> isn't likely to change. Yes they've dealt with the problem in Asian
>>>>> languages by using IMEs but in my opinion IMEs are horrible to use.
>>>>>
>>>>> Some people seem to argue it's a waste to go to Unicode only for a few
>>>>> symbols. If you're going to go Unicode, you should go whole hog. I'd
>>>>> argue the exact opposite. If you're going to go Unicode, it should be
>>>>> done in moderation. Use as little Unicode as necessary and no more.
>>>>>
>>>>> As for how to input Unicode -- Microsoft Word solved that problem ages
>>>>> ago, assuming we're talking about small numbers of special characters.
>>>>> It's called AutoCorrect. You just register your Unicode symbol as a
>>>>> misspelling for "(X)" or something unique like that and then every
>>>>> time you type "(X)" a funky Unicode character instantly replaces those
>>>>> chars.
>>>>>
>>>>> Yeah, not many editors support such a feature. But it's very easy to
>>>>> implement. And with that one generic mechanism, your editor is ready
>>>>> to support input of Unicode chars in any language just by adding the
>>>>> right definitions.
>>>>>
>>>>> --bb
>>>> I am not entirely sure that 30 (or however many) new operators would
>>>> be a good thing anyway. How hard is it to say m3 =
>>>> m1.crossProduct(m2) vs. m3 = m1 X m2, and how often will that
>>>> happen? It's also going to make the language more difficult to learn
>>>> and understand.
>>>
>>> I have noticed that in pretty much all scientific code, the f(a, b) 
>>> and a.f(b) notations fall off a readability cliff when the number of 
>>> operators grows only to a handful. Lured by simple examples like 
>>> yours, people don't see that as a problem until they actually have to 
>>> read or write such code. Adding temporaries and such is not that 
>>> great because it further takes the algorithm away from its 
>>> mathematical form just for serving a notation that was the problem in 
>>> the first place.
>>>
>>
>> But what operators would be added? Some mathematician programmers 
>> might want vector and matrix operators, others set operators, others 
>> still derivation/integration operators, and so on. Where would we stop?
>> I don't deny it might be useful for them, but it does seem like too 
>> specific a need to integrate in the language.
> 
> I was thinking of allowing a general way of defining one Unicode 
> character to stand in as one operator, and then have libraries implement 
> the actual operators.
> 
> There's the remaining problem of different libraries defining the same 
> character to mean different operators. This may not be huge as math 
> subdomains tend to be rather consistent in their use of operators. 
> Across math subdomains, types and overloading can take care of things.
> 
> Also, ASCII representation should be allowed for operators, and one nice
> thing about Unicode characters is that many have ASCII, human-readable
> HTML entity names, see
> http://www.fileformat.info/format/w3c/htmlentity.htm. So
> \unicodecharname may be a good alternate way to enter these operators.
> For example, the empty set could be \empty, and the cross-product could 
> be written as \times. So
> 
> c = a \times b;
> 
> doesn't quite look bad to me.
> 
> One nice thing about this is that we don't need to pore over naming and 
> such, we just use stuff that others (creators and users alike) have 
> already pored over. Saves on documentation writing too :o).
> 
> 
> Andrei

LaTeX in D? :p

Anyway, we already have \&times; and \&empty; so we could reuse them at
the source code level, as I've described elsewhere in this thread.

   auto torque = position \&times; force;

This is uglier than

   auto torque = position \times force;

but it gives a uniform syntax between escape sequences inside and 
outside strings.
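
As a rough illustration of that uniform syntax, here is a small sketch of a
source pre-pass that turns \&name; escapes into the Unicode characters they
name. It is written in Python rather than D purely for brevity. The function
name expand_entity_escapes is made up for this example, and Python's
html.entities table (HTML 4.01 names) is only a stand-in for whatever entity
list a D compiler would actually use. It also makes no attempt to tell code
apart from string literals or comments.

```python
import re
from html.entities import name2codepoint

# Matches a backslash followed by an SGML-style entity reference, e.g. \&times;
_ENTITY_ESCAPE = re.compile(r"\\&([A-Za-z][A-Za-z0-9]*);")


def expand_entity_escapes(source: str) -> str:
    """Replace every \\&name; escape with the character it names.

    Hypothetical helper: unknown names are left untouched so that a later
    stage could report them as errors.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in name2codepoint:
            return match.group(0)  # not a known entity name, keep as written
        return chr(name2codepoint[name])

    return _ENTITY_ESCAPE.sub(replace, source)


if __name__ == "__main__":
    print(expand_entity_escapes(r"auto torque = position \&times; force;"))
    # prints: auto torque = position × force;
```

Because the same escape is rewritten wherever it appears, the spelling stays
identical inside and outside string literals, which is the whole point of the
uglier form.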

The problem is that you may have to invent some names; e.g. the composition
operator ∘ (U+2218 ring operator) has no name in SGML entities. In LaTeX
it is represented as \circ, but \&circ; is already taken by ˆ (U+02C6
modifier letter circumflex accent).
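
The name clash is easy to check against an entity table. The sketch below
uses Python's html.entities module, which carries the HTML 4.01 names; that
table is an assumption standing in for the SGML entity set meant here, and
the two helper names are invented for the example.

```python
import unicodedata
from html.entities import codepoint2name, name2codepoint


def describe_entity(name: str) -> str:
    """Return the code point and Unicode name behind an entity name."""
    code_point = name2codepoint[name]
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


def entity_name_for(char: str):
    """Return the entity name for a character, or None if the table has none."""
    return codepoint2name.get(ord(char))


if __name__ == "__main__":
    # "circ" is the circumflex accent, not the composition operator.
    print(describe_entity("circ"))    # prints: U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT
    # The ring operator has no entry in the HTML 4.01 table.
    print(entity_name_for("\u2218"))  # prints: None
```

So any scheme built on the existing entity names would need extra,
newly invented names for operators like this one.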

And you'll need to predefine the associativity and operator precedence
too. ;) See my other entry in this thread.