Why UTF-8/16 character encodings?

Sat May 25 04:20:51 PDT 2013

On 05/25/2013 05:56 AM, H. S. Teoh wrote:
> On Fri, May 24, 2013 at 08:45:56PM -0700, Walter Bright wrote:
>> On 5/24/2013 7:16 PM, Manu wrote:
>>> So when we define operators for u × v and a · b, or maybe n²? ;)
>>
>> Oh, how I want to do that. But I still think the world hasn't
>> completely caught up with Unicode yet.
>
> That would be most awesome!
>
> Though it does raise the issue of how parsing would work, 'cos you
> either have to assign a fixed precedence to each of these operators (and
> there are a LOT of them in Unicode!),

I think this is what e.g. Fortress does.

> or allow user-defined operators
> with custom precedence and associativity,

This is what e.g. Haskell and Coq do.
(Though Coq has the advantage of not allowing forward references, and 
hence inline parser customization is straightforward in Coq.)

> which means nightmare for the
> parser (it has to adapt itself to new operators as the code is
> parsed/analysed,

It would be easier on the parsing side, since the parser would no longer 
fully parse expressions: it would only collect the operands and 
operators in order, and semantic analysis would then resolve 
precedences. This is quite simple, and the way the parser currently 
resolves operator precedences is less efficient anyway.
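
To illustrate the split, here is a minimal sketch in Python (not D, and 
not how any existing compiler does it). It assumes the parser hands over 
a flat list of the form [operand, op, operand, ...]; the FIXITY table, 
its entries, and the resolve function are made up for the example.

```python
# Hypothetical fixity table: operator -> (precedence, associativity).
# The entries are illustrative assumptions, not any language's real rules.
FIXITY = {
    "+": (1, "left"),
    "·": (2, "left"),
    "×": (2, "left"),
    "^": (3, "right"),
}


def resolve(flat, fixity=FIXITY):
    """Turn a flat [operand, op, operand, ...] list into a nested tuple tree.

    The parser is assumed to have produced `flat` without consulting any
    precedence information; this pass applies precedence climbing using
    whatever fixities semantic analysis has collected.
    """
    pos = 0

    def climb(min_prec):
        nonlocal pos
        lhs = flat[pos]
        pos += 1
        while pos < len(flat):
            op = flat[pos]
            prec, assoc = fixity[op]
            if prec < min_prec:
                break  # this operator belongs to an enclosing call
            pos += 1
            # Left-associative operators must not absorb an equal-precedence
            # operator on their right; right-associative ones may.
            rhs = climb(prec + 1 if assoc == "left" else prec)
            lhs = (op, lhs, rhs)
        return lhs

    return climb(0)


if __name__ == "__main__":
    print(resolve(["a", "+", "b", "×", "c"]))
    print(resolve(["a", "^", "b", "^", "c"]))
```

The table is only consulted after parsing, so a module can declare new 
operators anywhere without the parser having to adapt as it goes.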

> which then leads to issues with what happens if two
> different modules define the same operator with conflicting precedence /
> associativity).
>

This would probably be an error unless explicitly disambiguated, or it 
would follow the usual disambiguation rules. (Trying all possibilities 
appears to be exponential in the number of conflicting operators in an 
expression in the worst case, though.)
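
A minimal sketch of the first option (conflict is an error unless 
explicitly disambiguated), again in Python. The names FixityConflict, 
merge_fixities and the overrides parameter are hypothetical; overrides 
stands in for whatever explicit disambiguation syntax the importing 
module would use.

```python
class FixityConflict(Exception):
    """Raised when two modules declare different fixities for one operator."""


def merge_fixities(modules, overrides=None):
    """Combine per-module fixity tables into one.

    modules:   {module_name: {operator: (precedence, associativity)}}
    overrides: {operator: (precedence, associativity)} chosen explicitly by
               the importing module (hypothetical disambiguation mechanism).
    """
    overrides = overrides or {}
    merged = {}
    origin = {}  # operator -> module that first declared it, for diagnostics
    for name, table in modules.items():
        for op, fixity in table.items():
            if op in overrides:
                continue  # explicitly disambiguated, declarations are ignored
            if op in merged and merged[op] != fixity:
                raise FixityConflict(
                    f"operator {op!r}: {origin[op]} declares {merged[op]}, "
                    f"{name} declares {fixity}"
                )
            merged.setdefault(op, fixity)
            origin.setdefault(op, name)
    merged.update(overrides)
    return merged
```

Agreeing declarations merge silently; only a real disagreement without 
an override is rejected, so no search over alternatives is needed.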