Non-nullable references, again

Thu Jan 1 23:08:42 PST 2009

Daniel Keep wrote:
> Benji Smith wrote:
>> Don wrote:
>>> Denis Koroskin wrote:
>>>> Foo nonNull = new Foo();
>>>> Foo? possiblyNull = null;
>>>
>>> Wouldn't this cause ambiguity with the "?:" operator?
>>
>> At first, I thought you might be right, and that there would be some 
>> ambiguity when calling constructors of nullable classes (especially 
>> given optional parentheses).
>>
>> But for the life of me, I couldn't come up with a truly ambiguous 
>> example, that couldn't be resolved with an extra token or two of 
>> lookahead.
>>
>> The '?' nullable-type operator is only used in type declarations, not 
>> in expressions, and the '?:' operator always consumes a few trailing 
>> expressions.
>>
>> Also (at least in C#) the null-coalesce operator (which converts 
>> nullable objects to either a non-null instance or a default value) 
>> looks like this:
>>
>>   MyClass? myNullableObj = getNullableFromSomewhere();
>>   MyClass myNonNullObj = myNullableObj ?? DEFAULT_VALUE;
>>
>> Since the double-hook is a single token, it's also unambiguous to parse.
>>
>> --benji
> 
> Disclaimer: I'm not an expert on compilers.  Plus, I just got up.  :P
> 
> The key is that the parser has to know what "MyClass" means before it 
> can figure out what the "?" is for; that's why it's context-dependent. D 
> avoids this dependency between compilation stages, because it 
> complicates the compiler.  When the parser sees "MyClass", it *doesn't 
> know* that it's a type, so it can't distinguish between a nullable type 
> and an invalid ?: expression.
> 
> At least, I think that's how it works; someone feel free to correct me 
> if it's not.  :P
> 
>   -- Daniel

I could be wrong too. I've done a fair bit of this stuff, but I'm no 
expert either :)

Nevertheless, I still don't think there's any ambiguity, as long as the 
parser can perform syntactic lookahead predicates. The grammar would 
look something like this:

DECLARATION :=
   IDENTIFIER         // Type name
   ( HOOK )?          // Is nullable?
   IDENTIFIER         // Var name
   (
     SEMICOLON        // End of declaration
     |
     (
       OP_ASSIGN      // Assignment operator
       EXPRESSION     // Assigned value
     )
   )

Whereas the ternary expression grammar would look something like this:

TERNARY_EXPRESSION :=
   IDENTIFIER         // Condition (here just a bare identifier)
   HOOK               // Start of '?:' operator
   EXPRESSION         // Value if true
   COLON              // End of '?:' operator
   EXPRESSION         // Value if false

The only potential ambiguity arises because the "value if true" 
expression could also just be an identifier. But if the parser can 
construct syntactic predicates to perform LL(k) lookahead with arbitrary 
k, then it can just keep consuming tokens until it finds either a 
SEMICOLON, an OP_ASSIGN, or a COLON (potentially, recursively, if it 
encounters another identifier and hook within the expression).

Still, though, once it finds one of those tokens, the syntax has been 
successfully disambiguated, without resorting to a semantic predicate.

It requires arbitrary lookahead, but it can be done within a 
context-free grammar, and all within the syntax-processing portion of 
the parser.
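
As a rough illustration of that scan, here is a minimal sketch in Python. It is not how any real D or C# parser works; the tokenizer, the token names (borrowed from the grammar sketches above, plus a hypothetical COALESCE for '??') and the classify() helper are all made up for this example. It applies the rule literally: after IDENTIFIER HOOK, the first SEMICOLON, OP_ASSIGN or COLON decides the parse, using token kinds only.

```python
import re

# Hypothetical token set. '??' is listed before '?' so that it is always
# lexed as one token, and '==' etc. are listed before '=' for the same reason.
TOKEN_RE = re.compile(r"""
    (?P<IDENTIFIER>[A-Za-z_]\w*)
  | (?P<NUMBER>\d+)
  | (?P<COALESCE>\?\?)
  | (?P<HOOK>\?)
  | (?P<COLON>:)
  | (?P<SEMICOLON>;)
  | (?P<COMPARE>==|!=|<=|>=)
  | (?P<OP_ASSIGN>=)
  | (?P<OTHER>\S)
""", re.VERBOSE)


def tokenize(src):
    """Split source text into (kind, text) pairs, skipping whitespace."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(src)]


def classify(tokens):
    """Decide how a statement starting with IDENTIFIER HOOK should parse.

    Purely syntactic: only token kinds are inspected, never whether the
    leading identifier names a type.
    """
    kinds = [kind for kind, _ in tokens]
    if kinds[:2] != ["IDENTIFIER", "HOOK"]:
        raise ValueError("expected IDENTIFIER HOOK at the start")
    # Unbounded lookahead: scan until one of the deciding tokens appears.
    # Nested hooks need no special handling, since any COLON settles it.
    for kind in kinds[2:]:
        if kind == "COLON":
            return "ternary"
        if kind in ("SEMICOLON", "OP_ASSIGN"):
            return "declaration"
    raise ValueError("no SEMICOLON, OP_ASSIGN or COLON found")


print(classify(tokenize("Foo? possiblyNull = null;")))
print(classify(tokenize("flag ? other ? a : b : c;")))
```

One limitation of taking the rule this literally: an assignment inside the "value if true" branch (something like "a ? b = c : d") would hit OP_ASSIGN before COLON and be judged a declaration, so a real grammar would need to deal with that case separately.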

Of course, I could be completely wrong too :)

--benji