Proposal: clean up semantics of array literals vs string literals

Don Clugston dac at nospam.com
Tue Oct 2 04:11:00 PDT 2012


The problem
-----------

String literals in D are a little bit magical; they have a trailing \0. 
This means that it is possible to write,

printf("Hello, World!\n");

without including a trailing \0. This is important for compatibility 
with C. This trailing \0 is mentioned in the spec but only incidentally, 
and generally in connection with printf.
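
A minimal sketch of what that trailing \0 means in practice. The helper name hasTerminator is hypothetical, not part of any library, and the check is only valid for a slice known to come from a plain string literal:

```d
import core.stdc.string : strlen;

// Hypothetical helper: true if the byte just past the end of `s` is a \0.
// Only meaningful when `s` is known to refer to a string literal; for any
// other slice this read is out of bounds.
bool hasTerminator(string s)
{
    return s.ptr[s.length] == '\0';
}

void main()
{
    string s = "Hello, World!\n";
    assert(s.length == 14);             // the \0 is not counted in .length
    assert(hasTerminator(s));           // but it is stored right after the data
    assert(strlen(s.ptr) == s.length);  // so C functions see the same length
}
```

The terminator sits outside the slice, so it is invisible to D code that only looks at .length, which is why it is a property of the literal and not of the type.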

But the semantics are not well defined.

printf("Hello, W" ~ "orld!\n");

Does this have a trailing \0 ? I think it should, because it improves 
readability of string literals that are longer than one line. Currently 
DMD adds a \0, but it is not in the spec.

Now consider array literals.

printf(['H','e', 'l', 'l','o','\n']);

Does this have a trailing \0 ? Currently DMD does not put one in.
How about ['H','e', 'l', 'l','o'] ~ " World!\n"  ?

And "Hello " ~ ['W','o','r','l','d','\n']   ?

And "Hello World!" ~ '\n' ?
And  null ~ "Hello World!\n" ?

Currently DMD puts \0 in some cases but not others, and it's rather random.

The root cause is that this trailing zero is not part of the type; it's 
part of the literal. There are no rules for how literals are propagated 
inside expressions; they are just literals. This is a mess.

There is a second difference.
Array literals of char type have completely different semantics from 
string literals. In module scope:

char[] x = ['a'];  // OK -- array literals can have an implicit .dup
char[] y = "b";    // illegal
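
The same difference can be checked at compile time. This is a sketch only: the declarations are tested inside a function-literal body, not at module scope, so it shows the conversion rules and not the static-initialisation behaviour:

```d
// The two literal forms have different types.
static assert(is(typeof("b") == string));    // immutable(char)[]
static assert(is(typeof(['a']) == char[]));

// An array literal may initialise a char[]; a string literal may not.
static assert( __traits(compiles, { char[] x = ['a']; }));
static assert(!__traits(compiles, { char[] y = "b"; }));

void main() {}
```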

This is a big problem for CTFE, because for CTFE a string is just a 
compile-time value; it's neither string literal nor array literal!

See bug 8660 for further details of the problems this causes.


A proposal to clean up this mess
--------------------------------

Any compile-time value of type immutable(char)[] or const(char)[] 
behaves as string literals currently do, and will have a \0 appended when 
it is stored in the executable.

ie,

enum hello = ['H', 'e', 'l', 'l', 'o', '\n'];
printf(hello);

will work.
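
For comparison, a sketch of how a char array that did not come from a string literal can be passed to C today, without relying on any implicit terminator. It uses std.string.toStringz; the helper name cLength is hypothetical:

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: the length a C function would see for `s`.
size_t cLength(const(char)[] s)
{
    // toStringz returns a pointer to \0-terminated data with the same contents.
    return strlen(toStringz(s));
}

void main()
{
    char[] hello = ['H', 'e', 'l', 'l', 'o', '\n'];  // no guaranteed trailing \0

    const(char)* p = toStringz(hello);
    printf("%s", p);

    assert(cLength(hello) == hello.length);
}
```

Under the proposal, the explicit toStringz call would be unnecessary for compile-time values of immutable or const char array type.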

Any value of type char[], which is generated at compile time, will not 
have the trailing \0, and it will do an implicit dup (as current array 
literals do).

char[] foo()
{
     return "abc".dup;  // .dup: a string literal does not convert to char[]
}

char [] x = foo();

// x does not have a trailing \0, and it is implicitly duped, even
// though it was not declared with an array literal.

-------------------
The difference between string literals and char array literals would 
then simply be that the latter are polysemous. There would be no 
semantics associated with the form of the literal itself.


We still have this oddity:


void foo(char qqq = 'b') {

    string x = "abc";            // trailing \0
    string y = ['a', 'b', 'c'];  // trailing \0
    string z = ['a', qqq, 'c'];  // no trailing \0
}

This is because we made the (IMHO mistaken) decision to allow variables 
inside array literals.
This is the reason why I listed _compile time value_ in the requirement 
for having a \0, rather than entirely basing it on the type.

We could fix that with a language change: an array literal which 
contains a variable should not be of immutable type. It should be of 
mutable type (or const, in the case where it contains other, immutable 
values).

So char [] w = ['a', qqq, 'c']; should compile (it currently doesn't, 
even though w is allocated on the heap).

But that's a separate proposal from the one I'm making here. I just need 
a decision on the main proposal so that I can fix a pile of CTFE bugs.

