Proposal: clean up semantics of array literals vs string literals
Don Clugston
dac at nospam.com
Tue Oct 2 04:11:00 PDT 2012
The problem
-----------
String literals in D are a little bit magical; they have a trailing \0.
This means that it is possible to write,
printf("Hello, World!\n");
without including a trailing \0. This is important for compatibility
with C. This trailing \0 is mentioned in the spec but only incidentally,
and generally in connection with printf.
But the semantics are not well defined.
printf("Hello, W" ~ "orld!\n");
Does this have a trailing \0 ? I think it should, because it improves
readability of string literals that are longer than one line. Currently
DMD adds a \0, but it is not in the spec.
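To make the two cases above concrete, here is a minimal sketch. The helper name literalIsTerminated is made up for illustration. The first check relies on the trailing \0 that the spec mentions for plain string literals; the concatenated call relies only on the DMD behaviour described above, not on anything the spec promises.

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;

// Hypothetical helper: true if a '\0' sits directly after the literal's data.
bool literalIsTerminated()
{
    string s = "Hello, World!\n";
    // .length does not count the terminator, so index .length via the raw pointer.
    return s.ptr[s.length] == '\0' && strlen(s.ptr) == s.length;
}

void main()
{
    assert(literalIsTerminated());

    // Constant-folded concatenation: DMD currently emits a trailing '\0' here,
    // but that is compiler behaviour, not a documented guarantee.
    printf("Hello, W" ~ "orld!\n");
}
```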
Now consider array literals.
printf(['H','e', 'l', 'l','o','\n']);
Does this have a trailing \0 ? Currently DMD does not put one in.
How about ['H','e', 'l', 'l','o'] ~ " World!\n" ?
And "Hello " ~ ['W','o','r','l','d','\n'] ?
And "Hello World!" ~ '\n' ?
And null ~ "Hello World!\n" ?
Currently DMD puts a \0 in some cases but not others, and it's rather random.
The root cause is that this trailing zero is not part of the type, it's
part of the literal. There are no rules for how literals are propagated
inside expressions, they are just literals. This is a mess.
There is a second difference.
Array literals of char type have completely different semantics from
string literals. In module scope:
char[] x = ['a']; // OK -- array literals can have an implicit .dup
char[] y = "b"; // illegal
This is a big problem for CTFE, because for CTFE, a string is just a
compile-time value; it's neither a string literal nor an array literal!
See bug 8660 for further details of the problems this causes.
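The asymmetry can be checked mechanically. This is only a sketch: it tests the conversions inside function literals via __traits(compiles) rather than at module scope, so it shows the typing difference but not the implicit .dup itself.

```d
// A mutable char[] accepts an array literal of chars...
static assert( __traits(compiles, { char[] x = ['a']; }));

// ...but rejects a string literal, whose type is immutable(char)[].
static assert(!__traits(compiles, { char[] y = "b"; }));

// The string literal is accepted once the target is immutable or const.
static assert( __traits(compiles, { string s = "b"; const(char)[] c = "b"; }));

void main() {}
```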
A proposal to clean up this mess
--------------------------------
Any compile-time value of type immutable(char)[] or const(char)[]
behaves as string literals currently do, and will have a \0 appended when
it is stored in the executable.
i.e.,
enum hello = ['H', 'e', 'l', 'l', 'o', '\n'];
printf(hello);
will work.
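For comparison, here is a sketch of how the same enum can be printed without depending on any terminator, by passing the length explicitly. The function name printHello and the local s are assumptions added for illustration; this is one common idiom, not part of the proposal.

```d
import core.stdc.stdio : printf;

enum hello = ['H', 'e', 'l', 'l', 'o', '\n'];

// Hypothetical helper: prints the enum and returns the number of chars written.
int printHello()
{
    const(char)[] s = hello;   // runtime copy of the compile-time value
    // "%.*s" reads exactly s.length chars, so no trailing '\0' is required.
    return printf("%.*s", cast(int) s.length, s.ptr);
}

void main()
{
    printHello();
}
```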
Any value of type char[], which is generated at compile time, will not
have the trailing \0, and it will do an implicit dup (as current array
literals do).
char [] foo()
{
    return "abc";
}
char [] x = foo();
// x does not have a trailing \0, and it is implicitly duped, even
// though it was not declared with an array literal.
-------------------
So the difference between string literals and char array literals
would simply be that the latter are polysemous. There would be no
semantics associated with the form of the literal itself.
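A small sketch of what "polysemous" means here: the same char array literal can initialise a mutable, a const, or an immutable array, whereas a string literal has one fixed type. The function name sameLiteralThreeTypes is made up for illustration.

```d
// A string literal has a single type; a char array literal defaults to char[].
static assert(is(typeof("abc") == string));
static assert(is(typeof(['a', 'b', 'c']) == char[]));

// Hypothetical helper: one literal form, three different target types.
bool sameLiteralThreeTypes()
{
    char[]        m = ['a', 'b', 'c'];
    const(char)[] c = ['a', 'b', 'c'];
    string        i = ['a', 'b', 'c'];
    return m == i && c == i && i == "abc";
}

void main()
{
    assert(sameLiteralThreeTypes());
}
```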
We still have this oddity:
void foo(char qqq = 'b') {
    string x = "abc"; // trailing \0
    string y = ['a', 'b', 'c']; // trailing \0
    string z = ['a', qqq, 'c']; // no trailing \0
}
This is because we made the (IMHO mistaken) decision to allow variables
inside array literals.
This is the reason why I listed _compile time value_ in the requirement
for having a \0, rather than entirely basing it on the type.
We could fix that with a language change: an array literal which
contains a variable should not be of immutable type. It should be of
mutable type (or const, in the case where it contains other, immutable
values).
So char [] w = ['a', qqq, 'c']; should compile (it currently doesn't,
even though w is allocated on the heap).
But that's a separate proposal from the one I'm making here. I just need
a decision on the main proposal so that I can fix a pile of CTFE bugs.