[Issue 8660] New: Unclear semantics of array literals of char type, vs string literals

Don Clugston dac at nospam.com
Fri Sep 14 08:00:41 PDT 2012


On 14/09/12 14:50, monarch_dodra wrote:
> On Friday, 14 September 2012 at 11:28:04 UTC, Don wrote:
>> --- Comment #0 from Don <clugdbug at yahoo.com.au> 2012-09-14 04:28:17 PDT ---
>> Array literals of char type have completely different semantics from string
>> literals. In module scope:
>>
>> char[] x = ['a'];  // OK -- array literals can have an implicit .dup
>> char[] y = "b";    // illegal
>
> I think this is the normal behavior actually. When you write "char[] x =
> ['a'];", you are not actually "newing" (or "dup"-ing) any data. You are
> just letting x point to a stack allocated array of chars.

I don't think you've looked at the compiler source code...
The dup is in e2ir.c:4820.

> So the
> assignment is legal (but kind of unsafe actually, if you ever leak x).

Yes, it's legal. In my view it is a design mistake in the language.
The issue now is how to minimize the damage from it.
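
For readers following along, here is a minimal D sketch of the two differences under discussion, assuming a current D compiler. The function names are hypothetical and exist only for illustration; the sketch shows language-level behaviour, not the compiler internals.

```d
import std.stdio;

// Hypothetical helper names, for illustration only.

// An array literal of chars can initialize a mutable char[] directly.
char[] fromArrayLiteral()
{
    char[] x = ['a'];
    x[0] = 'z';            // writable: x refers to mutable memory
    return x;
}

// A string literal has type immutable(char)[], so it needs an explicit copy.
char[] fromStringLiteral()
{
    // char[] y = "b";     // rejected: string does not convert to char[]
    char[] y = "b".dup;    // explicit copy into mutable memory
    y[0] = 'q';
    return y;
}

// The terminating \0 of a string literal sits just past the end of the
// slice and is not counted in .length.
bool literalIsZeroTerminated()
{
    string s = "abc";
    return s.length == 3 && s.ptr[3] == '\0';
}

void main()
{
    writeln(fromArrayLiteral());
    writeln(fromStringLiteral());
    writeln(literalIsZeroTerminated());
}
```

The read one past the end in literalIsZeroTerminated is only meaningful because s points directly at literal data; nothing here says whether the terminator survives concatenation, which is the gap in the spec noted in the report.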


> On the other hand, you can't bind y to an array of immutable chars, as
> that would subvert the type system.
>
> This, on the other hand, is legal.
> char[] y = "b".dup;
>
> I do not know how to initialize a char[] on the stack though (apart
> from writing ['h', 'e', 'l', ... ]). If UTF-8 also gets involved, then I
> don't know of any workaround.
>
> I think a good solution would be to request the "m" prefix for literals,
> which would initialize them as "mutable":
> x = m"some mutable string";
>
>> A second difference is that string literals have a trailing \0. It's important
>> for compatibility with C, but is barely mentioned in the spec. The spec does
>> not state if the trailing \0 is still present after operations like
>> concatenation.
>>
>> CTFE can use either, but it has to choose one. This leads to odd effects:
>>
>> string foo(bool b) {
>>     string c = ['a'];
>>     string d = "a";
>>     if (b)
>>         return c ~ c;
>>     else
>>         return c ~ d;
>> }
>>
>> char[] x = foo(true);   // ok
>> char[] y = foo(false);  // rejected!
>>
>> This is really bizarre because at run time, there is no difference between
>> foo(true) and foo(false). They both return a slice of something allocated on
>> the heap. I think x = foo(true) should be rejected as well, it has an implicit
>> cast from immutable to mutable.
>
> Good point. For anybody reading though, the actual code example should be
> enum char[] x = foo(true);   // ok
> enum char[] y = foo(false);  // rejected!

No, it should not.
The code example was correct. These are static variables.
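
The run-time half of the claim in the report can be checked with a small sketch. It assumes a current D compiler; foo is copied from the report, and main is a hypothetical driver added for illustration. It only exercises run-time behaviour and says nothing about how the static initializers above are evaluated in CTFE.

```d
import std.stdio;

// foo is taken from the report; main below is a hypothetical driver.
string foo(bool b)
{
    string c = ['a'];   // array literal, typed as immutable here
    string d = "a";     // string literal
    if (b)
        return c ~ c;
    else
        return c ~ d;
}

void main()
{
    string p = foo(true);
    string q = foo(false);
    assert(p == q);           // identical contents at run time
    assert(p.ptr !is q.ptr);  // each ~ produced its own array
    writeln(p, " ", q);
}
```

At run time the two branches are indistinguishable apart from the address of the result, which is why the differing treatment of the two static initializers looks inconsistent.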

>
>> I think the best way to clean up this mess would be to convert char[] array
>> literals into string literals whenever possible. This would mean that string
>> literals may occasionally be of *mutable* type! This would mean that whenever
>> they are assigned to a mutable variable, an implicit .dup gets added (just as
>> happens now with array literals). The trailing zero would not be duped.
>> i.e.:
>> A string literal of mutable type should behave the way a char[] array literal
>> behaves now.
>> A char[] array literal of immutable type should behave the way a string literal
>> does now.
>
> I think this would work with my "m" suggestion

Not necessary. This is only a question about what happens with the 
compiler internals.


More information about the Digitalmars-d-bugs mailing list