Mutable enums

Mon Nov 14 13:28:52 PST 2011

On 11/14/2011 09:39 PM, Steven Schveighoffer wrote:
> On Mon, 14 Nov 2011 14:59:50 -0500, Timon Gehr <timon.gehr at gmx.ch> wrote:
>
>> On 11/14/2011 08:37 PM, Steven Schveighoffer wrote:
>>> On Mon, 14 Nov 2011 13:37:18 -0500, Timon Gehr <timon.gehr at gmx.ch>
>>> wrote:
>>>
>>>> On 11/14/2011 02:13 PM, Steven Schveighoffer wrote:
>>>>> On Mon, 14 Nov 2011 03:27:21 -0500, Timon Gehr <timon.gehr at gmx.ch>
>>>>> wrote:
>>>>>
>>>>>> On 11/14/2011 01:02 AM, bearophile wrote:
>>>>>>> Jonathan M Davis:
>>>>>>>
>>>>>>>>> import std.algorithm;
>>>>>>>>> void main() {
>>>>>>>>> enum a = [3, 1, 2];
>>>>>>>>> enum s = sort(a);
>>>>>>>>> assert(equal(a, [3, 1, 2]));
>>>>>>>>> assert(equal(s, [1, 2, 3]));
>>>>>>>>> }
>>>>>>>>
>>>>>>>> It's not a bug. Those are manifest constants. They're copy-pasted
>>>>>>>> into whatever code you use them in. So,
>>>>>>>>
>>>>>>>> enum a = [3, 1, 2];
>>>>>>>> enum s = sort(a);
>>>>>>>>
>>>>>>>> is equivalent to
>>>>>>>>
>>>>>>>> enum a = [3, 1, 2];
>>>>>>>> enum s = sort([3, 1, 2]);
>>>>>>>
>>>>>>> You are right, there's no DMD bug here. Yet, it's a bit
>>>>>>> surprising to
>>>>>>> sort in-place a "constant". I have to stop thinking of them as
>>>>>>> constants. I don't like this design of enums...
>>>>>>
>>>>>> It is the right design. Why should enum imply const or immutable? (or
>>>>>> inout, for that matter). They are completely orthogonal.
>>>>>
>>>>> There is definitely some debatable practice here for wherever enum is
>>>>> used on an array.
>>>>>
>>>>> Consider that:
>>>>>
>>>>> enum a = "hello";
>>>>>
>>>>> foo(a);
>>>>>
>>>>> Does not allocate heap memory, even though "hello" is a reference
>>>>> type.
>>>>>
>>>>> However:
>>>>>
>>>>> enum a = ['h', 'e', 'l', 'l', 'o'];
>>>>>
>>>>> foo(a);
>>>>>
>>>>> Allocates heap memory every time a is *used*. This is
>>>>> counter-intuitive: one uses enum to define things using the
>>>>> compiler, not during runtime. It's used to invoke CTFE, to avoid
>>>>> heap allocation. It's not a glorified #define macro.
>>>>>
>>>>> The deep issue here is not that enum is used as a manifest
>>>>> constant, but
>>>>> rather the fact that enum can map to a *function call* rather than the
>>>>> *result* of that function call.
>>>>>
>>>>> Would you say this should be acceptable?
>>>>>
>>>>> enum a = malloc(5);
>>>>>
>>>>> foo(a); // calls malloc(5) and passes the result to foo.
>>>>>
>>>>> If the [...] form is an acceptable enum, I contend that malloc
>>>>> should be
>>>>> acceptable as well.
>>>>>
>>>>
>>>> a indeed refers to the result of the evaluation of ['h', 'e', 'l',
>>>> 'l', 'o'].
>>>>
>>>> enum a = {return ['h', 'e', 'l', 'l', 'o'];}(); // also allocates on
>>>> every use
>>>>
>>>> But malloc is not CTFE-able, that is why it fails.
>>>
>>> You are comparing apples to oranges here. Whether it's CTFE-able or not
>>> has nothing to do with it, since the code is executed at runtime, not
>>> compile time.
>>>
>>
>> The code is executed at compile time. It is just that the value is
>> later created by allocating at runtime.
>>
>> enum foo = {writeln("foo"); return [1,2,3];}(); // fails, because
>> writeln is not ctfe-able.
>
> Look at the code generated for enum a = [1, 2, 3]. Each use of a is
> replaced with a call to _d_arrayliteral. There is no CTFE going on.
>

There is some ctfe going on, but the compiler has to allocate the result 
anew every time it is used. So there is also some runtime overhead.

To make my point clearer:

int foo(){return 100;}
enum a = [foo(), foo(), foo()]; // a is the array literal [100, 100, 100];

void main(){
     // this does *not* call foo. But it allocates a new array literal
     auto x = a;
}
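
The allocation on each use can be observed directly. The following is only a minimal sketch, assuming a current DMD and the default GC-backed array literals; the helper name usesAreIndependent is made up for illustration:

```d
enum a = [3, 1, 2]; // manifest constant: no storage of its own

// Returns true if two uses of `a` yield independent arrays.
bool usesAreIndependent()
{
    auto x = a; // a fresh int[] is allocated here
    auto y = a; // and another one here
    x[0] = 42;  // mutates only x's copy
    return x.ptr !is y.ptr && y[0] == 3;
}

void main()
{
    assert(usesAreIndependent());
}
```

Writing through x leaves y, and every later use of a, untouched, which is why the mutable type is not unsound.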

>>
>>
>>>>
>>>>
>>>>> My view is that enum should only be acceptable on data that is
>>>>> immutable, or implicitly cast to immutable,
>>>>
>>>> Too restrictive imho.
>>>
>>> It allows the compiler to evaluate the enum at compile time, and store
>>> any referenced data in ROM, avoiding frequent heap allocations (similar
>>> to string literals).
>>>
>>> IMO, type freedom is lower on the priority list than performance.
>>>
>>> You can already define a symbol that calls arbitrary code at runtime:
>>>
>>> @property int[] a() { return [3, 1, 2];}
>>>
>>> Why should we muddy enum's goals with also being able to call functions
>>> during runtime?
>>>
>>
>> As I said, I would not miss the capability of enums to create mutable
>> arrays a lot. Usually you don't want that behavior, and explicitly
>> .dup-ing is just fine.
>>
>> But I think it is a bit exaggerated to say enums can call functions at
>> runtime. It is up to the compiler how to implement the array allocation.
>
> The compiler has no choice. It must develop the array at runtime, or
> else the type allows one to modify the source value (just like in D1 how
> you could modify string literals). In essence, the compiler is creating
> a new copy for every usage (and building it from scratch).
>

That is a quality of implementation issue. The language semantics do not 
require that.

>>
>>>>
>>>>> and should *never* map to an
>>>>> expression that calls a function during runtime.
>>>>>
>>>>
>>>> Well, I would not miss that at all.
>>>> But being stored as enum should not imply restrictions on type
>>>> qualifiers.
>>>
>>> The restrictions are required in order to avoid calling runtime
>>> functions for enum usage. Without the restrictions, you must necessarily
>>> call runtime functions for any reference-based types (to avoid modifying
>>> the original).
>>
>> Yes, I don't need that. But I don't really want compile time
>> capabilities hampered.
>>
>> enum a = [2,1,4];
>> enum b = sort(a); // should be fine.
>
> I was actually surprised that this compiles. But this should not be a
> problem even if a was immutable(int)[]. sort should be able to create a
> copy of an immutable array in order to sort it. The performance hit
> doesn't matter, because this should all be done at compile time.
>

It does not, but explicitly calling .dup works:
immutable x = [3,2,1];
immutable y = sort(x.dup);
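
Spelled out as a complete program, the .dup approach might look like the sketch below. The helper sortedCopy is a hypothetical name, not something Phobos provides, and the sorting happens at runtime here:

```d
import std.algorithm : sort;

immutable x = [3, 2, 1];

// Hypothetical helper: sorts a mutable copy and leaves the source alone.
int[] sortedCopy(const(int)[] src)
{
    auto c = src.dup; // explicit, visible allocation
    sort(c);          // in-place sort of the copy
    return c;
}

void main()
{
    assert(sortedCopy(x) == [1, 2, 3]);
    assert(x == [3, 2, 1]); // the immutable original is unchanged
}
```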

>>>
>>> Note that I'm not saying literals in general should not trigger heap
>>> allocations, I'm saying assigning such literals to enums should require
>>> unrestricted copying without runtime function calls.
>>
>> Yes, I get that. And I think it makes sense. But I am not (yet?)
>> convinced that the solution to make all enums non-assignable,
>> head-mutable and tail-immutable is satisfying.
>
> When I see an enum, I think "evaluated at compile time". No matter how
> complex it is to build that value, it should be built at compile-time
> and *used* at runtime. No complex function calls should be done at
> runtime, an enum is a value.

Exactly. Therefore you assign from it by copying it.

Compare to a static array:

int[10] x = [1,2,3,4,5,6,7,8,9,0];

x still needs to be initialized at runtime.

>
> I did an interesting little test:
>
> import std.algorithm;
> import std.stdio;
>
> int[] foo(int[] x)
> {
> return x ~ x;
> }
> enum a = [3, 1, 2];
> enum b = sort(foo(foo(foo(a))));
>
> void main()
> {
> writeln(b);
> }
>
> Want to see the assembly generated for the writeln call?
>
> push 018h
> mov EAX,offset FLAT:_D11TypeInfo_Ai6__initZ@SYM32
> push EAX
> call _d_arrayliteralTX@PC32
> add ESP,8
> mov ECX,1
> mov [EAX],ECX
> mov 4[EAX],ECX
> mov 8[EAX],ECX
> mov 0Ch[EAX],ECX
> mov 010h[EAX],ECX
> mov 014h[EAX],ECX
> mov 018h[EAX],ECX
> mov 01Ch[EAX],ECX
> mov EDX,2
> mov 020h[EAX],EDX
> mov 024h[EAX],EDX
> mov 028h[EAX],EDX
> mov 02Ch[EAX],EDX
> mov 030h[EAX],EDX
> mov 034h[EAX],EDX
> mov 038h[EAX],EDX
> mov 03Ch[EAX],EDX
> mov EBX,3
> mov 040h[EAX],EBX
> mov 044h[EAX],EBX
> mov 048h[EAX],EBX
> mov 04Ch[EAX],EBX
> mov 050h[EAX],EBX
> mov 054h[EAX],EBX
> mov 058h[EAX],EBX
> mov 05Ch[EAX],EBX
> mov ECX,EAX
> mov EAX,018h
> mov -8[EBP],EAX
> mov -4[EBP],ECX
> mov EDX,-4[EBP]
> mov EAX,-8[EBP]
> push EDX
> push EAX
> call _D3std5stdio76__T7writelnTS3std5range37__T11SortedRangeTAiVAyaa5_61203c2062Z11SortedRangeZ7writelnFS3std5range37__T11SortedRangeTAiVAyaa5_61203c2062Z11SortedRangeZv@PC32
>
>
>
> Really? That's a better solution than using ROM space to store the
> result of the expression as evaluated at compile time? The worst part is
> that this will be used *EVERY TIME* I use the enum b (even if I pass it
> as a const array).
>

That just tells us that DMD sucks at generating code for array literals.

This generates identical code:

import std.stdio;

void main() {
     writeln([1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 
3, 3, 3, 3, 3]);
}

You don't need enums for that.

What it actually should generate for both our examples is more like the following:

import std.stdio;

immutable _somewhereinrom = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 
2, 2, 3, 3, 3, 3, 3, 3, 3, 3];

void main() {
     writeln(_somewhereinrom.dup);
}

push   %ebp
mov    %esp,%ebp
pushl  0x8097184
pushl  0x8097180
mov    $0x80975c8,%eax
push   %eax
call   8079470 <_adDupT>
add    $0xc,%esp
push   %edx
push   %eax
call   807041c <_D3std5stdio15__T7writelnTAiZ7writelnFAiZv>
xor    %eax,%eax
pop    %ebp
ret

If writeln were actually const-correct, the compiler could even get 
rid of the allocation.
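
To illustrate why const-correctness matters here: a consumer that takes const(int)[] can read immutable data in place, so no copy is needed. This is a sketch only; table and sum are illustrative names, and whether the data really ends up in read-only memory depends on the compiler and platform:

```d
immutable int[] table = [1, 1, 2, 2, 3, 3];

// Accepts mutable, const and immutable slices alike.
int sum(const(int)[] xs)
{
    int total = 0;
    foreach (v; xs)
        total += v;
    return total;
}

void main()
{
    assert(sum(table) == 12); // no .dup, no allocation required

    auto copy = table.dup;    // a copy is made only where mutation is wanted
    copy[0] = 10;
    assert(sum(copy) == 21);
    assert(table[0] == 1);    // the original is unaffected
}
```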

This is not about enums that much, it is about array literals.

The fact that stack static array initialization allocates is one of DMD's 
bigger warts.

Look at the ridiculous code generated for the following example:

import std.stdio;

void main() {
     int[24] x = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 
3, 3, 3, 3, 3, 3];
     writeln(x);
}

>>
>>>
>>> I don't think you would miss this as much as you think. Assigning a
>>> non-immutable array from an immutable one is as easy as adding a .dup,
>>> and then the code is more clear that an allocation is taking place.
>>>
>>
>> It would be somewhat odd.
>>
>> enum a = [2,1,4];
>> enum b = sort(a.dup); // what exactly is that 'a.dup' thing?
>
> I don't think .dup should be necessary at compile time. Creating a
> sorted copy of an immutable array should be quite doable.
>

I agree, Phobos won't currently do it though.

>> enum c = a.dup; // does this implicitly convert to immutable, or what
>> happens here?
>
> Either a compile error (cannot store mutable reference data as an enum),
> or an implicit conversion back to immutable.
>
>> enum d = sort(c); // does not work?
>>
>> enum e = foo(a.dup, b.dup, c.dup, d.dup);
>
> Again, I don't think .dup would be used for dependent enums, I was
> rather thinking dup would be used where you need a mutable copy of an
> array during enum usage in normal code.
>

But if the type of a,b,c,d is immutable(int)[] and foo is a function 
that takes 4 int[]s then the .dup's are necessary to pass type checking.
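
A reduced sketch of that type-checking point, with a hypothetical two-parameter combine standing in for the four-parameter foo above:

```d
// Hypothetical stand-in for foo, reduced to two int[] parameters.
int[] combine(int[] p, int[] q)
{
    return p ~ q;
}

void main()
{
    immutable(int)[] a = [2, 1, 4];
    immutable(int)[] b = [1, 2, 4];

    // Passing the immutable slices directly is rejected by the type checker.
    static assert(!__traits(compiles, combine(a, b)));

    // With explicit mutable copies the call type-checks.
    auto r = combine(a.dup, b.dup);
    assert(r == [2, 1, 4, 1, 2, 4]);
}
```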