Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators

Fri Jun 22 18:15:22 UTC 2018

On 6/22/18 1:41 PM, Stefan Koch wrote:
> On Friday, 22 June 2018 at 00:50:05 UTC, Steven Schveighoffer wrote:
>> On 6/21/18 6:46 PM, Per Nordlöw wrote:
>>> I've discovered the annoying fact that std.conv.to doesn't scale for 
>>> enum to string conversion when the enum has hundreds of members. This 
>>> because of a call to `NoDuplicates` which has (at least) O(n*log(n) 
>>> time and space complexity.
>>>
>>> So I've come up with
>>>
>>>
>>> /** Faster implementation of `std.conv.to`.
>>>   */
>>> string toString(T)(T value) @safe pure nothrow @nogc
>>> if (is(T == enum))
>>> {
>>>      final switch (value)
>>>      {
>>>          static foreach (member; __traits(allMembers, T))
>>>          {
>>>          case __traits(getMember, T, member):
>>>              return member;
>>>          }
>>>      }
>>> }
>>>
>>> ///
>>> @safe pure nothrow @nogc unittest
>>> {
>>>      enum E { unknown, x, y, z, }
>>>      assert(E.x.toString == "x");
>>>      assert(E.y.toString == "y");
>>>      assert(E.z.toString == "z");
>>> }
>>>
>>>
>>> The question know is: How do I make this support enums with 
>>> enumerator aliases without needing to call `NoDuplicates`?
>>
>> You can't. That's why it's in there. And I can't think of a better 
>> algorithm than O(nlgn). That's what it would cost to sort anyway.
>>
>> The sucky thing is, the compiler is *already* doing a sort on the 
>> items in the switch, and *already* doing the duplicate check. It would 
>> be cool to be able to leverage this mechanism to avoid the library 
>> solution, but I don't know how we can do that, as the semantics for 
>> switch are well defined, and there's no other way to hook this builtin 
>> functionality.
>>
>> One thing that may be worthwhile is checking to see if a CTFE solution 
>> is better than a template solution. In other words:
>>
>> string[] dedup(string[] identifiers) { ... }
>>
>> enum items = dedup(__traits(allMembers, T));
>>
>> static foreach(member; items) ...
>>
>> Just avoiding all the template symbol generation may make it worth it, 
>> even if the dedup function is O(n^2) complexity.
>>
> 
> 
> I do have my own ctfe utils for this
> {https://forum.dlang.org/post/bfnwstkafhfgihavtzsz@forum.dlang.org}, 
> however without dedup (because I'd rather be alerted when it happens) 
> what you have to do is to sort the enum values.
> (I'd suggest a non-recursive mergesort.)

How will that perform in CTFE? I'm concerned about swapping values 
making it allocate new arrays all over the place.

> However if there are no duplicates there's no need to deduplicate, the 
> reason it is deduplicated it because the switch_statement forbids 
> duplicates.

If you can sort, deduplicating is as easy as checking if the value is 
the same as the previous. So really, we just need a good compile-time 
sort for strings.

> I may add that this performs very well with newCTFE I don't know about 
> the old engine, but I'd assume even there it's faster.
I think your algorithm is roughly the same as Nordlow's. Basically you 
are using the compiler to sort the array (as it does anyway for a switch 
statement).

-Steve