No we should not support enum types derived from strings

Jon Degenhardt jond at noreply.com
Fri May 7 22:34:19 UTC 2021


On Friday, 7 May 2021 at 15:24:42 UTC, Andrei Alexandrescu wrote:
> 0. We put a String type in the standard library. It uses UTF8 
> inside and supports iteration by either bytes, UTF8, UTF16, or 
> UTF32. It manages its own memory so no need for the GC. It 
> disallows remote coupling across callers/callees. Case closed.

This is a bit orthogonal, but... An important characteristic of 
utf-8 arrays is that they are simultaneously a random access 
range of bytes and an input range of utf-8 characters. For 
efficiency it's often important to switch back and forth between 
these two interpretations.

`byLine` is one type of example, where a byte-oriented search is 
done (e.g. with `memchr`), but afterward the representation array 
is accessed as a utf-8 input range.
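
A minimal sketch of that pattern in D, not the actual `byLine` implementation. The function names `firstLine` and `codePoints` are hypothetical; `representation`, `countUntil`, `byDchar` and `walkLength` are existing Phobos functions. `countUntil` stands in here for a `memchr`-style byte search.

```d
import std.algorithm.searching : countUntil;
import std.range : walkLength;
import std.string : representation;
import std.utf : byDchar;

/// Returns the first line of `text` without its terminator,
/// located with a byte-level search. (Hypothetical helper.)
string firstLine(string text)
{
    // Same memory, viewed as immutable(ubyte)[]: a random access range of bytes.
    immutable(ubyte)[] bytes = text.representation;

    // Byte-oriented search. This is safe for '\n' (0x0A) because that value
    // never occurs inside a multi-byte utf-8 sequence.
    enum ubyte newline = 0x0A;
    immutable pos = bytes.countUntil(newline);
    immutable end = pos < 0 ? bytes.length : cast(size_t) pos;

    // Slicing the original string by byte offset switches back to the char view.
    return text[0 .. end];
}

/// Counts code points by decoding the line as a utf-8 input range.
/// (Hypothetical helper.)
size_t codePoints(string line)
{
    return line.byDchar.walkLength;
}
```

With the input "héllo\nworld", `firstLine` yields "héllo", which is 6 bytes but 5 code points.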

`byLine` implementations will usually work by iterating forward, 
but there are random access use cases as well. For example, it is 
perfectly reasonable to divide a utf-8 array roughly in half 
using byte offsets, then search for the nearest utf-8 
character boundary. After this, both halves are treated as 
utf-8 input ranges, not random access.
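
The split can be sketched as follows. This is an illustration under stated assumptions: the input is valid utf-8, and for simplicity the scan only moves forward to the next boundary rather than checking both directions for the nearest one. `isContinuation` and `splitNearMiddle` are hypothetical names.

```d
import std.string : representation;
import std.typecons : Tuple, tuple;

/// True if `b` is a utf-8 continuation byte (bit pattern 10xxxxxx).
bool isContinuation(ubyte b)
{
    return (b & 0xC0) == 0x80;
}

/// Splits `text` at the first character boundary at or after the
/// byte midpoint. Assumes `text` is valid utf-8. (Hypothetical helper.)
Tuple!(string, string) splitNearMiddle(string text)
{
    // Random access by byte offset.
    immutable(ubyte)[] bytes = text.representation;
    size_t mid = bytes.length / 2;

    // Skip continuation bytes until the start of a code point, or the end.
    while (mid < bytes.length && isContinuation(bytes[mid]))
        ++mid;

    // Both halves are ordinary strings again and can be decoded as utf-8.
    return tuple(text[0 .. mid], text[mid .. $]);
}
```

For "ab€cd" (7 bytes) the midpoint lands inside the three-byte "€", so the split moves forward and produces "ab€" and "cd".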

This switching between interpretations doesn't fit well with 
the current distinction between `char[]` and `byte[]`. A number of 
algorithms in Phobos operate on one or the other, but not both.

It'd be very useful to have an approach to utf-8 strings that 
enabled switching interpretations easily, without casting.
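
For comparison, a sketch of the round trip using the existing Phobos helpers `representation` and `assumeUTF`. It avoids a literal `cast`, but each switch is still an explicit conversion between two distinct types. `dropBytes` is a hypothetical name, and the caller is assumed to pass an offset that falls on a character boundary.

```d
import std.string : assumeUTF, representation;

/// Drops `n` leading bytes using the byte view, then returns to the
/// string view. Assumes `n` is in bounds and lies on a utf-8 character
/// boundary. (Hypothetical helper.)
string dropBytes(string text, size_t n)
{
    // string -> immutable(ubyte)[]
    immutable(ubyte)[] bytes = text.representation;
    bytes = bytes[n .. $];

    // immutable(ubyte)[] -> string; the result is only meaningful
    // if the bytes are valid utf-8.
    return bytes.assumeUTF;
}
```

For example, `dropBytes("€uro", 3)` skips the three bytes of "€" and returns "uro".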

--Jon


More information about the Digitalmars-d mailing list