No we should not support enum types derived from strings
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Sat May 8 02:05:42 UTC 2021
On 5/7/21 6:34 PM, Jon Degenhardt wrote:
> On Friday, 7 May 2021 at 15:24:42 UTC, Andrei Alexandrescu wrote:
>> 0. We put a String type in the standard library. It uses UTF8 inside
>> and supports iteration by either bytes, UTF8, UTF16, or UTF32. It
>> manages its own memory so no need for the GC. It disallows remote
>> coupling across callers/callees. Case closed.
>
> This is a bit orthogonal, but... An important characteristic of utf-8
> arrays is that they are simultaneously a random access range of bytes
> and an input range of utf-8 characters. For efficiency it's often
> important to switch back and forth between these two interpretations.
>
> `byLine` is one type of example, where a byte oriented search is done
> (e.g. with `memchr`), but afterward the representation array is accessed
> as a utf-8 input range.
>
> `byLine` implementations will usually work by iterating forward, but
> there are random access use cases as well. For example, it is perfectly
> reasonable to divide a utf-8 array roughly in half using byte
> offsets, then search for the nearest utf-8 character boundary.
> After this, both halves are treated as utf-8 input ranges, not random
> access.
>
> This switching between interpretations doesn't fit well with the current
> distinction between `char[]` and `byte[]`. A number of algorithms in
> Phobos operate on one or the other, but not both.
>
> It'd be very useful to have an approach to utf-8 strings that enabled
> switching interpretations easily, without casting.
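
For reference, the halving case quoted above can be sketched against today's Phobos. This is only an illustration, not anyone's proposed API: `nextBoundary` is a hypothetical helper, and the sample text is arbitrary. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, and it uses `std.string.representation` to get the byte view and `std.utf.byDchar` to get the decoded view.

```d
import std.range : walkLength;
import std.string : representation;
import std.utf : byDchar;

/// Hypothetical helper: index of the first UTF-8 sequence start at or
/// after `i`. Continuation bytes match the bit pattern 10xxxxxx.
size_t nextBoundary(const(ubyte)[] bytes, size_t i)
{
    while (i < bytes.length && (bytes[i] & 0xC0) == 0x80)
        ++i;
    return i;
}

void main()
{
    string s = "日本語"; // 3 code points, 9 bytes

    // Byte view: random access, no decoding.
    immutable(ubyte)[] bytes = s.representation;

    // Rough midpoint by byte offset (4 lands inside the second
    // character), then snap forward to a character boundary.
    size_t mid = nextBoundary(bytes, bytes.length / 2);
    assert(mid == 6);

    // Slicing a string is by code unit, so both halves are valid UTF-8.
    string left = s[0 .. mid];
    string right = s[mid .. $];
    assert(left == "日本" && right == "語");

    // Decoded view: each half is now consumed as an input range.
    assert(left.byDchar.walkLength == 2);
    assert(right.byDchar.walkLength == 1);
}
```

Note that the switch between views still goes through `representation` on one side and string slicing on the other, which is the friction described in the quoted message.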
String s;
func1(s.bytes);
func2(s.dchars);
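
A minimal sketch of how those two calls could type-check, under loud assumptions: the `String` struct below is hypothetical, it simply wraps an immutable UTF-8 slice, and it ignores the memory-management part of the proposal entirely. The bodies of `func1` and `func2` are placeholders chosen only to show that one takes bytes and the other takes a range of code points.

```d
import std.range : walkLength;
import std.string : representation;
import std.utf : byDchar;

/// Hypothetical String: UTF-8 storage with explicit views.
/// Memory ownership is deliberately left out of this sketch.
struct String
{
    private immutable(ubyte)[] data; // UTF-8 code units

    this(string s) { data = s.representation; }

    /// Random-access view of the raw bytes.
    immutable(ubyte)[] bytes() { return data; }

    /// Input range of decoded code points.
    auto dchars() { return (cast(string) data).byDchar; }
}

/// Placeholder byte-oriented consumer.
size_t func1(const(ubyte)[] b) { return b.length; }

/// Placeholder code-point-oriented consumer.
size_t func2(R)(R r) { return r.walkLength; }

void main()
{
    auto s = String("日本語");
    assert(func1(s.bytes) == 9);  // nine UTF-8 bytes
    assert(func2(s.dchars) == 3); // three code points
}
```

The caller picks the interpretation at the call site and no cast appears in user code; the one cast lives inside the type.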