No we should not support enum types derived from strings

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Sat May 8 02:05:42 UTC 2021


On 5/7/21 6:34 PM, Jon Degenhardt wrote:
> On Friday, 7 May 2021 at 15:24:42 UTC, Andrei Alexandrescu wrote:
>> 0. We put a String type in the standard library. It uses UTF8 inside 
>> and supports iteration by either bytes, UTF8, UTF16, or UTF32. It 
>> manages its own memory so no need for the GC. It disallows remote 
>> coupling across callers/callees. Case closed.
> 
> This is a bit orthogonal, but... An important characteristic of utf-8 
> arrays is that they are simultaneously a random access range of bytes 
> and an input range of utf-8 characters. For efficiency it's often 
> important to switch back and forth between these two interpretations.
> 
> `byLine` is one type of example, where a byte-oriented search is done 
> (e.g. with `memchr`), but afterward the representation array is accessed 
> as a utf-8 input range.
> 
> `byLine` implementations will usually work by iterating forward, but 
> there are random access use cases as well. For example, it is perfectly 
> reasonable to divide a utf-8 array roughly in half using byte 
> offsets, then search for the nearest utf-8 character boundary. 
> After this, both halves are treated as utf-8 input ranges, not random 
> access.
> 
> This switching between interpretations doesn't fit well with the current 
> distinction between `char[]` and `byte[]`. A number of algorithms in 
> Phobos operate on one or the other, but not both.
> 
> It'd be very useful to have an approach to utf-8 strings that enabled 
> switching interpretations easily, without casting.

String s;
func1(s.bytes);
func2(s.dchars);
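
To make that concrete, here is a rough sketch of how the two views could sit on one type, using what Phobos has today. The `String` struct, its `slice` method, and `boundaryAtOrAfter` are hypothetical names made up for this illustration and are not the proposed design; `representation` (std.string) and `byDchar` (std.utf) are existing Phobos functions. The helper steps forward to the next character boundary rather than the nearest one, which is the simplest variant of the split-in-half case.

```d
import std.algorithm.comparison : equal;
import std.string : representation;
import std.utf : byDchar;

// Hypothetical stand-in for the proposed String; not an actual library type.
struct String
{
    private string data; // UTF-8 code units

    this(string s) { data = s; }

    // Random-access view of the raw UTF-8 bytes.
    immutable(ubyte)[] bytes() { return data.representation; }

    // Input range of decoded code points.
    auto dchars() { return data.byDchar; }

    // Hypothetical helper: sub-string by byte offsets (no validation).
    String slice(size_t lo, size_t hi) { return String(data[lo .. hi]); }
}

// Advance i past UTF-8 continuation bytes (bit pattern 10xxxxxx),
// so the result is the first character boundary at or after i.
size_t boundaryAtOrAfter(const(ubyte)[] bytes, size_t i)
{
    while (i < bytes.length && (bytes[i] & 0xC0) == 0x80)
        ++i;
    return i;
}

unittest
{
    auto s = String("ab€cd"); // '€' occupies three bytes in UTF-8
    auto cut = boundaryAtOrAfter(s.bytes, s.bytes.length / 2);
    // Both halves are consumed as input ranges of code points.
    assert(s.slice(0, cut).dchars.equal("ab€"d));
    assert(s.slice(cut, s.bytes.length).dchars.equal("cd"d));
}
```

The caller searches by byte offset and then iterates by code point without writing a cast; whether the real type should expose byte-offset slicing at all is a separate design question.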



