No we should not support enum types derived from strings
Jon Degenhardt
jond at noreply.com
Fri May 7 22:34:19 UTC 2021
On Friday, 7 May 2021 at 15:24:42 UTC, Andrei Alexandrescu wrote:
> 0. We put a String type in the standard library. It uses UTF8
> inside and supports iteration by either bytes, UTF8, UTF16, or
> UTF32. It manages its own memory so no need for the GC. It
> disallows remote coupling across callers/callees. Case closed.
This is a bit orthogonal, but... An important characteristic of
utf-8 arrays is that they are simultaneously a random access
range of bytes and an input range of utf-8 characters. For
efficiency it's often important to switch back and forth between
these two interpretations.
`byLine` is one example, where a byte-oriented search is
done (e.g. with `memchr`), but afterward the representation array
is accessed as a utf-8 input range.
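A minimal sketch of that pattern in D, assuming valid utf-8 input. `firstLine` is a hypothetical helper, not how Phobos implements `byLine`, and `countUntil` over the byte view stands in for the `memchr` call:

```d
import std.algorithm.searching : countUntil;
import std.range : walkLength;
import std.string : representation;

// Hypothetical helper: returns the first line of `text` without its
// terminator, or all of `text` if it contains no newline.
string firstLine(string text)
{
    // Byte view: immutable(ubyte)[], random access, no decoding.
    auto bytes = text.representation;

    // Byte-oriented search (stand-in for memchr).
    auto nl = bytes.countUntil(cast(ubyte) '\n');

    // '\n' is ASCII and never occurs inside a multi-byte sequence,
    // so slicing the string at this offset stays on a boundary.
    return nl < 0 ? text : text[0 .. cast(size_t) nl];
}

void main()
{
    // \u00E9 and \u00F6 each encode to two bytes in utf-8.
    auto line = firstLine("h\u00E9llo w\u00F6rld\nsecond line\n");

    assert(line == "h\u00E9llo w\u00F6rld");
    assert(line.length == 13);     // code units (bytes)
    assert(line.walkLength == 11); // code points (decoded)
}
```

The search runs on the `ubyte` view, while the result is handed back as a `string`, so the caller iterates it as utf-8 again.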
`byLine` implementations will usually work by iterating forward,
but there are random access use cases as well. For example, it is
perfectly reasonable to divide a utf-8 array roughly in half
using byte offsets, then search for the nearest utf-8
character boundary. After this, both halves are treated as
utf-8 input ranges, not random access.
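A rough sketch of the split in D, again assuming the input is valid utf-8. `nextBoundary` is a hypothetical helper, and it only scans forward from the midpoint rather than picking the nearest boundary in either direction:

```d
import std.string : representation;
import std.utf : validate;

// Hypothetical helper: first character boundary at or after `pos`.
size_t nextBoundary(const(ubyte)[] bytes, size_t pos)
{
    // utf-8 continuation bytes have the bit pattern 10xxxxxx.
    while (pos < bytes.length && (bytes[pos] & 0xC0) == 0x80)
        ++pos;
    return pos;
}

void main()
{
    // 8 bytes; \u00E9 occupies byte offsets 3 and 4.
    string text = "abc\u00E9def";

    // Random access on the byte view: the midpoint, offset 4,
    // falls inside the two-byte sequence.
    auto bytes = text.representation;
    size_t mid = nextBoundary(bytes, bytes.length / 2);
    assert(mid == 5);

    // Both halves go back to being utf-8 input ranges.
    string left = text[0 .. mid];
    string right = text[mid .. $];
    validate(left);  // throws UTFException on invalid utf-8
    validate(right);
    assert(left == "abc\u00E9" && right == "def");
}
```

Since a utf-8 sequence is at most four bytes long, the scan skips at most three continuation bytes on valid input.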
This switching between interpretations doesn't fit well with
the current distinction between `char[]` and `byte[]`. A number of
algorithms in Phobos operate on one or the other, but not both.
It'd be very useful to have an approach to utf-8 strings that
enabled switching interpretations easily, without casting.
--Jon
More information about the Digitalmars-d
mailing list