No we should not support enum types derived from strings
Jon Degenhardt
jond at noreply.com
Sat May 8 03:49:56 UTC 2021
On Saturday, 8 May 2021 at 02:05:42 UTC, Andrei Alexandrescu wrote:
> On 5/7/21 6:34 PM, Jon Degenhardt wrote:
>> It'd be very useful to have an approach to utf-8 strings that
>> enabled switching interpretations easily, without casting.
>
> String s;
> func1(s.bytes);
> func2(s.dchars);
That's not quite what I was getting at, but that's my fault: a
hastily written message that muddled a couple of concepts. Sorry
about that; I need to write up a better description. There are
two underlying thoughts.
One is being able to convert from a random access byte array to a
`char` input range (e.g. via `byUTF`), do something with it (e.g.
`popFront`), then convert that range back to a random access byte
range. This is logically doable because both are views on the
same physical array. However, once something is wrapped as an
input range there's no simple way to convert it back to a random
access range.
This first one strikes me as potentially challenging because this
dual view on the underlying data is not common, so there's not a
lot of incentive to support it as a general concept.
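To make the mismatch concrete, here is a minimal sketch using
`std.utf.byUTF` and `std.string.representation` (a sketch of the
general issue, not of any proposed API):

```d
import std.range.primitives : isInputRange, isRandomAccessRange;
import std.string : representation;
import std.utf : byUTF;

void main()
{
    string s = "héllo";

    // The raw code units are a random access range...
    static assert(isRandomAccessRange!(typeof(s.representation)));

    // ...but the decoded view produced by byUTF is only an input
    // (forward) range, with no length or indexing.
    auto decoded = s.byUTF!dchar;
    static assert(isInputRange!(typeof(decoded)));
    static assert(!isRandomAccessRange!(typeof(decoded)));

    decoded.popFront();   // consume the 'h'

    // There is no built-in way to recover the remaining bytes from
    // `decoded`; with the raw array you would simply slice:
    auto rest = s[1 .. $];
    assert(rest == "éllo");
}
```

Both views sit on the same physical array, so the round trip is
logically possible; the range wrappers just don't expose it.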
The second issue is more about current Phobos algorithms that
specialize their implementations depending on whether the
argument is a `char[]` or a `byte[]`. This normally involves
conditioning on `isSomeString` or `isSomeChar`: `char[]` / `char`
pass these tests, `byte[]` / `byte` do not. The cases I remember
are ones where the string form was specialized to have better
performance than the byte form. Search searching.d for
`isSomeString` to see this.
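The trait split can be shown directly with `std.traits`:

```d
import std.traits : isSomeChar, isSomeString;

void main()
{
    // char-based types satisfy the string traits...
    static assert( isSomeString!(char[]));
    static assert( isSomeChar!char);

    // ...but byte/ubyte-based types do not, so they fall through to
    // the generic code paths rather than the specialized string ones.
    static assert(!isSomeString!(byte[]));
    static assert(!isSomeChar!byte);
    static assert(!isSomeString!(ubyte[]));
    static assert(!isSomeChar!ubyte);
}
```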
The trouble with this is that at the application level it can be
necessary to use a byte array when working with a number of
facilities. This often involves I/O, e.g. reading fixed size
blocks from an input stream (`File.byChunk`), which operates on
`ubyte[]` arrays. A `ubyte[]` can be cast to `char[]`, but this
can run afoul of autodecoding related routines that expect
correctly formed utf-8 characters. When reading fixed size
buffers, the start and end of the buffer will often not fall on
utf-8 boundaries, so examining the bytes is necessary to handle
these cases. (And input streams may contain corrupt utf-8
sequences.)
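As an illustration of the byte-level work involved, here is a
hypothetical helper (`completePrefixLength` is my name, not a
Phobos function) that finds the longest prefix of a chunk ending
on a utf-8 boundary, by scanning back past continuation bytes
(`0b10xxxxxx`):

```d
// Hypothetical helper: given a chunk that may end mid-character,
// return the length of the longest prefix ending on a utf-8 boundary.
size_t completePrefixLength(const(ubyte)[] chunk)
{
    if (chunk.length == 0) return 0;

    // Scan backwards past continuation bytes (0b10xxxxxx).
    size_t i = chunk.length - 1;
    while (i > 0 && (chunk[i] & 0xC0) == 0x80) i--;

    // Expected sequence length, from the lead byte's high bits.
    immutable ubyte lead = chunk[i];
    size_t need = lead < 0x80            ? 1
                : (lead & 0xE0) == 0xC0  ? 2
                : (lead & 0xF0) == 0xE0  ? 3
                : (lead & 0xF8) == 0xF0  ? 4
                : 1; // invalid lead byte: leave it for validation

    return (chunk.length - i) >= need ? chunk.length : i;
}

void main()
{
    // "é" is 0xC3 0xA9; chop its second byte off the end of the chunk.
    auto chunk = cast(ubyte[]) "abé".dup;             // 4 bytes total
    assert(completePrefixLength(chunk[0 .. 3]) == 2); // "ab" + dangling 0xC3
    assert(completePrefixLength(chunk) == 4);         // fully complete
}
```

The dangling bytes would typically be carried over and prepended to
the next chunk read from the stream. None of this is available to
`char[]`-specialized code paths that assume well-formed input.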
I know the above is still not an adequate description. At some
point I'll try to write up something more compelling.
--Jon
More information about the Digitalmars-d mailing list