No we should not support enum types derived from strings
Jon Degenhardt
jond at noreply.com
Sat May 8 20:22:28 UTC 2021
On Saturday, 8 May 2021 at 19:33:45 UTC, guai wrote:
> ...
> But you cannot split a string wherever you want treating it as
> bytes. It most certainly wouldn't work with all the languages
> out there.
Sure you can. It's necessary to take advantage of the
properties of utf-8 encoding to do it. That is, it's necessary to
find a nearby utf-8 character boundary, but utf-8 is defined in a
manner that enables this. Take a look at [section 2.5 Encoding
Forms](http://www.unicode.org/versions/Unicode13.0.0/ch02.pdf#G13708) in the Unicode Standard. It describes exactly this.
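A rough sketch of what "find a nearby boundary" can look like, written in Python purely for brevity (the function names are made up for this illustration, and the input is assumed to be valid utf-8). It relies on the fact that utf-8 continuation bytes always have the bit pattern `10xxxxxx`, so any other byte starts a character:

```python
def is_continuation(byte):
    # UTF-8 continuation bytes always look like 10xxxxxx.
    return (byte & 0xC0) == 0x80


def snap_to_boundary(data, offset):
    """Move a byte offset backward to the nearest character boundary.

    Assumes `data` is valid UTF-8, so at most three steps are needed.
    """
    while 0 < offset < len(data) and is_continuation(data[offset]):
        offset -= 1
    return offset


text = "naïve café 日本語".encode("utf-8")

# Pick the halfway point by byte offset, then adjust to a boundary.
cut = snap_to_boundary(text, len(text) // 2)
left, right = text[:cut], text[cut:]

# Both halves are valid UTF-8 on their own.
print(left.decode("utf-8"))
print(right.decode("utf-8"))
```

Note that this never decodes the string from the start; it only inspects a few bytes around the chosen offset.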
> With string you cannot get a char by index, you must read them
> sequentially.
Correct, you cannot find a Unicode character using a
character-based index without processing sequentially. But for large
classes of algorithms this is not necessary. That is, there is
often no need to find, for example, the 100th character. If all
an algorithm needs to do is split a string roughly in half, then
use the byte offsets to find the halfway point and then look for
a utf-8 character boundary. If the algorithm is based on some
other boundary, say, token boundaries, then find one of those
boundaries.
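For the token-boundary case, here is a minimal sketch, again in Python and with a hypothetical function name, assuming valid utf-8 input and tokens separated by an ASCII space. In utf-8, bytes below 0x80 never occur inside a multi-byte sequence, so searching the raw bytes for a space cannot land in the middle of a character:

```python
def split_near_middle_on_space(data):
    """Split UTF-8 bytes at the ASCII space closest after the midpoint.

    Falls back to the last space before the midpoint, and returns the
    input unsplit if it contains no space at all.
    """
    mid = len(data) // 2
    # Byte 0x20 is never part of a multi-byte UTF-8 sequence,
    # so a plain byte search is safe here.
    cut = data.find(b" ", mid)
    if cut == -1:
        cut = data.rfind(b" ", 0, mid)
    if cut == -1:
        return data, b""
    # Drop the separating space itself.
    return data[:cut], data[cut + 1:]


text = "日本語 テキスト を 分割 する".encode("utf-8")
left, right = split_near_middle_on_space(text)
print(left.decode("utf-8"))
print(right.decode("utf-8"))
```

The split point is chosen from byte offsets alone; no character counting is involved.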