No we should not support enum types derived from strings

Jon Degenhardt jond at noreply.com
Sat May 8 20:22:28 UTC 2021


On Saturday, 8 May 2021 at 19:33:45 UTC, guai wrote:
> ...
> But you cannot split a string wherever you want treating it as 
> bytes. It most certainly wouldn't work with all the languages 
> out there.

Sure you can. It's necessary to take advantage of the 
properties of the utf-8 encoding to do it. That is, it's necessary to 
find a nearby utf-8 character boundary, but utf-8 is defined in a 
manner that enables this. Take a look at [section 2.5 Encoding 
Forms](http://www.unicode.org/versions/Unicode13.0.0/ch02.pdf#G13708) in the Unicode Standard. It describes exactly this.
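
To make that concrete, here is a rough sketch in Python (chosen only for brevity; the same logic applies to a D `char[]`). It relies on the fact that every utf-8 continuation byte has the bit pattern `10xxxxxx`, so a boundary is never more than three bytes away. The function names are mine, and the input is assumed to be valid utf-8.

```python
def is_continuation_byte(b: int) -> bool:
    # In utf-8, bytes of the form 10xxxxxx only ever appear in the
    # middle or at the end of a multi-byte sequence.
    return (b & 0xC0) == 0x80


def next_boundary(data: bytes, pos: int) -> int:
    # Hypothetical helper: move forward from an arbitrary byte offset
    # to the nearest character (code point) boundary at or after it.
    # Assumes `data` is valid utf-8.
    pos = max(0, min(pos, len(data)))
    while pos < len(data) and is_continuation_byte(data[pos]):
        pos += 1
    return pos


data = "aé日本".encode("utf-8")
cut = next_boundary(data, 4)   # offset 4 falls inside the bytes of '日'
head, tail = data[:cut], data[cut:]
# Both halves are valid utf-8 again and can be decoded independently.
```

Scanning backward instead of forward works just as well; which direction to prefer depends on the algorithm.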

> With string you cannot get a char by index, you must read them 
> sequentially.

Correct, you cannot find a Unicode character using a 
character-based index without processing sequentially. But for large 
classes of algorithms this is not necessary. That is, there is 
often no need to find, for example, the 100th character. If all 
an algorithm needs to do is split a string roughly in half, then 
use the byte offsets to find the halfway point and then look for 
a utf-8 character boundary. If the algorithm is based on some 
other boundary, say, token boundaries, then find one of those 
boundaries.
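
For the token case, a sketch along these lines (again Python, with a function name I made up) splits at the space nearest the byte midpoint. It assumes valid utf-8 input and an ASCII separator. That is safe because in utf-8 a byte below 0x80 always stands for an ASCII character by itself and never occurs inside a multi-byte sequence, so a plain byte search cannot land mid-character.

```python
from typing import Tuple


def split_at_token_near_middle(data: bytes, sep: bytes = b" ") -> Tuple[bytes, bytes]:
    # Hypothetical helper: split utf-8 bytes at the separator closest
    # to the byte midpoint. Assumes `sep` is ASCII and `data` is valid utf-8.
    mid = len(data) // 2
    left = data.rfind(sep, 0, mid)   # nearest separator before the midpoint
    right = data.find(sep, mid)      # nearest separator at or after it
    candidates = [p for p in (left, right) if p != -1]
    if not candidates:
        # No separator anywhere: nothing sensible to split on.
        return data, b""
    cut = min(candidates, key=lambda p: abs(p - mid))
    return data[:cut], data[cut + len(sep):]


first, second = split_at_token_near_middle("naïve café au lait".encode("utf-8"))
```

No character counting happens anywhere here; the only index used is a byte offset.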


