No we should not support enum types derived from strings

Berni44 someone at somemail.com
Sat May 8 16:25:31 UTC 2021


On Saturday, 8 May 2021 at 16:04:24 UTC, guai wrote:
> Dividing utf-8 array and searching for the nearest char may 
> split inside a combining character which isn't a thing you 
> usually want.

Every continuation byte in UTF-8 has the bit pattern 10xxxxxx, so 
it is not difficult to recognize this case and go back 1 to 3 
bytes to reach a correct splitting place. UTF-8 was designed with 
this in mind.
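
To illustrate the "go back 1 to 3 bytes" step, here is a minimal sketch in Python (the thread itself contains no code, and the name align_back is made up for this example). It assumes the input is valid UTF-8, and it only finds a code point boundary; keeping a base letter together with a following combining mark would need an extra check on top of this.

```python
def align_back(data: bytes, i: int) -> int:
    """Return the nearest index <= i that does not fall inside a
    multi-byte UTF-8 sequence (hypothetical helper, valid UTF-8 assumed)."""
    # Clamp the index into the valid range [0, len(data)].
    i = max(0, min(i, len(data)))
    steps = 0
    # Continuation bytes look like 10xxxxxx. A sequence has at most
    # three of them, so we never need to step back more than 3 bytes.
    while 0 < i < len(data) and (data[i] & 0xC0) == 0x80 and steps < 3:
        i -= 1
        steps += 1
    return i


data = "aé€😀b".encode("utf-8")  # sequences of 1, 2, 3, 4 and 1 bytes
cut = align_back(data, 5)         # index 5 is the last byte of "€"
head, tail = data[:cut], data[cut:]
```

Here cut ends up at 3, the lead byte of "€", so both head and tail decode cleanly.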

- I can imagine that this can be useful in divide-and-conquer 
algorithms, like binary search.
- Or when, for whatever reason, you can make larger jumps while 
scanning a string: e.g. when you know that the next 50 letters do 
not contain the token you are looking for, you can safely jump 50 
bytes, go back to the nearest splitting point and continue the 
linear search there.
- Or you want to cut a string into pieces of a certain length 
(again 50?), where the exact length is not that important. So 
you just jump ahead 50, go back again and split at this point. If 
there are a lot of non-ASCII characters in between, the piece 
contains fewer characters, of course, but that may be OK because 
speed is more important.
- You want to process pieces of a string in parallel: cut it into 
16 pieces and let each of your 16 cores work on one of them.
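
The "jump ahead, go back, split" idea from the last two points could be sketched like this in Python. The function name split_utf8 and the parameter approx_size are invented for the example, and valid UTF-8 input is assumed. Each piece is at most approx_size bytes long and ends on a code point boundary, so the pieces can be decoded or handed to separate workers independently.

```python
from typing import List


def split_utf8(data: bytes, approx_size: int) -> List[bytes]:
    """Cut data into pieces of at most approx_size bytes without
    splitting a UTF-8 sequence (hypothetical helper)."""
    if approx_size < 4:
        # A single code point can take up to 4 bytes.
        raise ValueError("approx_size must be at least 4")
    pieces = []
    start = 0
    while start < len(data):
        # Jump ahead by the wanted size ...
        end = min(start + approx_size, len(data))
        # ... then go back while we are on a continuation byte (10xxxxxx).
        while start < end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        if end == start:
            # Only reachable for malformed input: fall back to a raw cut
            # so that the loop always makes progress.
            end = min(start + approx_size, len(data))
        pieces.append(data[start:end])
        start = end
    return pieces


text = "häßliche Grüße, 世界! 😀 " * 5
pieces = split_utf8(text.encode("utf-8"), 8)
decoded = [p.decode("utf-8") for p in pieces]  # every piece decodes on its own
```

Pieces that contain many non-ASCII characters hold fewer characters than the others, which is the trade-off described above.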

