No, we should not support enum types derived from strings

guai guai at inbox.ru
Sat May 8 19:06:48 UTC 2021


On Saturday, 8 May 2021 at 16:25:31 UTC, Berni44 wrote:
> On Saturday, 8 May 2021 at 16:04:24 UTC, guai wrote:
>> Dividing utf-8 array and searching for the nearest char may 
>> split inside a combining character which isn't a thing you 
>> usually want.
>
> It is not difficult to recognize this case and go back 1 to 3 
> bytes to reach a correct splitting place. UTF-8 was designed 
> with this in mind.
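That going-back only recovers *code point* boundaries, though. A rough sketch of what it looks like (my illustration, not a Phobos function):

```d
// Hypothetical helper: UTF-8 continuation bytes all match the bit
// pattern 0b10xxxxxx, so walking backwards past them lands on the
// first byte of the current code point.
size_t backUpToCodePoint(const(char)[] s, size_t i)
{
    while (i > 0 && (s[i] & 0xC0) == 0x80)  // continuation byte?
        --i;
    return i;
}
```

That finds where a code point starts, but not where a *character* starts.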


I meant these [combining 
characters](https://en.wikipedia.org/wiki/Combining_character). 
They are language-specific, but most of the time the string 
contains no clue as to which language it is.
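A minimal D sketch of the problem (my own example): the slice below lands on a perfectly valid code-point boundary, yet it still tears the combining accent off its base letter.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    // "e" + U+0301 COMBINING ACUTE ACCENT: one grapheme, two code points.
    string s = "cafe\u0301s";   // 7 bytes, 6 code points, 5 graphemes

    // Byte 4 is a valid code-point boundary, so "go back 1 to 3 bytes"
    // succeeds, yet the slice strands the accent on the far side.
    writeln(s[0 .. 4]);                // prints "cafe"; the accent is lost
    writeln(s.byGrapheme.walkLength);  // 5 user-perceived characters
}
```

Splitting correctly means iterating graphemes, as `std.uni.byGrapheme` does, not stepping back a few bytes.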

> - I can imagine, that this can be useful in divide-and-conquer 
> algorithms, like binary search.

They must be applied with great care to non-ASCII text. What 
about RTL, for example? You cannot split inside an RTL run.

> - Or you want to cut a string into pieces of a certain length 
> (again 50?), where the exact length is not so much important.

For what business task would I do that? I may want to split a 
string on some character subsequence for lexing, as in the 
sketch below, but one cannot assume the lengths of those chunks.
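Something like `std.algorithm`'s `splitter` is what I have in mind (a small sketch; the `"::"` delimiter is made up):

```d
import std.algorithm : splitter;
import std.stdio : writeln;

void main()
{
    string input = "alpha::beta::gamma";
    foreach (token; input.splitter("::"))
        writeln(token);   // alpha, beta, gamma - the chunk lengths vary
}
```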

> So you just jump ahead 50, go back again and split at this 
> point. If there are a lot of non ascii characters in between, 
> this is of course shorter, but maybe ok, because speed is more 
> important.

I'm not sure that speed is more important than correctness.

> - You want to process pieces of a string in parallel: Cut it in 
> 16 pieces and let your 16 cores work on each of them.

I'm not sure this is possible with all the quirks of Unicode. 
I've never even heard of a parallel processor for structured 
text like XML.

