No we should not support enum types derived from strings

Berni44 someone at somemail.com
Sat May 8 20:19:51 UTC 2021


On Saturday, 8 May 2021 at 19:06:48 UTC, guai wrote:
> I meant these [combining 
> characters](https://en.wikipedia.org/wiki/Combining_character). 
> They are language-specific, but most of the time the string 
> does not contain any clue as to which language it is.

You are talking about generic algorithms that work for every 
script. But Unicode allows for algorithms that support only 
subsets. If your subset doesn't contain combining characters, you 
don't need to care about them. Otherwise you may need to go back 
to the preceding base character. It depends on the use case.

>> - I can imagine, that this can be useful in divide-and-conquer 
>> algorithms, like binary search.
>
> They must be applied with great care to non-ASCII texts. 
> What about RTL, for example? You cannot split inside an RTL 
> block.

Oh, yes, you can! Think of an algorithm doing cryptographic 
analysis by counting consecutive pairs of ASCII characters. For 
that, it doesn't matter if RTL text is cut into pieces.
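A minimal sketch of such an analysis (my own illustration, not from the post), working directly on the UTF-8 bytes. Non-ASCII bytes, including any RTL run, simply never match the ASCII range, so a cut through RTL text cannot distort the counts.

```python
from collections import Counter

def ascii_pair_counts(data: bytes) -> Counter:
    """Count adjacent pairs where both bytes are printable ASCII."""
    counts = Counter()
    for a, b in zip(data, data[1:]):
        if 0x20 <= a < 0x7F and 0x20 <= b < 0x7F:
            # Both bytes are single-byte ASCII characters.
            counts[(chr(a), chr(b))] += 1
    return counts

# Hebrew letters encode as multi-byte, non-ASCII sequences and are ignored.
text = "abc \u05d0\u05d1 ab".encode("utf-8")
print(ascii_pair_counts(text)[("a", "b")])  # -> 2
```

Because the check only ever looks at two adjacent bytes, the input can be cut anywhere and the per-piece counts merged afterwards.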

>> - Or you want to cut a string into pieces of a certain length 
>> (again 50?), where the exact length is not so much important.
>
> For what business task would I do that?

Simple wrapping to avoid losing text when printing, or to avoid 
having to scroll horizontally. It's probably not useful for a 
high-quality program...

> I may want to split a string on some char subsequence for 
> lexing. But one cannot assume lengths of those chunks.

Depending on the use case, you may know the lengths ahead of time.

>> So you just jump ahead 50, go back again and split at this 
>> point. If there are a lot of non-ASCII characters in between, 
>> this is of course shorter, but maybe ok, because speed is more 
>> important.
>
> Not sure if speed is more important than correctness.

Of course, this again depends on the use case. You can't say that 
in general.
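The jump-ahead-and-back-up idea quoted above can be sketched on UTF-8 code units (a hedged illustration of mine; the target of 50 is the arbitrary number from the post). In UTF-8, continuation bytes have the form 0b10xxxxxx, so any byte with `(b & 0xC0) != 0x80` starts a character.

```python
def split_near(data: bytes, target: int = 50) -> tuple:
    """Split data at or just before `target`, never inside a character."""
    if target >= len(data):
        return data, b""
    i = target
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1  # back up past UTF-8 continuation bytes
    return data[:i], data[i:]

data = ("äöü" * 30).encode("utf-8")  # 180 bytes, all 2-byte characters
head, tail = split_near(data, 51)    # byte 51 is mid-character -> back up to 50
```

Both halves remain valid UTF-8, so `head.decode("utf-8")` and `tail.decode("utf-8")` succeed. With many multi-byte characters before the target, the back-up is still at most three bytes, so the cost stays constant.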

>> - You want to process pieces of a string in parallel: Cut it 
>> in 16 pieces and let your 16 cores work on each of them.
>
> I'm not sure if this is possible with all the quirks of unicode.

Think again of the cryptographic analysis above, for example. (Or 
automatically checking Wikipedia entries for something.)
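One way such parallel splitting can be made safe, sketched here under my own assumptions (function name and chunk handling are illustrative): cut the byte string into roughly equal pieces, nudging each cut forward to the next UTF-8 character boundary. Each piece could then go to its own worker.

```python
def chunks_at_boundaries(data: bytes, n: int) -> list:
    """Cut data into about n pieces, each a valid UTF-8 substring."""
    step = max(1, len(data) // n)
    out, start = [], 0
    while start < len(data):
        end = min(start + step, len(data))
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end += 1  # move the cut past continuation bytes
        out.append(data[start:end])
        start = end
    return out

data = "naïve résumé ".encode("utf-8") * 4
pieces = chunks_at_boundaries(data, 4)
assert b"".join(pieces) == data  # nothing lost, nothing duplicated
```

Each piece decodes on its own, so per-piece results (counts, matches, and so on) can be computed independently and merged at the end.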

Keep in mind that we do not always have to support all of 
Unicode. If we know ahead of time that our text contains mainly 
ASCII and otherwise only a few base characters, but never 
combining characters and the like, we can use different 
algorithms that might be simpler, faster, or both. Making sure 
that this constraint holds is then something that has to be done 
outside of the algorithm.

> Never even heard of parallel processing of structured texts 
> like XML.

I would judge it much more difficult to process XML in parallel 
than to do the same with Unicode.


More information about the Digitalmars-d mailing list