Why is std.regex slow, well here is one reason!

Max Samukha maxsamukha at gmail.com
Fri Feb 24 20:05:48 UTC 2023


On Friday, 24 February 2023 at 18:34:42 UTC, Walter Bright wrote:

> Let's say I write "x". Is that the letter x, or the math symbol 
> x? I know which it is from the context. But in Unicode, there's 
> a letter x and the math symbol x, although they look identical.

Same as 'A' in KOI8 or Windows-1251? Latin and Cyrillic 'A' look 
identical but have different codes. Not that I disagree with you, 
but Unicode just upheld the tradition.
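
This is easy to check against the legacy code pages. A minimal Python sketch, assuming "KOI8" means the KOI8-R variant (the post does not say which one); the helper name `codes` is made up for illustration:

```python
# Latin 'A' and Cyrillic 'А' render alike but are distinct characters.
latin_a = "A"          # U+0041 LATIN CAPITAL LETTER A
cyrillic_a = "\u0410"  # U+0410 CYRILLIC CAPITAL LETTER A

def codes(ch):
    """Return the single byte each legacy encoding assigns to ch."""
    return {enc: ch.encode(enc)[0] for enc in ("koi8_r", "cp1251")}

latin_codes = codes(latin_a)
cyrillic_codes = codes(cyrillic_a)

for enc in latin_codes:
    print(f"{enc}: Latin A = 0x{latin_codes[enc]:02X}, "
          f"Cyrillic A = 0x{cyrillic_codes[enc]:02X}")
```

Both code pages keep Latin 'A' at its ASCII position and give Cyrillic 'А' a separate byte in the upper half, so the split predates Unicode.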

>
> There is no end to semantic meanings for "x", and so any 
> attempt to encode semantics into Unicode is doomed from the 
> outset.

That is similar to attempts to encode semantics in, say, binary 
operators - they are nothing but functions, but...

>
> Printed media do not seem to require these hidden semantics, 
> why should Unicode? If you print the Unicode on paper, thereby 
> losing its meaning, what again is the purpose of Unicode?

That looks like another case of caching, one of the two hard 
problems in computing: with the semantics stored in the code 
point itself, its meaning can be inferred without the need to 
keep track of the context.
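
To illustrate what can be read off a code point without any surrounding text, here is a small Python sketch using the standard `unicodedata` module. Which "math symbol x" was meant is not stated, so the two non-Latin candidates below are assumptions, and `describe` is a hypothetical helper:

```python
import unicodedata

def describe(ch):
    """Return (code point, official name, general category) for ch."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

candidates = [
    "x",           # the ordinary Latin letter
    "\u00d7",      # assumed candidate: the multiplication sign
    "\U0001d465",  # assumed candidate: the mathematical italic x
]

for ch in candidates:
    print(*describe(ch))
```

The name and category come from the character database alone; no context from the surrounding text is consulted.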

>
> Equally stupid are:
>
> 1. encoding of various fonts
>
> 2. multiple encodings of the same character, leading to 
> "normalization" problems

I agree that multiple encodings for the same abstract character 
are not a great idea, but "same character" is unfortunately not 
well defined. Is Latin 'A' the same character as Cyrillic 'A'? 
Should they have the same code?
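
A rough Python sketch of how the two cases differ in practice, using the standard `unicodedata.normalize`. The choice of 'é' as the example character is mine, not from the thread:

```python
import unicodedata

# One abstract character, two encodings: precomposed vs. base + combining mark.
composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

raw_equal = composed == decomposed
nfc_equal = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))

# Look-alikes from different scripts are never unified, even by the
# more aggressive compatibility normalization (NFKC).
latin_a, cyrillic_a = "A", "\u0410"
lookalikes_equal = (unicodedata.normalize("NFKC", latin_a)
                    == unicodedata.normalize("NFKC", cyrillic_a))

print(raw_equal, nfc_equal, lookalikes_equal)
```

Normalization treats the two spellings of 'é' as the same character, while Latin and Cyrillic 'A' stay distinct, which is the line Unicode happened to draw for "same character".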

>
> 3. encodings to enable/disable the direction the glyphs are to 
> be read
>
> Implementing all this stuff is hopelessly complex, which is why 
> Unicode had to introduce "levels" of Unicode support.

That's true.

