Why is std.regex slow, well here is one reason!

Walter Bright newshound2 at digitalmars.com
Fri Feb 24 18:34:42 UTC 2023


On 2/23/2023 11:28 PM, Max Samukha wrote:
> On Thursday, 23 February 2023 at 23:11:56 UTC, Walter Bright wrote:
>> Unicode is a brilliant idea, but its doom comes from the execrable decision to 
>> apply semantic meaning to glyphs.
> 
> Unicode did not start that. For example, all Cyrillic encodings encode Latin А, 
> K, H, etc. differently than the similarly looking Cyrillic counterparts. Whether 
> that decision was execrable is highly debatable.

Let's say I write "x". Is that the letter x, or the math symbol x? I know which 
it is from the context. But in Unicode, there's a letter x and the math symbol 
x, although they look identical.
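
To make this concrete, here is a minimal sketch in Python (not from the original post) using the standard `unicodedata` module. It assumes the "math symbol x" in question is U+1D465 MATHEMATICAL ITALIC SMALL X; Unicode has other x-like code points, and that one is picked only for illustration.

```python
import unicodedata

# Assumption: U+1D465 stands in for "the math symbol x".
latin_x = "\u0078"      # LATIN SMALL LETTER X
math_x = "\U0001D465"   # MATHEMATICAL ITALIC SMALL X

def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

print(describe(latin_x))
print(describe(math_x))

# The two are distinct code points, so a plain comparison fails.
print(latin_x == math_x)

# Compatibility normalization (NFKC) folds the math variant back to the letter.
print(unicodedata.normalize("NFKC", math_x) == latin_x)
```

Text that looks the same on screen can therefore compare unequal unless the program normalizes it first.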

There is no end to semantic meanings for "x", and so any attempt to encode 
semantics into Unicode is doomed from the outset.

Printed media do not seem to require these hidden semantics, so why should Unicode? 
If you print the Unicode on paper, thereby losing its meaning, what again is the 
purpose of Unicode?

Equally stupid are:

1. encoding of various fonts

2. multiple encodings of the same character, leading to "normalization" problems

3. encodings to enable/disable the direction the glyphs are to be read

Implementing all this stuff is hopelessly complex, which is why Unicode had to 
introduce "levels" of Unicode support.
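
As an illustration of point 2, the following Python sketch (not part of the original post) shows two encodings of "é" comparing unequal until they are normalized. The helper name `same_text` is hypothetical, and the choice of NFC as the normalization form is an assumption made for the example.

```python
import unicodedata

precomposed = "\u00E9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points

def same_text(a, b, form="NFC"):
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Both render as the same character, but the raw strings differ.
print(precomposed == decomposed)
print(len(precomposed), len(decomposed))

# After normalization they compare equal.
print(same_text(precomposed, decomposed))
```

Any code that compares, searches, or matches text has to decide whether and how to normalize, which is part of the implementation burden described above.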

