Why is std.regex slow, well here is one reason!
Max Samukha
maxsamukha at gmail.com
Fri Feb 24 20:05:48 UTC 2023
On Friday, 24 February 2023 at 18:34:42 UTC, Walter Bright wrote:
> Let's say I write "x". Is that the letter x, or the math symbol
> x? I know which it is from the context. But in Unicode, there's
> a letter x and the math symbol x, although they look identical.
Same as 'A' in KOI8 or Windows-1251? Latin and Cyrillic 'A' look
identical but have different codes. Not that I disagree with you,
but Unicode just upheld the tradition.
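To make the comparison concrete, here is a minimal sketch in Python (the language and its standard unicodedata module are my choice for illustration; the helper name describe is hypothetical and nothing here comes from std.regex or Phobos). It only shows that the two look-alike letters are distinct code points in Unicode, and that the legacy 8-bit Cyrillic code pages also kept Cyrillic 'A' apart from ASCII 'A':

```python
import unicodedata

latin_a = "A"          # U+0041
cyrillic_a = "\u0410"  # U+0410, renders the same as Latin 'A' in most fonts


def describe(ch):
    """Return (code point as a U+XXXX string, Unicode character name)."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


print(describe(latin_a))
print(describe(cyrillic_a))

# The legacy encodings drew the same line, each with its own byte value.
print(cyrillic_a.encode("koi8_r"))   # KOI8-R byte for Cyrillic 'A'
print(cyrillic_a.encode("cp1251"))   # Windows-1251 byte for Cyrillic 'A'
print(latin_a.encode("ascii"))       # ASCII byte for Latin 'A'
```

The point of the sketch is just that "looks identical, encoded differently" predates Unicode; it says nothing about whether that was a good decision.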
>
> There is no end to semantic meanings for "x", and so any
> attempt to encode semantics into Unicode is doomed from the
> outset.
That is similar to attempts to encode semantics in, say, binary
operators - they are nothing but functions, but...
>
> Printed media do not seem to require these hidden semantics,
> why should Unicode? If you print the Unicode on paper, thereby
> losing its meaning, what again is the purpose of Unicode?
Looks like another case of caching, one of the two hard problems
in computing. With the semantics stored in the code point itself,
its meaning can be inferred without the need to keep track of the
context.
>
> Equally stupid are:
>
> 1. encoding of various fonts
>
> 2. multiple encodings of the same character, leading to
> "normalization" problems
I agree that multiple encodings for the same abstract character
are not a great idea, but "same character" is unfortunately not
well defined. Is Latin 'A' the same character as Cyrillic 'A'?
Should they have the same code?
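A rough sketch of the normalization problem, again in Python with the standard unicodedata module (my choice for illustration; same_after is a hypothetical helper, not part of any library). It compares a precomposed and a decomposed spelling of the same accented letter, then checks what compatibility normalization does to a "math" x and to the Latin/Cyrillic pair:

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
math_x = "\U0001d465"    # MATHEMATICAL ITALIC SMALL X


def same_after(form, a, b):
    """Compare two strings after applying the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Two encodings of one abstract character: unequal until normalized.
print(precomposed == decomposed)
print(same_after("NFC", precomposed, decomposed))

# Compatibility normalization folds the math symbol into the plain letter...
print(same_after("NFKC", math_x, "x"))

# ...but Latin 'A' and Cyrillic 'A' stay distinct under every form.
print(same_after("NFKC", "A", "\u0410"))
```

So, as far as I can tell, Unicode itself treats the math x as a variant of the letter x, while Latin and Cyrillic 'A' are treated as different characters. Where that line should be drawn is exactly the part that is not well defined.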
>
> 3. encodings to enable/disable the direction the glyphs are to
> be read
>
> Implementing all this stuff is hopelessly complex, which is why
> Unicode had to introduce "levels" of Unicode support.
That's true.