Why is std.regex slow, well here is one reason!

Patrick Schluter Patrick.Schluter at bbox.fr
Sat Feb 25 13:19:55 UTC 2023


On Friday, 24 February 2023 at 18:39:02 UTC, Walter Bright wrote:
> On 2/24/2023 2:27 AM, Richard (Rikki) Andrew Cattermole wrote:
>> who knew those innocent looking symbols, all in their tables 
>> could be so complicated!
>
> Because the Unicode designers are in love with complexity (like 
> far too many engineers).

Languages are complex and often contradictory. The moment you 
want to handle, for example, letter case, you're in for the 
complexity. Uppercase i is different in Turkish than in most 
other languages. ß does not have an uppercase (its uppercase is 
SS) but has a titlecase (titlecase is not the same thing as 
uppercase), ß. Changing case is not reversible in general (Greek 
has two lowercase sigmas but only one uppercase; German again 
with ß, which becomes SS in uppercase, but not every SS can 
become ß when lowercased). These were just a few simple examples 
from Latin and Greek scripts.
Unicode is complex because language is complex. Is it perfect? 
No. Is it bad? Far from it.
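
To make the non-reversibility concrete, here is a small sketch in 
Python (not D, and nothing to do with how std.regex or std.uni 
are implemented). It relies only on Python's built-in str.upper 
and str.lower, which apply Unicode's default, locale-independent 
case mappings. The helper name round_trips is made up for this 
illustration.

```python
# Illustrative sketch: Python's built-in str methods use Unicode's
# default (locale-independent) case mappings.

def round_trips(text: str) -> bool:
    """Return True if upper-casing then lower-casing restores the text."""
    return text.upper().lower() == text

# German sharp s: uppercasing expands one letter into two, and
# lowercasing cannot know which SS used to be an ß.
assert "straße".upper() == "STRASSE"
assert "STRASSE".lower() == "strasse"
assert not round_trips("straße")

# Greek: both lowercase sigmas map to the same uppercase letter.
assert "σ".upper() == "Σ"
assert "ς".upper() == "Σ"
assert not round_trips("ς")

# Turkish: the default mapping gives plain I, while Turkish expects
# the dotted capital İ (U+0130). Getting that right needs
# locale-aware tailoring, which these built-in methods do not do.
assert "i".upper() == "I"
```

The point of the sketch is only that case conversion loses 
information, so no table-free shortcut can undo it.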


