RFC: Case-Insensitive Strings (And usually they really do *have* case)

Nick Sabalausky a at a.a
Mon Jan 10 19:33:40 PST 2011


"Michel Fortin" <michel.fortin at michelf.com> wrote in message 
news:iggi34$cqm$1 at digitalmars.com...
> On 2011-01-10 17:00:40 -0500, "Nick Sabalausky" <a at a.a> said:
>
>> Actually, what probably should be stored is a *normalized* folding-case
>> version of the string, because then (if I understand correctly) memcmp
>> could be used. I don't think memcmp technically works on non-ASCII
>> (unless it's in normalized form).
>
> Actually, what would be compatible with Unicode collation is probably an 
> array of collation elements. This would also make it useful to sort 
> case-insensitively, not just testing for equality.
>
> Details (perhaps too many details):
> <http://unicode.org/reports/tr10/>
>

I'll have to take a look. (I don't even know what "Unicode collation" is :P)

>> In any case, Phobos doesn't currently handle any of that stuff at all, so
>> my case-insensitive string type wouldn't be taking things backwards in
>> that regard.
>
> Probably not.
>

Actually I was (mostly) wrong: There are already Unicode-compatible 
upper/lower-case character functions in std.uni, and these are used by 
std.string.toupper and std.string.tolower (as well as their InPlace 
counterparts). Which incidentally means that my Insensitive type worked 
correctly for Unicode all along (I didn't know about icmp when I wrote it) - 
well, except for things that need folding case instead of lower-case.
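
To illustrate that last caveat, here is a small sketch of where lower-casing 
and case folding disagree. It's in Python rather than D, purely because 
Python's standard library happens to expose both operations (str.lower and 
str.casefold). The two helper names are made up for the example, and nothing 
here is Phobos API.

```python
def equal_by_lower(a: str, b: str) -> bool:
    """Compare after lower-casing both sides (hypothetical helper)."""
    return a.lower() == b.lower()


def equal_by_fold(a: str, b: str) -> bool:
    """Compare after full Unicode case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()


# Ordinary cased letters: both approaches agree.
assert equal_by_lower("HELLO", "hello")
assert equal_by_fold("HELLO", "hello")

# German sharp s: lower-casing leaves it alone, folding maps it to "ss".
assert not equal_by_lower("Straße", "STRASSE")
assert equal_by_fold("Straße", "STRASSE")
```

So a type that compares lower-cased copies gets the common cases right and 
only misses strings like the last pair.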

Not only that, but Andrei committed a Unicode fix for icmp (bugzilla 5443) 
just a few hours ago. (That's gotta be a record for fastest issue fixed in 
D's bug tracker!)

Still no folding-case though.
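
For reference, the "normalized folding-case" idea quoted at the top could be 
sketched roughly as follows. This is again Python, not D, and fold_key is a 
hypothetical helper name. The assumption is that normalizing, folding, and 
normalizing again yields a key whose raw bytes can be compared directly, 
which is what a memcmp would do.

```python
import unicodedata


def fold_key(s: str) -> bytes:
    """Build a normalized, case-folded UTF-8 key (hypothetical helper)."""
    # Normalize first so composed and decomposed spellings look the same,
    # fold case, then normalize again because folding can introduce
    # sequences that are no longer in normalized form.
    folded = unicodedata.normalize("NFD", s).casefold()
    return unicodedata.normalize("NFD", folded).encode("utf-8")


composed = "\u00C9cole"      # "É" stored as a single code point
decomposed = "E\u0301COLE"   # "E" followed by a combining acute accent

# Folding alone is not enough: the byte sequences still differ.
assert composed.casefold().encode("utf-8") != decomposed.casefold().encode("utf-8")

# With normalization on both sides, a plain byte comparison matches.
assert fold_key(composed) == fold_key(decomposed)
```

The key only needs to be computed once per string, so storing it alongside 
the original would make each later comparison a straight byte compare.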



