String compare performance

Sun Nov 28 18:11:00 PST 2010

On 2010-11-28 20:57:38 -0500, bearophobic <notbear at cave.net> said:

> Stewart Gordon Wrote:
> 
>> On 27/11/2010 23:04, Kagamin wrote:
>>> bearophile Wrote:
>>> 
>>>>> Also, is there a way to bit-compare given memory areas at much
>>>>> higher speed than element per element (I mean for arrays in
>>>>> general)?
>>>> 
>>>> I don't know. I think you can't.
>>> 
>>> You can use memcmp, though only for utf-8 strings.
>> 
>> Only for utf-8 strings?  Why's that?  I would've thought memcmp to be
>> type agnostic.
>> 
>> Stewart.
> 
> D community is amazing cult of premature optimization fans. Any one of 
> you heard of canonically equivalent sequences? The integrated Unicode 
> support is a clusterfuck. Please do compare ASCII strings with memcmp, 
> but no Unicode. Where did the original poster pull this problem from, 
> his ass? "My system runs 100,000,000,000 instructions per second, but 
> this comparison of 4 letter strings uses 5 cycles too much! This is the 
> only problem on the way to world domination with my $500 Microsoft Word 
> clone!". No wait, the problems are completely imaginatory.

Comparing Unicode strings with memcmp is fine in any UTF-* encoding 
(provided both strings use the same one), as long as what you want to 
know is whether the code points are the same. If your point was that 
per-code-point comparison isn't the right way to compare Unicode 
strings in most situations, then I support that view too. Though if 
that's what you wanted to say, you could have made your point clearer.
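
To make the distinction concrete, here is a minimal sketch. It is in 
Python only because its standard library ships a normalizer; equality 
of the encoded bytes stands in for what memcmp(...) == 0 would tell 
you. The helper names same_code_points and canonically_equivalent are 
made up for this illustration, and NFC is just one reasonable choice 
of normalization form.

```python
import unicodedata


def same_code_points(a: str, b: str) -> bool:
    # Byte-wise equality of the UTF-8 encodings. For two valid strings
    # in the same encoding this is true exactly when the code point
    # sequences are identical, which is what a memcmp test answers.
    return a.encode("utf-8") == b.encode("utf-8")


def canonically_equivalent(a: str, b: str) -> bool:
    # Normalize both sides to NFC first, so precomposed and decomposed
    # spellings of the same text compare equal.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"   # 'e with acute' as the single code point U+00E9
decomposed = "cafe\u0301"   # 'e' followed by combining acute accent U+0301

print(same_code_points(precomposed, decomposed))        # False
print(canonically_equivalent(precomposed, decomposed))  # True
```

Both strings display as the same word, but only the normalized 
comparison treats them as equal. Which of the two answers you want 
depends on what the comparison is for.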

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/