RFC: Case-Insensitive Strings (And usually they really do *have*case)

Nick Sabalausky a at a.a
Mon Jan 10 10:46:55 PST 2011


"Jim" <bitcirkel at yahoo.com> wrote in message 
news:igfado$11g3$1 at digitalmars.com...
>> While writing and dealing with all that code I realized something: While
>> programmers are usually heavily conditioned to think of case-sensitivity 
>> as
>> an attribute of the comparison, it's very frequent that the deciding 
>> factor
>> in which comparison to use is *not* the comparison itself but *what* gets
>> compared. And in those cases, you have to use the awful strategy of 
>> "relying
>> on convention" to make sure you get it right in *every* place that
>> particular data gets compared.
>
>
> You have a point. Your case-sensitivity-aware string types will guarantee 
> correctness in a large and complex program. I like that. Ideally though, 
> they would only be compile-time constraints (i.e. not carrying any other 
> data).
>

Not carrying any other data means not caching the lowercase version, which 
means recreating the lowercase version more than necessary. So it's the 
classic speed vs. space tradeoff. I would think there would be cases where 
they get compared enough for that to make a difference, although I suppose 
we'd really need benchmarks to see. OTOH, there are certainly cases (such as 
my original motivating case) where the extra space is not an issue at all.

But that does raise a point worth exploring: Maybe it should support both 
caching and non-caching behavior? I can think of at least two different ways 
to do this:

1. Provide a "useCache" template parameter. Problem with that though, is 
that the caching and non-caching versions end up being two distinct types.

2. Make "useCache" a runtime property. Since the cached foldingCase var is 
entirely private, the "useCache" flag could probably be encoded as a 
specific value of foldingCase.length and/or foldingCase.ptr. So the overhead 
of the non-cached version would only be size_t*2 instead of 
size_t*2+str.length.

One other idea: It's possible to change it so the folding case version is 
only created and cached when toHash is used. For other cases, it could just 
use icmp. So the storage space (beyond .length and .ptr) for the folding 
case version would never be allocated unless you're using it as an AA key 
(or some other use that needs hashing). Some benchmarking would be needed to 
be certain, but I think that would hit a nice sweet spot between space and 
speed, plus it wouldn't bother the user of the type with "cache or no 
cache?".

> Perhaps it's possible to create such types?
> CaseSensitiveString,
> CaseInsensitiveString,
> and the standard string type being unspecified.
>

I think the standard type can be assumed to be case-sensitive since that's 
how it already operates.




More information about the Digitalmars-d mailing list