RFC: Case-Insensitive Strings (And usually they really do *have*case)
Nick Sabalausky
a at a.a
Mon Jan 10 10:46:55 PST 2011
"Jim" <bitcirkel at yahoo.com> wrote in message
news:igfado$11g3$1 at digitalmars.com...
>> While writing and dealing with all that code I realized something: While
>> programmers are usually heavily conditioned to think of case-sensitivity
>> as
>> an attribute of the comparison, it's very frequent that the deciding
>> factor
>> in which comparison to use is *not* the comparison itself but *what* gets
>> compared. And in those cases, you have to use the awful strategy of
>> "relying
>> on convention" to make sure you get it right in *every* place that
>> particular data gets compared.
>
>
> You have a point. Your case-sensitivity-aware string types will guarantee
> correctness in a large and complex program. I like that. Ideally though,
> they would only be compile-time constraints (i.e. not carrying any other
> data).
>
Not carrying any other data means not caching the lowercase version, which
means recreating the lowercase version more than necessary. So it's the
classic speed vs. space tradeoff. I would think there would be cases where
they get compared enough for that to make a difference, although I suppose
we'd really need benchmarks to see. OTOH, there are certainly cases (such as
my original motivating case) where the extra space is not an issue at all.
But that does raise a point worth exploring: Maybe it should support both
caching and non-caching behavior? I can think of at least two different ways
to do this:
1. Provide a "useCache" template parameter. Problem with that though, is
that the caching and non-caching versions end up being two distinct types.
2. Make "useCache" a runtime property. Since the cached foldingCase var is
entirely private, the "useCache" flag could probably be encoded as a
specific value of foldingCase.length and/or foldingCase.ptr. So the overhead
of the non-cached version would only be size_t*2 instead of
size_t*2+str.length.
One other idea: It's possible to change it so the folding case version is
only created and cached when toHash is used. For other cases, it could just
use icmp. So the storage space (beyond .length and .ptr) for the folding
case version would never be allocated unless you're using it as an AA key
(or some other use that needs hashing). Some benchmarking would be needed to
be certain, but I think that would hit a nice sweet spot between space and
speed, plus it wouldn't bother the user of the type with "cache or no
cache?".
> Perhaps it's possible to create such types?
> CaseSensitiveString,
> CaseInsensitiveString,
> and the standard string type being unspecified.
>
I think the standard type can be assumed to be case-sensitive since that's
how it already operates.
More information about the Digitalmars-d
mailing list