Why Strings as Classes?
Benji Smith
dlanguage at benjismith.net
Tue Aug 26 03:23:41 PDT 2008
Walter Bright wrote:
> Benji Smith wrote:
>> You can't distribute your code as a DLL or a shared object, because
>> the template instantiations won't be included (unless you create
>> wrapper functions with explicit template instantiations, bloating your
>> code size, but more importantly tripling the number of functions in
>> your API).
>
> Is the problem you're referring to the fact that there are 3 character
> types?
Basically, yeah.
With three different character types and two different array types
(static & dynamic), plus const, invariant, and mutable variants in D2
(and soon shared and unshared), the number of ways of representing a
"string" in the type system is overwhelming.
This afternoon, I was writing some string-processing code that I intend
to distribute in a library, and I couldn't help but think to myself,
"This code is probably broken for anything but the most squeaky-clean
ASCII text."
I don't mind that there are different character types, or that there are
different character encodings. But I want to deal with those issues in
exactly *one* place: in my string constructor (and, very rarely, during
IO). But 99% of the time, I want to just think of the object as a
String, with all the ugly details abstracted away.
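A minimal sketch of the "deal with encodings in exactly one place" idea,
written in Java rather than D. The class name Text and its methods are
hypothetical, made up for illustration; only the JDK calls are real.

```java
import java.nio.charset.Charset;

// Hypothetical immutable text type (not from any real library).
// The byte encoding is handled once, in the constructor.
final class Text {
    // Decoded Unicode code points; callers never see the original encoding.
    private final int[] codePoints;

    // The only place where a character encoding is mentioned.
    Text(byte[] encoded, Charset charset) {
        this.codePoints = new String(encoded, charset).codePoints().toArray();
    }

    // Length in code points, not in bytes or UTF-16 code units.
    int length() {
        return codePoints.length;
    }

    // The code point at the given code-point index.
    int codePointAt(int index) {
        return codePoints[index];
    }

    @Override
    public String toString() {
        return new String(codePoints, 0, codePoints.length);
    }
}
```

Under these assumptions, code that receives a Text built from UTF-8 bytes
and one built from UTF-16 bytes of the same text sees identical lengths
and code points. The encoding only matters at the call to the constructor.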
>> Another good low-hanging argument is that strings are frequently used
>> as keys in associative arrays. Every insertion and retrieval in an
>> associative array requires a hashcode computation. And since D strings
>> are just dumb arrays, they have no way of memoizing their hashcodes.
>
> True, but I've written a lot of string processing programs (compilers
> are just one example of such). This has never been an issue, because the
> AA itself memoizes the hash, and from then on the dictionary handle is
> used.
Cool. The hashcode-memoization thing was really just a catalyst to get
me thinking. It's really at the periphery of my concerns with Strings.
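For reference, this is roughly what I mean by a string memoizing its own
hashcode. It is a sketch in Java, with a hypothetical class name
(HashedText); java.lang.String caches its hash in a similar way.

```java
import java.util.Arrays;

// Hypothetical immutable string wrapper that memoizes its hash code.
final class HashedText {
    // Private defensive copy; never modified after construction.
    private final char[] chars;
    // Cached hash; 0 means "not computed yet".
    private int hash;

    HashedText(char[] source) {
        this.chars = source.clone();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && chars.length > 0) {
            // Computed on first use. If the real hash happens to be 0,
            // it is simply recomputed on each call, which is still correct.
            h = Arrays.hashCode(chars);
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof HashedText
            && Arrays.equals(chars, ((HashedText) other).chars);
    }
}
```

The caching is only safe because the object is immutable. A plain
mutable array cannot do this, since any write to an element would
invalidate the cached value.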
>> We've already observed that D assoc arrays are less performant than
>> even Python maps, so the extra cost of lookup operations is unwelcome.
>
> Every one of those benchmarks that purported to show that D AA's were
> relatively slow turned out to be, on closer examination, D running the
> garbage collector more often than Python does. It had NOTHING to do with
> the AA's.
Ah. Good point. Thanks for clarifying. I didn't remember all the
follow-up details.
>> Most importantly, the contract between the regex engine and its
>> consumers would provide a well-defined interface for processing text,
>> regardless of the source or representation of that text.
>
> I think a better solution is for regexp to accept an Iterator as its
> source. That doesn't require polymorphic behavior via inheritance, it
> can do polymorphism by value (which is what templates do).
That's a great idea.
I should clarify that my referring to an "interface" was in the informal
sense. (Though I think actual interfaces would be a reasonable
solution.) But any sort of contract between text-data-structures and
text-processing-routines would fit the bill nicely.
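As an illustration of the kind of contract I have in mind, here is a
sketch of the interface-based variant in Java (not the template/iterator
approach you describe). The JDK regex engine accepts any CharSequence,
so a custom text representation can be searched without first being
copied into a String. ByteText and TextTools are hypothetical names, and
the sketch assumes the bytes are ASCII/ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical text representation: single-byte characters viewed in place.
final class ByteText implements CharSequence {
    private final byte[] bytes;
    private final int start;  // inclusive
    private final int end;    // exclusive

    ByteText(byte[] source) {
        this(source.clone(), 0, source.length);
    }

    // Views share the (never modified) backing array.
    private ByteText(byte[] bytes, int start, int end) {
        this.bytes = bytes;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return (char) (bytes[start + index] & 0xFF);
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException(from + ".." + to);
        }
        return new ByteText(bytes, start + from, start + to);
    }

    @Override
    public String toString() {
        return new String(bytes, start, end - start, StandardCharsets.ISO_8859_1);
    }
}

// A text-processing routine written only against the contract.
final class TextTools {
    // Returns the first run of ASCII digits, or null if there is none.
    static String firstNumber(CharSequence text) {
        Matcher m = Pattern.compile("[0-9]+").matcher(text);
        return m.find() ? m.group() : null;
    }
}
```

The same firstNumber routine works unchanged on a String, a
StringBuilder, or a ByteText, which is the property I care about: the
processing code never needs to know how the text is stored.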
>> But then again, I haven't used any of the const functionality in D2,
>> so I can't actually comment on relative usability of compiler-enforced
>> immutability versus interface-enforced immutability.
>
> From my own experience, I didn't 'get' invariant strings until I'd used
> them for a while.
I actually kind of think I'm on the other side of the issue.
I've been primarily a Java programmer (8 years) and secondarily a C#
programmer (3 years), so immutable Strings are the only thing I've ever
used. Lots of the other JDK classes are like that, too.
So, from my perspective, it seems like the ideal, low-impact way of
enforcing immutability is to have the classes enforce it on themselves.
I've never felt the need for compiler-enforced const semantics in any of
the work I've done.
Thanks for your replies! I always appreciate hearing from you.
--benji