Why Strings as Classes?

Walter Bright newshound1 at digitalmars.com
Tue Aug 26 02:09:04 PDT 2008


Benji Smith wrote:
> For starters, with strings implemented as character arrays, writing 
> library code that accepts and operates on strings is a bit of a pain in 
> the neck, since you always have to write templates and template code is 
> slightly less readable than non-template code.
> You can't distribute your 
> code as a DLL or a shared object, because the template instantiations 
> won't be included (unless you create wrapper functions with explicit 
> template instantiations, bloating your code size, but more importantly 
> tripling the number of functions in your API).

Is the problem you're referring to the fact that there are 3 character 
types?
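Here is a minimal C++ analogue of the three-character-type cost. C++ is used only for illustration: its `char`, `char16_t` and `char32_t` roughly parallel D's `char`, `wchar` and `dchar`. The function `count_spaces` is hypothetical. The template body is written once. Shipping it in a prebuilt shared library, however, means emitting one explicit instantiation per character type. That is the threefold API the quoted text complains about.

```cpp
#include <cstddef>
#include <string>

// Hypothetical library function: one template body covers every
// character type, but no code exists until it is instantiated.
template <typename CharT>
std::size_t count_spaces(const std::basic_string<CharT>& s) {
    std::size_t n = 0;
    for (CharT c : s)
        if (c == CharT(' '))
            ++n;
    return n;
}

// To ship this in a prebuilt shared library, each instantiation must be
// emitted explicitly: three exported functions for one piece of logic.
template std::size_t count_spaces<char>(const std::string&);
template std::size_t count_spaces<char16_t>(const std::u16string&);
template std::size_t count_spaces<char32_t>(const std::u32string&);
```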

> Another good low-hanging argument is that strings are frequently used as 
> keys in associative arrays. Every insertion and retrieval in an 
> associative array requires a hashcode computation. And since D strings 
> are just dumb arrays, they have no way of memoizing their hashcodes.

True, but I've written a lot of string processing programs (compilers 
are just one example). This has never been an issue, because the 
AA itself memoizes the hash, and from then on the handle returned by the dictionary is used.
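Here is a minimal C++ sketch of the dictionary-handle idiom, assuming a compiler-style symbol table. `Identifier`, `SymbolTable`, `intern` and `UseCounts` are hypothetical names and do not correspond to any D or Phobos API. The text is hashed when it passes through the table. After that, the program carries the returned pointer around, so comparisons and later lookups work on the handle instead of rehashing the string.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical compiler-style identifier stored once in the table.
struct Identifier {
    std::string name;
};

class SymbolTable {
public:
    // A single lookup-or-insert on the text. Equal strings always map
    // to the same Identifier, so its address serves as a stable handle.
    const Identifier* intern(const std::string& text) {
        auto [it, inserted] = table_.try_emplace(text);
        if (inserted)
            it->second = std::make_unique<Identifier>(Identifier{text});
        return it->second.get();
    }

private:
    std::unordered_map<std::string, std::unique_ptr<Identifier>> table_;
};

// After interning, other tables can key on the handle: hashing a pointer
// costs the same no matter how long the identifier's text is.
using Handle = const Identifier*;
using UseCounts = std::unordered_map<Handle, int>;
```

Two spellings of the same text, such as `"foo"` and `std::string("fo") + "o"`, intern to the same handle. Comparing identifiers therefore reduces to a pointer comparison.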


> We've already observed that D assoc arrays are less performant than even 
> Python maps, so the extra cost of lookup operations is unwelcome.

Every one of those benchmarks that purported to show that D AAs were 
relatively slow turned out to be, on closer examination, D running the 
garbage collector more often than Python does. It had NOTHING to do with 
the AAs.

> But much more important than either of those reasons is the lack of 
> polymorphism on character arrays. Arrays can't have subclasses, and they 
> can't implement interfaces.
> 
> A good example of what I'm talking about can be seen in the Phobos and 
> Tango regular expression engines. At least the Tango implementation 
> matches against all string types (the Phobos one only works with char[] 
> strings).
> 
> But what if I want to consume a 100 MB logfile, counting all lines that 
> match a pattern?
> 
> Right now, to use either regex engine, I have to read the entire 
> logfile into an enormous array before invoking the regex search function.
> 
> Instead, what if there was a CharacterStream interface? And what if all 
> the text-handling code in Phobos & Tango was written to consume and 
> return instances of that interface?
> 
> A regex engine accepting a CharacterStream interface could process text 
> from string literals, file input streams, socket input streams, database 
> records, etc., etc., etc., without having to pollute the API with a bunch
> of casts, copies, and conversions. And my logfile processing application 
> would consume only a tiny fraction of the memory needed by the character 
> array implementation.
> 
> Most importantly, the contract between the regex engine and its 
> consumers would provide a well-defined interface for processing text, 
> regardless of the source or representation of that text.

I think a better solution is for regexp to accept an Iterator as its 
source. That doesn't require polymorphic behavior via inheritance; it 
can do polymorphism by value (which is what templates do).
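Here is a C++ sketch of the iterator approach applied to the logfile example. `std::regex` stands in for the Phobos and Tango engines, and `count_matching_lines` is a hypothetical helper. The template is instantiated separately for each iterator type, which is polymorphism by value: there is no base class and no virtual call. `std::regex_search` itself takes iterator ranges but requires bidirectional iterators. The sketch therefore buffers one line at a time so it can also consume single-pass stream input.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Counts lines in [first, last) that contain a match for `re`.
// Any input iterator over char works: std::string iterators, raw char
// pointers, std::istreambuf_iterator over a file or socket stream, ...
// Only the current line is held in memory.
template <typename InputIt>
std::size_t count_matching_lines(InputIt first, InputIt last, const std::regex& re) {
    std::size_t count = 0;
    std::string line;
    for (; first != last; ++first) {
        char c = *first;
        if (c == '\n') {
            if (std::regex_search(line, re))
                ++count;
            line.clear();
        } else {
            line.push_back(c);
        }
    }
    // Handle a final line that has no trailing newline.
    if (!line.empty() && std::regex_search(line, re))
        ++count;
    return count;
}
```

To read a file, pass `std::istreambuf_iterator<char>(in)` and `std::istreambuf_iterator<char>()` for some open `std::ifstream in`. Memory use is then bounded by the longest line rather than by the size of the file.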

> 
> In a similar vein, I've worked on a lot of parsers over the past few
> years, for domain specific languages and templating engines, and stuff 
> like that. Sometimes it'd be very handy to define a "Token" class that 
> behaves exactly like a String, but with some additional behavior. 
> Ideally, I'd like to implement that Token class as an implementor of the 
> CharacterStream interface, so that it can be passed directly into other 
> text-handling functions.
> 
> But, in D, with no polymorphic text handling, I can't do that.

Templates are the ideal solution to that, and the more specific idiom is 
to use iterators.
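Here is a C++ sketch of the Token case handled with templates rather than an interface. `Token` and `is_identifier` are hypothetical names. The token carries its own extra data (here a line number). It only has to provide `begin()` and `end()` to be accepted by generic text functions.

```cpp
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical parser token: string-like text plus extra data.
// It implements no interface; it only exposes begin()/end().
struct Token {
    std::string text;
    std::size_t line;  // source line where the token starts

    std::string::const_iterator begin() const { return text.begin(); }
    std::string::const_iterator end() const { return text.end(); }
};

// Generic text function: accepts std::string, Token, or any other type
// with begin()/end() over chars. Each argument type gets its own
// instantiation, resolved at compile time.
template <typename Text>
bool is_identifier(const Text& t) {
    auto it = std::begin(t);
    auto last = std::end(t);
    if (it == last)
        return false;
    unsigned char first = static_cast<unsigned char>(*it);
    if (!(std::isalpha(first) || first == '_'))
        return false;
    for (++it; it != last; ++it) {
        unsigned char c = static_cast<unsigned char>(*it);
        if (!(std::isalnum(c) || c == '_'))
            return false;
    }
    return true;
}
```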


> But then again, I haven't used any of the const functionality in D2, so 
> I can't actually comment on relative usability of compiler-enforced 
> immutability versus interface-enforced immutability.

From my own experience, I didn't 'get' invariant strings until I'd used 
them for a while.



More information about the Digitalmars-d mailing list