Why Strings as Classes?

Benji Smith dlanguage at benjismith.net
Tue Aug 26 03:23:41 PDT 2008


Walter Bright wrote:
> Benji Smith wrote:
>> You can't distribute your code as a DLL or a shared object, because 
>> the template instantiations won't be included (unless you create 
>> wrapper functions with explicit template instantiations, bloating your 
>> code size, but more importantly tripling the number of functions in 
>> your API).
> 
> Is the problem you're referring to the fact that there are 3 character 
> types?

Basically, yeah.

With three different character types and two different array types 
(static & dynamic), plus, in D2, const, invariant, and mutable types 
(and soon shared and unshared), the number of ways of representing 
a "string" in the type-system is overwhelming.

This afternoon, I was writing some string-processing code that I intend 
to distribute in a library, and I couldn't help thinking to myself 
"This code is probably broken, for anything but the most squeaky-clean 
ASCII text".
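
A minimal sketch of the kind of breakage in question, not the library code itself: in D, `length` on a `char[]` counts UTF-8 code units, while `foreach` with a `dchar` loop variable decodes code points. The function name `countCodePoints` and the sample text are made up for illustration.

```d
import std.stdio : writeln;

// Hypothetical helper: counts code points by letting foreach
// decode the UTF-8 code units into dchar values.
size_t countCodePoints(const(char)[] s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "na\u00EFve"; // the U+00EF character takes two UTF-8 code units

    writeln(s.length);           // 6: code units, not characters
    writeln(countCodePoints(s)); // 5: code points
}
```

Code that indexes or slices by `length` treats the two halves of a multi-byte character as separate elements, which is harmless for ASCII and wrong for everything else.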

I don't mind that there are different character types, or that there are 
different character encodings. But I want to deal with those issues in 
exactly *one* place: in my string constructor (and, very rarely, during 
IO). But 99% of the time, I want to just think of the object as a 
String, with all the ugly details abstracted away.
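
One way to picture this, as a sketch only: a hypothetical `Text` class whose constructors are the single place where the character type matters. The class name, its `asUTF8` accessor, and the choice of UTF-8 as internal storage are all assumptions for illustration, not an existing Phobos or Tango type. It assumes a current Phobos for `std.utf.toUTF8`.

```d
import std.utf : toUTF8;

// Hypothetical class for illustration only.
final class Text
{
    // Always stored as UTF-8, whatever the caller passed in.
    private immutable string data;

    this(const(char)[] s)  { data = s.idup; }
    this(const(wchar)[] s) { data = toUTF8(s); }
    this(const(dchar)[] s) { data = toUTF8(s); }

    // Length in code points rather than code units.
    size_t length() const
    {
        size_t n = 0;
        foreach (dchar c; data)
            ++n;
        return n;
    }

    string asUTF8() const { return data; }
}

void main()
{
    string  a = "h\u00E9llo";   // UTF-8 input
    wstring b = "h\u00E9llo"w;  // UTF-16 input
    dstring c = "h\u00E9llo"d;  // UTF-32 input

    auto ta = new Text(a);
    auto tb = new Text(b);
    auto tc = new Text(c);

    // All three inputs end up with the same representation.
    assert(ta.asUTF8 == tb.asUTF8 && tb.asUTF8 == tc.asUTF8);
    assert(ta.length == 5); // five code points, six UTF-8 code units
}
```

Because the class exposes no mutating methods, callers also get an immutable value without having to spell out const or invariant themselves.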

>> Another good low-hanging argument is that strings are frequently used 
>> as keys in associative arrays. Every insertion and retrieval in an 
>> associative array requires a hashcode computation. And since D strings 
>> are just dumb arrays, they have no way of memoizing their hashcodes.
> 
> True, but I've written a lot of string processing programs (compilers 
> are just one example of such). This has never been an issue, because the 
> AA itself memoizes the hash, and from then on the dictionary handle is 
> used.

Cool. The hashcode-memoization thing was really just a catalyst to get 
me thinking. It's really at the periphery of my concerns with Strings.

>> We've already observed that D assoc arrays are less performant than 
>> even Python maps, so the extra cost of lookup operations is unwelcome.
> 
> Every one of those benchmarks that purported to show that D AA's were 
> relatively slow turned out to be, on closer examination, D running the 
> garbage collector more often than Python does. It had NOTHING to do with 
> the AA's.

Ah. Good point. Thanks for clarifying. I didn't remember all the 
follow-up details.

>> Most importantly, the contract between the regex engine and its 
>> consumers would provide a well-defined interface for processing text, 
>> regardless of the source or representation of that text.
> 
> I think a better solution is for regexp to accept an Iterator as its 
> source. That doesn't require polymorphic behavior via inheritance, it 
> can do polymorphism by value (which is what templates do).

That's a great idea.

I should clarify that I was using "interface" in the informal 
sense. (Though I think actual interfaces would be a reasonable 
solution.) But any sort of contract between text-data-structures and 
text-processing-routines would fit the bill nicely.
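
A rough sketch of that template-based approach, assuming a current Phobos where iterators take the form of ranges (`std.range.primitives`). The `countOf` helper is hypothetical and far simpler than a regex engine, but it shows one routine serving several text representations without inheritance.

```d
import std.range : retro;
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;
import std.stdio : writeln;

// Hypothetical helper: counts occurrences of a code point in any
// input range whose elements convert to dchar. The contract is the
// template constraint, not a base class.
size_t countOf(R)(R text, dchar needle)
    if (isInputRange!R && is(ElementType!R : dchar))
{
    size_t n = 0;
    for (; !text.empty; text.popFront())
    {
        if (text.front == needle)
            ++n;
    }
    return n;
}

void main()
{
    writeln(countOf("banana", 'a'));       // string (UTF-8)
    writeln(countOf("banana"w, 'a'));      // wstring (UTF-16)
    writeln(countOf("banana"d, 'n'));      // dstring (UTF-32)
    writeln(countOf("banana".retro, 'n')); // lazily reversed range
}
```

The trade-off is the one raised earlier in the thread: each distinct source type produces its own template instantiation, so the routine cannot simply be exported from a shared library as a single function.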

>> But then again, I haven't used any of the const functionality in D2, 
>> so I can't actually comment on relative usability of compiler-enforced 
>> immutability versus interface-enforced immutability.
> 
> From my own experience, I didn't 'get' invariant strings until I'd used 
> them for a while.

I actually kind of think I'm on the other side of the issue.

I've been primarily a Java programmer (8 years) and secondarily a C# 
programmer (3 years), so immutable Strings are the only thing I've ever 
used. Lots of the other JDK classes are like that, too.

So, from my perspective, it seems like the ideal, low-impact way of 
enforcing immutability is to have the classes enforce it on themselves. 
I've never felt the need for compiler-enforced const semantics in any of 
the work I've done.

Thanks for your replies! I always appreciate hearing from you.

--benji



More information about the Digitalmars-d mailing list