Is all this Invariant **** er... stuff, premature optimisation?
Me Here
p9e883002 at sneakemail.com
Mon Apr 28 16:14:36 PDT 2008
Walter Bright wrote:
>p9e883002 at sneakemail.com wrote:
>>Did I suggest this was an optimisation?
>
>You bring up a good point.
Sorry to have provoked you Walter, but thanks for your reply.
>On a tiny example such as yours, where you can see everything that is
>going on at a glance, such as where strings come from and where they are
>going, there isn't any point to immutable strings. You're right about that.
Well, obviously, the example was trivial in order to concentrate attention on
the issue I was having.
> It's real easy to lose track of who owns a string, who else has references to the string, who has rights to change the string and who doesn't.
The key word in there is "who". The problem is that you are pessimising the
entire language, once rightly famed for its performance, for *all* users, for
the notional convenience of the few writing threaded applications.
Now don't go taking that the wrong way. In other circles, I am known as
"Mr. Threading", at least for my advocacy of threads, if not my expertise.
I have been using them for a relatively long time, going way back to pre-1.0
OS/2 (then known internally as CP/DOS). I mention that only to show I'm not
in the "thread is spelt f-o-r-k" camp.
>For example, you're changing the char[][] passed in to main(). What if one
>of those strings is a literal in the read-only data section?
Okay. That raises the question of how runtime external data ends up in a
read-only data section in the first place. Of course it can be done, but that
in turn raises the question: why? But let's ignore that for now and
concentrate on the development of my application, which wants to mutate one
or more of those strings.
The first time I try to mutate one, I'm going to hit an error, either at
compile time or at runtime, and immediately know, assuming the error message
is reasonably understandable, that I need to copy the immutable string into
something I can mutate. A quick, *single* dup, and I'm away and running.
Provided that I have the tools to do what I need, that is. In this case, and
this was the entire point of the original post, that means a library of common
string manipulation functions that work on my good old fashioned char[]s
without my needing to jump through the hoops of neo-orthodoxy to use them.
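And to be concrete about what such a library could look like: nothing stops
those functions being written against const(char)[], so that they accept both
my mutable char[] buffers and the new invariant strings without a copy in
either direction. A rough sketch (the function is just an illustration, not
something from Phobos):

    // Accepts both char[] and invariant(char)[] arguments, no .dup/.idup.
    size_t countSpaces(const(char)[] s)
    {
        size_t n = 0;
        foreach (c; s)
            if (c == ' ')
                ++n;
        return n;
    }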
But, as I tried to point out in the post to which you replied, the whole
'args' thing is a red herring. It was simply a convenient source of
non-compile-time data. I couldn't get the std.stream example to compile,
apparently due to a bug in the v2 libraries--see elsewhere.
In this particular case, I turned to D in order to manipulate 125,000,000
strings of 500 to 2000 bytes each: a dump of an inverted index DB. I usually
do this kind of stuff in a popular scripting language, but that proved rather
too slow for this volume of data. Each of those records needs to go through
multiple mutations: from the uppercasing of certain fields, through the
complete removal of certain characters within substantial subsets of each
record, to the recalculation and adjustment of an embedded hex digest within
each record to reflect the preceding changes. All told, each record may go
through anything from 5 to 300 separate mutations.
Doing this via immutable buffers is going to create scads and scads of
short-lived, immutable sub-elements that will just tax the GC to hell and
impose unnecessary and unacceptable time penalties on the process. And I will
almost certainly have to go through the process many times before I get the
data into the ultimate form I need.
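The style I'm after is one mutable buffer per record, mutated in place as many
times as it takes, roughly along these lines (a sketch only: the field
offsets, the delimiter, and the digest step are made-up placeholders, not my
actual record format):

    // In-place mutation of a record buffer: no intermediate immutable copies.
    char[] processRecord(char[] rec)
    {
        // Uppercase a (hypothetical) fixed-offset ASCII field in place.
        foreach (ref c; rec[10 .. 20])
            if (c >= 'a' && c <= 'z')
                c = cast(char)(c - ('a' - 'A'));

        // Strip a (hypothetical) unwanted character by compacting in place.
        size_t j = 0;
        foreach (c; rec)
            if (c != '|')
                rec[j++] = c;

        // ... recompute and overwrite the embedded hex digest here ...
        return rec[0 .. j];   // same storage, possibly shorter
    }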
>So what happens is code starts defensively making copies of the string
>"just in case." I'll argue that in a complex program, you'll actually wind
>up making far more copies than you will with invariant strings.
>[from another post] I bet that, though, after a while they'll evolve to
>eschew it in favor of immutable strings. It's easier than arguing about it
You are so wrong here. I spent two of the worst years of my coding career
working in Java, and ended up fighting it all the way. While some of that was
due to the sudden re-invention of major parts of the system libraries in
completely incompatible ways when the transition from (from memory) 1.2 to
1.3 occurred--and being forced to make the change because of the near-total
abandonment of support and bug fixing for the 'old' libraries--another big
part of the problem was the endless complexity involved in switching between
the String type and the StringBuffer type.
Please learn from history. Talk to (experienced) Java programmers. I mean
real working stiffs, not OO-purists from academia, preferably some who have
experience of other languages as well. It took until v1.5 before the
performance of Java--and the dreaded GC pregnant pause--finally reached a
point where manipulating large datasets was both reasonable and, more
importantly, reasonably deterministic. Don't make their mistakes over again.
Too many times in the last thirty years I've seen promising, pragmatic
software technologies tail off into academic obscurity because the primary
motivators suddenly "got religion". Whether it was OO dogma, functional
purity, or whatever other flavour of neo-orthodoxy became the flavour du
jour, the assumption that "they'll see the light eventually" has been the
downfall of many a promising start.
Just as the answer to the occasional hit-and-run death is not banning cars,
so the fix for unintentional aliasing in threaded applications does not lie
in forcing all character arrays to be immutable. For one reason, it doesn't
stop there. Character arrays are just arrays of numbers. Exactly the same
problems arise with arrays of