Is all this Invariant **** er... stuff, premature optimisation?
Me Here
p9e883002 at sneakemail.com
Mon Apr 28 16:14:36 PDT 2008
Walter Bright wrote:
>p9e883002 at sneakemail.com wrote:
>>Did I suggest this was an optimisation?
>
>You bring up a good point.
Sorry to have provoked you Walter, but thanks for your reply.
>On a tiny example such as yours, where you can see everything that is
>going on at a glance, such as where strings come from and where they are
>going, there isn't any point to immutable strings. You're right about that.
Well, obviously, the example was trivial in order to concentrate attention on
the issue I was having.
> It's real easy to lose track of who owns a string, who else has references to the string, who has rights to change the string and who doesn't.
The key word in there is "who". The problem is that you are pessimising the
entire language, once rightly famed for its performance, for *all* users, for
the notional convenience of the few writing threaded applications.
Now don't go taking that the wrong way. In other circles, I am known as
"Mr. Threading", at least for my advocacy of threads, if not my expertise.
I have been using them for a relatively long time, going way back to pre-1.0
OS/2 (then known internally as CP/DOS). I mention that only to show I'm not
in the "thread is spelt f-o-r-k" camp.
>For example, you're changing the char[][] passed in to main(). What if one
>of those strings is a literal in the read-only data section?
Okay. That raises the question of how runtime external data ends up in a
read-only data section in the first place. Of course it can be done, but that
in turn raises the question: why? But let's ignore that for now and
concentrate on the development of my application, which wants to mutate one
or more of those strings.
The first time I try to mutate one, I'm going to hit an error, either at
compile time or at runtime, and immediately know, assuming the error message
is reasonably understandable, that I need to copy the immutable string into
something I can mutate. A quick, *single* dup, and I'm away and running.
Provided that I have the tools to do what I need, that is. In this case, and
this was the entire point of the original post, that means a library of common
string manipulation functions that work on my good old fashioned char[]s
without my needing to jump through the hoops of neo-orthodoxy to use them.
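And to be concrete about what such a library could look like: nothing stops
those functions being written against const(char)[], so that they accept both
my mutable char[] buffers and the new invariant strings without a copy in
either direction. A rough sketch (the function is just an illustration, not
something from Phobos):

    // Accepts both char[] and invariant(char)[] arguments, no .dup/.idup.
    size_t countSpaces(const(char)[] s)
    {
        size_t n = 0;
        foreach (c; s)
            if (c == ' ')
                ++n;
        return n;
    }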
But, as I tried to point out in the post to which you replied, the whole
'args' thing is a red herring. It was simply a convenient source of
non-compile-time data. I couldn't get the std.stream example to compile,
apparently due to a bug in the v2 libraries--see elsewhere.
In this particular case, I turned to D in order to manipulate 125,000,000
strings of 500 to 2000 bytes each: a dump of an inverted index DB. I usually
do this kind of stuff in a popular scripting language, but that proved rather
too slow for this volume of data. Each of those records needs to go through
multiple mutations: from the uppercasing of certain fields, through the
complete removal of certain characters within substantial subsets of each
record, to the recalculation and adjustment of an embedded hex digest within
each record to reflect the preceding changes. All told, each record may go
through anything from 5 to 300 separate mutations.
Doing this via immutable buffers is going to create scads and scads of
short-lived, immutable sub-elements that will just tax the GC to hell and
impose unnecessary and unacceptable time penalties on the process. And I will
almost certainly have to go through the process many times before I get the
data into the ultimate form I need.
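The style I'm after is one mutable buffer per record, mutated in place as many
times as it takes, roughly along these lines (a sketch only: the field
offsets, the delimiter, and the digest step are made-up placeholders, not my
actual record format):

    // In-place mutation of a record buffer: no intermediate immutable copies.
    char[] processRecord(char[] rec)
    {
        // Uppercase a (hypothetical) fixed-offset ASCII field in place.
        foreach (ref c; rec[10 .. 20])
            if (c >= 'a' && c <= 'z')
                c = cast(char)(c - ('a' - 'A'));

        // Strip a (hypothetical) unwanted character by compacting in place.
        size_t j = 0;
        foreach (c; rec)
            if (c != '|')
                rec[j++] = c;

        // ... recompute and overwrite the embedded hex digest here ...
        return rec[0 .. j];   // same storage, possibly shorter
    }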
>So what happens is code starts defensively making copies of the string
>"just in case." I'll argue that in a complex program, you'll actually wind
>up making far more copies than you will with invariant strings.
>[from another post] I bet that, though, after a while they'll evolve to
>eschew it in favor of immutable strings. It's easier than arguing about it
You are so wrong here. I spent two of the worst years of my coding career
working in Java, and ended up fighting it all the way. While some of that was
due to the sudden re-invention of major parts of the system libraries in
completely incompatible ways when the transition from (from memory) 1.2 to
1.3 occurred--and being forced to make the change because of the near-total
abandonment of support and bug fixing for the 'old' libraries--another big
part of the problem was the endless complexity involved in switching between
the String type and the StringBuffer type.
Please learn from history. Talk to (experienced) Java programmers. I mean
real working stiffs, not OO-purists from academia, preferably some who have
experience of other languages as well. It took until v1.5 before the
performance of Java--and the dreaded GC pregnant pause--finally reached a
point where manipulating large datasets was both reasonable and, more
importantly, reasonably deterministic. Don't make their mistakes over again.
Too many times in the last thirty years I've seen promising, pragmatic
software technologies tail off into academic obscurity because the primary
motivators suddenly "got religion". Whether it was OO dogma, functional
purity, or whatever other flavour of neo-orthodoxy became the flavour du
jour, the assumption that "they'll see the light eventually" has been the
downfall of many a promising start.
Just as the answer to the occasional hit-and-run death is not banning cars,
so the fix for unintentional aliasing in threaded applications does not lie
in forcing all character arrays to be immutable. For one reason, it doesn't
stop there. Character arrays are just arrays of numbers. Exactly the same
problems arise with arrays of