Is all this Invariant **** er... stuff, premature optimisation?
Me Here
p9e883002 at sneakemail.com
Mon Apr 28 17:04:06 PDT 2008
Walter Bright wrote:
>p9e883002 at sneakemail.com wrote:
>>Did I suggest this was an optimisation?
>
>You bring up a good point.
Sorry to have provoked you, Walter, but thanks for your reply.
>On a tiny example such as yours, where you can see everything that is
>going on at a glance, such as where strings come from and where they are
>going, there isn't any point to immutable strings. You're right about that.
Well, obviously, the example was trivial in order to concentrate attention
on the issue I was having.
> It's real easy to lose track of who owns a string, who else has references to the string, who has rights to change the string and who doesn't.
The keyword in there is "who". The problem is that you are pessimising the
entire language, once rightly famed for its performance, for *all* users,
for the notional convenience of the few writing threaded applications.
Now don't go taking that the wrong way. In other circles, I am known as
"Mr. Threading", at least for my advocacy of them, if not my expertise.
I have been using threads for a relatively long time, going way back to
pre-1.0 OS/2 (then known internally as CP/DOS). I mention this only to
show I'm not in the "thread is spelt f-o-r-k" camp.
>For example, you're changing the char[][] passed in to main(). What if one
>of those strings is a literal in the read-only data section?
Okay. So that raises the question: how does runtime external data end up
in a read-only data section? Of course, it can be done, but that then
raises the further question: why? But let's ignore that for now and
concentrate on the development of my application that wants to mutate one
or more of those strings.
The first time I try to mutate one, I'm going to hit an error, either at
compile time or at runtime, and immediately know, assuming the error
message is reasonably understandable, that I need to copy the immutable
string into something I can mutate. A quick, *single* dup, and I'm away
and running.
Provided, that is, that I have the tools to do what I need. In this case,
and this was the entire point of the original post, that means a library
of common string manipulation functions that work on my good old fashioned
char[]s, without my needing to jump through the hoops of neo-orthodoxy to
use them.
But, as I tried to point out in the post to which you replied, the whole
'args' thing is a red herring. It was simply a convenient source of
non-compile-time data. I couldn't get the std.stream example to compile.
Apparently due to a bug in the v2 libraries--see elsewhere.
In this particular case, I turned to D in order to manipulate 125,000,000
x 500-to-2000-byte strings: a dump of an inverted index DB. I usually do
this kinda stuff in a popular scripting language, but that proved to be
rather too slow for this volume of data. Each of those records needs to go
through multiple mutations: from the uppercasing of certain fields,
through the complete removal of certain characters within substantial
subsets of each record, to the recalculation and adjustment of an embedded
hex digest within each record to reflect the preceding changes. All told,
each record may go through anything from 5 to 300 separate mutations.
Doing this via immutable buffers is going to create scads and scads of
short-lived, immutable sub-elements that will just tax the GC to hell and
impose unnecessary and unacceptable time penalties on the process. And I
almost certainly will have to go through the process many times before I
get the data in the ultimate form I need.
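The kind of in-place pipeline I mean can be sketched like this (again in Java, with a mutable char[] standing in for D's char[]; the record layout and helper names are invented for illustration):

```java
public class InPlace {
    // Uppercase the characters in [from, to) in place -- no allocation.
    static void upcase(char[] rec, int from, int to) {
        for (int i = from; i < to; i++)
            rec[i] = Character.toUpperCase(rec[i]);
    }

    // Remove every occurrence of c by compacting leftwards, in place;
    // returns the new logical length of the record.
    static int strip(char[] rec, int len, char c) {
        int w = 0;
        for (int r = 0; r < len; r++)
            if (rec[r] != c) rec[w++] = rec[r];
        return w;
    }

    public static void main(String[] args) {
        char[] rec = "id=ab12; data=x-y-z".toCharArray();
        upcase(rec, 3, 7);                     // uppercase the id field
        int len = strip(rec, rec.length, '-'); // drop '-' separators
        System.out.println(new String(rec, 0, len)); // prints "id=AB12; data=xyz"
    }
}
```

Run 300 such passes over one reusable buffer and the GC never hears about it; do the same through immutable buffers and every pass manufactures a fresh copy.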
>So what happens is code starts defensively making copies of the string
>"just in case." I'll argue that in a complex program, you'll actually wind
>up making far more copies than you will with invariant strings.
>[from another post] I bet that, though, after a while they'll evolve to
>eschew it in favor of immutable strings. It's easier than arguing about it
You are so wrong here. I spent 2 of the worst years of my coding career
working in Java, and ended up fighting it all the way. Some of that was
due to the sudden re-invention of major parts of the system libraries in
completely incompatible ways when the transition from (from memory) 1.2 to
1.3 occurred, and being forced to make the change because of the near
total abandonment of support or bug fixing for the 'old' libraries. But
another big part of the problem was the endless complexity involved in
switching between the String type and the StringBuffer type.
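For anyone who hasn't lived it, the String/StringBuffer churn looks like this: every round trip between the immutable type and the mutable type is a full copy of the data (the method here is a made-up example, but the two copies per edit are real):

```java
public class Churn {
    // A trivial single-character edit still costs two whole-string copies.
    static String capitalise(String s) {
        if (s.isEmpty()) return s;
        StringBuffer buf = new StringBuffer(s);              // copy #1: String -> StringBuffer
        buf.setCharAt(0, Character.toUpperCase(s.charAt(0)));
        return buf.toString();                               // copy #2: StringBuffer -> String
    }

    public static void main(String[] args) {
        System.out.println(capitalise("java")); // prints "Java"
    }
}
```

Multiply those two copies by every edit in a long mutation pipeline and you have the pattern that made large-dataset Java so painful before 1.5.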
Please learn from history. Talk to (experienced) Java programmers. I mean
real working stiffs, not OO-purists from academia. Preferably some that
have experience of other languages also. It took until v1.5 before the
performance of Java--and the dreaded GC pregnant pause--finally reached a
point where Java performance for manipulating large datasets was both
reasonable and, more importantly, reasonably deterministic. Don't make
their mistakes over again.
Too many times in the last thirty years I've seen promising, pragmatic
software technologies tail off into academic obscurity because the primary
motivators suddenly "got religion". Whether OO dogma or functional purity
or whatever other flavour of neo-orthodoxy became the flavour du jour, the
assumption that "they'll see the light eventually" has been the downfall
of many a promising start.
Just as the answer to the occasional hit-and-run death is not banning
cars, so fixing unintentional aliasing in threaded applications does not
lie in forcing all character arrays to be immutable.
For one reason, it doesn't stop there. Character arrays are just arrays
of numbers. Exactly the same problems arise with arrays of integers,
reals, associative arrays, etc. Imagine the cost of duplicating an entire
hash every time you add a new key or alter a value. The penalty for each
update grows linearly with the size of the hash (or array of ints, longs,
reals ...), so the total cost of a run of updates grows quadratically.
And before you reject this notion on the basis that "I'd never do that",
what's the difference? Are strings any more vulnerable to the problems
invariance is meant to tackle than these other datatypes?
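The cost pattern is easy to demonstrate concretely. Here is a minimal Java sketch (hypothetical helper name) of the copy-the-whole-hash-per-update discipline that invariance would imply for associative arrays:

```java
import java.util.HashMap;
import java.util.Map;

public class CopyCost {
    // An "immutable" update: leave the original untouched by copying
    // the whole map, then adding the one new entry to the copy.
    static Map<Integer, Integer> withEntry(Map<Integer, Integer> m, int k, int v) {
        Map<Integer, Integer> copy = new HashMap<>(m); // O(n) copy per update
        copy.put(k, v);
        return copy;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> m = new HashMap<>();
        for (int i = 0; i < 1000; i++)
            m = withEntry(m, i, i * i); // n updates => O(n^2) total copying
        System.out.println(m.size()); // prints 1000
    }
}
```

A thousand inserts done this way performs on the order of half a million entry copies; the in-place version performs a thousand.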
Try manipulating large datasets--images, DNA data, signal processing,
finite element analysis; any of the types of applications for which
multi-threading isn't just a way to allow the program to do something
useful while the user decides which button to click--in any of the
"referentially transparent" languages that are concurrency capable, and
see the hoops you have to leap through to achieve anything like decent
performance. E.g. Haskell's Unsafe* library routines (basically: abandon
referential transparency for this data so that we can get something done
in a reasonable time frame!). Look for "If you can match 1-core C speed
using 4-core Haskell parallelism without "unsafe pseudo-C in Haskell"
trickery, I will be impressed. ..." in the following article:
http://reddit.com/r/programming/info/61p6f/comments/
The abandonment or deprecation of lvalue slices on string types is the
thin end of the wedge toward referential transparency. Despite all the
academic hype and the impressive (small scale) demos of the 'match made in
heaven' that is 'referential transparency & concurrency', try to seek out
real-world examples of the combination running in real-world
environments--i.e. where someone other than the tax-payer of whatever
country is paying for the development, and the time pressure to obtain the
results is a little more demanding than a thesis submission date--and
you'll find them very conspicuous by their absence.
Such ideas look great on paper, in the heady world of ideal Turing
Machines with unlimited-length tapes (unbounded memory). But once you
bring them back to the real world of finite RAM, fragmentable heaps and
GC, they become impractical. Unworkable for real data sets in real time.
Don't feel the need to argue this on-forum. If this hasn't persuaded you
that forcing invariance upon one datatype, by providing a string library
that only works with invariant strings, will do little to address the
problems it attempts to solve, then I doubt further discussion will.
Please return to the pragmatism that so stood out in your early visions
for D, and abandon this folly before, as with so many of the follies of
the gentleman academics of yore, it becomes a life-long quest ending up as
a memorial or tombstone.
Cheers, b.
--
More information about the Digitalmars-d
mailing list