Is all this Invarient **** er... stuff, premature optimisation?

Mon Apr 28 21:05:36 PDT 2008

Me Here wrote:
> Janice Caron wrote:
>> Functions don't overload on return value.
> They don't? Why not? Seems like a pretty obvious step to me.

Type inference in D is done "bottom up". Doing overloading based on 
function return type is "top down". Trying to get both schemes to 
coexist is a hard problem.
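
To make the bottom-up point concrete, here is a minimal sketch. It assumes current D, and parse is a made-up name for illustration only. The type of an `auto` variable is derived from the call expression on the right, so a call has nothing "above" it to choose a return type from. The usual workaround is to have the caller name the result type as a template argument.

```d
import std.conv : to;

// Hypothetical helper: the caller states the result type explicitly,
// because the call site cannot select an overload by return type.
T parse(T)(string s)
{
    return to!T(s);
}

void main()
{
    // Bottom-up inference: the type of n comes from the expression itself.
    auto n = parse!int("42");
    auto x = parse!double("2.5");

    static assert(is(typeof(n) == int));
    static assert(is(typeof(x) == double));
    assert(n == 42);
    assert(x == 2.5);
}
```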

>>>  The idea that runtime obtained or derived strings can be made truly
>>>  invariant is purely theoretical.
>> But the fact that someone else might be sharing the data is not.
> By "someone else" you mean 'another thread'?

No, it could be the same thread, via another alias to the same data. 
Invariant strings let the programmer treat them as if they were value 
types that are copied on every use (like ints are), except that no 
actual copy needs to be made.
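
A small sketch of that value-like behavior, assuming current D, where invariant is spelled immutable and string is an alias for immutable(char)[]. The helper name withSuffix is invented for the example.

```d
// Hypothetical helper: returns a new string, never touches its inputs.
string withSuffix(string s, string suffix)
{
    return s ~ suffix;   // concatenation allocates a new array
}

void main()
{
    string a = "hello";
    string b = a;                 // copies pointer + length, not the characters
    assert(a.ptr is b.ptr);       // both slices refer to the same storage

    b = withSuffix(b, ", world"); // b now refers to a different array
    assert(a == "hello");         // a is unaffected, as an int would be
    assert(b == "hello, world");

    // b[0] = 'j';                // would not compile: elements are immutable
}
```

Sharing the storage is safe here only because nobody can write through either slice.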

With mutable strings, one always has to be careful to keep track of who 
'owns' the string, and who has references to it. When mutating the 
string, one must manually ensure that there are no other references to 
it that would be surprised by the data changing. For example, if you 
insert a string into a symbol table, and then later some other reference 
to that string changes it, it could wind up corrupting the symbol table.
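
Here is a rough sketch of that symbol table hazard, again assuming current D (immutable instead of invariant). Symbol and SafeSymbol are toy types invented for the example, not anything from a real compiler.

```d
import std.stdio;

struct Symbol
{
    char[] name;   // mutable slice: shares storage with whoever passed it in
    int value;
}

struct SafeSymbol
{
    string name;   // immutable(char)[]: the characters cannot change
    int value;
}

void main()
{
    char[] buf = "count".dup;

    Symbol[] table;
    table ~= Symbol(buf, 1);          // the table entry aliases buf's characters

    buf[0] = 'm';                     // some other reference mutates the data...
    assert(table[0].name == "mount"); // ...and the table entry changed with it

    char[] buf2 = "count".dup;
    SafeSymbol[] safe;
    safe ~= SafeSymbol(buf2.idup, 1); // one explicit copy at the boundary

    buf2[0] = 'm';
    assert(safe[0].name == "count");  // the stored name is unaffected

    writeln(table[0].name, " ", safe[0].name);
}
```

With the mutable version, the table has to trust every caller. With the immutable version, the type carries the guarantee.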

The point about main(char[][] args) and modifying those strings 
in-place is very valid - nothing is said about where those strings 
actually reside, and who else may have references to the same data, and 
whether you can modify them with impunity or not. You could argue "this 
should be better documented" and you'd be right, but if the declaration 
instead said main(invariant(char[])args) then I *know* that I am not 
allowed to change them, and whoever calls main() *knows* that those arg 
strings won't get changed. We can both sleep comfortably.

Invariant strings offer a guarantee that the data won't change, which 
clarifies the API of the functions. (Whenever I see an API function that 
takes a char*, say putenv(), it rarely says whether it saves a copy of 
the data or saves a copy of the reference. That just sucks.)
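
The putenv() case can be sketched as follows. This assumes a POSIX-conforming libc, where the string handed to putenv() itself becomes part of the environment, and it uses the druntime bindings in core.sys.posix.stdlib. The variable name GR_DEMO is made up for the example.

```d
version (Posix)
{
    import core.stdc.stdlib : getenv;
    import core.stdc.string : strcmp;
    import core.sys.posix.stdlib : putenv;
}

void main()
{
    version (Posix)
    {
        // Mutable, NUL-terminated buffer of the form NAME=value.
        char[] buf = "GR_DEMO=one\0".dup;

        assert(putenv(buf.ptr) == 0);
        assert(strcmp(getenv("GR_DEMO"), "one") == 0);

        // putenv kept the reference, not a copy, so writing to our
        // buffer changes what the environment reports.
        buf[8] = 't';
        buf[9] = 'w';
        buf[10] = 'o';
        assert(strcmp(getenv("GR_DEMO"), "two") == 0);
    }
}
```

Nothing in the char* parameter type tells the caller that the buffer must stay alive and untouched afterwards.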

> If so, then if that is a possibility, if my code is using threads, then 
> I, the programmer,
> will be aware of that and will be able to make appropriate choices.
> 
> I /might/ choose to use invariance to 'protect' this particular piece of 
> data from the problems
> of shared state concurrency--if there is any possibility that I intend 
> to share this particular piece of data.
> But in truth, it is very unlikely that I *will* make /that/ choice. 
> Here's why.
> 
> What does it mean to make and hold multiple (mutated) copies of a single 
> entity?
> 
> That is, I obtain a piece of data from somewhere and make it invariant.
> Somehow two threads obtain references to that piece of data.
> If none of them attempt to change it, then it makes no difference that 
> it is marked invariant.
> If however, one of them is programmed to change it, then it now has a 
> different
> version of that entity to the other thread. But what does that mean? Who 
> has the 'right' version?
> 
> Show me a real situation where two threads can legitimately be making 
> disparate modifications to a single entity,
> string or otherwise, and I'll show you a programming error. Once two 
> threads make disparate modifications to an entity,
> they are separate entities. And they should have been given copies, not 
> references to a single copy, in the first place.
> 
> If the intent is that they share a single entity, then any legitimate 
> modifications to that single entity should be reflected
> in the views of that single entity by both threads. And therefore 
> subjected to locking, or STM or whatever mechanism is
> used to control that modification.
> 
> This whole thing of invariance and concurrency seems to be aimed at 
> enabling the use of COW.

Wouldn't that be more of a copy-swap thing? And isn't STM copy-swap at 
its core?

> And if that is the case, and I very much hope it isn't, then let me tell 
> you as someone who is intimately familiar with the
> one existing system that went this route (iThreads: look 'em up), that it 
> is a total disaster,

ithreads copies the entire user data per thread. Using invariant is, of 
course, a way to avoid copying the data.

> The whole purpose and advantage of multi-threading, over 
> multi-processing, is (mutable) shared state. And the elimination of
> costs of serialisation and narrow bandwidth of IPC in the forking 
> concurrency model. Attempting to emulate that model
> using threading gives few of its advantages, all of its disadvantages, 
> and throws away all of the advantages of threading.
> It is a complete and utter waste of time and effort.

I can agree with that.