COW vs. in-place.

Thu Aug 3 07:24:35 PDT 2006

Dave wrote:
> Reiner Pope wrote:
>>> Why not:
>>>
>>>     str = toupper(str);     // in-place
>>>     str = toupper(str.dup); // COW

What is the advantage of redundantly assigning the result of an in-place 
function to itself? In my opinion, all in-place functions should have a 
void return type to avoid common mistakes such as:

foreach(e; arr.reverse) { ... }
// OOPS, arr is now reversed
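A minimal sketch of that convention in present-day D (D2, not the D1 of this thread). `reverseInPlace` is a hypothetical name, not a Phobos function. Because it returns void, the call can only be used as a statement, so the equivalent of the loop above is rejected by the compiler instead of silently reversing `arr`.

```d
/// Hypothetical in-place reverse. The void return type means the call
/// cannot be used as an expression, so
/// `foreach (e; arr.reverseInPlace()) { ... }` does not compile.
void reverseInPlace(T)(T[] arr)
{
    if (arr.length < 2)
        return;
    size_t i = 0, j = arr.length - 1;
    while (i < j)
    {
        // swap the outermost pair, then move both indices inward
        T tmp = arr[i];
        arr[i] = arr[j];
        arr[j] = tmp;
        ++i;
        --j;
    }
}

unittest
{
    int[] arr = [1, 2, 3, 4];
    arr.reverseInPlace();          // fine as a statement
    assert(arr == [4, 3, 2, 1]);

    // Using the (void) result as something to iterate is a compile error.
    static assert(!__traits(compiles, {
        foreach (e; arr.reverseInPlace()) {}
    }));
}
```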

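.dup followed by a call to an in-place function is certainly fine, but in
those cases an ordinary functional (non-in-place) function would have
been more efficient, since it can build the result in a single pass
instead of copying first and then modifying the copy.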
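To illustrate, here is a hedged sketch in present-day D contrasting the two routes for ASCII-only uppercasing; `asciiUpper`, `upperInPlace` and `upperCopy` are hypothetical helpers, not library functions. The `.dup` route walks the data twice (copy, then modify), while the functional version reads each character once and writes the result straight into a new buffer.

```d
/// Uppercases one ASCII character; anything outside 'a'..'z' is returned as is.
char asciiUpper(char c) pure nothrow @safe
{
    return (c >= 'a' && c <= 'z') ? cast(char)(c - ('a' - 'A')) : c;
}

/// In-place variant: overwrites `s`, returns nothing.
void upperInPlace(char[] s) pure nothrow @safe
{
    foreach (ref c; s)
        c = asciiUpper(c);
}

/// Functional variant: reads each character of `s` once and writes the
/// result into a freshly allocated buffer; `s` is left untouched.
char[] upperCopy(const(char)[] s) pure nothrow @safe
{
    auto result = new char[s.length];
    foreach (i, c; s)
        result[i] = asciiUpper(c);
    return result;
}

unittest
{
    char[] original = "hello".dup;

    // .dup + in-place: one pass to copy, a second pass to modify.
    char[] a = original.dup;
    upperInPlace(a);

    // functional: a single pass producing the new array.
    char[] b = upperCopy(original);

    assert(a == "HELLO" && b == "HELLO");
    assert(original == "hello"); // neither route touched the original
}
```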

>> This is not copy on write. That is simply 'always copy', and this 
> 
> But presumably the user would only do the dup if they didn't want to 
> modify str, so CoW would basically go away as a design pattern.
> 
>> performs worse than COW (which in turn performs worse than in-place, 
>> if in-place is possible). Walter has also said earlier that, with COW, 
>> it should be the responsibility of the writer to ensure the copy, not 
>> the caller.
> 
> That's ultimately what I'm questioning. The caller knows best whether
> the object that _they created_ should be modified or copied, and they
> are in the best position to make that copy before calling a modifying
> function. That holds whether the caller is the developer of another
> library function or an application programmer.
> 
> What's more, CoW for arrays is inconsistent with how other reference
> objects are treated. Class objects are really not made for CoW: the
> language doesn't even provide a rudimentary copy ctor. The same goes
> for AAs, which, for example, don't have a .dup.
> 
> Ultimately, most data that is modified stays modified for the rest of
> its "lifetime" in the program, and if the original is needed again it
> can be re-obtained from its source (e.g. by reading it from disk again)
> instead of keeping copies around.
> 
> I think CoW for arrays was a mistake: it is most often unnecessary, it
> will cause D to repeat many of Java's performance woes for the average
> user, and, as I mentioned, it is inconsistent as well.
> It's a lose-lose-lose.

Consider the following (made-up) case-insensitive, multi-file word count
application:

import std.stdio;
import std.file;
import std.string;

void main(char[][] args) {
    int[char[]] wc;                         // word -> count
    foreach(filename; args[1..$]) {
        char[] data = cast(char[]) read(filename);
        foreach(word; data.split())         // split on whitespace
            wc[tolower(word)]++;            // count case-insensitively
    }
    writefln("num words: %s", wc.length);   // number of distinct words
}

If you ran this program on the full collection of 18,000 Gutenberg books,
you would inevitably run out of memory. Why would that happen, when a
standard English dictionary occupies only a couple of megabytes?

Without knowing the intricate details of D and Phobos, I bet you would 
have no way of knowing that you got killed by the cow. :)
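Spelled out, the cow in question: Phobos's `tolower` was copy-on-write and returned its argument itself when no character needed converting. `split` returns slices of `data`, so a lowercase word that becomes a new key in `wc` is still a slice of its file's buffer, and that single key keeps the whole buffer reachable for the garbage collector. The sketch below, in present-day D, models this with a hypothetical ASCII-only `cowToLower` and shows one possible workaround (an assumption, not a fix proposed in the thread): copy the key with `.idup` only when it is inserted for the first time. `countWords` is likewise a hypothetical name.

```d
/// Hypothetical copy-on-write lowercasing (ASCII only), modelled on the
/// behaviour described above: if no character needs changing, the input
/// slice itself is returned, so the result may alias the caller's buffer.
const(char)[] cowToLower(const(char)[] s)
{
    foreach (i, c; s)
    {
        if (c >= 'A' && c <= 'Z')
        {
            // First uppercase letter found: copy once, then convert the rest.
            char[] copy = s.dup;
            foreach (ref d; copy[i .. $])
                if (d >= 'A' && d <= 'Z')
                    d = cast(char)(d + ('a' - 'A'));
            return copy;
        }
    }
    return s; // nothing to change: no copy, the caller's slice comes back
}

/// Word count that does not pin `text`: a key is copied with .idup only
/// when it is inserted for the first time, so the AA never holds a slice
/// of `text` and the buffer can be collected once the caller drops it.
void countWords(const(char)[] text, ref size_t[const(char)[]] wc)
{
    import std.array : split;

    foreach (word; text.split())        // slices of `text`
    {
        const(char)[] key = cowToLower(word);
        if (auto p = key in wc)
            ++*p;                       // existing key: nothing new is stored
        else
            wc[key.idup] = 1;           // new key: store an independent copy
    }
}

unittest
{
    const(char)[] w = "hello";
    assert(cowToLower(w) is w);         // unchanged word: same slice returned
    assert(cowToLower("HeLLo") == "hello");

    size_t[const(char)[]] wc;
    countWords("The cat saw the CAT", wc);
    assert(wc.length == 3 && wc["the"] == 2 && wc["cat"] == 2);
}
```

Copying only on first insertion keeps the extra allocations proportional to the number of distinct words rather than to the total word count.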

/Oskar