COW vs. in-place.

Thu Aug 3 13:00:20 PDT 2006

Oskar Linde wrote:
> Dave wrote:
>> Reiner Pope wrote:
>>>> Why not:
>>>>
>>>>     str = toupper(str);     // in-place
>>>>     str = toupper(str.dup); // COW
> 
> What is the advantage of redundantly assigning the result of an in-place 

No advantage - the poster was just using the example from the OP. What the OP's example showed is
that with the current behavior (CoW), the coder often ends up assigning the result back to the
original string reference, in which case the .dup inside toupper is a total waste.

     writefln(toupper(str));             // in-place

     writefln("Uppercase string: ", toupper(str.dup));
     writefln("Original string:  ", str);
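Here is a minimal D sketch that puts the two semantics side by side. `toupperCow` and `toupperInPlace` are hypothetical names, not Phobos functions, and they handle ASCII letters only. The COW variant copies as soon as it has something to change, so `str = toupperCow(str)` still pays for a copy the caller immediately throws away. The in-place variant allocates nothing. Because it returns its argument, it can still be chained, and the caller adds a `.dup` only when the original is wanted.

```d
// Hypothetical COW variant (ASCII only): copies only when some character
// changes, otherwise returns the caller's slice unchanged.
string toupperCow(string s)
{
    char[] r = null;
    foreach (i, c; s)
    {
        if (c >= 'a' && c <= 'z')
        {
            if (r is null)
                r = s.dup;                 // first change: make a private copy
            r[i] = cast(char)(c - ('a' - 'A'));
        }
    }
    return r is null ? s : cast(string) r; // r is unique, so the cast is safe
}

// Hypothetical in-place variant (ASCII only): overwrites the caller's
// buffer and returns it so calls can still be chained.
char[] toupperInPlace(char[] s)
{
    foreach (ref c; s)
        if (c >= 'a' && c <= 'z')
            c = cast(char)(c - ('a' - 'A'));
    return s;
}

unittest
{
    // COW: assigning the result back still pays for an internal copy
    string a = "hello";
    a = toupperCow(a);
    assert(a == "HELLO");

    // In-place: the caller dups only when the original must survive
    char[] b = "hello".dup;
    auto upper = toupperInPlace(b.dup);
    assert(upper == "HELLO" && b == "hello");
    toupperInPlace(b);
    assert(b == "HELLO");
}
```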

> function to itself? In my opinion, all in-place functions should have a 
> void return type to avoid common mistakes such as:
> 

     writefln(toupper(str));             // function chain

Many of C's string functions do this too: strcpy and strcat, for example, return their destination
pointer so calls can be chained.

> foreach(e; arr.reverse) { ... }
> // OOPS, arr is now reversed
> 
> .dup followed by calling an in-place function is certainly ok, but in 
> those cases, an ordinary functional (non-in-place) function would have 
> been more efficient.
> 

If the programmer needs to keep a copy of the original, the way toupper/tolower/etc. are done now is
more efficient only when the data did not actually need to be modified.

My argument is that most often when data is modified at some point in a program, it is because the 
rest of the program needs the modified version and not a copy of the original (so defensive .dups 
won't be done anyhow).
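Oskar's alternative is an in-place function that returns void, paired with an ordinary functional one. It might look like this in D. `reverseInPlace` and `reversedCopy` are hypothetical names, and `int[]` stands in for a generic element type to keep the sketch short. Because the in-place version yields no value, the `foreach(e; arr.reverse)` mistake cannot be written against it. The functional version builds its result with a single allocation and one pass, instead of a `.dup` followed by a separate in-place pass.

```d
// Hypothetical in-place reverse returning void: since it yields no value,
// `foreach (e; reverseInPlace(arr))` does not compile.
void reverseInPlace(int[] a)
{
    if (a.length < 2)
        return;
    size_t i = 0, j = a.length - 1;
    while (i < j)
    {
        auto t = a[i];      // swap the two ends and move inward
        a[i] = a[j];
        a[j] = t;
        ++i;
        --j;
    }
}

// Hypothetical functional counterpart: one allocation, filled in one pass.
int[] reversedCopy(const(int)[] a)
{
    auto r = new int[a.length];
    foreach (i, e; a)
        r[a.length - 1 - i] = e;
    return r;
}

unittest
{
    int[] arr = [1, 2, 3];
    int[] seen;
    foreach (e; reversedCopy(arr))
        seen ~= e;
    assert(seen == [3, 2, 1]);
    assert(arr == [1, 2, 3]);   // iterating the copy left arr alone
    reverseInPlace(arr);
    assert(arr == [3, 2, 1]);
}
```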

>>
>> I think CoW for arrays was a mistake -- it is most often unnecessary, 
>> will cause D to repeat many of Java's performance woes for the average 
>> user, and as I mentioned is inconsistent as well. It's a lose-lose-lose.
> 
> Consider the following (just made up) case-insensitive multi-file word 
> count application:
> 
> import std.stdio;
> import std.file;
> import std.string;
> 
> void main(char[][] args) {
>         int[char[]] wc;
>         foreach(filename; args[1..$]) {
>                 char[] data = cast(char[]) read(filename);
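>                 // split() returns slices of data, and tolower() returns
>                 // its argument unchanged when it is already lowercase, so
>                 // a new key stored that way can keep the whole file alive
>                 foreach(word; data.split())
>                         wc[tolower(word)]++;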
>         }
>         writefln("num words: ",wc.length);
> }
> 
> If you ran this program on the full collection of 18000 Gutenberg books, 
> you would inevitably run out of memory. Why would you do that when a 
> standard English dictionary only occupies a couple of megabytes?
> 
> Without knowing the intricate details of D and Phobos, I bet you would 
> have no way of knowing that you got killed by the cow. :)
> 

Exactly my point, and a great example. It's that kind of thing that is really tough on a newbie
trying to get the most out of a high-performance language.
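One way to keep the word-count example from being killed by the cow is to copy a key only when it is first inserted into the map. Existing keys are then looked up without allocating, and no stored key is a slice of a file's buffer. The sketch below is written in present-day D rather than the 2006 dialect quoted above. `tolowerCow` is a hypothetical ASCII-only stand-in for the COW `tolower` under discussion. The file contents are assumed to be already loaded as `string` values instead of read with `std.file.read`.

```d
import std.array : split;

// Hypothetical COW-style tolower (ASCII only): returns the input slice
// itself when nothing changes, a fresh copy otherwise.
string tolowerCow(string s)
{
    char[] r = null;
    foreach (i, c; s)
    {
        if (c >= 'A' && c <= 'Z')
        {
            if (r is null)
                r = s.dup;
            r[i] = cast(char)(c + ('a' - 'A'));
        }
    }
    return r is null ? s : cast(string) r;
}

// Case-insensitive word count. A key is copied with .idup only the first
// time it is inserted, so stored keys never point into an input text.
size_t[string] countWords(string[] texts)
{
    size_t[string] wc;
    foreach (text; texts)
    {
        foreach (word; text.split())   // slices of text, split on whitespace
        {
            auto key = tolowerCow(word);
            if (auto p = key in wc)
                ++*p;                  // existing key: no allocation
            else
                wc[key.idup] = 1;      // new key: store a private copy
        }
    }
    return wc;
}
```

The `.idup` costs one small allocation per distinct word, which is bounded by the vocabulary size rather than by the total size of the input files.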

IMHO, it's not too big a leap for a beginner to suspect that data will be modified when they pass a
by-reference argument into a function like toupper. If 'in-place' is clearly documented, I don't
see a problem.

- Dave

> /Oskar