COW vs. in-place.
Oskar Linde
oskar.lindeREM at OVEgmail.com
Thu Aug 3 07:24:35 PDT 2006
Dave wrote:
> Reiner Pope wrote:
>>> Why not:
>>>
>>> str = toupper(str); // in-place
>>> str = toupper(str.dup); // COW
What is the advantage of redundantly assigning the result of an in-place
function to itself? In my opinion, all in-place functions should have a
void return type to avoid common mistakes such as:

    foreach(e; arr.reverse) { ... }
    // OOPS, arr is now reversed

.dup followed by calling an in-place function is certainly ok, but in
those cases, an ordinary functional (non-in-place) function would have
been more efficient.
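
A minimal sketch of that split, written against present-day D rather
than the 2006 compiler. The names reverseInPlace and reversed are
hypothetical and not part of Phobos; the point is only that the void
version cannot end up inside a foreach by accident, and that the
functional version builds its result in one pass instead of copying
first and reversing afterwards.

```d
import std.stdio;

// Hypothetical in-place helper. It returns void, so it cannot be used
// as an expression (for example as a foreach aggregate) by mistake.
void reverseInPlace(T)(T[] arr)
{
    if (arr.length < 2) return;   // also avoids length - 1 underflow
    for (size_t i = 0, j = arr.length - 1; i < j; ++i, --j)
    {
        T tmp = arr[i];
        arr[i] = arr[j];
        arr[j] = tmp;
    }
}

// Hypothetical functional counterpart. It allocates once and fills the
// result in a single pass, leaving the input untouched.
T[] reversed(T)(T[] arr)
{
    auto result = new T[arr.length];
    foreach (i, e; arr)
        result[arr.length - 1 - i] = e;
    return result;
}

void main()
{
    int[] arr = [1, 2, 3];

    foreach (e; reversed(arr))    // iterates a copy
        write(e, " ");
    writeln();
    assert(arr == [1, 2, 3]);     // the original is unchanged

    // foreach (e; reverseInPlace(arr)) { }  // rejected: result is void
    reverseInPlace(arr);          // the mutation is explicit at the call site
    assert(arr == [3, 2, 1]);
}
```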
>> This is not copy on write. That is simply 'always copy', and this
>
> But presumably the user would only do the dup if they didn't want to
> modify str, so CoW would basically go away as a design pattern.
>
>> performs worse than COW (which in turn performs worse than in-place,
>> if in-place is possible). Walter has also said earlier that, with COW,
>> it should be the responsibility of the writer to ensure the copy, not
>> the caller.
>
> That's what I'm questioning ultimately. The caller knows best if the
> object that _they created_ should be modified or copied and they can do
> that best before a call to a modifying function. No matter if that
> happens to be the developer of another lib. function or an application
> programmer.
>
> What's more, CoW for arrays is inconsistent with how other reference
> objects are treated (class objects are really not made for CoW - there's
> not even a rudimentary copy ctor provided by the language. Same with
> AA's, which don't have a .dup for example).
>
> Ultimately, most data that is modified is used modified for its
> remaining program "lifetime", and however the original data was sourced
> (e.g.: reading from disk) can be replicated if needed instead of having
> to keep copies around.
>
> I think CoW for arrays was a mistake -- it is most often unnecessary,
> will cause D to repeat many of Java's performance woes for the average
> user, and as I mentioned is inconsistent as well. It's a lose-lose-lose.
Consider the following (made-up) case-insensitive multi-file word
count application:
import std.stdio;
import std.file;
import std.string;

void main(char[][] args) {
    int[char[]] wc;
    foreach(filename; args[1..$]) {
        char[] data = cast(char[]) read(filename);
        foreach(word; data.split())
            wc[tolower(word)]++;
    }
    writefln("num words: ",wc.length);
}
If you ran this program on the full collection of 18,000 Gutenberg books,
you would inevitably run out of memory. Why would that happen when a
standard English dictionary only occupies a couple of megabytes?
Without knowing the intricate details of D and Phobos, I bet you would
have no way of knowing that you got killed by the cow. :)
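
For anyone who wants the mechanism spelled out, here is a rough sketch
in present-day D. It assumes that a COW tolower hands back its argument
untouched when the word is already lower case. In that case the key
stored in wc is still a slice into the file buffer, so every file that
contributes a single new lower-case word stays reachable in full and
the GC cannot free it. cowToLower below is a hypothetical ASCII-only
stand-in for the Phobos function, not its actual implementation.

```d
import std.ascii : isUpper, toLower;

// Hypothetical stand-in for a copy-on-write tolower (ASCII only).
// It copies only when it actually has to change a character.
char[] cowToLower(char[] s)
{
    foreach (i, c; s)
    {
        if (isUpper(c))
        {
            char[] r = s.dup;                  // first write: copy now
            foreach (j; i .. r.length)
                r[j] = cast(char) toLower(r[j]);
            return r;
        }
    }
    return s;                                  // nothing to change: same memory
}

void main()
{
    char[] data = "the Quick fox".dup;         // stands in for one file's contents

    char[] w1 = cowToLower(data[0 .. 3]);      // "the" is already lower case
    char[] w2 = cowToLower(data[4 .. 9]);      // "Quick" needs a change

    assert(w1.ptr == data.ptr);                // still a slice into the buffer
    assert(w2 == "quick");
    assert(w2.ptr != data.ptr + 4);            // a separate allocation
    assert(data == "the Quick fox");           // the input itself is untouched

    char[] key = w1.dup;                       // an explicit copy breaks the link
    assert(key.ptr != data.ptr);
}
```

Under the same assumption, storing a duplicate of the key (something
like tolower(word).dup in the program above) would let each file buffer
be collected once its loop iteration is over, at the price of copying
on every lookup.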
/Oskar