COW vs. in-place.

Thu Aug 3 09:21:16 PDT 2006

Sean Kelly wrote:
> Oskar Linde wrote:
>> Dave wrote:
>>> Reiner Pope wrote:
>>>>> Why not:
>>>>>
>>>>>     str = toupper(str);     // in-place
>>>>>     str = toupper(str.dup); // COW
>>
>> What is the advantage of redundantly assigning the result of an 
>> in-place function to itself? In my opinion, all in-place functions 
>> should have a void return type to avoid common mistakes such as:
>>
>> foreach(e; arr.reverse) { ... }
>> // OOPS, arr is now reversed
> 
> I like returning the mutated value so the function call can be embedded 
> in other code.  

I have already seen the above foreach error in other people's D code.
I believe it is good library design to clearly mark functions with 
side effects. Giving them a void return type prevents mistakes of the 
following kind (assume toupper modifies its argument in place as well 
as returning it):

func(toupper(mystring));
func(arr.reverse);

where the side effect was unintended.
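A minimal D sketch of the void-return rule, assuming a hypothetical ASCII-only toUpperInPlace and a placeholder func (neither is a Phobos function). Because the in-place call yields no value, the accidental func(toupper(mystring)) pattern is rejected at compile time, and the mutation has to be written as a statement of its own:

```d
/// Hypothetical in-place upper-casing (ASCII only): it mutates the
/// buffer and deliberately returns nothing.
void toUpperInPlace(char[] s)
{
    foreach (ref c; s)
        if ('a' <= c && c <= 'z')
            c = cast(char)(c - ('a' - 'A'));
}

/// Hypothetical consumer standing in for "func" in the text.
void func(const(char)[] s) {}

unittest
{
    char[] mystring = "hello".dup;
    // func(toUpperInPlace(mystring));  // would not compile: void has no value
    static assert(!__traits(compiles, func(toUpperInPlace(mystring))));
    toUpperInPlace(mystring);           // the side effect is its own statement
    assert(mystring == "HELLO");
}
```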
Could these be errors too?

arr2 = arr1.reverse;

toupper(mystring) ~ mystring;
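Both lines go wrong quietly if the function mutates its argument and then returns it. A minimal sketch, assuming a hypothetical reverseInPlace that (like the built-in arr.reverse) returns the very slice it reversed, and a hypothetical upperAndReturn following the same convention. No copy is made, so the "result" and the original refer to the same memory:

```d
/// Hypothetical: reverses `a` in place and returns the same slice.
T[] reverseInPlace(T)(T[] a)
{
    size_t i = 0, j = a.length;
    while (i + 1 < j)
    {
        --j;
        auto tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
        ++i;
    }
    return a;   // not a copy: the caller's array was changed
}

/// Hypothetical: ASCII upper-casing in place, returning the same slice.
char[] upperAndReturn(char[] s)
{
    foreach (ref c; s)
        if ('a' <= c && c <= 'z')
            c = cast(char)(c - ('a' - 'A'));
    return s;
}

unittest
{
    int[] arr1 = [1, 2, 3];
    int[] arr2 = reverseInPlace(arr1);
    assert(arr2 == [3, 2, 1]);
    assert(arr1 == [3, 2, 1]);      // arr1 was reversed as well
    assert(arr2.ptr == arr1.ptr);   // both name the same memory

    char[] mystring = "ab".dup;
    auto joined = upperAndReturn(mystring) ~ mystring;
    assert(joined == "ABAB");       // not "ABab": both operands were mutated
}
```

With a void return type, neither assignment would compile.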

> And arr.reverse is already a built-in mutating function, 
> according to the spec.

Yes. I find that unfortunate and inconsistent with how Phobos is 
designed. Luckily, arr.sort and arr.reverse are not callable as 
arr.sort() and arr.reverse(), so they really don't look like functions.

>> .dup followed by calling an in-place function is certainly ok, but in 
>> those cases, an ordinary functional (non-in-place) function would have 
>> been more efficient.
> 
> Why?

What I meant was that .dup followed by an in-place algorithm will never 
be more efficient than a copying algorithm. In-place algorithms are 
often more complicated and, in their simpler forms, slower. If you want 
a copy anyway, it is more efficient to use a copying algorithm. Stable 
sorting is an example: an efficient copying merge sort is trivial, 
while stable in-place sorting needs considerably more intricate 
algorithms.
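To illustrate the stable-sorting case, here is a minimal sketch of the copying side, assuming a hypothetical stableSortedCopy (a plain top-down merge sort, not Phobos code). Because every merge writes into a freshly allocated array that it is not reading from, stability only requires taking from the left run on ties; an in-place stable sort has no such spare buffer to lean on. For simplicity the sketch allocates at every recursion level.

```d
import std.functional : binaryFun;

/// Hypothetical: returns a stably sorted copy of `src`; `src` is not modified.
T[] stableSortedCopy(alias less = "a < b", T)(T[] src)
{
    alias lt = binaryFun!less;
    if (src.length <= 1)
        return src.dup;
    immutable mid = src.length / 2;
    T[] left = stableSortedCopy!less(src[0 .. mid]);
    T[] right = stableSortedCopy!less(src[mid .. $]);

    auto result = new T[src.length];
    size_t i, j, k;
    while (i < left.length && j < right.length)
    {
        // Take from the right run only if strictly smaller, so equal
        // elements keep their original relative order.
        if (lt(right[j], left[i]))
            result[k++] = right[j++];
        else
            result[k++] = left[i++];
    }
    while (i < left.length)
        result[k++] = left[i++];
    while (j < right.length)
        result[k++] = right[j++];
    return result;
}

unittest
{
    struct Rec { int key; char tag; }
    Rec[] recs = [Rec(2, 'a'), Rec(1, 'b'), Rec(2, 'c'), Rec(1, 'd')];
    auto sorted = stableSortedCopy!((a, b) => a.key < b.key)(recs);
    assert(sorted == [Rec(1, 'b'), Rec(1, 'd'), Rec(2, 'a'), Rec(2, 'c')]);
    assert(recs[0] == Rec(2, 'a'));   // the input is left untouched
}
```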

Re: Library design

I would like to see both copying and in-place versions of algorithms 
where it makes sense, but only one behavior should be the default. That 
default should be consistent throughout the standard library and 
preferably be recommended in an official style guide for third-party 
libraries to follow.

I see two valid designs:

1. in-place default, copying algorithms specially named
-------------------------------------------------------

Design:
void toUpper(char[] str); // in-place
char[] toUpperCopy(char[] str); // copy

Pros:
* in-place is often more efficient, and is therefore the default
* many functions are imperative verbs, and as such one expects them to 
be modifying
* similar to how the C++ STL is designed (e.g. std::reverse vs. 
std::reverse_copy)
Cons:
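* many functions cannot be expressed in-place (example: UTF-8 toUpper, 
where the uppercase form of a character may need a different number of 
bytes than the lowercase form)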
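Design 1 might look like the following minimal sketch. The ASCII-only body is an assumption that keeps every character one byte wide, which is exactly what makes the in-place form possible; a full UTF-8 version runs into the con listed above.

```d
/// Design 1 sketch (hypothetical): the plain name mutates in place.
/// ASCII only, so the byte length of the string never changes.
void toUpper(char[] str)
{
    foreach (ref c; str)
        if ('a' <= c && c <= 'z')
            c = cast(char)(c - ('a' - 'A'));
}

/// Design 1 sketch (hypothetical): the specially named copying variant.
char[] toUpperCopy(const(char)[] str)
{
    auto result = str.dup;   // mutable copy of the input
    toUpper(result);
    return result;
}

unittest
{
    char[] s = "Hello, world".dup;
    toUpper(s);
    assert(s == "HELLO, WORLD");

    const(char)[] orig = "abc";
    assert(toUpperCopy(orig) == "ABC");
    assert(orig == "abc");   // the copying variant leaves its input alone
}
```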

2. copying default, in-place versions specially named
-----------------------------------------------------

Design:
void toUpperInPlace(char[] str); // in-place
char[] toUpper(char[] str); // copy

Pros:
* copying is safer, and is therefore a better default
* in-place is an optimization and would stand out as such
* default is functional (no side effects), side effects stand out
* people used to functional-style programming would not find any
surprises
* all functions can be defined as copying functions
* this is how many popular languages are designed (Ruby, Python, PHP, 
all "functional" languages, etc.)
Cons:
* could confuse people and lead to silent errors:
toupper(str); // doesn't change str
cos(x); // doesn't change x ;)
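A matching sketch of design 2, assuming present-day Phobos (std.utf.decode/encode and the per-character std.uni.toUpper), not the 2006 library. The copying toUpper may grow or shrink the string, e.g. dotless 'ı' (two bytes in UTF-8) becomes 'I' (one byte), which an in-place version could not do. It uses the simple one-to-one case mapping only, so multi-character mappings such as 'ß' to "SS" are not handled; the ASCII-only toUpperInPlace is again a hypothetical simplification.

```d
import std.uni : uniToUpper = toUpper;   // per-code-point mapping
import std.utf : decode, encode;

/// Design 2 sketch (hypothetical): the plain name returns a new string.
string toUpper(const(char)[] str)
{
    char[] result;
    result.reserve(str.length);
    size_t i = 0;
    while (i < str.length)
    {
        immutable dchar c = decode(str, i);         // reads one code point, advances i
        char[4] buf;
        immutable n = encode(buf, uniToUpper(c));   // 1 to 4 bytes, may differ from input
        result ~= buf[0 .. n];
    }
    return result.idup;
}

/// Design 2 sketch (hypothetical): the specially named in-place variant.
/// ASCII only, because only there is the byte length guaranteed to stay the same.
void toUpperInPlace(char[] str)
{
    foreach (ref c; str)
        if ('a' <= c && c <= 'z')
            c = cast(char)(c - ('a' - 'A'));
}

unittest
{
    const(char)[] s = "\u0131x";   // dotless i + 'x': 3 bytes
    string u = toUpper(s);
    assert(u == "IX");             // 2 bytes: the length changed
    assert(s == "\u0131x");        // input untouched
}
```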

For the record, I am in favor of number 2, and that may have biased the 
arguments above.

/Oskar