COW vs. in-place.

Dave Dave_member at pathlink.com
Mon Jul 31 16:01:14 PDT 2006


Kirk McDonald wrote:
> Derek wrote:
>> On Mon, 31 Jul 2006 16:40:54 -0500, Dave wrote:
>>
>>
>>> Not a bad idea... The main prob. would be that there would be a lot 
>>> of duplication of code.
>>
>>
>> void toUpper_inplace(char[] x)
>> {
>>  . . .
>> }
>>
>> char[] toUpper(char[] x)
>> {
>>    char[] y = x.dup;
>>    toUpper_inplace(y);
>>    return y;
>> }
>>

With this one, you're always .dup'ing instead of .dup'ing only when 
needed (the current implementation is actually more efficient).
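
To illustrate what ".dup'ing only when needed" could look like, here is a minimal sketch. It is not the library's actual source: the name toUpperCow is made up for this example, it handles ASCII only, and it is written for a current D2 compiler rather than the 2006 one discussed in this thread.

```d
import std.ascii : isLower, toUpper;

// Hypothetical helper (not a Phobos function): returns the input slice
// itself when nothing needs changing, and copies only on the first
// character that actually has to be modified.
char[] toUpperCow(char[] s)
{
    char[] result = s;      // alias the input until a change is needed
    bool copied = false;

    foreach (i, c; s)
    {
        if (isLower(c))
        {
            if (!copied)
            {
                result = s.dup;   // first modification: copy once
                copied = true;
            }
            result[i] = cast(char) toUpper(c);
        }
    }
    return result;
}
```

A string that is already upper case comes back as the same slice with no allocation, which is the case the always-dup wrapper above pays for needlessly.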

> 
> I've got one better. Say we have a whole bunch of inplace string 
> functions, like the one above and this one:
> 
> void toLower_inplace(char[] x) {
>     // ...
> }
> 
> and others. Then we can:
> 
> char[] cow_func(alias fn)(char[] x) {
>     char[] y = x.dup;
>     fn(y);
>     return y;
> }
> 
> alias cow_func!(toUpper_inplace) toUpper;
> alias cow_func!(toLower_inplace) toLower;
> 
> Etc. Obviously, you'd have to provide a different template for each 
> function footprint, but the string library has a lot of repeated 
> footprints.
> 

I think that to maximize code reuse you'd have to build the "COW or not 
to COW" logic into the "base" function. If you did that, you'd have to 
live with a little more function call overhead (passing a bool or small 
enum around) to avoid defensive copying like that in cow_func above.
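
A rough sketch of that idea, assuming a small enum is passed to a shared base function. Every name here (Cow, toUpperImpl, toUpperCopy, toUpperInPlace) is hypothetical and chosen only for illustration; the code is ASCII-only and targets a current D2 compiler.

```d
import std.ascii : isLower, toUpper;

// Hypothetical flag telling the base function whether it may write
// into the caller's buffer.
enum Cow { no, yes }

// Hypothetical "base" function holding the only copy of the conversion
// logic. With Cow.yes it duplicates the input on the first change;
// with Cow.no it modifies the input directly.
char[] toUpperImpl(char[] s, Cow cow)
{
    char[] result = s;
    bool owned = (cow == Cow.no);   // in-place mode: the buffer is ours to write

    foreach (i, c; s)
    {
        if (!isLower(c))
            continue;
        if (!owned)
        {
            result = s.dup;         // copy-on-write: copy once, on demand
            owned = true;
        }
        result[i] = cast(char) toUpper(c);
    }
    return result;
}

// Thin wrappers; the extra argument is the small call overhead
// mentioned above.
char[] toUpperCopy(char[] s)    { return toUpperImpl(s, Cow.yes); }
void   toUpperInPlace(char[] s) { toUpperImpl(s, Cow.no); }
```

Compared with cow_func, the copying variant allocates only when a character really changes, at the cost of one extra parameter and one branch per modified character.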

I'm wondering: if Phobos had been built that way (making it the 
'D way' of doing things), would all the concerns about GC performance 
and "const" have been so acute over the last year or so? (Hindsight is 
always closer to 20/20, of course.)

The problem with all the dup'ing is that when you put something like 
this in a tight loop, you get sloooowwwww code:

import std.file, std.string, std.stdio;

void main()
{
   char[][] formatted;
   char[][] text = split(cast(char[])read("largefile.txt"), ".");
   foreach(char[] sentence; text)
   {
     formatted ~= capitalize(tolower(strip(sentence))) ~ ".\r\n";
   }
   //...
   foreach(char[] sentence; formatted)
   {
     writefln("%s", sentence); // explicit format string, so a '%' in the text is printed literally
   }
}

None of those functions (except for read()) would really have to do much 
allocating because the input file for all intents and purposes is 
read-only here (it won't get implicitly modified even if COW isn't used).

- Dave


