COW vs. in-place.

Dave Dave_member at pathlink.com
Mon Jul 31 19:50:51 PDT 2006


Derek Parnell wrote:
> On Mon, 31 Jul 2006 18:01:14 -0500, Dave wrote:
> 
>> Kirk McDonald wrote:
>>> Derek wrote:
>>>> On Mon, 31 Jul 2006 16:40:54 -0500, Dave wrote:
>>>>
>>>>
>>>>> Not a bad idea... The main prob. would be that there would be a lot 
>>>>> of duplication of code.
>>>>
>>>> void toUpper_inplace(char[] x)
>>>> {
>>>>  . . .
>>>> }
>>>>
>>>> char[] toUpper(char[] x)
>>>> {
>>>>    char[] y = x.dup;
>>>>    toUpper_inplace(y);
>>>>    return y;
>>>> }
>>>>
>> With this one, you're always dup'ing instead of .dup'ing only when 
>> needed (the current one is actually more efficient).
> 
> I'm getting confused about what you are after now, sorry. 
> 
> It seems that you are wanting a CoW version, an InPlace version, and a
> non-Destructive version of each function and let the compiler and/or the
> author choose the best one for the job at hand.
> 
> The example above gave the InPlace and non-destructive versions, and the
> current version is CoW.
> 
> ...
> 
>> The problem w/ all the dup'ing is when you put something like this in a 
>> tight loop you get sloooowwwww code:
> 
> Not if the author has a choice ...
>  
> import std.file, std.string, std.stdio;
> 
> void main()
> {
>    char[][] formatted;
>    char[][] text = split(cast(char[])read("largefile.txt"), ".");
>    foreach(char[] sentence; text)
>    {
>      strip_IP(sentence);
>      tolower_IP(sentence);
>      capitalize_IP(sentence);
>      formatted ~= sentence ~ ".\r\n";
>    }
>    //...
>    foreach(char[] sentence; formatted)
>    {
>      writefln(sentence);
>    }
> }
> 
> 

Sorry, I think some of that got lost in the thread...

I'm asking whether it would make sense to change the current functions so that COW is optional. That way
existing code wouldn't break, but we'd still have the choice.

For example, here is the current tolower with the changes added (marked with //**):

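//** char[] tolower(char[] s)
char[] tolower(char[] s, bool cow = true)
//**
{
     int changed;   // 0: no copy yet, 1: r is a full copy of s, 2: r is being re-encoded
     int i;
     char[] r = s;

     changed = 0;
     for (i = 0; i < s.length; i++)
     {
         auto c = s[i];
         if ('A' <= c && c <= 'Z')
         {
             //**if (!changed)
             if (cow && !changed)
             //**
             {   r = s.dup;
                 changed = 1;
             }
             r[i] = c + (cast(char)'a' - 'A');
         }
         else if (c >= 0x7F)
         {
             // Lower-casing non-ASCII UTF-8 can change the byte length, so the
             // tail is always re-encoded into a new buffer, even when cow is false
             // (otherwise encode() would append to the caller's own array).
             foreach (size_t j, dchar dc; s[i .. length])
             {
                 if (!changed)
                 {
                     if (!std.uni.isUniUpper(dc))
                         continue;

                     r = s[0 .. i + j].dup;
                     changed = 2;
                 }
                 else if (changed == 1)
                 {   r = r[0 .. i + j];   // drop the tail of the full copy before appending
                     changed = 2;
                 }
                 dc = std.uni.toUniLower(dc);
                 std.utf.encode(r, dc);
             }
             break;
         }
     }
     return r;
}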

So the sample code would become:

import std.file, std.string, std.stdio;

void main()
{
   char[][] formatted;
   char[][] text = split(cast(char[])read("largefile.txt"), ".");
   foreach(char[] sentence; text)
   {
     formatted ~= capitalize(tolower(strip(sentence, false), false), false) ~ ".\r\n";
   }
   //...
   foreach(char[] sentence; formatted)
   {
    writef("%s", sentence);  // sentence already ends in ".\r\n"
   }
}
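A stripped-down, ASCII-only sketch of the three behaviours discussed in this thread may help: an in-place routine, an always-copy wrapper in the style of Derek's toUpper, and a COW routine with the proposed optional cow flag. The names toUpperInPlace, toUpperCopy and toUpperAscii are hypothetical (not Phobos functions), and Unicode handling is deliberately left out.

```d
import std.stdio;

// Hypothetical in-place version (ASCII only): modifies the caller's buffer.
void toUpperInPlace(char[] s)
{
    for (size_t i = 0; i < s.length; i++)
        if ('a' <= s[i] && s[i] <= 'z')
            s[i] = cast(char)(s[i] - ('a' - 'A'));
}

// Hypothetical non-destructive version: always allocates, like Derek's toUpper.
char[] toUpperCopy(char[] s)
{
    char[] r = s.dup;
    toUpperInPlace(r);
    return r;
}

// Hypothetical COW version with the optional flag proposed above:
// cow = true copies only on the first change; cow = false works in place.
char[] toUpperAscii(char[] s, bool cow = true)
{
    char[] r = s;
    bool copied = false;
    for (size_t i = 0; i < s.length; i++)
    {
        char c = s[i];
        if ('a' <= c && c <= 'z')
        {
            if (cow && !copied)
            {
                r = s.dup;      // first change: copy once, then edit the copy
                copied = true;
            }
            r[i] = cast(char)(c - ('a' - 'A'));
        }
    }
    return r;
}

void main()
{
    char[] upper = "HELLO".dup;
    char[] lower = "hello".dup;

    // Always-copy allocates even though nothing needed changing.
    char[] a = toUpperCopy(upper);
    assert(a == "HELLO" && a !is upper);

    // COW hands back the very same slice when nothing changes...
    char[] b = toUpperAscii(upper);
    assert(b is upper);

    // ...and copies (leaving the caller's data alone) when something does.
    char[] c = toUpperAscii(lower);
    assert(c == "HELLO" && lower == "hello");

    // cow = false: no allocation, the caller's buffer is modified.
    char[] d = toUpperAscii(lower, false);
    assert(d is lower && lower == "HELLO");

    writefln("%s %s %s %s", a, b, c, d);
}
```

Defaulting cow to true keeps today's behaviour for existing callers, while passing false trades that safety for zero allocations inside a hot loop.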

I also suggested making the cow parameter default to false, and wondered how things would have
worked out if the owner of the original data had been responsible for its own dups:

import std.file, std.string, std.stdio;

void main()
{
  char[][] formatted;
  char[] original = cast(char[])read("largefile.txt");
  char[][] text = split(original.dup, "."); //** split a copy, so 'original' stays unmodified
  foreach(char[] sentence; text)
  {
    formatted ~= capitalize(tolower(strip(sentence))) ~ ".\r\n";
  }
  //...
  foreach(char[] sentence; formatted)
  {
    writef("%s", sentence);  // sentence already ends in ".\r\n"
  }
  //** The 'original' (unmodified) data is used again here
}

If everything in Phobos were done in place, it would become second nature for the owner to dup when
needed, and users wouldn't have to hope that the library developer hadn't made a mistake and
forgotten to COW where it was required.
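The ownership model above can be sketched as follows, assuming library routines that always work in place. lowerInPlace and capitalizeInPlace are hypothetical ASCII-only stand-ins, not Phobos functions; the point is only that the single copy is the one the owner chooses to make.

```d
import std.stdio;

// Hypothetical in-place helper (ASCII only): lower-cases s and returns the
// same slice, so calls can be chained like capitalize(tolower(...)) above.
char[] lowerInPlace(char[] s)
{
    for (size_t i = 0; i < s.length; i++)
        if ('A' <= s[i] && s[i] <= 'Z')
            s[i] = cast(char)(s[i] + ('a' - 'A'));
    return s;
}

// Hypothetical in-place helper (ASCII only): upper-cases the first character.
char[] capitalizeInPlace(char[] s)
{
    if (s.length > 0 && 'a' <= s[0] && s[0] <= 'z')
        s[0] = cast(char)(s[0] - ('a' - 'A'));
    return s;
}

void main()
{
    char[] original = "HELLO WORLD".dup;   // the owner's data

    // The owner knows it needs the pristine text later, so it dups once...
    char[] work = original.dup;

    // ...and every call after that is free to modify 'work' in place.
    char[] result = capitalizeInPlace(lowerInPlace(work));

    assert(result is work);              // no allocations inside the calls
    assert(result == "Hello world");
    assert(original == "HELLO WORLD");   // untouched thanks to the single dup
    writefln("%s / %s", original, result);
}
```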

Thanks,

- Dave



More information about the Digitalmars-d mailing list