const and phobos string functions

Reiner Pope some at address.com
Thu Aug 2 05:37:49 PDT 2007


Regan Heath wrote:
> Reiner Pope wrote:
>> Regan Heath wrote:
>>> Kirk McDonald wrote:
>>>> I submit the following Python idiom: Functions which mutate their 
>>>> arguments should return nothing. That is:
>>>>
>>>> // Return new string
>>>> string tolower(string);
>>>> // Mutate argument
>>>> void tolower(char[]);
>>>>
>>>> This rather strictly highlights the difference between char[] and 
>>>> string, and makes it essentially impossible to mix up library 
>>>> functions differentiated in this way.
>>>
>>> That's all well and good until you want to write:
>>>
>>> char[] s;
>>> ...
>>> foo(tolower(s));
>>>
>>> If you template tolower in the manner I described it's not possible 
>>> to mix it up and call the wrong one anyway (as it selects the correct 
>>> one based on the input) so it's a non-problem.
>>>
>>> Regan
>>
>> I think the code you gave here would be misleading. For instance, 
>> suppose you had your templated overloads, effectively giving you
>>
>>    // copy-on-write
>>    string tolower(string);
>>    // in-place
>>    char[] tolower(char[]);
>>
>> Then you would get surprising results if you did this:
>>
>> char[] s = "Hello World!".dup;
>> int a = howManyLettersDiffer(tolower(s), s);
>> assert (a == 2); // assert failed, a is actually 0
>>
>> I think avoiding this is exactly why Kirk suggested his overloads.
> 
> Good point.
> 
> Using Kirk's Python'esque function signatures you'd have:
> 
> char[] s = "Hello World!".dup;
> tolower(s);
> int a = howManyLettersDiffer(s, s);
> assert (a == 2); // assert failed, a is actually 0
> 
> which is immediately and obviously a pointless operation, requiring a 
> re-code to:
> 
> char[] s = "Hello World!".dup;
> char[] s2 = s.dup;
> tolower(s2);
> int a = howManyLettersDiffer(s2, s);
> assert (a == 2); // passes, s2 is a separate copy so a is 2
> 
> And the 'string' case:
> 
> string s = "Hello World!";
> int a = howManyLettersDiffer(tolower(s), s);
> assert (a == 2); // passes, tolower returned a copy so a is 2
> 
> which would already behave as desired.
> 
> It seems that tolower is always going to have to 'copy on write' and 
> return that copy.
> 
> But, that doesn't stop you templating it so that it returns the copy as 
> char[] when you pass char[] (which is all I really wanted in the first 
> place) :)
> 
> Unfortunately as we cannot overload on return type it would prevent an 
> inplace tolower in the form (given by Kirk):
> 
> void tolower(char[])
> 
> Instead perhaps we need a naming convention for inplace modifying 
> functions, options:
> 
> void <func>Inplace(<args>)
> void <func>IP(<args>)       // "IP" stands for inplace
> void <func>M(<args>)        // "M" stands for mutates
> 
> or something like those.
> 
> Regan

That's a lot of functions just to do a string operation efficiently, 
though, isn't it? It's easy to see in cases like this why Walter cites 
the Pareto principle and says that this mostly isn't the bottleneck.

But still, isn't D supposed to give you fast code _easily_ ?

I must say, I *like* the simplicity of overloading with a mutable and a 
readonly version, although we have established that it can certainly 
lead to some confusion. Mind you, D arrays ignore that potential 
confusion (in the .sort and .reverse properties).

I also had some fancy ideas for a CoW wrapper, which looks something like:

struct CoW_Wrapper(T)
{
   const(T) val;
   private bool isMutable;

   void opAssign(T t)
   {
     val = t;
     isMutable = true;
   }

   void opAssign(const(T) t)
   {
     val = t;
     isMutable = false;
   }

   T mutable()
   {
     if (!isMutable)
     {
       val = val.dup;
       isMutable = true;
     }
     return cast(T) val;
   }
}

This would make using CoW quite simple, and would also solve the problem 
of keeping track across function boundaries of whether it's been duped, 
for things like:

    string s = ...;
    string s2 = s.tolower().replace("foo", "bar").entab();

which could potentially involve 3 dups. (I'm aware of the easier 
solution of templating to support char[]->char[] and string->string 
overloads, but let's have some fun going the extra way :-) )

You would write something like tolower as (ignoring unicode stuff):

alias CoW_Wrapper!(char[]) cow_string;

cow_string tolower(cow_string s)
{
   foreach (i; 0..s.val.length)
   {
     if (s.val[i] >= 'A' && s.val[i] <= 'Z')
       s.mutable[i] = s.val[i] + 'a' - 'A';
   }
   return s;
}

The neat feature is that the cow_string returned knows whether it 
isMutable, so you will always have the fewest required dups: 0 or 1.
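For anyone who wants to try the idea, here is a minimal sketch of the same wrapper, assuming a present-day D compiler rather than the const design under discussion. The names CowString, owned and toLowerAscii are made up for illustration and are not Phobos API; the wrapper stores a const(char)[] slice so that it can be rebound after the dup.

```d
// Hypothetical sketch, not Phobos API: a copy-on-write view over char data.
struct CowString
{
    private const(char)[] data; // may alias immutable or mutable memory
    private bool owned;         // true once writing through data is allowed

    this(string s) { data = s; owned = false; } // read-only source
    this(char[] s) { data = s; owned = true; }  // caller's buffer, mutate in place

    // Read-only view; never copies.
    const(char)[] val() const { return data; }

    // Writable view; dups at most once, on the first write to read-only data.
    char[] mutable()
    {
        if (!owned)
        {
            data = data.dup;
            owned = true;
        }
        return cast(char[]) data; // valid only because owned is true here
    }
}

// ASCII-only lowering, ignoring Unicode as in the text above.
CowString toLowerAscii(CowString s)
{
    foreach (i; 0 .. s.val().length)
    {
        immutable c = s.val()[i];
        if (c >= 'A' && c <= 'Z')
            s.mutable()[i] = cast(char)(c + ('a' - 'A'));
    }
    return s;
}

unittest
{
    string s = "Hello World!";
    auto lowered = toLowerAscii(CowString(s));
    assert(lowered.val() == "hello world!");
    assert(s == "Hello World!"); // the immutable original is untouched

    char[] buf = "ABC".dup;
    toLowerAscii(CowString(buf));
    assert(buf == "abc");        // mutable input is changed in place
}
```

Because the owned flag travels with the returned struct, chaining further functions of the same shape on the result would reuse the one private copy instead of duping again.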

Unfortunately, it doesn't currently look very nice because of the lack 
of opImplicitCast, which means you'd have to write:

   string s2 = s.tolower.val;

instead of

   string s2 = s.tolower;

(there are also problems with the type of s, but that can be solved with 
some template stuff on the tolower side; those template manipulations 
can also bypass the cow_string struct if the parameter type is char[], 
by defining
    char[] mutable(char[] s) { return s; }
)
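The bypass could be sketched roughly as follows, assuming a present-day D compiler. Cow, view and lowerAscii are hypothetical names chosen for this example; only the char[] overload of mutable comes from the line above. The template calls the free functions view and mutable, and overload resolution picks the pass-through versions when the argument is a plain char[].

```d
// Hypothetical minimal stand-in for the wrapper; not Phobos API.
struct Cow
{
    const(char)[] val; // current data, possibly read-only
    bool owned;        // true once val refers to a private mutable copy
}

// Plain mutable arrays need no bookkeeping: writes go straight through.
char[] mutable(char[] s) { return s; }

// The wrapper dups on the first write only.
char[] mutable(ref Cow s)
{
    if (!s.owned)
    {
        s.val = s.val.dup;
        s.owned = true;
    }
    return cast(char[]) s.val; // valid only because owned is true here
}

// Read-only access for both kinds of argument.
const(char)[] view(const(char)[] s) { return s; }
const(char)[] view(ref const Cow s) { return s.val; }

// One body serves both types (ASCII only).
S lowerAscii(S)(S s)
{
    foreach (i; 0 .. view(s).length)
    {
        immutable c = view(s)[i];
        if (c >= 'A' && c <= 'Z')
            mutable(s)[i] = cast(char)(c + ('a' - 'A'));
    }
    return s;
}

unittest
{
    char[] buf = "ABC".dup;
    assert(lowerAscii(buf) is buf); // same slice back, no copy made
    assert(buf == "abc");

    auto r = lowerAscii(Cow("Hello"));
    assert(r.val == "hello" && r.owned); // exactly one dup happened
}
```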

   -- Reiner


