const and phobos string functions

Kirk McDonald kirklin.mcdonald at gmail.com
Thu Aug 2 02:35:45 PDT 2007


Regan Heath wrote:
> Kirk McDonald wrote:
>> Regan Heath wrote:
>>> Note in the code below 'remove' is a template function which uses 
>>> 'memmove' to remove an item from the array.
>>>
>>> (The code):
>>>
>>> char[][] tmp;
>>>
>>> tmp = cast(char[][])splitlines(cast(string)read(a[3..$]));
>>> foreach(ref line; tmp) line = cast(char[])strip(line);
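>>> for(int i = 0; i < tmp.length; )
>>> {
>>>   // Advance only when nothing was removed; after a removal the
>>>   // next element has shifted into slot i and must still be checked.
>>>   if (tmp[i].length == 0)
>>>     tmp.remove(i);
>>>   else
>>>     i++;
>>> }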
>>
>> A lot of those casts go away if tmp is a string[].
> 
> True.  I knew someone would realise/spot that.  ;)  I don't think it 
> solves the basic problem however, which is why I didn't mention it in my 
> original post; perhaps I should have.  Let me explain further...
> 
> Imagine calling splitlines on char[] getting string[] then calling 
> tolower on each line.
> 
> Because you have a string[], tolower always has to dup any line it wants 
> to change.
> 
> If you template tolower and splitlines so that they accept const and 
> non-const input and return the same, then in the case where it gets 
> non-const input tolower can avoid duplication (AKA *performance gain*).
> 
> The same is true for any function (phobos or otherwise) which will (or 
> even might) modify the input data.
> 
> In the case of splitlines or strip they don't need to modify the input 
> data so there is no performance gain by templating them and handling 
> string and char[] differently.
> 
> However, what you do get is a function which no longer 'takes ownership' 
> of the array passed and can then be used in a sequence of modifications 
> without having to cast or dup to 'take ownership back' by brute force 
> (the problem my code illustrates)
> 
> To think of it in the abstract sense, by passing a char[] you're passing 
> ownership of the data to the function, when it returns it is passing 
> ownership back again in the form char[] or char[][].
> 
> When you pass string you're not passing ownership and the function has 
> no other option but to duplicate the input when it wants to modify it.
> 
> In the current case of tolower it returns string and therefore retains 
> ownership of the string, despite the fact that it promptly forgets it 
> ever existed.  Not much we can do about this case because we cannot 
> decide what to return from a function at runtime.
> 
> Regan

In this particular case, you could call this mutating form of tolower() 
on the buffer returned from read(), and then split it afterwards 
(perhaps yielding strings). This makes a certain degree of logical 
sense: If you'd wanted to keep the original contents of the buffer, 
you'd have to duplicate it at some point anyway. Since you don't, you 
can alter it immediately.
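
As a rough sketch of that ordering (not anything Phobos provides as such): 
lowerLines below is a hypothetical helper, the lower-casing is ASCII-only, 
and splitLines and assumeUnique are the spellings in current Phobos, which 
is an assumption about the library version. In real use the buffer would 
come from read(); an in-memory buffer stands in for it here.

```d
import std.ascii : toLower;
import std.exception : assumeUnique;
import std.string : splitLines;

// Hypothetical helper: lower-case a buffer we own, then split it.
// The caller must not keep using its mutable reference afterwards,
// because the returned strings are slices of the same memory.
string[] lowerLines(char[] buffer)
{
    // Mutate first: we own the buffer, so no copy is needed (ASCII only).
    foreach (ref c; buffer)
        c = toLower(c);

    // Then hand the data over as immutable and split it into strings.
    string frozen = assumeUnique(buffer);
    return splitLines(frozen);
}

unittest
{
    char[] buf = "Foo\nBAR\nbaz".dup;   // stands in for the result of read()
    assert(lowerLines(buf) == ["foo", "bar", "baz"]);
}
```

The only allocation here is the array of slices that the split returns; 
the character data itself is never duplicated.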

I submit the following Python idiom: Functions which mutate their 
arguments should return nothing. That is:

// Return new string
string tolower(string);
// Mutate argument
void tolower(char[]);

This rather strictly highlights the difference between char[] and 
string, and makes it essentially impossible to mix up library functions 
differentiated in this way.
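
To make that concrete, here is a minimal sketch of what the two overloads 
might look like. The bodies are my assumption, not Phobos code: they are 
ASCII-only, and they lean on std.ascii.toLower and 
std.exception.assumeUnique as found in current Phobos.

```d
import std.ascii : toLower;
import std.exception : assumeUnique;

// Mutate argument: the caller owns the data, so change it in place
// and return nothing.
void tolower(char[] s)
{
    foreach (ref c; s)
        c = toLower(c);
}

// Return new string: the input is immutable, so work on a private copy.
string tolower(string s)
{
    char[] copy = s.dup;        // the one unavoidable duplication
    tolower(copy);              // resolves to the char[] overload
    return assumeUnique(copy);  // nothing else refers to the copy
}

unittest
{
    char[] buf = "Hello World".dup;
    tolower(buf);                     // mutates buf, returns nothing
    assert(buf == "hello world");

    string s = "ABC";
    assert(tolower(s) == "abc");      // new string returned
    assert(s == "ABC");               // original untouched
}
```

Because the mutating overload returns void, writing something like 
auto x = tolower(buf); with a char[] fails to compile, which is the 
mix-up protection I have in mind.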

-- 
Kirk McDonald
http://kirkmcdonald.blogspot.com
Pyd: Connecting D and Python
http://pyd.dsource.org


