const and phobos string functions
Kirk McDonald
kirklin.mcdonald at gmail.com
Thu Aug 2 02:35:45 PDT 2007
Regan Heath wrote:
> Kirk McDonald wrote:
>> Regan Heath wrote:
>>> Note in the code below 'remove' is a template function which uses
>>> 'memmove' to remove an item from the array.
>>>
>>> (The code):
>>>
>>> char[][] tmp;
>>>
>>> tmp = cast(char[][])splitlines(cast(string)read(a[3..$]));
>>> foreach(ref line; tmp) line = cast(char[])strip(line);
>>> for(int i = 0; i < tmp.length; i++)
>>> {
>>>     if (tmp[i].length == 0)
>>>         tmp.remove(i);
>>> }
>>
>> A lot of those casts go away if tmp is a string[].
>
> True. I knew someone would realise/spot that. ;) I don't think it
> solves the basic problem however, which is why I didn't mention it in my
> original post, perhaps I should have. Let me explain further...
>
> Imagine calling splitlines on char[] getting string[] then calling
> tolower on each line.
>
> Because you have a string[], tolower always has to dup any line it wants
> to change.
>
> If you template tolower and splitlines so that they accept const and
> non-const input and return the same then in the case where it gets
> non-const input tolower can avoid duplication (AKA *performance gain*)
>
> The same is true for any function (phobos or otherwise) which will (or
> even might) modify the input data.
>
> In the case of splitlines or strip they don't need to modify the input
> data so there is no performance gain by templating them and handling
> string and char[] differently.
>
> However, what you do get is a function which no longer 'takes ownership'
> of the array passed and can then be used in a sequence of modifications
> without having to cast or dup to 'take ownership back' by brute force
> (the problem my code illustrates)
>
> To think of it in the abstract sense, by passing a char[] you're passing
> ownership of the data to the function, when it returns it is passing
> ownership back again in the form char[] or char[][].
>
> When you pass string you're not passing ownership and the function has
> no other option but to duplicate the input when it wants to modify it.
>
> In the current case of tolower it returns string and therefore retains
> ownership of the string, despite the fact that it promptly forgets it
> ever existed. Not much we can do about this case because we cannot
> decide what to return from a function at runtime.
>
> Regan

In this particular case, you could call the mutating form of tolower()
on the buffer returned from read(), and split it afterwards (perhaps
yielding strings). This makes logical sense: if you had wanted to keep
the original contents of the buffer, you would have had to duplicate it
at some point anyway. Since you don't, you can alter it immediately.

I submit the following Python idiom: Functions which mutate their
arguments should return nothing. That is:

// Return new string
string tolower(string);
// Mutate argument
void tolower(char[]);

This rather strictly highlights the difference between char[] and
string, and makes it essentially impossible to mix up library functions
differentiated in this way.

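Since the convention is borrowed from Python, a short Python sketch may make it concrete. Python's built-in list.sort() mutates its list and returns None, while sorted() leaves its argument alone and returns a new list. The helper names lowered and lower_in_place below are hypothetical, written only to mirror the two tolower() signatures above; they are not part of Phobos or of the Python standard library.

```python
def lowered(text: str) -> str:
    """Return a new lowercase string; the argument is left untouched."""
    return text.lower()


def lower_in_place(buf: bytearray) -> None:
    """Lowercase the ASCII letters of buf in place; deliberately returns nothing."""
    buf[:] = buf.lower()


# Built-in precedent: sorted() returns a new list, list.sort() returns None.
words = ["b", "a"]
new_words = sorted(words)      # new sorted list; words is unchanged at this point
result = words.sort()          # words is sorted in place; result is None

buf = bytearray(b"Hello World")
lower_in_place(buf)            # buf is mutated; there is nothing to assign
copy = lowered("Hello World")  # a fresh string is returned
```

Because the mutating form returns None, writing something like x = lower_in_place(buf) by mistake tends to show up as soon as x is used, which is the mix-up protection described above.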
--
Kirk McDonald
http://kirkmcdonald.blogspot.com
Pyd: Connecting D and Python
http://pyd.dsource.org