V2 string

Bruno Medeiros brunodomedeiros+spam at com.gmail
Thu Jul 5 07:27:13 PDT 2007


Frits van Bommel wrote:
> Bruno Medeiros wrote:
>> Regan Heath wrote:
>>> tolower is an interesting case.  As a caller I expect it to modify 
>>> the string, or perhaps give a modified copy back (both options are 
>>> valid and should perhaps be supported?).
>>>
>>> So, the 'string tolower(string)' version has 2 cases, the first case 
>>> where it doesn't need to modify the input and can simply return it, 
>>> no problem. But case 2, where it does modify it should dup and return 
>>> char[].  My reasoning being that after it has completed and returned 
>>> the copy, the caller now 'owns' the string (as it's the only copy in 
>>> existence and no-one else has a reference to it).
>>>
>>
>> Indeed, I think this illustrates that some standard library functions 
>> may not have the correct signature, and tolower is likely one of them.
>> The most general case for tolower is:
>>   char[] tolower(const(char)[] s);
>> Since tolower creates a new array, but does not keep it, it can give 
>> away its ownership of the array (i.e., return a mutable).
> 
> Sorry, but you seem to have missed a bit above: if the string doesn't 
> contain any uppercase characters tolower returns the input without 
> .dup-ing it (aka copy-on-write).
> 

Oops, sorry, that's right, I missed that part about tolower not
modifying the string if it wasn't necessary. :(


>> The second case, more specific, is simply syntactic sugar for making 
>> that array invariant:
>>
>>   invariant(char)[] tolowerinv(const(char)[] str) {
>>     return cast(invariant) tolower(str);
>>   }
> 
> Yes, but only if it actually needs to modify the string.
> 
> You seem to have missed that the two cases can't (in general) be 
> distinguished at compile time; it's only at run time when a choice is 
> made between a copy and no copy.
> 
>> The current signature:
>>   const(char)[] tolower(const(char)[] str)
>> is kinda incorrect, because it returns a const reference for an array 
>> that has no mutable references, and that is the same as an invariant 
>> reference, so tolower might as well return invariant(char)[].
> 
> Again, that only holds if a copy was actually made at run time. If no 
> copy was made the original input is returned, to which there may be 
> mutable references.

You're right, if a copy is not made *every* time (which is the case
after all), then the above doesn't hold.
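
To make the aliasing concrete, here is a minimal sketch of the copy-on-write behaviour described above. It is not the Phobos source: the name toLowerCow is hypothetical, and it only handles ASCII letters. It shows that when no copy is made, the "const" result still shares memory with a buffer the caller can mutate, so treating the result as invariant would be unsound.

```d
// Hypothetical copy-on-write lowercase (ASCII only), not the Phobos code.
const(char)[] toLowerCow(const(char)[] s)
{
    foreach (i, c; s)
    {
        if (c >= 'A' && c <= 'Z')
        {
            // First uppercase letter found: copy once, then fix up the rest.
            char[] r = s.dup;
            foreach (ref ch; r[i .. $])
                if (ch >= 'A' && ch <= 'Z')
                    ch += 'a' - 'A';
            return r;
        }
    }
    return s; // nothing to change: the input itself is returned
}

void main()
{
    char[] buf = "foo".dup;
    const(char)[] r = toLowerCow(buf);
    assert(r.ptr is buf.ptr);    // no copy was made
    buf[0] = 'X';
    assert(r[0] == 'X');         // the result changed underneath us

    char[] buf2 = "Foo".dup;
    const(char)[] r2 = toLowerCow(buf2);
    assert(r2.ptr !is buf2.ptr); // a copy was made this time
    assert(r2 == "foo");
}
```

Which of the two branches runs depends only on the contents of the string, so the caller can't tell from the signature alone.
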
But then, I think Phobos's current tolower is
suboptimal in terms of usefulness, because the caller doesn't know
whether a new copy was made. I'm wondering now what the more
useful form, or forms, of tolower (and similar functions) would be.
Thinking about it again (admittedly, I don't have much experience
with string manipulation in C++ or D), perhaps the best form
is an in-place mutable version:
   char[] tolower(char[] str);
And it's this one, after all, that is the most general form. If you want
to call tolower on a const or invariant array, you dup it yourself at the
call site:
   char[] str = tolower("FOO".dup);
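
A minimal sketch of what such an in-place version could look like. The name toLowerInPlace is hypothetical (chosen to avoid clashing with the library function), and it assumes ASCII-only input; proper Unicode case mapping would need more than this.

```d
// Hypothetical in-place variant: lowercases ASCII letters directly in the
// caller's buffer and returns the same slice. It never allocates.
char[] toLowerInPlace(char[] str)
{
    foreach (ref c; str)
        if (c >= 'A' && c <= 'Z')
            c += 'a' - 'A';
    return str;
}

void main()
{
    // A string literal is not mutable, so the copy is explicit at the call.
    char[] s = toLowerInPlace("FOO".dup);
    assert(s == "foo");

    // A buffer the caller already owns is modified with no copy at all.
    char[] own = "BaR".dup;
    char[] same = toLowerInPlace(own);
    assert(same.ptr is own.ptr);
    assert(own == "bar");
}
```

The caller always knows who owns the result here, at the cost of a possibly unnecessary dup when the input was already lowercase.
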


-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D


