[Design] return char[] or string?

Mon Jul 30 01:37:40 PDT 2007

Some comments inline...

Kirk McDonald wrote:
> Stewart Gordon wrote:
>> While I haven't got into using D 2.x, I've already begun thinking 
>> about making libraries compatible with it.  On this basis, a design 
>> decision to consider is whether functions that return a string should 
>> return it as a char[] or a const(char)[].  (I use "string" with its 
>> general meaning, and "const(char)[]" to refer to that specific type.  
>> Obviously for 1.0 compatibility, I'd have to use the "string" alias 
>> wherever I want const(char)[].)
>>
>> Obviously, a function that takes a string as a parameter has to take 
>> in a const(char)[], to be able to accept a string literal or otherwise 
>> a constant string.  But what about the return type?

It's a pity D cannot tell how a string literal is going to be used and 
place those passed as char[] (mutable) parameters in writable memory.  
Obviously it would have to create a separate copy for each and every use.

>> Looking through the 2.x version of std.string, they all return 
>> const(char)[] rather than char[].  (Except for those that return 
>> something else such as a number.)  This is necessary in most cases 
>> because of the copy-on-write policy.

True, however when you perform 'copy on write' you get a copy of the 
original and that copy is unique and owned by the copier and therefore 
can be mutable, or in other words char[] not const(char)[].

>> But otherwise, it seems that both have their pros and cons.
>>
>> There seem to be two cases to consider: libraries targeted 
>> specifically at D 2.x, and libraries that (attempt to) support both 
>> 1.x and 2.x.  At the moment, it's the latter that really matters.
>>
>> Let's see.  The string-returning functions in my library more or less 
>> fall into these categories:
>> (a) functions that build a string in a local variable, which is then 
>> returned
>> (b) functions that return a copy of a member variable
>> (c) property setters and the like that simply pass the argument through
>> (d) functions that call a function in Phobos and return the result
>>
>> In the case of (a), there is no obvious benefit to returning a 
>> const(char)[] rather than a char[].
>>
>> Many of the cases of (b) are property getters.  If we have such things 
>> returning a const(char)[], then the getter no longer needs to copy the 
>> member variable.  Though versioning would be needed to implement this 
>> behaviour without causing havoc under 1.x.  The alternative, leaving 
>> them returning char[], leads to inconsistency with (c), which would 
>> have to return const(char)[].
>>
>> That leaves (d), to which the obvious answer is to return whatever 
>> type the Phobos function returns.
>>
>> On one hand, if the string is generated on the fly, and so altering it 
>> would not cause a problem, it seems wasteful to return a const(char)[] 
>> only for the caller to have to .dup it if it wants to modify it.

Indeed, and some Phobos functions do exactly this; it has been a source 
of irritation for me since the inception of 'const'.
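
To make the cost concrete, here is a minimal sketch in present-day D 2 syntax. The names repeatConst and repeatMutable are made up for illustration; they are not Phobos functions, and the 2007 compiler may have differed in details.

```d
// Hypothetical builders (not Phobos): each call allocates a brand-new buffer.
const(char)[] repeatConst(char c, size_t n)
{
    auto buf = new char[n];
    buf[] = c;
    return buf;   // mutability is thrown away at the API boundary
}

char[] repeatMutable(char c, size_t n)
{
    auto buf = new char[n];
    buf[] = c;
    return buf;   // the caller owns the buffer and may modify it
}

unittest
{
    // const return: a second allocation (.dup) is needed before any change
    char[] a = repeatConst('-', 3).dup;
    a[0] = '+';
    assert(a == "+--");

    // mutable return: modify directly; it still converts to const for free
    char[] b = repeatMutable('-', 3);
    b[0] = '+';
    const(char)[] view = b;
    assert(view == "+--");
}
```

Both functions do the same work; only the second lets the caller reuse the allocation that was already paid for.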

>> On the other hand, from the library user's point of view, it can be 
>> seen as a confusing inconsistency if some functions return char[] and 
>> others const(char)[], when no difference in the semantics of what's 
>> returned accounts for this.  It also borders on breaking the 
>> encapsulation principle, whereby internal implementation details 
>> should not be exposed in my library's API.

I think providing more than one overload could help lessen the 
confusion, for example having:

char[] tolowerInplace(char[] s)

in addition to the standard tolower.
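
A rough sketch of what such an in-place variant might look like, assuming ASCII-only input for brevity. The body is an illustration only; no such function is claimed to exist in Phobos.

```d
// Hypothetical in-place variant: lower-cases ASCII letters in the caller's
// own buffer and returns that same buffer. No allocation takes place.
char[] tolowerInplace(char[] s)
{
    foreach (ref c; s)
    {
        if (c >= 'A' && c <= 'Z')
            c = cast(char)(c + ('a' - 'A'));
    }
    return s;
}

unittest
{
    char[] buf = "Hello World".dup;
    auto r = tolowerInplace(buf);
    assert(r is buf);               // same memory, not a copy
    assert(buf == "hello world");
}
```

Since the parameter is char[], a string literal or const string cannot be passed by accident, so the caller always knows the buffer will be modified.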

> It's a question of ownership. If the function is returning a new string, 
> and giving ownership of that string to the caller, then it should return 
> a char[]. If the function is returning a string which the caller is 
> merely borrowing, it should return a const(char)[]. In most cases, 
> thinking of things this way causes the return type to be obvious.
> 
> And, of course, you can always convert a char[] to a const(char)[].

This is how I tend to think about it also.

> In (a), the function is returning a new string to the caller; it should 
> return char[].
> 
> (b) should usually return const(char)[], unless of course you want the 
> caller to mutate the string. If you're going through the trouble of 
> wrapping a member with a getter/setter, then that probably means you 
> don't want the user messing with it directly.
> 
> The other cases are less clear, and will vary from function to function.

As I mentioned above I have been repeatedly annoyed by a number of 
Phobos string functions since the introduction of 'const'.

I think in some cases we need to rethink some of the functions and how 
they work in order to provide a more 'const' aware/friendly library.

Take for example "string[] split(in string s)" in std.string.

If the input is char[] then this function essentially casts the input to 
const and if I want to perform further modification of the input I now 
have to dup the results.

In a sense this function 'takes ownership' of the input and does not 
give it back again.

I think in this case split should be templated.  If the input is char[] 
the result should be char[][], if the input is string the result should 
be string[].
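
As a sketch of that idea in present-day D 2 syntax: the function below is hypothetical (splitKeep is not the Phobos split), it splits on a single delimiter character only, and it returns slices of the input so that the element type carries straight through.

```d
// Hypothetical sketch, not std.string.split: the result's element type
// follows the input's, so char[] in gives char[][] out and string in
// gives string[] out. The results are slices of the input, never copies.
T[][] splitKeep(T)(T[] s, char delim)
    if (is(immutable(T) == immutable(char)))
{
    T[][] parts;
    size_t start = 0;
    foreach (i; 0 .. s.length)
    {
        if (s[i] == delim)
        {
            parts ~= s[start .. i];
            start = i + 1;
        }
    }
    parts ~= s[start .. $];
    return parts;
}

unittest
{
    char[] buf = "a,b,c".dup;
    auto m = splitKeep(buf, ',');
    static assert(is(typeof(m) == char[][]));
    m[1][0] = 'X';                  // still mutable, still the same memory
    assert(buf == "a,X,c");

    auto c = splitKeep("x y", ' ');
    static assert(is(typeof(c) == string[]));
    assert(c == ["x", "y"]);
}
```

With a mutable input the pieces can be modified without any dup, while a string input still yields read-only pieces.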

This works fine for cases where the input is never copied, but not for 
cases where it is conditionally copied, "string tolower(string s)" in 
std.string for example.

Such a function cannot know ahead of time whether it's going to need to 
'copy on write', so simply templating it doesn't help.  However, I 
suggested a possible templated solution which dups only in the case 
where the input is 'string':

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=55337

I figure if you want a copy of the input you can manually dup the 
parameter you pass.
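
The following is a separate sketch of that general idea, not the code from the linked post. It assumes present-day D 2 syntax and ASCII-only input, and the name tolowerT is made up: a mutable input is changed in place, while a const or string input is copied only once a change is actually needed.

```d
// Hypothetical sketch: dup only when the input cannot be written to.
T[] tolowerT(T)(T[] s)
    if (is(immutable(T) == immutable(char)))
{
    static if (is(T == char))
    {
        // Mutable input: write straight into the caller's buffer.
        foreach (ref c; s)
        {
            if (c >= 'A' && c <= 'Z')
                c = cast(char)(c + ('a' - 'A'));
        }
        return s;
    }
    else
    {
        // Read-only input: copy on the first write, otherwise return as-is.
        char[] copy = null;
        foreach (i, c; s)
        {
            if (c >= 'A' && c <= 'Z')
            {
                if (copy is null)
                    copy = s.dup;
                copy[i] = cast(char)(c + ('a' - 'A'));
            }
        }
        // The copy is unique, so casting it to the input's type is safe here.
        return copy is null ? s : cast(T[]) copy;
    }
}

unittest
{
    char[] buf = "ABC".dup;
    assert(tolowerT(buf) is buf && buf == "abc");   // no allocation

    string s = "Hello";
    auto t = tolowerT(s);
    assert(t == "hello" && s == "Hello");           // original untouched

    string u = "same";
    assert(tolowerT(u) is u);                       // nothing to change
}
```

A caller holding a char[] who wants the original left alone would pass buf.dup instead of buf.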

Another solution (also hinted at above) may be to provide more than one 
overload.  You might do this where you cannot easily template a solution 
that efficiently handles the common case for each input type 
(mutable/const).

As for your cases mentioned above...

I would probably implement (c), a property setter, as code that sets the 
member followed by a call to the getter so it would return the same as 
(b).  That said I haven't written a lot of these so perhaps my 
experience using them isn't sufficient.
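
Roughly what that looks like, as a sketch only. The Widget class and its name member are invented for illustration, and storing a private copy in the setter is an assumption of this example rather than anything stated above.

```d
// Hypothetical class showing a setter that ends by calling the getter,
// so both return the same type.
class Widget
{
    private char[] _name;

    // Getter, case (b): lends the member out read-only, so no copy is made.
    const(char)[] name()
    {
        return _name;
    }

    // Setter, case (c): stores a private copy (an assumption of this
    // sketch), then returns whatever the getter returns.
    const(char)[] name(const(char)[] value)
    {
        _name = value.dup;
        return name();
    }
}

unittest
{
    auto w = new Widget;
    auto r = w.name("gadget");
    static assert(is(typeof(r) == const(char)[]));
    assert(r == "gadget");
    assert(w.name() is r);          // same borrowed view both times
}
```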

Is there some reason you'd rather return char[] from a setter?

I'm hoping in the case of (d) that Phobos will change or provide more 
overloads to handle the different use-cases.

Regan