D2 toStringz Return Type

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Fri Nov 7 13:26:14 PST 2008


Steven Schveighoffer wrote:
> "Andrei Alexandrescu" wrote
>> Steven Schveighoffer wrote:
>>> "Andrei Alexandrescu" wrote
>>>> Steven Schveighoffer wrote:
>>>>> "Andrei Alexandrescu" wrote
>>>>>> Mike Parker wrote:
>>>>>>> I'm curious as to why toStringz in D2 returns const(char)* instead of 
>>>>>>> just a plain char*. Considering that the primary use case for the 
>>>>>>> function is interfacing with C code, it seems rather pointless to 
>>>>>>> return a const(char)*.
>>>>>> We want to leave the opportunity open to not duplicate the actual 
>>>>>> memory underneath the string object. (Right now that opportunity is 
>>>>>> not effected.)
>>>>> My recommendation -- have 2 functions.  One which always copies (and 
>>>>> returns char *), and one which does not.
>>>>>
>>>>> This at least leaves a safe alternative for people who have headers 
>>>>> that aren't properly constified, and don't want to go through the 
>>>>> hassle of looking it up themselves.  Also good for those C functions 
>>>>> which actually require a mutable char *, since D2 strings are mostly 
>>>>> invariant.
>>>> You can't quite do that because dynamic conditions establish whether 
>>>> it's safe to avoid copying or not.
>>> I can see how you interpreted it this way.
>>>
>>> What I meant was one is the toStringz as it is today, which might copy 
>>> and might leave it in-place.  This can be used to call C functions that 
>>> take a const char *.  The other function will *always* copy, and will 
>>> return a mutable char *.  This is for when you don't care to look at the 
>>> function yourself (assuming the author got it correct), or the case where 
>>> the C function actually does mutate the argument.
>>>
>>> If the C function does actually require a mutable argument, you are 
>>> forced to do an extra dup for no reason with today's toStringz.
>>>
>>> -Steve
>> I see. So:
>>
>> const(char)* toStringzMayOrMayNotCopy(in char[]);
>> char* toStringzWillAlwaysCopy(in char[]);
>>
>> Providing writable zero-terminated strings is a sure recipe for disaster
>> (see the debates around sprintf, strcpy etc.). I think the need for
>> such things is rare and at best avoided entirely by the standard
>> library. If you so wish, you can always use malloc by hand.
> 
> Using zero terminated strings, even const ones, is a recipe for disaster. 
> Yet, there it is. 

Well, writable ones are even more of a disaster. Reading random 
characters can cause the program to fail but does not corrupt its state 
arbitrarily. So it's good to limit the damage. The C and C++ communities 
have much more beef with writable stringz's than read-only ones.

> And it's making me do 2 duplications.

Not at all.

string s = ...;
auto sz = cast(char*) malloc(s.length + 1);
sz[0 .. s.length] = s[];
sz[s.length] = 0;

If you use it often in an application, put it in a function. I'm not 
putting it in the standard library.
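
A minimal sketch of such a function, for illustration only. The name toMutableStringz is hypothetical and is not part of Phobos; the null check and the free() in the usage are additions that the snippet above leaves out, and the imports assume a current druntime.

```d
import core.stdc.stdlib : malloc, free;
import core.stdc.string : strlen;

// Hypothetical helper (illustrative name, not a library function):
// always copies `s` into a freshly malloc'ed, zero-terminated, writable
// buffer. The caller owns the result and must release it with free().
char* toMutableStringz(const(char)[] s)
{
    auto sz = cast(char*) malloc(s.length + 1);
    if (sz is null)
        return null;            // allocation failed
    sz[0 .. s.length] = s[];    // copy the characters
    sz[s.length] = '\0';        // add the terminator
    return sz;
}

void main()
{
    string s = "hello";
    char* sz = toMutableStringz(s);
    assert(sz !is null);
    scope (exit) free(sz);

    assert(strlen(sz) == s.length);
    sz[0] = 'J';                // writes go to the copy only
    assert(s == "hello");       // the original string is untouched
}
```

Only one copy is made here, and the buffer lives outside the garbage collector, so freeing it is the caller's job.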

> The reality is that as soon as you cross the boundary from D to C, you have 
> lost all the safety benefits that D provides, even if the signature is 
> const.

I disagree. You lost automatic checking from the D side when interfacing 
with C, but if a C function is reliably not mutating its arguments its D 
signature is better tagged as const. It's a net win.

> The reality is, people are still going to call these functions, 
> either with an extra dup (which buys you nothing in safety), or by editing 
> the bindings to be const (which makes it even more unsafe).  The reality is, 
> most of these calls are pretty innocuous.  People aren't using sprintf or 
> strcpy, they are using C libraries that do things that D doesn't already do. 
> Most of these are just using char * as a way to pass const strings, it isn't 
> too much to ask for a function that complies.

Maybe I got lucky, but I haven't run across any C libraries that don't 
use const in signatures. Anyhow the point is superfluous as you, not 
they, get to write the D interfacing signatures. Const conveys a world 
of information. True, that is not 100% enforceable in D and in C alike, 
as a cast could always ruin things. But it's good if the signature 
reflects a guarantee that is reasonable and also reasonably easy to observe.
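
As a rough illustration of that point, here is a hand-written binding whose D signature is tagged const and which accepts the result of toStringz directly. strlen is used only because it is a C function known not to mutate its argument; the declaration is written by hand here as an assumption about how a binding author might spell it, and it assumes a current Phobos, where toStringz returns an immutable(char)* that converts implicitly to const(char)*.

```d
import std.string : toStringz;

// Hand-written binding: the D side chooses the signature the D compiler
// checks. Tagging the parameter const records that the C function is
// relied upon not to write through the pointer.
extern (C) size_t strlen(const(char)* s);

void main()
{
    string s = "hello";

    // No cast and no extra dup are needed to call the const binding.
    auto n = strlen(s.toStringz);
    assert(n == 5);
}
```

Had the binding been declared with a plain char* parameter, the same call would be rejected by the compiler and would need a cast or a writable copy.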

> But you probably won't add it.  That's ok, I don't use Phobos anyways.  I'll 
> be sure to add an appropriate function to Tango while porting it to D2.

You may want to rethink before putting dangerous functions in 
widely-used libraries. Returning a writable zero-terminated char* is as 
dangerous as it gets, and it fosters bad coding style too.


Andrei


