toStringz or not toStringz

Thu Jul 14 08:00:48 PDT 2011

On Thu, 14 Jul 2011 12:30:24 +0100, Steven Schveighoffer  
<schveiguy at yahoo.com> wrote:
> On Thu, 14 Jul 2011 05:53:47 -0400, Regan Heath <regan at netmail.co.nz>  
> wrote:
>
>> On Wed, 13 Jul 2011 19:31:42 +0100, Steven Schveighoffer  
>> <schveiguy at yahoo.com> wrote:
>>
>>> On Wed, 13 Jul 2011 13:32:56 -0400, Regan Heath <regan at netmail.co.nz>  
>>> wrote:
>>>
>>>> On Wed, 13 Jul 2011 17:00:39 +0100, Steven Schveighoffer  
>>>> <schveiguy at yahoo.com> wrote:
>>>
>>>>> How does your proposal know that a char * is part of a  
>>>>> heap-allocated array?  If you are assuming the only case where char  
>>>>> * is passed will be arr.ptr, then that doesn't cut it.  What if the  
>>>>> compiler doesn't know where the char * came from?
>>>>
>>>> See your Q and my A above ("char * foo" example).
>>>>
>>>>> The inherent problem of zero-terminated strings is that you don't  
>>>>> know how long it is until you search for a zero.  If it's not  
>>>>> properly terminated, then you are screwed.  That problem cannot be  
>>>>> "solved", even with compiler help -- you can get situations where  
>>>>> there is no more information other than the pointer.
>>>>
>>>> Really?  But can't we obtain the GC lock and look them up, as
>>>> mentioned above?  And isn't this exactly what toStringz will do when  
>>>> the programmer first of all curses because it has crashed, and then  
>>>> adds an explicit toStringz call?
>>>
>>> Who said the char * points into GC memory?  It could point at stack  
>>> memory, or static data in ROM.
>>
>> Ok.  What would toStringz do in this case?  Because that's what I'm
>> proposing we do here.
>
> Nothing; you don't call toStringz on a char *, you call it on a string.
> The point is that those who have already guaranteed a char * is
> zero-terminated should not have the compiler injecting useless code into
> a simple function call.
>
> A really good example is using a char * you got from one C function to
> call another C function.

Good points all.  So, the idea should be limited to cases where D's char[]
and string are passed to extern(C) functions expecting char*, and should
not affect cases where a D char* is passed directly.  Sounds good.
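For concreteness, a minimal D sketch of what the restricted rule would and would not touch. It assumes current Phobos std.string.toStringz and druntime's core.stdc.string bindings (not necessarily the 2011 versions). Today every string or char[] needs the explicit toStringz call; under the proposal only those calls would become implicit, so the strlen(s) line that does not compile today is left as a comment. The char*-from-C case at the end is unchanged either way.

```d
import core.stdc.string : strchr, strlen; // C's strlen/strchr via druntime
import std.string : toStringz;

void main()
{
    // A string literal is zero-terminated and converts to const(char)*.
    assert(strlen("hello") == 5);

    string whole = "hello world";
    string s = whole[0 .. 5];   // "hello", a slice: no '\0' right after it
    char[] buf = "hello".dup;   // mutable, GC-allocated copy

    // strlen(s);               // does not compile today: no implicit string -> char*

    // Passing the raw pointer ignores the slice length: strlen runs on to
    // the '\0' that terminates the whole literal "hello world".
    assert(strlen(s.ptr) == 11);

    // toStringz returns a pointer to a zero-terminated copy.
    assert(strlen(toStringz(s)) == 5);
    assert(strlen(toStringz(buf)) == 5);

    // A char* obtained from one C function goes straight into another;
    // the restricted proposal leaves this case untouched.
    const(char)* line = "key=value";
    const(char)* eq = strchr(line, '=');
    assert(strlen(eq + 1) == 5); // "value"
}
```

The strlen(s.ptr) line is why an implicit conversion would have to call toStringz rather than just take .ptr: a slice's pointer says nothing about where a terminator is.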

>> The goal here is to pick some low-hanging fruit, the general case
>> mentioned earlier, and make it work as a new D programmer would
>> expect.  In that case there is no technical difficulty implementing it  
>> (toStringz already exists), there is no extra cost (you already have to  
>> call toStringz), and the only disagreement seems to be whether it  
>> should be implicit or explicit.
>
> There is an extra cost where you wouldn't have to call toStringz  
> currently.

The point I've tried to make all along is that this is a rare situation,
not the general case.  In the general case you're going to need to call
toStringz anyway, especially if you restrict this idea to D's char[] and
string, and not D's char*, as mentioned above.

>> In this particular case I cannot see any harm in making it implicit.   
>> Yes, there are some edge cases, but they either already exist (as shown  
>> by the explicit toStringz example I gave where the passed char[]  
>> remained unchanged, and your example passing buffer[]), or they may be  
>> detectable by the compiler, or they are rare - in which case requiring  
>> some manual intervention is not too much to ask.
>>
>> So, on balance I reckon the implicit call would be "better" for more  
>> people more of the time, and at no extra cost.  It seems like a win/win  
>> to me.  Yes, there are edge cases, yes there are wrinkles to iron out,  
>> no it's not a "general/covers everything perfectly" kind of idea -  
>> which I agree we'd all prefer, but it makes D look slicker, and removes  
>> one more stumbling block for new D programmers.
>
> We also have to weigh this against two things:

Assuming the above mentioned restriction (char[] and string, not char*)...

> 1. How will existing code (that already calls toStringz) be affected?

Not at all.

> 2. This is *not* a trivial compiler change, so all other options should
> be considered; there are a *lot* of C calls from D code today that could
> be affected.

It will affect none of these.

> If C strings were their own type (and not conflated with "buffer
> pointer"), and if a C string could be verified as valid in O(1) time
> without segfaulting, I'd agree that a compiler change would be
> warranted.  There are just too many cases (note, these aren't the
> majority, but they are enough) where the injected calls will be either
> performance drags or unnecessary.
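The "own type" condition above can be pictured with a small D sketch. CString, assumeTerminated and cLength are hypothetical names made up for illustration, not Phobos API. The type pays for termination once, at construction from a D slice, and a raw pointer can only get in through an explicit, unchecked assumeTerminated, since there is no O(1) way to validate it.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

/// Hypothetical type (not Phobos API): a pointer known to be
/// zero-terminated, kept distinct from a plain "buffer pointer".
struct CString
{
    private const(char)* p;

    /// From a D slice: copy and terminate once, via toStringz.
    this(const(char)[] s) { p = toStringz(s); }

    /// From a pointer the caller vouches for (e.g. a C function's result).
    /// Nothing is checked here: termination cannot be verified in O(1).
    static CString assumeTerminated(const(char)* q)
    {
        CString c;
        c.p = q;
        return c;
    }

    const(char)* ptr() const { return p; }
}

/// Hypothetical wrapper that only accepts CString, so a raw slice or an
/// arbitrary buffer pointer cannot be passed by accident.
size_t cLength(CString s) { return strlen(s.ptr); }

void main()
{
    string whole = "hello world";
    auto a = CString(whole[0 .. 5]);          // terminated copy of "hello"
    auto b = CString.assumeTerminated("abc"); // literals are zero-terminated
    assert(cLength(a) == 5);
    assert(cLength(b) == 3);
}
```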

I disagree about the number of cases being too many, but this is a gut  
feeling and I have no evidence to support it.

I think the restriction I mentioned above changes the situation, however:
all those edge cases are unaffected, old code is unaffected, and the only
new behaviour is that char[] and string may be passed as extern(C) char*
parameters.
