toStringz or not toStringz

Thu Jul 14 04:30:24 PDT 2011

On Thu, 14 Jul 2011 05:53:47 -0400, Regan Heath <regan at netmail.co.nz>  
wrote:

> On Wed, 13 Jul 2011 19:31:42 +0100, Steven Schveighoffer  
> <schveiguy at yahoo.com> wrote:
>
>> On Wed, 13 Jul 2011 13:32:56 -0400, Regan Heath <regan at netmail.co.nz>  
>> wrote:
>>
>>> On Wed, 13 Jul 2011 17:00:39 +0100, Steven Schveighoffer  
>>> <schveiguy at yahoo.com> wrote:
>>
>>>> How does your proposal know that a char * is part of a heap-allocated  
>>>> array?  If you are assuming the only case where char * is passed will  
>>>> be arr.ptr, then that doesn't cut it.  What if the compiler doesn't  
>>>> know where the char * came from?
>>>
>>> See your Q and my A above ("char * foo" example).
>>>
>>>> The inherent problem of zero-terminated strings is that you don't  
>>>> know how long it is until you search for a zero.  If it's not  
>>>> properly terminated, then you are screwed.  That problem cannot be  
>>>> "solved", even with compiler help -- you can get situations where  
>>>> there is no more information other than the pointer.
>>>
>>> Really?  But can't we obtain the GC lock and look them up, as mentioned  
>>> above?  And isn't this exactly what toStringz will do when the  
>>> programmer first of all curses because it has crashed, and then adds  
>>> an explicit toStringz call?
>>
>> Who said the char * points into GC memory?  It could point at stack  
>> memory, or static data in ROM.
>
> Ok.  What would toStringz do in this case? .. because that's what I'm  
> proposing we do here.

Nothing.  You don't call toStringz on a char *; you call it on a string.
The point is that those who have already guaranteed a char * contains a 0
should not have the compiler injecting useless code into a simple
function call.

A really good example is passing a char * you got from one C function
straight to another C function.
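
To make the distinction concrete, here is a minimal sketch.  It assumes
only the core.stdc bindings and std.string.toStringz; the helper name
cLength is made up for illustration.  The first case needs the explicit
conversion, the second would gain nothing from an injected one:

```d
import core.stdc.stdio : puts;
import core.stdc.stdlib : getenv;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: the length C sees after an explicit conversion.
// A D string is a (pointer, length) pair with no guaranteed terminator,
// so toStringz is needed here (and it may allocate and copy).
size_t cLength(string s)
{
    return strlen(s.toStringz);
}

void main()
{
    assert(cLength("hello") == 5);

    // getenv returns a char* that C already guarantees is
    // zero-terminated (or null if the variable is unset), so it can be
    // handed to another C function with no conversion at all.
    const(char)* path = getenv("PATH");
    if (path !is null)
        puts(path);
}
```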

> The goal here is to pick some low hanging fruit, the general case  
> mentioned earlier, and make it work as a new D programmer would expect.   
> In that case there is no technical difficulty implementing it (toStringz  
> already exists), there is no extra cost (you already have to call  
> toStringz), and the only disagreement seems to be whether it should be  
> implicit or explicit.

There is an extra cost in the cases where you don't currently have to
call toStringz.

>
> In this particular case I cannot see any harm in making it implicit.   
> Yes, there are some edge cases, but they either already exist (as shown  
> by the explicit toStringz example I gave where the passed char[]  
> remained unchanged, and your example passing buffer[]), or they may be  
> detectable by the compiler, or they are rare - in which case requiring  
> some manual intervention is not too much to ask.
>
> So, on balance I reckon the implicit call would be "better" for more  
> people more of the time, and at no extra cost.  It seems like a win/win  
> to me.  Yes, there are edge cases, yes there are wrinkles to iron out,  
> no it's not a "general/covers everything perfectly" kind of idea - which  
> I agree we'd all prefer, but it makes D look slicker, and removes one  
> more stumbling block for new D programmers.

We also have to weigh this against two things:

1. How will existing code (that already calls toStringz) be affected?
2. This is *not* a trivial compiler change, so all other options should
be considered.  There are a *lot* of C calls made from D today that
could be affected.

If C strings were their own type (and not conflated with "buffer
pointer"), and a C string could be verified as valid without segfaulting
and in O(1) time, I'd agree that a compiler change would be warranted.
There are just too many cases (note, these aren't the majority, but they
are enough) where the injected calls would be either performance drags or
unnecessary.
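
As a rough illustration of why the check cannot be O(1), here is a
sketch of the best case, where the bounds of the buffer are known.  The
function name hasTerminator is hypothetical.  Even here the check is a
linear scan; with a bare char * there is no length to bound the scan at
all.

```d
// Hypothetical helper: true if buf contains a 0 within its bounds.
// The scan visits up to buf.length code units, so it is O(n), not O(1).
bool hasTerminator(const(char)[] buf)
{
    foreach (c; buf)
        if (c == '\0')
            return true;
    return false;
}

void main()
{
    char[4] terminated = ['a', 'b', 'c', '\0'];
    char[3] bare = ['a', 'b', 'c'];

    assert(hasTerminator(terminated[]));
    // No 0 inside the slice: a C function reading this buffer would run
    // past its end.
    assert(!hasTerminator(bare[]));
}
```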