toStringz or not toStringz

Steven Schveighoffer schveiguy at yahoo.com
Thu Jul 14 04:30:24 PDT 2011


On Thu, 14 Jul 2011 05:53:47 -0400, Regan Heath <regan at netmail.co.nz>  
wrote:

> On Wed, 13 Jul 2011 19:31:42 +0100, Steven Schveighoffer  
> <schveiguy at yahoo.com> wrote:
>
>> On Wed, 13 Jul 2011 13:32:56 -0400, Regan Heath <regan at netmail.co.nz>  
>> wrote:
>>
>>> On Wed, 13 Jul 2011 17:00:39 +0100, Steven Schveighoffer  
>>> <schveiguy at yahoo.com> wrote:
>>
>>>> How does your proposal know that a char * is part of a heap-allocated  
>>>> array?  If you are assuming the only case where char * is passed will  
>>>> be arr.ptr, then that doesn't cut it.  What if the compiler doesn't  
>>>> know where the char * came from?
>>>
>>> See your Q and my A above ("char * foo" example).
>>>
>>>> The inherent problem of zero-terminated strings is that you don't  
>>>> know how long it is until you search for a zero.  If it's not  
>>>> properly terminated, then you are screwed.  That problem cannot be  
>>>> "solved", even with compiler help -- you can get situations where  
>>>> there is no more information other than the pointer.
>>>
>>> Really?  But can't we obtain the GC lock and look them up, as mentioned  
>>> above?  And isn't this exactly what toStringz will do when the  
>>> programmer first of all curses because it has crashed, and then adds  
>>> an explicit toStringz call?
>>
>> Who said the char * points into GC memory?  It could point at stack  
>> memory, or static data in ROM.
>
> Ok.  What would toStringz do in this case? .. because that's what I'm  
> proposing we do here.

Nothing.  You don't call toStringz on a char *, you call it on a string.
The point is that code which has already guaranteed a char * is
zero-terminated should not have the compiler injecting useless work into
a simple function call.

A really good example is taking a char * you got from one C function and
passing it straight to another C function.
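
To make that concrete, here is a minimal sketch.  The helper name
valueLength is made up for illustration.  The pointer returned by strchr
points into a string that is already zero-terminated, so it can go
straight to strlen; only the D slice in the unittest needs toStringz.

```d
import core.stdc.string : strchr, strlen;
import std.string : toStringz;

/// Hypothetical helper: length of the text after the first '=' in a
/// zero-terminated C string, or 0 if there is no '='.
size_t valueLength(const(char)* entry)
{
    // strchr is a C function.  Its result points into `entry`, which the
    // caller already guarantees is zero-terminated, so no check or copy
    // is needed before handing it to strlen.
    const(char)* eq = strchr(entry, '=');
    return eq is null ? 0 : strlen(eq + 1);
}

unittest
{
    string line = "xxkey=valuexx";
    string slice = line[2 .. 11];   // "key=value", no terminator of its own

    // The slice is a D string, so this is where toStringz belongs.
    assert(valueLength(slice.toStringz) == 5);
}
```

An implicit conversion inserted at every char * parameter would, under
this proposal as I understand it, also be applied to the strlen call
inside valueLength, where it can do nothing useful.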

> The goal here is to pick some low hanging fruit, the general case  
> mentioned earlier, and make it work as a new D programmer would expect.   
> In that case there is no technical difficulty implementing it (toStringz  
> already exists), there is no extra cost (you already have to call  
> toStringz), and the only disagreement seems to be whether it should be  
> implicit or explicit.

There is an extra cost in every case where you currently don't have to
call toStringz at all.
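
String literals are one such case, sketched below with made-up function
names.  D string literals are already zero-terminated and convert
implicitly to const(char)*, so the first call needs no toStringz today.
Only the second one, which takes an arbitrary string, does.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

/// Hypothetical example: passing a literal to a C function.
size_t literalLength()
{
    // The literal is stored with a trailing 0 and converts implicitly
    // to const(char)*, so there is nothing for toStringz to add.
    return strlen("hello");
}

/// Hypothetical example: passing an arbitrary D string to a C function.
size_t stringLength(string s)
{
    // `s` may be a slice with no terminator after it, so the explicit
    // toStringz call is required here.
    return strlen(s.toStringz);
}
```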

>
> In this particular case I cannot see any harm in making it implicit.   
> Yes, there are some edge cases, but they either already exist (as shown  
> by the explicit toStringz example I gave where the passed char[]  
> remained unchanged, and your example passing buffer[]), or they may be  
> detectable by the compiler, or they are rare - in which case requiring  
> some manual intervention is not too much to ask.
>
> So, on balance I reckon the implicit call would be "better" for more  
> people more of the time, and at no extra cost.  It seems like a win/win  
> to me.  Yes, there are edge cases, yes there are wrinkles to iron out,  
> no it's not a "general/covers everything perfectly" kind of idea - which  
> I agree we'd all prefer, but it makes D look slicker, and removes one  
> more stumbling block for new D programmers.

We also have to weigh this against two things:

1. How will existing code (that already calls toStringz) be affected?
2. This is *not* a trivial compiler change, so all other options should
be considered first.  There are a *lot* of C calls in D code today that
could be affected.

If C strings were their own type (and not conflated with "buffer
pointer"), and if a C string could be verified as valid without
segfaulting and in O(1) time, I'd agree that a compiler change would be
warranted.  There are just too many cases (not the majority, but enough)
where the injected calls would be either performance drags or unnecessary.
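
For illustration only, this is roughly what I mean by C strings being
their own type.  It is a library-level sketch, not something that exists
in Phobos or the compiler; the names CString, fromD, assumeTerminated and
cLength are all hypothetical.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

/// Hypothetical wrapper: a pointer that is known to be zero-terminated,
/// as opposed to a plain char * that might just be a buffer pointer.
struct CString
{
    private const(char)* ptr;

    /// From a D string: the terminator check or copy is paid once, here.
    static CString fromD(string s)
    {
        return CString(s.toStringz);
    }

    /// From a pointer the caller vouches for, e.g. one returned by a C
    /// function.  Nothing is checked; no extra code runs.
    static CString assumeTerminated(const(char)* p)
    {
        return CString(p);
    }

    /// The raw pointer, for passing to C.
    const(char)* get() const
    {
        return ptr;
    }
}

/// Hypothetical C-facing call that accepts only the wrapper type.
size_t cLength(CString s)
{
    return strlen(s.get);
}
```

With a distinct type like this, the cost shows up only where a D string
is converted, and pointers that are already terminated pass through
untouched.  It still cannot verify that a raw pointer is valid, which is
the part no compiler change can fix.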

