toStringz or not toStringz
Regan Heath
regan at netmail.co.nz
Thu Jul 14 08:00:48 PDT 2011
On Thu, 14 Jul 2011 12:30:24 +0100, Steven Schveighoffer
<schveiguy at yahoo.com> wrote:
> On Thu, 14 Jul 2011 05:53:47 -0400, Regan Heath <regan at netmail.co.nz>
> wrote:
>
>> On Wed, 13 Jul 2011 19:31:42 +0100, Steven Schveighoffer
>> <schveiguy at yahoo.com> wrote:
>>
>>> On Wed, 13 Jul 2011 13:32:56 -0400, Regan Heath <regan at netmail.co.nz>
>>> wrote:
>>>
>>>> On Wed, 13 Jul 2011 17:00:39 +0100, Steven Schveighoffer
>>>> <schveiguy at yahoo.com> wrote:
>>>
>>>>> How does your proposal know that a char * is part of a
>>>>> heap-allocated array? If you are assuming the only case where char
>>>>> * is passed will be arr.ptr, then that doesn't cut it. What if the
>>>>> compiler doesn't know where the char * came from?
>>>>
>>>> See your Q and my A above ("char * foo" example).
>>>>
>>>>> The inherent problem of zero-terminated strings is that you don't
>>>>> know how long it is until you search for a zero. If it's not
>>>>> properly terminated, then you are screwed. That problem cannot be
>>>>> "solved", even with compiler help -- you can get situations where
>>>>> there is no more information other than the pointer.
>>>>
>>>> Really? But can't we obtain the GC lock and look them up, as
>>>> mentioned above? And isn't this exactly what toStringz will do when
>>>> the programmer first of all curses because it has crashed, and then
>>>> adds an explicit toStringz call?
>>>
>>> Who said the char * points into GC memory? It could point at stack
>>> memory, or static data in ROM.
>>
>> Ok. What would toStringz do in this case? .. because that's what I'm
>> proposing we do here.
>
> Nothing, you don't call toStringz on a char *, you call it on a string.
> The point is, for those who have already guaranteed a char * has a 0 in
> it, they should not have to have the compiler injecting useless code for
> a simple function call.
>
> A really really good example is if you use a char * you got from a C
> function to call another C function.
Good points all. So, the idea should be limited to cases where D's char[]
and string are passed to extern (C) functions expecting char*, and should
not affect cases where D's char* is passed directly. Sounds good.
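To make the distinction concrete, here is a minimal sketch of the two cases,
assuming the current druntime/Phobos module names (core.stdc.stdlib,
core.stdc.string, std.string). cLength is a hypothetical helper invented for
illustration; it is not part of any library.

```d
import core.stdc.stdlib : getenv;
import core.stdc.string : strlen;
import std.stdio : writeln;
import std.string : toStringz;

// Hypothetical helper: the length of a D string as C's strlen sees it.
size_t cLength(string s)
{
    // Today's explicit form: toStringz returns a zero-terminated
    // immutable(char)* that is safe to hand to a C function.
    return strlen(s.toStringz);
}

void main()
{
    string name = "PATH";

    // Case 1: D string -> C function. This is the call the proposal
    // would make implicit.
    const(char)* value = getenv(name.toStringz);

    // Case 2: char* obtained from C -> another C function. It is already
    // zero-terminated, so no conversion is needed and none is proposed.
    if (value !is null)
        writeln("PATH has ", strlen(value), " bytes");

    assert(cLength("hello") == 5);
}
```

Only case 1 would change under the restricted idea; case 2 compiles and
behaves exactly as it does now.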
>> The goal here is to pick some low hanging fruit, the general case
>> mentioned earlier, and make it work as a new D programmer would
>> expect. In that case there is no technical difficulty implementing it
>> (toStringz already exists), there is no extra cost (you already have to
>> call toStringz), and the only disagreement seems to be whether it
>> should be implicit or explicit.
>
> There is an extra cost where you wouldn't have to call toStringz
> currently.
The point I've tried to make all along is that this is a rare situation,
and not the general case. In the general case you're going to need to
call toStringz, especially if you restrict this idea to D's char[] and
string and not D's char* as mentioned above.
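A small sketch of why the general case needs the call: a D slice carries its
own length and is not necessarily followed by a zero. This assumes the
std.string.toStringz overload taking const(char)[], and relies on D string
literals being stored with a trailing zero; viaToStringz is a hypothetical
name used only for this example.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: convert first, then let C measure the string.
size_t viaToStringz(const(char)[] s)
{
    return strlen(s.toStringz);
}

void main()
{
    string whole = "hello world"; // the literal is stored zero-terminated
    string part = whole[0 .. 5];  // "hello", with no terminator after it

    // D knows the slice is 5 chars long.
    assert(part.length == 5);

    // C only sees a pointer, so it reads on into " world" until it hits
    // the literal's trailing zero.
    assert(strlen(part.ptr) == 11);

    // toStringz produces a properly terminated copy of just the slice.
    assert(viaToStringz(part) == 5);
}
```

Passing part.ptr happens to be memory-safe here only because the slice points
into a literal; for a slice of an arbitrary heap buffer there is no such
guarantee, which is why the explicit (or implicit) toStringz is needed.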
>> In this particular case I cannot see any harm in making it implicit.
>> Yes, there are some edge cases, but they either already exist (as shown
>> by the explicit toStringz example I gave where the passed char[]
>> remained unchanged, and your example passing buffer[]), or they may be
>> detectable by the compiler, or they are rare - in which case requiring
>> some manual intervention is not too much to ask.
>>
>> So, on balance I reckon the implicit call would be "better" for more
>> people more of the time, and at no extra cost. It seems like a win/win
>> to me. Yes, there are edge cases, yes there are wrinkles to iron out,
>> no it's not a "general/covers everything perfectly" kind of idea -
>> which I agree we'd all prefer, but it makes D look slicker, and removes
>> one more stumbling block for new D programmers.
>
> We also have to weigh this against two things:
Assuming the above mentioned restriction (char[] and string, not char*)...
> 1. How will existing code (that already calls toStringz) be affected?
Not at all.
> 2. This is *not* a trivial compiler change. So all other options should
> be considered, there's a *lot* of C calls that exist from D today that
> could possibly be affected.
It will affect none of these.
> If C strings were their own type (and not conflated with "buffer
> pointer"), and verifying a C string was valid without segfaulting and in
> O(1) time, I'd agree that a compiler change would be warranted. There's
> just too many cases (note, these aren't the majority, but they are
> enough) where the injected calls will be either performance drags or
> unnecessary.
I disagree about the number of cases being too many, but this is a gut
feeling and I have no evidence to support it.
I think the situation changes with the restriction I mentioned above,
however: all those edge cases are unaffected, old code is unaffected,
and only new code will be able to pass char[] and string as extern (C)
char* parameters.