toStringz or not toStringz

Tue Jul 12 09:09:04 PDT 2011

On Tue, 12 Jul 2011 11:41:56 -0400, Regan Heath <regan at netmail.co.nz>  
wrote:

> On Tue, 12 Jul 2011 15:59:58 +0100, Steven Schveighoffer  
> <schveiguy at yahoo.com> wrote:
>
>> On Tue, 12 Jul 2011 10:50:07 -0400, Regan Heath <regan at netmail.co.nz>  
>> wrote:
>
>>>> What if you expect the function to write to the buffer,  
>>>> and the compiler just made a copy of it?  Won't that be pretty  
>>>> surprising?
>>>
>>> Assuming a C function in this form:
>>>
>>>    void write_to_buffer(char *buffer, int length);
>>
>> No, assuming a C function in this form:
>>
>> void ucase(char* str);
>>
>> Essentially, a C function which takes a writable  
>> already-null-terminated string, and writes to it.
>
> Ok, that's an even better example for my case.
>
> It would be used/called like...
>
>    char[] foo;
>    .. code which populates foo with something ..
>    ucase(foo);
>
> and in D today this would corrupt memory.  Unless the programmer  
> remembered to write:

No, it wouldn't compile.  char[] does not cast implicitly to char *.  (if  
it does, that needs to change).
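
As a rough sketch of that point, assuming a D2 compiler: ucase below is only a stand-in for the C function discussed above, written in D so the example is self-contained, and the names foo and buf are illustrative.

```d
import core.stdc.ctype : toupper;

// Hypothetical stand-in for the C function `void ucase(char* str)`;
// implemented in D here only so the sketch links without a C library.
extern (C) void ucase(char* str)
{
    for (; *str != '\0'; ++str)
        *str = cast(char) toupper(*str);
}

// char[] does not convert implicitly to char*.
static assert(!is(char[] : char*));

void main()
{
    char[] foo = "hello".dup;

    // Passing the slice directly is rejected at compile time.
    static assert(!__traits(compiles, ucase(foo)));

    // The caller has to terminate the data and pass a pointer explicitly.
    char[] buf = foo ~ '\0';   // concatenation allocates a new, terminated copy
    ucase(buf.ptr);
    assert(buf[0 .. $ - 1] == "HELLO");
    assert(foo == "hello");    // the original array was not written to
}
```

Note that the explicit copy is exactly what makes the "who got written to" question visible in the calling code.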

> I am also assuming that if this idea were implemented it would handle  
> things intelligently.  For example, if the underlying array is out of  
> room when toStringz is called and needs to be reallocated, the  
> compiler would update the slice/reference 'foo' in the same way as it  
> already does for an append which triggers a reallocation.

OK, but what if it's like this:

char[] foo = new char[100];
auto bar = foo;

ucase(foo);

In most cases, bar is also written to, but in some cases (when the  
implicit toStringz had to reallocate) only foo is written to.

Granted, we're getting further out on the hypothetical limb here :)  But  
my point is that requiring an explicit call to toStringz, rather than an  
implicit one, makes the code less confusing: you understand "oh,  
toStringz may reallocate, so I can't expect bar to also get updated" vs.  
simply calling a function with a buffer.
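
To make the aliasing concern concrete, here is a minimal sketch.  The helper writeIsShared is a made-up name, and the reallocation is simulated with `~` (an assumption; it is not how any implicit toStringz would necessarily behave).

```d
// Hypothetical helper: writes through `a` and reports whether the write
// is visible through `b`.  Assumes b[0] is not already 'X'.
bool writeIsShared(char[] a, char[] b)
{
    a[0] = 'X';
    return b[0] == 'X';
}

void main()
{
    char[] foo = new char[100];
    foo[] = 'a';
    auto bar = foo;                    // bar refers to the same 100 bytes

    // Written in place: bar sees the change.
    assert(writeIsShared(foo, bar));

    // Stand-in for a conversion that had to reallocate: the callee now
    // writes to a fresh block that bar knows nothing about.
    bar[0] = 'a';
    char[] moved = foo ~ '\0';
    assert(moved.ptr !is bar.ptr);
    assert(!writeIsShared(moved, bar));
}
```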

>>> You might initially extern it as:
>>>
>>>    extern (C) void write_to_buffer(char *buffer, int length);
>>>
>>> And, you could call it one of 2 ways (legitimately):
>>>
>>>    char[] foo = new char[100];
>>>    write_to_buffer(foo, foo.length);
>>>
>>> or:
>>>
>>>    char[100] foo;
>>>    write_to_buffer(foo, foo.length);
>>>
>>> and in both cases, toStringz would do nothing as foo is zero  
>>> terminated already (in both cases), or am I wrong about that?
>>
>> In neither case are they required to be null terminated.
>
> True, but I was outlining the worst case scenario for my suggestion, not  
> describing the real C function requirements.

No, I mean you were wrong: D does not guarantee that either of those  
(stack allocated or heap allocated) is null terminated.  So toStringz must  
add a '\0' at the end (which is mildly expensive for heap data, and very  
expensive for stack data).
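
A small sketch of the difference, assuming a current Phobos where std.string.toStringz accepts a const(char)[]; the variable names are illustrative only.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

void main()
{
    // A string literal is followed by a '\0' that .length does not count.
    string lit = "abc";
    assert(lit.length == 3);
    assert(lit.ptr[3] == '\0');

    // Heap and stack arrays carry no such guarantee, so the terminator
    // has to be added explicitly before handing the data to C.
    char[] heap = "abc".dup;
    char[3] stack = "abc";

    const(char)* hz = toStringz(heap);
    const(char)* sz = toStringz(stack[]);
    assert(strlen(hz) == 3);
    assert(strlen(sz) == 3);
}
```

The pointers returned here are read-only, so this form suits C functions that only read the string; a function that writes, like ucase, needs a mutable terminated copy instead.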

>> The only thing that guarantees null termination is a string literal.
>
> string literals /and/ calling toStringz.
>
>> Even "abc".dup is not going to be guaranteed to be null terminated.   
>> For an actual example, try "012345678901234".dup.  This should have a  
>> 0x0f right after the last character.
>
> Why 0x0f?  Does the allocator initialise array memory to its offset  
> from the start of the block or something?

The final byte of the block is used as the hidden array length (in this  
case 15).
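
One way to poke at this without reading past the end of the array is the capacity property, which reports how many elements the slice can hold before an append has to reallocate.  This is only a sketch: the block layout is an implementation detail of the runtime, so no particular output is claimed.

```d
import std.stdio : writeln;

void main()
{
    char[] s = "012345678901234".dup;   // 15 characters
    assert(s.length == 15);

    // How far s can grow in place; the value depends on the block size
    // the GC picked and on where it keeps its bookkeeping.
    writeln(s.capacity);
}
```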

-Steve