toStringz or not toStringz
Steven Schveighoffer
schveiguy at yahoo.com
Tue Jul 12 09:09:04 PDT 2011
On Tue, 12 Jul 2011 11:41:56 -0400, Regan Heath <regan at netmail.co.nz>
wrote:
> On Tue, 12 Jul 2011 15:59:58 +0100, Steven Schveighoffer
> <schveiguy at yahoo.com> wrote:
>
>> On Tue, 12 Jul 2011 10:50:07 -0400, Regan Heath <regan at netmail.co.nz>
>> wrote:
>
>>>> What if you expect the function to write to the buffer, and the
>>>> compiler just made a copy of it? Won't that be pretty surprising?
>>>
>>> Assuming a C function in this form:
>>>
>>> void write_to_buffer(char *buffer, int length);
>>
>> No, assuming C function in this form:
>>
>> void ucase(char* str);
>>
>> Essentially, a C function which takes a writable
>> already-null-terminated string, and writes to it.
>
> Ok, that's an even better example for my case.
>
> It would be used/called like...
>
> char[] foo;
> .. code which populates foo with something ..
> ucase(foo);
>
> and in D today this would corrupt memory. Unless the programmer
> remembered to write:

No, it wouldn't compile. char[] does not implicitly convert to char*. (If
it does, that needs to change.)

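To make the point concrete, here is a minimal sketch of what the call has to look like when the conversion is spelled out. The `ucase` below is a hypothetical stand-in written in D with C linkage so the example is self-contained, and `upperCopy` is a helper name invented for illustration; neither comes from a real library.

```d
import core.stdc.ctype : toupper;

// Hypothetical stand-in for the C function `void ucase(char* str)`,
// written in D with C linkage so the example is self-contained.
extern (C) void ucase(char* str)
{
    for (; *str != '\0'; ++str)
        *str = cast(char) toupper(*str);
}

// Upper-cases a copy of `text` via the C-style function.
char[] upperCopy(const(char)[] text)
{
    // Make a mutable copy and terminate it ourselves; nothing in D
    // guarantees a '\0' after an arbitrary char[].
    char[] buf = text.dup;
    buf ~= '\0';

    // ucase(buf);   // rejected: char[] does not implicitly convert to char*
    ucase(buf.ptr);  // the conversion has to be spelled out

    return buf[0 .. $ - 1]; // drop the terminator again
}

void main()
{
    import std.stdio : writeln;
    writeln(upperCopy("hello"));
}
```

The explicit `.ptr` (and the explicit terminator) is the part under discussion: the caller can see exactly where a D slice turns into a raw C pointer.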
> I am assuming also that if this idea were implemented it would handle
> things intelligently, like for example if when toStringz is called the
> underlying array is out of room and needs to be reallocated, the
> compiler would update the slice/reference 'foo' in the same way as it
> already does for an append which triggers a reallocation.

OK, but what if it's like this:

char[] foo = new char[100];
auto bar = foo;
ucase(foo);

In most cases, bar is also written to, because it still refers to the same
memory as foo. But in the cases where the implicit toStringz has to
reallocate, only foo is written to.

Granted, we're getting further out on the hypothetical limb here :) But
my point is that requiring an explicit call to toStringz, instead of an
implicit one, makes the code less confusing. You understand "oh, toStringz
may reallocate, so I can't expect bar to also get updated" vs. simply
calling a function with a buffer.

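The aliasing concern can be sketched with an ordinary append, which has the same "may or may not reallocate" behaviour that an implicit terminating conversion would need. `terminatedInPlace` is a hypothetical helper for illustration, and the write to `foo[0]` stands in for the C function writing through the pointer.

```d
// Appends a terminator the way an implicit toStringz-style conversion
// would have to, and reports whether the data stayed in place.
bool terminatedInPlace(ref char[] buf)
{
    auto before = buf.ptr;
    buf ~= '\0';           // may extend in place or move to a new block
    return buf.ptr is before;
}

void main()
{
    import std.stdio : writeln;

    char[] foo = new char[100];
    foo[] = 'a';
    auto bar = foo;        // bar aliases foo's memory

    bool inPlace = terminatedInPlace(foo);
    foo[0] = 'A';          // stands in for the C function writing to the buffer

    // bar sees the write only if the append did not reallocate.
    writeln("in place: ", inPlace, ", bar[0] = ", bar[0]);
    assert((bar[0] == 'A') == inPlace);
}
```

Whether the append happens in place depends on the allocator and the block size, so the sketch deliberately does not claim which branch you will see.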
>>> You might initially extern it as:
>>>
>>> extern (C) void write_to_buffer(char *buffer, int length);
>>>
>>> And, you could call it one of 2 ways (legitimately):
>>>
>>> char[] foo = new char[100];
>>> write_to_buffer(foo, foo.length);
>>>
>>> or:
>>>
>>> char[100] foo;
>>> write_to_buffer(foo, foo.length);
>>>
>>> and in both cases, toStringz would do nothing as foo is zero
>>> terminated already (in both cases), or am I wrong about that?
>>
>> In neither case are they required to be null terminated.
>
> True, but I was outlining the worst case scenario for my suggestion, not
> describing the real C function requirements.

No, I mean you were wrong: D does not guarantee that either of those
(stack allocated or heap allocated) is null terminated. So toStringz must
add a '\0' at the end (which is mildly expensive for heap data, and very
expensive for stack data).

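A minimal sketch of the explicit route, using `std.string.toStringz` and `core.stdc.string.strlen`. `cLength` is a hypothetical helper name. Note that `char.init` in D is 0xFF rather than 0, so a freshly allocated `char[]` is not zero-filled either.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Length as a C function would see it, after explicit conversion.
size_t cLength(const(char)[] s)
{
    // toStringz returns a pointer to a '\0'-terminated string and may
    // allocate a copy in order to do so.
    return strlen(toStringz(s));
}

void main()
{
    char[100] stackBuf = 'x';          // fixed-size array, no terminator implied
    char[] heapBuf = new char[100];
    heapBuf[] = 'y';

    assert(cLength(stackBuf[]) == 100);
    assert(cLength(heapBuf) == 100);
}
```

For the stack buffer there is no spare byte to put the terminator in, so the data has to be copied elsewhere, which is presumably why the stack case is described as the expensive one.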
>> The only thing that guarantees null termination is a string literal.
>
> string literals /and/ calling toStringz.
>
>> Even "abc".dup is not going to be guaranteed to be null terminated.
>> For an actual example, try "012345678901234".dup. This should have a
>> 0x0f right after the last character.
>
> Why 0x0f? Does the allocator initialise array memory to its offset
> from the start of the block or something?

The final byte of the block is used as the hidden array length (in this
case 15).

-Steve