toStringz and toUTFz potentially unsafe

Jonathan M Davis jmdavisProg at gmx.com
Sun Jul 24 19:09:17 PDT 2011


On Sunday 24 July 2011 21:57:34 Johann MacDonagh wrote:
> On 7/24/2011 9:06 PM, Jonathan M Davis wrote:
> > On Sunday 24 July 2011 17:56:04 Jonathan M Davis wrote:
> > The real question is what to do with to!(char*)(str). The plan is to
> > make it call toUTFz, but at that point, the warning about toUTFz is not
> > as obvious (though it can reiterate the warning or point you to the
> > toUTFz documentation to read it). Also, since you already have toUTFz,
> > calling to!(char*) is kind of pointless. So, I think that there's a
> > good argument for forcing to!(char*) to append '\0' instead of checking
> > one past the end. Then when you want a guarantee that the '\0' isn't
> > going to change, you can use to!(char*), and if you want the
> > efficiency, you can call toUTFz. But it is debatable whether we should
> > do that or just have to!(char*) call toUTFz in all cases. I'm leaning
> > towards making it always copy though.
> > 
> > - Jonathan M Davis
> 
> In that case, maybe we should implement @schveiguy's suggestion.
> 
> immutable(char)* toStringz(string s, bool unsafe = true) pure nothrow
> 
> That way the user can decide whether to take the optimization risk (or
> if they know the string is on the stack, etc...). In addition, always
> copying is wasteful. We're usually able to append a '\0' to a dynamic
> array without relocation / copying.

If you always append a '\0', then there's no point to toStringz at all. So, 
sure we _could_ add the unsafe parameter like that, but I seriously question 
the value of it. The primary value in toStringz is to give you the 
optimization of looking one past the end of the string and attempting to 
completely avoid any chance of reallocation.
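Here is a minimal D sketch of that trade-off. It assumes the `unsafe` flag from the signature quoted above. `toStringzSketch` is a hypothetical name and is not Phobos' `toStringz`. The alignment test is one conservative way to keep the peek from crossing into an unmapped page, and it is an assumption of this sketch rather than anything stated in the thread.

```d
import core.stdc.string : strlen;

/// Hypothetical sketch (not Phobos' toStringz): returns s.ptr itself when
/// `unsafe` is true and the byte just past the end of s is already '\0',
/// otherwise returns a freshly allocated, '\0'-terminated copy of s.
immutable(char)* toStringzSketch(string s, bool unsafe = true) pure nothrow
{
    if (unsafe && s.length > 0)
    {
        // One element past the end of the slice.
        immutable(char)* p = s.ptr + s.length;

        // Only peek if p is not 4-byte aligned: then p shares an aligned
        // word (and therefore a page) with the last character of s, so the
        // read cannot fault. It can still see a '\0' that a later in-place
        // append through another slice of the same block overwrites, which
        // is the risk discussed in this thread.
        if ((cast(size_t) p & 3) != 0 && *p == '\0')
            return s.ptr;
    }

    // Conservative path: always copy and terminate explicitly.
    auto copy = new char[s.length + 1];
    copy[0 .. s.length] = s[];
    copy[s.length] = '\0';
    // The buffer was just allocated and nothing else references it, so
    // treating it as immutable is sound.
    return cast(immutable(char)*) copy.ptr;
}

unittest
{
    string whole = "hello world";
    // The byte after "hello" is ' ', so this must take the copying path.
    auto p = toStringzSketch(whole[0 .. 5]);
    assert(strlen(p) == 5 && p[0 .. 5] == "hello");

    // With unsafe = false the result is always a fresh copy.
    auto q = toStringzSketch("hello", false);
    assert(q != "hello".ptr && q[5] == '\0');
}
```

Run it with `rdmd -unittest -main file.d`. With `unsafe = false` the function always pays for an allocation, which is the "always copy" behavior the quoted suggestion wants to make optional.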

And yes, you'd use ~=, which _might_ copy, rather than forcing a copy every 
time, but the cases where you could have just checked one past the end of the 
array and done nothing if it were '\0' are generally going to be the cases 
where ~= has to reallocate. So, in reality, you're pretty much going to copy 
every time that you could have avoided the copy if you had gone with true for 
unsafe rather than false.
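The claim that the cheap cases are the ones where ~= must reallocate can be checked with D's `capacity` array property. This is a small sketch that assumes druntime's semantics, where `capacity` is 0 for memory the GC cannot extend in place (static data, or a slice that stops short of the end of its block). `appendMustReallocate` is a hypothetical helper.

```d
/// Hypothetical helper: true if `s ~= '\0'` would have to move the data,
/// i.e. the runtime reports no spare room to extend s in place.
bool appendMustReallocate(const(char)[] s)
{
    // capacity is 0 when s is not appendable in place, otherwise it is the
    // length s could grow to without reallocating.
    return s.capacity <= s.length;
}

unittest
{
    // A literal lives in static data and the compiler puts a '\0' after it,
    // so the peek would succeed, but the GC cannot extend it in place.
    string lit = "hello";
    assert(lit.ptr[lit.length] == '\0');
    assert(appendMustReallocate(lit));

    // A slice that stops before the end of its buffer: the next byte is
    // '\0', but appending in place would overwrite buf[2], so ~= copies.
    char[] buf = "ab\0cd".dup;
    char[] head = buf[0 .. 2];
    assert(head.ptr[head.length] == '\0');
    assert(appendMustReallocate(head));

    head ~= '\0';
    assert(head.ptr != buf.ptr); // moved to a new block
}
```

In both cases the peek past the end would have found a '\0' for free, while ~= has to allocate. That is the sense in which an append-based approach copies in nearly every case where the unchecked peek would not have.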

- Jonathan M Davis


More information about the Digitalmars-d mailing list