std.file functions and embedded NUL characters [CWE-158]

Steven Schveighoffer schveiguy at gmail.com
Fri Aug 1 17:53:17 UTC 2025


On Friday, 1 August 2025 at 11:02:37 UTC, kdevel wrote:
> On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer 
> wrote:
>> [...]
>> If we checked for mid-string zero terminators on all calls to 
>> `toStringz`, we would kill performance where mostly it isn't 
>> necessary (this is only important if you don't trust where the 
>> data came from). This would lead to a different sort of problem 
>> ("How come D/C interop is so slow!?").
>
> This is a strawman. I am writing about file system functions! 
> Furthermore, in the case of `std.file.rename`, `toStringz` is not 
> even called. What is used to convert to `char *` seems to be 
> the highly interesting template `tempCString` in 
> `std.internal.cstring`.

I was responding to the link sent about `toStringz`. I don't 
think we can always check for internal 0 characters there.

But in `tempCString`, we can, since we are always copying.

However, this is delicate, because you will affect performance. A 
slice assign is `memcpy`, and `memcpy` is damn fast (and hard to 
reimplement).

If instead you check every character, the copy turns into a for 
loop, which will be slow.
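
Roughly, the two options look something like this (made-up helper 
names, not `tempCString`'s actual code; both assume `buf` has at 
least `s.length + 1` bytes):

```
// Fast path: slice assignment lowers to memcpy, no per-byte checks.
void copyUnchecked(char[] buf, const(char)[] s)
{
    buf[0 .. s.length] = s[];   // memcpy under the hood
    buf[s.length] = '\0';
}

// Checked path: a per-character loop that rejects embedded NULs,
// but gives up the memcpy speed.
bool copyChecked(char[] buf, const(char)[] s)
{
    foreach (i, c; s)
    {
        if (c == '\0')
            return false;       // embedded NUL found
        buf[i] = c;
    }
    buf[s.length] = '\0';
    return true;
}
```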

I think the right answer here is to use 
[strncpy](https://cplusplus.com/reference/cstring/strncpy/). 
According to the docs, strncpy copies up to N characters, but if 
a NUL character is reached before the end of the source, it 
zero-fills the rest of the destination buffer. This means we can 
detect whether a NUL was inside the string by checking the last 
byte copied.
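
A minimal sketch of that trick, again assuming a destination 
buffer of at least `s.length + 1` bytes (the function name is made 
up for illustration):

```
import core.stdc.string : strncpy;

// Copies `s` into `buf` and reports whether an embedded NUL was hit.
bool copyAndCheck(char[] buf, const(char)[] s)
{
    if (s.length == 0)
    {
        buf[0] = '\0';
        return true;
    }
    strncpy(buf.ptr, s.ptr, s.length);
    // If strncpy hit a NUL before copying s.length bytes, it zero-fills
    // the remainder, so the last byte is 0 exactly when the input
    // contained an embedded NUL.
    if (buf[s.length - 1] == '\0')
        return false;           // embedded NUL detected
    buf[s.length] = '\0';       // terminate for the C API
    return true;
}

unittest
{
    char[8] buf;
    assert(copyAndCheck(buf[], "abc"));
    assert(!copyAndCheck(buf[], "a\0c"));
}
```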

So this really is a limitation of `tempCString`, and we can fix 
that. However, std.file could just as easily have used `toStringz`.

In terms of correctness, you are passing in a parameter with 
different properties than the one ultimately sent to the 
underlying call. It's always a fight between correctness and 
performance here. I always prefer (when one is available) an API 
that does not rely on zero termination, but when it is not, the 
string needs to be checked. Where it is checked is important.

I'm not sure that's the library's responsibility. In other words, 
if I pass in `rmdir("foo")`, then why should I pay the penalty of 
examining `"foo"` for malicious NUL bytes? The source is a 
literal, so I already know whether it contains them. In fact, there 
isn't even a way to pass in a `char*` to `rmdir`, which is 
unfortunate.

We have the capability of making an API that allows for all cases 
-- user input, static data, validated data, actual C strings, 
etc. We shouldn't make all these go through the same 
super-defensive gauntlet.
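
To make that concrete, here is a rough sketch of what such an 
overload set could look like (hypothetical names, with POSIX 
`rmdir` used only as an example -- this is not a Phobos proposal):

```
version (Posix)
{
    import core.sys.posix.unistd : rmdir;
    import std.algorithm.searching : canFind;
    import std.exception : enforce;
    import std.string : toStringz;

    // Untrusted input: scan for embedded NULs, then convert and call.
    void removeDir(const(char)[] path)
    {
        enforce(!path.canFind('\0'), "path contains an embedded NUL");
        removeDirZ(path.toStringz);
    }

    // Already a C string (e.g. from C interop): no copy, no scan.
    void removeDirZ(const(char)* pathz)
    {
        enforce(rmdir(pathz) == 0, "rmdir failed");
    }
}
```

Validated or literal data could call the C-string overload (or a 
trusted variant) directly and skip the scan entirely.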

> And of course a library should not assert, nor exit nor ignore 
> the error, but make it handleable:
>
> ```
> #!/usr/bin/python
>
> def myfun (filename):
>    open (filename, 'w')
>
> try:
>    myfun ("a\0c")
> except TypeError:
>    print ("error occurred")
> #   raise
> ```

Python is not D. It cannot do any kind of compile-time type 
introspection, and `char*` just isn't a thing there, so it always 
has to go through the same path. We don't want to make D as slow 
as Python.

-Steve

