Checking function parameters in Phobos

Wed Nov 20 02:38:42 PST 2013

On Tuesday, November 19, 2013 16:01:00 Andrei Alexandrescu wrote:
> Please chime in with ideas!

In general, I favor using defensive programming in library APIs and using 
enforce to validate the input to functions. Doing so makes it much harder to 
misuse the library and makes it much less likely that programs will run into 
weird and/or undefined behavior or other types of bugs. I then favor using DbC 
within a library or application for its own code and asserting that input is 
valid in those cases, because in that case, the caller is essentially part of 
the same code that's doing the asserting and is maintained by the same people.

The problem with that is of course that there are cases where performance 
degrades when you use defensive programming and always check input - 
especially when the caller can know that the data is valid without having to 
check it first. So, having a way to use an API that doesn't involve it always 
defensively checking its input can be useful for the sake of efficiency.

Unfortunately, I don't think that it scales at all to take the approach that 
Walter has suggested of having the API normally assert on input and provide 
helper functions which the caller can use to validate input when they deem 
appropriate. That has the advantage of giving the caller control over what is 
and isn't checked and avoiding unnecessary checks, but it also makes it much 
easier to misuse the API, and I would expect the average programmer to skip 
the checks in most cases. It very quickly becomes like using error codes 
instead of exceptions, except that in this case, instead of an error code 
being ignored, the data's validity wouldn't have even been checked in the first 
place, resulting in the function being called doing who-knows-what. And the 
resulting bugs could be very obvious, or they could be insidiously hard to 
detect.

So, if we can find a way to default to checking validity and throwing on bad 
input but still provide a way for the caller to avoid the checks when 
appropriate, I think that that would be ideal. That way, we default to 
correctness and user-friendliness (in that the API is harder to silently use 
incorrectly that way), but we still provide a more performant route for those 
who know what they're doing and are willing to take the time to make sure that 
they are sure that they truly do know how to use the API correctly and take 
responsibility for ensuring that they don't feed bad input to the API.

Now, how we do that, I don't know. In some cases, creating a wrapper type 
would solve the problem (e.g. some kind of wrapper for strings which 
guaranteed UTF-correctness). But I don't think that it scales to use wrapper 
types for all such situations. One alternative is to essentially duplicate a 
lot of functions with one function validating the input for you and throwing 
on failure, and the other asserting that the input is valid. But that could 
result in a lot of code duplication, which isn't terribly desirable either.

The assumeSorted or FracSec.assumeValid solutions seem to go either with a 
wrapper type or with essentially being a second function which does the same 
thing but without the validation depending on the types involved and what the 
function is doing.

Another alternative would be to provide an argument (probably a template 
argument, though it could be a function argument if that makes more sense) 
which told the function whether it should assert or enforce on its input. That 
would at least localize the code duplication, but again, that could get a bit 
verbose, and I do like how assumeXYZ makes it abundantly clear that the caller 
is taking responsibility for the correctness in that case.

And in some situations, I think that it would clearly be the case that it 
wouldn't make any sense to do anything else other than enforce on the input 
(e.g. string parsing functions have a tendency to have to do almos the same 
work in the validation function as the actual parsing function, making it 
almost pointless to have a separate validation function).

So, I think that what we end up doing is definitely going to depend on what the 
code in question is for and what it's doing, but I agree that it would be 
valuable to come up with some common idioms for handling validation and error 
checking, and assumeXYZ would be one such idiom and one which documents things 
nicely when it can be used.

Still, the most important point that I'd like to make is that I think we 
should lean towards validating input with enforce by default and then provide 
alternative means to avoid that validation rather than using assertions and 
DbC by default, because leaving the validation up to the caller in release and 
asserting in debug is going to lead to _far_ more bugs in code using Phobos, 
particularly when the result isn't immediately and obviously wrong when bad 
input is given. And the fact that by default, the assertions in Phobos won't 
be hit in calling code unless the Phobos function is templatized (because 
Phobos will have been compiled in release) makes using assertions that much 
worse.

But I'll definitely have to think about idioms that we could use to do separate 
validation where appropriate and yet validate arguments via enforce by 
default.

- Jonathan M Davis