input validation

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Wed Mar 4 08:47:50 PST 2009


Sean Kelly wrote:
> Georg Wrede wrote:
>> The distinction is not whether you or others write stuff. It's about 
>> whether it is for debugging *only*, as opposed to general input 
>> validation.
> 
> So I guess the real question is whether a function is expected to 
> validate its parameters.  I'd argue that it isn't, but then I'm from a 
> C/C++ background.  For me, validation is a debugging tool, or at least 
> an optional feature for applications that want the added insurance.

Interesting. My policy is to favor validation whenever it doesn't impact 
performance. Imagine for example that strlen() validated its input for 
non-null. Would that show on the profiling chart of any C application? 
No, unless the application's core loop only called strlen() on a 
1-character string or so.

One simple case that clarifies the necessary tradeoff is binary search. 
That assumes the range to be searched is sorted. If you actually checked 
for that, it would render binary search useless as a linear search would 
be in fact faster. So you need to assume. One way to do so is in the 
documentation. You write in the docs that findSorted expects a sorted 
range. Another way is to encode this information in the type of the 
sorted range. But that's onerous as most of the time you have an array 
you just sorted, not a SortedArray value.

The approach I took with the new phobos is:

int[] haystack;
int[] needle;
...
auto pos1 = find(haystack, needle); // linear
sort(haystack);
auto pos2 = find(assumeSorted(haystack), needle);

The assumeSorted function wraps the haystack in an AssumeSorted!(int[]) 
type without adding members or running extra code. It's there to clarify 
to everyone what's going on. And it's usable with other arguments or 
functions too, e.g.

auto pos3 = find(haystack, assumeSorted(needle));
setIntersection(assumeSorted(haystack), assumeSorted(needle));

Interestingly, assumeSorted can actually do checking without impacting 
the complexity of the search. In debug mode, it can arrange to run 
random isSorted tests every 1/N calls, where N is the average length of 
the incoming arrays, then its complexity impact is amortized constant.


Andrei



More information about the Digitalmars-d mailing list