templatizing parameters and induction (was Re: Writing Bug-Free C/D Code)

Thu Mar 22 22:32:27 PDT 2007

Kevin Bealer wrote:
> Dan wrote:
> ...
>> What should then happen is this:  The compiler should recognize that we
>  > only accept values such that x > y.  Knowing that, it should then verify
>> that in all cases where this function can be called, that the input to 
>> the
>  > function must have this condition.  In cases where this *can* fail, it
>> should raise an error - such as variables that are affected by user input
>  > or outside parameters that aren't already bounds checked or adjusted...
>>
>> Theoretically, one could take any given int (between int.min and int.max)
>  > and adjust it such that x > y will be true for all cases.  This can be
>  > proven and verified by a proof by induction.
>>
>> If you don't isolate the in/out tests from the body, a proof by induction
>> couldn't know what needs to be proved to prove the code safe.
> 
> I think there are very few cases where this kind of induction could 
> reasonably be done, unless x and y are literals.  I suspect that in most 
> cases where the compiler had enough info to do this kind of induction 
> checking, the code for the function could nearly as easily be "constant 
> folded" and optimized away entirely.  Prove me wrong, but I suspect that 
> this kind of induction is too hard (and in the cases where it works, too 
> slow) to be practical.
> 
> If x and y are literals, though, you can do something like this now:
> 
> Original source:
> 
> int foo(int x, int y, int z)
> {
>     assert( x > y ); // runtime
> ...
> }
> 
> int z = foo(10, 20, 30);
> 
> Checking at compile time:
> 
> int foo(int x, int y)(int z)
> {
>     static assert( x > y ); // compile time
> ...
> }
> 
> int z = foo!(10, 20)(30);
> 
> It's a very simple transformation --- take any compile-time known values 
> and make them template parameters.  In this case, the calling syntax 
> even looks nearly the same!

The need to cater for cases with partial statically-available
information is seriously taken into consideration. The general scenario
is called "partial evaluation": you call a function with a few
statically-known values and a few runtime values. The current
compile-time evaluation mechanism can't help with that. So a considered
addition to D is to accommodate static parameters (pushed by yours truly
and using a syntax suggested by Dave Held):

int foo(int x, int y, int z);
int foo(static int x, int y, int z); // x is known during compilation
int foo(int x, static int y, int z); // y is known during compilation
int foo(static int x, static int y, int z); // x and y are... you know

Seeing that, the compiler does a little trick - it transforms each
"static" parameter into a template. For example, the second overload
becomes under the covers:

int foo(int x)(int y, int z)
{
   ... you can use e.g. static if (x > 0) ...
}

Then the foo's will overload without you doing anything special:

final a = to!(int)(readln), b = to!(int)(readln);
foo(a, b, b);  // firsts overload
foo(1, a + b, b); // 2nd
foo(a + b, a - b, b);  // 3rd
foo(42, 7, b); // 4th

Such partial evaluation scenarios are very useful for a host of
functions (writef/readf are the first to come to mind). Consider a
simple example: the function:

uint match_one_of(char[] set, char[] candidate);

Returns the position in candidate up to which candidate's characters can
be found inside set. For example, find("0123456789", "23a4") returns 2.

We have quite a few different evaluation scenarios:

1. set and candidate aren't known til runtime. Then find will gotta do
whatever it has to do dynamically to get the task done.

2. set is known statically, and candidate is dynamic. Well, this is a
very interesting case. The function can decide for different strategies
depending on set's size. For small sizes, it just does a series of
comparisons. For moderate sizes, linear search. For large sizes, hash. Cool!

3. candidate is known statically, set is dynamic. Probably this is an
infrequent case. Anyhow, all duplicate characters in candidate can be
removed upon searching, thus again saving time.

4. Both are known statically. The case becomes again uninteresting as
D's constant folding can just invoke (1) during compilation. Done.

Andrei