enum pre-conditions

bearophile bearophileHUGS at lycos.com
Tue Dec 10 07:09:17 PST 2013


(This post is a second try of the ideas I discussed in this ER:
http://d.puremagic.com/issues/show_bug.cgi?id=5906 )

It is sometimes handy and useful to be able to give structs some 
of the capabilities and qualities of some build-in values, this 
makes structs more powerful, handy and allow to better implement 
built-in-like features with library code.

The bounds of built-in arrays are verified at compile time where 
possible:


void main() {
     int[5] a;
     immutable idx = 5;
     a[idx] = 1;
}

test.d(4): Error: array index 5 is out of bounds a[0 .. 5]

- - - - - - - - - - - - - - - - - - -

Where the values are not known at compile-time they are ignored, 
and the problem is found at run-time (here foo() could be easily 
run at compile-time, but the compiler does not attempt to run it 
a compile-time):

size_t foo() { return 5; }
void main(string[] args) {
     int[5] a;
     a[5 + args.length] = 1;
     a[foo()] = 2;
}


core.exception.RangeError at test(4): Range violation

- - - - - - - - - - - - - - - - - - -

If I define a library type, using a struct, that mimics/wraps a 
fixed size array, the compile-time test capability of built-in 
arrays is lost:


struct IntArray(size_t n) {
     int[n] data;
     ref int opIndex(size_t i) {
         return data[i];
     }
}
void main() {
     IntArray!5 a;
     a[5] = 1;
}


core.exception.RangeError at test(4): Range violation

- - - - - - - - - - - - - - - - - - -

Since some time I'd like to give this basic built-ins feature to 
user-defined types too. (I have seen that Ada2012 has 
"Static_Predicate" and "Dynamic_Predicate" that are related to 
this feature).

A bare-bones library implementation of a ranged value (this 
interval is closed on the right, to make it more easy to include 
the last representable values of ints, ubytes, etc):


import std.stdio, std.typecons;

struct RangedValue(T, alias inf, alias sup) {
     T _data;
     alias access this;

     this(T x)
     in {
         assert(_data >= inf && _data <= sup);
     } body {
         _data = x;
     }

     invariant() { /*...*/ }

     @property T access() { return _data; }

     @property T access(T x)
     in {
         assert(_data >= inf && _data <= sup);
     } body {
         return _data = x;
     }
}

void main() {
     alias Month = RangedValue!(int, 1, 12);
     auto m = Month(12);
     m = 13; // line 29, core.exception.AssertError at test(11)
}

- - - - - - - - - - - - - - - - - - -

An idea is to introduce compile-time preconditions, here I have 
used the syntax:
"enum in(arg1, arg2, ...)":


     this(T x)
     enum in(x) {
         assert(_data >= inf && _data <= sup);
     } in {
         assert(_data >= inf && _data <= sup);
     } body {
         _data = x;
     }

- - - - - - - - - - - - - - - - - - -

Some of the rules:
- A function with one or more arguments is allowed to have a 
(regular, dynamic) precondition, an enum ("static") precondition, 
or both.
- An enum pre-condition is run only at compile-time, or never.
- An enum pre-condition lists one or more arguments that will be 
"verified". If all the arguments listed inside the "in()" are 
literals or explicit compile-time constants, then the enum 
precondition is run, otherwise it is ignored. No attempts are 
done to run the enum pre-condition in all other cases.
- This analysis of "literalness" or "near-literalness" is 
conservative. This means a D compiler is free to assume a value 
is not a literal and to ignore the enum pre-condition.
- enum pre-conditions are run at the calling point. This means 
the code they contain is never present inside the binary, it 
never causes code slowdown or binary bloat. And errors and 
violations are reported at the calling point, so in the code 
above the compile-time error message of the violation shows the 
line number 29 instead of 11. So enum preconditions are a bit 
like default values of functions, they are something relative to 
the calling point and not to the function itself.
- Like CTFE enum preconditions should be pure (but they can call 
__ctWrite, so purity is not fully enforced).
- The D programmer that uses enum preconditions has to keep in 
mind that the run-rime correctness of a program can't rely on an 
enum precondition. An enum precondition is just meant to catch at 
compile time some violations and bugs that otherwise will be 
caught at run-time. So typically there is some code duplication 
between the precondition and the enum precondition. But this in 
not terrible.
- D programmers that use enum preconditions should remember that 
enum preconditions must contain very light code, to keep 
compilation time low. (Given the performance of "interpreters" 
like JavaScript V8 or LuaJIT, future D compilers will run 
compile-time code much faster even if they will keep not using a 
JIT, so this problem will become much smaller).
- Future smart IDEs for D are allowed to run an enum precondition 
while you type code, to catch and underline some bugs in red 
color even before compile-time.

[Note: to speed up compilation it could be added a compiler 
switch that makes the D compiler ignore all enum preconditions. 
But probably this is not necessary.]

Data in external files loaded at compile-time with 
mixin(import("data.txt")) can be verified with an enum 
precondition (currently you can call a test function at 
compile-time and assign its result to an enum).

[I am not sure but I think enum preconditions of class members 
follow inheritance rules similar to the ones of the normal 
dynamic pre-conditions.]

- - - - - - - - - - - - - - - - - - -

Alternative syntax, one more usage of the "static" keyword:

"static in(arg1, arg2, ...)"

- - - - - - - - - - - - - - - - - - -

Some use cases for enum preconditions:

1-2) MyArray() and RangedValue!(int, 1, 12): enum preconditions 
can catch at compile-time some bugs, they make the usage of such 
simple data structures more similar to built-ins giving 
compile-time errors like "Error: array index 5 is out of bounds 
a[0 .. 5]".

- - - - - - - - -

3) EnumSubset() is an idea to define enumeration subsets in 
library code, that is sometimes useful to me. enum preconditions 
allow to catch some wrong assignments at compile time (I think 
this is an use case of the "Static_Predicate" of Ada2012):

Given a struct that represents:
enum Foo { A, B, C }
alias Foo2 = EnumSubset!(Foo.A, Foo.B);

enum pre-conditions verify:

Foo f = Foo.A; // OK
Foo f = Foo.C; // Compile-time error.

- - - - - - - - -

4) enum array literals, for table games, etc.:

Sometimes I like to use enums of chars:

enum Cell : char { A='A', B='B', C='C' }

Then later in the program you can use a with() to catch mistakes 
in literals:

with (Cell)
     foo([A,B,A,A,B,B,A,A,D], 3);

But putting them in a string is more handy when such tables 
become larger (Ada language allows something like this, and 
verifies at compile-time that such array of enumerated values is 
correct!):

foo(ToEnumArray!"ABAABBAAD", 3);

ToEnumArray veryfies its string input is correct and turns it 
into an array, at compile-time (std.conv.to can't be used 
directly here because it manages strings differently when you 
want to convert them to enums).

With an enum pre-condition I can simplify the code (this foo will 
still need to convert the input string to an array of enums, 
perhaps even using just a cast):

foo("ABAABBAAD", 3);

- - - - - - - - -

5) In high-reliability D code you want to move as many tests of 
the input data from run-time to compile-time. You want to 
minimize the number of run-time asserts you need in the code. I 
think that D could (someday) be used to replace some of the 
usages of the Ada language for medium-integrity programs, like in 
some of the control units used in cars. In such programs you call 
functions with various kinds of literals. Verifying such literals 
at compile-time could improve the run-time integrity of the 
program.

Here you can see a small D program that accepts complex literal 
data:
http://rosettacode.org/wiki/Universal_Turing_machine#Nearly_Strongly_Typed_Version

     enum States2 : ubyte { A, B, C, H }
     alias Symbols2 = Symbols1;
     alias M2 = UTM!(States2, Symbols2);
     M2.TuringMachine tm2;
     with (tm2) with (States2) with (Symbols2) with (M2.Direction) 
{
         alias R = M2.Rule;
         initialState = A;
         rules = [A: [s0: R(s1, right, B), s1: R(s1, left,  C)],
                  B: [s0: R(s1, left,  A), s1: R(s1, right, B)],
                  C: [s0: R(s1, left,  B), s1: R(s1, stay,  H)]];
         symbolMap = ["0", "1"];
     }


The Ada entry of this program is able to verify its whole input 
data at compile-time:
http://rosettacode.org/wiki/Universal_Turing_machine#Ada


type States is (A, B, C, Stop);
    type Symbols is range 0 .. 1;
    package UTM is new Turing(States, Symbols); use UTM;

    Map: Symbol_Map := (1 => '1', 0 => '0');

    Rules: Rules_Type :=
      (A => (0 => (New_State => B, Move_To => Right, New_Symbol => 
1),
             1 => (New_State => C, Move_To => Left,  New_Symbol => 
1)),
       B => (0 => (New_State => A, Move_To => Left,  New_Symbol => 
1),
             1 => (New_State => B, Move_To => Right, New_Symbol => 
1)),
       C => (0 => (New_State => B, Move_To => Left,  New_Symbol => 
1),
             1 => (New_State => Stop, Move_To => Stay, New_Symbol 
=> 1)));

    Tape:  Tape_Type := To_Tape("", Map);


enum pre-conditions could be used in D (with some changes in the 
D entry) to verify all the input at compile-time (again, 
currently you can write compile-time functions in D to perform 
similar tests, but the code becomes less nice and less natural, 
you need two functions, etc).

- - - - - - - - -

6) writef("%d", 5.5), format("%d", 5.5):

void writef(Char, A...)(in Char[] fmt, A args) ...
enum in (fmt) {
     // Verify fmt correctness
} ...


This stops most writeflns in my D code giving formatting string 
errors at run-time. And this causes no binary bloat, unlike this 
template function call (that also can't be used with a run-time 
formatting string, unlike the writef with an enum precondition, 
needing only one function for both cases, and allowing writef to 
be retrofitted with no changes in its API):

cWritef!"%d"(5.5);

(Rust language uses a different strategy to verify those 
formatting strings at compile-time.)

- - - - - - - - -

I am sure more usage cases could be found.

The ideas discussed here are also related to the 
__builtin_constant_p() of GCC:
http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

DMD is able to test some array bounds at compile-time so a 
related feature is already present inside the compiler. enum 
preconditions pull this compiler feature out for some user-code 
usages, just like __traits often does.

Bye,
bearophile


More information about the Digitalmars-d mailing list