safety model in D

Michal Minich michal.minich at gmail.com
Wed Nov 4 14:39:15 PST 2009


On Wed, 04 Nov 2009 14:24:47 -0600, Andrei Alexandrescu wrote:

>> But efficiency is also important, and if you want it, why not move the
>> code subjected to bounds checks to trusted/system module - I hope they
>> are not checked for bounds in release mode. Moving parts of the code to
>> trusted modules is more semantically describing, compared to crude tool
>> of ad-hoc compiler switch.
> 
> Well it's not as simple as that. Trusted code is not unchecked code -
> it's code that may drop redundant checks here and there, leaving code
> correct, even though the compiler cannot prove it. So no, there's no
> complete removal of bounds checking. But a trusted module is allowed to
> replace this:
> 
> foreach (i; 0 .. a.length) ++a[i];
> 
> with
> 
> foreach (i; 0 .. a.length) ++a.ptr[i];
> 
> The latter effectively escapes checks because it uses unchecked pointer
> arithmetic. The code is still correct, but this time it's the human
> vouching for it, not the compiler.
> 
>> One thing I'm concerned with, whether there is compiler switch or not,
>> is that module numbers will increase, as you will probably want to
>> split some modules in two, because some part may be safe, and some not.
>> I'm wondering why the safety is not discussed on function level,
>> similarly as pure and nothrow currently exists. I'm not sure this would
>> be good, just wondering. Was this topic already discussed?
> 
> This is a relatively new topic, and you pointed out some legit kinks.
> One possibility I discussed with Walter is to have version(safe) vs.
> version(system) or so. That would allow a module to expose different
> interfaces depending on the command line switches.
> 
> 
> Andrei

Sorry for the long post, but it should explain how safety specification 
should work (and how it should not).

Consider these 3 ways of specifying memory safety:

safety specification at module level (M)
safety specification at function level (F)
safety specification using version switching (V)

I see a very big difference between these things:
while M and F are "interface" specifications, V is an implementation 
detail.

This difference matters only to library/module users; it makes no 
difference to the library/module writer - he must always decide whether 
he is writing safe, unsafe or trusted code.

Imagine a scenario with M safety from the library user's perspective:
The library user wants to build a memory-safe application. He marks his 
main module as safe, and can be sure (and/or trust) that his application 
is safe from that point on; because safety is explicit in the 
"interface", he cannot import and use unsafe code.
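With hypothetical module-level syntax along the lines discussed in this 
thread (the module(safe) form is illustrative, not an implemented 
feature), the user's side might look like this:

    // Hypothetical syntax - M safety declared in the module "interface".
    module(safe) myapp.main;

    import std.string;   // accepted only if std.string is safe or trusted
    import lowlevel.io;  // error if lowlevel.io is an unsafe module

The safety guarantee flows from the import graph: a safe module simply 
cannot pull in unsafe code, so the guarantee holds regardless of 
compiler switches.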

Scenario with V safety:
The library user wants to build a memory-safe application. He can import 
any module. He can use a -safe switch so the compiler will use the safe 
version of the code - if one is available! The user can never be sure 
whether his application is safe or not. Safety is an implementation 
detail!

For this reason, I think V safety is very unsuitable option. Absolutely 
useless.

But there are also problems with M safety.
Imagine a module for string manipulation with 10 independent functions. 
The module is marked safe. The library writer then decides to add 
another function, which is unsafe. He can now do one of the following:

Option 1: He can mark the module trusted and implement the function in 
an unsafe way. Compatibility with safe clients of this module is 
preserved. The bad thing: there are 10 provably safe functions which are 
no longer checked by the compiler. The trust level of the module is also 
lower in the eyes of the user. The library may end up with all modules 
marked trusted (none safe).

Option 2: He can implement it in a separate unsafe module. This has a 
negative impact on library structure.

Option 3: He can implement it in a separate trusted module and publicly 
import that trusted module in the original safe module.

The third option is transparent to the module user, and probably the 
best solution, but I have a feeling that many existing modules would end 
up having an unsafe twin. I can see this pattern emerging:

module(safe)    std.string
module(trusted) std.string_trusted // do not import; already exposed by 
                                   // std.string
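A sketch of how Option 3 could be wired up, again using the hypothetical 
module(...) syntax; public import re-exports the trusted twin's symbols, 
so clients see a single module:

    // Hypothetical syntax - the safe module transparently exposes its twin.
    module(safe) std.string;
    public import std.string_trusted; // clients never import the twin directly

    // In std/string_trusted.d:
    // module(trusted) std.string_trusted;
    // ... the one unsafe function lives here, vouched for by the writer ...

The downside is exactly what the pattern above shows: every safe module 
that grows one unsafe function pays for it with a whole extra module.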

Therefore I propose to use F safety. 

It is in fact the same beast as pure and nothrow - they also guarantee 
some kind of safety, and they are also part of the function interface 
(signature). The compiler likewise needs to perform stricter checks than 
it does normally.

Just imagine marking an entire module pure or nothrow. While certainly 
possible, is it practical? You would find yourself splitting your 
functions into separate modules per specific check, or not using pure 
and nothrow at all.

This way, if you mark your main function safe, you can be sure (and/or 
trust) that your application is safe. More commonly, you can mark only 
some functions safe, and the requirement will propagate to all called 
functions, the same way as for pure or nothrow.
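A sketch of the proposed F safety, with safe and trusted as hypothetical 
function attributes analogous to pure and nothrow (reusing Andrei's 
example from above):

    // Hypothetical attributes - not part of the language as of this post.
    trusted void fill(int[] a)   // human-vouched: unchecked pointer access
    {
        foreach (i; 0 .. a.length) ++a.ptr[i];
    }

    safe void main()
    {
        int[] a = new int[10];
        fill(a);        // OK: safe code may call trusted code
        // ++a.ptr[0];  // error: pointer arithmetic disallowed in safe code
    }

As with pure and nothrow, the attribute is part of the signature, so the 
compiler can enforce the safe-calls-only-safe-or-trusted rule at every 
call site.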

One thing that remains to be figured out is how to turn off runtime 
bounds checking for trusted code (and probably for safe code too). This 
is a legitimate requirement: probably the whole standard library will be 
safe or trusted, and users who are not concerned with safety and want 
speed need such a compiler switch.



More information about the Digitalmars-d mailing list