Thoughts on memory safety

Fri Apr 17 17:05:39 UTC 2020

I was writing a response to another thread, but it took too long 
to write and it's kind of too late for it now, so I'm posting it 
as a new thread.
--------------------------------------

Over the years many organizations have revealed the causes of 
their security vulnerabilities. If we take a very unscientific 
average of the reported causes, we get that ~75% of all 
vulnerabilities were caused by memory management errors and that 
~50% of all vulnerabilities were buffer overflows. Because D 
arrays are pointer + length pairs (and indexing is bounds checked 
by default), half of all vulnerabilities would not have happened 
if the applications had been written in D, for this one reason 
alone. If we speculate from these numbers and remove buffer 
overflows from the calculation, what is left splits evenly: the 
remaining memory management bugs (75% - 50% = 25% of the original 
total) and the logic bugs (the other 25%). So around half of the 
vulnerabilities in applications written in D would still be 
caused by memory management, and the rest would be logic bugs.
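
To make the pointer + length point concrete, here is a minimal 
sketch. It assumes a default build (no -release and no 
-boundscheck=off), and readIsRejected is a helper name I made up 
for the illustration.

```d
import core.exception : RangeError;

// Returns true if reading buf[index] trips D's bounds check.
// Catching an Error is not something to do in real code; it is
// only here so the check can be observed without crashing.
bool readIsRejected(const int[] buf, size_t index)
{
    try
    {
        auto value = buf[index]; // checked against buf.length
        return false;
    }
    catch (RangeError e)
    {
        return true;
    }
}
```

In C the same out-of-range read would silently touch whatever 
memory follows the buffer, because the pointer carries no length.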

https://youtu.be/rQWjF8NvqAU?t=462
In this video Microsoft shows that ~5-10% of patched 
vulnerabilities were caused by uninitialized memory use. D also 
solves this problem by having default initialization values, so 
if we remove that from our calculation/speculation we get that 
~52% of vulnerabilities would have been (if written in D) due to 
logic bugs, and the biggest cause of the rest is use after free. 
So this is the data, and it needs to be kept in mind if we make 
changes to the language. Of course this is not real data, only 
speculation. If we could go back in time, force programmers to 
write in D instead and then do an A/B comparison, we might get a 
different result, but for now this is the best we have.
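
A small sketch of what default initialization means in practice. 
The Packet struct and makeDefault are made-up names for the 
example; the .init values themselves are what the language 
defines.

```d
import std.math : isNaN;

// Hypothetical struct, only for illustration.
struct Packet
{
    int length;     // int.init is 0
    int* payload;   // pointers start out null
    int[] items;    // slices start out null with length 0
    double ratio;   // floating point starts as NaN
    char tag;       // char.init is 0xFF (an invalid UTF-8 byte)
}

// No field is assigned, yet every field has a defined value.
Packet makeDefault()
{
    Packet p;
    return p;
}
```

You can still opt out explicitly with "= void", so uninitialized 
memory is possible in D, it just has to be asked for.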
--------------------------------------

The other thing we need to keep in mind is that no matter what 
system we come up with, we will not achieve 100% memory safety 
for all applications, and that we need to perform a cost-benefit 
analysis to choose an appropriate system for the D language. At 
this point I don't see any such analysis being done, and I see 
all of the effort to improve memory safety as akin to optimizing 
without running a profiler. For example, yes, buffer overflows 
are a big problem for C but not for D, therefore they should not 
influence future work on safety.

One system that has no cost in terms of writing code would be to 
improve the current @safe system. If I understand it correctly, 
the current system looks at a block of code, and if it finds it 
doing something from a list of bad things it refuses to mark it 
as @safe. free() is on that list of bad things, so the code
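{
     import core.stdc.stdlib : malloc, free;
     // malloc returns void*, so D needs an explicit cast
     int* tmp = cast(int*) malloc(int.sizeof);
     scope(exit) free(tmp);
}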
is not marked @safe even though we can clearly see that it's 
safe. If the compiler did more tests when it encountered free() 
instead of just marking it as unsafe, for example testing that 
the pointer is not passed to a function and that it doesn't 
escape the current scope, it could mark the code above as safe. 
In compiler code this kind of system could be expressed as: if 
you see free(), mark it unsafe unless it satisfies these 
conditions. And we could add more conditions to that list as time 
passes.
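
For comparison, this is roughly what you have to write today. It 
is a sketch, not the proposal: safeAcceptsFree and 
usesTrustedBlock are names I made up, and I am assuming the 
rejection comes from malloc and free being declared @system in 
core.stdc.stdlib (plus the cast from void*), which is how I 
understand current druntime.

```d
import core.stdc.stdlib : malloc, free;

// Does the compiler accept the snippet inside @safe code?
// Assumed to be false with current compilers (see note above).
enum safeAcceptsFree = __traits(compiles, () @safe {
    int* tmp = cast(int*) malloc(int.sizeof);
    scope(exit) free(tmp);
});

// Today's workaround: the programmer vouches for the block by
// hand with @trusted, and the compiler checks nothing inside it.
int usesTrustedBlock() @safe
{
    return () @trusted {
        int* tmp = cast(int*) malloc(int.sizeof);
        if (tmp is null) return -1; // allocation failed
        scope(exit) free(tmp);      // runs when the block exits
        *tmp = 42;
        return *tmp;                // value is copied before free
    }();
}
```

The point of the idea above is that for a block this simple the 
compiler could do the vouching itself instead of the programmer.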

I believe that with the pointer tracking that is already 
implemented in the compiler, and with the improvements that are 
already in DIPs, we could make a system that marks a lot of 
simple allocations and deallocations as safe code without 
introducing any language changes.
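
A toy model of the "unsafe unless it satisfies these conditions" 
rule, just to show its shape. Nothing here is real compiler code: 
PointerFacts, Verdict and checkFree are hypothetical names, and I 
am assuming pointer tracking could supply these two facts about 
the pointer being freed.

```d
// Hypothetical facts a pointer-tracking pass might record about
// the pointer that is handed to free().
struct PointerFacts
{
    bool passedToFunction; // given to any other function
    bool escapesScope;     // stored or returned out of the scope
}

enum Verdict { rejected, accepted }

// free() is rejected by default and accepted only when every
// condition on the list holds. New conditions would be added as
// extra early returns.
Verdict checkFree(PointerFacts facts)
{
    if (facts.passedToFunction) return Verdict.rejected;
    if (facts.escapesScope) return Verdict.rejected;
    return Verdict.accepted;
}
```

For the malloc/scope(exit) snippet earlier both facts would be 
false, so this model would accept it.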

We could also use that system to turn some GC allocations into 
malloc and free, or into stack allocations, but this post is 
getting too long already.
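
Just to show what I mean by that, here is a hand-written before 
and after. This is only a sketch of what such a transformation 
might produce, assuming the compiler can prove the buffer never 
leaves the function; both function names are made up.

```d
import core.stdc.stdlib : malloc, free;

// GC version: the buffer is left for the collector.
int sumOfSquaresGC(size_t n)
{
    int[] buf = new int[](n);
    foreach (i, ref x; buf) x = cast(int)(i * i);
    int total = 0;
    foreach (x; buf) total += x;
    return total;
}

// Same logic with the allocation swapped for malloc/free, which
// is valid only because buf does not escape this function.
int sumOfSquaresMalloc(size_t n) @nogc nothrow
{
    if (n == 0) return 0;
    int* p = cast(int*) malloc(n * int.sizeof);
    if (p is null) assert(0, "out of memory");
    scope(exit) free(p);
    int[] buf = p[0 .. n]; // slice keeps the bounds checks
    foreach (i, ref x; buf) x = cast(int)(i * i);
    int total = 0;
    foreach (x; buf) total += x;
    return total;
}
```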

Destroy.