D enters Tiobe top 20

Doc Andrew x at x.com
Sat Nov 9 21:11:14 UTC 2019


On Saturday, 9 November 2019 at 17:16:40 UTC, Ola Fosheim Grøstad 
wrote:
>
> Well, not necessarily, some C++ compilers support it.  The 
> problem is that you probably don't want to introduce too many 
> auxiliary facts, as that would just slow down compilation with 
> no gain (and perhaps even confuse a naive optimizer). Anyway, 
> you should be able to add facts to LLVM by compiling a boolean 
> expression and then assuming it to hold.
>
> But maybe the language syntax should allow you to select which 
> propositions you want to propagate to the optimizer. Not sure 
> if LLVM can express quantifiers though (forall/exists).
>

I'm pretty ignorant of how all that would work. Is there some 
sort of standard way to give optimizers "hints" based on what 
the solver can show? I don't want to over-sell the potential 
performance gains. (Like Rust fans talking about how 
_theoretically_, Rust is much faster than C because of the lack 
of pointer aliasing, but the benchmarks just don't bear it out 
(yet).)
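
To make that concrete, here's a rough sketch of what "compiling 
a boolean expression and assuming it to hold" might look like 
from the D side. The assume() helper is made up; LDC users could 
presumably back it with LLVM's llvm.assume intrinsic, but I 
haven't verified that:

pragma(inline, true)
void assume(bool cond)
{
   // Hypothetical: a real implementation would lower this to an
   // optimizer hint (e.g. LLVM's llvm.assume), not a runtime check.
   version (assert) assert(cond);
}

int mod8(int x)
{
   assume(x >= 0);   // fact established externally by the solver
   return x % 8;     // optimizer could then drop the sign handling
}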

>
> But I guess one will have to distinguish the two cases somehow. 
> Which is basically why I think trying to improve basic 
> libraries with assume-statements and manual proofs is the best 
> starting point. Then you get an idea for how it can affect 
> optimization.

Oof. You're gonna have a hard time finding volunteers for that 
one. I think that's how projects like seL4 do it though, 
translating C into HOL and then adding verification conditions 
manually. I want to say the seL4 proofs are something like 10x 
the number of lines of the code itself. Working manual proofs can 
be kind of fun, like a puzzle, but I've read about some negative 
case studies where manually verified code had incorrect 
assumptions in the proofs, or proved things that ended up being 
orthogonal to the desired outcome. It's murky.

>
> I will certainly read and comment on it, if you want to write 
> one. You could start with specc'ing out a list of requirements?
>

I'll take a look at the DIP author guidelines and start putting 
something together. I think it's worth the community's 
consideration if we can do this in a way that minimizes impact. 
As a long-time mostly-lurker and D user, I don't have any 
credibility with compiler changes, so I don't want to be one of 
those guys who jumps in with a big DIP that tries to make Walter 
and all the other contributors jump through their ass.

I was toying with the idea of two or more parts: start with a 
single DIP for "D Formal Verification Support" that maybe just 
describes which UDAs should be agreed upon for external proof 
tools to use, the behavior that should be expected when using 
contracts in those tools, and maybe a small set of changes to the 
language spec:

- Universal quantification: The low-impact way would be to set 
aside an attribute @forall or @foreach, then figure out what 
tuple elements you'd need to generate a VC from them and how 
you'd tie them to a function, module, variable, etc. (see the 
sketch after this list).

(Longer-term, consider a change to the language to create a 
foreach expression or add a new expression "forall" (or 
something). I would anticipate a lot of weeping and gnashing of 
teeth.)

- Existential quantification: I think the low-impact way to do 
this is to set aside an attribute @exists or @forsome and sort 
out the UDA tuple spec that provers would use. Longer term, add 
an "exists" or "forsome" expression.

- Loop invariants: Allow invariant{} blocks inside loop 
statements in addition to classes and structs. This is the only 
part of the DIP that I can foresee requiring a language change; I 
don't know of any way of attaching a UDA to a statement. I 
_think_ this would be the only blocker.
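
To make the UDA route concrete, a rough sketch (every name here 
is hypothetical; none of this exists in the language or any tool 
today):

struct forall(alias pred) {}   // hypothetical UDA wrapping a predicate
struct exists(alias pred) {}

// "every adjacent pair of the input is ordered" -- a prover would
// turn this UDA into a verification condition; the compiler would
// just ignore it.
@forall!((int[] a, size_t i) => i + 1 >= a.length || a[i] <= a[i + 1])
int binarySearch(int[] a, int key) { /* ... */ return -1; }

// Loop invariants are the part that needs a real language change,
// since a UDA can't attach to a statement. Something like:
//
//    foreach (i; 0 .. n)
//    invariant { assert(partialSum >= 0); }
//    {
//        partialSum += data[i];
//    }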

LATER ON: if there's interest and any promise, then we could look 
at making parts of Phobos verified, adding compiler support for 
provers, etc. in follow-on DIPs. A very powerful addition would 
be user-defined scalar types with their own invariants, but this 
is a BIG language change.

(Even better would be strong subtypes, refinement types, or 
first-order types, but you can simulate them well enough with 
contracts in an invariant clause).

A smaller language change would be to give invariant a parameter 
with a hypothetical or ghost variable and allow it on aliases:

alias ShoeSize = uint;

invariant(ShoeSize T){
   assert(T > 0);
   assert(T < 24);
}

struct Shoe { ShoeSize size; }   // minimal stand-in for the example

void buyBiggerShoes(ShoeSize s){
   Shoe t;

   t.size = s + 1;   // Consider this line

   // Invariant asserts get run in debug mode, so hopefully during
   // unit testing you tried it with s == 23.

   // At proof time, this would warn with something like:
   //   "Cannot prove t.size < 24 (example: when s == 23)
   //    Possible solution: add s to the list of preconditions
   //    for buyBiggerShoes"
   // (this is almost exactly what SPARK gives)

   sendOrder(t);
}

Built-in types would have implied invariants, at least as far as 
proofs are concerned:

invariant(int i){
   assert(i <= int.max);
   assert(i >= int.min);
}

So it might be that this initial DIP could just focus on beefing 
up invariant{}?

I'd welcome any more-informed thoughts on how to proceed and go 
through the DIP process, if there's interest!

>
> I actually meant source code only. You distribute the source 
> code, but with certified "requires"/"assume"/"ensures" 
> statements that are bound to the source code using cryptographic 
> hashing and an online index (so that it can be checked or 
> revoked).
>
> I guess one issue is that it has to be a standalone library 
> (e.g. only depend on a specific version of the standard 
> library) for this to work. Otherwise you would have to tie it 
> to a specific certified version of all the libraries you depend 
> on.
>
> But... I don't actually think that is a big deal, you typically 
> would pre-verify basic libraries. You could use a simple 
> concatenation for the hash with sorted filenames and some 
> sentinel character sequence for separating files, e.g.:  
> filename1, sep, file1, sep, filename2, sep, file2, sep, sep.
>
>
> I am actually not sure how this could be done well. Because you 
> could have array accesses within a conditional expression.

Flow analysis can sort that out, checking both branches of the 
conditional and making sure that contracts hold in every case.
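
For instance (a hypothetical checker's view, not any existing 
tool):

int pick(int[] a, size_t i, bool flag)
{
   // A flow-sensitive prover analyzes each path separately:
   if (flag)
      return a[i];       // obligation: i < a.length on this path
   else
      return a[i + 1];   // obligation: i + 1 < a.length on this one
}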

>
> Maybe you'd actually have to annotate the indexing with some 
> new syntax to tell the compiler that the access does not need 
> bounds checks as it has been proven already to hold... So I 
> guess this is an area that needs to be fleshed out with many 
> examples.
>
> Anyway, it seems reasonable that a tool would take source code, 
> verify it and emit an annotated version of the source code. 
> Then less capable compilers could use the verified results.
>
> Most users are not going to write verifiable code themselves.
>

Ideally, it will be easy to run the proof tool on existing code 
as a sort of linter in addition to unit tests and other static 
analysis. The first time you see warnings like "Cannot prove abc 
is less than def (example: def = 255)" because you didn't 
consider edge cases, it opens your eyes to how proofs can really 
save you from a lot of testing and debugging.
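
For example, something as innocuous as this would get flagged 
(illustrative output, not from any real tool):

ubyte nextCode(ubyte code)
{
   // Prover: "Cannot prove code + 1 <= ubyte.max (example: code = 255)"
   return cast(ubyte)(code + 1);
}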

The goal is that a user doesn't have to do anything different 
than normal to write verifiable code. It's all verifiable, but 
it's up to the user how deep they want to go down the rabbit hole 
to prove functional correctness by adding a ton of contracts.

> I think so, if they are proven to hold by a tool/compiler then 
> they hold. Dataflow analysis is sufficient, when it succeeds, 
> but will fail in many cases as it is a weak approach. However, 
> it is also much faster than a proper solver. Basically just an 
> optimization for performance reasons.
>
> But you also want to use assume manually sometimes when 
> describing hardware registers (e.g. describing that bit 8-32 
> always are zero), or when something has been proven by hand and 
> has a significant performance impact.
>

One kind of neat thing that SPARK has is the ability to break 
volatile down into sub-categories: "Effective_Reads", where 
reading a value takes something off a UART; "Effective_Writes", 
where writing to a hardware register has some effect on the 
system; and "Async_Readers"/"Async_Writers", where some hardware 
will be accessing memory behind the scenes. Now, how that affects 
the state of a proof, I'm not totally sure.
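
If D ever grew something similar, maybe it would look like marker 
UDAs (purely hypothetical, none of these exist):

enum EffectiveReads;    // each read consumes a value (e.g. a UART FIFO)
enum EffectiveWrites;   // each write has an external effect
enum AsyncWriters;      // hardware updates it behind our backs

@EffectiveReads  shared uint uartData;
@EffectiveWrites shared uint ctrlReg;
@AsyncWriters    shared uint dmaStatus;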

>
> Yes, this is where different goals may lead to different 
> designs... so the challenge is to come up with something that 
> can both support correctness and optimization without being 
> extremely complicated in terms of syntax.
>

I'm not super smart on the optimization side of the house; my 
focus would be on correctness first, letting the optimization 
wizards extract what useful information they can from that.

>
> Sure, you could also just define a dummy "ensures", "requires", 
> whatever… function that takes a boolean as a parameter. With an 
> empty body, or an assert.
>

I'll have to give some thought to this.
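
Off the cuff, the dummy-function route might look like this 
(names illustrative; a prover would recognize the calls by name, 
and the compiler would inline the no-ops away):

void requires(bool cond) { assert(cond); }   // or an empty body
void ensures(bool cond)  { assert(cond); }

int half(int x)
{
   requires(x >= 0);
   int result = x / 2;
   ensures(result <= x);
   return result;
}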

>
> There are papers on proof-carrying-assembly code...
>
> Although the basic idea behind certification would be that you 
> get to use the postconditions and assumes without having to run 
> costly verification on your own machine.
>
> You might even see a commercial service with very strong 
> provers and effective hardware.
>

I think you'll see SaaS offerings like this in 10 years, kind of 
like big ML farms now. Proofs on big code bases can take a long 
time, so that's definitely something you would want as part of an 
overnight build in a CI pipeline. Single-file proofs can go 
pretty quickly though, depending on how deep you let them search.

>
>
> Perhaps Why3ML is a good choice, but I would've started with 
> Boogie.

I hadn't heard of Boogie before, but it looks promising. It 
doesn't feel quite as mature as Why3, but the syntax looks a lot 
friendlier, for sure.

So, having done no DMD compiler hacking myself, is "DMD as a 
library" mature enough now that somebody could import 
dmd.compiler, get an AST, and start figuring out VCs?
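
From a quick look, something like this should be in the ballpark 
(assuming the dmd.frontend helpers that shipped recently; module 
and function names may differ by release):

import dmd.frontend;   // initDMD, parseModule

void main()
{
   initDMD();
   auto res = parseModule("example.d");
   // res.module_ is the AST root; a VC generator would walk it
   // with a visitor and emit one obligation per contract.
}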

A DIP without some sort of prototype tooling, even if primitive, 
is going to be pretty toothless, IMO.

-Doc


