D enters Tiobe top 20
Doc Andrew
x at x.com
Sat Nov 9 21:11:14 UTC 2019
On Saturday, 9 November 2019 at 17:16:40 UTC, Ola Fosheim Grøstad
wrote:
>
> Well, not necessarily, some C++ compilers support it. The
> problem is that you probably don't want to introduce too many
> auxiliary facts, as that would just slow down compilation with
> no gain (and perhaps even confuse a naive optimizer). Anyway,
> you should be able to add facts to LLVM by compiling a boolean
> expression and then assuming it to hold.
>
> But maybe the language syntax should allow you to select which
> propositions you want to propagate to the optimizer. Not sure
> if LLVM can express quantifiers though (forall/exists).
>
I'm pretty ignorant of how all that would work. Is there some
sort of standard way to give optimizers "hints" based on what
the solver can show? I don't want to over-sell the potential
performance gains. (Like Rust fans talking about how,
_theoretically_, Rust is much faster than C because of the lack
of pointer aliasing, but the benchmarks just don't bear it out.
(yet))
>
> But I guess one will have to distinguish the two cases somehow.
> Which is basically why I think trying to improve basic
> libraries with assume-statements and manual proofs is the best
> starting point. Then you get an idea for how it can affect
> optimization.
Oof. You're gonna have a hard time finding volunteers for that
one. I think that's how projects like seL4 do it, though:
translating C into HOL and then adding verification conditions
manually. I want to say the seL4 proofs are something like 10x
the number of lines of the code itself. Working manual proofs can
be kind of fun, like a puzzle, but I've read about some negative
case studies where manually verified code had incorrect
assumptions in the proofs, or proved things that ended up being
orthogonal to the desired outcome. It's murky.
>
> I will certainly read and comment on it, if you want to write
> one. You could start with specc'ing out a list of requirements?
>
I'll take a look at the DIP author guidelines and start putting
something together. I think it's worth the community's
consideration if we can do this in a way that minimizes impact.
As a long-time mostly-lurker and D user, I don't have any
credibility with compiler changes, so I don't want to be one of
those guys who jumps in with a big DIP that tries to make Walter
and all the other contributors jump through their ass.
I was toying with the idea of two or more parts: start with a
single DIP for "D Formal Verification Support" that maybe just
describes which UDAs should be agreed upon for external proof
tools to use, the behavior that should be expected when using
contracts in those tools, and maybe a small set of changes to the
language spec:
- Universal quantification: The low-impact way would be to set
aside an attribute @forall or @foreach, then figure out what
tuple elements you'd need to generate a VC from them and how
you'd tie them to a function, module, variable, etc.
(Longer term, consider a change to the language to create a
foreach expression, or to add a new "forall" expression (or
something). I would anticipate a lot of weeping and gnashing of
teeth.)
- Existential quantification: I think you can do this in a
low-impact way by setting aside an attribute @exists or @forsome
and sorting out the UDA tuple spec that provers would use. Longer
term, add an "exists" or "forsome" expression.
- Loop invariants: Allow invariant{} blocks inside loop
statements in addition to classes and structs. This is the only
part of the DIP that I can foresee requiring a language change; I
don't know of any way of attaching a UDA to a statement. I
_think_ this would be the only blocker.
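To make the UDA idea concrete, here's a minimal sketch of what the
first two bullets might look like. The tuple layout (bound variable,
range, predicate) and the struct names are purely hypothetical; the
DIP would have to pin all of that down:

```d
// Sketch only: possible UDA carriers for the quantifiers above.
// A proof tool would read these off the function and generate VCs;
// the compiler itself would ignore them.
struct forall { string var; string range; string pred; }
struct exists { string var; string range; string pred; }

@forall("i", "0 .. a.length", "a[i] >= 0")
int sum(const int[] a)
{
    int total = 0;
    foreach (x; a)
        total += x;
    return total;
}
```

The appeal of the UDA route is exactly that it's inert: this compiles
with today's DMD, and only an external tool gives it meaning.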
LATER ON: if there's interest and any promise, then we could look
at making parts of Phobos verified, adding compiler support for
provers, etc. in follow-on DIPs. A very powerful addition would
be user-defined scalar types with their own invariants, but this
is a BIG language change.
(Even better would be strong subtypes, refinement types, or
first-order types, but you can simulate them well enough with
contracts in an invariant clause).
A smaller language change would be to give invariant a parameter
with a hypothetical or ghost variable and allow it on aliases:

alias uint ShoeSize;

invariant(ShoeSize T)
{
    assert(T > 0);
    assert(T < 24);
}

void buyBiggerShoes(ShoeSize s)
{
    Shoe t;
    t.size = s + 1; // Consider this line
    // Invariant asserts get run if in Debug mode, so hopefully
    // during unit testing you tried it with s == 23.
    // At proof time, this would warn with something like:
    //   "Cannot prove t.size < 24 (example: when s == 23).
    //    Possible solution: add s to the list of preconditions
    //    for buyBiggerShoes"
    // (this is almost exactly what SPARK gives)
    sendOrder(t);
}
Built-in types would have implied invariants, at least as far as
proofs are concerned:

invariant(int i)
{
    assert(i <= int.max);
    assert(i >= int.min);
}
So it might be that this initial DIP could just focus on beefing
up invariant{}?
I'd welcome any more-informed thoughts on how to proceed and go
through the DIP process, if there's interest!
>
> I actually meant source code only. You distribute the source
> code, but with certified "requires"/"assume"/"ensures"
> statements that is bound to the source code using cryptographic
> hashing and an online index (so that it can be checked or
> revoked).
>
> I guess one issue is that it has to be a standalone library
> (e.g. only depend on a specific version of the standard
> library) for this to work. Otherwise you would have to tie it
> to a specific certified version of all the libraries you depend
> on.
>
> But... I don't actually think that is a big deal, you typically
> would pre-verify basic libraries. You could use a simple
> concatenation for the hash with sorted filenames and some
> sentinel character sequence for separating files, e.g.:
> filename1, sep, file1, sep, filename2, sep, file2, sep, sep.
>
>
> I am actually not sure how this could be done well. Because you
> could have array accesses within a conditional expression.
Flow analysis can sort that out, checking both branches of the
conditional and making sure that contracts hold in every case.
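As a sketch of what I mean (the diagnostic wording is hypothetical,
loosely modeled on SPARK's output), flow analysis would have to
discharge the bounds check separately on each path:

```d
// Array access inside a conditional: the prover must show the
// index is in bounds on *every* branch, not just the hot one.
int pick(const int[] a, bool flag)
{
    if (flag)
        return a[0];      // needs: a.length > 0 on this path
    else
        return a[$ - 1];  // needs: a.length > 0 here too
}

// A tool would report something like:
//   "Cannot prove a.length > 0 (example: a = [])"
// unless a precondition on pick() establishes it.
```

Once both branches are proven, the compiler could safely elide the
runtime bounds checks for that access.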
>
> Maybe you'd actually have to annotate the indexing with some
> new syntax to tell the compiler that the access does not need
> bounds checks as it has been proven already to hold... So I
> guess this is an area that needs to be fleshed out with many
> examples.
>
> Anyway, it seems reasonable that a tool would take source code,
> verify it and emit an annotated version of the source code.
> Then less capable compilers could use the verified results.
>
> Most users are not going to write verifiable code themselves.
>
Ideally, it will be easy to run the proof tool on existing code
as a sort of linter in addition to unit tests and other static
analysis. The first time you see warnings like "Cannot prove abc
is less than def (example: def = 255)" because you didn't
consider edge cases, it opens your eyes to how proofs can really
save you from a lot of testing and debugging.
The goal is that a user doesn't have to do anything different
than normal to write verifiable code. It's all verifiable; it's
just up to the user how deep they want to go down the rabbit hole
of proving functional correctness by adding a ton of contracts.
> I think so, if they are proven to hold by a tool/compiler then
> they hold. Dataflow analysis is sufficient, when it succeeds,
> but will fail in many cases as it is a weak approach. However,
> it is also much faster than a proper solver. Basically just an
> optimization for performance reasons.
>
> But you also want to use assume manually sometimes when
> describing hardware registers (e.g. describing that bits 8-32
> are always zero), or when something has been proven by hand and
> has a significant performance impact.
>
One kind of neat thing that SPARK has is the ability to break
volatile down into sub-categories like "Effective_Reads", where
reading a value takes something off a UART; "Effective_Writes",
where writing to a hardware register has some effect on the
system; and "Async_Readers"/"Async_Writers", where some hardware
will be accessing memory behind the scenes... now how that
affects the state of a proof, I'm not totally sure.
>
> Yes, this is where different goals may lead to different
> designs... so the challenge is to come up with something that
> can both support correctness and optimization without being
> extremely complicated in terms of syntax.
>
I'm not super smart on the optimization side of the house; my
focus would be on correctness first, and then let the
optimization wizards extract what useful information they can
from that.
>
> Sure, you could also just define a dummy "ensures", "requires",
> whatever… function that takes a boolean as a parameter. With an
> empty body, or an assert.
>
I'll have to give some thought to this.
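Off the top of my head, the dummy-function approach you describe
might look something like this (a minimal sketch; the names,
signatures, and the pragma(inline) choice are all mine, not a
proposal):

```d
// No-op marker functions a proof tool could recognize by name.
// In a debug build the asserts still fire; in release they
// compile away to nothing.
pragma(inline, true)
void requires(bool cond) { assert(cond); }

pragma(inline, true)
void ensures(bool cond) { assert(cond); }

int increment(int x)
{
    requires(x < int.max); // tool reads this as a precondition
    int result = x + 1;
    ensures(result > x);   // ...and this as a postcondition
    return result;
}
```

The nice part is that, like the UDA idea, this needs zero language
changes; the downside is that the "contracts" are just calls the
tool has to pattern-match on.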
>
> There are papers on proof-carrying-assembly code...
>
> Although the basic idea behind certification would be that you
> get to use the postconditions and assumes without having to run
> costly verification on your own machine.
>
> You might even see a commercial service with very strong
> provers and effective hardware.
>
I think you'll see SaaS offerings like this in 10 years, kind of
like the big ML farms now. Proofs on big code bases can take a
long time; definitely something you would want as part of an
overnight build in a CI pipeline. Single-file proofs can go
pretty quickly, though, depending on how deep you let them
search.
>
>
> Perhaps Why3ML is a good choice, but I would've started with
> Boogie.
I hadn't heard of Boogie before, but it looks promising. It
doesn't feel quite as mature as Why3, but the syntax looks a lot
friendlier, for sure.
So, having done no DMD compiler hacking myself: is the state of
"DMD as a library" mature enough now that somebody could import
dmd.compiler, get an AST, and start figuring out VCs?
A DIP without some sort of prototype tooling, even if primitive,
is going to be pretty toothless, IMO.
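If I remember right, the dmd dub package already exposes a frontend
entry point roughly like the following. I'm writing the names from
memory (initDMD/parseModule in dmd.frontend), so treat this as a
sketch to be checked against the current package, not gospel:

```d
// Rough sketch of getting an AST out of "DMD as a library".
// API names are from memory and may be out of date.
import dmd.frontend;

void main()
{
    initDMD(); // set up global compiler state

    // Parse a module without running full semantic analysis;
    // the result should carry the Module AST plus diagnostics.
    auto result = parseModule("example.d");

    // From here a prototype VC generator could walk the AST
    // with a visitor, collecting contracts and asserts.
}
```

If that entry point works the way I think it does, a primitive
VC-extraction tool might not even need to touch the compiler proper.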
-Doc
More information about the Digitalmars-d
mailing list