D enters Tiobe top 20

Doc Andrew x at x.com
Sat Nov 9 14:49:32 UTC 2019


On Saturday, 9 November 2019 at 11:19:08 UTC, Ola Fosheim Grøstad 
wrote:
>
> PREREQUISITE:
>
> Add something like «assume» to the language and make the 
> optimizer add those to its fact database. For LLVM this should 
> be trivial.

As a stop-gap we might be able to use an @assume UDA, but I do 
think "assume" could be reserved as a keyword in the near 
future, in anticipation of proper compiler support.
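
A minimal sketch of what that stop-gap could look like today; 
the assume struct and its string-condition convention are 
invented here, not an existing API:

struct assume { string condition; }

// Tooling (or eventually the compiler) could read the condition
// via __traits(getAttributes, fastInvSqrt) and feed it into the
// optimizer's fact database.
@assume("x > 0")
double fastInvSqrt(double x)
{
    import std.math : sqrt;
    return 1.0 / sqrt(x);
}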

>
> Then add «assume» clauses to the standard library that are 
> manually proven correct and also add an option to convert 
> assume to assert for additional unit-testing.

I like the idea of having a compiler flag to treat assumes as 
asserts, maybe even in release mode?
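
Something like the following would get us that behavior today; 
-version=CheckAssumes is an invented flag name:

// No-op by default; re-checked as an assert when built with
// -version=CheckAssumes (e.g. during unit testing, or even in
// release builds if you want belt-and-braces). The condition is
// lazy, so normal builds never even evaluate it.
void assumeCond(lazy bool cond, string msg = "assumption violated")
{
    version (CheckAssumes)
        assert(cond, msg);
}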

> This allows you to get started working on the optimizer and 
> figure out how to make good use of the extra facts.

That's probably a much longer-term goal, but just being able to 
move normal error checking into contracts and depend on them 
in release code is a big win IMO.
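
For instance, error checking that lives in a contract today is 
simply stripped by -release; the win would be a mode where a 
proven contract is something the optimizer can rely on instead. 
This sketch uses D's existing short contract syntax:

int milesToKm(int miles)
in (miles >= 0, "negative distance")
out (km; km >= miles)
{
    // 1609 m per mile, truncated to whole kilometers
    return miles * 1609 / 1000;
}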

> Also, optimizing the standard library affects all compiled 
> programs... so even small gains might be worthwhile.
>
>
> STAGE 2:
>
> Add ensure-statements to the language. Those will be converted 
> to assume-statements if you throw a switch, but display a 
> warning that all ensure-statements have to be proven correct by 
> an external verifier before compilation. Thus the entire burden 
> has been moved to an external tool.
>

This might be more difficult to enforce without some tight 
compiler integration. An attribute like @ensured or @certified 
could be the "gold standard" of safe D code, requiring a proof 
of the absence of run-time errors and of all contracts. Without 
the proof, the compiler could emit an error for functions 
carrying this attribute.

Or just treat it as a UDA, keep the proof as a separate 
artifact, and let the user sort it all out.
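
A sketch of that lightweight UDA route (every name here is 
invented):

enum certified; // marker UDA, nothing more

// A build step could refuse to ship unless a proof artifact for
// sum exists and matches the current source hash.
@certified
long sum(const(int)[] xs)
{
    long total = 0;
    foreach (x; xs)
        total += x;
    return total;
}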

SPARK does the latter, but I like the idea of better compiler 
integration and support. We'd probably need a DIP to hash this 
out.

> This would encourage third parties to work on tools to 
> translate D code into verifiable code to be consumed by 
> standard verification tooling.
>
> Extension: create an online certification service that allows 
> library code to carry a certificate that the library code has 
> been verified. So that all their ensure statements can be 
> turned into assume without throwing a switch. Certificates can 
> be revoked (in case you find a bug in the verification process).
>

This might tie in with some of the work going on around 
reproducible builds, but I love the idea of certs for known-good 
object files. Could you use a Merkle tree built up by the 
compiler? The source file hashes would be baked into the object 
file (maybe just as a special symbol), and executables would 
then contain a hash of all the object hashes. This is sort of an 
ancillary project to FV, but if we are talking high-integrity 
software, it might be an easy win and a cool feather in D's cap.
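
A rough sketch of the hashing side using Phobos; the two-level 
scheme here (leaf hashes folded into one root) stands in for a 
real Merkle tree:

import std.digest.sha : sha256Of;
import std.file : read;

// Hash each source file, then hash the concatenation of the
// leaf hashes to get a root the compiler could bake into the
// object file as a special symbol.
ubyte[32] buildRoot(string[] sources)
{
    ubyte[] leaves;
    foreach (src; sources)
    {
        const ubyte[32] leaf = sha256Of(cast(const(ubyte)[]) read(src));
        leaves ~= leaf[];
    }
    return sha256Of(leaves);
}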

>
> STAGE 3:
>
> Then break all dynamic checks out as asserts by transforming 
> the code. So that all bounds checks turn up as asserts. Break 
> composite asserts into individual asserts.

Using something like the @certified (or @ensured, or @verified, 
or whatever) attribute, we could just turn off run-time checks 
in these functions when a proof exists; it could be as simple as 
a source.d.proof file with the function name (and hash?) in it.
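
A sketch of how a tool might check that manifest; the 
one-line-per-function "name hash" format is made up:

// Returns true if the proof file records a proof for this
// function at this exact source hash.
bool hasProof(string proofFile, string funcName, string funcHash)
{
    import std.algorithm : any, splitter;
    import std.file : exists, readText;

    if (!exists(proofFile))
        return false;
    return readText(proofFile)
        .splitter('\n')
        .any!(line => line == funcName ~ " " ~ funcHash);
}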

>
> Verifying individual functions:
> step 1.1: use data-flow to convert as many asserts as possible 
> into assumes

I'm not sure about this one. Usually assumes are big red flags: 
they're the FV version of unsafe, "trust me, I know what I'm 
doing", that sort of thing. I might have misunderstood your 
intent here, though.
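
To make the worry concrete, here's the difference in a nutshell 
(assumeCond is the invented helper from above):

int pick(const(int)[] a, size_t i)
{
    assert(i < a.length);  // checked: fails loudly if the
                           // dataflow that derived it was wrong
    // assumeCond(i < a.length); // unchecked: a wrong "fact"
    // here licenses the optimizer to elide the bounds check on
    // the next line and read out of bounds
    return a[i];
}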

> step 1.2: use an external verifier to prove others if possible
>
> If those fail, try to do the verification at the call site:
> step 2.1: create a unique copy of the function for the call site
> step 2.2: use dataflow to bring additional assumed 
> preconditions into the called function
> step 2.3: go to step 1.1

This is how SPARK works: even if you can't prove a (normal Ada) 
function body, because of inline asm or pointer arithmetic or 
something, you can still specify pre- and post-contracts on the 
interface. There's a little bit of a @trusted feel to it, since 
the post-conditions on a non-SPARK function become an assumption 
that the function does what it claims, given that the 
preconditions hold. I think SPARK leaves these as run-time 
checks (normal Ada behavior, normal D _debug_ behavior), which 
you'd have to do to have any sort of confidence in your proof.

This is where things get a little messy. Verified functions 
calling non-verified functions with post-condition checks would 
need these checks enabled even in release mode, I would think.
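
A sketch of that boundary in today's syntax; parsePort's body 
stands in for any unverified code, and its out-contract is the 
claim a caller's proof would lean on:

// The body is unverified (it could do anything), so the
// out-contract must stay as a run-time check, even in release,
// for a caller's proof to be trustworthy.
int parsePort(string s)
in (s.length > 0)
out (p; p >= 0 && p <= 65_535)
{
    import std.conv : to;
    return s.to!int;
}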

The cool thing is that D _today_ has all the syntax it needs to 
start working on tooling for this with no language changes.

Short-term, it may be of some benefit to reserve a couple of 
keywords like assume or ensure, and maybe set aside some 
attributes like @verified, @certified, or @ensured.

Medium-term, allowing invariant blocks to be associated with 
specific loops would probably be needed, along with some extra 
compiler support for ensuring the presence of proofs.
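
Today the closest thing is an assert at the top of the loop 
body, which a verifier could be taught to treat as the loop's 
invariant:

int triangle(int n)
{
    int sum = 0;
    foreach (i; 0 .. n)
    {
        // loop invariant: sum holds the total of 0 .. i-1
        assert(2 * sum == i * (i - 1));
        sum += i;
    }
    return sum;
}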

Longer term, I really like the idea of a "trusted language base" 
with certificates and object hashes that a maintainer can use to 
ensure complete accounting of everything that went into a 
particular build. I'm not as concerned as some are with proven 
builds and linking, but there may be high-integrity requirements 
in the future that demand this sort of thing. I'm not sure if 
it's a thing yet (maybe NASA does it), but I'd be surprised if 
anybody else is doing it.

All it takes is one high-profile case of somebody's cloud CI 
pipeline being compromised to inject a malicious library, and 
everybody will be jumping on this sort of thing.

It may be worth a DIP or a DConf session or something to hash out 
some of the long-term vision, but I think it's possible to start 
working on tools now.

I'm not totally familiar with the landscape, but generating 
WhyML (Why3's input language) from D seems like one good way to 
start. You get access to a lot of solvers for free, and Why3 
seems well-documented and fairly easy to use.

-Doc

