Benchmarking Crimes Meet Formal Verification
Published on: 2025-06-27 04:04:22
No, this isn’t about formally verifying benchmarking (BM) crimes. It’s about the use of misleading statistics in papers that apply formal methods (FM) to verify (prove correct) operating systems (OS) code – something that has been bugging me for a while.
Name      Language  Spec. Lang.  Proof:Code
seL4      C+Asm     Isabelle         20:1
CertiKOS  C+Asm     Coq              15:1
SeKVM     C+Asm     Coq               7:1
Ironclad  Dafny     Dafny             5:1
XXX       Rust      Verus            10:1
YYY       Rust      Verus           7.5:1

Table 1: Language and (incomplete) proof-size statistics of a number of verified systems [Anonymous].
Table 1 above shows a recent example from a peer-reviewed paper (it would be unfair to pillory the authors, given how widespread this flaw is, so I’m keeping it anonymous). It seems to show a significant improvement in verification efficiency since the original seL4 work, given that the lines of proof required to verify a given amount of code have been reduced by a factor of 3–4. Note that the systems in the upper half of the table use interactive (i.e. manual) theorem provers …
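To make that factor concrete, it can be recomputed directly from the ratios in Table 1, taking seL4’s 20:1 as the baseline. A minimal sketch in Python (the ratio values are simply those listed in the table, nothing else is assumed):

    # Proof-to-code ratios (lines of proof per line of implementation code), from Table 1.
    ratios = {
        "seL4": 20, "CertiKOS": 15, "SeKVM": 7,   # interactive provers (Isabelle, Coq)
        "Ironclad": 5, "XXX": 10, "YYY": 7.5,     # Dafny / Verus
    }

    baseline = ratios["seL4"]
    for name, r in ratios.items():
        # e.g. Ironclad: 20/5 = 4.0x less proof per line of code than seL4
        print(f"{name}: {r}:1  ->  {baseline / r:.1f}x improvement over seL4")

Comparing the seL4 baseline with the lower half of the table gives improvement factors of roughly 2.7x to 4x, which is where the “factor of 3–4” reading of the table comes from.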
... Read full article.