Looks very interesting! We have done a lot with WasmCert-Isabelle (and there's also WasmRef-Isabelle and their 2023 paper, and the earlier WasmCert-Coq); other than being in Lean instead of Isabelle/HOL or Coq, how would you compare the approach you used? E.g. are you able to do an "in-place Store" like WasmRef-Isabelle, and can you represent memories and tables as plain vectors of bytes/refs in memory, can you grow them in-place, etc.? Or any other optimizations/lessons learned?
I'm also curious -- are you just implementing the Wasm binary and text parsing, validation algorithm, and execution semantics in Lean from scratch by reading the English prose in the spec document, and then checking it against the spec tests and the SpecTec description? Or do you have some sort of automated (classical or LLMy) transformation happening? (One could imagine directly transforming the SpecTec, or the OCaml reference interpreter, into Lean... but it sounds like you're not doing that? I think one needs to be a little careful here because e.g. at this point some of the English prose and reference interpreter implementation, and I think maybe some of the tests, are autogenerated from the SpecTec.) Which parts (if any) are outside the scope of the formalization? E.g. for WasmCert-Isabelle, I believe the binary and definitely the text parsing, and I think some of the arithmetic ops, are not covered.
How are you modeling the explicit sources of nondeterminism in the Wasm execution semantics? E.g. NaN representation, {memory., table.}grow, host calls, stack exhaustion, relaxed SIMD instructions, etc., and that's all before we get to the threads proposal? Because if the goal is to prove programs correct, one risk is that I prove my program correct against your Wasm interpreter (which maybe makes certain choices that aren't determined by the spec), and then I run it against another fully-conforming interpreter in the wild and it behaves incorrectly.
talos is already in use by https://github.com/siderolabs/talos, was confused for a second when I saw talos and wasm for a second, got excited about native wasm pod support.
What is the program logic used here? The num_integer verification example seems to be hardcoding addresses in the spec; what if I want to reason about larger programs that dynamically allocate, where the addresses may not be known statically? How can I make sure these do not overlap? And since this is a shallow embedding into lean, what’s the approach for verifying properties of non-terminating programs?
> what if I want to reason about larger programs that dynamically allocate, where the addresses may not be known statically? How can I make sure these do not overlap?
We are actively working on this, as it is a pre-condition :P to reason about the simplest of useful programs. The idea is to develop an API around separation logic that allows you to reason about logic that manipulate non-overlapping regions of memory.
It won't be relevant if address are not known statically since API theorems will be parametrized over non-relevant constants such as addresses, function indices, etc...
> And since this is a shallow embedding into lean, what’s the approach for verifying properties of non-terminating programs?
To use the interpreter there is the concept of fuel, which we explicitly hide from the reasoning layer. Using fuel you can write statements of the form, this function returns out of fuel for any value of fuel passed to the interpreter, which is equivalent to prove that your program doesn't terminate.
Is the plan to build a new separation logic framework, or use e.g. iris-lean or splean as a base?
And even if fuel isn’t exposed in the program logic, I’d imagine you’d still want step indexing to allow reasoning around cyclic heap structures. My experience with getting Claude (even fable) to do step indexed logical relations/SL proofs autonomously is that it struggles hard even on toy examples, particularly around the index shuffling. Do you have a setup in mind that can scale to larger programs?
Fun project in any case! I look forward to seeing how it develops :)
You can write "annotate" your rust code using asserts. On the wasm side asserts are converted to trap instructions, so the Lean spec will simply be: For every input this code never traps.
Part of our focus is making sure that specs are both easy to write and read, since they are human facing. Eventually you could imagine how writing code will mostly be writing specs, and both the code and the proofs will be handled by AI agents. In this scenario it is very important that humans can easily audit and modify the specs.
Doesn't that put the Rust compiler (and its assert lowering) in the trusted base? How do you know the asserts you wrote are the traps you're reasoning about?
How do you actually prove it though? I understand if it's fully automated SMT-style proof, but doesn't Lean require tediously explicit proofs? If it doesn't prove automatically do you have to write out Lean helper proofs about the compiled WASM?
I’m on the Cajal team - not OP, but happy to answer questions.
The core bet is that Wasm is a good verification target (close to compiled artifacts, many languages target it), and Lean is the right place to do verification.
Super interested in hearing from people working with Lean, compilers or other Wasm verification frameworks (eg Iris-Wasm).
Initially we considered formalizing rust code, aeneas is a very promising project that would unlock a lot of features right way by transpiling to lean. However, we didn't want to lock ourselves to rust, so we decided to use a lower level target such that we could verify code from "any" language.
We considered LLVM-IR, and RISC-V.
Ultimately WASM felt like the right decisions. More importantly WASM spec is very well done in details, and it is written with formal verification in mind early on, and there are are plans from to include Lean as one of the targets for generating the spec automatically from SpecTec. Once this exist, we will formalize that our interpreter is correct under the definition generated from the official Wasm-Lean-Spec so we remove it from the Trusted-Base going forward.
More on this - LLVM-IR has no official formal semantics and it's riddled with UB. RISC-V has a formal model in Sail, but it's an ISA so you throw away the structured control flow and types which we want for proving.
Wasm has different levels we can validate against - starting with W3C test suite, then later full verification against the SpecTec-generated Lean semantics so that we can drop our interpreter from the trusted base.
I wish these comments were banned. They come up every time someone names a project with a name that was also used by one guy for his forgotten Lisp dialect in the 70s.
Looks very interesting! We have done a lot with WasmCert-Isabelle (and there's also WasmRef-Isabelle and their 2023 paper, and the earlier WasmCert-Coq); other than being in Lean instead of Isabelle/HOL or Coq, how would you compare the approach you used? E.g. are you able to do an "in-place Store" like WasmRef-Isabelle, and can you represent memories and tables as plain vectors of bytes/refs in memory, can you grow them in-place, etc.? Or any other optimizations/lessons learned?
I'm also curious -- are you just implementing the Wasm binary and text parsing, validation algorithm, and execution semantics in Lean from scratch by reading the English prose in the spec document, and then checking it against the spec tests and the SpecTec description? Or do you have some sort of automated (classical or LLMy) transformation happening? (One could imagine directly transforming the SpecTec, or the OCaml reference interpreter, into Lean... but it sounds like you're not doing that? I think one needs to be a little careful here because e.g. at this point some of the English prose and reference interpreter implementation, and I think maybe some of the tests, are autogenerated from the SpecTec.) Which parts (if any) are outside the scope of the formalization? E.g. for WasmCert-Isabelle, I believe the binary and definitely the text parsing, and I think some of the arithmetic ops, are not covered.
How are you modeling the explicit sources of nondeterminism in the Wasm execution semantics? E.g. NaN representation, {memory., table.}grow, host calls, stack exhaustion, relaxed SIMD instructions, etc., and that's all before we get to the threads proposal? Because if the goal is to prove programs correct, one risk is that I prove my program correct against your Wasm interpreter (which maybe makes certain choices that aren't determined by the spec), and then I run it against another fully-conforming interpreter in the wild and it behaves incorrectly.
talos is already in use by https://github.com/siderolabs/talos, was confused for a second when I saw talos and wasm for a second, got excited about native wasm pod support.
Also collides with the Power9 desktop system https://www.raptorcs.com/TALOSII/
What is the program logic used here? The num_integer verification example seems to be hardcoding addresses in the spec; what if I want to reason about larger programs that dynamically allocate, where the addresses may not be known statically? How can I make sure these do not overlap? And since this is a shallow embedding into lean, what’s the approach for verifying properties of non-terminating programs?
> what if I want to reason about larger programs that dynamically allocate, where the addresses may not be known statically? How can I make sure these do not overlap?
We are actively working on this, as it is a pre-condition :P to reason about the simplest of useful programs. The idea is to develop an API around separation logic that allows you to reason about logic that manipulate non-overlapping regions of memory.
It won't be relevant if address are not known statically since API theorems will be parametrized over non-relevant constants such as addresses, function indices, etc...
> And since this is a shallow embedding into lean, what’s the approach for verifying properties of non-terminating programs?
To use the interpreter there is the concept of fuel, which we explicitly hide from the reasoning layer. Using fuel you can write statements of the form, this function returns out of fuel for any value of fuel passed to the interpreter, which is equivalent to prove that your program doesn't terminate.
Is the plan to build a new separation logic framework, or use e.g. iris-lean or splean as a base?
And even if fuel isn’t exposed in the program logic, I’d imagine you’d still want step indexing to allow reasoning around cyclic heap structures. My experience with getting Claude (even fable) to do step indexed logical relations/SL proofs autonomously is that it struggles hard even on toy examples, particularly around the index shuffling. Do you have a setup in mind that can scale to larger programs?
Fun project in any case! I look forward to seeing how it develops :)
Interesting. Do I have to write specs in Lean against the Wasm semantics or can you annotate Rust directly?
Both.
You can write "annotate" your rust code using asserts. On the wasm side asserts are converted to trap instructions, so the Lean spec will simply be: For every input this code never traps.
Part of our focus is making sure that specs are both easy to write and read, since they are human facing. Eventually you could imagine how writing code will mostly be writing specs, and both the code and the proofs will be handled by AI agents. In this scenario it is very important that humans can easily audit and modify the specs.
Doesn't that put the Rust compiler (and its assert lowering) in the trusted base? How do you know the asserts you wrote are the traps you're reasoning about?
How do you actually prove it though? I understand if it's fully automated SMT-style proof, but doesn't Lean require tediously explicit proofs? If it doesn't prove automatically do you have to write out Lean helper proofs about the compiled WASM?
I’m on the Cajal team - not OP, but happy to answer questions.
The core bet is that Wasm is a good verification target (close to compiled artifacts, many languages target it), and Lean is the right place to do verification.
Super interested in hearing from people working with Lean, compilers or other Wasm verification frameworks (eg Iris-Wasm).
Cool. I've been working on a compiler for a subset of Lean that targets WASM. The compiler is implemented in Lean.
https://github.com/jsmorph/leanexe
I think I managed to use Talos to prove the WAT generated from an example LeanExe program is correct. ?
https://gist.github.com/jsmorph/275a15dc21af037e1d02a1b433be...
Fun.
What other verification targets did you consider?
Initially we considered formalizing rust code, aeneas is a very promising project that would unlock a lot of features right way by transpiling to lean. However, we didn't want to lock ourselves to rust, so we decided to use a lower level target such that we could verify code from "any" language.
We considered LLVM-IR, and RISC-V.
Ultimately WASM felt like the right decisions. More importantly WASM spec is very well done in details, and it is written with formal verification in mind early on, and there are are plans from to include Lean as one of the targets for generating the spec automatically from SpecTec. Once this exist, we will formalize that our interpreter is correct under the definition generated from the official Wasm-Lean-Spec so we remove it from the Trusted-Base going forward.
More on this - LLVM-IR has no official formal semantics and it's riddled with UB. RISC-V has a formal model in Sail, but it's an ISA so you throw away the structured control flow and types which we want for proving.
Wasm has different levels we can validate against - starting with W3C test suite, then later full verification against the SpecTec-generated Lean semantics so that we can drop our interpreter from the trusted base.
Only thing left is to make a Kanban out of it…
Interesting, have you also looked at other formal methods, like Abstract Interpretation?
For "Lean"? LeaRn? Lean Manufacturing? Stupid one-word techbro product names.
Lean is a programming language [1]
> Lean is an open-source programming language and proof assistant that enables correct, maintainable, and formally verified code
[1]: https://lean-lang.org/
Do people just not even search their proposed name anymore?
It's ok for multiple different things to use the same name.
https://en.wikipedia.org/wiki/Lynx_(disambiguation)
I wish these comments were banned. They come up every time someone names a project with a name that was also used by one guy for his forgotten Lisp dialect in the 70s.