Skip to main content

🦀 Formal verification of the type checker of Sui – part 2

· 10 min read

We are working on formally verifying the 🦀 Rust implementation of the Move type-checker for bytecode in the proof system 🐓 Coq. You can find the code of this type-checker in the crate move-bytecode-verifier.

This requires translating all the Rust code in idiomatic Coq on which we will write our specifications and proofs. We write this translation by hand relying as much as possible on generative AI tools such as GitHub Copilot, as there are many particular cases. We plan, eventually, to prove it equivalent to the translation automatically generated by coq-of-rust.

In this blog post we present how we organize our 🔮 monad to represent the side-effects used in this Rust code. We believe this organization should work for other Rust projects as well.

Get started

To ensure your code is secure today, contact us at  📧contact@formal.land! 🚀

Formal verification goes further than traditional audits to make 100% sure you cannot lose your funds. It can be integrated into your CI pipeline to make sure that every commit is correct without running a full audit again.

We make bugs such as the DAO hack ($60 million stolen) virtually impossible to happen again.

Forge in forest

Primitive effects

In functional programming, effects (or side-effects) are every operation that cannot be directly represented as a mathematical function, that is to say, a procedure that returns an output purely based on the value of its inputs and does nothing else. For example, the function returning the current time makes an effect as it depends on a hidden state (the current time) that is not passed as an argument. The function printing a message to the console makes an effect as it modifies the state of the console, in addition to returning a value that is generally either empty or a confirmation of the printing. Arithmetic operations (+, *, ...) are an example of pure functions.

We consider three primitive effects in our Rust code:

  • Panic For many reasons, a Rust program can panic, as a result of an out-of-bounds access to an array or a wrong unwrap, for example. This is an effect as no outputs are returned in case of a panic.
  • Result The Result type is used to represent the result of a computation that can fail. It is a sum type with two constructors: Ok for the successful result and Err for the error. The Rust operator ? is used to propagate errors in a function that returns a Result. This is another effect for us.
  • State Finally, we consider the functions that mutate one of their arguments as effectful. The mutated parameter is generally typed as a mutable reference &mut.

All our Coq definitions to represent the effects are in the file simulations/M.v.

Panic

We define a monad Panic.t to represent the effect of a panic with:

Module Panic.
Inductive t (A : Set) : Set :=
| Value : A -> t A
| Panic {Error : Set} : Error -> t A.

Note that the type Error in this position is an existential type. This has a few consequences:

  • We do not need to annotate the type Panic.t with the type of the error.
  • We can use any type for Error when we trigger a panic operation. This is useful for debugging, as we can add any payload to the panic message to help us understand what went wrong.
  • We cannot compute on the panic payload. We do not consider this as a limitation, as panics should not be caught and handled in a Rust program, only propagated.

We define the monadic return and bind operations as usual:

Definition return_ {A : Set} (value : A) : t A := Value value.

Definition bind {A B : Set} (value : t A) (f : A -> t B) : t B :=
match value with
| Value value => f value
| Panic error => Panic error
end.

We introduce notations based on the exclamation mark ! to make the code more readable:

Notation "M!" := Panic.t.

Notation "return!" := Panic.return_.

Notation "'let!' x ':=' X 'in' Y" :=
(Panic.bind X (fun x => Y))
(at level 200, x pattern, X at level 100, Y at level 200).

Result

We define the monad Result.t to represent the propagation of errors with the ? operator with:

Module Result.
Inductive t (A Error : Set) : Set :=
| Ok : A -> t A Error
| Err : Error -> t A Error.

The difference with the Panic.t monad is that the error type is not existential anymore. This is because we want to be able to compute on the error payload, as some functions depend on the error value.

We define the return and bind operations as:

Definition return_ {A Error : Set} (value : A) : t A Error := Ok value.

Definition bind {Error A B : Set} (value : t A Error) (f : A -> t B Error) : t B Error :=
match value with
| Ok value => f value
| Err error => Err error
end.

The bind corresponds to the question mark operator ? in Rust. We also introduce notation to make the code more readable:

Notation "M?" := (fun A Error => Result.t Error A).

Notation "return?" := Result.return_.

Notation "'let?' x ':=' X 'in' Y" := ...

State

Finally, we define the monad State.t 🇺🇸 to represent the effect of one or several mutable references with a mutable state type S:

Module State.
Definition t (State A : Set) : Set := State -> A * State.

Definition return_ {State A : Set} (value : A) : t State A :=
fun state => (value, state).

Definition bind {State A B : Set} (value : t State A) (f : A -> t State B) : t State B :=
fun state =>
let (value, state) := value state in
f value state.

The state S will typically be the tuple of all the current mutable references in the Rust code. We use notations based on the letter S.

We also introduce lens operations that mimic how we can extract a mutable reference to the part of a data structure from a mutable reference to the whole data structure in Rust. Here is the definition of the lens type:

Record t {Big_A A : Set} : Set := {
read : Big_A -> M! A;
write : Big_A -> A -> M! Big_A
}.

The read and write operations correspond to the dereferencing and the assignment of a mutable reference in Rust. The type Big_A is the type of the whole data structure, and the type A is the type of the part that we are referencing. These primitives might fail (there are in the panic monad) if the mutable reference is not valid, for example, for an out-of-bounds access in an array or an invalid case in an enum.

We can use a lens to lift a computation that operates on a part of a data structure to a computation that operates on the whole data structure. We provide various lift operators to help with this.

Combinaisons

Depending on the Rust code we want to translate, we might need to use none, one, or several of the effects above. We explicitly define all the possible combinations of the above monads, as well as return operations to go from one monad to another, more general monad.

The special case is for the combination of the panic and state effect. When a panic occurs, we do not return the resulting state, as we are not supposed to continue the evaluation after a panic so the current state should not be relevant. We lose the information about the state of the program when a panic occurs, which can be a limitation for debugging, but:

  • It simplifies some definitions of simulations, and forces us not to speak about the specification of a state after a panic, what should not be relevant.
  • We can still return the current state as an additional payload in the panic operator. This is actually what our panic operator does by default.

The most complete monad combines all the effects:

Module StatePanicResult.
Definition t (State Error A : Set) : Set :=
MS! State (M? Error A).

Definition return_ {State Error A : Set} (value : A) : t State Error A :=
returnS! (Result.Ok value).

Definition bind {State Error A B : Set}
(value : t State Error A)
(f : A -> t State Error B) :
t State Error B :=
letS! value := value in
match value with
| Result.Ok value => f value
| Result.Err error => returnS! (Result.Err error)
end.

with the notations:

Notation "MS!?" := StatePanicResult.t.

Notation "returnS!?" := StatePanicResult.return_.

Notation "'letS!?' x ':=' X 'in' Y" := ...
info

We are repeating our notations a lot, as our three effects and their combinations are very similar. In addition, we always have to explicitly choose in our code which monad we use and add explicit conversions to go from one to another. A future enhancement could be to add some automation at this level, through the use of type-classes, for example, to automatically infer the monad to use based on the operations used in the code 🦾. For now, we prefer to stay explicit.

Iterations

To convert code involving for loops 🔁 or manipulations with the .map method of iterators, we introduce the effectful version of the for loop (fold or reduce in functional languages) and the map method. For example, for the folding operation:

(** The order of parameters is the same as in the source `for` loops. *)
Definition fold_left {State Error A B : Set}
(init : A)
(l : list B)
(f : A -> B -> t State Error A) :
t State Error A :=
List.fold_left (fun acc x => bind acc (fun acc => f acc x)) l (return_ init).

with the notation:

Notation "foldS!?" := StatePanicResult.fold_left.

Conclusion

Thanks to the definitions and notations above, we were able to translate (manually/with GitHub Copilot) all the code of the type-checker for the Move bytecode to Coq in an idiomatic Coq code of a size roughly similar to the original Rust code. This translation is available in our folder move_sui/simulations/move_bytecode_verifier 🚀.

In the next post we will present how we tested this translation to be faithful to the original Rust code, waiting to have an efficient way to prove it equivalent.

For more

Follow us on X or LinkedIn, or comment on this post below! Feel free to DM us for any services you need.