🦀 Monadic notation for the Rust translation

April 3, 2024 · 6 min read

At Formal Land our mission is to reduce the cost of finding bugs in software. We use formal verification, that is to say mathematical reasoning on code, to make sure we find more bugs than with testing. As part of this effort, we are working on a tool coq-of-rust to translate Rust code to Coq, a proof assistant, to analyze Rust programs. Here we present a technical improvement we made in this tool.

One of the challenges of our translation from Rust to Coq is that the generated code is very verbose. The size increase is about ten folds in our examples. A reasons is that we use a monad to represent side effects in Coq, so we need to name each intermediate result and apply the bind operator. Here, we will present a monadic notation that prevents naming intermediate results to make the code more readable.

Contact

This work is funded by the Aleph Zero crypto-currency to verify their Rust smart contracts. You can follow us on X to get our updates. We propose tools and services to make your codebase bug-free with formal verification.

Example 🔎

Here is the Rust source code that we consider:

fn add(a: i32, b: i32) -> i32 {
    a + b
}

Before, we were generating the following Coq code, with let* as the notation for the bind:

Definition add (τ : list Ty.t) (α : list Value.t) : M :=
  match τ, α with
  | [], [ a; b ] =>
    let* a := M.alloc a in
    let* b := M.alloc b in
    let* α0 := M.read a in
    let* α1 := M.read b in
    BinOp.Panic.add α0 α1
  | _, _ => M.impossible
  end.

Now, with the new monadic notation, we generate:

Definition add (τ : list Ty.t) (α : list Value.t) : M :=
  match τ, α with
  | [], [ a; b ] =>
    ltac:(M.monadic
      (let a := M.alloc (| a |) in
      let b := M.alloc (| b |) in
      BinOp.Panic.add (| M.read (| a |), M.read (| b |) |)))
  | _, _ => M.impossible
  end.

The main change is that we do not need to introduce intermediate let* expressions with generated names. The code structure is more similar to the original Rust code, with additional calls to memory primitives such as M.alloc and M.read.

The notation f (| x1, ..., xn |) represents the call to the function f with the arguments x1, ..., xn returning a monadic result. We bind the result with the current continuation that goes up to the wrapping ltac:(M.monadic ...) tactic. We automatically transform the let into a let* with the M.monadic tactic when needed.

Where do we use this notation? 🤔

We use this notation in all the function bodies that we generate, that are all in a monad to represent side effects. We call the ltac:(M.monadic ...) tactic at the start of the functions, as well as at the start of closure bodies that are defined inside functions. This also applies to the translation of if, match, and loop expressions, as we represent their bodies as functions.

Here is an example of code with a match expression:

fn add(a: i32, b: i32) -> i32 {
    match a - b {
        0 => a + b,
        _ => a - b,
    }
}

We translate it to:

Definition add (τ : list Ty.t) (α : list Value.t) : M :=
  match τ, α with
  | [], [ a; b ] =>
    ltac:(M.monadic
      (let a := M.alloc (| a |) in
      let b := M.alloc (| b |) in
      M.read (|
        M.match_operator (|
          M.alloc (| BinOp.Panic.sub (| M.read (| a |), M.read (| b |) |) |),
          [
            fun γ =>
              ltac:(M.monadic
                (let _ :=
                  M.is_constant_or_break_match (|
                    M.read (| γ |),
                    Value.Integer Integer.I32 0
                  |) in
                M.alloc (|
                  BinOp.Panic.add (| M.read (| a |), M.read (| b |) |)
                |)));
            fun γ =>
              ltac:(M.monadic (
                M.alloc (|
                  BinOp.Panic.sub (| M.read (| a |), M.read (| b |) |)
                |)
              ))
          ]
        |)
      |)))
  | _, _ => M.impossible
  end.

We see that we call the tactic M.monadic for each branch of the match expression.

How does it work? 🛠️

The M.monadic tactic is defined in M.v. The main part is:

Ltac monadic e :=
  lazymatch e with
  (* ... *)
  | context ctxt [M.run ?x] =>
    lazymatch context ctxt [M.run x] with
    | M.run x => monadic x
    | _ =>
      refine (M.bind _ _);
        [ monadic x
        | let v := fresh "v" in
          intro v;
          let y := context ctxt [v] in
          monadic y
        ]
    end
  (* ... *)
  end.

In our translation of Rust, all of the values have the common type Value.t. The monadic bind is of type M -> (Value.t -> M) -> M where M is the type of the monad. The M.run function is an axiom that we use as a marker to know where we need to apply M.bind. The type of M.run is:

Axiom run : M -> Value.t.

The notation for monadic function calls is defined using the M.run axiom with:

Notation "e (| e1 , .. , en |)" := (M.run ((.. (e e1) ..) en)).

When we encounter a M.run (line 4) we apply the M.bind (line 8) to the monadic expression x (line 9) and its continuation ctx that we obtain thanks to the context keyword (line 4) of the matching of expressions in Ltac.

There is another case in the M.monadic tactic to handle the let expressions, that is not shown here.

Conclusion

Thanks to this new monadic notation, the generated Coq code is more readable and closer to the original Rust code. This should simplify our work in writing proofs on the generated code, as well as debugging the translation.

If you are interested in formally verifying your Rust projects, do not hesitate to get in touch with us at contact@formal.land 💌! Formal verification provides the highest level of safety for critical applications, with a mathematical guarantee of the absence of bugs for a given specification.

Example 🔎​

Where do we use this notation? 🤔​

How does it work? 🛠️​

Conclusion​

Example 🔎

Where do we use this notation? 🤔

How does it work? 🛠️

Conclusion