Skip to main content

🐍 Simulation of Python code from traces in Coq

· 9 min read

In order to formally verify Python code in Coq our approach is the following:

  1. Import Python code in Coq by running coq-of-python.
  2. Write a purely functional simulation in Coq of the code.
  3. Show that this simulation is equivalent to the translation.
  4. Verify the simulation.

We will show in this article how we can merge the steps 2. and 3. to save time in the verification process. We do so by relying on the proof mode of Coq and unification.

Our mid-term goal is to formally specify the Ethereum Virtual Machine (EVM) and prove that this specification is correct according to reference implementation of the EVM in Python. This would ensure that it is always up-to-date and exhaustive. The code of this project is open-source and available on GitHub: formal-land/coq-of-python.

Python at work

Our Python's monad 🐍

We put the Python code that we import in Coq in a monad M to represent all the features that are hard to express in Coq, mainly the side effects. This monad is a combination of two levels:

  • LowM for the side effects except the control flow.
  • M that adds an error monad on top of LowM to handle the control flow (exceptions, break instruction, ...).


Here is the definition of the LowM monad in CoqOfPython.v:

Module Primitive.
Inductive t : Set -> Set :=
| StateAlloc (object : Object.t Value.t) : t (Pointer.t Value.t)
| StateRead (mutable : Pointer.Mutable.t Value.t) : t (Object.t Value.t)
| StateWrite (mutable : Pointer.Mutable.t Value.t) (update : Object.t Value.t) : t unit
| GetInGlobals (globals : Globals.t) (name : string) : t Value.t.
End Primitive.

Module LowM.
Inductive t (A : Set) : Set :=
| Pure (a : A)
| CallPrimitive {B : Set} (primitive : Primitive.t B) (k : B -> t A)
| CallClosure {B : Set} (closure : Data.t Value.t) (args kwargs : Value.t) (k : B -> t A)
| Impossible.
Arguments Pure {_}.
Arguments CallPrimitive {_ _}.
Arguments CallClosure {_ _}.
Arguments Impossible {_}.

Fixpoint bind {A B : Set} (e1 : t A) (e2 : A -> t B) : t B :=
match e1 with
| Pure a => e2 a
| CallPrimitive primitive k => CallPrimitive primitive (fun v => bind (k v) e2)
| CallClosure closure args kwargs k => CallClosure closure args kwargs (fun a => bind (k a) e2)
| Impossible => Impossible
End LowM.

This is a monad defined by continuation (the variable k):

  • We terminate a computation with the primitive Pure and some result a, that can be any purely functional expression.
  • We can call some primitives grouped in Primitive.t that are side effects:
    • StateAlloc to allocate a new object in the memory,
    • StateRead to read an object from the memory,
    • StateWrite to write an object in the memory,
    • GetInGlobals to read a global variable, doing name resolution. This is a side effects as function definitions in Python do not need to be ordered.
  • We can call a closure (an anonymous function) with CallClosure. This is required for termination, as we cannot define an eval function on the type of Python values since some do not terminate like the Ω expression. See our previous post Translation of Python code to Coq for our definition of Python values. The combinator CallClosure is also very convenient to modularize our proofs: we reason on each closure independently.
  • We can mark a code path as unreachable with Impossible.


The final monad M is defined as:

Definition M : Set :=
LowM.t (Value.t + Exception.t).

It has no parameters as Python is untyped, so all expressions have the same result type:

  • either a success value of type Value.t,
  • or an exception of type Exception.t, with some special cases to represent a return, a break, or a continue instruction.

We define the monadic bind of M like for the error monad:

Definition bind (e1 : M) (e2 : Value.t -> M) : M :=
LowM.bind e1 (fun v => match v with
| inl v => e2 v
| inr e => LowM.Pure (inr e)

Traces 🐾

We define our semantics of a computation e of type M in simulations/proofs/CoqOfPython.v with the predicate:

{{ stack, heap | e ⇓ to_value | P_stack, P_heap }}

that we call a run or a trace, saying that:

  • starting from the initial state stack, heap,
  • the computation e terminates with a value,
  • that is in the image of the function to_value,
  • and with a final stack and heap that satisfy the predicates P_stack and P_heap.

Note that we do not explicit the resulting value and memory state of a computation in this predicate. We only say that it exists and verifies a few properties, that are here for compositionality. We have a purely functional function evaluate that can derive the result of a run of a computation:

evaluate :
forall `{Heap.Trait} {A B : Set}
{stack : Stack.t} {heap : Heap} {e : LowM.t B}
{to_value : A -> B} {P_stack : Stack.t -> Prop} {P_heap : Heap -> Prop}
(run : {{ stack, heap | e ⇓ to_value | P_stack, P_heap }}),
A * { stack : Stack.t | P_stack stack } * { heap : Heap | P_heap heap }

The function evaluate is defined in Coq by a Fixpoint. Its result is what we call a simulation, which is a purely functional definition equivalent to the orignal computation e from Python. It is equivalent by construction.

Building a trace 🔨

A trace is an inductive in Set that we can build with the following constructors:

Inductive t `{Heap.Trait} {A B : Set}
(stack : Stack.t) (heap : Heap)
(to_value : A -> B) (P_stack : Stack.t -> Prop) (P_heap : Heap -> Prop) :
LowM.t B -> Set :=
(* [Pure] primitive *)
| Pure
(result : A)
(result' : B) :
result' = to_value result ->
P_stack stack ->
P_heap heap ->
{{ stack, heap |
LowM.Pure result'
| P_stack, P_heap }}
(* [StateRead] primitive *)
| CallPrimitiveStateRead
(mutable : Pointer.Mutable.t Value.t)
(object : Object.t Value.t)
(k : Object.t Value.t -> LowM.t B) :
IsRead.t stack heap mutable object ->
{{ stack, heap |
k object ⇓
| P_stack, P_heap }} ->
{{ stack, heap |
LowM.CallPrimitive (Primitive.StateRead mutable) k ⇓
| P_stack, P_heap }}
(* [CallClosure] primitive *)
| CallClosure {C : Set}
(f : Value.t -> Value.t -> M)
(args kwargs : Value.t)
(to_value_inter : C -> Value.t + Exception.t)
(P_stack_inter : Stack.t -> Prop) (P_heap_inter : Heap -> Prop)
(k : Value.t + Exception.t -> LowM.t B) :
let closure := Data.Closure f in
{{ stack, heap |
f args kwargs ⇓
| P_stack_inter, P_heap_inter }} ->
(* We quantify over every possible values as we cannot compute the result of the closure here.
We only know that it exists and respects some constraints in this inductive definition. *)
(forall value_inter stack_inter heap_inter,
P_stack_inter stack_inter ->
P_heap_inter heap_inter ->
{{ stack_inter, heap_inter |
k (to_value_inter value_inter)
| P_stack, P_heap }}
) ->
{{ stack, heap |
LowM.CallClosure closure args kwargs k ⇓
| P_stack, P_heap }}
(* ...cases for the other primitives of the monad... *)


In the Pure case we return the final result of the computation. We check the state fulfills the predicate P_stack and P_heap, and that the result is the image by the function to_value of some result.


To read a value in memory, we rely on another predicate IsRead that checks if the mutable pointer is valid in the stack or heap and that the object is the value at this pointer. We then call the continuation k with this object. We have similar rules for allocating a new object in memory and writing at a pointer.

Note that we parameterize all our semantics by `{Heap.Trait} that provides a specific Heap type with read and write primitives. We can choose the implementation of the memory model that we want to use in our simulations in order to simplify the reasoning.


To call a closure, we first evaluate the closure with the arguments and keyword arguments. We then call the continuation k with the result of the closure. We quantify over all possible results of the closure, as we cannot compute it here. This would require to be able to define Fixpoint together with Inductive, which is not possible in Coq. So we only know that the result of the closure exists, and can use the constraints on its result (the function to_value and the predicates P_stack_inter and P_heap_inter) to build a run of the continuation.

The other constructors are not presented here but are similar to the above. We will also add a monadic primitive for loops with the following idea: we show that a loop terminates by building a trace, as traces are Inductive so must be finite. We have no rules for the Impossible case so that building the trace of a computation also shows that the Impossible calls are in unreachable paths.

Example 🔍

We have applied these technique to a small code example with allocation, memory read, and closure call primitives. We were able to show that the resulting simulation obtained by running evaluate on the trace is equal to a simulation written by hand. The proof was just the tactic reflexivity. We believe that we can automate most of the tactics used to build a run, except for the allocations were the user needs to make a choice (immediate, stack, or heap allocation, which address, ...).

To continue our experiments we now need to complete our semantics of Python, especially to take into account method and operator calls.


We have presented an alternative way to build simulations of imperative Python code in purely functional Coq code. The idea is to enable faster reasoning over Python code by removing the need to build explicit simulations. We plan to port this technique to other tools like coq-of-rust as well.

To see what we can do for you talk with us at 🏇. For our previous projects, see our formal verification of the Tezos' L1!