Links

The "links" phase is the first refinement step we do in order to simplify the code generated by coq-of-rust and to give it a semantics.

The main idea is to add back typing and handle naming/trait resolution. These steps are already done by the Rust compiler, but do not translate these information to Rocq with coq-of-rust as we have not found a reliable way to do so. An example of complexity is the handling of mutually dependent functions or types, that are hard to represent in a formal system such as Rocq. By letting the user do the work, we can have handle more case (in particular, having a translation of the core library that type-checks), at the expense of more complexity for the user.

On this page we aim to summarize the conventions we follow to define the "links" refinements. A large part of it is automated in tactics, and the rest that is very verbose can be handled by AI autocomplete rather well. You can look at the file revm/revm_interpreter/links/gas.v for a reference example.

Link class

The main file for links definitions is links/M.v. The type-class for links on types is:

Class Link (A : Set) : Set := {
  Φ : Ty.t;
  φ : A -> Value.t;
}.

This defines, for a Rocq type A, how it maps to a translated Rust type for two components:

Ty.t which is the descriptor of the type, used for the resolution of trait instances or associated functions,
Value.t which are the translated values of the type.

Here is as an example the definition for the boolean type:

Global Instance IsLink : Link bool := {
  Φ := Ty.path "bool";
  φ b := Value.Bool b;
}.

For the result type, which is an example of polymorphic type, we have:

Module Result.
  Inductive t (T E : Set) : Set :=
  | Ok : T -> t
  | Err : E -> t.
  Arguments Ok {_ _}.
  Arguments Err {_ _}.

  Global Instance IsLink (T E : Set) `{Link T} `{Link E} : Link (t T E) := {
    Φ := Ty.apply (Ty.path "core::result::Result") [] [Φ T; Φ E];
    φ x :=
      match x with
      | Ok x => Value.StructTuple "core::result::Result::Ok" [φ x]
      | Err x => Value.StructTuple "core::result::Result::Err" [φ x]
      end;
  }.
End Result.

A few remarks:

We set the link Instance as Global instead of export for now for ease of use.
We only have a function from the link types to the base definitions Ty.t and Value.t. We think that it would be hard to define a function that goes in the other direction, as for example it would need to be global from the start and make composition more complex.
The type descriptor for generic types is Ty.apply with the path of the type and the list of constant arguments (that are values) and the list of type arguments (that are other type descriptors).
Names generated by coq-of-rust as strings are always unique and global. This is one of the nice things provided by the Rust compiler.

We have some conventions to represent Rust values in Value.t. You can see the complete definition in M.v:

Module Value.
  Inductive t : Set :=
  | Bool : bool -> t
  | Integer (kind : IntegerKind.t) (z : Z) : t
  (** For now we do not know how to represent floats so we use a string *)
  | Float : string -> t
  | UnicodeChar : Z -> t
  | String : string -> t
  | Tuple : list t -> t
  | Array : list t -> t
  | StructRecord : string -> list (string * t) -> t
  | StructTuple : string -> list t -> t
  | Pointer : Pointer.t t -> t
  (** The two existential types of the closure must be [Value.t] and [M]. We
      cannot enforce this constraint there yet, but we will do when defining the
      semantics. *)
  | Closure : {'(Value, M) : (Set * Set) @ list Value -> M} -> t
  (** A special value that does not appear in the translation, but that we use
      to implement primitive functions over values that are not total. We
      statically know, from the fact that the source Rust code is well-typed,
      that these error values are impossible. In these values appear in a proof,
      this might indicate invalid pre-conditions or mistakes in the translation
      to Rocq. *)
  | Error (message : string)
  (** To implement the ability to declare a variable but not give it a value
      yet. *)
  | DeclaredButUndefined.
End Value.

Link helpers

When we define links, we also add three kinds of operations:

of_ty to convert a type descriptor back to a Rocq type;
of_value_with to convert a value back to a Rocq value knowing the expected type;
of_value to convert a value back to a Rocq value without knowing the expected type.

Here is the example for the boolean type:

Definition of_ty : OfTy.t (Ty.path "bool").
Proof. eapply OfTy.Make with (A := bool); reflexivity. Defined.
Smpl Add apply of_ty : of_ty.

Lemma of_value_with (b : bool) :
  Value.Bool b = φ b.
Proof. reflexivity. Qed.
Smpl Add apply of_value_with : of_value.

Definition of_value (b : bool) :
  OfValue.t (Value.Bool b).
Proof.
  eapply OfValue.Make with (A := bool); smpl of_value.
Defined.
Smpl Add apply of_value : of_value.

Here is the example for a struct type:

(*
pub struct MemoryGas {
    pub words_num: usize,
    pub expansion_cost: u64,
}
*)
Module MemoryGas.
  Record t : Set := {
    words_num : Usize.t;
    expansion_cost : U64.t;
  }.

  Global Instance IsLink : Link t := {
    Φ := Ty.path "revm_interpreter::gas::MemoryGas";
    φ x :=
      Value.StructRecord "revm_interpreter::gas::MemoryGas" [
        ("words_num", φ x.(words_num));
        ("expansion_cost", φ x.(expansion_cost))
      ];
  }.

  Definition of_ty : OfTy.t (Ty.path "revm_interpreter::gas::MemoryGas").
  Proof. eapply OfTy.Make with (A := t); reflexivity. Defined.
  Smpl Add apply of_ty : of_ty.

  Lemma of_value_with words_num words_num' expansion_cost expansion_cost' :
    words_num' = φ words_num ->
    expansion_cost' = φ expansion_cost ->
    Value.StructRecord "revm_interpreter::gas::MemoryGas" [
      ("words_num", words_num');
      ("expansion_cost", expansion_cost')
    ] = φ (Build_t words_num expansion_cost).
  Proof. now intros; subst. Qed.
  Smpl Add apply of_value_with : of_value.

  Definition of_value (words_num : Usize.t) words_num' (expansion_cost : U64.t) expansion_cost' :
    words_num' = φ words_num ->
    expansion_cost' = φ expansion_cost ->
    OfValue.t (
      Value.StructRecord "revm_interpreter::gas::MemoryGas" [
        ("words_num", words_num');
        ("expansion_cost", expansion_cost')
      ]
    ).
  Proof. econstructor; apply of_value_with; eassumption. Defined.
  Smpl Add apply of_value : of_value.
End MemoryGas.

A few notes:

We use the Smpl tactic to add the conversion to the database of_ty or of_value of the smpl tactic. This is a tactic that tries to apply all the conversion rules in the database, what will be useful later in the proofs. We could also use type-classes for it, and it may be more efficient, especially as type-classes have a mechanism to control which hints are available in the current context (with the default export visibility of instances).
These definitions are very verbose but can be efficiently generated by AI autocomplete. We also a Python script to generate them that helps us sometimes, but it not complete yet.
Most of the definitions are straightforward, but some might need a custom solutions, especially if the type has some mutual dependencies with other types.
Some cases are still unclear. For example, how to handle the None constructor as it can be used for Option<A> for any A? For now, we have no automation for this case. It will probably require generating more annotation in the translated Rust code, to make explicit the type of constructors.

Sub-pointer helpers

Computing pointers to structure fields or, in a functional programming vocabulary, lens to fields of structures is an important part of our work. For each datatype which has a notion of fields, we define a module with all the possible lens functions and a proof that they are correct, meaning compatible with how sub-fields are computed on the Rust translation. If we take the MemoryGas.t example from above, we get:

Module SubPointer.
  Definition get_words_num : SubPointer.Runner.t t
    (Pointer.Index.StructRecord "revm_interpreter::gas::MemoryGas" "words_num") :=
  {|
    SubPointer.Runner.projection x := Some x.(words_num);
    SubPointer.Runner.injection x y := Some (x <| words_num := y |>);
  |}.

  Lemma get_words_num_is_valid :
    SubPointer.Runner.Valid.t get_words_num.
  Proof.
    now constructor.
  Qed.
  Smpl Add apply get_words_num_is_valid : run_sub_pointer.

  Definition get_expansion_cost : SubPointer.Runner.t t
    (Pointer.Index.StructRecord "revm_interpreter::gas::MemoryGas" "expansion_cost") :=
  {|
    SubPointer.Runner.projection x := Some x.(expansion_cost);
    SubPointer.Runner.injection x y := Some (x <| expansion_cost := y |>);
  |}.

  Lemma get_expansion_cost_is_valid :
    SubPointer.Runner.Valid.t get_expansion_cost.
  Proof.
    now constructor.
  Qed.
  Smpl Add apply get_expansion_cost_is_valid : run_sub_pointer.
End SubPointer.

The sub-pointer definition is a lens annotated by an index generated in the Rust translation. They are also added to an Smpl database in order to be used automatically later in the proofs.

Expressions

Now that we have seen how to define link structures for types and values, we are to going to see how we make these definitions for expressions. To be able to understand what is going on and debug cases that might fail, it is interesting to look at:

The definition of the monad M that we use to represent expressions with side-effects in the Rocq translation. This is defined in M.v.
The definition of the inductive rules Run.t to add typing and name/trait resolution in links/M.v.
The definition of the tactic run_symbolic also in links/M.v that is used to automatically apply the rules of Run.t to the generated code, when possible.

The general idea is to go step-by-step through the whole expressions that are used to define functions in Rust to add the necessary information for typing and name/trait resolution in a way that is:

Proportional to the size of the expressions. For example, there are $n$ if expressions in sequence, we should have $n$ rules to handle them and not $2^n$ .
Using the unification mechanism of Rocq at a maximum, as well as the automations capabilities such as type-classes inference or custom tactics.

Predicates

Here is the general predicate to say that a translated expression e can be seen as evaluating to a certain typed value (that we do not compute at all here) of a certain type Output:

{{ e 🔽 R , Output }}

Here the type R represent the type of returned values in case there is a return statement in the Rust code. For top-level expressions, like the body of a function, the return type is the same as the output type, so we can use this shortcut:

{{ e 🔽 Output }}

This predicate has an inductive definition and generated automatically by the run_symbolic tactic for all common cases.

Functions

Top-level Rust functions can always be polymorphic with both constant and type parameters. Combined with their list of runtime arguments, this means Rust functions always have three lists of parameters. We define the type of a translated function as:

Module PolymorphicFunction.
  Definition t : Set :=
    list Value.t -> list Ty.t -> list Value.t -> M.
End PolymorphicFunction.

Because we often apply the link predicate to such functions, and in order to make it a type-class to add automation with the instance inference mechanism, we define the following type-class:

Module Run.
  Class Trait
      (f : PolymorphicFunction.t)
      (ε : list Value.t)
      (τ : list Ty.t)
      (α : list Value.t)
      (Output : Set) `{Link Output} :
      Set :=
  {
    run_f : {{ f ε τ α 🔽 Output }};
  }.
End Run.

Then, for top-level definitions or functions in an impl block (but not for impl for blocks that are trait instances), we make the following statement:

Instance run_spent (self : Ref.t Pointer.Kind.Ref Self) :
  Run.Trait gas.Impl_revm_interpreter_gas_Gas.spent [] [] [φ self] U64.t.

for a function with a signature such like:

pub const fn spent(&self) -> u64 {
    self.limit - self.remaining
}

Some remarks:

The name gas.Impl_revm_interpreter_gas_Gas.spent is the name of the function in the translated Rust code. You need to find it in the corresponding .v file.
The two first lists of parameters are empty [] as there are no constant or type parameters.
The last list of parameters is [φ self] as the function takes a single parameter named self.
The output type is U64.t as the function returns a u64 in Rust.
The type of self is Ref.t Pointer.Kind.Ref Self as it is a read-only reference to the type Self in Rust.
We use an Instance so that the proof that the function can be evaluated to the expected type can be automatically reused in other statements. We use the default visibility of Instance which is export. It means that to use this instance in another file, you need to import its module with Import. This helps to reduce the size of the exploration space of the type-class inference mechanism.

To construct this instance, we use the proof mode of Rocq with:

Proof.
  constructor.
  run_symbolic.
Defined.

THe fact that we must always start by constructor prevents the run_symbolic tactic to go into the body of a function that we are calling. Instead, we should have one link per function and reuse existing ones when we call another function.

For impl block, the corresponding module in Rocq should start with:

Module Impl_TypeName.
  Definition Self : Set :=
    TypeName.t.

  (* The instances... *)
End Impl_TypeName.

The run_symbolic tactic might not always work fully automatically. When you encounter such cases, here are some strategies to follow:

Check if the types that you expect for your function are correct. Maybe you have made a typo there.
If you are at the beginning of an if or match construct, you will first need to manually do a destruct or similar to explore all cases.
You might need to Import the modules where are defined the functions yuo are calling, so that their instances can be used by the type-class inference mechanism.
You might be missing some link definitions or helpers for your types.

A good way to debug is to look at the rules of Run.t to see which one should have been applied, and why it failed.

Traits

The Rust traits are a bit more complicated. There are two distinct parts:

The trait definition itself, that is a list of associated types and functions. This is translated to a type-class in Rocq.
The trait implementation, that is a list of function implementations for a certain trait and Self type.

Definition

There are a lot of trait definition examples in the file revm/revm_interpreter/links/interpreter_types.v. Here is one of these:

(*
pub trait LegacyBytecode {
    fn bytecode_len(&self) -> usize;
    fn bytecode_slice(&self) -> &[u8];
}
*)
Module LegacyBytecode.
  Definition trait (Self : Set) `{Link Self} : TraitMethod.Header.t :=
    ("revm_interpreter::interpreter_types::LegacyBytecode", [], [], Φ Self).

  Definition Run_bytecode_len (Self : Set) `{Link Self} : Set :=
    TraitMethod.C (trait Self) "bytecode_len" (fun method =>
      forall (self : Ref.t Pointer.Kind.Ref Self),
      Run.Trait method [] [] [ φ self ] Usize.t
    ).

  Definition Run_bytecode_slice (Self : Set) `{Link Self} : Set :=
    TraitMethod.C (trait Self) "bytecode_slice" (fun method =>
      forall (self : Ref.t Pointer.Kind.Ref Self),
      Run.Trait method [] [] [ φ self ] (Ref.t Pointer.Kind.Ref (list U8.t))
    ).

  Class Run (Self : Set) `{Link Self} : Set := {
    bytecode_len : Run_bytecode_len Self;
    bytecode_slice : Run_bytecode_slice Self;
  }.
End LegacyBytecode.

We first define what we expect for each method of the trait independently, and then group all these definitions in the Run type-class for the trait definition. Splitting what is required for each method is useful for when we define a trait instance where some methods are calling other ones, so before the whole trait is instantiated.

The definition trait is for the header of the trait, meaning:

Its global unique name, that needs to match the one in the generated Rust code in order to make progress with run_symbolic.
The list of constant parameters, that is empty here.
The list of type parameters, that is empty here.
The type descriptor of the Self type.

Note that in Rust methods can be polymorphic in addition of the trait itself. These are two different polymorphisms. We handle the trait polymorphism in the definition of trait, and the method polymorphism in the definitions of the Run_* for the individual methods. In this example, nothing is polymorphic expect for the Self type, that is always a parameter in trait definitions.

Instances

For trait instances, you can see an example with an instance of the Default trait for the Gas.t type in the gas file revm/revm_interpreter/links/gas.v:

Module Impl_Default_for_Gas.
  Definition run_default : Default.Run_default Gas.t.
  Proof.
    eexists.
    { eapply IsTraitMethod.Defined.
      { apply gas.Impl_core_default_Default_for_revm_interpreter_gas_Gas.Implements. }
      { reflexivity. }
    }
    { constructor.
      pose (default.Impl_Default_for_integer.run_default IntegerKind.U64).
      pose (default.Impl_Default_for_integer.run_default IntegerKind.I64).
      pose Impl_Default_for_MemoryGas.run_default.
      run_symbolic.
    }
  Defined.

  Instance run : Default.Run Gas.t := {
    Default.default := run_default;
  }.
End Impl_Default_for_Gas.

Here there is some administrative work that needs to be done for the instances (we should automate it more later). The command:

apply gas.Impl_core_default_Default_for_revm_interpreter_gas_Gas.Implements.

relates to the name of the trait instance in the generated Rust code. In must be correct at this point and is verified by Rocq. Note that names in the generated code are sometimes long, but stable on code changes (no added numbers to avoid collisions for example).

Something interesting is that before the run_symbolic we also do:

pose (default.Impl_Default_for_integer.run_default IntegerKind.U64).
pose (default.Impl_Default_for_integer.run_default IntegerKind.I64).
pose Impl_Default_for_MemoryGas.run_default.
run_symbolic.

This is to have the link of the instances of the trait methods we will call. For now, they need to be explicitly in context. You could have also done a destruct of the whole instance of the trait, with for example:

destruct (default.Impl_Default_for_integer.run IntegerKind.U64).

as it reveals all the methods.

Associated types

For short, we represent associated types in traits as additional parameters for the trait. Here is an example with the Deref trait:

(*
pub trait Deref {
    type Target: ?Sized;
    fn deref(&self) -> &Self::Target;
}
*)
Module Deref.
  Definition trait (Self : Set) `{Link Self} : TraitMethod.Header.t :=
    ("core::ops::deref::Deref", [], [], Φ Self).

  Definition Run_deref
      (Self : Set) `{Link Self}
      (Target : Set) `{Link Target} :
      Set :=
    TraitMethod.C (trait Self) "deref" (fun method =>
      forall (self : Ref.t Pointer.Kind.Ref Self),
      Run.Trait method [] [] [ φ self ] (Ref.t Pointer.Kind.Ref Target)
    ).

  Class Run
      (Self : Set) `{Link Self}
      (Target : Set) `{Link Target} :
      Set := {
    deref : Run_deref Self Target;
  }.
End Deref.

The reason is that we can force these associated types to have a certain value in Rust. In that case, we can provide this value as a parameter. If this was not the case, we would be using existential types that are well supported in Rocq.

When a trait depends on other traits, for its Self type, polymorphic types, or associated types, and when these types have associated types, we propagate these types parameters. For example, for the DerefMut trait:

(*
pub trait DerefMut: Deref {
    fn deref_mut(&mut self) -> &mut Self::Target;
}
*)
Module DerefMut.
  Definition trait (Self : Set) `{Link Self} : TraitMethod.Header.t :=
    ("core::ops::deref::DerefMut", [], [], Φ Self).

  Definition Run_deref_mut
      (Self : Set) `{Link Self}
      (Target : Set) `{Link Target} :
      Set :=
    TraitMethod.C (trait Self) "deref_mut" (fun method =>
      forall (self : Ref.t Pointer.Kind.MutRef Self),
      Run.Trait method [] [] [ φ self ] (Ref.t Pointer.Kind.MutRef Target)
    ).

  Class Run
      (Self : Set) `{Link Self}
      (Target : Set) `{Link Target} :
      Set := {
    deref_mut : Run_deref_mut Self Target;
  }.
End DerefMut.

When associated types become too numerous (as these parameters are not visible on the Rust code, the programmers might introduce a lot of them), a pattern is to group them in a single record of types. You can see this approach used at the end of the file revm/revm_interpreter/links/interpreter_types.v with the definition of the InterpreterTypes trait. Here is an example of use of this trait, in the revm/revm_interpreter/instructions/links/arithmetic.v file:

(*
pub fn add<WIRE: InterpreterTypes, H: Host + ?Sized>(
    interpreter: &mut Interpreter<WIRE>,
    _host: &mut H,
)
*)
Instance run_add
    {WIRE H : Set} `{Link WIRE} `{Link H}
    {WIRE_types : InterpreterTypes.Types.t} `{InterpreterTypes.Types.AreLinks WIRE_types}
    (run_InterpreterTypes_for_WIRE : InterpreterTypes.Run WIRE WIRE_types)
    (interpreter : Ref.t Pointer.Kind.MutRef (Interpreter.t WIRE WIRE_types))
    (_host : Ref.t Pointer.Kind.MutRef H) :
  Run.Trait
    instructions.arithmetic.add [] [ Φ WIRE; Φ H ] [ φ interpreter; φ _host ]
    unit.

We do not reference the Host trait in the example above as the variable _host is not used.

Link class​

Link helpers​

Sub-pointer helpers​

Expressions​

Predicates​

Functions​

Traits​

Definition​

Instances​

Associated types​