Links
The "links" phase is the first refinement step we do in order to simplify the code generated by coq-of-rust and to give it a semantics.
The main idea is to add back typing and handle naming/trait resolution. These steps are already done by the Rust compiler, but do not translate these information to Rocq with coq-of-rust
as we have not found a reliable way to do so. An example of complexity is the handling of mutually dependent functions or types, that are hard to represent in a formal system such as Rocq. By letting the user do the work, we can have handle more case (in particular, having a translation of the core
library that type-checks), at the expense of more complexity for the user.
On this page we aim to summarize the conventions we follow to define the "links" refinements. A large part of it is automated in tactics, and the rest that is very verbose can be handled by AI autocomplete rather well. You can look at the file revm/revm_interpreter/links/gas.v for a reference example.
Link class
The main file for links definitions is links/M.v. The type-class for links on types is:
Class Link (A : Set) : Set := {
Φ : Ty.t;
φ : A -> Value.t;
}.
This defines, for a Rocq type A
, how it maps to a translated Rust type for two components:
Ty.t
which is the descriptor of the type, used for the resolution of trait instances or associated functions,Value.t
which are the translated values of the type.
Here is as an example the definition for the boolean type:
Global Instance IsLink : Link bool := {
Φ := Ty.path "bool";
φ b := Value.Bool b;
}.
For the result type, which is an example of polymorphic type, we have:
Module Result.
Inductive t (T E : Set) : Set :=
| Ok : T -> t
| Err : E -> t.
Arguments Ok {_ _}.
Arguments Err {_ _}.
Global Instance IsLink (T E : Set) `{Link T} `{Link E} : Link (t T E) := {
Φ := Ty.apply (Ty.path "core::result::Result") [] [Φ T; Φ E];
φ x :=
match x with
| Ok x => Value.StructTuple "core::result::Result::Ok" [φ x]
| Err x => Value.StructTuple "core::result::Result::Err" [φ x]
end;
}.
End Result.
A few remarks:
- We set the link
Instance
asGlobal
instead of export for now for ease of use. - We only have a function from the link types to the base definitions
Ty.t
andValue.t
. We think that it would be hard to define a function that goes in the other direction, as for example it would need to be global from the start and make composition more complex. - The type descriptor for generic types is
Ty.apply
with the path of the type and the list of constant arguments (that are values) and the list of type arguments (that are other type descriptors). - Names generated by
coq-of-rust
as strings are always unique and global. This is one of the nice things provided by the Rust compiler. - We have some conventions to represent Rust values in
Value.t
. You can see the complete definition in M.v:Module Value.
Inductive t : Set :=
| Bool : bool -> t
| Integer (kind : IntegerKind.t) (z : Z) : t
(** For now we do not know how to represent floats so we use a string *)
| Float : string -> t
| UnicodeChar : Z -> t
| String : string -> t
| Tuple : list t -> t
| Array : list t -> t
| StructRecord : string -> list (string * t) -> t
| StructTuple : string -> list t -> t
| Pointer : Pointer.t t -> t
(** The two existential types of the closure must be [Value.t] and [M]. We
cannot enforce this constraint there yet, but we will do when defining the
semantics. *)
| Closure : {'(Value, M) : (Set * Set) @ list Value -> M} -> t
(** A special value that does not appear in the translation, but that we use
to implement primitive functions over values that are not total. We
statically know, from the fact that the source Rust code is well-typed,
that these error values are impossible. In these values appear in a proof,
this might indicate invalid pre-conditions or mistakes in the translation
to Rocq. *)
| Error (message : string)
(** To implement the ability to declare a variable but not give it a value
yet. *)
| DeclaredButUndefined.
End Value.
Link helpers
When we define links, we also add three kinds of operations:
of_ty
to convert a type descriptor back to a Rocq type;of_value_with
to convert a value back to a Rocq value knowing the expected type;of_value
to convert a value back to a Rocq value without knowing the expected type.
Here is the example for the boolean type:
Definition of_ty : OfTy.t (Ty.path "bool").
Proof. eapply OfTy.Make with (A := bool); reflexivity. Defined.
Smpl Add apply of_ty : of_ty.
Lemma of_value_with (b : bool) :
Value.Bool b = φ b.
Proof. reflexivity. Qed.
Smpl Add apply of_value_with : of_value.
Definition of_value (b : bool) :
OfValue.t (Value.Bool b).
Proof.
eapply OfValue.Make with (A := bool); smpl of_value.
Defined.
Smpl Add apply of_value : of_value.
Here is the example for a struct
type:
(*
pub struct MemoryGas {
pub words_num: usize,
pub expansion_cost: u64,
}
*)
Module MemoryGas.
Record t : Set := {
words_num : Usize.t;
expansion_cost : U64.t;
}.
Global Instance IsLink : Link t := {
Φ := Ty.path "revm_interpreter::gas::MemoryGas";
φ x :=
Value.StructRecord "revm_interpreter::gas::MemoryGas" [
("words_num", φ x.(words_num));
("expansion_cost", φ x.(expansion_cost))
];
}.
Definition of_ty : OfTy.t (Ty.path "revm_interpreter::gas::MemoryGas").
Proof. eapply OfTy.Make with (A := t); reflexivity. Defined.
Smpl Add apply of_ty : of_ty.
Lemma of_value_with words_num words_num' expansion_cost expansion_cost' :
words_num' = φ words_num ->
expansion_cost' = φ expansion_cost ->
Value.StructRecord "revm_interpreter::gas::MemoryGas" [
("words_num", words_num');
("expansion_cost", expansion_cost')
] = φ (Build_t words_num expansion_cost).
Proof. now intros; subst. Qed.
Smpl Add apply of_value_with : of_value.
Definition of_value (words_num : Usize.t) words_num' (expansion_cost : U64.t) expansion_cost' :
words_num' = φ words_num ->
expansion_cost' = φ expansion_cost ->
OfValue.t (
Value.StructRecord "revm_interpreter::gas::MemoryGas" [
("words_num", words_num');
("expansion_cost", expansion_cost')
]
).
Proof. econstructor; apply of_value_with; eassumption. Defined.
Smpl Add apply of_value : of_value.
End MemoryGas.
A few notes:
- We use the
Smpl
tactic to add the conversion to the databaseof_ty
orof_value
of thesmpl
tactic. This is a tactic that tries to apply all the conversion rules in the database, what will be useful later in the proofs. We could also use type-classes for it, and it may be more efficient, especially as type-classes have a mechanism to control which hints are available in the current context (with the default export visibility of instances). - These definitions are very verbose but can be efficiently generated by AI autocomplete. We also a Python script to generate them that helps us sometimes, but it not complete yet.
- Most of the definitions are straightforward, but some might need a custom solutions, especially if the type has some mutual dependencies with other types.
- Some cases are still unclear. For example, how to handle the
None
constructor as it can be used forOption<A>
for anyA
? For now, we have no automation for this case. It will probably require generating more annotation in the translated Rust code, to make explicit the type of constructors.
Sub-pointer helpers
Computing pointers to structure fields or, in a functional programming vocabulary, lens to fields of structures is an important part of our work. For each datatype which has a notion of fields, we define a module with all the possible lens functions and a proof that they are correct, meaning compatible with how sub-fields are computed on the Rust translation. If we take the MemoryGas.t
example from above, we get:
Module SubPointer.
Definition get_words_num : SubPointer.Runner.t t
(Pointer.Index.StructRecord "revm_interpreter::gas::MemoryGas" "words_num") :=
{|
SubPointer.Runner.projection x := Some x.(words_num);
SubPointer.Runner.injection x y := Some (x <| words_num := y |>);
|}.
Lemma get_words_num_is_valid :
SubPointer.Runner.Valid.t get_words_num.
Proof.
now constructor.
Qed.
Smpl Add apply get_words_num_is_valid : run_sub_pointer.
Definition get_expansion_cost : SubPointer.Runner.t t
(Pointer.Index.StructRecord "revm_interpreter::gas::MemoryGas" "expansion_cost") :=
{|
SubPointer.Runner.projection x := Some x.(expansion_cost);
SubPointer.Runner.injection x y := Some (x <| expansion_cost := y |>);
|}.
Lemma get_expansion_cost_is_valid :
SubPointer.Runner.Valid.t get_expansion_cost.
Proof.
now constructor.
Qed.
Smpl Add apply get_expansion_cost_is_valid : run_sub_pointer.
End SubPointer.
The sub-pointer definition is a lens annotated by an index generated in the Rust translation. They are also added to an Smpl
database in order to be used automatically later in the proofs.
Expressions
Now that we have seen how to define link structures for types and values, we are to going to see how we make these definitions for expressions. To be able to understand what is going on and debug cases that might fail, it is interesting to look at:
- The definition of the monad
M
that we use to represent expressions with side-effects in the Rocq translation. This is defined in M.v. - The definition of the inductive rules
Run.t
to add typing and name/trait resolution in links/M.v. - The definition of the tactic
run_symbolic
also in links/M.v that is used to automatically apply the rules ofRun.t
to the generated code, when possible.
The general idea is to go step-by-step through the whole expressions that are used to define functions in Rust to add the necessary information for typing and name/trait resolution in a way that is:
- Proportional to the size of the expressions. For example, there are
if
expressions in sequence, we should have rules to handle them and not . - Using the unification mechanism of Rocq at a maximum, as well as the automations capabilities such as type-classes inference or custom tactics.
Predicates
Here is the general predicate to say that a translated expression e
can be seen as evaluating to a certain typed value (that we do not compute at all here) of a certain type Output
:
{{ e 🔽 R , Output }}
Here the type R
represent the type of returned values in case there is a return
statement in the Rust code. For top-level expressions, like the body of a function, the return type is the same as the output type, so we can use this shortcut:
{{ e 🔽 Output }}
This predicate has an inductive definition and generated automatically by the run_symbolic
tactic for all common cases.
Functions
Top-level Rust functions can always be polymorphic with both constant and type parameters. Combined with their list of runtime arguments, this means Rust functions always have three lists of parameters. We define the type of a translated function as:
Module PolymorphicFunction.
Definition t : Set :=
list Value.t -> list Ty.t -> list Value.t -> M.
End PolymorphicFunction.
Because we often apply the link predicate to such functions, and in order to make it a type-class to add automation with the instance inference mechanism, we define the following type-class:
Module Run.
Class Trait
(f : PolymorphicFunction.t)
(ε : list Value.t)
(τ : list Ty.t)
(α : list Value.t)
(Output : Set) `{Link Output} :
Set :=
{
run_f : {{ f ε τ α 🔽 Output }};
}.
End Run.
Then, for top-level definitions or functions in an impl
block (but not for impl for
blocks that are trait instances), we make the following statement:
Instance run_spent (self : Ref.t Pointer.Kind.Ref Self) :
Run.Trait gas.Impl_revm_interpreter_gas_Gas.spent [] [] [φ self] U64.t.
for a function with a signature such like:
pub const fn spent(&self) -> u64 {
self.limit - self.remaining
}
Some remarks:
- The name
gas.Impl_revm_interpreter_gas_Gas.spent
is the name of the function in the translated Rust code. You need to find it in the corresponding.v
file. - The two first lists of parameters are empty
[]
as there are no constant or type parameters. - The last list of parameters is
[φ self]
as the function takes a single parameter namedself
. - The output type is
U64.t
as the function returns au64
in Rust. - The type of
self
isRef.t Pointer.Kind.Ref Self
as it is a read-only reference to the typeSelf
in Rust. - We use an
Instance
so that the proof that the function can be evaluated to the expected type can be automatically reused in other statements. We use the default visibility ofInstance
which is export. It means that to use this instance in another file, you need to import its module withImport
. This helps to reduce the size of the exploration space of the type-class inference mechanism.
To construct this instance, we use the proof mode of Rocq with:
Proof.
constructor.
run_symbolic.
Defined.
THe fact that we must always start by constructor
prevents the run_symbolic
tactic to go into the body of a function that we are calling. Instead, we should have one link per function and reuse existing ones when we call another function.
For impl
block, the corresponding module in Rocq should start with:
Module Impl_TypeName.
Definition Self : Set :=
TypeName.t.
(* The instances... *)
End Impl_TypeName.
The run_symbolic
tactic might not always work fully automatically. When you encounter such cases, here are some strategies to follow:
- Check if the types that you expect for your function are correct. Maybe you have made a typo there.
- If you are at the beginning of an
if
ormatch
construct, you will first need to manually do adestruct
or similar to explore all cases. - You might need to
Import
the modules where are defined the functions yuo are calling, so that their instances can be used by the type-class inference mechanism. - You might be missing some link definitions or helpers for your types.
A good way to debug is to look at the rules of Run.t
to see which one should have been applied, and why it failed.
Traits
The Rust traits are a bit more complicated. There are two distinct parts:
- The trait definition itself, that is a list of associated types and functions. This is translated to a type-class in Rocq.
- The trait implementation, that is a list of function implementations for a certain trait and
Self
type.
Definition
There are a lot of trait definition examples in the file revm/revm_interpreter/links/interpreter_types.v. Here is one of these:
(*
pub trait LegacyBytecode {
fn bytecode_len(&self) -> usize;
fn bytecode_slice(&self) -> &[u8];
}
*)
Module LegacyBytecode.
Definition trait (Self : Set) `{Link Self} : TraitMethod.Header.t :=
("revm_interpreter::interpreter_types::LegacyBytecode", [], [], Φ Self).
Definition Run_bytecode_len (Self : Set) `{Link Self} : Set :=
TraitMethod.C (trait Self) "bytecode_len" (fun method =>
forall (self : Ref.t Pointer.Kind.Ref Self),
Run.Trait method [] [] [ φ self ] Usize.t
).
Definition Run_bytecode_slice (Self : Set) `{Link Self} : Set :=
TraitMethod.C (trait Self) "bytecode_slice" (fun method =>
forall (self : Ref.t Pointer.Kind.Ref Self),
Run.Trait method [] [] [ φ self ] (Ref.t Pointer.Kind.Ref (list U8.t))
).
Class Run (Self : Set) `{Link Self} : Set := {
bytecode_len : Run_bytecode_len Self;
bytecode_slice : Run_bytecode_slice Self;
}.
End LegacyBytecode.
We first define what we expect for each method of the trait independently, and then group all these definitions in the Run
type-class for the trait definition. Splitting what is required for each method is useful for when we define a trait instance where some methods are calling other ones, so before the whole trait is instantiated.
The definition trait
is for the header of the trait, meaning:
- Its global unique name, that needs to match the one in the generated Rust code in order to make progress with
run_symbolic
. - The list of constant parameters, that is empty here.
- The list of type parameters, that is empty here.
- The type descriptor of the
Self
type.
Note that in Rust methods can be polymorphic in addition of the trait itself. These are two different polymorphisms. We handle the trait polymorphism in the definition of trait
, and the method polymorphism in the definitions of the Run_*
for the individual methods. In this example, nothing is polymorphic expect for the Self
type, that is always a parameter in trait definitions.
Instances
For trait instances, you can see an example with an instance of the Default
trait for the Gas.t
type in the gas file revm/revm_interpreter/links/gas.v:
Module Impl_Default_for_Gas.
Definition run_default : Default.Run_default Gas.t.
Proof.
eexists.
{ eapply IsTraitMethod.Defined.
{ apply gas.Impl_core_default_Default_for_revm_interpreter_gas_Gas.Implements. }
{ reflexivity. }
}
{ constructor.
pose (default.Impl_Default_for_integer.run_default IntegerKind.U64).
pose (default.Impl_Default_for_integer.run_default IntegerKind.I64).
pose Impl_Default_for_MemoryGas.run_default.
run_symbolic.
}
Defined.
Instance run : Default.Run Gas.t := {
Default.default := run_default;
}.
End Impl_Default_for_Gas.
Here there is some administrative work that needs to be done for the instances (we should automate it more later). The command:
apply gas.Impl_core_default_Default_for_revm_interpreter_gas_Gas.Implements.
relates to the name of the trait instance in the generated Rust code. In must be correct at this point and is verified by Rocq. Note that names in the generated code are sometimes long, but stable on code changes (no added numbers to avoid collisions for example).
Something interesting is that before the run_symbolic
we also do:
pose (default.Impl_Default_for_integer.run_default IntegerKind.U64).
pose (default.Impl_Default_for_integer.run_default IntegerKind.I64).
pose Impl_Default_for_MemoryGas.run_default.
run_symbolic.
This is to have the link of the instances of the trait methods we will call. For now, they need to be explicitly in context. You could have also done a destruct of the whole instance of the trait, with for example:
destruct (default.Impl_Default_for_integer.run IntegerKind.U64).
as it reveals all the methods.
Associated types
For short, we represent associated types in traits as additional parameters for the trait. Here is an example with the Deref
trait:
(*
pub trait Deref {
type Target: ?Sized;
fn deref(&self) -> &Self::Target;
}
*)
Module Deref.
Definition trait (Self : Set) `{Link Self} : TraitMethod.Header.t :=
("core::ops::deref::Deref", [], [], Φ Self).
Definition Run_deref
(Self : Set) `{Link Self}
(Target : Set) `{Link Target} :
Set :=
TraitMethod.C (trait Self) "deref" (fun method =>
forall (self : Ref.t Pointer.Kind.Ref Self),
Run.Trait method [] [] [ φ self ] (Ref.t Pointer.Kind.Ref Target)
).
Class Run
(Self : Set) `{Link Self}
(Target : Set) `{Link Target} :
Set := {
deref : Run_deref Self Target;
}.
End Deref.
The reason is that we can force these associated types to have a certain value in Rust. In that case, we can provide this value as a parameter. If this was not the case, we would be using existential types that are well supported in Rocq.
When a trait depends on other traits, for its Self
type, polymorphic types, or associated types, and when these types have associated types, we propagate these types parameters. For example, for the DerefMut
trait:
(*
pub trait DerefMut: Deref {
fn deref_mut(&mut self) -> &mut Self::Target;
}
*)
Module DerefMut.
Definition trait (Self : Set) `{Link Self} : TraitMethod.Header.t :=
("core::ops::deref::DerefMut", [], [], Φ Self).
Definition Run_deref_mut
(Self : Set) `{Link Self}
(Target : Set) `{Link Target} :
Set :=
TraitMethod.C (trait Self) "deref_mut" (fun method =>
forall (self : Ref.t Pointer.Kind.MutRef Self),
Run.Trait method [] [] [ φ self ] (Ref.t Pointer.Kind.MutRef Target)
).
Class Run
(Self : Set) `{Link Self}
(Target : Set) `{Link Target} :
Set := {
deref_mut : Run_deref_mut Self Target;
}.
End DerefMut.
When associated types become too numerous (as these parameters are not visible on the Rust code, the programmers might introduce a lot of them), a pattern is to group them in a single record of types. You can see this approach used at the end of the file revm/revm_interpreter/links/interpreter_types.v with the definition of the InterpreterTypes
trait. Here is an example of use of this trait, in the revm/revm_interpreter/instructions/links/arithmetic.v file:
(*
pub fn add<WIRE: InterpreterTypes, H: Host + ?Sized>(
interpreter: &mut Interpreter<WIRE>,
_host: &mut H,
)
*)
Instance run_add
{WIRE H : Set} `{Link WIRE} `{Link H}
{WIRE_types : InterpreterTypes.Types.t} `{InterpreterTypes.Types.AreLinks WIRE_types}
(run_InterpreterTypes_for_WIRE : InterpreterTypes.Run WIRE WIRE_types)
(interpreter : Ref.t Pointer.Kind.MutRef (Interpreter.t WIRE WIRE_types))
(_host : Ref.t Pointer.Kind.MutRef H) :
Run.Trait
instructions.arithmetic.add [] [ Φ WIRE; Φ H ] [ φ interpreter; φ _host ]
unit.
We do not reference the Host
trait in the example above as the variable _host
is not used.