Library stlc

This file models a very simple functional language known as the simply-typed lambda calculus (with nat's).
Require Import String.
Require Import List.

We only have two kinds of types: natural numbers and functions.
Inductive type : Type :=
| Nat_t : type
| Arrow_t : type -> type -> type.

We will represent variables as strings -- later, we'll discuss alternative representations and the problems that arise when using simple first-order data structures to represent and reason about variables.
Definition var := string.

Expressions have only 4 forms: variables, constants, functions, and function calls.
Inductive exp : Type :=
| Var_e : var -> exp
| Num_e : nat -> exp
| Lam_e : var -> exp -> exp
| App_e : exp -> exp -> exp.

When we evaluate a program, we get out a value. Values are a subset of the expressions and include just numbers and functions.
Definition value(e:exp) : Prop :=
  match e with
    | Num_e _ => True
    | Lam_e _ _ => True
    | _ => False
  end.

We'll model function call using substitution. Note that this is not a form of capture-avoiding substitution, so it assumes that the free variables of v do not conflict with the bound variables in e. Here, we're only going to consider substituting closed expressions (those with no free variables) so this won't be an issue.
Fixpoint subst(v:exp)(x:var)(e:exp) :=
  match e with
    | Var_e y => if string_dec x y then v else Var_e y
    | Num_e n => Num_e n
    | Lam_e y e => if string_dec x y then Lam_e y e else Lam_e y (subst v x e)
    | App_e e1 e2 => App_e (subst v x e1) (subst v x e2)
  end.

A little notation for substitution.
Notation "e @ [ x '|->' v ] " := (subst v x e) (at level 60).

We reserve some notation for use in the definition below. This will be our big-step semantics for expressions and is intended to mean that expression e evaluates to value v. Notice that here we are modeling call-by-name, whereas most languages use call-by-value. In particular, we don't evaluate the argument before substituting it for a formal parameter. In a call-by-value setting, we evaluate the argument before substitution. In a call-by-need setting (e.g., Haskell) we have to do something much more complicated.
Reserved Notation "e ==> v" (at level 70).

Inductive cbn_evals : exp -> exp -> Prop :=
| Num_cbn : forall n, Num_e n ==> Num_e n
| Lam_cbn : forall x e, Lam_e x e ==> Lam_e x e
| App_cbn : forall e1 e2 x e1' v,
  e1 ==> Lam_e x e1' ->
  e1' @ [ x |-> e2 ] ==> v ->
    App_e e1 e2 ==> v
where "e ==> v" := (cbn_evals e v).

Lemma evals_value : forall e1 e2, cbn_evals e1 e2 -> value e2.
Proof.
  induction 1 ; subst ; simpl ; auto.
Qed.

Let us now define a small-step semantics for our language. We begin by defining the notion of a single step.
Reserved Notation "e1 --> e2" (at level 70).

Inductive cbn_step : exp -> exp -> Prop :=
| Beta_cbn : forall x e1 e2,
  App_e (Lam_e x e1) e2 --> e1 @ [ x |-> e2 ]
| App_steps : forall e1 e2 e1',
  e1 --> e1' ->
  App_e e1 e2 --> App_e e1' e2
where "e1 --> e2" := (cbn_step e1 e2).

We then extend this to a reflexive, transitive closure of the single-step relation.
Reserved Notation "e1 '-->*' e2" (at level 70).
Inductive cbn_steps : exp -> exp -> Prop :=
| Refl_cbn : forall e, e -->* e
| Step_cbn : forall e1 e2 e3, e1 --> e2 -> e2 -->* e3 -> e1 -->* e3
where "e1 '-->*' e2" := (cbn_steps e1 e2).

Lemma steps_append : forall e1 e2 e3,
  e1 -->* e2 ->
  e2 -->* e3 ->
  e1 -->* e3.
Proof.
  Hint Constructors cbn_steps.
  induction 1; eauto.
Qed.

Lemma steps_app : forall e1 e2 e1',
  e1 -->* e1' ->
  App_e e1 e2 -->* App_e e1' e2.
Proof.
  Hint Constructors cbn_steps.
  Hint Constructors cbn_step.
  induction 1 ; eauto.
Qed.

Lemma steps_cons : forall e1 e2,
  e1 --> e2 ->
  forall e3, e2 ==> e3 -> e1 ==> e3.
Proof.
  induction 1 ; intros. econstructor ; eauto. econstructor.
  inversion H0. subst. econstructor. apply IHcbn_step. eauto. auto.
Qed.

We can argue that the two semantics are equivalent.
Lemma cbn_eval_equiv_cbn_steps :
  forall e v, value v -> (e ==> v <-> e -->* v).
Proof.
  intros ; split ; induction 1 ; simpl in * ; try congruence ; try (econstructor ; fail).
  specialize (IHcbn_evals1 I). specialize (IHcbn_evals2 H).
  eapply steps_append. eapply steps_app. eauto. eauto.
  destruct e ; simpl in H ; try contradiction ; constructor.
  eapply steps_cons ; eauto.
Qed.

Now we're going to consider a well-formedeness (typing) relation on terms. The basic idea is that we want to rule out "stuck" expressions where e.g., we have an unbound variable, or we pass an argument of the wrong type to a function, or we try to "apply" a number instead of a function.

We begin by defining contexts as lists of variables and types -- these are our assumptions. If you like, this is a kind of symbol table.
Definition context(A:Type) := list (var * A).

We use a last-in-first-out lookup to lookup a variable's type.
Fixpoint lookup {A} (x:var) (G:context A) : option A :=
  match G with
    | nil => None
    | p::rest => if string_dec x (fst p) then Some (snd p) else lookup x rest
  end.

Our typing relation can be read: Under the assumptions in context G, expression e has type t. That is, when evaluated, e will not get stuck and will produce a value of type t.
Reserved Notation "G '|-' e ';' t" (at level 70).

Inductive hasType : context type -> exp -> type -> Prop :=
| Nat_ht : forall G n, G |- Num_e n ; Nat_t
| Var_ht : forall G x t, lookup x G = Some t -> G |- Var_e x ; t
| App_ht : forall G e1 e2 t2 t,
  G |- e1 ; Arrow_t t2 t ->
    G |- e2 ; t2 ->
      G |- App_e e1 e2 ; t
| Lam_ht : forall G x e t1 t2,
  (x,t1)::G |- e ; t2 ->
    G |- Lam_e x e ; Arrow_t t1 t2
where "G |- e ';' t" := (hasType G e t).

Next, we want to prove that well-typed programs won't get stuck. Alternatively, we can argue that every well-typed program is already a (well-typed) value, or else it can take a step to a well-typed program. That's the positive way of saying it won't get stuck. Notice that we can't easily talk about this using the big-step semantics because it doesn't differentiate getting stuck from possible divergence. So we'll use the small-step semantics to argue this.

Our proof will break into two lemmas. One is a lemma that says well-formedness is preserved when taking a step. The other argues that if we're well-formed, then we are either a value or can take a step. Then soundness follows by induction on the number of steps we take.

Lemma progress' : forall G e t,
  G |- e ; t ->
  G = nil ->
      value e \/ (exists e', e --> e').
Proof.
  induction 1 ; intros ; subst ; (try (left ; simpl ; auto ; fail)) ; right ;
    simpl in * ; try congruence.
  specialize (IHhasType1 (eq_refl nil)). destruct IHhasType1.
  inversion H ; subst ; clear H ; simpl in * ; try contradiction.
  eauto. destruct H1. eauto.
Qed.

progress follows as a corollary of the above.
Lemma progress : forall e t,
  nil |- e ; t ->
    value e \/ (exists e', e --> e').
Proof.
  intros. eapply progress' ; eauto.
Qed.

Next, we need to define a substitution lemma, but to do so, we need a lot of auxiliary lemmas dealing with the context. Here's one group that I found good enough to push the result through.

Lookup on two contexts appended can be broken into two steps.
Lemma lookup_weaken : forall {A} x (G1 G2:context A),
  lookup x (G1 ++ G2) =
  match lookup x G1 with
    | None => lookup x G2
    | Some t => Some t
  end.
Proof.
  induction G1 ; simpl ; auto ; intros. destruct a. simpl.
  destruct (string_dec x v). auto. auto.
Qed.

We can always add more assumptions to the right of a proof of has-type.
Lemma weaken_right : forall G e t,
  G |- e ; t ->
    forall G',
      G ++ G' |- e ; t.
Proof.
  induction 1 ; intros. constructor. constructor. rewrite lookup_weaken.
  rewrite H. auto. econstructor ; eauto. econstructor. simpl in IHhasType.
  eauto.
Qed.

As a special case, when e has type t under the empty context, it continues to have type t under any context.
Lemma weaken : forall e t, nil |- e ; t -> forall G, G |- e ; t.
Proof.
  intros. generalize (weaken_right _ _ _ H G). simpl. auto.
Qed.

This lemma is used in the substitution case where we have a lambda that shadows the variable we are substituting for. In essence, it says we can drop the shadowed variable in the context and still construct a proof that the term e has type t.
Lemma shadow : forall G e t,
  G |- e ; t ->
    forall G1 x t1 t2,
      G = G1 ++ (x,t2)::nil ->
      lookup x G1 = Some t1 ->
      G1 |- e ; t.
Proof.
  induction 1 ; intros ; subst. constructor. constructor.
  rewrite lookup_weaken in H. simpl in H. remember (lookup x G1) as e.
  destruct e. auto. destruct (string_dec x x0). subst. congruence. congruence.
  econstructor ; eauto.
  constructor. destruct (string_dec x0 x). subst. eapply IHhasType. simpl. eauto.
  simpl. destruct (string_dec x x) ; eauto. congruence.
  eapply (IHhasType ((x,t1)::G1)). simpl. eauto. simpl.
  destruct (string_dec x0 x) ; try congruence. eauto.
Qed.

Our substitution lemma has to be stated carefully to get the induction to go through, and so that we can use the auxiliary lemmas above to discharge all of the cases.
Lemma substitution : forall e2 t2 G e t x,
  nil |- e2 ; t2 ->
  G |- e ; t ->
    forall G1,
      lookup x G1 = None ->
      G = G1 ++ (x,t2)::nil ->
      G1 |- e @ [x |-> e2] ; t.
Proof.
  induction 2 ; simpl ; intros ; subst. constructor.
  destruct (string_dec x x0). subst.
  rewrite lookup_weaken in H0. rewrite H1 in H0. simpl in H0.
  destruct (string_dec x0 x0) ; try congruence. injection H0 ; intros ; subst.
  apply weaken. auto. constructor. rewrite lookup_weaken in H0.
  simpl in H0. destruct (string_dec x0 x) ; try congruence. rewrite <- H0.
  destruct (lookup x0 G1) ; auto. econstructor. eauto. eauto.
  destruct (string_dec x x0). econstructor. eapply shadow. eauto. simpl. eauto. subst.
  simpl. destruct (string_dec x0 x0) ; try congruence. eauto.
  constructor. eapply IHhasType. simpl. destruct (string_dec x x0) ; try congruence.
  simpl. auto.
Qed.

Once we have substitution, the rest of preservation is trivial.
Lemma preservation : forall e1 e2,
  e1 --> e2 ->
  forall t,
  nil |- e1 ; t ->
    nil |- e2 ; t.
Proof.
  induction 1 ; intros. inversion H ; clear H ; subst. inversion H3 ; clear H3 ; subst.
  eapply substitution ; eauto.
  inversion H0 ; clear H0 ; subst. specialize (IHcbn_step (Arrow_t t2 t) H4).
  econstructor ; eauto.
Qed.

And once we have progress and preservation, we can prove soundness.
Theorem soundness : forall e t e',
  nil |- e ; t ->
    e -->* e' -> value e' \/ (exists e'', e' --> e'').
Proof.
  intros e t e' H1 H2. induction H2. apply (progress _ _ H1).
  apply IHcbn_steps. eapply preservation ; eauto.
Qed.

Next, I'm going to show you a couple more ways to evaluate expressions in our simply typed lambda calculus. We'll begin with an environment- based semantics, where instead of substituting terms for variables, we carry around an environment, mapping variables to terms, as part of our evaluation. If you will, we're simultaneously substituting the environment while we're evaluating. Of course, we stop evaluating in some situations (e.g., when we hit a lambda). In those situations, we need to keep around the substitution that we promised to do. This pair, of a substitution and a terms, is known as a closure. In what follows, we give call-by-name and call-by-value, environment-based semantics.

We need a way to represent closures -- at this point, it's useful to break out values as a distinct category. We could simplify this definition by getting rid of the Num_v case and just carry around a substitution, but that's a little less traditional.
Inductive val : Type :=
| Num_v : nat -> val
| Closure_v : context val -> exp -> val.

Inductive env_cbn_evals : context val -> exp -> val -> Prop :=
| Num_env_cbn : forall rho n, env_cbn_evals rho (Num_e n) (Num_v n)
| Lam_env_cbn : forall rho x e, env_cbn_evals rho (Lam_e x e) (Closure_v rho (Lam_e x e))
| Var_env_cbn : forall rho x rho' e v,
  lookup x rho = Some (Closure_v rho' e) ->
    env_cbn_evals rho' e v -> env_cbn_evals rho (Var_e x) v
| App_env_cbn : forall rho e1 e2 rho' x e v,
  env_cbn_evals rho e1 (Closure_v rho' (Lam_e x e)) ->
  env_cbn_evals ((x,Closure_v rho e2)::rho') e v ->
  env_cbn_evals rho (App_e e1 e2) v.

Inductive env_cbv_evals : context val -> exp -> val -> Prop :=
| Num_env_cbv : forall rho n, env_cbv_evals rho (Num_e n) (Num_v n)
| Lam_env_cbv : forall rho x e, env_cbv_evals rho (Lam_e x e) (Closure_v rho (Lam_e x e))
| Var_env_cbv : forall rho x v,
  lookup x rho = Some v -> env_cbv_evals rho (Var_e x) v
| App_env_cbv : forall rho e1 e2 rho' x e v2 v,
  env_cbv_evals rho e1 (Closure_v rho' (Lam_e x e)) ->
  env_cbv_evals rho e2 v2 ->
  env_cbv_evals ((x,v2)::rho') e v ->
  env_cbv_evals rho (App_e e1 e2) v.

Here's a more fascinating way to define STLC. We begin by defining a term language that incorporates typing. So you simply cannot build a term that is not well-typed.
Inductive texp : context type -> type -> Type :=
| Var_te : forall x G t, lookup x G = Some t -> texp G t
| Num_te : forall G, nat -> texp G Nat_t
| Lam_te : forall x G t1 t2, texp ((x,t1)::G) t2 -> texp G (Arrow_t t1 t2)
| App_te : forall G t1 t2, texp G (Arrow_t t1 t2) -> texp G t1 -> texp G t2.

Now we give a denotation for our types as Coq Types.
Fixpoint tinterp(t:type) : Type :=
  match t with
    | Nat_t => nat
    | Arrow_t t1 t2 => (tinterp t1) -> (tinterp t2)
  end.

We represent a context as a tuple Type, where we've applied the type interpretation to the type components of the context.
Fixpoint cinterp(G:context type) : Type :=
  match G with
    | nil => unit
    | p::rest => (tinterp (snd p)) * (cinterp rest)
  end%type.

It's going to be a little easier to write this function using tactic mode than doing an explicit bit of programming, at least until I teach you how to do dependent matches. This basically says that given a variable x, context G, and type t, such that G(x) = t, and given a tuple X in the interpretation of G, we can produce a value of type tinterp t. We do so by simply running down the context G and the tuple X in parallel, until we find the variable we are looking for.
Lemma look : forall x t G, lookup x G = Some t -> cinterp G -> tinterp t.
Proof.
  induction G ; simpl.
    intro. discriminate H.
  destruct a as (y,t'). simpl in *.
  destruct (string_dec x y).
  intro. injection H. intro. rewrite H0.
  intro env. apply (fst env).
  intros H env.
  apply (IHG H (snd env)).
Defined.

So given a typed expression e (under context G and with type t), and given an environment in the interpretation of G, we can produce a value of type t. This is kind of magical, as note that we're essentially proving that evaluation of all simply typed terms always terminates -- if they didn't, then we couldn't write a Coq function to do the evaluation!
Fixpoint denote {G t} (e : texp G t) : cinterp G -> tinterp t :=
  match e in texp G t return cinterp G -> tinterp t with
    | Var_te x G t H => fun env => look x t G H env
    | Num_te G n => fun _ => n
    | Lam_te x G t1 t2 e => fun env => fun x => denote e (x,env)
    | App_te G t1 t2 e1 e2 => fun env => (denote e1 env) (denote e2 env)
  end.