I want to construct an abstract machine that is somewhat faithful to the time and space requirements of a more realistic functional language implementation. The goal here is to examine *intensional* properties of programs (such as time and space) and not just extensional aspects (e.g., input/output). Let's start with the following expression language:

  e ::= var(i) | i | p | () | (e1,e2) | #i e | \e | e1 e2

Expressions include tuples, lambda-expressions, and so on. However, instead of using named variables, we're going to use de Bruijn notation. Recall that var(i) refers to the ith-nearest enclosing lambda, counting from zero. The role of pointers will become clearer below, where we see that we'll be using an *allocation*-based semantics. In such a semantics, the only values that we'll directly manipulate are small values such as integers (i), unit (), and pointers (p). Thus, larger values (e.g., tuples) must be allocated and referenced by pointer.

As we saw earlier, to build a "tail-recursive" interpreter for such a language, we'll need to use a control-stack of some sort. This allows us to avoid re-computing the evaluation context over and over again. In addition, to avoid the expensive operation of substituting a value for a variable within some code, we'll be using an *environment-based* semantics. Environments will be represented using tuples and a variable will be treated as an environment lookup.

With this informal overview in mind, here are the syntactic constructs of our abstract machine:

  (configurations) M ::= (H,S,v,e)
  (heaps)          H ::= {p1=h1,...,pn=hn}
  (heap values)    h ::= (v1,v2) | [\e,v]
  (small values)   v ::= i | p | ()
  (expressions)    e ::= var(i) | v | (e1,e2) | #i e | \e | e1 e2 | [\e,v]
  (stacks)         S ::= nil | (F,v)::S
  (frames)         F ::= ([],e) | (v,[]) | #i [] | [] e | v []

So a machine configuration (M) consists of a heap, a stack, a value (representing the current environment) and an expression to execute. Heaps are partial functions from pointers to heap-values.
Heap-values are either pairs of small-values, or a closure consisting of a lambda and its environment (itself a small value). In practice, lambda-expressions will be represented by reference to a piece of code (i.e., yet another kind of pointer), so notice that all heap values are really pairs of small values (of known size, since we're assuming machine integers here, not bignums). Small values include integers, unit, and pointers. Expressions are as sketched above, but it is convenient to add closures [\e,v] so that they include all heap-values.

Stacks are lists of frames, each coupled with an environment value. The frames record what to do with a value once it's computed. Note that in something like ML, we would write:

  datatype frame = LeftPair of exp
                 | RightPair of value
                 | Proj of int
                 | LeftApp of exp
                 | RightApp of value

so frames are themselves simple data structures that include a tag and either a reference to an expression or a small-value. As we will see, it's crucial that we record the environment that was active when we push a frame on the stack.

Now we can phrase the rewriting rules for this abstract machine as below. First, we deal with variables:

         H(p) = (v1,v2)
  ----------------------------
  (H,S,p,var(0)) -> (H,S,p,v1)

            H(p) = (v1,v2)
  -----------------------------------
  (H,S,p,var(i+1)) -> (H,S,v2,var(i))

The first rule says that the environment must be a pointer p bound in the heap to a tuple (v1,v2), and var(0) returns v1 as its result. That is, the 0th variable is at the head of the list represented by the environment pointer p. For var(i+1), we simply look at the tail of the list and extract var(i) from it.

The next rule says that we allocate heap-values:

  (H,S,v,h) -> (H+{p=h},S,v,p)

So for instance we have (H,S,v,(1,2)) -> (H+{p=(1,2)},S,v,p).

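
The variable and allocation rules can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the heap is a dictionary, pointers are tagged tuples, and the names `alloc`, `lookup`, and `UNIT` are hypothetical rather than part of the machine's definition. The point is that var(i) costs i+1 constant-time steps, one per cell of the environment list it walks.

```python
UNIT = "unit"  # hypothetical stand-in for the unit value ()

def alloc(heap, h):
    """(H,S,v,h) -> (H+{p=h},S,v,p): bind a fresh pointer to heap value h."""
    p = ("ptr", len(heap))      # fresh, because nothing is ever removed here
    heap[p] = h
    return p

def lookup(heap, env, i):
    """Evaluate var(i) in environment env, counting the machine steps taken."""
    steps = 0
    while True:
        head, tail = heap[env]  # env must point at a pair (v1, v2)
        steps += 1
        if i == 0:
            return head, steps  # var(0) yields the head of the list
        env, i = tail, i - 1    # var(i+1) continues as var(i) in the tail

heap = {}
env = alloc(heap, (10, UNIT))   # oldest binding: 10
env = alloc(heap, (20, env))    # newest binding: 20, so var(0) is 20
print(lookup(heap, env, 0))     # (20, 1)
print(lookup(heap, env, 1))     # (10, 2)
```

Using `len(heap)` as the fresh address only works because this sketch never deallocates; a version with garbage collection would need a separate counter.
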
At this point it's worth remarking that we consider a heap H = {p1=h1,...,pn=hn} to bind all of the free occurrences of the pointers pi within the machine configuration, and we take configurations to be equivalent up to alpha-conversion of pointers. Thus, the abstract machine can always pick a "fresh" pointer p when allocating a heap-value. (We will formalize free and binding occurrences of pointers below.)

Next we deal with all of the congruences, which are responsible for pushing frames on the stack and evaluating a sub-expression:

  (H,S,v,(v1,e2)) -> (H,((v1,[]),v)::S,v,e2)    (e2 not a small value)
  (H,S,v,(e1,e2)) -> (H,(([],e2),v)::S,v,e1)    (e1 not a small value)
  (H,S,v,v1 e2)   -> (H,(v1 [],v)::S,v,e2)      (e2 not a small value)
  (H,S,v,e1 e2)   -> (H,([] e2,v)::S,v,e1)      (e1 not a small value)
  (H,S,v,#i e)    -> (H,(#i [],v)::S,v,e)       (e not a small value)

In each of these cases, we're pushing a frame coupled with the current environment value v. We need to record what the environment was because rules such as the variable rules may end up throwing away information about the environment. (You may be tempted to avoid this overhead, so we'll come back to this later.) One thing to note here is that there is a simple, constant-time test to tell whether an expression is a small-value. With our earlier stack machine, by contrast, we might have had to crawl over a large value (e.g., ((v1,(v2,v3)),(v4,v5))) to determine this.

As before, when the machine gets down to a (small) value, we can "return" the value by popping a frame from the stack of the abstract machine:

  (H,(F,v)::S,v',v1) -> (H,S,v,F[v1])

Again, the constant-time check for being a small value helps the situation.

I'm going to define the good terminal configurations for this machine to be of the form (H,nil,v,i) where i is an integer. That is, well-typed programs will always produce integer results.
For an evaluation relation R, I'll write eval_R(M,i) if M -R->* (H,nil,v,i), and I'll write eval_R(M,_|_) if there exists an infinite sequence M1,M2,M3,... such that M -R-> M1 -R-> M2 -R-> M3 -R-> ...

The reduction of a projection operation is relatively straightforward:

        H(p) = (v1,v2)
  --------------------------
  (H,S,v,#i p) -> (H,S,v,vi)

Now we are left with lambda-expressions and application. Note that a lambda-expression is not itself a heap-value. Rather, we must turn the lambda into a closure. The way to think about this machine is that we're being lazy about substituting the environment within the expression: rather than performing the whole substitution up front, we interleave substitution of the environment with evaluation. Of course, when we get to a lambda, we stop evaluating, but we can't stop doing the substitution! So whenever we invoke the lambda, we must remember to continue with the substitution that was in effect at the time we created the closure, which is generally not the environment at the point where the function is invoked. With these facts in mind, the rule for lambda is as follows:

  (H,S,v,\e) -> (H+{p=[\e,v]},S,v,p)

Finally, the rule for application is as follows:

             H(p) = [\e,v2]
  ---------------------------------------
  (H,S,v,p v1) -> (H+{p'=(v1,v2)},S,p',e)

Here, p must be bound in the heap to a closure [\e,v2]. We step to a configuration where we are evaluating e. But we also change environments. The new environment p' points to a list where the head is the argument to the function (v1) and the tail is the old environment that was packed up with the lambda to make the closure (v2).

One of the things that I like about this abstract machine is that it is relatively faithful to the actual time it takes for a program to evaluate. In particular, none of the operations, with the exception of allocation, really requires more than constant time to implement on real machines.
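
To make the cost model concrete, here is a minimal Python sketch of the whole machine that counts transitions and heap cells. The encoding is my own assumption, not part of the notes: expressions and heap values are tagged tuples, the stack is a linked list of (frame, saved environment, rest) triples so that push and pop stay constant-time, and the names `Ptr`, `UNIT`, `step`, and `run` are hypothetical. Stuck configurations simply raise a Python error.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ptr:
    """A heap pointer p (hypothetical representation)."""
    addr: int

UNIT = "unit"  # stands in for the unit value ()

def is_small(e):
    """Constant-time test: integers, unit and pointers are small values."""
    return isinstance(e, (int, Ptr)) or e == UNIT

def alloc(heap, h):
    """Bind a fresh pointer to heap value h (no deallocation in this sketch)."""
    p = Ptr(len(heap))
    heap[p] = h
    return p

def plug(frame, v):
    """F[v]: fill the hole of a frame with the small value v."""
    tag, x = frame
    if tag == "pairL":                       # ([], e2)
        return ("pair", v, x)
    if tag == "pairR":                       # (v1, [])
        return ("pair", x, v)
    if tag == "proj":                        # #i []
        return ("proj", x, v)
    if tag == "appL":                        # [] e2
        return ("app", v, x)
    return ("app", x, v)                     # v1 []

def step(heap, stack, env, e):
    """One transition; mutates heap and returns the new (stack, env, e)."""
    if is_small(e):                          # return: pop a frame
        frame, saved_env, rest = stack
        return rest, saved_env, plug(frame, e)
    tag = e[0]
    if tag == "var":                         # environment lookup
        _, head, tail = heap[env]
        if e[1] == 0:
            return stack, env, head
        return stack, tail, ("var", e[1] - 1)
    if tag == "lam":                         # allocate a closure [\e, env]
        return stack, env, alloc(heap, ("clo", e[1], env))
    if tag == "proj":
        _, i, e1 = e
        if not is_small(e1):
            return (("proj", i), env, stack), env, e1
        return stack, env, heap[e1][i]       # i is 1 or 2
    _, e1, e2 = e                            # tag is "pair" or "app"
    left, right = ("pairL", "pairR") if tag == "pair" else ("appL", "appR")
    if not is_small(e1):
        return ((left, e2), env, stack), env, e1
    if not is_small(e2):
        return ((right, e1), env, stack), env, e2
    if tag == "pair":                        # allocate a pair of small values
        return stack, env, alloc(heap, ("pair", e1, e2))
    _, body, closure_env = heap[e1]          # application: nothing is pushed
    return stack, alloc(heap, ("pair", e2, closure_env)), body

def run(e, env=UNIT):
    """Run to a terminal configuration; return (value, steps, heap cells)."""
    heap, stack, steps = {}, None, 0
    while not (is_small(e) and stack is None):
        stack, env, e = step(heap, stack, env, e)
        steps += 1
    return e, steps, len(heap)

# (\x. #1 x) (7, 8) in de Bruijn form
prog = ("app", ("lam", ("proj", 1, ("var", 0))), ("pair", 7, 8))
print(run(prog))                             # (7, 11, 3)
```

The three heap cells in the example are the closure, the argument pair, and the one-element environment list built by the application rule. Note that the application case returns the stack unchanged, which is the property that lets loops run in constant stack space.
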
In this respect, the abstract machine that we've given could be used to calculate big-O running times of programs. Of course, ignoring allocation is probably a bad idea...

Again, it is worth remarking that we don't need to save anything on the stack when we do a function call with this abstract machine. This is crucial for getting loops to run in constant stack space. Indeed, one is tempted to take this abstract machine as *the* definition of a language like Scheme, where we would like to dictate to implementors that they must use constant stack space for iterative procedures. But of course, this doesn't work. Consider an implementor that first CPS-converts the program. Then we won't be doing *any* stack allocation, and yet we satisfy the "letter" of the law. That's because all of the stack-frames will be represented using closures in the heap!

Really, we must take garbage collection into account. But how do we do this? One option is to add a new rewriting rule to the language:

          FP(H1,S,v,e) = {}
  gc  ---------------------------
      (H1+H2,S,v,e) => (H1,S,v,e)

This gc rule says that we can eliminate a portion of the heap (H2) as long as the resulting program stays closed, that is, as long as it has no dangling references to H2. Formally we define:

  FP(H,S,v,e)            = (FP(H) + FP(S) + FP(v) + FP(e)) \ Dom(H)
  Dom({p1=h1,...,pn=hn}) = {p1,...,pn}
  FP({p1=h1,...,pn=hn})  = (FP(h1) + ... + FP(hn)) \ {p1,...,pn}
  FP([\e,v])             = FP(e) + FP(v)
  FP(var(i))             = {}
  FP(i)                  = {}
  FP(())                 = {}
  FP(p)                  = {p}
  FP((e1,e2))            = FP(e1) + FP(e2)
  FP(#i e)               = FP(e)
  FP(\e)                 = FP(e)
  FP(e1 e2)              = FP(e1) + FP(e2)

(FP(S) is not spelled out here; presumably it collects, in the same way, the pointers of every frame and every saved environment on the stack.)

We know that informally, the gc rule is justified since this is what tracing garbage collectors do. But how do we formalize this? What should a gc be allowed to do? One idea is that we could say a rewriting rule (H,S,v,e) => (H',S',v',e') is *safe* if adding it doesn't change the possible behaviors of any program.
Formally, a rewriting relation R2 is safe with respect to R1 if for any configuration M and any answer ans (an integer or bottom), eval_R1(M,ans) <==> eval_(R1+R2)(M,ans).

Unfortunately, the gc rule as stated above is *not* safe with respect to our evaluation relation. The reason is a small technical one: we can loop forever doing gc's on any program. We can fix this by defining an evaluation relation that forbids GC's from occurring one after the other.

How do we prove that adding the gc rule is now safe? Two lemmas help with this:

  [GC postponement]: If M1 =gc=> M2 and M2 -> M3, then there exists an M4 s.t. M1 -> M4 =gc=> M3.

  [GC fusion]: If M1 =gc=> M2 =gc=> M3 then M1 =gc=> M3.

The first lemma says that gc commutes with evaluation; the second says that consecutive gc steps can be merged into one. So the intuition is that we can take any (finite) evaluation sequence which interleaves evaluation with gc and argue that all of the gc steps can be postponed until the end and fused into one big gc-step. Since gc doesn't change the answer of the final configuration, it must result in the same answer.

Now the great thing about our gc rule is that it can be applied non-deterministically without affecting the meaning of the program. The bad thing about the gc rule is that it is far from a constant-time "step" in the evaluation of our abstract machine. Furthermore, the non-determinism makes it hard to say just how much space or time we'll take, because we can effectively trade space for gc steps. One can formalize a small-step rewriting relation for gc (see Morrisett, Felleisen, & Harper, FPCA'95) to make those steps explicit for something like a copying collector, which will reveal that the time to do a gc is roughly proportional to the size of the stack and the part of the heap that is preserved. One could also imagine interleaving these small steps with the evaluation steps (plus some extra state) to get a form of incremental collection. Of course, all of this makes it very hard to say just how much time or space a given program will take.

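
As an illustration of one way to pick the H1 in the gc rule, here is a minimal Python sketch of a tracing pass. It is not the small-step collector of the cited paper; it just keeps whatever is reachable from the stack, the environment, and the expression. The encoding (tagged tuples, a `Ptr` class) and the names `pointers` and `gc` are my own assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ptr:
    """A heap pointer p (hypothetical representation)."""
    addr: int

def pointers(x):
    """Yield every pointer occurring in a value, expression, frame or stack."""
    if isinstance(x, Ptr):
        yield x
    elif isinstance(x, tuple):               # tagged tuples nest arbitrarily
        for part in x:
            yield from pointers(part)

def gc(heap, stack, env, e):
    """Return H1: the part of the heap reachable from stack, env and e."""
    live = set()
    todo = list(pointers((stack, env, e)))   # the roots
    while todo:
        p = todo.pop()
        if p not in live:
            live.add(p)
            todo.extend(pointers(heap[p]))   # follow pointers inside the cell
    return {p: heap[p] for p in live}

a, b, c = Ptr(0), Ptr(1), Ptr(2)
heap = {a: ("pair", 1, 2), b: ("pair", a, 3), c: ("pair", c, 4)}
kept = gc(heap, None, b, ("var", 0))         # only the environment is a root
print(sorted(p.addr for p in kept))          # [0, 1]
```

The cell at `c` is dropped even though it points to itself, because nothing in the rest of the configuration refers to it. The work done is proportional to the roots plus the preserved cells, which matches the cost remark above.
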
We have a very operational model of this, but we can only really find out answers by "running" the whole program (on ground inputs). I find this very dissatisfying, but it does point out that in a modern language, you can't really trade space for time (at least if you're using a copying collector): some work proportional to the live data is done just to support allocation. And that work seems to grow with the live data. It would be nice if we could come up with a way of specifying a model that allows us to optimize the space-time product of a given algorithm. In my opinion this is still a big open problem.

Another thing that I find interesting is that the definition of a "safe" rewriting relation above really supports a *semantic* notion of garbage collection. It allows us to do all sorts of crazy things like deallocating any object that isn't accessed in the future. It also allows us to rewrite code (i.e., online optimization) or to compress/rewrite the stack, as long as we get the same behavior out of the code.

The idea of semantic garbage is intriguing because there are in fact algorithms that go beyond tracing to reclaim some reachable objects which, nonetheless, aren't used in the future. A good example of this is Baker's unification-based algorithm (rediscovered by Appel, and a few other folks). To understand this, we need to see the typing rules for the abstract machine. The key rules are these:

  |- H : P    P |- S : t'->int    P |- v : G    P;G |- e : t'
  -----------------------------------------------------------
                      |- (H,S,v,e) : int

  All p in H.  P;0 |- H(p) : P(p)
  -------------------------------
             |- H : P

The key here is the notion of heap typing: we say P describes H if whenever p=h is in the heap, P(p) is a type describing h. P is the "interface" between the heap and the rest of the machine, including the stack, the environment, and the expression being evaluated. It turns out that if a given pointer is unreachable, we can assign it any type we like.
Furthermore, we only need to assign a pointer a type that is consistent with how that pointer may be used by the stack, environment, and expression. If, for instance, p : (int*int) * (int*int) but the only free occurrence of p in the program is in e and is of the form #1 p, then we can assign p the type (int*int) * Top. If there's another use of p in the program that, say, performs #1 (#2 p) + #2 (#2 p), then we can't assign p this type.

So the idea behind Baker's algorithm is to crawl around the expression e, the stack, and the environment and try to assign each pointer the least constrained type we possibly can. In essence, we use ML-style type-checking during GC to come up with constraints on pointers and try to find the least specialized type that is consistent with those constraints. If we ever end up assigning a pointer the type Top, then we know the contents of that pointer will never be dereferenced, so we can safely deallocate the pointer.

In practice, Baker's algorithm has proven too hard to implement and get real speedups. But it does suggest that we can do a lot more in terms of deallocation. As another example, the region inference work of Tofte & Talpin figures out that some values aren't needed in the future based on effect information. Still another example is that we could calculate when variables are live/dead and use this to avoid preserving values (in practice, this is crucial for minimizing leaks in a real language). All of these tricks are aiming at closing in on semantic garbage.