cs252r : Advanced Functional Programming - Fall 2006
These pages are a record of the in-class discussions for the graduate class "Advanced Functional Programming" given at Harvard University in the Fall of 2006.
November 20, 2006
- Towards Efficient, Typed LR Parsers. François Pottier and Yann Régis-Gianas.
Analysis of the paper
What is the contribution of this paper?
This paper is a good example of encoding invariants in the type system, and is a good example of using GADTs for this purpose.
In which of the following venues would you judge it most appropriate to publish this paper: a major conference, a minor conference, or a workshop?
One vote for minor conference, one vote for workshop.
In Section 6.1, where have the stack tags gone?
The stack tags are implicit in the "GADT value constructors". The type arguments in the GADT ensure that the tags always correspond to the data constructors.
In Section 6.2, limit the type state to just two alternatives, S0 and S1, and using either Haskell 98 or ML, write a datatype definition that is the closest possible analog to state. What are the types of S0 and S1?
data State a = S0 | S1
S1 :: forall a. State a
S0 :: forall a. State a
remark: In this paper, the GADTs are allowing us to instantiate these polymorphic types with Empty and (Ce Empty) respectively.
How would you characterize the result type of any value constructor of an algebraic datatype defined in Haskell 98 or ML?
Unlike GADTs, "normal" ADT data constructors are polymorphic in all of their type constructor's arguments.
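The contrast between the two kinds of constructor can be sketched in a few lines. This is only an illustration: the index types Empty and Cell, the renamed Haskell 98 constructors S0' and S1', the Stack type, and the depth function are assumptions made for the example and are not taken from the paper.

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical type-level stack shapes (illustrative names only).
data Empty
data Cell rest

-- Haskell 98 style: both constructors have type `forall a. State98 a`.
data State98 a = S0' | S1'

-- GADT style: each constructor fixes its own result type.
data State a where
  S0 :: State Empty
  S1 :: State (Cell Empty)

-- A stack of Ints indexed by the same shapes.
data Stack s where
  Nil  :: Stack Empty
  Cons :: Int -> Stack rest -> Stack (Cell rest)

-- Matching on the state refines `s`, so only the stack shape that
-- agrees with the state has to be handled in each clause.
depth :: State s -> Stack s -> Int
depth S0 Nil          = 0
depth S1 (Cons _ Nil) = 1
```

With State98, nothing would relate the state to the shape of the stack, so a function like depth would need clauses (or a run-time failure) for the mismatched combinations.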
Application: VMs
Here is a simple stack-based virtual machine, which supports first-class functions. Below are the values, instructions, and the interpreter.
data Val = I Int
         | S String
         | B Bool
         | P Val Val -- pair
         | F (Val -> Val)
data Inst = Push Val | Pop | Pair | Fst | Snd | Add | App | Halt
step :: Inst -> [Val] -> [Val]
step (Push v) vs = v : vs
step Pop (v : vs) = vs
step Fst (P a b : vs) = a : vs
step Snd (P a b : vs) = b : vs
step Pair (a : b : vs) = P a b : vs
step Add (I a : I b : vs) = I (a+b) : vs
step App (v : F f : vs) = f v : vs
step Halt _ = error "halt"
interp :: [Inst] -> Val
interp is = run is []
  where run (Halt : _) (v : _) = v
        run (i : is) vs = run is (step i vs)
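As a usage sketch, the machine can be run on two small programs. The definitions are repeated from the notes so that the block stands alone; the programs addProg and appProg and the helper asInt are hypothetical names introduced only for this example.

```haskell
-- Values, instructions, and interpreter as given in the notes.
data Val = I Int
         | S String
         | B Bool
         | P Val Val -- pair
         | F (Val -> Val)

data Inst = Push Val | Pop | Pair | Fst | Snd | Add | App | Halt

step :: Inst -> [Val] -> [Val]
step (Push v) vs           = v : vs
step Pop (_ : vs)          = vs
step Fst (P a _ : vs)      = a : vs
step Snd (P _ b : vs)      = b : vs
step Pair (a : b : vs)     = P a b : vs
step Add (I a : I b : vs)  = I (a + b) : vs
step App (v : F f : vs)    = f v : vs
step Halt _                = error "halt"

interp :: [Inst] -> Val
interp is = run is []
  where run (Halt : _) (v : _) = v
        run (i : is') vs       = run is' (step i vs)

-- Hypothetical example: push two integers and add them.
addProg :: [Inst]
addProg = [Push (I 1), Push (I 2), Add, Halt]

-- Hypothetical example: push a function, then its argument, then apply.
appProg :: [Inst]
appProg = [Push (F inc), Push (I 41), App, Halt]
  where inc (I n) = I (n + 1)
        inc v     = v

-- Val contains functions, so it has no Eq or Show instance;
-- results are inspected by pattern matching instead.
asInt :: Val -> Maybe Int
asInt (I n) = Just n
asInt _     = Nothing
```

Here asInt (interp addProg) is Just 3 and asInt (interp appProg) is Just 42. A program such as [Push (B True), Push (I 1), Add, Halt] is not covered by any clause of step and fails at run time with a pattern-match error.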
In the virtual machine, where are the "coordinate data structures" mentioned in the paper?
[Val] and Inst: Inst depends on Val.
What sort of VM stack is expected when the next instruction is Pair? How about Fst?
Pair expects (a:b:vs)
Fst expects (P a b : vs)
Suppose that a sequence of instructions is the result of translating a well-typed source program. Does the VM have an invariant similar to the invariant shown in Figure 4? If so, sketch a few of the salient parts of the invariant.
The invariant is contained in the step function: for a well-typed program, the pattern match is total.
Can the tagging overhead be eliminated using generalized algebraic data types (GADTs)? If so, explain how. If not, either explain what modifications to the interpreter would be necessary to make it so, or else explain why it's never going to work, no matter how much we hack the interpreter.
remark: To answer this question, it's fair game to modify any of the data types or functions shown above.
One group tried to work with a modified Val type:
data Val a where
  I :: Int -> Val Int
  ...
Another group tried various things, but nothing conclusive.
remark: One of the key ideas at work here is phantom types. We can add a type parameter to the Inst type without changing its definition. The type parameter does not appear in the definition, it is "phantom". We can then use the type parameter to relate the instruction value to the types of other values, for instance, the stack value. These types are "interesting":
data Inst s s' = .. same as above ...

data E -- a void type
type Empty = Stack E E

-- A stack with a value of type a at the top,
-- followed by a stack with a value of type b at the top.
data Stack a b where
  Empty :: Empty
  Push  :: a -> (Stack b c) -> Stack a (Stack b c)

x :: Stack Int (Stack Char Empty)
x = Push 1 (Push '1' Empty)
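One way the idea of indexing instructions by stack types might be carried through is sketched below. It is not the solution reached in class, and it departs from the Stack a b encoding above: the stack is represented as nested pairs ending in (), and Halt becomes the end of a typed instruction sequence rather than an instruction. The names Code, (:>), run, and example are hypothetical.

```haskell
{-# LANGUAGE GADTs #-}

-- Inst s s' turns a stack of shape s into a stack of shape s'.
-- Stacks are nested pairs: (top, rest), with () for the empty stack.
data Inst s s' where
  Push :: a -> Inst s (a, s)
  Pop  :: Inst (a, s) s
  Pair :: Inst (a, (b, s)) ((a, b), s)
  Fst  :: Inst ((a, b), s) (a, s)
  Snd  :: Inst ((a, b), s) (b, s)
  Add  :: Inst (Int, (Int, s)) (Int, s)
  App  :: Inst (a, (a -> b, s)) (b, s)

-- Every clause is total for its index: no tags are inspected and
-- no failure case is needed.
step :: Inst s s' -> s -> s'
step (Push v) s        = (v, s)
step Pop (_, s)        = s
step Pair (a, (b, s))  = ((a, b), s)
step Fst ((a, _), s)   = (a, s)
step Snd ((_, b), s)   = (b, s)
step Add (a, (b, s))   = (a + b, s)
step App (v, (f, s))   = (f v, s)

-- A typed instruction sequence: the output shape of each instruction
-- must match the input shape of the next.
data Code s s' where
  Halt :: Code s s
  (:>) :: Inst s t -> Code t s' -> Code s s'
infixr 5 :>

run :: Code s s' -> s -> s'
run Halt s      = s
run (i :> is) s = run is (step i s)

-- Hypothetical example: 1 + 2 on an initially empty stack.
example :: Code () (Int, ())
example = Push (1 :: Int) :> Push 2 :> Add :> Halt
```

With these definitions, fst (run example ()) evaluates to 3, and a sequence such as Push True :> Push (1 :: Int) :> Add :> Halt is rejected by the type checker instead of failing at run time. The Val type and its tags are no longer needed in this sketch, because values are stored at their native Haskell types.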
Compiling to 2D
Suppose you replace the Val type with the following type, which is more suited to the 2D language:
data V2d = U | P' V2d V2d | Inl V2d | Inr V2d
Suppose that you then drop inappropriate instructions and add a Case instruction:
data I2d = Push' V2d | Pop'
         | Pair' | Fst' | Snd'
         | L' | R'
         | Case' [I2d] [I2d]
Would the Haskell type [I2d] be a good representation of a 2D program? Would it be a good intermediate representation in a compiler targeting 2D?
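To make the question concrete, here is a minimal sketch of an interpreter for [I2d]. The notes do not define the semantics of L', R', or Case', so the behavior below is an assumption: L' and R' wrap the top of the stack in Inl and Inr, and Case' pops a sum value, pushes its payload, and runs the matching branch. The functions exec and step2d, the program caseProg, and the derived Eq and Show instances are additions for this example.

```haskell
data V2d = U | P' V2d V2d | Inl V2d | Inr V2d
  deriving (Eq, Show) -- added for testing; not in the notes

data I2d = Push' V2d | Pop'
         | Pair' | Fst' | Snd'
         | L' | R'
         | Case' [I2d] [I2d]

-- Run a whole instruction list against a stack.
exec :: [I2d] -> [V2d] -> [V2d]
exec [] vs       = vs
exec (i : is) vs = exec is (step2d i vs)

-- Assumed single-step semantics (see the lead-in above).
step2d :: I2d -> [V2d] -> [V2d]
step2d (Push' v) vs              = v : vs
step2d Pop' (_ : vs)             = vs
step2d Pair' (a : b : vs)        = P' a b : vs
step2d Fst' (P' a _ : vs)        = a : vs
step2d Snd' (P' _ b : vs)        = b : vs
step2d L' (v : vs)               = Inl v : vs
step2d R' (v : vs)               = Inr v : vs
step2d (Case' l _) (Inl v : vs)  = exec l (v : vs)
step2d (Case' _ r) (Inr v : vs)  = exec r (v : vs)
step2d _ _                       = error "ill-typed 2D program"

-- Hypothetical program: inject U on the left, then branch on it.
caseProg :: [I2d]
caseProg = [Push' U, L', Case' [Push' U, Pair'] [Pop']]
```

Under these assumptions, exec caseProg [] yields [P' U U]. Note that Case' nests whole instruction lists, so a value of type [I2d] is a tree of instructions rather than a flat sequence.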
BibTeX
@Article{Fran{c-c}ois-Pottier-and-Yann-R{'e}gis-Gianas2006
, number = 2
, author = "Fran{\c c}ois Pottier and Yann R{\'e}gis-Gianas"
, journal = "Electr. Notes Theor. Comput. Sci."
, title = "Towards Efficient, Typed {LR} Parsers"
, volume = 148
, url = "http://dx.doi.org/10.1016/j.entcs.2005.11.044"
, pages = "155--180"
, year = 2006
}