Bayesian graphical models have become an important explanatory strategy in cognitive science ([#!Knill96!#],[#!Kording04!#], [#!Stocker05!#]). Recent work strongly supports their biological plausibility in general and that of dynamic Bayesian models in particular [#!Rao05!#]. Dynamic models are geared towards prediction and classification of sequences. As such, they are naturally suitable for language modeling and have already been aplied to tasks like speech recognition [#!Bilmes03!#] and part-of-speech tagging [#!Peshkin03!#]. However, grammar learning and parsing with such models generally appears out-of-reach, because of their Markovian character.
Markov models restrict possible dependencies to a bounded, local context. At one extreme, the context is confined to the symbol occupying the current position in the sequence (order-0 or unigram models). In more relaxed versions, context may include a fixed number of positions before the current symbol (k-order), typically no more than three (trigram models). The restricted space of possible dependencies allows transition probabilities to be infered from the data and stored in a look-up table with relatively little technical sophistication.
Not surprisingly however, the restricted space of representable dependencies is also the main disadvantage of Markov models in syntax-related tasks like parsing. Syntactic dependencies in natural language are unboundedly non-local, in the sense that no fixed amount of context is guaranteed to contain the members of a given constituent. For example, consider the sentences in examples (
-
). In the first sentence, the subject king and verb bought are adjacent to one another. Thus, the dependecy between them would be captured by a bigram (order-1) model. However, the same model would be unable to represent the dependendency in the second example, because the subject and verb are separated by two words. To capture this dependency, we need a 3rd-order Markov model. Similarly, the 3rd-order model would prove inadequate for the third example, where the subject and verb are separated by four words.
Our solution to this problem relies on representing sentences with non-local dependencies like (
,
) as derived from their local dependency variants, akin to (
). This intuition is based on the formal notion that a string with non-local dependency is obtained from a dependency tree via a recursive linearization procedure. The string obtained at each step of the linearization procedure contains new local dependencies, which push apart local dependencies from previous levels. This way of conceptualizing the linearization of syntactic structure allows us to use a Dynamic Bayesian Network despite its Markov properties. We construct a DBN parser which decides only on local attachments. We then call the parser recursively to uncover the underlying dependency tree. Our results show that the model captures grammatical knowledge for all levels of the derivation. The biological plausibility and remarkable compactness of learned representation may suggest that parsing in the brain is accomplished in a similar manner.
Leonid Peshkin 2005-09-22