<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:iweb="http://www.apple.com/iweb" version="2.0">
  <channel>
    <title>Papers</title>
    <link>http://www.eecs.harvard.edu/%7Eelif/EECS/Papers/Papers.html</link>
    <description>Please email me with any code/data requests or discussion.</description>
    <generator>iWeb 3.0.4</generator>
    <image>
      <url>http://www.eecs.harvard.edu/%7Eelif/EECS/Papers/Papers_files/CIMG0028.jpg</url>
      <title>Papers</title>
      <link>http://www.eecs.harvard.edu/%7Eelif/EECS/Papers/Papers.html</link>
    </image>
    <item>
      <title>Estimating Compact Yet Rich Tree Insertion Grammars</title>
      <link>http://www.eecs.harvard.edu/%7Eelif/EECS/Papers/Entries/2012/9/27_Estimating_Compact_Yet_Rich_Tree_Insertion_Grammars.html</link>
      <guid isPermaLink="false">60bd744c-a13f-4848-bf6f-395eeca1e412</guid>
      <pubDate>Thu, 27 Sep 2012 13:35:50 -0400</pubDate>
      <description>Elif Yamangil and Stuart Shieber. Estimating Compact Yet Rich Tree Insertion Grammars. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju Island, Korea, July 8-14, 2012.&lt;br/&gt;&lt;br/&gt;&lt;a href=&quot;http://www.aclweb.org/anthology-new/P/P12/P12-2022.pdf&quot;&gt;Click here for pdf&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;We present a Bayesian nonparametric model for estimating tree insertion grammars (TIG), building upon recent work in Bayesian inference of tree substitution grammars (TSG) via Dirichlet processes. Under our general variant of TIG, grammars are estimated via the Metropolis-Hastings algorithm that uses a context-free grammar transformation as a proposal, which allows for cubic-time string parsing as well as tree-wide joint sampling of derivations in the spirit of Cohn and Blunsom (2010). We use the Penn Treebank for our experiments and find that our proposed Bayesian TIG model not only has competitive parsing performance but also finds compact yet linguistically rich TIG representations of the data.&lt;br/&gt;</description>
    </item>
    <item>
      <title>Correction Detection and Error Type Selection as an ESL Educational Aid</title>
      <link>http://www.eecs.harvard.edu/%7Eelif/EECS/Papers/Entries/2012/9/27_Correction_Detection_and_Error_Type_Selection_as_an_ESL_Educational_Aid.html</link>
      <guid isPermaLink="false">91b4dd87-77ed-46a9-b66d-a5e4fb897268</guid>
      <pubDate>Thu, 27 Sep 2012 13:28:47 -0400</pubDate>
      <description>Ben Swanson and Elif Yamangil. Correction Detection and Error Type Selection as an ESL Educational Aid. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Montreal, Canada, June 3-8, 2012.&lt;br/&gt;&lt;br/&gt;&lt;a href=&quot;http://aclweb.org/anthology-new/N/N12/N12-1037.pdf&quot;&gt;Click here for pdf&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;We present a classifier that discriminates between types of corrections made by teachers of English in student essays. We define a set of linguistically motivated feature templates for a log-linear classification model, train this classifier on sentence pairs extracted from the Cambridge Learner Corpus, and achieve 89% accuracy, improving upon a 33% baseline. Furthermore, we incorporate our classifier into a novel application that takes as input a set of corrected essays that have been sentence-aligned with their originals and outputs the individual corrections classified by error type. We report the F-Score of our implementation on this task.&lt;br/&gt;</description>
    </item>
    <item>
      <title>Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression</title>
      <link>http://www.eecs.harvard.edu/%7Eelif/EECS/Papers/Entries/2011/1/19_Bayesian_Synchronous_Tree-Substitution_Grammar_Induction_and_Its_Application_to_Sentence_Compression.html</link>
      <guid isPermaLink="false">e6f96a3b-62e8-490b-bd09-f09a8b5d8954</guid>
      <pubDate>Wed, 19 Jan 2011 19:11:06 -0500</pubDate>
      <description>Elif Yamangil and Stuart M. Shieber. Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, July 11-16, 2010.&lt;br/&gt;&lt;br/&gt;&lt;a href=&quot;http://www.aclweb.org/anthology/P/P10/P10-1096.pdf&quot;&gt;Click here for pdf&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar.&lt;br/&gt;</description>
    </item>
    <item>
      <title>Mining Wikipedia Revision Histories for Improving Sentence Compression</title>
      <link>http://www.eecs.harvard.edu/%7Eelif/EECS/Papers/Entries/2011/1/19_Mining_Wikipedia_Revision_Histories_for_Improving_Sentence_Compression.html</link>
      <guid isPermaLink="false">abd99473-df43-4198-9aa4-561b5b04f691</guid>
      <pubDate>Wed, 19 Jan 2011 18:25:14 -0500</pubDate>
      <description>Elif Yamangil and Rani Nelken. Mining Wikipedia Revision Histories for Improving Sentence Compression. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Columbus, Ohio, June 15-20, 2008.&lt;br/&gt;&lt;br/&gt;&lt;a href=&quot;http://www.aclweb.org/anthology/P/P08/P08-2035.pdf&quot;&gt;Click here for pdf&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;A well-recognized limitation of research on supervised sentence compression is the dearth of available training data. We propose a new and bountiful resource for such training data, which we obtain by mining the revision history of Wikipedia for sentence compressions and expansions. Using only a fraction of the available Wikipedia data, we have collected a training corpus of over 380,000 sentence pairs, two orders of magnitude larger than the standardly used Ziff-Davis corpus. Using this newfound data, we propose a novel lexicalized noisy channel model for sentence compression, achieving improved results in grammaticality and compression rate criteria with a slight decrease in importance.&lt;br/&gt;</description>
    </item>
    <item>
      <title>Mining Wikipedia's Article Revision History for Training Computational Linguistics Algorithms</title>
      <link>http://www.eecs.harvard.edu/%7Eelif/EECS/Papers/Entries/2011/1/19_Mining_Wikipedias_Article_Revision_History_for_Training_Computational_Linguistics_Algorithms.html</link>
      <guid isPermaLink="false">11bf4830-5d6a-4f2f-b8aa-a6407b15f014</guid>
      <pubDate>Wed, 19 Jan 2011 17:41:12 -0500</pubDate>
      <description>Rani Nelken and Elif Yamangil. Mining Wikipedia's Article Revision History for Training Computational Linguistics Algorithms. In Proceedings of the AAAI Workshop on Wikipedia and Artificial Intelligence: An Evolving Synergy, Chicago, Illinois, July 13-14, 2008.&lt;br/&gt;&lt;br/&gt;&lt;a href=&quot;http://www.eecs.harvard.edu/~elif/pubs/eggcorn.pdf&quot;&gt;Click here for pdf&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;We present a novel paradigm for obtaining large amounts of training data for computational linguistics tasks by mining Wikipedia’s article revision history. By comparing adjacent versions of the same article, we extract voluminous training data for tasks for which data is usually scarce or costly to obtain. We illustrate this paradigm by applying it to three separate text processing tasks at various levels of linguistic granularity. We first apply this approach to the collection of textual errors and their correction, focusing on the specific type of lexical errors known as “eggcorns”. Second, moving up to the sentential level, we show how to mine Wikipedia revisions for training sentence compression algorithms. By dramatically increasing the size of the available training data, we are able to create more discerning lexicalized models, providing improved compression results. Finally, moving up to the document level, we present some preliminary ideas on how to use the Wikipedia data to bootstrap text summarization systems. We propose to use a sentence’s persistence throughout a document’s evolution as an indicator of its fitness as part of an extractive summary.&lt;br/&gt;</description>
    </item>
    <item>
      <title>Towards collaborative intelligent tutors: Automated recognition of users' strategies</title>
      <link>http://www.eecs.harvard.edu/%7Eelif/EECS/Papers/Entries/2011/1/19_Towards_collaborative_intelligent_tutors__Automated_recognition_of_users_strategies.html</link>
      <guid isPermaLink="false">7e5317c4-6c91-4d25-9843-7d67bf6fdcfa</guid>
      <pubDate>Wed, 19 Jan 2011 15:54:44 -0500</pubDate>
      <description>Ya'akov Gal, Elif Yamangil, Stuart M. Shieber, Andee Rubin, and Barbara J. Grosz. Towards collaborative intelligent tutors: Automated recognition of users' strategies. In Proceedings of the 9th International Conference on Intelligent Tutoring Systems, Montreal, Canada, 23-27 June 2008.&lt;br/&gt;&lt;br/&gt;&lt;a href=&quot;http://www.eecs.harvard.edu/~shieber/Biblio/Papers/gal-2008-tci.pdf&quot;&gt;Click here for pdf&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;This paper addresses the problem of inferring students’ strategies when they interact with data-modeling software that is used for pedagogical purposes. This software enables students to learn about statistical data by building and analyzing their own models. Automatic recognition of students’ activities when interacting with pedagogical software is challenging. Students can pursue several plans in parallel and interleave the execution of these plans. The algorithm presented in this paper decomposes students’ complete interaction histories with the software into hierarchies of interdependent tasks that may be subsequently compared with ideal solutions. This algorithm is evaluated empirically using commercial software that is used in many schools. Results indicate that the algorithm is able to (1) identify the plans students use when solving problems using the software; (2) distinguish between those actions in students’ plans that play a salient part in their problem-solving and those representing exploratory actions and mistakes; and (3) capture students’ interleaving and free-order action sequences.&lt;br/&gt;</description>
    </item>
    <item>
      <title>Scalable Lexical Correction from Wikipedia Edits Using Perceptron Reranking</title>
      <link>http://www.eecs.harvard.edu/%7Eelif/EECS/Papers/Entries/2011/1/19_Scalable_Lexical_Correction_from_Wikipedia_Edits_Using_Perceptron_Reranking.html</link>
      <guid isPermaLink="false">8349cd95-46e4-4ce0-9919-85d6e9f197c2</guid>
      <pubDate>Wed, 19 Jan 2011 16:10:22 -0500</pubDate>
      <description>Elif Yamangil and Rani Nelken. Scalable Lexical Correction from Wikipedia Edits Using Perceptron Reranking. Coursework for CS 287: Natural Language Processing, Spring 2008 (unpublished).&lt;br/&gt;&lt;br/&gt;&lt;a href=&quot;http://www.eecs.harvard.edu/~elif/pubs/wikiedit.pdf&quot;&gt;Click here for pdf&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;We propose a novel model of large-scale lexical correction of all document words, including both context-sensitive spelling correction and stylistic lexical modifications, trained on Wikipedia’s edit revisions. In this task, we wish to correct all possible errors, rather than focusing on a set of predetermined target words, making the learning problem much more difficult. Our contribution is twofold. First, we find a new source of training data for text corrections by mining Wikipedia’s edit history. Since Wikipedia articles are edited collaboratively, errors introduced by one writer are likely to be subsequently corrected by others. We mine a set of 1.5 million such correction training samples. Second, we use the Wikipedia data to train a novel model of text correction, based on a generative HMM and a reranking perceptron, forming a highly effective model of correction. We evaluate our method against context-sensitive spelling correction, obtaining state-of-the-art accuracy in a more general setting.&lt;br/&gt;</description>
    </item>
  </channel>
</rss>
