Scalable Large-Margin Structured Learning: Theory and Algorithms

Liang Huang (CUNY), Kai Zhao (CUNY), and Lemao Liu (CUNY)

Much of NLP tries to map structured input (sentences) to some form of structured output (tag sequences, parse trees, semantic graphs, or translated/paraphrased/compressed sentences). Thus structured prediction and its learning algorithm are of central importance to us NLP researchers. However, when applying machine learning to structured domains, we often face scalability issues for two reasons:

1. Even the fastest exact search algorithms for most NLP problems (such as parsing and translation) is too slow for repeated use on the training data, but approximate search (such as beam search) unfortunately breaks down the nice theoretical properties (such as convergence) of existing machine learning algorithms.

2. Even with inexact search, the scale of the training data in NLP still makes pure online learning (such as perceptron and MIRA) too slow on a single CPU.

This tutorial reviews recent advances that address these two challenges. In particular, we will cover principled machine learning methods that are designed to work under vastly inexact search, and parallelization algorithms that speed up learning on multiple CPUs. We will also extend structured learning to the latent variable setting, where in many NLP applications such as translation and semantic parsing the gold-standard derivation is hidden.


  1. Overview of Structured Learning
    • key challenge 1: search efficiency
    • key challenge 2: interactions b/w search and learning

  2. Structured Perceptron
    • the basic algorithm
    • the convergence proof (a geometric view)
    • voted perceptron, averaged perceptron and efficient implementation tricks
    • applications in tagging, parsing, etc.

  3. Structured Perceptron under Inexact Search
    • convergence theory breaks under inexact search
    • early update
    • violation-fixing perceptron
    • applications in tagging, parsing, etc.


  4. From Perceptron to MIRA
    • 1-best MIRA; geometric solution
    • k-best MIRA; hildreth algorithm
    • MIRA with all constraints; loss-augmented decoding
    • MIRA under inexact search

  5. Large-Margin Structured Learning with Latent Variables
    • examples: machine translation, semantic parsing, transliteration
    • separability condition and convergence proof
    • latent-variable perceptron under inexact search
    • applications in machine translation

  6. Parallelizing Large-Margin Structured Learning
    • iterative parameter mixing (IPM)
    • minibatch perceptron and MIRA

Instructor Bios:

Liang Huang is an Assistant Professor at the City University of New York (CUNY). He graduated in 2008 from Penn and has worked as a Research Scientist at Google and a Research Assistant Professor at USC/ISI. His work is mainly on the theoretical aspects (algorithms and formalisms) of computational linguistics, as well as theory and algorithms of structured learning. He has received a Best Paper Award at ACL 2008, several best paper nominations (ACL 2007, EMNLP 2008, and ACL 2010), two Google Faculty Research Awards (2010 and 2013), and a University Graduate Teaching Prize at Penn (2005). He has given two tutorials at COLING 2008 and NAACL 2009, being the most popular tutorial at both venues.

Kai Zhao is a Ph.D. candidate at the City University of New York (CUNY), working with Liang Huang. He received his B.S. from the University of Science and Technology in China (USTC). He has published on structured prediction, online learning, machine translation, and parsing algorithms. He was a summer intern with IBM TJ Watson Research Center in 2013.

Lemao Liu is a postdoctoral research associate at the City University of New York (CUNY), working with Liang Huang. He received his Ph.D. from the Harbin Institute of Technology in 2013. Much of his Ph.D. work was done while visiting NICT, Japan, under the guidance of Taro Watanabe. His research area is machine translation and machine learning.