integrating learning and search for structured predictionjana/pubs/ijcai-tutorial-all-parts.pdf ·...
TRANSCRIPT
![Page 1: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/1.jpg)
1
Integrating Learning and Search for Structured Prediction
Alan Fern
Oregon State University
Liang Huang
Oregon State University
Jana Doppa
Washington State University
Tutorial at International Joint Conference on Artificial Intelligence (IJCAI), 2016
![Page 2: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/2.jpg)
2
Part 1: Introduction
![Page 3: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/3.jpg)
3
Introduction
Structured Prediction problems are very common Natural language processing Computer vision Computational biology Planning Social networks ….
![Page 4: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/4.jpg)
4
Natural Language Processing Examples
![Page 5: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/5.jpg)
5
NLP Examples: POS Tagging and Parsing
POS Tagging
Parsing
𝑥𝑥 = “The cat ran” 𝑦𝑦 = <article> <noun> <verb>
“Red figures on the screen indicated falling stocks”
𝑥𝑥 𝑦𝑦
![Page 6: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/6.jpg)
6
NLP Examples: Coreference and Translation
Co-reference Resolution
Machine Translation
“Barack Obama nominated Hillary Clinton as his secretary of state on Monday. He chose her because she had foreign affair experience as a former First Lady.”
“Barack Obama nominated Hillary Clinton as his secretary of state on Monday. He chose her because she had foreign affair experience as a former First Lady.”
𝑥𝑥 𝑦𝑦
𝑥𝑥 = “The man bit the dog” 𝑦𝑦 = 该男子咬狗
![Page 7: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/7.jpg)
7
Examples of Bad Prediction
![Page 8: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/8.jpg)
8
Computer Vision Examples
![Page 9: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/9.jpg)
9
Scene Labeling
Image Labeling
![Page 10: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/10.jpg)
10
Biological Image Analysis
Nematocyst Image Body parts of the nematocyst
![Page 11: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/11.jpg)
11
The OSU Digital Scout Project Objective: compute semantic interpretations of football video
Raw video High-level interpretation of play
Help automate tedious video annotation done by pro/college/HS teams Working with hudl (hudl.com)
Requires advancing state-of-the-art in computer vision, including: registration, multi-object tracking, event/activity recognition
![Page 12: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/12.jpg)
12
Multi-Object Tracking in Videos
Video
Player Trajectories
![Page 13: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/13.jpg)
13
Automated Planning
![Page 14: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/14.jpg)
14
Planning
Initial State Goal State
A planning problem gives: an initial state a goal condition a list of actions and their semantics (e.g. STRIPS)
Objective: find action sequence from initial state to goal
?
Available actions:
Pickup(x) PutDown(x,y)
![Page 15: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/15.jpg)
15
Common Theme
POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects Inputs and outputs are highly structured
Studied under a sub-field of machine learning called “Structured Prediction” Generalization of standard classification Exponential no. of classes (e.g., all POS tag sequences)
![Page 16: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/16.jpg)
16
Classification to Structured Prediction
![Page 17: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/17.jpg)
Input
X
Y
Output
Learning a Classifier
?
![Page 18: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/18.jpg)
X
male
Learning a Classifier
?
Example problem:
X - image of a face
Y ∈ {male, female}
![Page 19: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/19.jpg)
X
Y
Learning a Classifier
?
Training Data {(x1,y1),(x2,y2),…,(xn,yn)}
Example problem:
X - image of a face
Y ∈ {male, female}
Learning Algorithm
( , male)
![Page 20: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/20.jpg)
X
Y
Learning a Classifier
Training Data {(x1,y1),(x2,y2),…,(xn,yn)}
Example problem:
X - image of a face
Y ∈ {male, female}
Learning Algorithm F(X, 𝜃𝜃)
𝜃𝜃
![Page 21: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/21.jpg)
X
Y
Learning for Simple Outputs
Training Data {(x1,y1),(x2,y2),…,(xn,yn)}
Example problem:
X - image of a face
Y ∈ {male, female}
Learning Algorithm F(X, 𝜃𝜃)
𝜃𝜃 feature vector
class label
![Page 22: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/22.jpg)
X
Y
Learning for Simple Outputs
Training Data {(x1,y1),(x2,y2),…,(xn,yn)}
Example problem:
X - image of a face
Y ∈ {male, female}
Learning Algorithm F(X, 𝜃𝜃)
𝜃𝜃 feature vector
class label Logistic Regression Support Vector Machines K Nearest Neighbor Decision Trees Neural Networks
![Page 23: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/23.jpg)
X
Y
Learning for Structured Outputs
Training Data {(x1,y1),(x2,y2),…,(xn,yn)}
Learning Algorithm F(X, 𝜃𝜃)
𝜃𝜃
Part-of-Speech Tagging
“The cat ran”
<article> <noun> <verb>
English Sentence:
Part-of-Speech Sequence:
𝒀𝒀 = set of all possible POS tag sequences
Exponential !!
![Page 24: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/24.jpg)
X
Y
Learning for Structured Outputs
Training Data {(x1,y1),(x2,y2),…,(xn,yn)}
Learning Algorithm F(X, 𝜃𝜃)
𝜃𝜃
Co-reference Resolution
“Barack Obama nominated Hillary Clinton as his secretary of state on Monday. He chose her because she had foreign affair experience as a former First Lady.”
Text with input mentions:
Co-reference Output:
𝒀𝒀 = set of all possible clusterings
Exponential !!
“Barack Obama nominated Hillary Clinton as his secretary of state on Monday. He chose her because she had foreign affair experience as a former First Lady.”
![Page 25: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/25.jpg)
X
Y
Learning for Structured Outputs
Training Data {(x1,y1),(x2,y2),…,(xn,yn)}
Learning Algorithm F(X, 𝜃𝜃)
𝜃𝜃
Handwriting Recognition
𝒀𝒀 = set of all possible letter sequences
Exponential !!
Letter Sequence: S t r u c t u r e d
Handwritten Word:
![Page 26: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/26.jpg)
X
Y
Learning for Structured Outputs
Training Data {(x1,y1),(x2,y2),…,(xn,yn)}
Learning Algorithm F(X, 𝜃𝜃)
𝜃𝜃
𝒀𝒀 = set of all possible labelings
Exponential !!
Image Labeling
![Page 27: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/27.jpg)
27
Part 2: Cost Function Learning Framework and Argmin Inference Challenge
![Page 28: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/28.jpg)
28
Cost Function Learning Approaches: Inspiration
Generalization of traditional ML approaches to structured outputs
SVMs ⇒ Structured SVM [Tsochantaridis et al., 2004]
Logistic Regression ⇒ Conditional Random Fields [Lafferty et al., 2001]
Perceptron ⇒ Structured Perceptron [Collins 2002]
![Page 29: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/29.jpg)
29
Cost Function Learning: Approaches
Most algorithms learn parameters of linear models 𝜙𝜙 𝑥𝑥,𝑦𝑦 is n-dim feature vector over input-output pairs w is n-dim parameter vector
F(x) = 𝐚𝐚𝐚𝐚𝐚𝐚 𝒎𝒎𝒎𝒎𝒎𝒎 𝒚𝒚∈𝒀𝒀
𝒘𝒘 ⋅ 𝝓𝝓(𝒙𝒙,𝒚𝒚)
![Page 30: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/30.jpg)
30
Cost Function Learning: Approaches
Most algorithms learn parameters of linear models 𝜙𝜙 𝑥𝑥,𝑦𝑦 is n-dim feature vector over input-output pairs w is n-dim parameter vector
F(x) = 𝐚𝐚𝐚𝐚𝐚𝐚 𝒎𝒎𝒎𝒎𝒎𝒎 𝒚𝒚∈𝒀𝒀
𝒘𝒘 ⋅ 𝝓𝝓(𝒙𝒙,𝒚𝒚)
Example: Part-of-Speech Tagging
x = “The cat ran” y = <article> <noun> <verb>
𝜙𝜙(𝑥𝑥,𝑦𝑦) may have unary and pairwise features
unary feature: e.g. # of times ‘the’ is paired with <article>
pairwise feature: e.g. # of times <article> followed by <verb>
![Page 31: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/31.jpg)
31
Key challenge: “Argmin” Inference
F(x) = 𝐚𝐚𝐚𝐚𝐚𝐚 𝒎𝒎𝒎𝒎𝒎𝒎 𝒚𝒚∈𝒀𝒀
𝒘𝒘 ⋅ 𝝓𝝓(𝒙𝒙,𝒚𝒚)
Exponential size of output
space !!
![Page 32: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/32.jpg)
32
Key challenge: “Argmin” Inference
Time complexity of inference depends on the dependency structure of features 𝜙𝜙(𝑥𝑥,𝑦𝑦)
F(x) = 𝐚𝐚𝐚𝐚𝐚𝐚 𝒎𝒎𝒎𝒎𝒎𝒎 𝒚𝒚∈𝒀𝒀
𝒘𝒘 ⋅ 𝝓𝝓(𝒙𝒙,𝒚𝒚)
![Page 33: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/33.jpg)
33
Key challenge: “Argmin” Inference
Time complexity of inference depends on the dependency structure of features 𝜙𝜙(𝑥𝑥,𝑦𝑦) NP-Hard in general Efficient inference algorithms exist only for simple features
F(x) = 𝐚𝐚𝐚𝐚𝐚𝐚 𝒎𝒎𝒎𝒎𝒎𝒎 𝒚𝒚∈𝒀𝒀
𝒘𝒘 ⋅ 𝝓𝝓(𝒙𝒙,𝒚𝒚)
![Page 34: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/34.jpg)
34
Cost Function Learning: Key Elements
Joint Feature Function How to encode a structured input (x) and structured output
(y) as a fixed set of features 𝜙𝜙(𝑥𝑥,𝑦𝑦)?
(Loss Augmented) Argmin Inference Solver Viterbi algorithm for sequence labeling CKY algorithm for parsing (Loopy) Belief propagation for Markov Random Fields Sorting for ranking
Optimization algorithm for learning weights (sub) gradient descent, cutting plane algorithm …
F(x) = 𝐚𝐚𝐚𝐚𝐚𝐚 𝒎𝒎𝒎𝒎𝒎𝒎 𝒚𝒚∈𝒀𝒀
𝒘𝒘 ⋅ 𝝓𝝓(𝒙𝒙,𝒚𝒚)
![Page 35: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/35.jpg)
35
Cost Function Learning: Generic Template
repeat For every training example (𝑥𝑥,𝑦𝑦) Inference: 𝑦𝑦� = arg𝑚𝑚𝑖𝑖𝑖𝑖𝑦𝑦∈𝑌𝑌 𝑤𝑤 ∙ 𝜑𝜑 𝑥𝑥, 𝑦𝑦 If mistake 𝑦𝑦 ≠ 𝑦𝑦�, Learning: online or batch weight update
until convergence or max. iterations
Training goal: Find weights 𝑤𝑤 s.t For each input 𝑥𝑥, the cost of the correct structured output 𝑦𝑦 is lower than all wrong structured outputs
Exponential size of output
space !!
![Page 36: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/36.jpg)
36
Expensive Training Process
Main Reason repeated calls to “Argmin inference solver” (computationally
expensive) on all the training examples
Recent Solutions Amortized Inference: Kai-Wei Chang, Shyam Upadhyay, Gourab
Kundu, Dan Roth: Structural Learning with Amortized Inference. AAAI 2015
Decomposed Learning: Rajhans Samdani, Dan Roth: Efficient Decomposed Learning for Structured Prediction. ICML 2012
![Page 37: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/37.jpg)
37
Cost Function Learning: “Exact” vs. “Approximate” Inference Solver Most theory works for “Exact” Inference
Theory breaks with “Approximate” Inference Alex Kulesza, Fernando Pereira: Structured Learning with Approximate
Inference. NIPS 2007 Thomas Finley, Thorsten Joachims: Training structural SVMs when exact
inference is intractable. ICML 2008: 304-311
Active Research Topic: Interplay between (approximate) inference and learning Veselin Stoyanov, Alexander Ropson, Jason Eisner: Empirical Risk
Minimization of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure. AISTATS 2011
Justin Domke: Structured Learning via Logistic Regression. NIPS 2013 …
![Page 38: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/38.jpg)
38
Focus of Tutorial
Integrating “Learning” and “Search” two fundamental branches of AI to solve structured prediction problems
Key Idea: Accept that “exact” Argmin inference is intractable Select a computationally bounded search architecture for
making predictions Optimize the parameters of that procedure to produce
accurate outputs using training data Learning “with Inference” vs. Learning “for Inference”
![Page 39: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/39.jpg)
39
Part 3: A Brief Overview of Search Concepts
![Page 40: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/40.jpg)
40
Combinatorial Search: Key Concepts
Search Space Where to start the search? How to navigate the space?
Search Procedure / Strategy How to conduct search?
Search Control Knowledge How to guide the search? (Intelligence)
![Page 41: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/41.jpg)
41
Search Space Definition
Initial State Function: 𝑰𝑰 Where to start the search?
Successor State Function: 𝑺𝑺 What are the successor (next) states for a given state? Generally, specified as a set of actions that modify the given
state to compute the successor states
Terminal State Function: 𝑻𝑻 When to stop the search?
![Page 42: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/42.jpg)
42
(Ordered) Search Space: Example
![Page 43: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/43.jpg)
43
Search Procedure
Search Tree (or Graph): Instantiation of the search space. How to navigate?
Uninformed (Blind) Search Procedure Breadth-First Search (BFS) Depth-First Search (DFS)
Informed (Intelligent) Search Procedure Greedy Search Beam Search Best-First Search …
![Page 44: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/44.jpg)
44
Informed Search Procedures
Maintain an internal memory of a set of open nodes (𝑀𝑀)
Intelligent search guided by the control knowledge
Algorithmic Framework for Best-First Search style search strategies: Selection: score each open node in the memory 𝑀𝑀 and select
a subset of node(s) to expand Expansion: expand each selected state using the successor
function to generate the candidate set Pruning: Retains a subset of all open nodes (update 𝑀𝑀) and
prune away all the remaining nodes
![Page 45: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/45.jpg)
45
Best-First Search Style Algorithms
Best-first Search (𝑀𝑀 = ∞) selects the best open node no pruning
Greedy Search (𝑀𝑀 = 1) selection is trivial prunes everything except for the best open node in the
candidate set
![Page 46: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/46.jpg)
46
Best-First Search Style Algorithms
Best-first Beam Search (𝑀𝑀 = 𝐵𝐵) selects the best open node prunes everything except for the best 𝐵𝐵 open nodes in the
candidate set
Breadth-First Beam Search (𝑀𝑀 = 𝐵𝐵) selection is trivial – all B nodes prunes everything except for the best 𝐵𝐵 open nodes in the
candidate set
![Page 47: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/47.jpg)
47
Search Control Knowledge
Greedy Policies Classifier that selects the best action at each state
Heuristic Functions computes the score for each search node heuristic scores are used to perform selection and pruning
Pruning Rules additional control knowledge to prune bad actions / states
Cost Function Scoring function to evaluate the terminal states
![Page 48: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/48.jpg)
48
Part 4: Control Knowledge Learning Framework: Greedy Methods
![Page 49: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/49.jpg)
49
Greedy Control Knowledge Learning
Given Search space definition (ordered or unordered) Training examples (input-output pairs)
Learning Goal Learn a policy or classifier to make good predictions
Key Idea: Training examples can be seen as expert demonstrations Equivalent to “Imitation Learning” or “Learning from
Demonstration” Reduction to classifier or rank learning
![Page 50: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/50.jpg)
50
Ordered vs. Unordered Search Space
Ordered Search Space Fixed ordering of decisions (e.g., left-to-right in sequences) Classifier based structured prediction
Unordered Search Space Learner dynamically orders the decisions Easy-First approach
![Page 51: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/51.jpg)
51
Classifier-based Structured Prediction
![Page 52: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/52.jpg)
52
Classifier-based Structured Prediction
Reduction to classifier learning 26 classes
IL Algorithms Exact-Imitation SEARN DAgger AggreVaTe LOLS
![Page 53: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/53.jpg)
53
Aside: Reductions in Machine Learning
Reduce complex problem to simpler problem(s)
A better algorithm for simpler problem means a better algorithm for complex problem
Composability, modularity, ease-of-implementation
Hard Machine Learning Problem
Easy Machine Learning Problem Reduction
Performance 𝜖𝜖 Performance f(𝜖𝜖)
![Page 54: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/54.jpg)
54
Imitation Learning Approach
Expert demonstrations each training example (input-output pair) can be seen as a
“expert” demonstration for sequential decision-making
Collect classification examples Generate a multi-class classification example for each of the
decisions Input: f(n), features of the state n Output: yn, the correct decision at state n
Classifier Learning Learn a classifier from all the classification examples
![Page 55: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/55.jpg)
55
Exact Imitation: Classification examples
, - - - - - - 𝑓𝑓 𝑠𝑠
𝑓𝑓 𝑡𝑡
𝑓𝑓 𝑟𝑟
𝑓𝑓 𝑢𝑢
𝑓𝑓 𝑐𝑐
𝑓𝑓 𝑡𝑡
, s - - - - -
, s t - - - -
, s t r - - -
, s t r u - -
, s t r u c -
Input Output For each training example
![Page 56: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/56.jpg)
56
Exact Imitation: Classifier Learning
, - - - - - - 𝑓𝑓 𝑠𝑠
𝑓𝑓 𝑡𝑡
𝑓𝑓 𝑟𝑟
𝑓𝑓 𝑢𝑢
𝑓𝑓 𝑐𝑐
𝑓𝑓 𝑡𝑡
, s - - - - -
, s t - - - -
, s t r - - -
, s t r u - -
, s t r u c -
Input Output
…
𝒉𝒉
Recurrent classifier
or
Learned policy
Classification Learner
![Page 57: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/57.jpg)
57
Learned Recurrent Classifier: Illustration
Error propagation: errors in early decisions propagate to down-stream decisions
![Page 58: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/58.jpg)
58
Recurrent Error
Can lead to poor global performance
Early mistakes propagate to downstream decisions: f 𝜖𝜖 = 𝑂𝑂 𝜖𝜖𝑇𝑇2 , where 𝜖𝜖 is the probability of error at each decision and T is the number of decision steps [Kaariainen 2006] [Ross & Bagnell 2010]
Mismatch between training (IID) and testing (non-IID) distribution
Is there a way to address error propagation?
![Page 59: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/59.jpg)
59
Addressing Error Propagation
• Rough Idea: Iteratively observe current policy and augment training data to better represent important states
• Several variations on this idea [Fern et al., 2006], [Daume et al., 2009], [Xu & Fern 2010], [Ross & Bagnell 2010], [Ross et al. 2011, 2014], [Chang et al., 2015]
• Generate trajectories using current policy (or some variant)
• Collect additional classification examples using optimal policy (via ground-truth output)
![Page 60: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/60.jpg)
60
DAgger Algorithm [Ross et al., 2011]
Collect initial training set 𝐷𝐷 of 𝑁𝑁 trajectories from reference policy 𝜋𝜋∗
Repeat until done 𝜋𝜋 ← LearnClassifier(𝐷𝐷) Collect set of states S that occur along 𝑁𝑁 trajectories of 𝜋𝜋 For each state 𝑠𝑠 ∈ 𝑆𝑆
𝐷𝐷 ← 𝐷𝐷 ∪ { 𝑠𝑠,𝜋𝜋∗ 𝑠𝑠 } // add state labeled by expert or reference policy
Return 𝜋𝜋
Each iteration increases the amount of training data (data aggregation)
![Page 61: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/61.jpg)
61
DAgger for Handwriting Recognition
Source: [Ross et al., 2011]
![Page 62: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/62.jpg)
62
Ordered vs. Unordered Search Space
Ordered Search Space Fixed ordering of decisions (e.g., left-to-right in sequences) Classifier based structured prediction
Unordered Search Space Learner dynamically orders the decisions Easy-First approach
![Page 63: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/63.jpg)
63
Easy-First Approach for Structured Prediction
![Page 64: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/64.jpg)
64
Easy-First Approach: Motivation
Drawbacks of classifier-based structured prediction Need to define an ordering over the output variables (e.g., left-
to-right in sequence labeling) Which order is good? How do you find one? Some decisions are hard to make if you pre-define a fixed order
over the output variables
Easy-First Approach: Key Idea Make easy decisions first to constrain the harder decisions Learns to dynamically order the decisions Analogous to constraint satisfaction algorithms
![Page 65: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/65.jpg)
65
Example: Cross-Document Coreference
One of the key suspected mafia bosses arrested yesterday had hanged himself.
Police said Lo Presti has hanged himself.
had hanged
has hanged
Easy
One of the key suspected mafia bosses
Lo Presti
Hard
Doc 1
Doc 2
![Page 66: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/66.jpg)
66
Example: Cross-Document Coreference
One of the key suspected mafia bosses arrested yesterday had hanged himself.
Police said Lo Presti has hanged himself.
had hanged
has hanged
One of the key suspected mafia bosses
Lo Presti
• Once we decide that the two verbs are coreferent, the two noun mentions serve the same semantic role to the verb cluster
• Strong evidence for coreference
Doc 1
Doc 2
![Page 67: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/67.jpg)
67
Easy-First Approach: Overview
Consider a set of inter-dependent decisions in a sequential manner
At each step, make the easiest decision first
This allows us to accumulate more information to help resolve more challenging decisions later
![Page 68: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/68.jpg)
68
Applications of Easy-First
Cross-document joint entity and event co-reference Lee et. al. EMNLP-CoNLL ’12
Within-document co-reference Resolution Stoyanov and Eisner, COLING’12
Dependency parsing Goldberg and Elhadad, HLT-NAACL’ 10
![Page 69: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/69.jpg)
69
Easy-First Approach: Key Elements
• Search space – A state corresponds to a partial solution
had hanged
has hanged
One of the key suspected mafia bosses
Lo Presti
Initial state: all mentions and verbs are in separate clusters
The Police
![Page 70: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/70.jpg)
70
Easy-First Approach: Key Elements
• Search space – A state corresponds to a partial solution – In each state, we consider a set of fixed possible actions
had hanged
has hanged
One of the key suspected mafia bosses
Lo Presti
Four possible merge actions
The Police ? ? ?
?
![Page 71: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/71.jpg)
Easy-First Approach: Key Elements
• Search space – A state corresponds to a partial solution – In each state, we consider a set of fixed possible actions – Each action is described by a feature vector 𝑥𝑥 ∈ 𝑅𝑅𝑑𝑑
had hanged
has hanged
One of the key suspected mafia bosses
Lo Presti
Four possible merge actions
The Police
𝑥𝑥1 𝑥𝑥2 𝑥𝑥3
𝑥𝑥4
![Page 72: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/72.jpg)
Easy-First Approach: Key Elements
• Search space – A state corresponds to a partial solution – In each state, we consider a set of fixed possible actions – Each action is described by a feature vector 𝑥𝑥 ∈ 𝑅𝑅𝑑𝑑 – An action is defined to be good if it leads to an improved
state
had hanged
has hanged
One of the key suspected mafia bosses
Lo Presti The Police
𝑥𝑥1 𝑥𝑥2 𝑥𝑥3
𝑥𝑥4
𝑥𝑥2, 𝑥𝑥4 ∈ 𝐺𝐺 (good actions); 𝑥𝑥1, 𝑥𝑥3 ∈ 𝐵𝐵(bad actions)
![Page 73: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/73.jpg)
Easy-First Approach: Key Elements
• Search space • Scoring function 𝑓𝑓:𝑅𝑅𝑑𝑑 → 𝑅𝑅
– e.g., 𝑓𝑓 𝑥𝑥 = 𝑤𝑤 ⋅ 𝑥𝑥 – In each state, evaluate all possible actions
had hanged
has hanged
One of the key suspected mafia bosses
Lo Presti The Police
𝑥𝑥1 𝑥𝑥2 𝑥𝑥3
𝑥𝑥4
𝑓𝑓 𝑥𝑥1 = 0.05 𝑓𝑓 𝑥𝑥2 = 0.08 𝑓𝑓 𝑥𝑥3 = 0.057 𝑓𝑓 𝑥𝑥4 = 0.75
73
![Page 74: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/74.jpg)
Easy-First Approach: Key Elements
• Search space • Scoring function 𝑓𝑓:𝑅𝑅𝑑𝑑 → 𝑅𝑅
– e.g., 𝑓𝑓 𝑥𝑥 = 𝑤𝑤 ⋅ 𝑥𝑥 – In each state, evaluate all possible actions – Take the highest scoring action (easiest)
had hanged
has hanged
One of the key suspected mafia bosses
Lo Presti The Police
𝑥𝑥1 𝑥𝑥2 𝑥𝑥3
𝑥𝑥4
𝑓𝑓 𝑥𝑥1 = 0.05 𝑓𝑓 𝑥𝑥2 = 0.08 𝑓𝑓 𝑥𝑥3 = 0.057 𝑓𝑓 𝑥𝑥4 = 0.75
74
![Page 75: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/75.jpg)
75
Scoring Function Learning
Possible goal: learn a scoring function such that: in every state all good actions are ranked higher than all bad actions
A better goal: learn a scoring function such that in every state a good action is ranked higher than all bad actions
![Page 76: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/76.jpg)
76
Alternate Methods
• In a training step, if the highest scoring action is bad, perform weight update
• Different update approaches – Best (highest scoring) good vs. best (highest
scoring) bad – Average good vs. average bad
Issue: they do not directly optimize toward our goal!
![Page 77: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/77.jpg)
77
Optimization Objective for Update
+∈ ∈∑ ⋅+⋅−
BbbgGgw
xwxwB
)max1(1argmin
max𝑔𝑔∈𝐺𝐺
𝑤𝑤 ⋅ 𝑥𝑥𝑔𝑔 > 𝑤𝑤 ⋅ 𝑥𝑥𝑏𝑏 + 1 for all 𝑏𝑏 ∈ 𝐵𝐵
77
• Goal: find a linear function such that it ranks one good action higher than all bad actions – This can be achieved by a set of constraints
• Optimization Objective: • Use hinge loss to capture the constraints • Regularization to avoid overly aggressive update
2cww−+ λ
![Page 78: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/78.jpg)
78
Optimization: Majorization-Minimization [Xie et al., 2015]
: It is non-convex
Can be solved using a Majorization-Minimization (MM) algorithm to get local optima solution
In each MM iteration: Let 𝑥𝑥𝑔𝑔∗ be the current highest scoring good action Solve following convex objective (via subgradient descent):
+
∈ ∈∑ ⋅+⋅−
BbbgGgw
xwxwB
)max1(1argmin 2cww−+ λ
*gxw ⋅
+∈ ∈∑ ⋅+⋅−
BbbgGgw
xwxwB
)max1(1argmin 2cww−+ λ
![Page 79: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/79.jpg)
79
Contrast with Alternate Methods Good Bad
Average-Good Average-Bad
• Average-good vs. average-bad (AGAB)
• Best-good vs. best-bad (BGBB)
• Current method: Best-good vs. violated-bad (BGVB)
Best-good Violated-bad
Best-good Best-bad
[Daume et al., 2005], [Xu et al., 2009]
[Goldberg et al., 2010], [Stoyanov et al., 2012]
[Xie et al., 2015]
![Page 80: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/80.jpg)
80
Experiment I: Cross-document entity and event Coreference
01020304050607080
MUC B-CUBE CEAF_e CoNLL
Results on EECB corpus (Lee et al., 2012) BGBB R-BGBB BGVB R-BGVB Lee et al.
[Xie et al., 2015]
![Page 81: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/81.jpg)
81
Experiment I: Within document Coreference
0
10
20
30
40
50
60
70
80
MUC B-CUBE CEAF_e CoNLL
Results on OntoNotes BGBB R-BGBB BGVB R-BGVB
[Xie et al., 2015]
![Page 82: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/82.jpg)
82
Easy-First Learning as Imitation Learning
Imitation learning with a non-deterministic oracle policy multiple good decisions (actions) at a state
Ties are broken with the learned policy (scoring function)
NLP researchers employ imitation learning ideas and call them “training with exploration” Miguel Ballesteros, Yoav Goldberg, Chris Dyer, Noah A. Smith: Training with
Exploration Improves a Greedy Stack-LSTM Parser. CoRR abs/1603.03793 (2016)
Imitation learning ideas are also employed in training recurrent neural networks (RNNs) under the name “scheduled sampling” Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer: Scheduled Sampling
for Sequence Prediction with Recurrent Neural Networks. NIPS 2015
![Page 83: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/83.jpg)
83
Part 5: Control Knowledge Learning: Beam Search Methods
![Page 84: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/84.jpg)
84
Beam Search Framework
Given Search space definition (ordered or unordered) Training examples (input-output pairs) Beam width B (>1)
Learning Goal Learn a heuristic function to quickly guide the search to the
correct “complete’’ output
Key Idea: Structured prediction as a search problem in the space of
partial outputs Training examples define target paths from initial state to
the goal state (correct structured output)
![Page 85: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/85.jpg)
85
Beam Search Framework: Key Elements 1) Search space; 2) Search procedure; 3) Heuristic function
Represent heuristic function as a linear function 𝐻𝐻 𝑖𝑖 = 𝑤𝑤 ∙ 𝜓𝜓(𝑖𝑖) , where 𝜓𝜓(𝑖𝑖) stands for features of node 𝑖𝑖
Target node
Non-Target node
![Page 86: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/86.jpg)
86
Beam Search: Illustration
![Page 87: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/87.jpg)
87
Beam Search: Illustration
![Page 88: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/88.jpg)
88
Beam Search: Illustration
![Page 89: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/89.jpg)
89
Beam Search: Illustration
…
![Page 90: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/90.jpg)
90
Beam Search: Illustration
…
![Page 91: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/91.jpg)
91
Beam Search Framework: Inference
Input: learned weights 𝑤𝑤; beam width B; structured input 𝑥𝑥
repeat Perform search with heuristic 𝐻𝐻 𝑖𝑖 = 𝑤𝑤 ∙ 𝜓𝜓(𝑖𝑖)
until reaching a terminal state
Output: the complete output y corresponding to the terminal state
![Page 92: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/92.jpg)
92
Beam Search Framework: Generic Learning Template
Three design choices
How to define the notion of “search error”?
How to “update the weights” of heuristic function
when a search error is encountered? How to “update the beam” after weight update?
![Page 93: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/93.jpg)
93
Beam Search Framework: Learning Instantiations
Early update
Max-violation update
Learning as Search Optimization (LaSO)
[Collins and Roark, 2004]
[Huang et al., 2012]
[Daume et al., 2005], [Xu et al., 2009]
![Page 94: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/94.jpg)
94
Beam Search Framework: Learning Instantiations
Early update
Max-violation update
Learning as Search Optimization (LaSO)
![Page 95: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/95.jpg)
95
Beam Search Framework: Early Update
Search error: NO target node in the beam We cannot reach the goal node (correct structured output)
Weight update: standard structured perceptron Score of correct output > score of bad output
Beam update: reset beam with initial state OR discontinue search
![Page 96: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/96.jpg)
96
Beam Search Framework: Early Update
repeat For every training example (𝑥𝑥,𝑦𝑦)
Perform search with current heuristic (weights) If search error , update weights Reset beam with initial state (Dis)continue search
until convergence or max. iterations
![Page 97: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/97.jpg)
97
Beam Search Framework: Learning Instantiations
Early update
Max-violation update
Learning as Search Optimization (LaSO)
![Page 98: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/98.jpg)
98
Beam Search Framework: Max-Violation Update
Improves on the drawback of Early update Slow learning: learns from only earliest mistake
Max-Violation fix Consider worst-mistake (maximum violation) instead
of earliest-mistake for the weight update More useful training data Converges faster than early update
![Page 99: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/99.jpg)
99
POS Tagging: Max-violation vs. Early vs. Standard
Early and Max-violation >> Standard at small beams Advantage shrinks as beam size increases Max-violation converges faster than Early (and slightly
better) Beam =2 Best accuracy vs. beam size
Source: Huang et al., 2012
![Page 100: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/100.jpg)
100
Beam Search Framework: LaSO
Search error: NO target node in the beam We cannot reach the goal node (correct structured output)
Weight update: perceptron update 𝑤𝑤𝑛𝑛𝑛𝑛𝑛𝑛 = 𝑤𝑤𝑜𝑜𝑜𝑜𝑑𝑑 + 𝛼𝛼 ∙ (𝜓𝜓𝑎𝑎𝑎𝑎𝑔𝑔 𝑡𝑡𝑡𝑡𝑟𝑟𝑡𝑡𝑡𝑡𝑡𝑡 − 𝜓𝜓𝑎𝑎𝑎𝑎𝑔𝑔(𝑖𝑖𝑛𝑛𝑖𝑖 − 𝑡𝑡𝑡𝑡𝑟𝑟𝑡𝑡𝑡𝑡𝑡𝑡)) 𝜓𝜓𝑎𝑎𝑎𝑎𝑔𝑔 𝑡𝑡𝑡𝑡𝑟𝑟𝑡𝑡𝑡𝑡𝑡𝑡 = Average features of all target nodes in the
candidate set 𝜓𝜓𝑎𝑎𝑎𝑎𝑔𝑔 𝑖𝑖𝑛𝑛𝑖𝑖 − 𝑡𝑡𝑡𝑡𝑟𝑟𝑡𝑡𝑡𝑡𝑡𝑡 = Average features of all non-target nodes
in the candidate set Intuition: increase the score of target nodes and decrease the
score of the non-target nodes
Beam update: reset beam with target nodes in the candidate set
![Page 101: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/101.jpg)
101
LaSO Training: Illustration
−
∩×+= ∑∑
∈∩∈
BvF
CP
vFww
ji
CP ji Bv
,
v*
)()(,
*
α
−
∩×+= ∑∑
∈∩∈
BvF
CP
vFww
ji
CP ji Bv
,
v*
)()(,
*
α
…
…
An error occurs
An error occurs
Basic Idea: repeatedly conduct search on training examples update weights when error occurs
solution node non-solution node
![Page 102: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/102.jpg)
102
Beam Search Framework: LaSO
repeat For every training example (𝑥𝑥,𝑦𝑦)
Perform search with current heuristic (weights) If search error , update weights Reset beam with target nodes in the candidate set Continue search
until convergence or max. iterations
![Page 103: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/103.jpg)
103
LaSO Convergence Results
Under certain assumptions, LaSO-BR converges to a weight vector that solves all training examples in a finite number of iterations
Interesting convergence result Mistake bound depends on the beam width Formalizes the intuition that learning becomes easier as we
increase the beam width (increase the amount of search) First formal result of this kind
![Page 104: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/104.jpg)
104
LaSO: Example Planning Results Blocksworld 30 testing problems Trained with beam width 10 Features: RPL heuristic and features induced in prior work
0
5
10
15
20
25
30
1 10 50 100 500
Problems solved Median plan length
0
500
1000
1500
2000
2500
3000
3500
1 10 50 100 500
LaSO-BR10LENLRU
Beam width Beam width
Source: Xu et al., 2009
![Page 105: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/105.jpg)
105
Part 6: HC-Search: A Unifying Framework for Cost Function and
Control Knowledge Learning
![Page 106: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/106.jpg)
106
Outline of HC-Search Framework Introduction Unifying view and high-level overview
Learning Algorithms Heuristic learning Cost function learning
Search Space Design
Experiments and Results
Engineering Methodology for applying HC-Search
Relation to Alternate Methods
![Page 107: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/107.jpg)
107
Outline of HC-Search Framework Introduction Unifying view and high-level overview
Learning Algorithms Heuristic learning Cost function learning
Search Space Design
Experiments and Results
Engineering Methodology for applying HC-Search
Relation to Alternate Methods
![Page 108: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/108.jpg)
108
HC-Search: A Unifying View
Cost Function Learning Approaches Don’t learn search control knowledge
Control Knowledge Learning Approaches Don’t learn cost functions
HC-Search Learning Framework Unifies the above two frameworks and has many advantages Without H, degenerates to cost function learning Without C, degenerates to control knowledge learning Supports learning to improve both speed and accuracy of
structured prediction
![Page 109: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/109.jpg)
109
HC-Search framework: Inspiration
HC-Search Framework
Traditional AI Search for combinatorial optimization +
Learning
![Page 110: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/110.jpg)
110
HC-Search Framework: Overview
Key Idea: Generate high-quality candidate outputs by conducting a
time-bounded search guided by a learned heuristic H Score the candidate outputs using a learned cost function C
to select the least cost output as prediction
Heuristic Learning can be done in primitive space (e.g., IJCAI’16 paper on
incremental parsing) OR complete output space
IJCAI’16 paper on computing M-Best Modes via Heuristic Search
![Page 111: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/111.jpg)
111
HC-Search framework: Overview
Key Ingredients: Define a search space over structured outputs
Learn a cost function 𝑪𝑪 to score potential outputs
Use a search algorithm to find low cost outputs
Learn a heuristic function 𝑯𝑯 to make search efficient
Our approach:
o Structured Prediction as a search process in the combinatorial space of outputs
![Page 112: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/112.jpg)
112
HC-Search Illustration: Search Space
, praual
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
, strucl , stauce
… …
…
, stauaz , staucl
…
… …
…
, pragat
…
, praglt
…
…
…
root node Nodes = input-output pairs
![Page 113: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/113.jpg)
113
HC-Search Illustration: Cost Function
, praual
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
, strucl , stauce
… …
…
, stauaz , staucl
…
… …
…
, pragat
…
, praglt
…
…
…
root node
0.92
0.88 0.81 0.89 0.85
0.95 0.84 0.91
0.95 0.68 0.83
0.60 0.77
Cost
How bad is a node?
![Page 114: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/114.jpg)
114
HC-Search Illustration: Making Predictions
Assume we have a good cost function.
How to make predictions?
![Page 115: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/115.jpg)
115
HC-Search Illustration: Greedy Search
, praual root node
![Page 116: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/116.jpg)
116
HC-Search Illustration: Greedy Search
, praual root node
… … …
, araual , sraual , prauat , prauaz
![Page 117: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/117.jpg)
117
HC-Search Illustration: Greedy Search
, praual root node
… … …
, araual , sraual , prauat , prauaz
0.88 0.81 0.89 0.85
Heuristic value
![Page 118: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/118.jpg)
118
HC-Search Illustration: Greedy Search
, praual root node
… … …
, araual , sraual , prauat , prauaz
0.88 0.81 0.89 0.85
![Page 119: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/119.jpg)
119
HC-Search Illustration: Greedy Search
, praual root node
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
…
![Page 120: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/120.jpg)
120
HC-Search Illustration: Greedy Search
, praual root node
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
… 0.95 0.84
![Page 121: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/121.jpg)
121
HC-Search Illustration: Greedy Search
, praual root node
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
… 0.95 0.84
![Page 122: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/122.jpg)
122
HC-Search Illustration: Greedy Search
, praual root node
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
…
, stauaz , staucl
…
![Page 123: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/123.jpg)
123
HC-Search Illustration: Greedy Search
, praual root node
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
…
, stauaz , staucl
… 0.95 0.68
![Page 124: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/124.jpg)
124
HC-Search Illustration: Greedy Search
, praual root node
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
…
, stauaz , staucl
… 0.95 0.68
![Page 125: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/125.jpg)
125
HC-Search Illustration: Greedy Search
, praual root node
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
…
, stauaz , staucl
…
![Page 126: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/126.jpg)
126
HC-Search Illustration: Greedy Search
, praual root node
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
…
, stauaz , staucl
…
Set of all outputs generated within time limit
![Page 127: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/127.jpg)
127
HC-Search Illustration: Greedy Search
, praual root node
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
…
, stauaz , staucl
… 0.69 0.71
0.65 0.56
0.81 0.67 0.76 0.71
0.95
Cost
![Page 128: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/128.jpg)
128
HC-Search Illustration: Greedy Search
, praual root node
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
…
, stauaz , staucl
… 0.69 0.71
0.65 0.56
0.81 0.67 0.76 0.71
0.95
Best cost output
![Page 129: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/129.jpg)
129
HC-Search Illustration: Greedy Search
, praual root node
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
…
, stauaz , staucl
… 0.69 0.71
0.65 0.56
0.81 0.67 0.76 0.71
0.95
𝑦𝑦� = sraucl
Best cost output
![Page 130: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/130.jpg)
130
HC-Search: Properties
Anytime predictions Stop the search at any point and return the best cost output
Minimal restrictions on the complexity of heuristic and cost functions Only needs to be evaluated on complete input-output pairs Can use higher-order features with negligible overhead
Can optimize non-decomposable loss functions e.g., F1 score
Error Analysis: Heuristic error + Cost function error engineering methodology guided by the error decomposition
![Page 131: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/131.jpg)
131
HC-Search: Key Learning Challenges
Search Space Design: How can we automatically define high-quality search
spaces ?
Heuristic Learning: How can we learn a heuristic function to guide the
search to generate high-quality outputs ?
Cost Function Learning: How can we learn a cost function to score the
outputs generated by the heuristic function ?
![Page 132: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/132.jpg)
132
Outline of HC-Search Framework Introduction Unifying view and high-level overview
Learning Algorithms Heuristic learning Cost function learning
Search Space Design
Experiments and Results
Engineering Methodology for applying HC-Search
Relation to Alternate Methods
![Page 133: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/133.jpg)
133
HC-Search: Loss Decomposition
, praual root node
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
…
, stauaz , staucl
… Best cost output
![Page 134: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/134.jpg)
134
HC-Search: Loss Decomposition
, praual root node
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
…
, stauaz , staucl
… Best cost output
Loss = 0.22
![Page 135: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/135.jpg)
135
HC-Search: Loss Decomposition
, praual root node
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
…
, stauaz , staucl
… Best cost output
Loss = 0.22
Minimum loss output Loss = 0.09
![Page 136: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/136.jpg)
136
HC-Search: Loss Decomposition
Overall loss 𝝐𝝐 = 0.22
Generation loss 𝝐𝝐𝑯𝑯 = 0.09
(Heuristic function)
Selection loss 𝝐𝝐𝑪𝑪 = 0.22 – 0.09
(Cost function)
![Page 137: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/137.jpg)
137
HC-Search: Loss Decomposition
𝝐𝝐 = 𝝐𝝐𝑯𝑯 + 𝝐𝝐𝑪𝑪|𝑯𝑯
Overall expected loss
Generation loss (Heuristic function)
Selection loss (Cost function)
𝑪𝑪 𝒙𝒙,𝒚𝒚 = 𝒘𝒘𝒄𝒄 ⋅ 𝝓𝝓𝑯𝑯 𝒙𝒙,𝒚𝒚 𝑯𝑯 𝒙𝒙,𝒚𝒚 = 𝒘𝒘𝑯𝑯 ⋅ 𝝓𝝓𝑪𝑪 𝒙𝒙,𝒚𝒚
![Page 138: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/138.jpg)
138
HC-Search: Learning
𝝐𝝐 = 𝝐𝝐𝑯𝑯 + 𝝐𝝐𝑪𝑪|𝑯𝑯
Overall loss
Generation loss (Heuristic function)
Selection loss (Cost function)
Key idea: Greedy stage-wise minimization guided by the loss decomposition
Doppa, J.R., Fern, A., Tadepalli, P. HC-Search: A Learning Framework for Search-based Structured Prediction. Journal of Artificial Intelligence Research (JAIR) 2014.
![Page 139: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/139.jpg)
139
HC-Search: Learning
𝝐𝝐 = 𝝐𝝐𝑯𝑯 + 𝝐𝝐𝑪𝑪|𝑯𝑯
Overall loss
Generation loss (Heuristic function)
Selection loss (Cost function)
Key idea: Greedy stage-wise minimization guided by the loss decomposition Step 1: 𝐻𝐻� = arg𝑚𝑚𝑖𝑖𝑖𝑖𝐻𝐻∈𝑯𝑯 𝜖𝜖𝐻𝐻 (heuristic training)
![Page 140: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/140.jpg)
140
HC-Search: Learning
𝝐𝝐 = 𝝐𝝐𝑯𝑯 + 𝝐𝝐𝑪𝑪|𝑯𝑯
Overall loss
Generation loss (Heuristic function)
Selection loss (Cost function)
Key idea: Greedy stage-wise minimization guided by the loss decomposition Step 1: 𝐻𝐻� = arg𝑚𝑚𝑖𝑖𝑖𝑖𝐻𝐻∈𝑯𝑯 𝜖𝜖𝐻𝐻 (heuristic training) Step 2: �̂�𝐶 = arg𝑚𝑚𝑖𝑖𝑖𝑖𝐶𝐶∈𝑪𝑪 𝜖𝜖𝐶𝐶|𝐻𝐻� (cost function training)
![Page 141: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/141.jpg)
141
Outline of HC-Search Framework Introduction Unifying view and high-level overview
Learning Algorithms Heuristic learning Cost function learning
Search Space Design
Experiments and Results
Engineering Methodology for applying HC-Search
Relation to Alternate Methods
![Page 142: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/142.jpg)
142
HC-Search: Heuristic learning
Learning Objective: Guide the search quickly towards high-quality (low loss)
outputs
![Page 143: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/143.jpg)
143
HC-Search: Heuristic Learning
Key idea: Imitation of true loss function Conduct searches on training example using the true loss
function as a heuristic (generally is a good way to produce good outputs) Learn a heuristic function that tries to imitate the observed
search behavior
• Given a search procedure (e.g., greedy search)
![Page 144: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/144.jpg)
144
Greedy Search: Imitation with true loss , praual
, araual , strual , ptrual , practi
… … …
, struct , struat
…
root node
5
5 2 3 6
1 0
True loss
, strual , struct = 2 Hamming Loss
![Page 145: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/145.jpg)
145
Greedy Search: Imitation with true loss , praual
, araual , strual , ptrual , practi
… … …
, struct , struat
…
root node
5
5 2 3 6
1 0
True loss
Generation loss 𝜖𝜖𝐻𝐻∗= 0
![Page 146: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/146.jpg)
146
Greedy Search: Ranking examples
, praual
, araual , strual , ptrual , practi
… … …
, strual , araual <
, strual < , ptrual
, strual < , practi
…
5
5 2 3 6
2
2
2
5
3
6
![Page 147: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/147.jpg)
147
Greedy Search: Ranking examples
, praual
, araual , strual , ptrual , practi
… … …
, struct , struat
… … …
5
5 2 3 6
1 0
![Page 148: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/148.jpg)
148
Greedy Search: Ranking examples
, praual
, araual , strual , ptrual , practi
… … …
, struct , struat
… … …
, struct , struat < …
5
5 2 3 6
1 0
0 1
![Page 149: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/149.jpg)
149
HC-Search: Heuristic Function Learning
Rank Learner
Heuristic function 𝑯𝑯�
Ranking examples
Can prove generalization bounds on learned heuristic [Doppa et al., 2012]
![Page 150: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/150.jpg)
150
HC-Search: Learning
𝝐𝝐 = 𝝐𝝐𝑯𝑯 + 𝝐𝝐𝑪𝑪|𝑯𝑯
Overall loss
Generation loss (Heuristic function)
Selection loss (Cost function)
Key idea: Greedy stage-wise minimization guided by the loss decomposition Step 1: 𝐻𝐻� = arg𝑚𝑚𝑖𝑖𝑖𝑖𝐻𝐻∈𝑯𝑯 𝜖𝜖𝐻𝐻 (heuristic training) Step 2: �̂�𝐶 = arg𝑚𝑚𝑖𝑖𝑖𝑖𝐶𝐶∈𝑪𝑪 𝜖𝜖𝐶𝐶|𝐻𝐻� (cost function training)
![Page 151: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/151.jpg)
151
Outline of HC-Search Framework Introduction Unifying view and high-level overview
Learning Algorithms Heuristic learning Cost function learning
Search Space Design
Experiments and Results
Engineering Methodology for applying HC-Search
Relation to Alternate Methods
![Page 152: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/152.jpg)
152
HC-Search: Cost Function Learning
Learning Objective: Correctly score the outputs generated by the heuristic as per
their losses
![Page 153: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/153.jpg)
153
HC-Search: Cost function Learning
Set of all outputs generated by the heuristic 𝑯𝑯�
, praual root node
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
…
, stauaz , staucl
…
![Page 154: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/154.jpg)
154
HC-Search: Cost function Learning
Key Idea: Learn to rank the outputs generated by the learned heuristic function 𝐻𝐻� as per their losses
, praual root node
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
…
, stauaz , staucl
… 0.22 0.29
0.39 0.22
0.81 0.37 0.76 0.71
Loss
0.5
Best loss output
![Page 155: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/155.jpg)
155
HC-Search: Cost function Learning
Create a ranking example between every pair of outputs (𝑦𝑦𝑏𝑏𝑛𝑛𝑏𝑏𝑏𝑏 , 𝑦𝑦) such that: 𝐶𝐶 𝑥𝑥, 𝑦𝑦𝑏𝑏𝑛𝑛𝑏𝑏𝑏𝑏 < 𝐶𝐶(𝑥𝑥, 𝑦𝑦)
Learning to Rank:
< …
…
…
…
Best loss outputs
Non-best loss outputs
![Page 156: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/156.jpg)
156
HC-Search: Cost function Learning
Rank Learner
Cost function 𝑪𝑪�
Ranking examples
<
< …
…
<
<
Can borrow generalization bounds from rank-learning literature [Agarwal and Roth, 2005 & Agarwal and Niyogi, 2009]
![Page 157: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/157.jpg)
157
Outline of HC-Search Framework Introduction Unifying view and high-level overview
Learning Algorithms Heuristic learning Cost function learning
Search Space Design
Experiments and Results
Engineering Methodology for applying HC-Search
Relation to Alternate Methods
![Page 158: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/158.jpg)
158
HC-Search: Search Space Design Objective: High-quality outputs can be located at small depth
Target depth = 5
![Page 159: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/159.jpg)
159
HC-Search: Search Space Design
Objective: High-quality outputs can be located at small depth
Solution #1: Flipbit Search Space [JMLR, 2014]
Solution #2: Limited Discrepancy Search (LDS) Space [JMLR, 2014]
Defined in terms of a greedy predictor or policy
Solution #3: Segmentation Search Space for computer vision tasks [CVPR, 2015]
![Page 160: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/160.jpg)
160
Flip-bit Search Space
, praual
… … …
, araual , sraual , prauat , prauaz
, sraucl , staual
, strucl , stauce
… …
…
, stauaz , staucl
…
… …
…
, pragat
…
, praglt
…
…
…
root node Output of recurrent classifier
![Page 161: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/161.jpg)
161
Limited Discrepancy Search: Idea Limited Discrepancy Search [Harvey and Ginsberg, 1995]
Key idea: correct the response of recurrent classifier at a small no. of critical errors to produce high-quality outputs
• See IJCAI’16 paper on LDS for AND/OR search w/ applications to optimization tasks in graphical models
![Page 162: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/162.jpg)
162
Limited Discrepancy Search: Illustration Limited Discrepancy Search [Harvey and Ginsberg, 1995]
Key idea: correct the response of recurrent classifier at a small no. of critical errors to produce high-quality outputs
![Page 163: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/163.jpg)
163
Limited Discrepancy Search: Illustration Limited Discrepancy Search [Harvey and Ginsberg, 1995]
Key idea: correct the response of recurrent classifier at a small no. of critical errors to produce high-quality outputs
![Page 164: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/164.jpg)
164
LDS Space: Illustration
{ } , praual
{(1,a)} , araual
{(1,s)} , strual
{(2,t)} , ptrual
{(4,c)} , practi
… … …
{(1,s),(5,c)} , struct
{(1,s),(6,t)} , struat
{(2,t),(1,s)} , strual
… … … …
{(1,s),(6,t),(5,c)} , struct
… …
{(2,t),(1,s),(3,i)} , sticky
root node
![Page 165: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/165.jpg)
165
Quality of LDS Space
Expected target depth
𝑦𝑦∗
𝜖𝜖 𝑇𝑇
I.I.D error
![Page 166: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/166.jpg)
166
Quality of LDS Space
Expected target depth
𝑦𝑦∗
𝜖𝜖 𝑇𝑇
I.I.D error
o We can learn a classifier to optimize the I.I.D error 𝜖𝜖
Doppa, J.R., Fern, A., Tadepalli, P. Structured Prediction via Output Space Search. Journal of Machine Learning Research (JMLR), vol 15, 2014.
![Page 167: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/167.jpg)
167
Quality of LDS Space
Expected target depth
𝑦𝑦∗
𝜖𝜖 𝑇𝑇
I.I.D error
o We can learn a classifier to optimize the I.I.D error 𝜖𝜖
o Important contribution that helped HC-Search achieve state-of-the-art results
![Page 168: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/168.jpg)
168
Quality of Search Space: LDS vs. Flip-bit
Expected target depth of a search space
𝑦𝑦∗
𝜖𝜖 𝑇𝑇
𝑦𝑦∗
𝜖𝜖𝑟𝑟𝑇𝑇
LDS space
Flip-bit space I.I.D error
recurrent error
![Page 169: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/169.jpg)
169
Sparse LDS Space (k)
Complete LDS space is expensive each successor state generation requires running greedy
policy with the given discrepancy set # successors = 𝐿𝐿.𝑇𝑇, where 𝑇𝑇 is the size of the structured
output and 𝐿𝐿 is the number of labels
Sparse Search Space: Key Idea Sort discrepancies using recurrent classifier scores and pick
top-𝑘𝑘 choices # successors = 𝑘𝑘.𝑇𝑇 Parameter 𝑘𝑘 = # discrepancies for each variable controls the
trade-off between speed and accuracy In practice, very small 𝑘𝑘 suffice How can we deal with dependence on 𝑇𝑇?
![Page 170: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/170.jpg)
170
Aside: Very simple HC-Search Instantiation
Heuristic function Greedy recurrent classifier (or policy)
Search procedure Depth-first or Breadth-first Limited Discrepancy Search w/
bounded depth
Cost function Score the outputs generated by search procedure
![Page 171: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/171.jpg)
171
Computer Vision Tasks: Randomized Segmentation Space [Lam et al., 2015]
Key Idea: probabilistically sample likely object configurations in the image from a hierarchical segmentation tree
Segmentation selection
Candidate generation
![Page 172: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/172.jpg)
172
Pre-requisite: Hierarchical Segmentation Tree Berkeley segmentation tree Regions are very robust Regions are closed UCM level 0 corresponds to all super-pixels
UCM 0.0
UCM 1.0
UCM 0.5
(not to scale)
![Page 173: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/173.jpg)
173
Randomized Segmentation Space: Segmentation Selection
UCM 0.0
UCM 1.0
UCM 0.5
(not to scale)
Randomly pick threshold
𝜃𝜃~𝑈𝑈(0,1) to select a segmentation
from Berkeley Segmentation Algorithm
![Page 174: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/174.jpg)
174
Randomized Segmentation Space: Candidate Generation
For each segment, give it a label (based on segment’s current labels and neighboring segment labels) and add it to the candidate set
![Page 175: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/175.jpg)
175
Randomized Segmentation Space: Candidate Generation
For each segment, give it a label (based on segment’s current labels and neighboring segment labels) and add it to the candidate set
![Page 176: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/176.jpg)
176
Randomized Segmentation Space: Candidate Generation
For each segment, give it a label (based on segment’s current labels and neighboring segment labels) and add it to the candidate set
…etc…
![Page 177: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/177.jpg)
177
Outline of HC-Search Framework Introduction Unifying view and high-level overview
Learning Algorithms Heuristic learning Cost function learning
Search Space Design
Experiments and Results
Engineering Methodology for applying HC-Search
Relation to Alternate Methods
![Page 178: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/178.jpg)
178
Benchmark Domains
Handwriting recognition [Taskar et al., 2003] HW-Small and HW-Large
NET-Talk [Sejnowski and Rosenberg, 1987] Stress and Phoneme prediction
Scene labeling [Vogel et al., 2007]
𝑥𝑥 = 𝑦𝑦 =
s t r u c t u r e d 𝑥𝑥 = 𝑦𝑦 =
“photograph” /f-Ot@graf-/ 𝑥𝑥 = 𝑦𝑦 =
![Page 179: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/179.jpg)
179
Experimental Setup
Search space: LDS space
Search procedure: Greedy search
Time bound: 15 steps for sequences and 150 for scene labeling
Loss function: Hamming loss
Baselines Recurrent CRFs SVM-Struct SEARN CASCADES C-Search
![Page 180: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/180.jpg)
180
Results: comparison to state-of-the-art
Error-rates of different structured prediction algorithms
HW-Small HW-Large Phoneme Scene labeling
HC-Search 12.81 03.23 16.05 19.71
C-Search 17.41 07.41 20.91 27.05 CRF 19.97 13.11 21.09 -
SVM-Struct 19.64 12.49 21.70 - Recurrent 34.33 25.13 26.42 43.36
SEARN 17.88 09.42 22.74 37.69 CASCADES 13.02 03.22 17.41 -
![Page 181: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/181.jpg)
181
Results: comparison to state-of-the-art
Error-rates of different structured prediction algorithms
HW-Small HW-Large Phoneme Scene labeling
HC-Search 12.81 03.23 16.05 19.71
C-Search 17.41 07.41 20.91 27.05 CRF 19.97 13.11 21.09 -
SVM-Struct 19.64 12.49 21.70 - Recurrent 34.33 25.13 26.42 43.36
SEARN 17.88 09.42 22.74 37.69 CASCADES 13.02 03.22 17.41 -
![Page 182: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/182.jpg)
182
Results: comparison to state-of-the-art
Error-rates of different structured prediction algorithms
HW-Small HW-Large Phoneme Scene labeling
HC-Search 12.81 03.23 16.05 19.71
C-Search 17.41 07.41 20.91 27.05 CRF 19.97 13.11 21.09 -
SVM-Struct 19.64 12.49 21.70 - Recurrent 34.33 25.13 26.42 43.36
SEARN 17.88 09.42 22.74 37.69 CASCADES 13.02 03.22 17.41 -
HC-Search outperforms all the other algorithms including C-Search (our prior approach that uses a single function C to serve the dual roles of heuristic and cost function)
![Page 183: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/183.jpg)
183
Results: Loss Decomposition Analysis
𝝐𝝐 = 𝝐𝝐𝑯𝑯 + 𝝐𝝐𝑪𝑪|𝑯𝑯
Overall expected loss
Generation loss (Heuristic function)
Selection loss (Cost function)
![Page 184: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/184.jpg)
184
Results: Loss decomposition analysis
Phoneme Scene labeling
ERROR 𝜖𝜖 𝜖𝜖𝐻𝐻 𝜖𝜖𝐶𝐶|𝐻𝐻 𝜖𝜖 𝜖𝜖𝐻𝐻 𝜖𝜖𝐶𝐶|𝐻𝐻
HC-Search 16.05 03.98 12.07 19.71 5.82 13.89
![Page 185: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/185.jpg)
185
Results: Loss decomposition analysis
Phoneme Scene labeling
ERROR 𝜖𝜖 𝜖𝜖𝐻𝐻 𝜖𝜖𝐶𝐶|𝐻𝐻 𝜖𝜖 𝜖𝜖𝐻𝐻 𝜖𝜖𝐶𝐶|𝐻𝐻
HC-Search 16.05 03.98 12.07 19.71 5.82 13.89
Selection loss 𝝐𝝐𝑪𝑪|𝑯𝑯 contributes more to the overall loss
![Page 186: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/186.jpg)
186
Results: Loss decomposition analysis
Phoneme Scene labeling
ERROR 𝜖𝜖 𝜖𝜖𝐻𝐻 𝜖𝜖𝐶𝐶|𝐻𝐻 𝜖𝜖 𝜖𝜖𝐻𝐻 𝜖𝜖𝐶𝐶|𝐻𝐻
HC-Search 16.05 03.98 12.07 19.71 05.82 13.89
C-Search 20.91 04.38 16.53 27.05 07.83 19.22
![Page 187: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/187.jpg)
187
Results: Loss decomposition analysis
Phoneme Scene labeling
ERROR 𝜖𝜖 𝜖𝜖𝐻𝐻 𝜖𝜖𝐶𝐶|𝐻𝐻 𝜖𝜖 𝜖𝜖𝐻𝐻 𝜖𝜖𝐶𝐶|𝐻𝐻
HC-Search 16.05 03.98 12.07 19.71 05.82 13.89
C-Search 20.91 04.38 16.53 27.05 07.83 19.22
Improvement of HC-Search over C-Search is due to the improvement in the selection loss
![Page 188: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/188.jpg)
188
Results: Loss decomposition analysis
Phoneme Scene labeling
ERROR 𝜖𝜖 𝜖𝜖𝐻𝐻 𝜖𝜖𝐶𝐶|𝐻𝐻 𝜖𝜖 𝜖𝜖𝐻𝐻 𝜖𝜖𝐶𝐶|𝐻𝐻
HC-Search 16.05 03.98 12.07 19.71 05.82 13.89
C-Search 20.91 04.38 16.53 27.05 07.83 19.22
Improvement of HC-Search over C-Search is due to the improvement in the selection loss
Clearly shows the advantage of separating the roles of heuristic and cost function
![Page 189: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/189.jpg)
189
Multi-Label Prediction: Problem
Input Output
1
1
1
0 0
0
…
…
sky
water
sand
computer chair
mountains
![Page 190: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/190.jpg)
190
Multi-Label Prediction: Problem
Commonly arises in various domains
Biology – predict functional classes of a protein/gene
Text – predict email tags or document classes
…
![Page 191: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/191.jpg)
191
Multi-Label Prediction: Challenges
Input Output
1
1
1
0 0
0
…
…
sky
water
sand
computer chair
mountains
o Joint prediction of labels to exploit the relationships between labels
o Automatically optimize the evaluation measure of the real-world task
![Page 192: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/192.jpg)
192
Multi-Label Prediction
Benchmark data
Dataset Domain #TR #TS #F #L 𝑬𝑬[𝒅𝒅]
Scene image 1211 1196 294 6 1.07
Emotions music 391 202 72 6 1.86
Medical text 333 645 1449 45 1.24
Genbase biology 463 199 1185 27 1.25
Yeast biology 1500 917 103 14 4.23
Enron text 1123 579 1001 53 3.37
LLog text 876 584 1004 75 1.18
Slashdot text 2269 1513 1079 22 2.15
![Page 193: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/193.jpg)
193
Multi-Label Prediction
Benchmark data
Dataset Domain #TR #TS #F #L 𝑬𝑬[𝒅𝒅]
Scene image 1211 1196 294 6 1.07
Emotions music 391 202 72 6 1.86
Medical text 333 645 1449 45 1.24
Genbase biology 463 199 1185 27 1.25
Yeast biology 1500 917 103 14 4.23
Enron text 1123 579 1001 53 3.37
LLog text 876 584 1004 75 1.18
Slashdot text 2269 1513 1079 22 2.15
Label vectors are highly sparse
![Page 194: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/194.jpg)
194
Multi-Label Prediction via HC-Search
HC-Search Exploit the sparsity property (Null vector + flip bits)
𝑥𝑥 , y = 000000 root node
𝑥𝑥 , y = 100000 𝑥𝑥 , y = 001000 𝑥𝑥 , y = 000001
… …
𝑥𝑥 , y = 101000 𝑥𝑥 , y = 001001
…
𝑥𝑥 , y = 111000 𝑥𝑥 , y = 101100 𝑥𝑥 , y = 001011
… …
![Page 195: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/195.jpg)
195
Multi-Label Prediction: Results F1 Accuracy Results
Algorithm Scene Emotions Medical Genbase Yeast Enron LLog Slashdot
BR 52.60 60.20 63.90 98.70 63.20 53.90 36.00 46.20
CC 59.10 57.50 64.00 99.40 63.20 53.30 26.50 44.90
ECC 68.00 62.60 65.30 99.40 64.60 59.10 32.20 50.20
M2CC 68.20 63.20 65.40 99.40 64.90 59.10 32.30 50.30
CLR 62.20 66.30 66.20 70.70 63.80 56.50 22.70 46.60
CDN 63.20 61.40 68.90 97.80 64.00 58.50 36.60 53.10
CCA 66.43 63.27 49.60 98.60 61.64 53.83 25.80 48.00
PIR 74.45 60.92 80.17 99.41 65.47 61.14 38.95 57.55
SML 68.50 64.32 68.34 99.62 64.32 57.46 34.95 55.73
RML 74.17 64.83 80.73 98.80 63.18 57.79 35.97 51.30
DecL 73.76 65.29 78.02 97.89 63.46 61.19 37.52 54.67
HC-Search 75.89 66.17 78.19 98.12 63.78 62.34 39.76 57.98
Doppa, J.R., Yu, J., Ma C., Fern, A., Tadepalli, P. HC-Search for Multi-Label Prediction: An Empirical Study. American Association of Artificial Intelligence (AAAI) Conference 2014.
![Page 196: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/196.jpg)
196
Detecting Basal Tubules of Nematocysts
o Imaged against significant background clutter (unavoidable)
o Biological objects have highly-deformable parts
Challenges:
![Page 197: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/197.jpg)
197
Detecting Basal Tubules of Nematocysts
Experimental Setup 80 images (training); 20 images (validation); 30 images (testing)
864
1024
32
32
o Patch labels
Patch
“0” Background
“1” Basal tubule
![Page 198: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/198.jpg)
198
Detecting Basal Tubules of Nematocysts
Baselines IID Classifier Pairwise CRFs (w/ ICM, LBP, Graph-cuts)
HC-Search Flipbit space (IID classifier + flip patch labels) Randomized Segmentation space
![Page 199: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/199.jpg)
199
Basal Tubule Detection Results
Algorithm Precision Recall F1
SVM 0.675 0.147 0.241 Logistic Regression 0.605 0.129 0.213
![Page 200: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/200.jpg)
200
Basal Tubule Detection Results
Algorithm Precision Recall F1
SVM 0.675 0.147 0.241 Logistic Regression 0.605 0.129 0.213
Pairwise CRF (w/ ICM) 0.432 0.360 0.393 Pairwise CRF (w/ LBP) 0.545 0.091 0.156 Pairwise CRF (w/ GC) 0.537 0.070 0.124
![Page 201: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/201.jpg)
201
Basal Tubule Detection Results
Algorithm Precision Recall F1
SVM 0.675 0.147 0.241 Logistic Regression 0.605 0.129 0.213
Pairwise CRF (w/ ICM) 0.432 0.360 0.393 Pairwise CRF (w/ LBP) 0.545 0.091 0.156 Pairwise CRF (w/ GC) 0.537 0.070 0.124
HC-Search (w/ Flipbit) 0.472 0.545 0.506
Lam, M., Doppa, J.R., Xu, S.H., Todorovic, S., Dietterich, T.G., Reft, A., Daly, M. Learning to Detect Basal Tubules of Nematocysts in SEM Images. IEEE Workshop on Computer Vision for Accelerated Biosciences (CVAB) 2013.
![Page 202: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/202.jpg)
202
Basal Tubule Detection Results
Algorithm Precision Recall F1
SVM 0.675 0.147 0.241 Logistic Regression 0.605 0.129 0.213
Pairwise CRF (w/ ICM) 0.432 0.360 0.393 Pairwise CRF (w/ LBP) 0.545 0.091 0.156 Pairwise CRF (w/ GC) 0.537 0.070 0.124
HC-Search (w/ Flipbit) 0.379 0.603 0.465 HC-Search (w/ Randomized) 0.831 0.651 0.729
Lam, M., Doppa, J.R., Todorovic, S., Dietterich, T.G. HC-Search for Structured Prediction in Computer Vision. IEEE International Conference on Computer Vision (CVPR) 2015.
![Page 203: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/203.jpg)
203
Basal Tubule Detection Results
o HC-Search significantly outperforms all the other algorithms
o Performance critically depends on the quality of the search space
Algorithm Precision Recall F1
SVM 0.675 0.147 0.241 Logistic Regression 0.605 0.129 0.213
Pairwise CRF (w/ ICM) 0.432 0.360 0.393 Pairwise CRF (w/ LBP) 0.545 0.091 0.156 Pairwise CRF (w/ GC) 0.537 0.070 0.124
HC-Search (w/ Flipbit) 0.379 0.603 0.465 HC-Search (w/ Randomized) 0.831 0.651 0.729
![Page 204: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/204.jpg)
204
Basal Tubule Detection Results Visual results:
HC-Search Ground-truth output
CRF w/ Graph cuts CRF w/ LBP
![Page 205: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/205.jpg)
205
Results: Stanford Background Dataset
HC-Search without using features from deep learning
Method Accuracy (%) Region Energy 76.4 SHL 76.9 RNN 78.1 ConvNet 78.8 ConvNet + NN 80.4 ConvNet + CRF 81.4 Pylon (No Bnd) 81.3 Pylon 81.9 HC-Search (w/ Randomized) 81.4
Benchmark for scene labeling in vision community
![Page 206: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/206.jpg)
206
Outline of HC-Search Framework Introduction Unifying view and high-level overview
Learning Algorithms Heuristic learning Cost function learning
Search Space Design
Experiments and Results
Engineering Methodology for applying HC-Search
Relation to Alternate Methods
![Page 207: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/207.jpg)
207
Engineering Methodology
Select a time-bounded search architecture High-quality search space (e.g., LDS space or its variant) Search procedure Time bound Effectiveness can be measured by performing LL-Search (loss
function as both heuristic and cost function)
Training and Debugging Overall error = generation error (heuristic) + selection error
(cost function) Take necessary steps to improve the appropriate error guided
by the decomposition
![Page 208: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/208.jpg)
208
Outline of HC-Search Framework Introduction Unifying view and high-level overview
Learning Algorithms Heuristic learning Cost function learning
Search Space Design
Experiments and Results
Engineering Methodology for applying HC-Search
Relation to Alternate Methods
![Page 209: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/209.jpg)
209
HC-Search vs. CRF/SSVM
Inference in CRF/SSVM Cost function needs to score exponential no. of outputs
Inference in HC-Search Cost function needs to score only the outputs generated
by the search procedure guided by heuristic 𝐻𝐻
F(x) = 𝐚𝐚𝐚𝐚𝐚𝐚 𝐦𝐦𝒎𝒎𝒎𝒎 𝒚𝒚 ∈ 𝒀𝒀(𝒙𝒙)
𝑪𝑪(𝒙𝒙,𝒚𝒚)
F(x) = 𝐚𝐚𝐚𝐚𝐚𝐚 𝐦𝐦𝒎𝒎𝒎𝒎 𝒚𝒚 ∈ 𝒀𝒀𝑯𝑯(𝒙𝒙)
𝑪𝑪(𝒙𝒙,𝒚𝒚)
![Page 210: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/210.jpg)
210
HC-Search vs. Re-Ranking Algorithms
Re-Ranking Approaches k-best list from a generative model Michael Collins: Ranking Algorithms for Named Entity Extraction: Boosting and the Voted Perceptron. ACL 2002: 489-496
Diverse M-best modes of a probabilistic model Payman Yadollahpour, Dhruv Batra, Gregory Shakhnarovich: Discriminative Re-ranking of Diverse Segmentations. CVPR 2013: 1923-1930
No guarantees on the quality of generated candidate set
HC-Search Candidate set is generated via generic search in high-quality
search spaces guided by the learned heuristic Minimal restrictions on the representation of heuristic PAC guarantees on the quality of candidate set
![Page 211: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/211.jpg)
211
HC-Search: A “Divide-and-Conquer” Solution
HC-Search is a “Divide-and-Conquer’’ solution with procedural knowledge injected into it All components have clearly pre-defined roles Every component is contributing towards the
overall goal by making the role of other components easier
![Page 212: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/212.jpg)
212
HC-Search: A “Divide-and-Conquer” Solution
Every component is contributing towards the overall goal by making the role of other components easier LDS space leverages greedy classifiers to reduce the target
depth to make the heuristic learning easier
Heuristic tries to make the cost function learning easier by generating high-quality outputs with as little search as possible
![Page 213: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/213.jpg)
213
Part 7: Future Directions
![Page 214: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/214.jpg)
214
Future Directions
Design and optimization of search spaces for complex structured prediction problems very under-studied problem
Leveraging deep learning advances to improve the performance of structured prediction approaches Loose vs. tight integration
Learning to trade-off speed and accuracy of structured prediction Active research topic, but relatively less work
What architectures are more suitable for “Anytime” predictions? How to learn for anytime prediction?
![Page 215: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/215.jpg)
215
Future Directions
Theoretical analysis: sample complexity and generalization bounds Lot of room for this line of work in the context of “learning” +
“search” approaches
Understanding and analyzing structured predictors in the context of integrated applications Pipelines in NLP and Vision among others
![Page 216: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/216.jpg)
216
Important References Classifier-based structured Prediction
Recurrent classifier: Thomas G. Dietterich, Hermann Hild, Ghulum Bakiri: A Comparison of ID3 and Backpropagation for English Text-to-Speech Mapping. Machine Learning 18(1): 51-80 (1995)
PAC Results and Error Propagation: Roni Khardon: Learning to Take Actions. Machine Learning 35(1): 57-90 (1999) Alan Fern, Sung Wook Yoon, Robert Givan: Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes. J. Artif. Intell. Res. (JAIR) 25: 75-118 (2006) Umar Syed, Robert E. Schapire: A Reduction from Apprenticeship Learning to Classification. NIPS 2010 Stéphane Ross, Drew Bagnell: Efficient Reductions for Imitation Learning. AISTATS 2010: 661-668
Advanced Imitation Learning Algorithms: SEARN: Hal Daumé III, John Langford, Daniel Marcu: Search-based structured prediction. Machine Learning 75(3): 297-325 (2009) DAgger: Stéphane Ross, Geoffrey J. Gordon, Drew Bagnell: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. AISTATS 2011: 627-635 AggreVaTe: Stéphane Ross, J. Andrew Bagnell: Reinforcement and Imitation Learning via Interactive No-Regret Learning. CoRR abs/1406.5979 (2014) LOLS: Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, John Langford: Learning to Search Better than Your Teacher. ICML 2015: 2058-2066 Yuehua Xu, Alan Fern, Sung Wook Yoon: Iterative Learning of Weighted Rule Sets for Greedy Search. ICAPS 2010: 201-208 Alan Fern, Sung Wook Yoon, Robert Givan: Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes. J. Artif. Intell. Res. (JAIR) 25: 75-118 (2006)
![Page 217: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/217.jpg)
217
Important References Easy-first approach for structured Prediction
Yoav Goldberg, Michael Elhadad: An Efficient Algorithm for Easy-First Non-Directional Dependency Parsing. HLT-NAACL 2010 Karthik Raghunathan, Heeyoung Lee, Sudarshan Rangarajan, Nate Chambers, Mihai Surdeanu, Dan Jurafsky, Christopher D. Manning: A Multi-Pass Sieve for Coreference Resolution. EMNLP 2010 Lev-Arie Ratinov, Dan Roth: Learning-based Multi-Sieve Co-reference Resolution with Knowledge. EMNLP-CoNLL 2012 Veselin Stoyanov, Jason Eisner: Easy-first Coreference Resolution. COLING 2012 Jun Xie, Chao Ma, Janardhan Rao Doppa, Prashanth Mannem, Xiaoli Z. Fern, Thomas G. Dietterich, Prasad Tadepalli: Learning Greedy Policies for the Easy-First Framework. AAAI 2015
Learning Beam search heuristics for structured prediction Michael Collins, Brian Roark: Incremental Parsing with the Perceptron Algorithm. ACL 2004 Hal Daumé III, Daniel Marcu: Learning as search optimization: approximate large margin methods for structured prediction. ICML 2005 Yuehua Xu, Alan Fern, Sung Wook Yoon: Learning Linear Ranking Functions for Beam Search with Application to Planning. Journal of Machine Learning Research 10: 1571-1610 (2009) Liang Huang, Suphan Fayong, Yang Guo: Structured Perceptron with Inexact Search. HLT-NAACL 2012
![Page 218: Integrating Learning and Search for Structured Predictionjana/pubs/IJCAI-tutorial-all-parts.pdf · POS tagging, Parsing, Co-reference resolution, detecting parts of biological objects](https://reader030.vdocument.in/reader030/viewer/2022041120/5f338ac7a7487c12e27cdb1b/html5/thumbnails/218.jpg)
218
Important References
HC-Search Framework for structured Prediction Janardhan Rao Doppa, Alan Fern, Prasad Tadepalli: HC-Search: A Learning Framework for Search-based Structured Prediction. J. Artif. Intell. Res. (JAIR) 50: 369-407 (2014) Janardhan Rao Doppa, Alan Fern, Prasad Tadepalli: Structured prediction via output space search. Journal of Machine Learning Research 15(1): 1317-1350 (2014) Janardhan Rao Doppa, Jun Yu, Chao Ma, Alan Fern, Prasad Tadepalli: HC-Search for Multi-Label Prediction: An Empirical Study. AAAI 2014 Michael Lam, Janardhan Rao Doppa, Sinisa Todorovic, Thomas G. Dietterich: HC-Search for structured prediction in computer vision. CVPR 2015