Download - Introduction
![Page 1: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/1.jpg)
CS 6961: Structured PredictionFall 2014
Introduction
Lecture 1What is structured prediction?
![Page 2: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/2.jpg)
2
Our goal today
To define a Structure and Structured Prediction
![Page 3: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/3.jpg)
3
What are structures?
![Page 4: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/4.jpg)
4
What are some examples of structured data?
• Database tables and spreadsheets• HTML documents• JSON objects• Wikipedia info-boxes• Computer programs
… we will see more examples
![Page 5: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/5.jpg)
5
What are some examples of unstructured data?
• Images and photographs• Videos• Text documents• PDF files• Books• …
What makes these unstructured?
How are they different from the previous list?
![Page 6: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/6.jpg)
6
Structured representations are useful because ….
• We know how to process them– Algorithms for managing symbolic data– Computational complexity well understood
• They abstract away unnecessary complexities– Why deal with text/images/etc when you can process a
database with the same information?
![Page 7: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/7.jpg)
7
Have we made the problem easier?
What does the splitting of water lead to? A: Light absorption B: Transfer of ions
Water is split, providing a source of electrons and protons (hydrogen ions, H+) and giving off O2 as a by-product. Light absorbed by chlorophyll drives a transfer of the electrons and hydrogen ions from water to an acceptor called NADP+.
![Page 8: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/8.jpg)
8
Reading comprehension can be hard!
Water is split, providing a source of electrons and protons (hydrogen ions, H+) and giving off O2 as a by-product. Light absorbed by chlorophyll drives a transfer of the electrons and hydrogen ions from water to an acceptor called NADP+.
Water is split, providing a source of electrons and protons (hydrogen ions, H+) and giving off O2 as a by-product. Light absorbed by chlorophyll drives a transfer of the electrons and hydrogen ions from water to an acceptor called NADP+.
Enable
Cause
What does the splitting of water lead to? A: Light absorption B: Transfer of ions
To answer the question, we need a structured representation.
![Page 9: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/9.jpg)
9
Machine learning to the rescue
• Techniques from statistical learning can help build these representations
• In fact, machine learning is necessary to scale up and generalize this process
![Page 10: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/10.jpg)
10
A detour about classification
![Page 11: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/11.jpg)
11
Classification
• We know how to train classifiers
– Given an email, spam or not spam?– Is a review positive or negative?– Which folder should an email be automatically be placed
into?– “Predict if a car purchased at an auction is a lemon”
• And other such questions from kaggle
![Page 12: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/12.jpg)
12
Standard classification setting
• Notation– X: A feature representation of input– Y: One of a set of labels (spam, not-spam)
• The goal: To learn a function X ! Y that maps examples into a category
• The standard recipe1. Collect labeled examples {(x1,y1), (x2, y2), }
2. Train a function f: X ! Y that a. Is consistent with the observed examples, andb. Can hopefully be correct on new unseen examples
Based on slides of Dan Roth
![Page 13: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/13.jpg)
13
Classification is generally well understood
• Theoretically: generalization bounds– We know how many examples one needs to see to guarantee good behavior
on unseen examples• Algorithmically: good learning algorithms for linear representations
– Efficient and can deal with high dimensionality (millions of features)• Some open questions
– What is a good feature representation?– Learning protocols: how to minimize supervision, efficient semi-supervised
learning, active learning
Based on slides of Dan Roth
Is this sufficient for solving problems like the reading comprehension one? No!
![Page 14: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/14.jpg)
14
Examples where standard classification is not enough
![Page 15: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/15.jpg)
15
Semantic Role Labeling
X: John saw the dog chasing the ball.
Y:Predicate see
Viewer John
Thing viewed The dog chasing the ball
Predicate Chase
Chaser The dog
Thing chased the ball
See(Viewer: John, Viewed: the dog chasing the ball)
Chase(Chaser: the dog, Chased: the ball)
Or equivalently, predicate-argument representations
![Page 16: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/16.jpg)
16
Semantic Parsing
X: “A python function that takes a name and prints the string Hello followed by the name and exits.” Y:
X: “Find the largest state in the US.” Y: SELECT name
FROM us_states WHERE size = (SELECT MAX(size) FROM us_states)
In all these cases, the output Y is a structure
![Page 17: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/17.jpg)
17
What is a structure? One definition
By … linguistic structure, we refer to symbolic representations of language posited by some theory of language.
From the book Linguistic Structure Prediction, by Noah Smith, 2011.
![Page 18: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/18.jpg)
18
What is in this picture?
Photo by Andrew Dressel - Own work. Licensed under Creative Commons Attribution-Share Alike 3.0
![Page 19: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/19.jpg)
19
Object detection
Photo by Andrew Dressel - Own work. Licensed under Creative Commons Attribution-Share Alike 3.0
Right facingbicycle
left wheelright wheel
handle barsaddle/seat
![Page 20: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/20.jpg)
20
The output: A schematic showing the parts and their relative layout
left wheelright wheel
handle barsaddle/seat
Right facingbicycle
Once again, a structure
![Page 21: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/21.jpg)
21
A working definition of a structure
A structure is a concept that can be applied to any complex thing, whether it be a bicycle, a commercial company, or a carbon molecule. By complex, we mean:
1. It is divisible into parts,
2. There are different kinds of parts,
3. The parts are arranged in a specifiable way, and,
4. Each part has a specifiable function in the structure of the thing as a whole
From the book Analysing Sentences: An Introduction to English Syntax by Noel Burton-Roberts, 1986.
![Page 22: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/22.jpg)
22
What is structured prediction?
![Page 23: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/23.jpg)
23
Standard classification tools can’t predict structures
X: “Find the largest state in the US.”Y:
Classification is about making one decision– Spam or not spam, or predict one label, etc
We need to make multiple decisions– Each part needs a label
• Should “US” be mapped to us_states or utah_counties?• Should “Find” be mapped to SELECT or FROM or WHERE?
– The decisions interact with each other• If the outer FROM clause talks about the table us_states, then the inner FROM clause should not talk about
utah_counties
– How to compose the fragments together to create the whole structure?• Should the output consist of a WHERE clause? What should go in it?
SELECT name FROM us_states WHERE size = (SELECT MAX(size) FROM us_states)
![Page 24: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/24.jpg)
24
Structured prediction: Machine learning of interdependent variables
• Unlike standard classification problems, many problems have structured outputs with– Multiple interdependent output variables– Both local and global decisions to be made
• Mutual dependencies necessitate a joint assignment to all the output variables– Joint inference or Global inference or simply Inference
• These problems are called structured output problems
![Page 25: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/25.jpg)
25
Computational issues
Model definitionWhat are the parts of the output? What are the inter-dependencies?
How to train the model? How to do inference?
Data annotation difficulty
Background knowledge about
domain
Semi-supervised/indirectly supervised?
![Page 26: Introduction](https://reader036.vdocument.in/reader036/viewer/2022062721/5681360c550346895d9d840d/html5/thumbnails/26.jpg)
26
Another look at the important issues
• Availability of supervision– Supervised algorithms are well studied; supervision is hard (or
expensive) to obtain
• Complexity of model– More complex models encode complex dependencies between
parts; complex models make learning and inference harder
• Features– Most of the time we will assume that we have a good feature set
to model our problem. But do we?
• Domain knowledge– Incorporating background knowledge into learning and inference
in a mathematically sound way