TRANSCRIPT
Learning from Big Data Lecture 5
M. Pawan Kumar
http://www.robots.ox.ac.uk/~oval/
Slides available online http://mpawankumar.info
Outline
• Structured Output Prediction
• Structured Output SVM
• Optimization
• Results
Image Classification
Is this an urban or rural area?
Input: x  Output: y ∈ {-1,+1}
Image Classification
Is this scan healthy or unhealthy?
Input: x  Output: y ∈ {-1,+1}
![Page 5: Learning from Big Data Lecture 5 M. Pawan Kumar oval/ Slides available online](https://reader036.vdocument.in/reader036/viewer/2022062805/5697c01e1a28abf838cd0dfe/html5/thumbnails/5.jpg)
y
xObserved input
Unobserved output
Label -1
Label +1Probabilistic
GraphicalModel
Image Classification
Feature Vector
Input x → feature Φ(x)
Feature Vector
Pre-trained CNN: conv1 → conv2 → conv3 → conv4 → conv5 → fc6 → fc7
Input x → feature Φ(x)
Joint Feature Vector
Input: x  Output: y ∈ {-1,+1}
Ψ(x,y)
Joint Feature Vector
Input: x  Output: y ∈ {-1,+1}
Ψ(x,-1) = [Φ(x); 0]
Joint Feature Vector
Input: x  Output: y ∈ {-1,+1}
Ψ(x,+1) = [0; Φ(x)]
Score Function
Input: x  Output: y ∈ {-1,+1}
f: Ψ(x,y) → (-∞,+∞), f(Ψ(x,y)) = wᵀΨ(x,y)
Prediction
Input: x  Output: y ∈ {-1,+1}
f: Ψ(x,y) → (-∞,+∞), f(Ψ(x,y)) = wᵀΨ(x,y)
y* = argmax_y f(Ψ(x,y))
Maximize the score over all possible outputs
Outline
• Structured Output Prediction
  – Binary Output
  – Multi-label Output
  – Structured Output
  – Learning
• Structured Output SVM
• Optimization
• Results
Image Classification
Which city is this?
Input: x  Output: y ∈ {1,2,…,C}
Image Classification
What type of tumor does this scan contain?
Input: x  Output: y ∈ {1,2,…,C}
Image Classification
Graphical Model
x: observed input
y: unobserved output, taking a label in {1, 2, 3, …, C}
Feature Vector
Pre-trained CNN: conv1 → conv2 → conv3 → conv4 → conv5 → fc6 → fc7
Input x → feature Φ(x)
Joint Feature Vector
Input: x  Output: y ∈ {1,2,…,C}
Ψ(x,y)
Joint Feature Vector
Input: x  Output: y ∈ {1,2,…,C}
Ψ(x,1) = [Φ(x); 0; …; 0]
Joint Feature Vector
Input: x  Output: y ∈ {1,2,…,C}
Ψ(x,2) = [0; Φ(x); …; 0]
Joint Feature Vector
Input: x  Output: y ∈ {1,2,…,C}
Ψ(x,C) = [0; …; 0; Φ(x)]
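The same construction generalizes to C blocks; a minimal sketch (the function name and toy feature are illustrative):

```python
import numpy as np

def joint_feature_multi(phi_x, y, C):
    """Psi(x, y) for y in {1, ..., C}: Phi(x) placed in the y-th of
    C blocks, zeros elsewhere."""
    d = phi_x.shape[0]
    psi = np.zeros(C * d)
    psi[(y - 1) * d: y * d] = phi_x
    return psi

phi = np.array([0.5, -1.0])
psi1 = joint_feature_multi(phi, 1, 3)   # Phi(x) in the first block
psi3 = joint_feature_multi(phi, 3, 3)   # Phi(x) in the last block
print(psi1.tolist(), psi3.tolist())
```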
Object Detection
Where is the object in the image?
Input: x  Output: y ∈ {Pixels}
Object Detection
Where is the rupture in the scan?
Input: x  Output: y ∈ {Pixels}
Object Detection
Graphical Model
x: observed input
y: unobserved output, taking a label in {1, 2, 3, …, C}
Joint Feature Vector
Pre-trained CNN: conv1 → conv2 → conv3 → conv4 → conv5 → fc6 → fc7
Inputs x and y → joint feature Ψ(x,y)
Score Function
Input: x  Output: y ∈ {1,2,…,C}
f: Ψ(x,y) → (-∞,+∞), f(Ψ(x,y)) = wᵀΨ(x,y)
Prediction
Input: x  Output: y ∈ {1,2,…,C}
f: Ψ(x,y) → (-∞,+∞), f(Ψ(x,y)) = wᵀΨ(x,y)
y* = argmax_y f(Ψ(x,y))
Maximize the score over all possible outputs
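With block features, the argmax over C outputs reduces to comparing C inner products; a sketch (the toy w and Φ(x) are made up):

```python
import numpy as np

def predict(w, phi_x, C):
    """y* = argmax_y w^T Psi(x,y). With the block joint feature, the
    score of label y is just the y-th block of w dotted with Phi(x)."""
    d = phi_x.shape[0]
    scores = [w[(y - 1) * d: y * d] @ phi_x for y in range(1, C + 1)]
    return int(np.argmax(scores)) + 1

w = np.array([1.0, 0.0,   0.0, 1.0,   -1.0, -1.0])  # C = 3 blocks of d = 2
print(predict(w, np.array([0.0, 2.0]), 3))  # 2
```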
Outline
• Structured Output Prediction
  – Binary Output
  – Multi-label Output
  – Structured Output
  – Learning
• Structured Output SVM
• Optimization
• Results
Segmentation
What is the semantic class of each pixel?
Input: x  Output: y ∈ {1,2,…,C}^m
(example labels: car, road, grass, tree, sky)
Segmentation
What is the muscle group of each pixel?
Input: x  Output: y ∈ {1,2,…,C}^m
Segmentation
Graphical Model: a 3×3 grid of variables, with observed inputs x1,…,x9 and unobserved outputs y1,…,y9
Feature Vector
Pre-trained CNN: conv1 → conv2 → conv3 → conv4 → conv5 → fc6 → fc7
Input x1 → feature Φ(x1)
Joint Feature Vector
Input: x1  Output: y1 ∈ {1,2,…,C}
Ψu(x1,1) = [Φ(x1); 0; …; 0]
Joint Feature Vector
Input: x1  Output: y1 ∈ {1,2,…,C}
Ψu(x1,2) = [0; Φ(x1); …; 0]
Joint Feature Vector
Input: x1  Output: y1 ∈ {1,2,…,C}
Ψu(x1,C) = [0; …; 0; Φ(x1)]
Feature Vector
Pre-trained CNN: conv1 → conv2 → conv3 → conv4 → conv5 → fc6 → fc7
Input x2 → feature Φ(x2)
Joint Feature Vector
Input: x2  Output: y2 ∈ {1,2,…,C}
Ψu(x2,1) = [Φ(x2); 0; …; 0]
Ψu(x2,2) = [0; Φ(x2); …; 0]
Ψu(x2,C) = [0; …; 0; Φ(x2)]
Overall Joint Feature Vector
Input: x  Output: y ∈ {1,2,…,C}^m
Ψu(x,y) = [Ψu(x1,y1); Ψu(x2,y2); …; Ψu(xm,ym)]
Score Function
Input: x  Output: y ∈ {1,2,…,C}^m
f: Ψu(x,y) → (-∞,+∞), f(Ψu(x,y)) = wᵀΨu(x,y)
Prediction
Input: x  Output: y ∈ {1,2,…,C}^m
f: Ψu(x,y) → (-∞,+∞), f(Ψu(x,y)) = wᵀΨu(x,y)
y* = argmax_y f(Ψu(x,y)) = argmax_y wᵀΨu(x,y) = argmax_y ∑a (wa)ᵀΨu(xa,ya)
Maximize for each a ∈ {1,2,…,m} independently
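Because the unary-only score is a sum of per-pixel terms, the joint argmax over all C^m labelings decomposes into m independent per-pixel argmaxes. A small numerical check of that claim (random toy weights and features, assumed shapes):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
m, C, d = 4, 3, 2                      # pixels, labels, feature dimension
w = rng.normal(size=(C, d))            # weight block w_a for each label
phis = rng.normal(size=(m, d))         # Phi(x_a) for each pixel a

# Unary-only score decomposes over pixels, so the joint argmax is just
# m independent per-pixel argmaxes.
y_star = [int(np.argmax(w @ phis[a])) + 1 for a in range(m)]

# Sanity check against brute force over all C^m labelings.
best = max(product(range(C), repeat=m),
           key=lambda y: sum(w[y[a]] @ phis[a] for a in range(m)))
assert y_star == [c + 1 for c in best]
print(y_star)
```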
Segmentation
Graphical Model: the same 3×3 grid of variables (x1,y1), …, (x9,y9)
Unary Joint Feature Vector
Input: x  Output: y ∈ {1,2,…,C}^m
Ψu(x,y) = [Ψu(x1,y1); Ψu(x2,y2); …; Ψu(xm,ym)]
Pairwise Joint Feature Vector
(the same grid of variables y1,…,y9, now with edges between neighbouring outputs)
Pairwise Joint Feature Vector
Ψp(x12,y12) = δ(y1 = y2), defined on the edge between neighbouring outputs y1 and y2
Pairwise Joint Feature Vector
Ψp(x23,y23) = δ(y2 = y3)
Pairwise Joint Feature Vector
Input: x  Output: y ∈ {1,2,…,C}^m
Ψp(x,y) = [Ψp(x12,y12); Ψp(x23,y23); …]
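A sketch of the pairwise features over a small grid (the 2×2 grid and edge list are illustrative, not from the slides):

```python
def pairwise_vector(y, edges):
    """Stack Psi_p over all neighbouring pairs: delta(y_a = y_b) is 1
    when the two labels agree and 0 otherwise."""
    return [1.0 if y[a] == y[b] else 0.0 for (a, b) in edges]

# A 2x2 grid with pixels 0..3 and 4-connected edges.
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]
print(pairwise_vector([1, 1, 2, 2], edges))  # [1.0, 1.0, 0.0, 0.0]
```

A positive weight on these features rewards labelings in which neighbouring pixels agree, which is what couples the output variables instead of leaving them independent.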
Overall Joint Feature Vector
Input: x  Output: y ∈ {1,2,…,C}^m
Ψ(x,y) = [Ψu(x,y); Ψp(x,y)]
Score Function
Input: x  Output: y ∈ {1,2,…,C}^m
f: Ψ(x,y) → (-∞,+∞), f(Ψ(x,y)) = wᵀΨ(x,y)
Prediction
Input: x  Output: y ∈ {1,2,…,C}^m
f: Ψ(x,y) → (-∞,+∞), f(Ψ(x,y)) = wᵀΨ(x,y)
y* = argmax_y wᵀΨ(x,y) = argmax_y ∑a (wa)ᵀΨu(xa,ya) + ∑a,b (wab)ᵀΨp(xab,yab)
See the Week 5 “Optimization” lectures
Summary
Input x, outputs {y1, y2, …}
Extract features Ψ(x,yi), compute scores f(Ψ(x,yi))
Prediction: y(f) = argmax_{yi} f(Ψ(x,yi))
How do I fix “f”?
Outline
• Structured Output Prediction
  – Binary Output
  – Multi-label Output
  – Structured Output
  – Learning
• Structured Output SVM
• Optimization
• Results
Learning Objective
Data distribution P(x,y)
f* = argmin_f E_{P(x,y)}[ Error(y(f), y) ]
Error: a measure of prediction quality, comparing the prediction y(f) with the ground truth y
The expectation is over the data distribution, but the distribution is unknown
Learning Objective
Training data {(xi,yi), i = 1,2,…,n}
f* = argmin_f E_{P(x,y)}[ Error(y(f), y) ]
The expectation is over the data distribution
Learning Objective
Training data {(xi,yi), i = 1,2,…,n}
f* = argmin_f Σi Error(yi(f), yi)
The expectation is now over the empirical distribution (finite samples)
Learning Objective
Training data {(xi,yi), i = 1,2,…,n}
f* = argmin_f Σi Error(yi(f), yi) + λ R(f)
R: regularizer; λ: relative weight (a hyperparameter)
Learning Objective
Training data {(xi,yi), i = 1,2,…,n}
f* = argmin_f Σi Error(yi(f), yi) + λ R(f)
In a probabilistic model, Error can be the negative log-likelihood
Outline
• Structured Output Prediction
• Structured Output SVM
• Optimization
• Results
Taskar et al. NIPS 2003; Tsochantaridis et al. ICML 2004
Score Function and Prediction
Input: x  Output: y
Joint feature vector of input and output: Ψ(x,y)
f(Ψ(x,y)) = wᵀΨ(x,y)
Prediction: max_y wᵀΨ(x,y)
Predicted output: y(w) = argmax_y wᵀΨ(x,y)
Error Function
Δ(y, y(w)): the loss, or risk, of a prediction given the ground truth (user specified)
Classification loss: Δ(y, y(w)) = δ(y ≠ y(w)), e.g. with ground truth “New York”, predicting “New York” costs 0 and predicting “Paris” costs 1
Error Function
Δ(y, y(w)): the loss, or risk, of a prediction given the ground truth (user specified)
Detection loss: based on the overlap score, (area of intersection) / (area of union)
Error Function
Δ(y, y(w)): the loss, or risk, of a prediction given the ground truth (user specified)
Segmentation loss: fraction of incorrect pixels (micro-average or macro-average)
Learning Objective
Training data {(xi,yi), i = 1,2,…,n}
Δ(yi, yi(w)): the loss function for the i-th sample
Minimize the regularized sum of losses over the training data
This objective is highly non-convex in w, and regularization plays no role (overfitting may occur)
Learning Objective
Training data {(xi,yi), i = 1,2,…,n}
Δ(yi,yi(w)) = wᵀΨ(xi,yi(w)) + Δ(yi,yi(w)) - wᵀΨ(xi,yi(w))
≤ wᵀΨ(xi,yi(w)) + Δ(yi,yi(w)) - wᵀΨ(xi,yi)   (since yi(w) maximizes wᵀΨ(xi,·))
≤ max_y { wᵀΨ(xi,y) + Δ(yi,y) } - wᵀΨ(xi,yi)
This upper bound is convex in w and sensitive to the regularization of w
Learning Objective
Training data {(xi,yi), i = 1,2,…,n}
min_w ||w||² + C Σi ξi
s.t. wᵀΨ(xi,y) + Δ(yi,y) - wᵀΨ(xi,yi) ≤ ξi for all y
A quadratic program with a large number of constraints; many polynomial-time algorithms exist
Outline
• Structured Output Prediction
• Structured Output SVM
• Optimization
  – Stochastic subgradient descent
  – Conditional gradient, aka Frank-Wolfe
• Results
Shalev-Shwartz et al. Mathematical Programming 2011
Gradient
Convex function g(z)
Gradient s at a point z0: g(z) - g(z0) ≥ sᵀ(z - z0) for all z
Example: g(z) = z², with gradient 2z0 at z0
Gradient Descent
min_z g(z), e.g. g(z) = z²
Start at some point z0 and move along the negative gradient direction:
z_{t+1} ← z_t - λ_t g'(z_t), estimating the step size λ_t via line search
Gradient
Convex function g(z)
Gradient s at a point z0: g(z) - g(z0) ≥ sᵀ(z - z0) for all z
The gradient may not exist, e.g. g(z) = |z| at z0 = 0
Subgradient
Convex function g(z)
Subgradient s at a point z0: g(z) - g(z0) ≥ sᵀ(z - z0) for all z
The subgradient may not be unique, e.g. g(z) = |z| at z0 = 0
Subgradient Descent
min_z g(z), e.g. g(z) = |z|
Start at some point z0 and move along the negative subgradient direction:
z_{t+1} ← z_t - λ_t g'(z_t), estimating the step size via line search
This doesn’t always work
Subgradient Descent
Example: min_z max{z2 + 2z1, z2 - 2z1}, with level sets g(z) = 3, 4, 5 in the (z1,z2) plane
At the point (0, 5), s = (-2, 1)ᵀ is a valid subgradient (the gradient of the piece z2 - 2z1), but a step to (0,5) - λs gives g = 5 + 3λ: every step size λ > 0 increases the objective, so line search gets stuck
Subgradient Descent
min_z g(z), e.g. g(z) = |z|
z_{t+1} ← z_t - λ_t g'(z_t)
Convergence requires lim_{T→∞} Σ_{t=1..T} λ_t = ∞ and lim_{t→∞} λ_t = 0
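The two step-size conditions can be seen at work on g(z) = |z| with λt = 1/(t+1), which satisfies both (the tolerance 0.1 is an arbitrary check, not from the slides):

```python
def subgrad_abs(z):
    """A subgradient of g(z) = |z|; any value in [-1, 1] is valid at z = 0."""
    return 1.0 if z > 0 else (-1.0 if z < 0 else 0.0)

z = 5.0
for t in range(200):
    lam = 1.0 / (t + 1)        # sum of lam diverges, lam -> 0
    z = z - lam * subgrad_abs(z)
print(abs(z) < 0.1)  # True: the iterates approach the minimum at z = 0
```

The diverging sum lets the iterates cover the distance to the optimum, while the vanishing step size damps the oscillation around the non-smooth point.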
Learning Objective
Training data {(xi,yi), i = 1,2,…,n}
min_w ||w||² + C Σi ξi
s.t. wᵀΨ(xi,y) + Δ(yi,y) - wᵀΨ(xi,yi) ≤ ξi for all y
How do we deal with the constrained problem?
Learning Objective
Training data {(xi,yi), i = 1,2,…,n}
min_w ||w||² + C Σi max_y { wᵀΨ(xi,y) + Δ(yi,y) - wᵀΨ(xi,yi) }
What is a subgradient? Recall g(z) - g(z0) ≥ sᵀ(z - z0)
Subgradient
C Σi max_y { wᵀΨ(xi,y) + Δ(yi,y) - wᵀΨ(xi,yi) }
A subgradient of the i-th term is Ψ(xi,ŷ) - Ψ(xi,yi), where
ŷ = argmax_y { wᵀΨ(xi,y) + Δ(yi,y) - wᵀΨ(xi,yi) } = argmax_y { wᵀΨ(xi,y) + Δ(yi,y) }
(the term wᵀΨ(xi,yi) is constant in y, so it does not change the argmax)
This maximization is the loss-augmented inference problem
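For the multi-class case, loss-augmented inference is a brute-force search over C outputs. A sketch with the 0-1 loss and block joint features (toy numbers are made up; ŷ can differ from the plain prediction because the loss rewards violating outputs):

```python
import numpy as np

def loss_augmented_inference(w, phi_x, y_true, C):
    """y_hat = argmax_y { w^T Psi(x,y) + Delta(y_true, y) }, with block
    features Psi and the 0-1 loss Delta(y, y') = [y != y']."""
    d = phi_x.shape[0]
    def score(y):
        return w[(y - 1) * d: y * d] @ phi_x + (0.0 if y == y_true else 1.0)
    return max(range(1, C + 1), key=score)

w = np.array([1.0, 0.0,  0.9, 0.0,  0.0, 0.0])   # C = 3 blocks of d = 2
phi = np.array([1.0, 0.0])
# Plain prediction picks y = 1 (score 1.0), but the loss-augmented argmax
# prefers y = 2 (0.9 + 1.0 = 1.9): the most-violating output.
print(loss_augmented_inference(w, phi, y_true=1, C=3))  # 2
```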
Inference
ŷ = argmax_y { wᵀΨ(xi,y) + Δ(yi,y) }
Classification inference: Output y ∈ {1,2,…,C}, solved by brute-force search
Inference
ŷ = argmax_y { wᵀΨ(xi,y) + Δ(yi,y) }
Detection inference: Output y ∈ {Pixels}, solved by brute-force search
Inference
ŷ = argmax_y { wᵀΨ(xi,y) + Δ(yi,y) }
Segmentation inference: max_y ∑a (wa)ᵀΨu(xia,ya) + ∑a,b (wab)ᵀΨp(xiab,yab) + ∑a Δ(yia,ya)
See the Week 5 “Optimization” lectures
Subgradient Descent
Start at some parameter w0
For t = 0 to T  // number of iterations
    s = 2wt
    For i = 1 to n  // number of samples
        ŷ = argmax_y { wtᵀΨ(xi,y) + Δ(yi,y) }
        s = s + C (Ψ(xi,ŷ) - Ψ(xi,yi))
    End
    wt+1 = wt - λt s, with λt = 1/(t+1)
End
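The pseudocode above can be sketched as runnable Python for the multi-class case (the toy data, the 0-1 loss, and the block joint feature are illustrative assumptions, not from the slides):

```python
import numpy as np

def psi(phi_x, y, C):
    """Block joint feature: Phi(x) in the y-th of C blocks."""
    d = phi_x.shape[0]
    out = np.zeros(C * d)
    out[(y - 1) * d: y * d] = phi_x
    return out

def ssvm_subgradient_descent(X, Y, C_classes, C_reg=1.0, T=200):
    """Minimize ||w||^2 + C_reg * sum_i max_y { w.psi(xi,y) + Delta(yi,y)
    - w.psi(xi,yi) } with the 0-1 loss Delta and step size 1/(t+1)."""
    n, d = X.shape
    w = np.zeros(C_classes * d)
    for t in range(T):
        s = 2.0 * w                              # subgradient of ||w||^2
        for i in range(n):
            scores = [w @ psi(X[i], y, C_classes) + float(y != Y[i])
                      for y in range(1, C_classes + 1)]
            y_hat = int(np.argmax(scores)) + 1   # loss-augmented inference
            s += C_reg * (psi(X[i], y_hat, C_classes) - psi(X[i], Y[i], C_classes))
        w = w - s / (t + 1)                      # lambda_t = 1/(t+1)
    return w

# Toy data: two linearly separable classes.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
Y = np.array([1, 1, 2, 2])
w = ssvm_subgradient_descent(X, Y, C_classes=2)
preds = [1 + int(np.argmax([w @ psi(x, y, 2) for y in (1, 2)])) for x in X]
print(preds)
```

On this toy set the learned w separates the two classes, so the predictions match Y.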
Learning Objective
Training data {(xi,yi), i = 1,2,…,n}
min_w ||w||² + C Σi max_y { wᵀΨ(xi,y) + Δ(yi,y) - wᵀΨ(xi,yi) }
Stochastic Approximation
Training data {(xi,yi), i = 1,2,…,n}
min_w ||w||² + C Σi max_y { wᵀΨ(xi,y) + Δ(yi,y) - wᵀΨ(xi,yi) }
Choose a sample ‘i’ with probability 1/n and replace the sum with
min_w ||w||² + Cn max_y { wᵀΨ(xi,y) + Δ(yi,y) - wᵀΨ(xi,yi) }
Its expected value is the original objective function
Stochastic Subgradient Descent
Start at some parameter w0
For t = 0 to T  // number of iterations
    s = 2wt
    Choose a sample ‘i’ with probability 1/n
    ŷ = argmax_y { wtᵀΨ(xi,y) + Δ(yi,y) }
    s = s + Cn (Ψ(xi,ŷ) - Ψ(xi,yi))
    wt+1 = wt - λt s, with λt = 1/(t+1)
End
Convergence Rate
Computes an ε-optimal solution in O(dC/ε) iterations
C: the SSVM hyperparameter; d: the number of non-zeros in the feature vector
Each iteration requires solving an inference problem
Side Note: Structured Output CNN
conv1 → conv2 → conv3 → conv4 → conv5 → fc6 → fc7 → SSVM
Back-propagate the subgradients through the network
Outline
• Structured Output Prediction
• Structured Output SVM
• Optimization
  – Stochastic subgradient descent
  – Conditional gradient, aka Frank-Wolfe
• Results
Lacoste-Julien et al. ICML 2013
![Page 99: Learning from Big Data Lecture 5 M. Pawan Kumar oval/ Slides available online](https://reader036.vdocument.in/reader036/viewer/2022062805/5697c01e1a28abf838cd0dfe/html5/thumbnails/99.jpg)
Slide courtesy Martin Jaggi
Conditional Gradient
[Pages 100-102: the conditional gradient illustration continues; slides courtesy Martin Jaggi]
[Page 103]
SSVM Primal
minw ||w||2 + C Σiξi
s.t. wTΨ(xi,y) + Δ(yi,y) - wTΨ(xi,yi) ≤ ξi for all i, y
Derive dual on board
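For reference, a sketch of the board derivation, where M is taken to stack the columns Ψ(xi,yi) - Ψ(xi,y) and b the entries Δ(yi,y), over all i and y (these conventions are inferred from w = Mα/2 on the next slide):

```latex
L(w,\xi,\alpha) = \|w\|^2 + C\sum_i \xi_i
  + \sum_{i,y} \alpha_i(y)\big[w^\top\Psi(x_i,y) + \Delta(y_i,y) - w^\top\Psi(x_i,y_i) - \xi_i\big]

\frac{\partial L}{\partial w} = 0
  \;\Rightarrow\; 2w = \sum_{i,y}\alpha_i(y)\big(\Psi(x_i,y_i)-\Psi(x_i,y)\big) = M\alpha
  \;\Rightarrow\; w = M\alpha/2

\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; \sum_y \alpha_i(y) = C

g(\alpha) = \|w\|^2 - w^\top M\alpha + b^\top\alpha
          = -\|M\alpha\|^2/4 + b^\top\alpha
```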
[Page 104]
SSVM Dual
maxα -||Mα||2/4 + bTα
s.t. ∑y αi(y) = C for all i
αi(y) ≥ 0 for all i, y
where w = Mα/2 and b stacks the entries Δ(yi,y)
[Page 105]
Linear Program
maxα -(Mα)Twt + bTα
s.t. ∑y αi(y) = C for all i
αi(y) ≥ 0 for all i, y
Standard Frank-Wolfe: solve this over all possible α
Block-Coordinate Frank-Wolfe: solve this over αi for a single sample ‘i’
[Page 106]
Linear Program
maxα -(Mα)Twt + bTα
s.t. ∑y αi(y) = C for all i
αi(y) ≥ 0 for all i, y
Vertices?
αi(y) = C if y = ŷ, 0 otherwise
[Page 107]
Solution
maxα -(Mα)Twt + bTα
s.t. ∑y αi(y) = C for all i
αi(y) ≥ 0 for all i, y
Which vertex maximizes the linear function?
ŷ = argmaxy{wtTΨ(xi,y) + Δ(yi,y)}   (loss-augmented inference)
si(y) = C if y = ŷ, 0 otherwise
[Page 108]
Update
αt+1 = (1-μ)αt + μs
Standard Frank-Wolfe: s contains the solution for all the samples
Block-Coordinate Frank-Wolfe: s contains the solution for sample ‘i’, and sj = αtj for all other samples j
[Page 109]
Step-Size
αt+1 = (1-μ) αt + μs
Maximizing a quadratic function in one variable μ
Analytical computation of optimal step-size
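The closed form is easy to sketch numerically. In the code below, `M`, `b` and the iterates are random stand-ins rather than a real SSVM instance; the point is the analytic line search on the dual objective:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 8))   # feature dimension 5, 8 dual variables
b = rng.standard_normal(8)

def f(alpha):
    """Dual objective f(alpha) = -||M alpha||^2 / 4 + b^T alpha."""
    Ma = M @ alpha
    return -Ma @ Ma / 4.0 + b @ alpha

alpha = rng.random(8)             # current dual iterate alpha_t
s = rng.random(8)                 # Frank-Wolfe vertex from the linear program
d = s - alpha                     # search direction

# f(alpha + mu * d) is a concave quadratic in mu; set its derivative to zero
# and clip the maximizer to the feasible range [0, 1].
Md, Ma = M @ d, M @ alpha
denom = Md @ Md
mu = float(np.clip((2.0 * (b @ d) - Ma @ Md) / denom, 0.0, 1.0)) if denom > 0 else 0.0

alpha_new = (1.0 - mu) * alpha + mu * s
```

Because μ maximizes a concave quadratic over [0, 1], the update can never decrease the dual objective.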
[Page 110]
Comparison
OCR Dataset
[Page 111]
Outline
• Structured Output Prediction
• Structured Output SVM
• Optimization
• Results
  – Exact Inference
  – Approximate Inference
  – Choice of Loss Function
[Page 112]
Optical Character Recognition
Identify each letter in a handwritten word
Taskar, Guestrin and Koller, NIPS 2003
[Page 113]
Optical Character Recognition
Taskar, Guestrin and Koller, NIPS 2003
X1 X2 X3 X4
Labels L = {a, b, …., z}
Logistic Regression Multi-Class SVM
[Page 114]
Optical Character Recognition
Taskar, Guestrin and Koller, NIPS 2003
X1 X2 X3 X4
Labels L = {a, b, …., z}
Maximum Likelihood Structured Output SVM
[Page 115]
Optical Character Recognition
Taskar, Guestrin and Koller, NIPS 2003
[Page 116]
Image Segmentation
Szummer, Kohli and Hoiem, ECCV 2006
[Page 117]
Image Segmentation
Szummer, Kohli and Hoiem, ECCV 2006
X1 X2 X3
X4 X5 X6
X7 X8 X9
Labels L = {0, 1}
[Page 118]
Image Segmentation
Szummer, Kohli and Hoiem, ECCV 2006
[Bar chart comparing Unary, Max Likelihood and SSVM; y-axis from 0 to 25]
[Page 119]
Outline
• Structured Output Prediction
• Structured Output SVM
• Optimization
• Results
  – Exact Inference
  – Approximate Inference
  – Choice of Loss Function
[Page 120]
Scene Dataset
Finley and Joachims, ICML 2008
[Bar chart comparing Greedy, LBP, Combine, Exact and LP inference; y-axis from 9.6 to 11.4]
[Page 121]
Reuters Dataset
Finley and Joachims, ICML 2008
[Bar chart comparing Greedy, LBP, Combine, Exact and LP inference; y-axis from 0 to 18]
[Page 122]
Yeast Dataset
Finley and Joachims, ICML 2008
[Bar chart comparing Greedy, LBP, Combine, Exact and LP inference; y-axis from 0 to 50]
[Page 123]
Mediamill Dataset
Finley and Joachims, ICML 2008
[Bar chart comparing Greedy, LBP, Combine, Exact and LP inference; y-axis from 0 to 40]
[Page 124]
Outline
• Structured Output Prediction
• Structured Output SVM
• Optimization
• Results
  – Exact Inference
  – Approximate Inference
  – Choice of Loss Function
[Page 125]
“Jumping” Classification
[Page 126]
Standard Pipeline
Collect dataset D = {(xi,yi), i = 1, …., n}
Learn your favourite classifier
Classifier assigns a score to each test sample
Threshold the score for classification
[Page 127]
“Jumping” Ranking
Rank 1 Rank 2 Rank 3
Rank 4 Rank 5 Rank 6
Average Precision = 1
[Page 128]
Ranking vs. Classification
Rank 1 Rank 2 Rank 3
Rank 4 Rank 5 Rank 6
Average Precision = 1, 0.92, 0.81; Accuracy = 1, 0.67
[Page 129]
Standard Pipeline
Collect dataset D = {(xi,yi), i = 1, …., n}
Learn your favourite classifier
Classifier assigns a score to each test sample
Sort the score for ranking
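The Average Precision values quoted on these slides follow the usual definition: the mean of precision@k over the ranks k of the positive samples. A small helper (the function name and interface are our own, for illustration):

```python
def average_precision(scores, labels):
    """AP: mean of precision@k over the ranks k of the positive samples,
    where samples are ranked by decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0
```

A perfect ranking (all positives first) gives AP = 1; demoting a positive below a negative lowers AP even when the thresholded accuracy is unchanged.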
[Page 130]
Computes subgradients of the AP loss
[Page 131]
[Bar charts: Training Time for the 0-1 loss vs the AP loss (AP is 5x slower); Average Precision for the 0-1 loss vs the AP loss (a 4% improvement, for free)]
Yue, Finley, Radlinski and Joachims, SIGIR 2007
[Page 132]
Efficient Optimization ofAverage Precision
Pritish Mohapatra C. V. Jawahar M. Pawan Kumar
[Page 133]
[Bar chart: Training Time for the 0-1 loss, the AP loss (5x slower), and the proposed method (slightly faster)]
Each iteration of AP optimization is slightly slower
It takes fewer iterations to converge in practice
[Page 134]
Questions?