Multi-label Classification
Jesse Read
https://users.ics.aalto.fi/jesse/
Department of Information and Computer Science, Helsinki, Finland
Summer School on Data Sciences for Big Data, September 5, 2015
Multi-label Classification
Binary classification: Is this a picture of the sea?
∈ {yes, no}
Multi-label Classification
Multi-class classification: What is this a picture of?
∈ {sea, sunset, trees, people, mountain, urban}
Multi-label Classification
Multi-label classification: Which labels are relevant to this picture?
⊆ {sea, sunset, trees, people, mountain, urban}
i.e., multiple labels per instance instead of a single label!
Multi-label Classification
        K = 2         K > 2
L = 1   binary        multi-class
L > 1   multi-label   multi-output†

† also known as multi-target, multi-dimensional.

Figure: For L target variables (labels), each taking one of K values.

multi-output can be cast to multi-label, just as multi-class can be cast to binary.

tagging / keyword assignment: the set of labels (L) is not predefined
Increasing Interest
year        in text   in title
1996-2000        23          1
2001-2005       188         18
2006-2010      1470        164
2011-2015      4550        485

Table: Academic articles containing the phrase ‘multi-label classification’ (Google Scholar)
Single-label vs. Multi-label
Table: Single-label Y ∈ {0, 1}

X1   X2    X3   X4   X5   Y
1    0.1   3    1    0    0
0    0.9   1    0    1    1
0    0.0   1    1    0    0
1    0.8   2    0    1    1
1    0.0   2    0    1    0
0    0.0   3    1    1    ?

Table: Multi-label Y ⊆ {λ1, . . . , λL}

X1   X2    X3   X4   X5   Y
1    0.1   3    1    0    {λ2, λ3}
0    0.9   1    0    1    {λ1}
0    0.0   1    1    0    {λ2}
1    0.8   2    0    1    {λ1, λ4}
1    0.0   2    0    1    {λ4}
0    0.0   3    1    1    ?
Single-label vs. Multi-label
Table: Single-label Y ∈ {0, 1}

X1   X2    X3   X4   X5   Y
1    0.1   3    1    0    0
0    0.9   1    0    1    1
0    0.0   1    1    0    0
1    0.8   2    0    1    1
1    0.0   2    0    1    0
0    0.0   3    1    1    ?

Table: Multi-label [Y1, . . . , YL] ∈ {0, 1}^L

X1   X2    X3   X4   X5   Y1   Y2   Y3   Y4
1    0.1   3    1    0    0    1    1    0
0    0.9   1    0    1    1    0    0    0
0    0.0   1    1    0    0    1    0    0
1    0.8   2    0    1    1    0    0    1
1    0.0   2    0    1    0    0    0    1
0    0.0   3    1    1    ?    ?    ?    ?
Outline
1 Introduction
2 Applications
3 Background
4 Problem Transformation
5 Algorithm Adaptation
6 Label Dependence
7 Multi-label Evaluation
8 Summary & Resources
Text Categorization
For example, the news . . .
Novo Banco: Portugal bank sell-off hits snag
Portugal’s central bank has missed its deadline to sell Novo Banco, a bank created after the collapse of the country’s second-biggest lender.

Reuters collection: newswire stories categorized into 103 topic codes
Text Categorization
For example, the IMDb dataset: textual movie plot summaries associated with genres (labels).
Text Categorization
For example, the IMDb dataset: textual movie plot summaries associated with genres (labels). Features X1, . . . , X1001 indicate words (abandoned, accident, . . . , violent, wedding); labels Y1, . . . , Y28 indicate genres (horror, romance, . . . , comedy, action).

i        X1   X2   . . .  X1000   X1001   Y1   Y2   . . .  Y27   Y28
1        1    0    . . .  0       1       0    1    . . .  0     0
2        0    1    . . .  1       0       1    0    . . .  0     0
3        0    0    . . .  0       1       0    1    . . .  0     0
4        1    1    . . .  0       1       1    0    . . .  0     1
5        1    1    . . .  0       1       0    1    . . .  0     1
...
120919   1    1    . . .  0       0       0    0    . . .  0     1
Labelling E-mails
For example, the Enron e-mails, multi-labelled to 53 categories by the UC Berkeley Enron Email Analysis Project:

Company Business, Strategy, etc.
Purely Personal
Empty Message
Forwarded email(s)
. . .
company image – current
. . .
Jokes, humor (related to business)
. . .
Emotional tone: worry / anxiety
Emotional tone: sarcasm
. . .
Emotional tone: shame
Labelling Images
Images are labelled to indicate
multiple concepts
multiple objects
multiple people
e.g., Scene data with concept labels ⊆ {beach, sunset, foliage, field, mountain, urban}
Applications: Audio

Labelling music/tracks with genres, voices, concepts, etc.

e.g., Music dataset, audio tracks labelled with different moods, among: {amazed-surprised, happy-pleased, relaxing-calm, quiet-still, sad-lonely, angry-aggressive}
Medical
Medical Diagnosis
medical history, symptoms → diseases / ailments
e.g., Medical dataset,
clinical free text reports by radiologists
label assignment out of 45 ICD-9-CM codes
Bioinformatics
Genes are associated with biological functions.
E.g. the Yeast dataset: 2,417 genes, described by 103 attributes, labelled into 14 groups of the FunCat functional catalogue.
Related Tasks

multi-output¹ classification: outputs are nominal
    yj ∈ {1, . . . , K}, y ∈ N^L

multi-output regression: outputs are real-valued
    yj ∈ R, y ∈ R^L

X1   X2   X3   X4   X5   price   age   percent
x1   x2   x3   x4   x5   37.00   25    0.88
x1   x2   x3   x4   x5   22.88   22    0.22
x1   x2   x3   x4   x5   88.23   11    0.77

label ranking, i.e., preference learning
    λ3 ≻ λ1 ≻ λ4 ≻ . . . ≻ λ2

¹ aka multi-target, multi-dimensional
Related Areas
multi-task learning: multiple tasks, shared representation; data may come from different sources, e.g., learn to recognise speech for different speakers, classify text from different corpora

sequential learning: predict across time indices instead of across label indices

structured output prediction: assume a particular structure among outputs, e.g., pixels
Advanced Applications
Figure: Image Segmentation: Foreground yj = 1
Advanced Applications
Figure: Localization: yj = 1 if j-th tile occupied.
Advanced Applications
Figure: Demand prediction: yj = 1 if high demand at j-th node.
Outline
1 Introduction
2 Applications
3 Background
4 Problem Transformation
5 Algorithm Adaptation
6 Label Dependence
7 Multi-label Evaluation
8 Summary & Resources
Single-label Classification
Figure: graphical model with a single label y and features x1, . . . , x5.

y = h(x)                            • classifier h
  = argmax_{y ∈ {0,1}} p(y|x)       • MAP estimate
Example: Naive Bayes
Figure: graphical model with label y and features x1, . . . , x5.

y = argmax_{y ∈ {0,1}} p(y) p(x|y)                 • generative, p(y|x) ∝ p(x|y)p(y)
  = argmax_{y ∈ {0,1}} p(y) ∏_{d=1}^{D} p(xd|y)    • Naive Bayes
Example: Logistic Regression
Figure: graphical model with label y and features x1, . . . , x5.

y = argmax_{y ∈ {0,1}} p(y|x)                      • MAP estimate

p(y = 1|x) = fw(x) = 1 / (1 + exp(−w⊤x))           • logistic regression

and find w to minimize the error E(w)
Focus on the Labels
Figure: from a single label y over features x1, . . . , x5, to multiple labels y1, . . . , y4 over the same features x.
Multi-label Classification
Figure: independent labels y1, . . . , y4, each conditioned on x.

yj = hj(x) = argmax_{yj ∈ {0,1}} p(yj|x)           • for each index j = 1, . . . , L

and then,

y = h(x) = [y1, . . . , y4]
         = [argmax_{y1 ∈ {0,1}} p(y1|x), · · · , argmax_{y4 ∈ {0,1}} p(y4|x)]
         = [f1(x), · · · , f4(x)] = f(W⊤x)

This is the Binary Relevance method (BR).
Outline
1 Introduction
2 Applications
3 Background
4 Problem Transformation
5 Algorithm Adaptation
6 Label Dependence
7 Multi-label Evaluation
8 Summary & Resources
BR Transformation

1. Transform the dataset . . .

X      Y1   Y2   Y3   Y4
x(1)   0    1    1    0
x(2)   1    0    0    0
x(3)   0    1    0    0
x(4)   1    0    0    1
x(5)   0    0    0    1

. . . into L separate binary problems (one for each label):

X      Y1        X      Y2        X      Y3        X      Y4
x(1)   0         x(1)   1         x(1)   1         x(1)   0
x(2)   1         x(2)   0         x(2)   0         x(2)   0
x(3)   0         x(3)   1         x(3)   0         x(3)   0
x(4)   1         x(4)   0         x(4)   0         x(4)   1
x(5)   0         x(5)   0         x(5)   0         x(5)   1

2. . . . and train with any off-the-shelf binary base classifier.
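The transformation step can be sketched in a few lines of Python (a toy sketch with hypothetical helper names; any binary classifier would then be trained on each resulting problem):

```python
def br_transform(X, Y):
    """Split a multi-label dataset into L independent binary problems,
    one per label; Y is a list of binary label vectors of length L."""
    L = len(Y[0])
    # each problem keeps the full feature matrix X and one label column
    return [(X, [y[j] for y in Y]) for j in range(L)]

# the dataset from the slide (labels Y1..Y4); features are placeholders
Y = [[0, 1, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 1], [0, 0, 0, 1]]
X = [[0], [1], [0], [1], [0]]
problems = br_transform(X, Y)   # L = 4 binary problems
```

Each element of `problems` is an (X, column) pair, exactly the per-label tables above.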
Why Not Binary Relevance?
BR ignores label dependence, i.e., it assumes

p(y|x) ∝ p(x) ∏_{j=1}^{L} p(yj|x)

which may not always hold!

Example (Film Genre Classification)

p(y_romance | x) ≠ p(y_romance | x, y_horror)
Why Not Binary Relevance?

Ignoring label dependence also shows up empirically:

Table: Average predictive performance (5-fold CV, EXACT MATCH)

          L     BR     MCC
Music     6     0.30   0.37
Scene     6     0.54   0.68
Yeast     14    0.14   0.23
Genbase   27    0.94   0.96
Medical   45    0.58   0.62
Enron     53    0.07   0.09
Reuters   101   0.29   0.37
Classifier Chains
Modelling label dependence,

Figure: a chain over labels y1, . . . , y4, each conditioned on x and the previous labels.

p(y|x) ∝ p(x) ∏_{j=1}^{L} p(yj | x, y1, . . . , yj−1)

and,

y = argmax_{y ∈ {0,1}^L} p(y|x)
CC Transformation
Similar to BR: make L binary problems, but include the previous labels as feature attributes,

X      Y1        X      Y1   Y2        X      Y1   Y2   Y3        X      Y1   Y2   Y3   Y4
x(1)   0         x(1)   0    1         x(1)   0    1    1         x(1)   0    1    1    0
x(2)   1         x(2)   1    0         x(2)   1    0    0         x(2)   1    0    0    0
x(3)   0         x(3)   0    1         x(3)   0    1    0         x(3)   0    1    0    0
x(4)   1         x(4)   1    0         x(4)   1    0    0         x(4)   1    0    0    1
x(5)   0         x(5)   0    0         x(5)   0    0    0         x(5)   0    0    0    1

and, again, apply any classifier (not necessarily a probabilistic one)!
Greedy CC
Figure: the chain y1 → y2 → y3 → y4, each label also conditioned on x.

L classifiers for L labels. For a test instance x, classify [22]:

1. y1 = h1(x)
2. y2 = h2(x, y1)
3. y3 = h3(x, y1, y2)
4. y4 = h4(x, y1, y2, y3)

and return y = [y1, . . . , yL]
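The greedy procedure can be sketched as follows, with each per-label model given as a plain function of x and the previous predictions (the probability tables here are toy numbers chosen to mirror the worked example that follows; they are not from any real dataset):

```python
def greedy_chain_predict(x, models, threshold=0.5):
    """Greedy CC inference: predict labels one at a time,
    feeding earlier predictions into later models."""
    y = []
    for h in models:
        p = h(x, tuple(y))           # P(y_j = 1 | x, y_1..y_{j-1})
        y.append(1 if p >= threshold else 0)
    return y

# toy conditional models, as lookup tables on the previous labels
models = [
    lambda x, prev: 0.6,                               # P(y1=1 | x)
    lambda x, prev: {(0,): 0.8, (1,): 0.3}[prev],      # P(y2=1 | x, y1)
    lambda x, prev: {(0, 0): 0.5, (0, 1): 0.1,
                     (1, 0): 0.6, (1, 1): 0.5}[prev],  # P(y3=1 | x, y1, y2)
]
y = greedy_chain_predict(None, models)   # -> [1, 0, 1]
```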
Example

Figure: a probability tree over y1, y2, y3; greedy inference follows the locally most probable branch at each step.

1. y1 = h1(x) = argmax_{y1} p(y1|x) = 1       (p(y1 = 1|x) = 0.6)
2. y2 = h2(x, y1) = 0                         (p(y2 = 1|x, y1 = 1) = 0.3)
3. y3 = h3(x, y1, y2) = 1                     (p(y3 = 1|x, y1 = 1, y2 = 0) = 0.6)

y = h(x) = [1, 0, 1]

Improves over BR; similar build time (if L < D);
able to use any off-the-shelf classifier for hj; parallelizable
But, errors may be propagated down the chain
Bayes Optimal CC
Bayes-optimal Probabilistic CC [4] (PCC)

y = argmax_{y ∈ {0,1}^L} p(y|x)
  = argmax_{y ∈ {0,1}^L} p(y1|x) ∏_{j=2}^{L} p(yj | x, y1, . . . , yj−1)    • chain rule

Figure: the chain y1 → y2 → y3 → y4, each label also conditioned on x.

Test all possible paths y = [y1, . . . , yL] (2^L in total)
Bayes Optimal CC

Example

Figure: the full probability tree over y1, y2, y3; every path is scored.

1. p(y = [0, 0, 0] | x) = 0.040
2. p(y = [0, 0, 1] | x) = 0.040
3. p(y = [0, 1, 0] | x) = 0.288
4. . . .
6. p(y = [1, 0, 1] | x) = 0.252
7. . . .
8. p(y = [1, 1, 1] | x) = 0.090

return argmax_y p(y|x)

Better accuracy than greedy CC, but computationally limited to L ≲ 15
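A brute-force version of this inference can be sketched as below; `cond` stands in for the chained conditional probabilities, with toy numbers matching the example tree (so the exhaustive argmax is [0, 1, 0] with probability 0.4 · 0.8 · 0.9 = 0.288):

```python
from itertools import product

def pcc_predict(x, cond, L):
    """Bayes-optimal CC inference: score all 2^L label vectors by the
    chain-rule product and return the argmax."""
    best_y, best_p = None, -1.0
    for y in product((0, 1), repeat=L):
        p = 1.0
        for j, yj in enumerate(y):
            pj = cond(j, x, y[:j])            # P(y_j = 1 | x, y_1..y_{j-1})
            p *= pj if yj == 1 else 1.0 - pj
        if p > best_p:
            best_y, best_p = list(y), p
    return best_y, best_p

def cond(j, x, prev):
    # toy conditionals matching the example tree
    if j == 0:
        return 0.6
    if j == 1:
        return 0.3 if prev == (1,) else 0.8
    return {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.6, (1, 1): 0.5}[prev]

y, p = pcc_predict(None, cond, 3)   # -> [0, 1, 0] with p = 0.288
```

The loop over `product((0, 1), repeat=L)` is exactly why PCC is limited to small L.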
Monte-Carlo search for CC

1. For t = 1, . . . , T iterations, sample yt ∼ p(y|x) down the chain [20]:
   1. y1 ∼ p(y1|x)            // y1 = 1 with probability p(y1|x)
   2. y2 ∼ p(y2|x, y1)
   3. . . .
   4. yL ∼ p(yL|x, y1, . . . , yL−1)

2. Predict y = argmax_{yt ∈ {y1, . . . , yT}} p(yt|x)

Figure: the chain y1 → y2 → y3 → y4, each label also conditioned on x.
Monte-Carlo search for CC

Example

Figure: the probability tree again; only sampled paths are scored.

Sample T times . . .

p([1, 0, 1] | x) = 0.6 · 0.7 · 0.6 = 0.252
p([0, 1, 0] | x) = 0.4 · 0.8 · 0.9 = 0.288

return argmax_{yt} p(yt|x)

Tractable, with similar accuracy to (Bayes Optimal) PCC
Can use other search algorithms, e.g., beam search [13]
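A sampling version of the same inference can be sketched like this, reusing a `cond` table of toy chained conditionals as in the earlier examples; with enough samples the mode of the distribution is found without enumerating all 2^L paths:

```python
import random

def mc_chain_predict(x, cond, L, T=200, seed=0):
    """Monte-Carlo CC inference: draw T label vectors from the chain,
    keep the one with the highest chain-rule probability."""
    rng = random.Random(seed)
    best_y, best_p = None, -1.0
    for _ in range(T):
        y, p = [], 1.0
        for j in range(L):
            pj = cond(j, x, tuple(y))         # P(y_j = 1 | x, y_1..y_{j-1})
            yj = 1 if rng.random() < pj else 0
            p *= pj if yj == 1 else 1.0 - pj
            y.append(yj)
        if p > best_p:
            best_y, best_p = y, p
    return best_y

def cond(j, x, prev):
    # toy conditionals matching the example tree
    if j == 0:
        return 0.6
    if j == 1:
        return 0.3 if prev == (1,) else 0.8
    return {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.6, (1, 1): 0.5}[prev]

y = mc_chain_predict(None, cond, 3)
```

The cost is O(T · L) model evaluations rather than O(2^L).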
Does Label-order Matter?

Are these models equivalent?

Figure: the chain y1 → y2 → y3 → y4 vs. the reverse chain y4 → y3 → y2 → y1.

In theory, by the chain rule,

p(y|x) = p(y1|x) p(y2|y1, x) = p(y2|x) p(y1|y2, x)

but we are estimating p from finite and noisy data (and possibly doing a greedy search); thus, in practice,

p̂(y1|x) p̂(y2|y1, x) ≠ p̂(y2|x) p̂(y1|y2, x)
Searching the Chain Space

Can search the space of possible chain orderings [20] with, e.g., a Monte Carlo walk:

For u = 1, . . . , U:
1. propose su = [s1, . . . , sL] = permute([1, . . . , L])
2. build model on sequence su
3. evaluate; accept if better (if J(su) > J(su−1))

Use h_{sU} as the final model.

Example (Scene data)

u     su = [s1, . . . , sL]    J(su)
0     [4, 2, 0, 1, 3, 5]       0.623
1     [4, 2, 0, 3, 1, 5]       0.628
2     [4, 2, 0, 3, 5, 1]       0.638
3     [4, 0, 2, 3, 5, 1]       0.647
5     [4, 0, 5, 2, 3, 1]       0.653
18    [5, 1, 4, 3, 2, 0]       0.654
23    [5, 4, 0, 1, 2, 3]       0.664
128   [3, 5, 1, 0, 2, 4]       0.668
176   [5, 3, 1, 0, 4, 2]       0.669
225   [5, 3, 1, 4, 0, 2]       0.670

J(s) := EXACT MATCH(Y, hs(X)) (higher is better)
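A minimal sketch of this accept-if-better walk, with a hypothetical stand-in `score` function in place of actually training and evaluating a chain on validation data:

```python
import random

def search_chain_order(L, score, U=50, seed=1):
    """Monte-Carlo search over label orderings: propose a random
    permutation each step and accept it if the score improves."""
    rng = random.Random(seed)
    best = list(range(L))
    best_score = score(best)
    for _ in range(U):
        cand = best[:]
        rng.shuffle(cand)                  # propose s_u = permute([0..L-1])
        s = score(cand)
        if s > best_score:                 # accept if J(s_u) > J(s_{u-1})
            best, best_score = cand, s
    return best, best_score

# hypothetical stand-in for J(s): rewards orderings with small jumps
score = lambda order: 1.0 / (1 + sum(abs(a - b) for a, b in zip(order, order[1:])))
best, s = search_chain_order(6, score)
```

In the real setting, `score(order)` would train a chain in that label order and return its validation EXACT MATCH.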
Searching the Chain Space

Figure: a chain over a permuted label sequence ys1, . . . , ys4, conditioned on x.

The space is of combinatorial proportions, . . . but a little search can go a long way. Many other options:

add temperature to freeze su from left to right over time
use a population of chain sequences: su(1), . . . , su(M)
use beam search
Chain Structure
We can formulate any structure,

yj = hj(x, pa(yj))

where pa(yj) = the parents of node j.

Figure: a chain structure over y1, . . . , y4, conditioned on x.

If pa(yj) := {y1, . . . , yj−1} we recover CC

‘partial’ models are more efficient and interpretable
Structured Classifier Chains

1. Measure some heuristic:
   marginal dependence [30]
   conditional dependence [31]
2. Find a structure
3. Plug in base classifiers and run some CC inference

Figure: a sparser structure over y1, . . . , y4, conditioned on x.

Related to Bayesian networks [1, 2]:

p(y, x) = ∏_{j=1}^{L} p(yj | pa(yj), x)
Label Powerset (LP)

One multi-class problem (taking many values),

y = argmax_{y ∈ {0,1}^L} p(x) ∏_{j=1}^{L} p(yj | x, y1, . . . , yj−1)    • PCC
  = argmax_{y ∈ Y} p(y|x)                                               • LP, where Y ⊂ {0,1}^L
  ≡ argmax_{y ∈ {0, . . . , 2^L − 1}} p(y|x)                            • a multi-class problem!

Figure: a single multi-class node [y1, y2, y3, y4], conditioned on x.

Each value is a label vector,
typically, the occurrences of the training set.
In practice, |Y| ≤ N, and |Y| ≪ 2^L
Label Powerset Method (LP)

1. Transform the dataset . . .

X      Y1   Y2   Y3   Y4
x(1)   0    1    1    0
x(2)   1    0    0    0
x(3)   0    1    1    0
x(4)   1    0    0    1
x(5)   0    0    0    1

. . . into a multi-class problem, taking up to 2^L possible values:

X      Y ∈ 2^L
x(1)   0110
x(2)   1000
x(3)   0110
x(4)   1001
x(5)   0001

2. . . . and train any off-the-shelf multi-class classifier.
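The transformation amounts to treating each distinct label vector as one class value; a minimal sketch, using bit-strings as the class values:

```python
def lp_transform(Y):
    """Label Powerset: map each label vector to a single multi-class
    value (here, its bit-string)."""
    return ["".join(str(b) for b in y) for y in Y]

def lp_inverse(c):
    """Map a predicted class value back to a label vector."""
    return [int(b) for b in c]

# the dataset from the slide
Y = [[0, 1, 1, 0], [1, 0, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1], [0, 0, 0, 1]]
classes = lp_transform(Y)   # ['0110', '1000', '0110', '1001', '0001']
```

Only 4 distinct class values appear here, far fewer than 2^4 = 16, illustrating |Y| ≤ N.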
Issues with LP
complexity: there is no greedy label-by-label option
imbalance: few examples per class label
overfitting: how to predict new value?
Example
In the Enron dataset, 44% of labelsets are unique (a single training example or test instance). In the del.icio.us dataset, 98% are unique.
RAkEL

X      Y ∈ 2^L
x(1)   0110
x(2)   1000
x(3)   0110
x(4)   1001
x(5)   0001

Ensembles of RAndom k-labEL subsets (RAkEL) [27]

Do LP on M subsets ⊂ {1, . . . , L} of size k:

X      Y123 ∈ 2^k     X      Y124 ∈ 2^k     X      Y234 ∈ 2^k
x(1)   011            x(1)   010            x(1)   110
x(2)   100            x(2)   100            x(2)   000
x(3)   011            x(3)   010            x(3)   110
x(4)   100            x(4)   101            x(4)   001
x(5)   000            x(5)   001            x(5)   001
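The subset-drawing step can be sketched as follows (toy helper names; training an LP classifier per subset and voting is then as on the following slides):

```python
import random

def rakel_subsets(L, k, M, seed=0):
    """Draw M random k-label subsets of {0, ..., L-1}."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(L), k)) for _ in range(M)]

def project_labels(Y, subset):
    """Restrict each label vector to a subset: an LP problem of 2^k values."""
    return [[y[j] for j in subset] for y in Y]

# the dataset from the slide
Y = [[0, 1, 1, 0], [1, 0, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1], [0, 0, 0, 1]]
subsets = rakel_subsets(L=4, k=3, M=3)
projected = [project_labels(Y, s) for s in subsets]
```

Each projected problem has at most 2^k class values instead of 2^L.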
Pruned Sets

| X | Y ∈ 2^L |
|---|---|
| x(1) | 0110 |
| x(2) | 1000 |
| x(3) | 0110 |
| x(4) | 1001 |
| x(5) | 0001 |

Ensembles of Pruned label Sets (EPS) [21]

- Do LP on M pruned subsets (wrt class values)
- Can flip bits to reduce the ratio of classes to examples

| X | Y ∈ 2^L |
|---|---|
| x(1) | 0110 |
| x(3) | 0110 |
| x(4) | 0001 |
| x(5) | 0001 |

| X | Y ∈ 2^L |
|---|---|
| x(1) | 0110 |
| x(2) | 1000 |
| x(3) | 0110 |
| x(4) | 0001 |
| x(4) | 1000 |

| X | Y ∈ 2^L |
|---|---|
| x(1) | 0110 |
| x(2) | 1000 |
| x(3) | 0110 |
| x(4) | 1000 |
| x(5) | 0001 |
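The pruning step can be sketched as follows (a minimal sketch; the threshold p and the plain keep/prune split are illustrative, and EPS additionally reintroduces pruned examples under more frequent label subsets):

```python
from collections import Counter

# Sketch of labelset pruning: labelsets occurring fewer than p times
# are pruned (and could be decomposed into frequent subsets).
Y = ['0110', '1000', '0110', '1001', '0001']
p = 2  # minimum labelset frequency (illustrative)

counts = Counter(Y)
kept = [y for y in Y if counts[y] >= p]
pruned = [y for y in Y if counts[y] < p]
print(kept)    # ['0110', '0110'] -- frequent labelsets survive
print(pruned)  # ['1000', '1001', '0001'] -- pruned / decomposed
```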
![Page 59: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/59.jpg)
Ensemble-based Voting

Most problem-transformation methods are ensemble-based, e.g., ECC, EPS, RAkEL.

Ensemble Voting

Each model hm(x) votes only on its label subset (here, subsets y123, y124, y134, y234 over a single x); per-label votes are averaged and thresholded:

h1(x): 1 1 1
h2(x): 0 1 0
h3(x): 1 0 0
h4(x): 1 0 0
score (y1, y2, y3, y4): 0.75 0.25 0.75 0
y: 1 0 1 0

- more predictive power (ensemble effect)
- LP can predict novel label combinations
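The voting rule can be sketched as follows (a minimal sketch; the per-label vote lists are illustrative values chosen to reproduce the slide's score row of 0.75, 0.25, 0.75, 0):

```python
# Sketch of per-label ensemble voting: average each label's 0/1 votes
# across models and threshold at 0.5.
def vote(per_label_votes, threshold=0.5):
    scores = {j: sum(v) / len(v) for j, v in per_label_votes.items()}
    pred = {j: int(s > threshold) for j, s in scores.items()}
    return scores, pred

votes = {"y1": [1, 0, 1, 1],
         "y2": [1, 0, 0, 0],
         "y3": [1, 1, 1, 0],
         "y4": [0, 0, 0, 0]}
scores, pred = vote(votes)
print(scores)  # {'y1': 0.75, 'y2': 0.25, 'y3': 0.75, 'y4': 0.0}
print(pred)    # {'y1': 1, 'y2': 0, 'y3': 1, 'y4': 0}
```

Even if no single model outputs the final combination [1, 0, 1, 0], the vote can produce it, which is how LP ensembles predict novel label combinations.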
![Page 60: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/60.jpg)
Scaling Up

LSHTC4: Large Scale Hierarchical Text Classification

- A Wikipedia-scale problem
- 325,056 labels
- 2.4M examples
- Even with only 1,000 features, over 300M parameters must be learned with BR (linear models)
- . . . plus 52,831M more with CC
- . . . plus ensembles (×10, ×50?)
- The LP transformation generates around 1.47M classes
![Page 61: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/61.jpg)
Scaling Up
Our approach [16, 23]:

1. Ignore the predefined hierarchy
2. Work with subsets of the labelset (RAkEL)
3. Prune them (pruned sets)
4. Chain these sets together (classifier chains)
5. Mix base classifiers (centroid, decision trees, SVMs)
6. Ensemble with sampled features and instances (random subspace)
7. Randomization: splits, pruning, reintroduction, chain links, base-classifier parameters
8. Train models in parallel, weighted according to score on hold-out sets (avoid overfitting!)
![Page 62: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/62.jpg)
Pairwise Multi-label Classification

| X | Y1 | Y2 | Y3 | Y4 |
|---|---|---|---|---|
| x(1) | 0 | 1 | 1 | 0 |
| x(2) | 1 | 0 | 0 | 0 |
| x(3) | 0 | 1 | 0 | 0 |
| x(4) | 1 | 0 | 0 | 1 |
| x(5) | 0 | 0 | 0 | 1 |

Create a pairwise transformation of up to L(L−1)/2 binary classifiers (all-vs-all), each problem smaller than in BR:

| X | Y1v2 |
|---|---|
| x(1) | 0 |
| x(2) | 1 |
| x(3) | 0 |
| x(4) | 1 |

| X | Y1v3 |
|---|---|
| x(1) | 0 |
| x(2) | 1 |
| x(4) | 1 |

| X | Y1v4 |
|---|---|
| x(2) | 1 |
| x(5) | 0 |

| X | Y2v3 |
|---|---|
| x(3) | 1 |

| X | Y2v4 |
|---|---|
| x(1) | 1 |
| x(3) | 1 |
| x(4) | 0 |
| x(5) | 0 |

| X | Y3v4 |
|---|---|
| x(1) | 1 |
| x(4) | 0 |
| x(5) | 0 |

- Ensemble voting, or calibrated label ranking [7]
- Can also model four classes per pair (related to LP)
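The all-vs-all transformation can be sketched as follows (toy labels from the table above; only examples where the two labels disagree are kept, with the first label's value as the class):

```python
from itertools import combinations

# Sketch of the pairwise (all-vs-all) transformation: for each label
# pair (j, k), keep examples with y_j != y_k, labelled with y_j.
Y = [[0, 1, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 1], [0, 0, 0, 1]]
L = 4

pairwise = {}
for j, k in combinations(range(L), 2):
    pairwise[(j, k)] = [(i, row[j]) for i, row in enumerate(Y)
                        if row[j] != row[k]]

# e.g. the Y1-vs-2 problem keeps x(1)..x(4) (0-indexed here):
print(pairwise[(0, 1)])  # [(0, 0), (1, 1), (2, 0), (3, 1)]
print(pairwise[(1, 2)])  # [(2, 1)] -- the Y2-vs-3 problem
```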
![Page 63: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/63.jpg)
Hierarchy of MLC (HOMER)

(Tree: h1 splits x into the meta-labels {1, 4} and {2, 3, 6}, plus the single label y5; h2 resolves {1, 4} into y1, y4; h3 resolves {2, 3, 6} into y2, y3, y6.)

1. Cluster labels (randomly, k-means) [28], or use a pre-defined hierarchy
2. Apply a problem transformation
![Page 64: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/64.jpg)
Multi-label Regularization
Regularization

y = b(h(x)), or y = b(h(x), x)

where

- y = h(x) is an initial classification; and
- b is some regularizer

Examples:

- Meta BR: a second (meta) BR (b) takes as input the output from an initial BR (h) [9]
- Error Correcting Output Codes: the bit vector y has been distorted by noise; attempt to correct it [6]
- Subset matching: if y does not exist in the training set, match it to the closest labelset that does
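Subset matching can be sketched in a few lines (a minimal sketch; the training labelsets and Hamming distance are illustrative choices, and ties resolve to the first match):

```python
# Sketch of subset-matching regularization: replace a predicted label
# vector by the closest labelset (Hamming distance) seen in training.
train_labelsets = [(0, 1, 1, 0), (1, 0, 0, 0), (1, 0, 0, 1), (0, 0, 0, 1)]

def subset_match(y_pred, labelsets):
    """Return the training labelset with minimum Hamming distance to y_pred."""
    return min(labelsets,
               key=lambda y: sum(a != b for a, b in zip(y, y_pred)))

print(subset_match((1, 1, 0, 0), train_labelsets))  # (1, 0, 0, 0)
print(subset_match((0, 1, 1, 0), train_labelsets))  # already present
```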
![Page 65: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/65.jpg)
Problem Transformation Summary

Two ways of viewing a multi-label problem of L labels:

1. L binary problems (BR),
2. a multi-class problem with 2^L classes (LP),

or a combination of these.

General method:

1. Transform data into subproblems (binary or multi-class)
2. Apply some off-the-shelf base classifier
3. (Optional) Regularize
4. (Optional) Ensemble
![Page 66: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/66.jpg)
Outline
1 Introduction
2 Applications
3 Background
4 Problem Transformation
5 Algorithm Adaptation
6 Label Dependence
7 Multi-label Evaluation
8 Summary & Resources
![Page 67: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/67.jpg)
Algorithm Adaptation
1. Take your favourite (most suitable) classifier
2. Modify it for multi-label classification

- Advantage: a single model, usually very scalable
- Disadvantage: predictive performance depends on the problem domain
![Page 68: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/68.jpg)
k Nearest Neighbours (kNN)

Assign to x the majority class of the k 'nearest neighbours':

y = argmax_y ∑_{i ∈ Nk} I[y(i) = y]

where Nk contains the k training pairs with x(i) closest to x.

(Figure: a two-dimensional example over (x1, x2): the query point ? is assigned the majority class among nearby points of classes c1, . . . , c6.)
![Page 69: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/69.jpg)
Multi-label kNN

Assign the most common labels of the k nearest neighbours:

p(yj = 1 | x) = (1/k) ∑_{i ∈ Nk} yj(i)

yj = argmax_{yj ∈ {0,1}} p(yj | x), i.e., predict yj = 1 iff p(yj = 1 | x) > 0.5

(Figure: a two-dimensional example over (x1, x2): each neighbour carries a label vector, and the query point ? receives the labels most common among its k nearest neighbours.)
For example, [32]. Related to ensemble voting.
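The rule above can be sketched as follows (a minimal sketch; the toy points, labels, and squared-Euclidean distance are illustrative choices):

```python
# Sketch of multi-label kNN: estimate p(y_j = 1 | x) as the fraction of
# the k nearest neighbours carrying label j, thresholded at 0.5.
def ml_knn(x, X_train, Y_train, k=3):
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(range(len(X_train)),
                     key=lambda i: dist(X_train[i], x))[:k]
    L = len(Y_train[0])
    probs = [sum(Y_train[i][j] for i in nearest) / k for j in range(L)]
    return [int(p > 0.5) for p in probs]

X_train = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (1.1, 0.9), (0.9, 1.2)]
Y_train = [[1, 0, 0], [1, 0, 0], [0, 1, 1], [0, 1, 1], [0, 1, 0]]
print(ml_knn((1.0, 1.1), X_train, Y_train, k=3))  # [0, 1, 1]
```

Each label is voted on independently, which is why this is related to per-label ensemble voting.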
![Page 70: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/70.jpg)
Decision Trees

(Tree: split on x1 (> 0.3 vs ≤ 0.3), then on x3 (> −2.9 vs ≤ −2.9), then on x2 (= A vs = B), with a full label vector y at each leaf.)

- construct like C4.5 (multi-label entropy [3])
- multiple labels at the leaves
- predictive clustering trees [12] are highly competitive in a random forest/ensemble
![Page 71: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/71.jpg)
Conditional Random Fields

(Factor graph: x connects to y1 and y2 via factors φ1, φ2; y1 and y2 are connected via φ3.)

p(y|x) = (1/Z(x)) ∏_c φc(x, y) = (1/Z(x)) exp{ ∑_c wc fc(x, y) }

where, e.g., φ3(x, y) = φ3(y1, y2) ∝ p(y2|y1). Factors can be modelled with, e.g., a problem transformation.
![Page 72: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/72.jpg)
Conditional Random Fields

(Factor graph: x connects to y1 and y2 via factors φ1, φ2; y1 and y2 are connected via φ3.)

→ each factor is estimated from the training data (one such dataset per factor):

| X | Y1 | Y2 | Y3 | Y4 |
|---|---|---|---|---|
| x(1) | 0 | 1 | 1 | 0 |
| x(2) | 1 | 0 | 0 | 0 |
| x(3) | 0 | 1 | 0 | 0 |
| x(4) | 1 | 0 | 0 | 1 |
| x(5) | 0 | 0 | 0 | 1 |

where, e.g., φ3(x, y) = φ3(y1, y2) ∝ p(y2|y1). Factors can be modelled with, e.g., a problem transformation.
![Page 73: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/73.jpg)
Conditional Random Fields
(Factor graph: x connects to y1 and y2 via factors φ1, φ2; y1 and y2 are connected via φ3.)

where, e.g., φ3(x, y) = φ3(y1, y2) ∝ p(y2|y1). Factors can be modelled with, e.g., a problem transformation, but the computational burden is shifted to inference, e.g.,

y = argmax_{y ∈ {0,1}^L} f1(x, y1) f2(x, y2) f3(y2, y1)

- Gibbs sampling [10] (like an undirected PCC)
- Supported combinations [8] (i.e., Y in LP)
![Page 74: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/74.jpg)
Neural Network
(Network: inputs x1, . . . , x5 → hidden units z1, . . . , z4 → outputs y1, y2, y3.)

Just include an output node for each label.

- Train with, e.g., gradient descent + error back-propagation
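A forward pass with one sigmoid output per label can be sketched as follows (a minimal sketch; the weights are arbitrary illustrative values and biases are omitted for brevity):

```python
import math

# Sketch of a multi-label output layer: one sigmoid unit per label,
# so a single network predicts all L labels at once.
def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, W_hidden, W_out):
    z = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_hidden]
    y = [sigmoid(sum(w * zi for w, zi in zip(row, z))) for row in W_out]
    return [int(p > 0.5) for p in y], y

W_hidden = [[1.0, -1.0], [-1.0, 1.0]]            # 2 hidden units, 2 inputs
W_out = [[2.0, 2.0], [-2.0, 2.0], [2.0, -2.0]]   # 3 labels, 2 hidden units
pred, probs = forward([0.5, -0.5], W_hidden, W_out)
print(pred)  # [1, 0, 1]
```

Training would minimize, e.g., per-label cross-entropy by back-propagation.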
![Page 75: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/75.jpg)
Other Algorithm Adaptations
Max-margin methods / SVMs [29]
Association rules [25]
Boosting [24]
Generative (Bayesian) [15]
![Page 76: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/76.jpg)
Outline
1 Introduction
2 Applications
3 Background
4 Problem Transformation
5 Algorithm Adaptation
6 Label Dependence
7 Multi-label Evaluation
8 Summary & Resources
![Page 77: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/77.jpg)
Label Dependence in MLC
Common approach: present methods to

1. measure label dependence
2. find a structure that best represents this

and then apply classifiers, comparing results to BR.
![Page 78: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/78.jpg)
Label Dependence in MLC
Common approach: present methods to

1. measure label dependence
2. find a structure that best represents this

and then apply classifiers, comparing results to BR.
Example
(Two example structures: labels y1, . . . , y4 linked to each other over x; and overlapping subsets y124, y234 over x.)
Which labels (nodes) to link together? (CC, PGMs)
Which subsets to form from the labelset? (RAkEL)
![Page 79: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/79.jpg)
Label Dependence in MLC
Common approach: present methods to

1. measure label dependence
2. find a structure that best represents this

and then apply classifiers, comparing results to BR.
Problem

Accuracy is often indistinguishable from that of random ensembles, or the method is slow! (Although it may be more compact and/or interpretable.)
![Page 80: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/80.jpg)
Marginal label dependence
Marginal dependence

When the joint is not the product of the marginals, i.e.,

p(y2) ≠ p(y2|y1)
p(y1)p(y2) ≠ p(y1, y2)

(Graph: an edge between Y1 and Y2.)

Estimate from co-occurrence frequencies in the training data.
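Such an estimate can be sketched via empirical mutual information over co-occurrence counts (a minimal sketch; the label columns are illustrative toy data):

```python
import math
from collections import Counter

# Sketch: estimate marginal dependence between two label columns via
# empirical mutual information from co-occurrence frequencies.
def mutual_information(ya, yb):
    n = len(ya)
    pa, pb, pab = Counter(ya), Counter(yb), Counter(zip(ya, yb))
    mi = 0.0
    for (a, b), c in pab.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi

y1 = [0, 1, 1, 1, 0, 0, 1, 1]
y2 = [0, 1, 1, 0, 0, 0, 1, 1]  # mostly co-occurs with y1
y3 = [1, 0, 1, 0, 1, 0, 1, 0]  # largely unrelated to y1
print(mutual_information(y1, y2) > mutual_information(y1, y3))  # True
```

Computing this for every label pair yields the heatmaps on the next slides.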
![Page 81: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/81.jpg)
Marginal label dependence: Example

Figure: Music dataset, mutual information between the labels (amazed, happy, relaxing, quiet, sad, angry).
![Page 82: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/82.jpg)
Marginal label dependenceExample
beach sunset foliage field mountain urban
beach
sunset
foliage
field
mountain
urban
Figure: Scene dataset - Mutual Information
![Page 83: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/83.jpg)
Exploiting marginal dependence

A Toy Dataset

| X1 | X2 | Y1 | Y2 | Y3 |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 | 0 |

Measure marginal label dependence (i.e., do labels co-occur frequently, or does one exclude the other?).
![Page 84: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/84.jpg)
Exploiting marginal dependence

A Toy Dataset

| X1 | X2 | Y1 | Y2 | Y3 |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 | 0 |

Measure marginal label dependence (i.e., do labels co-occur frequently, or does one exclude the other?).

But all labels are interdependent! For example,

p(y2 = 1 | y1 = 1) ≠ p(y2 = 1)
1/3 > 1/4

Could use a threshold, or statistical significance, . . . But how does this relate to classification, p(yj|x)?
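The check on the toy dataset can be reproduced directly:

```python
from fractions import Fraction

# The toy dataset: rows of (X1, X2, Y1, Y2, Y3).
rows = [
    (0, 0, 0, 0, 0),
    (1, 0, 1, 0, 1),
    (0, 1, 1, 0, 1),
    (1, 1, 1, 1, 0),
]

y1 = [r[2] for r in rows]
y2 = [r[3] for r in rows]

p_y2 = Fraction(sum(y2), len(rows))               # p(y2 = 1)         = 1/4
p_y2_given_y1 = Fraction(
    sum(b for a, b in zip(y1, y2) if a == 1),     # p(y2 = 1 | y1 = 1) = 1/3
    sum(y1))
print(p_y2, p_y2_given_y1)  # 1/4 1/3
```

Since 1/3 ≠ 1/4, y1 and y2 are marginally dependent, as claimed above.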
![Page 85: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/85.jpg)
Conditional label dependence
Conditional dependence

. . . conditioned on the input observation x:

p(y2|y1, x) ≠ p(y2|x)

(Graph: X → Y1, Y2, with an edge between Y1 and Y2.)

Have to build and measure models. Indications of conditional dependence:

- the performance of LP/CC exceeds that of BR
- errors among the binary models are correlated
![Page 86: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/86.jpg)
Conditional label dependence
Conditional independence

. . . conditioned on the input observation x:

p(y2) ≠ p(y2|y1), but p(y2|x) = p(y2|y1, x)

(Graphs: X → Y1, Y2 with an edge between Y1 and Y2, vs. without that edge.)

Have to build and measure models. Indications of conditional dependence:

- the performance of LP/CC exceeds that of BR
- errors among the binary models are correlated
![Page 87: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/87.jpg)
Exploiting conditional dependence

A Toy Dataset

| X1 | X2 | Y1 | Y2 | Y3 |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 | 0 |

Measure conditional label dependence (build models, measure the difference in error rate).
![Page 88: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/88.jpg)
Exploiting conditional dependence

A Toy Dataset

| X1 | X2 | Y1 | Y2 | Y3 |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 | 0 |

Measure conditional label dependence (build models, measure the difference in error rate).

- But building models is expensive!
- Which structure to construct?
![Page 89: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/89.jpg)
Exploiting conditional dependence

A Toy Dataset (Y1 = OR, Y2 = AND, Y3 = XOR of X1, X2)

| X1 | X2 | Y1 | Y2 | Y3 |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 | 0 |

Oracle: complete conditional independence,

p(Yj | Yk, X1, X2) = p(Yj | X1, X2), ∀ j, k : 0 < j < k ≤ L

Then the binary relevance (BR) classifier should suffice?
![Page 90: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/90.jpg)
The LOGICAL Problem
(Structure diagrams over the labels or, and, xor.)

Figure: BR (left), CC (middle), LP (right)

Table: The LOGICAL problem, base classifier logistic regression.

| Metric | BR | CC | LP |
|---|---|---|---|
| HAMMING SCORE | 0.83 | 1.00 | 1.00 |
| EXACT MATCH | 0.50 | 1.00 | 1.00 |
Why didn’t BR work?
![Page 91: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/91.jpg)
XOR Solution

(Three possible chain structures over the labels or, and, xor.)

Only one of these chain orders works (with greedy inference)!

The ground truth (oracle) is

p(yXOR | yAND, x) = p(yXOR | x)

but, recall: we have an estimate of this,

f(yXOR | yAND, x) ≠ f(yXOR | x)

(finite data, finite training time, limited class of model f, i.e., linear): dependence depends on the model!
![Page 92: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/92.jpg)
Solutions

1. Use a suitable structure
2. Use a suitable base classifier
3. Ensure that labels are conditionally independent
![Page 93: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/93.jpg)
Solutions

1. Use a suitable structure (how to find it?)
2. Use a suitable base classifier (which one is suitable?)
3. Ensure that labels are conditionally independent (how to do that?)

Main limiting factor: computational complexity.
![Page 94: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/94.jpg)
The LOGICAL Problem
(Per-label decision plots over (x1, x2) for YOR, YAND, and YXOR under BR.)

Figure: Binary Relevance (BR): a linear decision boundary (solid line, estimated with logistic regression) is not viable for the YXOR label.
![Page 95: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/95.jpg)
Solution via Structure
(Decision plots over (x1, x2) for YOR and YAND; for YXOR, the decision space is augmented with yOR as a third axis.)

Figure: Solution via structure: a linear model is now applicable to YXOR.
![Page 96: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/96.jpg)
Solution via Structure
(Chain structure over or, and, xor.)

Figure: Solution via structure: two labels have an augmented decision space.

Can also use undirected connections:

- directionality is not an issue,
- but this implies a greater computational burden (≈ LP)
- . . . possibly shifted to inference (≈ PCC, CDN)
![Page 97: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/97.jpg)
Solution via Multi-class Decomposition

(Decision plot over (x1, x2) with the three observed classes Y[OR,AND,XOR] ∈ {[0,0,0], [1,0,1], [1,1,0]}.)

Figure: Label Powerset (LP): solvable with a one-vs-one multi-class decomposition for any (e.g., linear) base classifier.
![Page 98: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/98.jpg)
Solution via Multi-class Decomposition

(The same decision plot over (x1, x2), now decomposed over the subsets {OR, XOR} and {AND}.)

Figure: Label Powerset (LP): solvable with a one-vs-one multi-class decomposition for any (e.g., linear) base classifier. Also possible with RAkEL subsets YOR,XOR and YAND.
![Page 99: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/99.jpg)
Solution via Conditional Independence

(Decision plots over the transformed inputs (z1, z2): each of YOR, YAND, and YXOR is now linearly separable.)

Figure: Solution via a non-linear (e.g., random RBF) transformation of the input to a new space z (creating independence).
![Page 100: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/100.jpg)
Solution via Suitable Base-classifier

(Tree: split on x1 (0/1), then on x2 (0/1), reaching leaves [0, 0, 0], [1, 0, 1], [1, 1, 0].)

Figure: Solution via a non-linear classifier (e.g., Decision Tree). Leaves hold examples, where y = [yOR, yAND, yXOR].
![Page 101: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/101.jpg)
On Real World Problems . . .
(Scatter plot of the projected data, with the labels Y0, . . . , Y5 marked.)

Figure: Music dataset, kernel PCA.
![Page 102: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/102.jpg)
Latent Variables
p(y|x) = ( ∑_z p(x, y, z) ) / p(x) = (1/Z) ∑_z p(x, y, z)

(Graphs: labels y1, y2, y3 connected directly to inputs x1, . . . , x5, vs. connected via latent variables z1, . . . , z4.)

Can view label dependencies as having marginalized out latent variables.
![Page 103: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/103.jpg)
Inner Layer Methods
1. Use an inner layer: z = f(x), z ∈ R^H
2. Apply a classifier: y = h(z)

(Network: inputs x1, . . . , x5 → inner layer z1, . . . , z4 → outputs y1, y2.)

- PCA, CCA [17]
- Kernel PCA [29]
- Mixture models [15]
- Clustering [28]
- Compressive Sensing [11]
- Deep Learning [18]
- Auto Encoders
![Page 104: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/104.jpg)
Another look: Problem Transformation

(Diagrams: labels Y1, Y2, Y3 over inputs X1, X2, X3; and an inner layer of label subsets Y12, Y23 between inputs X1, . . . , X5 and outputs Y1, Y2, Y3.)

Figure: Methods CC and RAkEL (among others) can be viewed as using an inner layer [18].
![Page 105: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/105.jpg)
What about marginal dependence?

- Can be seen as a kind of constraint
- used for regularization (recall, e.g., ECOC, subset matching)
![Page 106: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/106.jpg)
Label Dependence: Summary
- Marginal dependence: useful for regularization
- Conditional dependence:
  - . . . depends on the model
  - . . . may be introduced
- Should consider together: base classifier, structure, inner layer
- An open problem
- Much existing research is relevant
![Page 107: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/107.jpg)
Outline
1 Introduction
2 Applications
3 Background
4 Problem Transformation
5 Algorithm Adaptation
6 Label Dependence
7 Multi-label Evaluation
8 Summary & Resources
![Page 108: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/108.jpg)
Multi-label Evaluation

In single-label classification, we simply compare the true label y with the predicted label ŷ [or p(y|x)]. What about in multi-label classification?

Example

If the true label vector is y = [1, 0, 0, 0], then ŷ = ?

Candidate predictions (columns: urban, mountain, beach, foliage):

| urban | mountain | beach | foliage |
|---|---|---|---|
| 1 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 |
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 1 |

- compare bit-wise? too lenient?
- compare vector-wise? too strict?
![Page 109: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/109.jpg)
Hamming Loss

Example

| | y(i) | ŷ(i) |
|---|---|---|
| x(1) | [1 0 1 0] | [1 0 0 1] |
| x(2) | [0 1 0 1] | [0 1 0 1] |
| x(3) | [1 0 0 1] | [1 0 0 1] |
| x(4) | [0 1 1 0] | [0 1 0 0] |
| x(5) | [1 0 0 0] | [1 0 0 1] |

HAMMING LOSS = (1/(NL)) ∑_{i=1}^{N} ∑_{j=1}^{L} I[ŷj(i) ≠ yj(i)] = 0.20
![Page 110: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/110.jpg)
0/1 Loss

Example

| | y(i) | ŷ(i) |
|---|---|---|
| x(1) | [1 0 1 0] | [1 0 0 1] |
| x(2) | [0 1 0 1] | [0 1 0 1] |
| x(3) | [1 0 0 1] | [1 0 0 1] |
| x(4) | [0 1 1 0] | [0 1 0 0] |
| x(5) | [1 0 0 0] | [1 0 0 1] |

0/1 LOSS = (1/N) ∑_{i=1}^{N} I(ŷ(i) ≠ y(i)) = 0.60
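Both losses on this example can be computed directly:

```python
# Hamming loss and 0/1 loss on the five example predictions above.
Y_true = [[1,0,1,0], [0,1,0,1], [1,0,0,1], [0,1,1,0], [1,0,0,0]]
Y_pred = [[1,0,0,1], [0,1,0,1], [1,0,0,1], [0,1,0,0], [1,0,0,1]]

N, L = len(Y_true), len(Y_true[0])

# Hamming loss: fraction of individual label bits that are wrong.
hamming = sum(a != b for yt, yp in zip(Y_true, Y_pred)
              for a, b in zip(yt, yp)) / (N * L)
# 0/1 loss: fraction of examples whose whole vector is wrong.
zero_one = sum(yt != yp for yt, yp in zip(Y_true, Y_pred)) / N
print(hamming, zero_one)  # 0.2 0.6
```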
![Page 111: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/111.jpg)
Other Metrics

- JACCARD INDEX: often called multi-label ACCURACY
- RANK LOSS: average fraction of label pairs not correctly ordered
- ONE ERROR: whether the top-ranked label is not in the set of true labels
- COVERAGE: average "depth" needed to cover all true labels
- LOG LOSS: i.e., cross entropy
- PRECISION: predicted positive labels that are relevant
- RECALL: relevant labels which were predicted
- PRECISION vs. RECALL curves
- F-MEASURE:
  - micro-averaged ('global' view)
  - macro-averaged by label (ordinary averaging of a binary measure; changes in infrequent labels have a big impact)
  - macro-averaged by example (one example at a time, averaged across examples)

For general evaluation, use multiple and contrasting evaluation measures!
![Page 112: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/112.jpg)
HAMMING LOSS vs. 0/1 LOSS

Hamming loss:

- evaluation label-by-label, suitable for evaluating
  ŷj = argmax_{yj ∈ {0,1}} p(yj | x),
  i.e., BR
- favours sparse labelling
- does not benefit directly from modelling label dependence

0/1 loss:

- evaluation by example (the whole vector), suitable for evaluating
  ŷ = argmax_{y ∈ {0,1}^L} p(y | x),
  i.e., PCC, LP
- does not favour sparse labelling
- benefits from models of label dependence
![Page 113: Multi-label Classification - GitHub Pages · Multi-label Classification K = 2 K >2 L = 1 binary multi-class L >1 multi-label multi-outputy yalso known as multi-target, multi-dimensional](https://reader030.vdocument.in/reader030/viewer/2022041105/5f06ead57e708231d41a60fe/html5/thumbnails/113.jpg)
HAMMING LOSS vs. 0/1 LOSS

Example: 0/1 LOSS vs. HAMMING LOSS

|        | y^(i)     | ŷ^(i)     |
|--------|-----------|-----------|
| x^(1)  | [1 0 1 0] | [1 0 0 1] |
| x^(2)  | [1 0 0 1] | [1 0 0 1] |
| x^(3)  | [0 1 1 0] | [0 1 0 0] |
| x^(4)  | [1 0 0 0] | [1 0 1 1] |
| x^(5)  | [0 1 0 1] | [0 1 0 1] |

HAM. LOSS 0.3
0/1 LOSS 0.6

Usually cannot minimize both at the same time . . .
. . . unless: labels are independent of each other! [5]
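The tension between the two losses can be made concrete with a tiny enumeration. Assuming a hypothetical joint distribution p(y) over L = 2 labels (the numbers are invented for illustration), the minimizer of expected 0/1 loss (the joint mode) differs from the minimizer of expected Hamming loss (the vector of per-label marginal modes):

```python
import itertools

# Hypothetical joint distribution over y in {0,1}^2 (numbers invented):
p = {(1, 0): 0.4, (0, 1): 0.3, (1, 1): 0.3}
L = 2
vectors = list(itertools.product([0, 1], repeat=L))

def expected_zero_one(yhat):
    # E[0/1 loss] = P(y != yhat) = 1 - p(yhat)
    return 1.0 - p.get(tuple(yhat), 0.0)

def expected_hamming(yhat):
    # E[fraction of label positions that disagree]
    return sum(q * sum(a != b for a, b in zip(y, yhat)) / L
               for y, q in p.items())

best_01  = min(vectors, key=expected_zero_one)   # the joint mode, (1, 0)
best_ham = min(vectors, key=expected_hamming)    # the marginal modes, (1, 1)
```

Here p(y1 = 1) = 0.7 and p(y2 = 1) = 0.6, so the marginal modes give [1 1] even though the single most probable vector is [1 0]: no one prediction minimizes both losses.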
HAMMING LOSS vs. 0/1 LOSS

Example: 0/1 LOSS vs. HAMMING LOSS

Optimize HAMMING LOSS . . .

|        | y^(i)     | ŷ^(i)     |
|--------|-----------|-----------|
| x^(1)  | [1 0 1 0] | [1 0 1 1] |
| x^(2)  | [1 0 0 1] | [1 1 0 1] |
| x^(3)  | [0 1 1 0] | [0 1 1 0] |
| x^(4)  | [1 0 0 0] | [1 0 1 0] |
| x^(5)  | [0 1 0 1] | [0 1 0 1] |

HAM. LOSS 0.2
0/1 LOSS 0.8

. . . 0/1 LOSS goes up

Usually cannot minimize both at the same time . . .
. . . unless: labels are independent of each other! [5]
HAMMING LOSS vs. 0/1 LOSS

Example: 0/1 LOSS vs. HAMMING LOSS

Optimize 0/1 LOSS . . .

|        | y^(i)     | ŷ^(i)     |
|--------|-----------|-----------|
| x^(1)  | [1 0 1 0] | [0 1 0 1] |
| x^(2)  | [1 0 0 1] | [1 0 0 1] |
| x^(3)  | [0 1 1 0] | [0 0 1 0] |
| x^(4)  | [1 0 0 0] | [0 1 1 1] |
| x^(5)  | [0 1 0 1] | [0 1 0 1] |

HAM. LOSS 0.4
0/1 LOSS 0.4

. . . HAMMING LOSS goes up

Usually cannot minimize both at the same time . . .
. . . unless: labels are independent of each other! [5]
Threshold Selection

Methods often return a posterior probability, or ensemble votes, p(x). Use a threshold of 0.5?

ŷ_j = 1 if p_j(x) ≥ 0.5, and 0 otherwise

Example with threshold of 0.5:

| x^(i)  | y^(i)     | p(x^(i))          | ŷ^(i) := I[p(x^(i)) ≥ 0.5] |
|--------|-----------|-------------------|----------------------------|
| x^(1)  | [1 0 1 0] | [0.9 0.0 0.4 0.6] | [1 0 0 1]                  |
| x^(2)  | [0 1 0 1] | [0.1 0.8 0.0 0.8] | [0 1 0 1]                  |
| x^(3)  | [1 0 0 1] | [0.8 0.0 0.1 0.7] | [1 0 0 1]                  |
| x^(4)  | [0 1 1 0] | [0.1 0.7 0.4 0.2] | [0 1 0 0]                  |
| x^(5)  | [1 0 0 0] | [1.0 0.0 0.0 1.0] | [1 0 0 1]                  |

. . . but a threshold of 0.4 would eliminate two errors!
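Using the table's true labels and scores, a short sketch (Python; the array names are my own) confirms that lowering the threshold from 0.5 to 0.4 removes two wrong label decisions:

```python
import numpy as np

# True labels Y and posterior scores P from the example (5 examples, 4 labels)
Y = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [1, 0, 0, 0]])
P = np.array([[0.9, 0.0, 0.4, 0.6],
              [0.1, 0.8, 0.0, 0.8],
              [0.8, 0.0, 0.1, 0.7],
              [0.1, 0.7, 0.4, 0.2],
              [1.0, 0.0, 0.0, 1.0]])

def n_errors(t):
    # number of individual label decisions that disagree with Y at threshold t
    return int((Y != (P >= t)).sum())
```

At t = 0.5 there are 4 wrong label decisions; at t = 0.4 the two 0.4-scored labels (examples 1 and 4) flip to positive and only 2 errors remain.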
Threshold Selection

Threshold calibration strategies:

- Ad-hoc, e.g., t = 0.5
- Internal validation, e.g., t ∈ {0.1, 0.2, . . . , 0.9}
- PCut: choose t such that the training data and test data have a similar average number of labels per example:

  t = argmin_t | (1/N) Σ_{i,j} I(p_j^(i) > t) − (1/N) Σ_{i,j} y_j^(i) |

  where the first sum runs over the test-data predictions and the second over the training labels.

  - Can be done efficiently.
  - Can also calibrate t_j for each label individually.
  - Assumes the training set is similar to the test set (i.e., not ideal for data streams).

Can be viewed as another form of regularization:

ŷ = b(h(x))
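A minimal PCut sketch, assuming numpy arrays; the function name and toy data are hypothetical. Taking the mean over all cells differs from the slide's (1/N)·Σ by a constant factor L on both sides, which does not affect the argmin:

```python
import numpy as np

def pcut_threshold(P_test, Y_train,
                   grid=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    # Match the density of predicted labels on the test set to the
    # density of true labels in the training set (PCut).
    density_train = Y_train.mean()
    return min(grid, key=lambda t: abs((P_test > t).mean() - density_train))

# Hypothetical toy data: 3 training / 3 test examples, L = 2 labels
Y_train = np.array([[1, 0], [1, 1], [0, 0]])          # label density 0.5
P_test  = np.array([[0.9, 0.2], [0.6, 0.3], [0.45, 0.1]])
t = pcut_threshold(P_test, Y_train)
```

The grid scan touches each candidate once, so calibration costs only one pass over the score matrix per candidate; a per-label variant would apply the same idea column by column to pick each t_j.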
Various Real-World Concerns
In data streams:
- label dependence (and therefore the appropriate structures/transformations/base classifiers)
  - may not be known in advance
  - must be learned incrementally
  - and adapted to change over time (concept drift)
- new labels must be incorporated, old labels phased out

Also:
- Labels may be missing from training data, but we don't know when they're missing (non-relevance ≠ missing)
- Labelling is more intensive per example (affects both manual labelling and active learning)
Outline
1 Introduction
2 Applications
3 Background
4 Problem Transformation
5 Algorithm Adaptation
6 Label Dependence
7 Multi-label Evaluation
8 Summary & Resources
Summary
Multi-label classification:
- Can be approached via problem transformation or algorithm adaptation
- Label dependence and scalability are the main themes
- An active area of research and a gateway to many related areas
Resources
Overview [26]
Review/Survey of Algorithms [33]
Extensive empirical comparison [14]
Some slides: A, B, C
http://users.ics.aalto.fi/jesse/
Software & Datasets
Mulan (Java)
Meka (Java)
Scikit-Learn (Python) offers some multi-label support
Clus (Java)
LAMDA (Matlab)
Datasets
http://mulan.sourceforge.net/datasets.html
http://meka.sourceforge.net/#datasets
MEKA
- A WEKA-based framework for multi-label classification and evaluation
- Support for data-stream and semi-supervised classification
http://meka.sourceforge.net
A MEKA Classifier

```java
package weka.classifiers.multilabel;

import weka.core.*;

public class DumbClassifier extends MultilabelClassifier {

    /**
     * BuildClassifier
     */
    public void buildClassifier(Instances D) throws Exception {
        // the first L attributes are the labels
        int L = D.classIndex();
    }

    /**
     * DistributionForInstance - return the distribution p(y[j]|x)
     */
    public double[] distributionForInstance(Instance x) throws Exception {
        int L = x.classIndex();
        // predict 0 for each label
        return new double[L];
    }
}
```
References
[1] Alessandro Antonucci, Giorgio Corani, Denis Mauá, and Sandra Gabaglio. An ensemble of Bayesian networks for multilabel classification. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI '13, pages 1220–1225. AAAI Press, 2013.

[2] Hanen Borchani. Multi-dimensional classification using Bayesian networks for stationary and evolving streaming data. PhD thesis, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, 2013.

[3] Amanda Clare and Ross D. King. Knowledge discovery in multi-label phenotype data. Lecture Notes in Computer Science, 2168, 2001.

[4] Krzysztof Dembczyński, Weiwei Cheng, and Eyke Hüllermeier. Bayes optimal multilabel classification via probabilistic classifier chains. In ICML '10: 27th International Conference on Machine Learning, pages 279–286, Haifa, Israel, June 2010. Omnipress.

[5] Krzysztof Dembczyński, Willem Waegeman, Weiwei Cheng, and Eyke Hüllermeier. On label dependence and loss minimization in multi-label classification. Machine Learning, 88(1–2):5–45, July 2012.

[6] Chun-Sung Ferng and Hsuan-Tien Lin. Multi-label classification with error-correcting codes. In Proceedings of the 3rd Asian Conference on Machine Learning, ACML 2011, Taoyuan, Taiwan, November 13–15, 2011, pages 281–295, 2011.

[7] Johannes Fürnkranz, Eyke Hüllermeier, Eneldo Loza Mencía, and Klaus Brinker. Multilabel classification via calibrated label ranking. Machine Learning, 73(2):133–153, November 2008.

[8] Nadia Ghamrawi and Andrew McCallum. Collective multi-label classification. In CIKM '05: 14th ACM International Conference on Information and Knowledge Management, pages 195–200, New York, NY, USA, 2005. ACM Press.

[9] Shantanu Godbole and Sunita Sarawagi. Discriminative methods for multi-labeled classification. In PAKDD '04: Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 22–30. Springer, 2004.

[10] Yuhong Guo and Suicheng Gu. Multi-label classification using conditional dependency networks. In IJCAI '11: 22nd International Joint Conference on Artificial Intelligence, pages 1300–1305. IJCAI/AAAI, 2011.

[11] Daniel Hsu, Sham M. Kakade, John Langford, and Tong Zhang. Multi-label prediction via compressed sensing. In NIPS '09: Neural Information Processing Systems 2009, 2009.

[12] Dragi Kocev, Celine Vens, Jan Struyf, and Sašo Džeroski. Tree ensembles for predicting structured outputs. Pattern Recognition, 46(3):817–833, March 2013.

[13] Abhishek Kumar, Shankar Vembu, Aditya Krishna Menon, and Charles Elkan. Beam search algorithms for multilabel learning. Machine Learning, 92(1):65–89, 2013.

[14] Gjorgji Madjarov, Dragi Kocev, Dejan Gjorgjevikj, and Sašo Džeroski. An extensive experimental comparison of methods for multi-label learning. Pattern Recognition, 45(9):3084–3104, September 2012.

[15] Andrew Kachites McCallum. Multi-label text classification with a mixture model trained by EM. In AAAI '99 Workshop on Text Learning, 1999.

[16] Antti Puurula, Jesse Read, and Albert Bifet. Kaggle LSHTC4 winning solution. Technical report, 2014.

[17] Piyush Rai and Hal Daumé III. Multi-label prediction via sparse infinite CCA. In NIPS 2009: Advances in Neural Information Processing Systems 22, pages 1518–1526, 2009.

[18] Jesse Read and Jaakko Hollmén. A deep interpretation of classifier chains. In Advances in Intelligent Data Analysis XIII, 13th International Symposium, IDA 2014, pages 251–262, October 2014.

[19] Jesse Read and Jaakko Hollmén. Multi-label classification using labels as hidden nodes. arXiv.org, stats.ML (1503.09022v1), 2015.

[20] Jesse Read, Luca Martino, and David Luengo. Efficient Monte Carlo methods for multi-dimensional learning with classifier chains. Pattern Recognition, 47(3):1535–1546, 2014.

[21] Jesse Read, Bernhard Pfahringer, and Geoff Holmes. Multi-label classification using ensembles of pruned sets. In ICDM 2008: Eighth IEEE International Conference on Data Mining, pages 995–1000. IEEE, 2008.

[22] Jesse Read, Bernhard Pfahringer, Geoffrey Holmes, and Eibe Frank. Classifier chains for multi-label classification. Machine Learning, 85(3):333–359, 2011.

[23] Jesse Read, Antti Puurula, and Albert Bifet. Multi-label classification with meta labels. In ICDM '14: IEEE International Conference on Data Mining, pages 941–946. IEEE, December 2014.

[24] Robert E. Schapire and Yoram Singer. BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2/3):135–168, 2000.

[25] F. A. Thabtah, P. Cowling, and Yonghong Peng. MMAC: A new multi-class, multi-label associative classification approach. In ICDM '04: Fourth IEEE International Conference on Data Mining, pages 217–224, 2004.

[26] Grigorios Tsoumakas and Ioannis Katakis. Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3):1–13, 2007.

[27] Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas. Random k-labelsets for multi-label classification. IEEE Transactions on Knowledge and Data Engineering, 23(7):1079–1089, 2011.

[28] Grigorios Tsoumakas, Ioannis Katakis, and Ioannis P. Vlahavas. Effective and efficient multilabel classification in domains with large number of labels. In ECML/PKDD Workshop on Mining Multidimensional Data, 2008.

[29] Jason Weston, Olivier Chapelle, André Elisseeff, Bernhard Schölkopf, and Vladimir Vapnik. Kernel dependency estimation. In NIPS, pages 897–904, 2003.

[30] Julio H. Zaragoza, Luis Enrique Sucar, Eduardo F. Morales, Concha Bielza, and Pedro Larrañaga. Bayesian chain classifiers for multidimensional classification. In 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pages 2192–2197, 2011.

[31] Min-Ling Zhang and Kun Zhang. Multi-label learning by exploiting label dependency. In KDD '10: 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 999–1008. ACM, 2010.

[32] Min-Ling Zhang and Zhi-Hua Zhou. ML-kNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7):2038–2048, 2007.

[33] Min-Ling Zhang and Zhi-Hua Zhou. A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 99(PrePrints):1, 2013.