TRANSCRIPT

Hierarchical Classification
Rongcheng Lin
Computer Science Department
Contents
Motivation, Definition & Problem
Review of SVM
Hierarchical Classification
Path-based Approaches
Regularization-based Approaches
Motivation
Classes in the real world are structured, often hierarchically related:
gene function prediction, document categorization, image search, …
Hierarchies or taxonomies offer a clear advantage in supporting tasks like browsing, searching, or visualization: the International Patent Classification scheme, Yahoo! Web catalogs, …
Prior knowledge about class relationships can improve classification performance, especially for tasks with a large number of classes.
Definition and Problem
Automatically categorize data into pre-defined topic hierarchies or taxonomies: a supervised learning problem with structured output.
DAG and Tree Structure
Problem and solution?
Definition and Problem
Incorporate the inter-class relationship (the hierarchy) into classification. This means redefining the problem:
Lower-level categories are more detailed while upper-level categories are more general, so redefine the margin.
Different classification mistakes are of different severity, so redefine the loss function.
Review: Binary SVM
Binary classification
Margin
Loss Function
[Figure: separating hyperplane w^T x + b = 0, splitting the space into the regions w^T x + b > 0 and w^T x + b < 0; decision rule f(x) = sign(w^T x + b); per-example loss L(f(x), y)]
Review: Binary SVM
General Form:
J(w) = R(w) + Σ_{i=1}^{n} L(w, x_i, y_i)
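The general form above can be sketched as code. The following is a minimal illustration, assuming R(w) is a squared L2 penalty and L is the hinge loss; the function names, the learning rate, and the subgradient-descent training loop are choices for the example, not part of the slides.

```python
import numpy as np

def hinge_loss(w, b, x, y):
    # L(w, x_i, y_i) = max(0, 1 - y_i (w^T x_i + b))
    return max(0.0, 1.0 - y * (np.dot(w, x) + b))

def objective(w, b, X, Y, lam=0.01):
    # J(w) = R(w) + sum_i L(w, x_i, y_i), with R(w) = lam * ||w||^2
    return lam * np.dot(w, w) + sum(hinge_loss(w, b, x, y) for x, y in zip(X, Y))

def train(X, Y, lam=0.01, lr=0.01, epochs=200):
    # plain batch subgradient descent on J
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        gw, gb = 2 * lam * w, 0.0
        for x, y in zip(X, Y):
            if y * (np.dot(w, x) + b) < 1:  # margin violated: hinge is active
                gw -= y * x
                gb -= y
        w -= lr * gw
        b -= lr * gb
    return w, b
```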
Review: Multiclass SVM
1) One-vs-the-rest
2) Crammer & Singer (a single joint multiclass formulation)
Review: Multiclass SVM
Dedicated Loss Function
Margin: γ_i(w) = w_{y_i}^T x_i − w_k^T x_i  for k ≠ y_i
Review: Hinge Loss Function
The more you violate the margin, the higher the penalty is.
[Figure: hinge loss as a function of the margin]
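A small sketch of the multiclass margin and its hinge penalty, in the Crammer & Singer style where the true class score must beat every competitor by at least 1; the function name is an assumption for the example.

```python
import numpy as np

def multiclass_hinge(W, x, y):
    # Penalize when the true class score w_y^T x does not exceed
    # every other class score w_k^T x by at least 1.
    scores = W @ x                   # one score per class
    margins = scores[y] - scores     # gamma against each competitor k
    margins = np.delete(margins, y)  # drop k == y
    return max(0.0, 1.0 - margins.min())
```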
Hierarchical Classifiers
Path-based Approaches:
Large Margin Hierarchical Classification
Hierarchical Document Categorization with Support Vector Machines
On Large Margin Hierarchical Classification with Multiple Paths
Regularization-based Approaches:
Tree-Guided Group Lasso for Multi-task Regression
Hierarchical Multitask Structured Output Learning for Large-Scale Segmentation
Tree Distance
A given hierarchy induces a metric over the set of classes: the tree distance, or tree-induced error.
γ(y, ŷ) is defined to be the number of edges along the (unique) path from y to ŷ.
[Figure: example tree in which γ(y, ŷ) = 4]
Tree Distance
[Figure: tree with nodes 0–9; the path from y to ŷ passes through nodes 4, 1, and 3]
With per-edge weights, the weighted tree distance is
D(y, ŷ) = f_4·C_4 + f_1·C_1 + f_3·C_3
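The tree distance can be computed by walking both labels up to their lowest common ancestor. This is a sketch assuming the hierarchy is given as a child-to-parent map; the optional `weight` map (a stand-in for the f·C terms above) is an assumption for the example.

```python
def tree_distance(parent, y, y_hat, weight=None):
    # gamma(y, y_hat): number of edges on the unique path between two
    # nodes of a tree given as a child -> parent map (root maps to None).
    # With `weight` (per-node cost of the edge to its parent), returns
    # the weighted distance D(y, y_hat) instead.
    def ancestors(v):
        path = [v]
        while parent[v] is not None:
            v = parent[v]
            path.append(v)
        return path

    up_y, up_yhat = ancestors(y), ancestors(y_hat)
    common = set(up_y) & set(up_yhat)
    # lowest common ancestor = first shared node walking up from y
    lca = next(v for v in up_y if v in common)
    path = up_y[:up_y.index(lca)] + up_yhat[:up_yhat.index(lca)]
    if weight is None:
        return len(path)                 # edge count: gamma(y, y_hat)
    return sum(weight[v] for v in path)  # weighted: D(y, y_hat)
```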
Loss Functions
Zero-One Loss
Hinge Loss
Hierarchical Hinge Loss
[Figure: the three losses plotted against the score gap f_y(x) − f_ŷ(x); the zero-one loss steps at 0, the hinge loss is zero beyond margin 1, and the hierarchical hinge loss is scaled by the tree distance D(ŷ, y)]
Path-based Approaches
Path-based approaches try to find the most likely path from the root.
Only the parameters of misclassified nodes in the tree need to be updated.
Large margin hierarchical classifier:
f_y(x) − f_ŷ(x) ≥ √γ(y, ŷ)
(note: y is the correct label and ŷ ≠ y)
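The path-based idea above can be sketched as follows: score each class by summing per-node scores along its root path, and update only the nodes where the predicted and correct paths disagree. This is a perceptron-style illustration under assumed names (`W`, `parent`, `leaves`), not the exact large-margin algorithm from the papers.

```python
import numpy as np

def path(parent, v):
    # nodes on the path from v up to the root (inclusive)
    out = [v]
    while parent[v] is not None:
        v = parent[v]
        out.append(v)
    return out

def score(W, parent, x, y):
    # class score = sum of per-node scores along the root path of y
    return sum(W[v] @ x for v in path(parent, y))

def predict(W, parent, leaves, x):
    return max(leaves, key=lambda y: score(W, parent, x, y))

def update(W, parent, x, y, y_hat, lr=1.0):
    # perceptron-style step: only nodes where the predicted path and
    # the correct path disagree (the misclassified nodes) change
    on_y, on_yhat = set(path(parent, y)), set(path(parent, y_hat))
    for v in on_y - on_yhat:
        W[v] += lr * x
    for v in on_yhat - on_y:
        W[v] -= lr * x
```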
Training Algorithm
HSVM
Margin constraints, with slack rescaled by the tree-induced loss:
f_{y_i}(x_i) − f_y(x_i) ≥ 1 − ξ_i / Δ(y_i, y)  for all y ≠ y_i
Regularization-based Approaches
K individual classification tasks.
Use an additional regularization term to penalize the disagreement between the individual models.
Multitask Learning
The induction of multiple tasks is performed simultaneously to capture their intrinsic relatedness.
L1 Norm, L2 Norm
Penalize model complexity to avoid overfitting.
The L1 norm gives a sparser estimate than the L2 norm.
Group Lasso and Sparse Group Lasso
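The two penalties can be written down in a few lines. A minimal sketch, assuming `groups` is a list of index lists partitioning the weight vector and `alpha` is the usual mixing hyperparameter:

```python
import numpy as np

def group_lasso_penalty(w, groups):
    # sum over groups g of ||w_g||_2: drives whole groups to zero at once
    return sum(np.linalg.norm(w[g]) for g in groups)

def sparse_group_lasso_penalty(w, groups, alpha=0.5):
    # convex mix of the lasso (within-group sparsity) and the
    # group lasso (group-level sparsity)
    return alpha * np.abs(w).sum() + (1 - alpha) * group_lasso_penalty(w, groups)
```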
HMTL: Hierarchical Multitask Learning
A hyperparameter determines the contribution of regularization from the origin vs. the parent node's parameters (i.e., the strength of coupling between the node and its parent).
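One way to realize that trade-off is a regularizer that interpolates between shrinking each node's weights toward the origin and toward its parent's weights. A sketch under assumed names; `gamma` stands in for the coupling hyperparameter described above.

```python
import numpy as np

def hmtl_penalty(W, parent, gamma=0.5):
    # gamma trades off pulling each node's weights toward its parent
    # (coupling) vs. toward the origin (plain L2 shrinkage);
    # W maps node -> weight vector, parent maps node -> parent (root: None)
    total = 0.0
    for v, p in parent.items():
        to_origin = np.sum(W[v] ** 2)
        to_parent = to_origin if p is None else np.sum((W[v] - W[p]) ** 2)
        total += gamma * to_parent + (1 - gamma) * to_origin
    return total
```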
Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity
Original Approach:
New Approach:
Note:
Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity
Each leaf node is a class; each inner node is a group of classes.
Tree-Guided Group Lasso
Advantages and Drawbacks
If the children's models are good: Tree-Guided Group Lasso.
If the parent's model is good: HMTL.
If neither is good: path-based approaches.
It depends!