TRANSCRIPT

Hierarchical Classification
Rongcheng Lin
Computer Science Department
Contents
Motivation, Definition & Problem
Review of SVM
Hierarchical Classification
Path-based Approaches
Regularization-based Approaches
Motivation
Classes in the real world are structured, often hierarchically related:
gene function prediction, document categorization, image search, …
Hierarchies or taxonomies offer a clear advantage in supporting tasks like browsing, searching, or visualization: the International Patent Classification scheme, Yahoo! Web catalogs, …
Prior knowledge about class relationships can improve classification performance, especially for tasks with a large number of classes.
Definition and Problem
Automatically categorize data into pre-defined topic hierarchies or taxonomies: a supervised learning problem with structured output.
DAG and Tree Structure
Problem and solution?
Definition and Problem
Incorporate the inter-class relationship (the hierarchy) into classification. This means redefining the problem:
Lower-level categories are more detailed while upper-level categories are more general, so redefine the margin.
Different classification mistakes are of different severity, so redefine the loss function.
Review: Binary SVM
Binary classification
Margin
Loss Function
[Figure: separating hyperplane w^T x + b = 0, splitting the space into the regions w^T x + b > 0 and w^T x + b < 0; decision rule f(x) = sign(w^T x + b); per-example loss L(f(x), y)]
Review: Binary SVM
General Form:
J(w) = R(w) + Σ_{i=1}^{n} L(w, x_i, y_i)
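The general form above can be sketched as code. The following is a minimal illustration, assuming R(w) is a squared L2 penalty and L is the hinge loss; the function names, the learning rate, and the subgradient-descent training loop are choices for the example, not part of the slides.

```python
import numpy as np

def hinge_loss(w, b, x, y):
    # L(w, x_i, y_i) = max(0, 1 - y_i (w^T x_i + b))
    return max(0.0, 1.0 - y * (np.dot(w, x) + b))

def objective(w, b, X, Y, lam=0.01):
    # J(w) = R(w) + sum_i L(w, x_i, y_i), with R(w) = lam * ||w||^2
    return lam * np.dot(w, w) + sum(hinge_loss(w, b, x, y) for x, y in zip(X, Y))

def train(X, Y, lam=0.01, lr=0.01, epochs=200):
    # plain batch subgradient descent on J
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        gw, gb = 2 * lam * w, 0.0
        for x, y in zip(X, Y):
            if y * (np.dot(w, x) + b) < 1:  # margin violated: hinge is active
                gw -= y * x
                gb -= y
        w -= lr * gw
        b -= lr * gb
    return w, b
```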
Review: Multiclass SVM
1) One-vs-the-rest
2) Crammer & Singer (a single joint multiclass formulation)
Review: Multiclass SVM
Dedicated Loss Function
Margin: γ_i(w) = w_{y_i}^T x_i − w_k^T x_i  for k ≠ y_i
Review: Hinge Loss Function
The more you violate the margin, the higher the penalty is.
[Figure: hinge loss as a function of the margin]
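A small sketch of the multiclass margin and its hinge penalty, in the Crammer & Singer style where the true class score must beat every competitor by at least 1; the function name is an assumption for the example.

```python
import numpy as np

def multiclass_hinge(W, x, y):
    # Penalize when the true class score w_y^T x does not exceed
    # every other class score w_k^T x by at least 1.
    scores = W @ x                   # one score per class
    margins = scores[y] - scores     # gamma against each competitor k
    margins = np.delete(margins, y)  # drop k == y
    return max(0.0, 1.0 - margins.min())
```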
Hierarchical Classifiers
Path-based Approaches:
Large Margin Hierarchical Classification
Hierarchical Document Categorization with Support Vector Machines
On Large Margin Hierarchical Classification with Multiple Paths
Regularization-based Approaches:
Tree-Guided Group Lasso for Multi-task Regression
Hierarchical Multitask Structured Output Learning for Large-Scale Segmentation
Tree Distance
A given hierarchy induces a metric over the set of classes: the tree distance, or tree-induced error.
γ(y, ŷ) is defined to be the number of edges along the (unique) path from y to ŷ.
[Figure: example tree in which γ(y, ŷ) = 4]
Tree Distance
[Figure: tree with nodes 0–9; the path from y to ŷ passes through nodes 4, 1, and 3]
With per-edge weights, the weighted tree distance is
D(y, ŷ) = f_4·C_4 + f_1·C_1 + f_3·C_3
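The tree distance can be computed by walking both labels up to their lowest common ancestor. This is a sketch assuming the hierarchy is given as a child-to-parent map; the optional `weight` map (a stand-in for the f·C terms above) is an assumption for the example.

```python
def tree_distance(parent, y, y_hat, weight=None):
    # gamma(y, y_hat): number of edges on the unique path between two
    # nodes of a tree given as a child -> parent map (root maps to None).
    # With `weight` (per-node cost of the edge to its parent), returns
    # the weighted distance D(y, y_hat) instead.
    def ancestors(v):
        path = [v]
        while parent[v] is not None:
            v = parent[v]
            path.append(v)
        return path

    up_y, up_yhat = ancestors(y), ancestors(y_hat)
    common = set(up_y) & set(up_yhat)
    # lowest common ancestor = first shared node walking up from y
    lca = next(v for v in up_y if v in common)
    path = up_y[:up_y.index(lca)] + up_yhat[:up_yhat.index(lca)]
    if weight is None:
        return len(path)                 # edge count: gamma(y, y_hat)
    return sum(weight[v] for v in path)  # weighted: D(y, y_hat)
```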
Loss Functions
Zero-One Loss
Hinge Loss
Hierarchical Hinge Loss
[Figure: the three losses plotted against the score gap f_y(x) − f_ŷ(x); the zero-one loss steps at 0, the hinge loss is zero beyond margin 1, and the hierarchical hinge loss is scaled by the tree distance D(ŷ, y)]
Path-based Approaches
Path-based approaches try to find the most likely path from the root.
Only the parameters of misclassified nodes in the tree need to be updated.
Large margin hierarchical classifier:
f_y(x) − f_ŷ(x) ≥ √γ(y, ŷ)
(note: y is the correct label and ŷ ≠ y)
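The path-based idea above can be sketched as follows: score each class by summing per-node scores along its root path, and update only the nodes where the predicted and correct paths disagree. This is a perceptron-style illustration under assumed names (`W`, `parent`, `leaves`), not the exact large-margin algorithm from the papers.

```python
import numpy as np

def path(parent, v):
    # nodes on the path from v up to the root (inclusive)
    out = [v]
    while parent[v] is not None:
        v = parent[v]
        out.append(v)
    return out

def score(W, parent, x, y):
    # class score = sum of per-node scores along the root path of y
    return sum(W[v] @ x for v in path(parent, y))

def predict(W, parent, leaves, x):
    return max(leaves, key=lambda y: score(W, parent, x, y))

def update(W, parent, x, y, y_hat, lr=1.0):
    # perceptron-style step: only nodes where the predicted path and
    # the correct path disagree (the misclassified nodes) change
    on_y, on_yhat = set(path(parent, y)), set(path(parent, y_hat))
    for v in on_y - on_yhat:
        W[v] += lr * x
    for v in on_yhat - on_y:
        W[v] -= lr * x
```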
Training Algorithm
HSVM
Margin constraints, with slack rescaled by the tree-induced loss:
f_{y_i}(x_i) − f_y(x_i) ≥ 1 − ξ_i / Δ(y_i, y)  for all y ≠ y_i
Regularization-based Approaches
K individual classification tasks.
Use an additional regularization term to penalize the disagreement between the individual models.
Multitask Learning
The induction of multiple tasks is performed simultaneously to capture their intrinsic relatedness.
L1 Norm, L2 Norm
Penalize model complexity to avoid overfitting.
The L1 norm gives a sparser estimate than the L2 norm.
Group Lasso and Sparse Group Lasso
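The two penalties can be written down in a few lines. A minimal sketch, assuming `groups` is a list of index lists partitioning the weight vector and `alpha` is the usual mixing hyperparameter:

```python
import numpy as np

def group_lasso_penalty(w, groups):
    # sum over groups g of ||w_g||_2: drives whole groups to zero at once
    return sum(np.linalg.norm(w[g]) for g in groups)

def sparse_group_lasso_penalty(w, groups, alpha=0.5):
    # convex mix of the lasso (within-group sparsity) and the
    # group lasso (group-level sparsity)
    return alpha * np.abs(w).sum() + (1 - alpha) * group_lasso_penalty(w, groups)
```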
HMTL: Hierarchical Multitask Learning
A hyperparameter determines the contribution of regularization from the origin vs. the parent node's parameters (i.e., the strength of coupling between the node and its parent).
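One way to realize that trade-off is a regularizer that interpolates between shrinking each node's weights toward the origin and toward its parent's weights. A sketch under assumed names; `gamma` stands in for the coupling hyperparameter described above.

```python
import numpy as np

def hmtl_penalty(W, parent, gamma=0.5):
    # gamma trades off pulling each node's weights toward its parent
    # (coupling) vs. toward the origin (plain L2 shrinkage);
    # W maps node -> weight vector, parent maps node -> parent (root: None)
    total = 0.0
    for v, p in parent.items():
        to_origin = np.sum(W[v] ** 2)
        to_parent = to_origin if p is None else np.sum((W[v] - W[p]) ** 2)
        total += gamma * to_parent + (1 - gamma) * to_origin
    return total
```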
Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity
Original Approach:
New Approach:
Note:
Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity
Each leaf node is a class; each inner node is a group of classes.
Tree-Guided Group Lasso
Advantages and Drawbacks
If the children's models are good: Tree-Guided Group Lasso.
If the parent's model is good: HMTL.
If neither is good: path-based approaches.
It depends!