On Symmetric Losses for Learning from Corrupted Labels

Nontawat Charoenphakdee 1,2, Jongyeong Lee 1,2, and Masashi Sugiyama 2,1
1: The University of Tokyo
2: RIKEN Center for Advanced Intelligence Project (AIP)
Supervised learning

Data collection: features (input) and labels (output).
Machine learning: learn a prediction function from input-output pairs, such that it predicts the output of unseen inputs accurately.
Standard supervised learning assumes correct labels: no noise robustness.
Learning from corrupted labels

Data collection: feature collection and a labeling process.
The labeling process can introduce errors. Examples:
• Expert labelers (human error)
• Crowdsourcing (non-expert error)
Our goal: noise-robust machine learning that still learns a good prediction function from corrupted labels.
Contents
• Background and related work
• The importance of symmetric losses
• Theoretical properties of symmetric losses
• Barrier hinge loss
• Experiments
Warmup: binary classification

$x$: feature vector, $y \in \{-1, +1\}$: label, $g$: prediction function.
• Given: input-output pairs $\{(x_i, y_i)\}_{i=1}^n$.
• Goal: minimize the expected error $R(g) = \mathbb{E}_{p(x,y)}[\ell(y g(x))]$.
No access to the distribution: minimize the empirical error instead (Vapnik, 1998): $\widehat{R}(g) = \frac{1}{n} \sum_{i=1}^n \ell(y_i g(x_i))$.
$\ell$: margin loss function. The margin $y g(x)$ is positive when $y$ and $g(x)$ have the same sign and negative when they have different signs.
Surrogate losses

Minimizing the 0-1 loss directly is difficult: it is discontinuous and not differentiable (Ben-David+, 2003, Feldman+, 2012).
In practice, we minimize a surrogate loss instead (Zhang, 2004, Bartlett+, 2006).
$x$: feature vector, $y$: label, $g$: prediction function, $z = y g(x)$: margin.
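Each surrogate is a function of the margin alone; a minimal NumPy sketch (the function names are mine, not from the talk):

```python
import numpy as np

def zero_one(z):
    # 0-1 loss on the margin z = y * g(x): 1 if the sign is wrong
    return (z <= 0).astype(float)

def logistic(z):
    return np.log1p(np.exp(-z))

def hinge(z):
    return np.maximum(0.0, 1.0 - z)

def squared(z):
    return (1.0 - z) ** 2

margins = np.array([-2.0, 0.5, 3.0])
print(zero_one(margins))  # [1. 0. 0.]
```

The surrogates upper-bound or smooth the 0-1 loss, which makes gradient-based minimization possible.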
Learning from corrupted labels (Scott+, 2013, Menon+, 2015, Lu+, 2019)

Given: two sets of corrupted data:
Corrupted positive: samples from $\tilde{p}(x) = \pi_{\mathrm{P}}\, p(x \mid y = +1) + (1 - \pi_{\mathrm{P}})\, p(x \mid y = -1)$
Corrupted negative: samples from $\tilde{n}(x) = \pi_{\mathrm{N}}\, p(x \mid y = +1) + (1 - \pi_{\mathrm{N}})\, p(x \mid y = -1)$
$\pi_{\mathrm{P}}, \pi_{\mathrm{N}}$: class priors. Clean data is the special case $\pi_{\mathrm{P}} = 1$, $\pi_{\mathrm{N}} = 0$; positive-unlabeled learning (du Plessis+, 2014) is the case where the negative set is unlabeled data from the marginal $p(x)$.
This setting covers many weakly-supervised settings (Lu+, 2019).
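The two corrupted sets can be simulated as mixtures; a small sketch with hypothetical one-dimensional Gaussian class-conditionals (means, priors, and sample sizes are illustrative only, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_corrupted(n, pi, mu_pos=2.0, mu_neg=-2.0):
    """Draw n points from pi * p(x|y=+1) + (1 - pi) * p(x|y=-1),
    using unit-variance Gaussians as the class-conditionals."""
    from_pos = rng.random(n) < pi
    return np.where(from_pos,
                    rng.normal(mu_pos, 1.0, n),
                    rng.normal(mu_neg, 1.0, n))

x_corr_pos = sample_corrupted(100_000, pi=0.8)  # corrupted positive set
x_corr_neg = sample_corrupted(100_000, pi=0.3)  # corrupted negative set
```

Each set is a contaminated mixture of the same two clean class-conditionals, differing only in the mixing proportion.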
Issue on class priors

Given: two sets of corrupted data (corrupted positive and corrupted negative).
Assumption: $\pi_{\mathrm{P}} > \pi_{\mathrm{N}}$, i.e., the positive set contains a larger fraction of truly positive points.
Problem: $\pi_{\mathrm{P}}$ and $\pi_{\mathrm{N}}$ are unidentifiable from samples (Scott+, 2013).
How can we learn without estimating $\pi_{\mathrm{P}}$ and $\pi_{\mathrm{N}}$?
Related work: performance measures under corrupted labels

Classification error: class priors are needed! (Lu+, 2019)
Balanced error rate (BER): class priors are not needed! (Menon+, 2015)
Area under the receiver operating characteristic curve (AUC) risk: class priors are not needed! (Menon+, 2015)
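Both criteria are straightforward to compute from scores; a minimal NumPy sketch (thresholding at zero; function names are mine):

```python
import numpy as np

def ber(scores_pos, scores_neg, threshold=0.0):
    # Balanced error rate: mean of the per-class error rates.
    fnr = np.mean(scores_pos <= threshold)  # positives predicted negative
    fpr = np.mean(scores_neg > threshold)   # negatives predicted positive
    return 0.5 * (fnr + fpr)

def auc(scores_pos, scores_neg):
    # AUC as a pairwise rank statistic: P(positive ranked above negative),
    # counting ties as 1/2.
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.mean((diff > 0) + 0.5 * (diff == 0))
```

A perfect scorer attains BER 0 and AUC 1; a constant scorer attains AUC 1/2. Neither quantity involves the class priors, which is why they are attractive in this setting.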
Related work: BER and AUC optimization

Menon+, 2015: we can treat corrupted data as if they were clean (the proof relies on a property of the 0-1 loss); the squared loss was used in the experiments.
van Rooyen+, 2015: symmetric losses are also useful for BER minimization (no experiments).
Ours: using a symmetric loss is preferable for both BER and AUC, theoretically and experimentally!
Contents
• Background and related work
• The importance of symmetric losses
• Theoretical properties of symmetric losses
• Barrier hinge loss
• Experiments
Symmetric losses (Ghosh+, 2015, van Rooyen+, 2015)

A margin loss $\ell$ is symmetric if $\ell(z) + \ell(-z) = K$ for some constant $K$ and every $z$.
Applications:
• Robustness under symmetric noise (labels flipped with a fixed probability)
• Risk estimator simplification in weakly-supervised learning (du Plessis+, 2014, Kiryo+, 2017, Lu+, 2018)
Symmetric losses: AUC maximization

The corrupted AUC risk decomposes into the clean AUC risk plus excessive terms.
With a symmetric loss, the excessive terms become constant, so they can be safely ignored: maximizing AUC on corrupted data also maximizes it on clean data.
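One way to see why the excessive terms become constant (a sketch in my own notation, using the mixture model $\tilde{p} = \pi_{\mathrm{P}} P + (1-\pi_{\mathrm{P}}) N$ and $\tilde{n} = \pi_{\mathrm{N}} P + (1-\pi_{\mathrm{N}}) N$ from the earlier slide):

```latex
\begin{align*}
R_{\mathrm{AUC}}^{\mathrm{corr}}(g)
  &= \mathbb{E}_{x \sim \tilde{p},\, x' \sim \tilde{n}}
     \bigl[\ell\bigl(g(x) - g(x')\bigr)\bigr] \\
  &= \pi_{\mathrm{P}} \pi_{\mathrm{N}}\, \mathbb{E}_{P,P}[\ell]
   + \pi_{\mathrm{P}} (1-\pi_{\mathrm{N}})\, \mathbb{E}_{P,N}[\ell]
   + (1-\pi_{\mathrm{P}}) \pi_{\mathrm{N}}\, \mathbb{E}_{N,P}[\ell]
   + (1-\pi_{\mathrm{P}})(1-\pi_{\mathrm{N}})\, \mathbb{E}_{N,N}[\ell].
\end{align*}
```

For a symmetric loss, $\ell(z) + \ell(-z) = K$, so $\mathbb{E}_{P,P}[\ell] = \mathbb{E}_{N,N}[\ell] = K/2$ (swap the two points of each pair) and $\mathbb{E}_{N,P}[\ell] = K - \mathbb{E}_{P,N}[\ell]$. Substituting,

```latex
\begin{align*}
R_{\mathrm{AUC}}^{\mathrm{corr}}(g)
  = (\pi_{\mathrm{P}} - \pi_{\mathrm{N}})\, \mathbb{E}_{P,N}[\ell] + \mathrm{const},
\end{align*}
```

so under $\pi_{\mathrm{P}} > \pi_{\mathrm{N}}$ the corrupted and clean AUC risks share the same minimizer, without knowing the priors.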
Symmetric losses: BER minimization

The corrupted BER risk likewise decomposes into the clean BER risk plus an excessive term.
With a symmetric loss, the excessive term becomes constant and can be safely ignored. This coincides with van Rooyen+, 2015.
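The affine relation between corrupted and clean BER risks can be checked numerically with the sigmoid loss, which is symmetric with $K = 1$; the construction below is my own sketch, building the corrupted sets as exact finite mixtures of the same clean samples:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid_loss = lambda z: 1.0 / (1.0 + np.exp(z))  # l(z) + l(-z) = 1

g = lambda x: x - 0.5              # any fixed scorer works for the identity
x_p = rng.normal(+1.0, 1.0, 1000)  # clean positive samples
x_n = rng.normal(-1.0, 1.0, 1000)  # clean negative samples

def ber_risk(pos, neg):
    return 0.5 * (np.mean(sigmoid_loss(g(pos))) + np.mean(sigmoid_loss(-g(neg))))

pi_p, pi_n = 0.8, 0.3
# Corrupted sets as exact mixtures of the full clean samples:
corr_pos = np.concatenate([np.tile(x_p, 4), x_n])              # 80% positive
corr_neg = np.concatenate([np.tile(x_p, 3), np.tile(x_n, 7)])  # 30% positive

clean = ber_risk(x_p, x_n)
corrupted = ber_risk(corr_pos, corr_neg)
# corrupted == (pi_p - pi_n) * clean + (1 - pi_p + pi_n) / 2
```

Because the corrupted risk is an affine function of the clean risk with a positive slope $\pi_{\mathrm{P}} - \pi_{\mathrm{N}}$, minimizing BER on corrupted data minimizes it on clean data.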
Contents
• Background and related work
• The importance of symmetric losses
• Theoretical properties of symmetric losses
• Barrier hinge loss
• Experiments
Theoretical properties of symmetric losses

Nonnegative symmetric losses are non-convex (du Plessis+, 2014, Ghosh+, 2015), so the theory of convex losses cannot be applied.
We provide a better understanding of symmetric losses:
• A necessary and sufficient condition for classification-calibration
• An excess risk bound in binary classification
• The inability to estimate the class posterior probability
• A sufficient condition for AUC-consistency, covering many symmetric losses, e.g., sigmoid and ramp
Well-known symmetric losses, e.g., sigmoid and ramp, are classification-calibrated and AUC-consistent!
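The symmetric condition for these losses is easy to verify numerically; a quick check (this ramp parameterization has $K = 2$ and the sigmoid loss has $K = 1$; the hinge loss is included as a non-symmetric contrast):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(z))  # symmetric, K = 1
ramp = lambda z: np.clip(1.0 - z, 0.0, 2.0)  # symmetric, K = 2
hinge = lambda z: np.maximum(0.0, 1.0 - z)   # not symmetric

z = np.linspace(-5.0, 5.0, 101)
print(np.allclose(sigmoid(z) + sigmoid(-z), 1.0))  # True
print(np.allclose(ramp(z) + ramp(-z), 2.0))        # True
print(np.allclose(hinge(z) + hinge(-z), 2.0))      # False: the sum grows with |z|
```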
Contents
• Background and related work
• The importance of symmetric losses
• Theoretical properties of symmetric losses
• Barrier hinge loss
• Experiments
Convex symmetric losses?

By sacrificing nonnegativity, the unhinged loss $\ell(z) = 1 - z$ is the only convex symmetric loss (van Rooyen+, 2015).
This loss had been considered before, although its robustness was not discussed (Devroye+, 1996, Schoelkopf+, 2002, Shawe-Taylor+, 2004, Sriperumbudur+, 2009, Reid+, 2011).
Barrier hinge loss

A loss with two parameters: the slope of the non-symmetric region and the width of the symmetric region.
It gives a high penalty when the prediction is misclassified or when the output falls outside the symmetric region.
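A piecewise reconstruction from the description above; the exact parameterization is my own reading of the slide (slope r outside, a symmetric region of half-width b where the loss behaves like the unhinged loss b - z), so the paper's formula may differ in details:

```python
import numpy as np

def barrier_hinge(z, r=200.0, b=50.0):
    """Maximum of three affine pieces: steep slope -r left of -b,
    unhinged-like b - z on [-b, b], and steep slope +r right of b."""
    return np.maximum(-r * (z + b) + 2.0 * b,
                      np.maximum(b - z, r * (z - b)))

z_in = np.linspace(-50.0, 50.0, 201)  # inside the symmetric region
print(np.allclose(barrier_hinge(z_in) + barrier_hinge(-z_in), 100.0))  # True
```

With the slide's experimental setting [s=200, w=50], r = 200 and b = 50. Outside [-b, b] the symmetric condition fails, which is exactly the point: the loss is symmetric only in an interval.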
Symmetricity of the barrier hinge loss

The barrier hinge loss satisfies the symmetric property in an interval.
If the output range is restricted to the symmetric region, the unhinged, hinge, and barrier losses are equivalent.
Contents
• Background and related work
• The importance of symmetric losses
• Theoretical properties of symmetric losses
• Barrier hinge loss
• Experiments
Experiments: BER/AUC optimization from corrupted labels

We empirically answer the following questions:
1. Does the symmetric condition significantly help?
2. Does a loss need to be symmetric everywhere?
3. Does negative unboundedness degrade practical performance?
We fix the models and vary the loss functions.
Losses: Barrier [s=200, w=50], Unhinged, Sigmoid, Logistic, Hinge, Squared, Savage
Experiment 1: MLPs on UCI/LIBSVM datasets.
Experiment 2: CNNs on more difficult datasets (MNIST, CIFAR-10).
Experiments: BER/AUC optimization from corrupted labels

For UCI datasets: multilayer perceptrons (MLPs) with one hidden layer, [d-500-1], with rectified linear unit (ReLU) activations (Nair+, 2010).
For MNIST and CIFAR-10: convolutional neural networks (CNNs):
[d-Conv[18,5,1,0]-Max[2,2]-Conv[48,5,1,0]-Max[2,2]-800-400-1]
Conv[18,5,1,0]: 18 channels, 5 x 5 convolutions, stride 1, padding 0. Max[2,2]: max pooling with kernel size 2 and stride 2. ReLU follows each fully connected layer, followed by a dropout layer (Srivastava+, 2014).
MNIST: odd numbers vs. even numbers. CIFAR-10: one class vs. airplane (following Ishida+, 2017).
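The architecture string can be sanity-checked with the standard shape formula floor((n + 2p - k)/s) + 1; a sketch assuming 28 x 28 MNIST inputs (note that 800 and 400 are fully connected layer widths, not the flattened size):

```python
def conv_out(n, k, s=1, p=0):
    # Spatial output size of a convolution / pooling layer.
    return (n + 2 * p - k) // s + 1

n = 28                   # MNIST input height/width
n = conv_out(n, 5)       # Conv[18,5,1,0] -> 24
n = conv_out(n, 2, s=2)  # Max[2,2]       -> 12
n = conv_out(n, 5)       # Conv[48,5,1,0] -> 8
n = conv_out(n, 2, s=2)  # Max[2,2]       -> 4
flattened = n * n * 48   # 768 units feed the 800-400-1 head
print(flattened)         # 768
```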
Experiment 1: MLPs on UCI/LIBSVM datasets

Dataset information and more experiments can be found in our paper.
(Results figure: the higher the better.)
Experiment 1: MLPs on UCI/LIBSVM datasets

Symmetric losses and the barrier hinge loss are preferable!
(Results figure: the higher the better.)
Experiment 2: CNNs on MNIST/CIFAR-10
Conclusion

We showed that symmetric losses are preferable under corrupted labels for:
• Area under the receiver operating characteristic curve (AUC) maximization
• Balanced error rate (BER) minimization
We provided general theoretical properties of symmetric losses:
• Classification-calibration, an excess risk bound, and AUC-consistency
• The inability to estimate the class posterior probability
We proposed the barrier hinge loss:
• As a proof of concept of the importance of the symmetric condition
• Symmetric only in an interval, yet benefiting greatly from the symmetric condition
• Significantly outperformed all other losses in BER/AUC optimization using CNNs
Poster #135: 6:30-9:00 PM