TRANSCRIPT
Domain Adaptation for Visual Recognition
Vishal M. Patel
Assistant Professor, Department of Electrical and Computer Engineering
Rutgers University
[email protected] http://www.rci.rutgers.edu/~vmp93/
IEEE WACV 2016 Tutorial
March 9, 2016
About Me
• Assistant Professor of ECE (Rutgers University, NJ, USA)
  – Ph.D. from the University of Maryland (2010)
  – Research Associate at UMD (2010 - 2011)
  – Assistant Research Scientist at UMD (2011 - 2014)
• Research in Computer Vision and Machine Learning
Statistical methods for visual recognition
Biometrics Computational imaging
Outline
• Introduction and motivation
• Feature augmentation-based approaches
• Break
• Sparse and low-rank models for domain adaptation
  – Applications
    • Object recognition
    • Face recognition
    • Active authentication
Introduction and Motivation
References
R. Gopalan, R. Li, V. M. Patel and R. Chellappa, "Domain adaptation for visual recognition," Foundations and Trends in Computer Graphics and Vision, vol. 8, no. 4, pp. 285-378, Mar. 2015.
V. M. Patel, R. Gopalan, R. Li, and R. Chellappa, "Visual domain adaptation: a survey of recent advances," IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 53-69, May 2015.
Data Deluge
- 240k photos/minute
- 500 hours of video/minute
http://www.youtube.com/yt/press/statistics.html
http://blog.wishpond.com/post/115675435109/40-up-to-date-facebook-facts-and-stats
Data Shift
Data distribution can change between data from different sources.
Saenko et al. ECCV 2010
Dataset Bias
Torralba and Efros CVPR 2011
• Each dataset has its own bias
• 40% drop in performance on average when models trained on one dataset are tested on another
• A finite collection of images cannot capture the vast variations present in real-world applications
1) Caltech-101, 2) UIUC, 3) MSRC, 4) Tiny Images, 5) ImageNet, 6) PASCAL VOC, 7) LabelMe, 8) SUN-09, 9) 15 Scenes, 10) Corel, 11) Caltech-256, 12) COIL-100
Cross Sensor Matching
• In many applications
– New sensors are developed – Existing ones are upgraded
• Users cannot be enrolled every time a new sensor is introduced – Enrollment is expensive and time consuming
• How to adapt existing algorithms for new sensors?
Pillai et al. PAMI 2013
Cross Sensor Matching
• Cross-sensor matching degrades performance [Bowyer 2009]
– The older sensor is less accurate than the newer one
– Cross-sensor performance is worse than that of the older sensor
Pillai et al. PAMI 2013
Domain Adaptation vs. Traditional Machine Learning
• Traditional machine learning approaches: training and test data are from the same distribution
• Domain adaptation: training and test data are from different distributions
Source domain
Target domain
Domain Adaptation Problem: Given a labeled source dataset and a partially labeled/unlabeled target dataset, learn a classifier for the target dataset.
Domain Adaptation
• Domain adaptation has been studied extensively in the natural language processing community
• Very recently introduced to the vision community for visual recognition
• Related problems – Transfer learning – Multi-view learning – Self-taught learning – Class imbalance – Covariate shift – Sample selection bias …
An old problem with a new name! - Prof. Rama Chellappa
Domain Adaptation - Applications
Face Recognition
Qiu et al. ECCV 2012, Ni et al. CVPR 2013, Shekhar et al. CVPR 2013
Domain Adaptation - Applications
Text Recognition
Recognition of hand-written text using computer-generated digits
Domain Adaptation - Applications
Medical applications Object recognition
Semi-Supervised Domain Adaptation
• Use the knowledge in the labeled source domain and labeled target domain
Source domain Target domain
It is normally assumed that the source and target distributions differ: P_S(x, y) ≠ P_T(x, y)
Unsupervised Domain Adaptation
• Use the knowledge in the labeled source domain and unlabeled target domain
Source domain Target domain
Multisource Domain Adaptation
• More than one domain on the source side
• Applicable to both semi-supervised and unsupervised cases
Source domain Target domain
S1, …, SK
Heterogeneous Domain Adaptation
• The dimensions of features in the source and target domains are assumed to be different
Source domain Target domain
High-resolution 100x100
Low-resolution 30x30
Feature Augmentation-based Approaches
Frustratingly Easy Domain Adaptation
• Make a domain-specific copy of the original features for each domain (3N-dimensional): a general version, a source-specific version, and a target-specific version
• Pass the resulting feature onto the underlying supervised classifier
• Can be kernelized
• Can be extended to multi-source domain adaptation
  – For a K-domain problem, simply expand the feature space from 3N to (K+1)N
Daume III, ACL 2007
Frustratingly Easy Domain Adaptation
• Data points from the same domain are twice as similar as those from different domains
• Data points from the target domain have twice as much influence as source points when making predictions about test target data
Daume III, ACL 2007
View the kernel as a measure of similarity
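The augmented feature map is simple to implement. Below is a minimal numpy sketch (an illustration of the idea, not Daume III's reference code; the function name is hypothetical): each source sample maps to (x, x, 0) and each target sample to (x, 0, x), which yields exactly the similarity pattern described above — the induced linear kernel doubles same-domain similarities relative to cross-domain ones.

```python
import numpy as np

def augment(X, domain):
    """EasyAdapt feature map: x -> (x, x, 0) for source, (x, 0, x) for target.

    X is an (n, N) array of features; the output is (n, 3N)."""
    n, N = X.shape
    zeros = np.zeros((n, N))
    if domain == "source":
        return np.hstack([X, X, zeros])   # general copy + source-specific copy
    elif domain == "target":
        return np.hstack([X, zeros, X])   # general copy + target-specific copy
    raise ValueError("domain must be 'source' or 'target'")

# The induced linear kernel is 2<x, x'> for same-domain pairs and <x, x'> across domains
xs = np.array([[1.0, 2.0]])
xt = np.array([[3.0, 1.0]])
same = augment(xs, "source") @ augment(xs, "source").T    # 2 * (1 + 4) = 10
cross = augment(xs, "source") @ augment(xt, "target").T   # 1*3 + 2*1 = 5
```

Any off-the-shelf linear classifier can then be trained on the stacked augmented source and target data.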
Heterogeneous Feature Augmentation
Seek an optimal common space and simultaneously learn a discriminative SVM classifier
Duan et al. ICML 2012, Li et al. T-PAMI 2014
SVM: A Brief Introduction
• Given training data: • Learn the classification model
• Find the hyperplane that maximizes the margin between the two classes
Credit: Christopher J. C. Burges
• Convex quadratic optimization (QP)
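As a toy illustration of the max-margin objective, here is a plain subgradient-descent stand-in on the regularized hinge loss (not the QP solver referred to above; the function name and toy data are hypothetical):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=500):
    """Minimize lam/2*||w||^2 + mean(max(0, 1 - y*(Xw + b))) by (sub)gradient
    descent. A toy stand-in for the QP solver; labels y must be +1/-1."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                          # samples inside the margin
        grad_w = lam * w - (y[active] @ X[active]) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Linearly separable toy data (hypothetical)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
```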
Heterogeneous Feature Augmentation
• Simultaneously learn an SVM classifier and two projection matrices
Duan et al. ICML 2012, Li et al. T-PAMI 2014
Heterogeneous Feature Augmentation
• Dual form
P is l-by-N
Q is l-by-M
• The global optimum can be found using MKL methods
Li et al. T-PAMI 2014
Object Recognition Dataset
Saenko et al. ECCV 2010
Amazon: consumer images from online merchant sites
DSLR: images taken with a DSLR camera
Webcam: low-quality images from webcams
Caltech: images from the Caltech-256 object dataset
Object Recognition
Duan et al. ICML 2012, Li et al. T-PAMI 2014
SVM_T and KCCA
HeMap [Shi et al. ICDM 2010]
DAMA [Wang and Mahadevan, IJCAI 2011]
ARC-t [Kulis et al. CVPR 2011]
Text Categorization
Reuters multilingual dataset, using 10 labeled training samples per class from the target domain (Spanish)
Duan et al. ICML 2012, Li et al. T-PAMI 2014
Unsupervised Manifold-based Method
• How to obtain meaningful intermediate domains?
• How to characterize incremental domain shift information to perform recognition?
• Extend the feature augmentation method to consider a manifold of intermediate domains.
Gopalan et al. ICCV 2011, PAMI 2014
Manifold-based Method
• Generate intermediate subspaces using PCA
• View them as points on the Grassmann manifold
• Sample points along the geodesic path to obtain geometrically meaningful intermediate subspaces
Gopalan et al. ICCV 2011, PAMI 2014
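Sampling subspaces along the geodesic can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the authors' code: S1 and S2 stand for PCA bases of the source and target data, and the principal angles between them parameterize the geodesic (assuming the two subspaces share no common directions, so all principal angles are nonzero).

```python
import numpy as np

def geodesic_subspaces(S1, S2, ts):
    """Sample subspaces along the Grassmann geodesic from span(S1) (t=0) to
    span(S2) (t=1) using the principal angles between the two subspaces.

    S1, S2: D x d matrices with orthonormal columns (e.g. PCA bases of the
    source and target data). Assumes all principal angles are nonzero."""
    U, cos_t, Vt = np.linalg.svd(S1.T @ S2)
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))       # principal angles
    # Direction orthogonal to span(S1) that rotates it into span(S2)
    G = (S2 @ Vt.T - S1 @ (U * cos_t)) / np.sin(theta)
    return [S1 @ (U * np.cos(t * theta)) + G * np.sin(t * theta) for t in ts]

# Hypothetical example: random 3-dimensional subspaces of R^10
rng = np.random.default_rng(0)
S1 = np.linalg.qr(rng.standard_normal((10, 3)))[0]
S2 = np.linalg.qr(rng.standard_normal((10, 3)))[0]
path = geodesic_subspaces(S1, S2, [0.0, 0.25, 0.5, 0.75, 1.0])
```

Each sampled basis is orthonormal, and the endpoints span the source and target subspaces exactly; data projected onto the sampled bases gives the intermediate representations.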
Manifold-based Method
Gong et al. CVPR 2012, Gopalan et al. ICCV 2011, PAMI 2014
Geodesic Flow Kernel
Gong et al. CVPR 2012
• Embed the source and target datasets in a Grassmann manifold
• Construct a geodesic flow between the two points
• Integrate an infinite number of subspaces along the flow
Sparse and Low-Rank Models for Domain Adaptation
Low-rank Representation-based Method
Jhuo et al. CVPR 2012
Map the source data by a transformation matrix to an intermediate representation in which each transformed sample can be reconstructed as a linear combination of the target data samples
Transformation matrix: Low-rank matrix
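Up to notation, the low-rank reconstruction objective sketched above is commonly written as follows (a hedged reconstruction; see Jhuo et al. CVPR 2012 for the exact formulation and constants):

```latex
\min_{W,\,Z,\,E}\;\; \operatorname{rank}(Z) + \lambda \,\|E\|_{2,1}
\quad \text{s.t.} \quad W X_S = X_T Z + E,\;\; W W^\top = I
```

Here W transforms the source data X_S, Z holds the low-rank reconstruction coefficients over the target data X_T, and E absorbs sample-specific outliers; rank(Z) is relaxed to the nuclear norm during optimization.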
Robust Domain Adaptation with Low-rank Reconstruction (RDALR)
Jhuo et al. CVPR 2012
Dictionary Learning
Mairal et al. CVPR 2008, Bach et al. IEEE T-PAMI 2012, Wright et al. Proc. IEEE 2010
Dictionary Learning
What makes dictionaries work? Olshausen and Field (Nature, 1996): data-driven sparse codes are close to the responses of visual receptive fields.
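The classic dictionary learning objective min_{D,X} ||Y - DX||_F^2 + λ||X||_1 can be sketched with simple alternating minimization. This is a toy illustration only (the helper names and the ISTA/least-squares updates are assumptions, not the method of any cited paper):

```python
import numpy as np

def ista(D, Y, lam, iters=50):
    """Sparse coding step: min_X ||Y - D X||_F^2 + lam*||X||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2                   # step size from the Lipschitz constant
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(iters):
        G = X + D.T @ (Y - D @ X) / L               # gradient step on the quadratic term
        X = np.sign(G) * np.maximum(np.abs(G) - lam / (2 * L), 0.0)  # soft threshold
    return X

def learn_dictionary(Y, n_atoms, lam=0.1, outer_iters=20, seed=0):
    """Alternate sparse coding and a least-squares dictionary update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(outer_iters):
        X = ista(D, Y, lam)
        D = Y @ np.linalg.pinv(X)                   # min_D ||Y - D X||_F^2 for fixed X
        D /= np.linalg.norm(D, axis=0) + 1e-12      # keep atoms (at most) unit norm
    return D, ista(D, Y, lam)

# Hypothetical data: learn an overcomplete dictionary for 8-dimensional signals
rng = np.random.default_rng(1)
Y = rng.standard_normal((8, 50))
D, X = learn_dictionary(Y, n_atoms=16, lam=0.05)
```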
What if the data distribution changes?
Domain Change
Dictionary performance under change in domain:
There is a need for adaptation!
Dictionary Adaptation
Shekhar et al. CVPR 2013
Formulation
Reformulation
Multiple Domains
Discriminative Dictionaries
Optimization
Datasets
Domain Adaptation Results
Pose Alignment - CMU Multi-Pie
Hierarchical Sparse Coding
Pipeline: dimensionality reduction → sparse codes → max pooling (repeat)
• Adaptation is performed on multiple levels of the feature hierarchy
• Adaptation is done jointly with feature learning
• Mechanism to prevent the data dimension from increasing too fast
Nguyen et al. IEEE TIP 2015
Hierarchical Domain Adaptation
The shared dictionary captures common structures between the source and target domains.
Nguyen et al. ECCV 2012
Hierarchical Domain Adaptation
Feature Pooling
• Local pooling
– Maximum/average over a local neighborhood – Invariant to small translations – Suppress background responses – Reduce dimension
• Spatial pyramid pooling – Maximum or average over image quadrants
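The local max-pooling step described above takes a few lines of numpy. A minimal sketch (non-overlapping neighborhoods; assumes the map size is divisible by the pool size):

```python
import numpy as np

def max_pool(F, size):
    """Max pooling over non-overlapping size x size neighborhoods.

    F is an (H, W) response map; H and W are assumed divisible by size."""
    H, W = F.shape
    return F.reshape(H // size, size, W // size, size).max(axis=(1, 3))

F = np.arange(16.0).reshape(4, 4)
P = max_pool(F, 2)   # each output entry is the max of one 2x2 block
```

Replacing `max` with `mean` gives average pooling; applying the operator per image quadrant gives the spatial pyramid variant.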
Experiments – Amazon, Caltech, DSLR, Webcam
Halftoned and Edge Images
Halftoned and Edge Images
Learned Dictionaries
Subspace Interpolation via Dictionary Learning
Ni et al. CVPR 2013
Image Synthesis
Ni et al. CVPR 2013
Object Recognition
Feature augmentation:
Ni et al. CVPR 2013
Sparse Representation-based Classification
Self-expressiveness property: a test sample can be written as a sparse linear combination of the training samples (e.g., with coefficients 0.20, 0.15, 0.33, 0.51, and 0.21).
Training samples:
[ 0 0 0 0 0 0 0 0 0 0 0.20 0.15 0.33 0.51 0.21 ]
Sparse vector
Wright et al. PAMI 2009
Sparse Representation-based Classification
Sparse vector
Reconstruction errors
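The SRC decision rule — sparse-code the test sample over all training samples, then pick the class with the smallest class-wise reconstruction error — can be sketched as follows. This is a minimal illustration (an ISTA stand-in replaces the l1 solver, and the helper names and toy data are assumptions):

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(A, y, lam=0.01, iters=300):
    """min_x ||y - A x||_2^2 + lam*||x||_1 via ISTA (a stand-in for the l1 solver)."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft(x + A.T @ (y - A @ x) / L, lam / (2 * L))
    return x

def src_classify(A, labels, y):
    """SRC rule: pick the class whose training samples best reconstruct y."""
    x = sparse_code(A, y)
    residuals = {c: np.linalg.norm(y - A @ np.where(labels == c, x, 0.0))
                 for c in np.unique(labels)}           # keep only class-c coefficients
    return min(residuals, key=residuals.get)

# Toy data: class 0 lives near e1, class 1 near e3 (hypothetical example)
rng = np.random.default_rng(2)
A = np.hstack([np.array([[1.0, 0, 0]]).T + 0.01 * rng.standard_normal((3, 4)),
               np.array([[0, 0, 1.0]]).T + 0.01 * rng.standard_normal((3, 4))])
A /= np.linalg.norm(A, axis=0)                         # unit-norm training samples
labels = np.array([0] * 4 + [1] * 4)
```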
Domain Adaptive Sparse Representation-based Classification
Z1, …, Zc
DASRC Formulation
Self-expressiveness property:
Orthogonal rows:
Regularization:
Overall optimization:
Optimization
Alternating Direction Method of Multipliers (ADMM): Boyd et al. 2011, Elhamifar and Vidal 2013; Method of Splitting Orthogonality Constraints (SOC): Lai and Osher 2014
An iterative optimization scheme is followed:
• Update X, keeping P fixed
• Update P, keeping X fixed
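The ADMM machinery cited above can be illustrated on a generic l1-regularized problem. This is the textbook lasso splitting from Boyd et al. 2011 as a self-contained sketch, not the actual DASRC updates:

```python
import numpy as np

def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
    """ADMM for min_x ||A x - b||_2^2 + lam*||x||_1, split as x = z."""
    n = A.shape[1]
    AtA, Atb = A.T @ A, A.T @ b
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    M = np.linalg.inv(2 * AtA + rho * np.eye(n))     # cached x-update system
    for _ in range(iters):
        x = M @ (2 * Atb + rho * (z - u))            # quadratic subproblem
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)  # shrinkage
        u = u + x - z                                # dual update on the constraint
    return z

# With A = I the solution is elementwise soft thresholding of b by lam/2
sol = admm_lasso(np.eye(3), np.array([3.0, 0.05, -2.0]), lam=0.2)
```

Each pass alternates an easy quadratic solve, a closed-form shrinkage step, and a dual ascent step — the same alternating pattern used for the X and P updates above.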
DASRC Algorithm
Mobile Devices
http://resources.infosecinstitute.com/android-forensics/
Aviv et al. 2010, USENIX Workshop on Offensive Technologies
Smudge Attack
PIN or Password
Pattern Unlock
Smartphone Sensors
Camera, Gyroscope, Magnetometer, Barometer, Thermometer, Microphone, Accelerometer, Photometer, Gravity, GPS
Touch Gestures
Orientation, Pressure, Speed, Area
User interaction behavior on touchscreens
Face-based Authentication
Visual stream acquired by the front-facing camera
Data Collection - Enrollment + Four Tasks
• Enrollment
• Scroll test
  – View a collection of images that are arrayed horizontally and vertically
• Document test
  – Count the number of figures, tables, etc.
• Popup test
  – Drag and position an image in the center of the iPhone screen
• Picture test
  – Count the number of cars in a poster-like image
50 users, 750 videos and 15,490 swipes
Sample Data: Enrollment, Document Test, Scroll Test
Touch Data
Task 1, Task 2, Task 3, Task 4
Swipe Features
1. Inter-swipe time
2. Swipe duration
3. Start x
4. Start y
5. Stop x
6. Stop y
7. Direct end-to-end distance
8. Mean resultant length
9. Up/down/left/right flag
10. Direction of end-to-end line
11. Length of trajectory
12. Average direction
13. Average velocity
14. Median acceleration at first 5 points
15. Mid-swipe area covered
16. Ratio of end-to-end distance and length of trajectory
17. 20% pairwise velocity
18. 50% pairwise velocity
19. 80% pairwise velocity
20. 20% pairwise acceleration
21. 50% pairwise acceleration
22. 80% pairwise acceleration
23. Median velocity at last 3 points
24. Largest deviation from end-to-end line
25. 20% deviation from end-to-end line
26. 50% deviation from end-to-end line
27. 80% deviation from end-to-end line
28. Mid-swipe pressure
29. Phone orientation
Preprocessing and Feature Extraction
Viola & Jones, IJCV 2004; Asthana et al. CVPR 2013
Identification Results
Image set-based methods: Affine Hull-based Image Set Distance (AHISD), Convex Hull-based Image Set Distance (CHISD), Sparse Approximated Nearest Points (SANP), Mean-Sequence SRC (MSSRC), Dictionary-based Face Recognition from Video (DFRV)
Still image-based methods: Eigenfaces (EF), Fisherfaces (FF), Sparse Representation-based Classification (SRC), Large-Margin Nearest Neighbor (LMNN)
Cross-Session Identification Results
Face data
Touch data: 1-swipe classification
Results – Touch Data
[1] Wright et al. PAMI 2009, [2] Saenko et al. ECCV 2010, [3] Gopalan et al. PAMI 2014, [4] Shekhar et al. CVPR 2013, [5] Ni et al. CVPR 2013, [6] Hoffman et al. ECCV 2012
Single-source domain adaptation
Multi-source domain adaptation
Training: 20 samples per class from the source domain, 5 samples per class from the target domain Testing: remaining data from the target domain
Results – Face Data
Single-source domain adaptation
Multi-source domain adaptation
Learned Projections
P1, P2, P3
Open Problems
• Propagation of pdf along intermediate domains
• Relax the need to have all target samples
• Physical and statistical characterizations of dataset bias and domain shifts
  – Measuring distribution mismatch and generalization bounds
  – Integration of physical and statistical models
  – How to adapt visible-spectrum models to IR or SAR images?
• Efficient online and scalable adaptation algorithms: adaptation between large datasets, incremental adaptation
• High-dimensional data: the role of dimensionality-reduction techniques in DA
References
1. V. M. Patel, R. Gopalan, R. Li, and R. Chellappa, "Visual domain adaptation: a survey of recent advances," IEEE Signal Processing Magazine (SPM), vol. 32, pp. 53-69, May 2015.
2. R. Gopalan, R. Li, V. M. Patel, and R. Chellappa, "Domain adaptation for visual recognition," Foundations and Trends in Computer Graphics and Vision, NOW Publishers, 2015.
3. H. Zhang, V. M. Patel, S. Shekhar, and R. Chellappa, "Domain adaptive sparse representation-based classification," IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2015.
4. A. Shrivastava, S. Shekhar, and V. M. Patel, "Unsupervised domain adaptation using parallel transport on Grassmann manifold," IEEE Winter Conference on Applications of Computer Vision (WACV), 2014.
5. S. Shekhar, V. M. Patel, H. V. Nguyen, and R. Chellappa, "Generalized domain adaptive dictionaries," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
6. Q. Qiu, V. M. Patel, P. Turaga, and R. Chellappa, "Domain adaptive dictionary learning," European Conference on Computer Vision (ECCV), 2012.
7. H. V. Nguyen, H. T. Ho, V. M. Patel, and R. Chellappa, "Joint hierarchical domain adaptation and feature learning," IEEE Transactions on Image Processing, 2015.
8. A. Torralba and A. A. Efros, "Unbiased look at dataset bias," IEEE Conference on Computer Vision and Pattern Recognition, 2011.
9. A. Khosla, T. Zhou, T. Malisiewicz, A. A. Efros, and A. Torralba, "Undoing the damage of dataset bias," European Conference on Computer Vision, 2012.
10. S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345-1359, 2010.
11. H. Daume III, "Frustratingly easy domain adaptation," Conference of the Association for Computational Linguistics, 2007.
12. W. Li, L. Duan, D. Xu, and I. Tsang, "Learning with augmented features for supervised and semi-supervised heterogeneous domain adaptation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 6, pp. 1134-1148, Jun. 2014.
13. R. Gopalan, R. Li, and R. Chellappa, "Domain adaptation for object recognition: an unsupervised approach," IEEE International Conference on Computer Vision, 2011, pp. 999-1006.
14. R. Gopalan, R. Li, and R. Chellappa, "Unsupervised adaptation across domain shifts by generating intermediate data representations," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 36, pp. 2288-2302, Nov. 2014.
15. I.-H. Jhuo, D. Liu, D. Lee, and S.-F. Chang, "Robust visual domain adaptation with low-rank reconstruction," IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2168-2175.
References
1. B. Kulis, K. Saenko, and T. Darrell, "What you saw is not what you get: domain adaptation using asymmetric kernel transforms," IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 1785-1792.
2. K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Adapting visual category models to new domains," European Conference on Computer Vision, vol. 6314, 2010, pp. 213-226.
3. J. Ni, Q. Qiu, and R. Chellappa, "Subspace interpolation via dictionary learning for unsupervised domain adaptation," IEEE Conference on Computer Vision and Pattern Recognition, 2013.
4. H. V. Nguyen, V. M. Patel, N. M. Nasrabadi, and R. Chellappa, "Sparse embedding: a framework for sparsity promoting dimensionality reduction," European Conference on Computer Vision, 2012.
5. B. Gong, K. Grauman, and F. Sha, "Connecting the dots with landmarks: discriminatively learning domain-invariant features for unsupervised domain adaptation," International Conference on Machine Learning, 2013.
Acknowledgement
Ashish Shrivastava
Sumit Shekhar, Hien Nguyen, Yi-Chen Chen, Huy Tho Ho
Heng Zhang, Sayantan Sarkar
Rama Chellappa (U. Maryland)
[email protected] http://www.rci.rutgers.edu/~vmp93/