TRANSCRIPT
Domain Adaptation for Visual Recognition
Vishal M. Patel
Assistant Professor, Department of Electrical and Computer Engineering
Rutgers University
[email protected] http://www.rci.rutgers.edu/~vmp93/
IEEE WACV 2016 Tutorial
March 9, 2016
About Me
• Assistant Professor of ECE (Rutgers University, NJ, USA)
  – Ph.D. from the University of Maryland (2010)
  – Research Associate at UMD (2010 - 2011)
  – Assistant Research Scientist at UMD (2011 - 2014)
• Research in Computer Vision and Machine Learning
Statistical methods for visual recognition
Biometrics Computational imaging
Outline
• Introduction and motivation
• Feature augmentation-based approaches
• Break
• Sparse and low-rank models for domain adaptation
  – Applications
    • Object recognition
    • Face recognition
    • Active authentication
Introduction and Motivation
References
R. Gopalan, R. Li, V. M. Patel and R. Chellappa, "Domain adaptation for visual recognition," Foundations and Trends in Computer Graphics and Vision, vol. 8, no. 4, pp. 285-378, Mar. 2015.
V. M. Patel, R. Gopalan, R. Li, and R. Chellappa, "Visual domain adaptation: a survey of recent advances," IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 53-69, May 2015.
Data Deluge
- 240k photos/minute
- 500 hours of video/minute
http://www.youtube.com/yt/press/statistics.html
http://blog.wishpond.com/post/115675435109/40-up-to-date-facebook-facts-and-stats
Data Shift
Data distribution can change between data from different sources.
Saenko et al. ECCV 2010
Dataset Bias
Torralba and Efros CVPR 2011
• Each dataset has its own bias
• 40% drop in performance on average when models trained on one dataset are tested on another
• A finite collection of images cannot capture the vast variations present in real-world applications
1) Caltech-101, 2) UIUC, 3) MSRC, 4) Tiny Images, 5) ImageNet, 6) PASCAL VOC, 7) LabelMe, 8) SUN-09, 9) 15 Scenes, 10) Corel, 11) Caltech-256, 12) COIL-100
Cross Sensor Matching
• In many applications
– New sensors are developed – Existing ones are upgraded
• Users cannot be enrolled every time a new sensor is introduced – Enrollment is expensive and time consuming
• How to adapt existing algorithms for new sensors?
Pillai et al. PAMI 2013
Cross Sensor Matching
• Cross-sensor matching degrades performance [Bowyer 2009]
– The older sensor is less accurate than the newer one
– Cross-sensor performance is worse than that of the older sensor
Pillai et al. PAMI 2013
Domain Adaptation vs. Traditional Machine Learning
• Traditional machine learning approaches: training and test data are from the same distribution
• Domain adaptation: training and test data are from different distributions
Source domain
Target domain
Domain Adaptation Problem: Given a labeled source dataset and a partially labeled/unlabeled target dataset, learn a classifier for the target dataset.
Domain Adaptation
• Domain adaptation has been studied extensively in the natural language processing community
• Very recently introduced to the vision community for visual recognition
• Related problems – Transfer learning – Multi-view learning – Self-taught learning – Class imbalance – Covariate shift – Sample selection bias …
An old problem with a new name! - Prof. Rama Chellappa
Domain Adaptation - Applications
Face Recognition
Qiu et al. ECCV 2012, Ni et al. CVPR 2013, Shekhar et al. CVPR 2013
Domain Adaptation - Applications
Text Recognition
Recognition of hand-written text using computer-generated digits
Domain Adaptation - Applications
Medical applications Object recognition
Semi-Supervised Domain Adaptation
• Use the knowledge in the labeled source domain and labeled target domain
Source domain Target domain
It is normally assumed that the source and target distributions differ: P_S(x, y) ≠ P_T(x, y)
Unsupervised Domain Adaptation
• Use the knowledge in the labeled source domain and unlabeled target domain
Source domain Target domain
Multisource Domain Adaptation
• More than one domain on the source side
• Applicable to both semi-supervised and unsupervised cases
Source domain Target domain
S1, …, SK
Heterogeneous Domain Adaptation
• The dimensions of features in the source and target domains are assumed to be different
Source domain Target domain
High-resolution 100x100
Low-resolution 30x30
Feature Augmentation-based Approaches
Frustratingly Easy Domain Adaptation
• Make a domain-specific copy of the original features for each domain (3N-dimensional): a general version, a source-specific version, and a target-specific version
• Pass the resulting feature onto the underlying supervised classifier
• Can be kernelized
• Can be extended to multi-source domain adaptation
  – For a K-domain problem, simply expand the feature space from 3N to (K+1)N
Daume III, ACL 2007
Frustratingly Easy Domain Adaptation
• Data points from the same domain are twice as similar as those from different domains
• Data points from the target domain have twice as much influence as source points when making predictions about test target data
Daume III, ACL 2007
View the kernel as a measure of similarity
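The augmented feature map is simple to implement. Below is a minimal numpy sketch (an illustration of the idea, not Daume III's reference code; the function name is hypothetical): each source sample maps to (x, x, 0) and each target sample to (x, 0, x), which yields exactly the similarity pattern described above — the induced linear kernel doubles same-domain similarities relative to cross-domain ones.

```python
import numpy as np

def augment(X, domain):
    """EasyAdapt feature map: x -> (x, x, 0) for source, (x, 0, x) for target.

    X is an (n, N) array of features; the output is (n, 3N)."""
    n, N = X.shape
    zeros = np.zeros((n, N))
    if domain == "source":
        return np.hstack([X, X, zeros])   # general copy + source-specific copy
    elif domain == "target":
        return np.hstack([X, zeros, X])   # general copy + target-specific copy
    raise ValueError("domain must be 'source' or 'target'")

# The induced linear kernel is 2<x, x'> for same-domain pairs and <x, x'> across domains
xs = np.array([[1.0, 2.0]])
xt = np.array([[3.0, 1.0]])
same = augment(xs, "source") @ augment(xs, "source").T    # 2 * (1 + 4) = 10
cross = augment(xs, "source") @ augment(xt, "target").T   # 1*3 + 2*1 = 5
```

Any off-the-shelf linear classifier can then be trained on the stacked augmented source and target data.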
Heterogeneous Feature Augmentation
Seek an optimal common space and simultaneously learn a discriminative SVM classifier
Duan et al. ICML 2012, Li et al. T-PAMI 2014
SVM: A Brief Introduction
• Given training data: • Learn the classification model
• Find the hyperplane that maximizes the margin between the two classes
Credit: Christopher J. C. Burges
• Convex quadratic optimization (QP)
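As a toy illustration of the max-margin objective, here is a plain subgradient-descent stand-in on the regularized hinge loss (not the QP solver referred to above; the function name and toy data are hypothetical):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=500):
    """Minimize lam/2*||w||^2 + mean(max(0, 1 - y*(Xw + b))) by (sub)gradient
    descent. A toy stand-in for the QP solver; labels y must be +1/-1."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                          # samples inside the margin
        grad_w = lam * w - (y[active] @ X[active]) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Linearly separable toy data (hypothetical)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
```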
Heterogeneous Feature Augmentation
• Simultaneously learn an SVM classifier and two projection matrices
Duan et al. ICML 2012, Li et al. T-PAMI 2014
Heterogeneous Feature Augmentation
• Dual form
P is l-by-N
Q is l-by-M
• The global optimum can be found using MKL methods
Li et al. T-PAMI 2014
Object Recognition Dataset
Saenko et al. ECCV 2010
Amazon: consumer images from online merchant sites
DSLR: images taken with a DSLR camera
Webcam: low-quality images from webcams
Caltech: images from the Caltech-256 object dataset
Object Recognition
Duan et al. ICML 2012, Li et al. T-PAMI 2014
SVM_T and KCCA
HeMap [Shi et al. ICDM 2010]
DAMA [Wang and Mahadevan, IJCAI 2011]
ARC-t [Kulis et al. CVPR 2011]
Text Categorization
Reuters multilingual dataset, using 10 labeled training samples per class from the target domain (Spanish)
Duan et al. ICML 2012, Li et al. T-PAMI 2014
Unsupervised Manifold-based Method
• How to obtain meaningful intermediate domains?
• How to characterize incremental domain shift information to perform recognition?
• Extend the feature augmentation method to consider a manifold of intermediate domains.
Gopalan et al. ICCV 2011, PAMI 2014
Manifold-based Method
• Generate intermediate subspaces using PCA
• View them as points on the Grassmann manifold
• Sample points along the geodesic path to obtain geometrically meaningful intermediate subspaces
Gopalan et al. ICCV 2011, PAMI 2014
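Sampling subspaces along the geodesic can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the authors' code: S1 and S2 stand for PCA bases of the source and target data, and the principal angles between them parameterize the geodesic (assuming the two subspaces share no common directions, so all principal angles are nonzero).

```python
import numpy as np

def geodesic_subspaces(S1, S2, ts):
    """Sample subspaces along the Grassmann geodesic from span(S1) (t=0) to
    span(S2) (t=1) using the principal angles between the two subspaces.

    S1, S2: D x d matrices with orthonormal columns (e.g. PCA bases of the
    source and target data). Assumes all principal angles are nonzero."""
    U, cos_t, Vt = np.linalg.svd(S1.T @ S2)
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))       # principal angles
    # Direction orthogonal to span(S1) that rotates it into span(S2)
    G = (S2 @ Vt.T - S1 @ (U * cos_t)) / np.sin(theta)
    return [S1 @ (U * np.cos(t * theta)) + G * np.sin(t * theta) for t in ts]

# Hypothetical example: random 3-dimensional subspaces of R^10
rng = np.random.default_rng(0)
S1 = np.linalg.qr(rng.standard_normal((10, 3)))[0]
S2 = np.linalg.qr(rng.standard_normal((10, 3)))[0]
path = geodesic_subspaces(S1, S2, [0.0, 0.25, 0.5, 0.75, 1.0])
```

Each sampled basis is orthonormal, and the endpoints span the source and target subspaces exactly; data projected onto the sampled bases gives the intermediate representations.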
Manifold-based Method
Gong et al. CVPR 2012, Gopalan et al. ICCV 2011, PAMI 2014
Geodesic Flow Kernel
Gong et al. CVPR 2012
• Embed the source and target datasets in a Grassmann manifold
• Construct a geodesic flow between the two points
• Integrate an infinite number of subspaces along the flow
Sparse and Low-Rank Models for Domain Adaptation
Low-rank Representation-based Method
Jhuo et al. CVPR 2012
Map the source data by a transformation matrix to an intermediate representation in which each transformed sample can be reconstructed as a linear combination of the target data samples
Transformation matrix: Low-rank matrix
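Up to notation, the low-rank reconstruction objective sketched above is commonly written as follows (a hedged reconstruction; see Jhuo et al. CVPR 2012 for the exact formulation and constants):

```latex
\min_{W,\,Z,\,E}\;\; \operatorname{rank}(Z) + \lambda \,\|E\|_{2,1}
\quad \text{s.t.} \quad W X_S = X_T Z + E,\;\; W W^\top = I
```

Here W transforms the source data X_S, Z holds the low-rank reconstruction coefficients over the target data X_T, and E absorbs sample-specific outliers; rank(Z) is relaxed to the nuclear norm during optimization.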
Robust Domain Adaptation with Low-rank Reconstruction (RDALR)
Jhuo et al. CVPR 2012
Dictionary Learning
Mairal et al. CVPR 2008, Bach et al. IEEE T-PAMI 2012, Wright et al. Proc. IEEE 2010
Dictionary Learning
What makes dictionaries work? Olshausen and Field (Nature, 1996): data-driven sparse codes are close to the responses of visual receptive fields.
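The classic dictionary learning objective min_{D,X} ||Y - DX||_F^2 + λ||X||_1 can be sketched with simple alternating minimization. This is a toy illustration only (the helper names and the ISTA/least-squares updates are assumptions, not the method of any cited paper):

```python
import numpy as np

def ista(D, Y, lam, iters=50):
    """Sparse coding step: min_X ||Y - D X||_F^2 + lam*||X||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2                   # step size from the Lipschitz constant
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(iters):
        G = X + D.T @ (Y - D @ X) / L               # gradient step on the quadratic term
        X = np.sign(G) * np.maximum(np.abs(G) - lam / (2 * L), 0.0)  # soft threshold
    return X

def learn_dictionary(Y, n_atoms, lam=0.1, outer_iters=20, seed=0):
    """Alternate sparse coding and a least-squares dictionary update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(outer_iters):
        X = ista(D, Y, lam)
        D = Y @ np.linalg.pinv(X)                   # min_D ||Y - D X||_F^2 for fixed X
        D /= np.linalg.norm(D, axis=0) + 1e-12      # keep atoms (at most) unit norm
    return D, ista(D, Y, lam)

# Hypothetical data: learn an overcomplete dictionary for 8-dimensional signals
rng = np.random.default_rng(1)
Y = rng.standard_normal((8, 50))
D, X = learn_dictionary(Y, n_atoms=16, lam=0.05)
```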
What if the data distribution changes?
Domain Change
Dictionary performance under change in domain:
There is a need for adaptation!
Dictionary Adaptation
Shekhar et al. CVPR 2013
Formulation
Reformulation
Multiple Domains
Discriminative Dictionaries
Optimization
Datasets
Domain Adaptation Results
Pose Alignment - CMU Multi-Pie
Hierarchical Sparse Coding
Pipeline: dimensionality reduction → sparse codes → max pooling (repeat)
• Adaptation is performed on multiple levels of the feature hierarchy
• Adaptation is done jointly with feature learning
• Mechanism to prevent the data dimension from increasing too fast
Nguyen et al. IEEE TIP 2015
Hierarchical Domain Adaptation
The shared dictionary captures common structures between the source and target domains.
Nguyen et al. ECCV 2012
Hierarchical Domain Adaptation
Feature Pooling
• Local pooling
– Maximum/average over a local neighborhood – Invariant to small translations – Suppress background responses – Reduce dimension
• Spatial pyramid pooling – Maximum or average over image quadrants
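The local max-pooling step described above takes a few lines of numpy. A minimal sketch (non-overlapping neighborhoods; assumes the map size is divisible by the pool size):

```python
import numpy as np

def max_pool(F, size):
    """Max pooling over non-overlapping size x size neighborhoods.

    F is an (H, W) response map; H and W are assumed divisible by size."""
    H, W = F.shape
    return F.reshape(H // size, size, W // size, size).max(axis=(1, 3))

F = np.arange(16.0).reshape(4, 4)
P = max_pool(F, 2)   # each output entry is the max of one 2x2 block
```

Replacing `max` with `mean` gives average pooling; applying the operator per image quadrant gives the spatial pyramid variant.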
Experiments – Amazon, Caltech, DSLR, Webcam
Halftoned and Edge Images
Halftoned and Edge Images
Learned Dictionaries
Subspace Interpolation via Dictionary Learning
Ni et al. CVPR 2013
Image Synthesis
Ni et al. CVPR 2013
Object Recognition
Feature augmentation:
Ni et al. CVPR 2013
Sparse Representation-based Classification
Self-expressiveness property: a test sample can be written as a sparse linear combination of the training samples (e.g., with coefficients 0.20, 0.15, 0.33, 0.51, and 0.21).
Training samples:
[ 0 0 0 0 0 0 0 0 0 0 0.20 0.15 0.33 0.51 0.21 ]
Sparse vector
Wright et al. PAMI 2009
Sparse Representation-based Classification
Sparse vector
Reconstruction errors
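The SRC decision rule — sparse-code the test sample over all training samples, then pick the class with the smallest class-wise reconstruction error — can be sketched as follows. This is a minimal illustration (an ISTA stand-in replaces the l1 solver, and the helper names and toy data are assumptions):

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(A, y, lam=0.01, iters=300):
    """min_x ||y - A x||_2^2 + lam*||x||_1 via ISTA (a stand-in for the l1 solver)."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft(x + A.T @ (y - A @ x) / L, lam / (2 * L))
    return x

def src_classify(A, labels, y):
    """SRC rule: pick the class whose training samples best reconstruct y."""
    x = sparse_code(A, y)
    residuals = {c: np.linalg.norm(y - A @ np.where(labels == c, x, 0.0))
                 for c in np.unique(labels)}           # keep only class-c coefficients
    return min(residuals, key=residuals.get)

# Toy data: class 0 lives near e1, class 1 near e3 (hypothetical example)
rng = np.random.default_rng(2)
A = np.hstack([np.array([[1.0, 0, 0]]).T + 0.01 * rng.standard_normal((3, 4)),
               np.array([[0, 0, 1.0]]).T + 0.01 * rng.standard_normal((3, 4))])
A /= np.linalg.norm(A, axis=0)                         # unit-norm training samples
labels = np.array([0] * 4 + [1] * 4)
```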
Domain Adaptive Sparse Representation-based Classification
Z1, …, Zc
DASRC Formulation
Self-expressiveness property:
Orthogonal rows:
Regularization:
Overall optimization:
Optimization
Alternating Direction Method of Multipliers (ADMM): Boyd et al. 2011, Elhamifar and Vidal 2013; Method of Splitting Orthogonality Constraints (SOC): Lai and Osher 2014
An iterative optimization scheme is followed:
• Update X, keeping P fixed
• Update P, keeping X fixed
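The ADMM machinery cited above can be illustrated on a generic l1-regularized problem. This is the textbook lasso splitting from Boyd et al. 2011 as a self-contained sketch, not the actual DASRC updates:

```python
import numpy as np

def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
    """ADMM for min_x ||A x - b||_2^2 + lam*||x||_1, split as x = z."""
    n = A.shape[1]
    AtA, Atb = A.T @ A, A.T @ b
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    M = np.linalg.inv(2 * AtA + rho * np.eye(n))     # cached x-update system
    for _ in range(iters):
        x = M @ (2 * Atb + rho * (z - u))            # quadratic subproblem
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)  # shrinkage
        u = u + x - z                                # dual update on the constraint
    return z

# With A = I the solution is elementwise soft thresholding of b by lam/2
sol = admm_lasso(np.eye(3), np.array([3.0, 0.05, -2.0]), lam=0.2)
```

Each pass alternates an easy quadratic solve, a closed-form shrinkage step, and a dual ascent step — the same alternating pattern used for the X and P updates above.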
DASRC Algorithm
Mobile Devices
http://resources.infosecinstitute.com/android-forensics/
Aviv et al. 2010, USENIX Workshop on Offensive Technologies
Smudge Attack
PIN or Password
Pattern Unlock
Smartphone Sensors
Camera, Gyroscope, Magnetometer, Barometer, Thermometer, Microphone, Accelerometer, Photometer, Gravity, GPS
Touch Gestures
Orientation, Pressure, Speed, Area
User interaction behavior on touchscreens
Face-based Authentication
Visual stream acquired by the front-facing camera
Data Collection - Enrollment + Four Tasks
• Enrollment
• Scroll test
  – View a collection of images that are arrayed horizontally and vertically
• Document test
  – Count the number of figures, tables, etc.
• Popup test
  – Drag and position an image in the center of the iPhone screen
• Picture test
  – Count the number of cars in a poster-like image
50 users, 750 videos and 15,490 swipes
Sample Data: Enrollment, Document Test, Scroll Test
Touch Data
Task 1, Task 2, Task 3, Task 4
Swipe Features
1. Inter-swipe time
2. Swipe duration
3. Start x
4. Start y
5. Stop x
6. Stop y
7. Direct end-to-end distance
8. Mean resultant length
9. Up/down/left/right flag
10. Direction of end-to-end line
11. Length of trajectory
12. Average direction
13. Average velocity
14. Median acceleration at first 5 points
15. Mid-swipe area covered
16. Ratio of end-to-end distance and length of trajectory
17. 20% pairwise velocity
18. 50% pairwise velocity
19. 80% pairwise velocity
20. 20% pairwise acceleration
21. 50% pairwise acceleration
22. 80% pairwise acceleration
23. Median velocity at last 3 points
24. Largest deviation from end-to-end line
25. 20% deviation from end-to-end line
26. 50% deviation from end-to-end line
27. 80% deviation from end-to-end line
28. Mid-swipe pressure
29. Phone orientation
Preprocessing and Feature Extraction
Viola & Jones, IJCV 2004; Asthana et al. CVPR 2013
Identification Results
Image set-based methods: Affine Hull-based Image Set Distance (AHISD), Convex Hull-based Image Set Distance (CHISD), Sparse Approximated Nearest Points (SANP), Mean-Sequence SRC (MSSRC), Dictionary-based Face Recognition from Video (DFRV)
Still image-based methods: Eigenfaces (EF), Fisherfaces (FF), Sparse Representation-based Classification (SRC), Large-Margin Nearest Neighbor (LMNN)
Cross-Session Identification Results
Face data
Touch data: 1-swipe classification
Results – Touch Data
[1] Wright et al. PAMI 2009, [2] Saenko et al. ECCV 2010, [3] Gopalan et al. PAMI 2014, [4] Shekhar et al. CVPR 2013, [5] Ni et al. CVPR 2013, [6] Hoffman et al. ECCV 2012
Single-source domain adaptation
Multi-source domain adaptation
Training: 20 samples per class from the source domain, 5 samples per class from the target domain Testing: remaining data from the target domain
Results – Face Data
Single-source domain adaptation
Multi-source domain adaptation
Learned Projections
P1, P2, P3
Open Problems
• Propagation of pdf along intermediate domains
• Relax the need to have all target samples
• Physical and statistical characterizations of dataset bias and domain shifts
  – Measuring distribution mismatch and generalization bounds
  – Integration of physical and statistical models
  – How to adapt visible-spectrum models to IR or SAR images?
• Efficient online and scalable adaptation algorithms: adaptation between large datasets, incremental adaptation
• High-dimensional data: the role of dimensionality-reduction techniques in DA
References
1. V. M. Patel, R. Gopalan, R. Li, and R. Chellappa, "Visual domain adaptation: a survey of recent advances," IEEE Signal Processing Magazine (SPM), vol. 32, pp. 53-69, May 2015.
2. R. Gopalan, R. Li, V. M. Patel, and R. Chellappa, "Domain adaptation for visual recognition," Foundations and Trends in Computer Graphics and Vision, NOW Publishers, 2015.
3. H. Zhang, V. M. Patel, S. Shekhar, and R. Chellappa, "Domain adaptive sparse representation-based classification," IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2015.
4. A. Shrivastava, S. Shekhar, and V. M. Patel, "Unsupervised domain adaptation using parallel transport on Grassmann manifold," IEEE Winter Conference on Applications of Computer Vision (WACV), 2014.
5. S. Shekhar, V. M. Patel, H. V. Nguyen, and R. Chellappa, "Generalized domain adaptive dictionaries," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
6. Q. Qiu, V. M. Patel, P. Turaga, and R. Chellappa, "Domain adaptive dictionary learning," European Conference on Computer Vision (ECCV), 2012.
7. H. V. Nguyen, H. T. Ho, V. M. Patel, and R. Chellappa, "Joint hierarchical domain adaptation and feature learning," IEEE Transactions on Image Processing, 2015.
8. A. Torralba and A. A. Efros, "Unbiased look at dataset bias," IEEE Conference on Computer Vision and Pattern Recognition, 2011.
9. A. Khosla, T. Zhou, T. Malisiewicz, A. A. Efros, and A. Torralba, "Undoing the damage of dataset bias," European Conference on Computer Vision, 2012.
10. S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345-1359, 2010.
11. H. Daume III, "Frustratingly easy domain adaptation," Conference of the Association for Computational Linguistics, 2007.
12. W. Li, L. Duan, D. Xu, and I. Tsang, "Learning with augmented features for supervised and semi-supervised heterogeneous domain adaptation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 6, pp. 1134-1148, Jun. 2014.
13. R. Gopalan, R. Li, and R. Chellappa, "Domain adaptation for object recognition: an unsupervised approach," IEEE International Conference on Computer Vision, 2011, pp. 999-1006.
14. R. Gopalan, R. Li, and R. Chellappa, "Unsupervised adaptation across domain shifts by generating intermediate data representations," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 36, pp. 2288-2302, Nov. 2014.
15. I.-H. Jhuo, D. Liu, D. Lee, and S.-F. Chang, "Robust visual domain adaptation with low-rank reconstruction," IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2168-2175.
References
1. B. Kulis, K. Saenko, and T. Darrell, "What you saw is not what you get: domain adaptation using asymmetric kernel transforms," IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 1785-1792.
2. K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Adapting visual category models to new domains," European Conference on Computer Vision, vol. 6314, 2010, pp. 213-226.
3. J. Ni, Q. Qiu, and R. Chellappa, "Subspace interpolation via dictionary learning for unsupervised domain adaptation," IEEE Conference on Computer Vision and Pattern Recognition, 2013.
4. H. V. Nguyen, V. M. Patel, N. M. Nasrabadi, and R. Chellappa, "Sparse embedding: a framework for sparsity promoting dimensionality reduction," European Conference on Computer Vision, 2012.
5. B. Gong, K. Grauman, and F. Sha, "Connecting the dots with landmarks: discriminatively learning domain-invariant features for unsupervised domain adaptation," International Conference on Machine Learning, 2013.
Acknowledgement
Ashish Shrivastava
Sumit Shekhar, Hien Nguyen, Yi-Chen Chen, Huy Tho Ho
Heng Zhang, Sayantan Sarkar
Rama Chellappa (U. Maryland)
[email protected] http://www.rci.rutgers.edu/~vmp93/