Learning Classifiers for Non-IID Data


  • Learning Classifiers for Non-IID Data
    Balaji Krishnapuram, Computer-Aided Diagnosis and Therapy, Siemens Medical Solutions, Inc.

    Collaborators: Volkan Vural, Jennifer Dy [Northeastern], Ya Xue [Duke], Murat Dundar, Glenn Fung, Bharat Rao [Siemens]

    Jun 27, 2006

    © 2005 Siemens Medical Solutions. All rights reserved. Siemens confidential. Computer-aided Diagnosis & Therapy

    Outline
    Implicit IID assumption in traditional classifier design

    Often not valid in real life: motivating CAD problems

    Convex algorithms for Multiple Instance Learning (MIL)

    Bayesian algorithms for Batch-wise classification Faster, approximate algorithms via mathematical programming

    Summary / Conclusions


    IID assumption in classifier design
    Training data D = {(x_i, y_i), i = 1…N : x_i ∈ R^d, y_i ∈ {+1, −1}}; testing data T = {(x_i, y_i), i = 1…M : x_i ∈ R^d, y_i ∈ {+1, −1}}

    Assume each training/testing sample is drawn independently from an identical distribution: (x_i, y_i) ~ P_XY(x, y)

    This is why we can classify one test sample at a time, ignoring the features of the other test samples. E.g. logistic regression: P(y_i = 1 | x_i, w) = 1 / (1 + exp(−w^T x_i))
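The per-sample scoring that the IID assumption licenses can be sketched in a few lines; the weight vector and feature values below are hypothetical, purely for illustration:

```python
import math

def logistic_predict(w, x):
    """P(y = 1 | x, w) for a linear logistic model: 1 / (1 + exp(-w^T x))."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-score))

# Under the IID assumption, each test sample is scored in isolation,
# without looking at the features of any other test sample:
w = [1.0, -2.0]
for x in [[0.5, 0.1], [2.0, 1.0], [-1.0, 0.3]]:
    print(x, logistic_predict(w, x))
```

The rest of the talk is about settings where this one-sample-at-a-time loop discards useful information.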


    Evaluating classifiers: learning theory
    Binomial test-set bounds: with high probability over the random draw of the M samples in the testing set T, if M is large and a classifier w is observed to be accurate on T, then its expected accuracy over a random draw of a sample from P_XY(x, y) will be high

    If the IID assumption fails, all bets are off! Thought experiment: repeat the same test sample M times
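A minimal sketch of the binomial test-set bound mentioned above, implemented by inverting the binomial tail (the Langford-style construction); the confidence level and counts are illustrative assumptions:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def binomial_test_bound(errors, m, delta=0.05, tol=1e-6):
    """Largest true error rate still consistent (at confidence 1 - delta)
    with observing `errors` mistakes on m IID test samples, found by
    binary search on p (the binomial CDF is decreasing in p)."""
    lo, hi = errors / m, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(errors, m, mid) >= delta:
            lo = mid
        else:
            hi = mid
    return hi

# 5 errors on 1000 IID test samples: the 95% upper bound on the true
# error rate is roughly 1%, about double the observed 0.5%.
print(binomial_test_bound(5, 1000, delta=0.05))
```

The thought experiment on the slide breaks exactly this machinery: repeating one test sample M times makes the M draws perfectly dependent, so the binomial tail no longer describes the sampling distribution and the bound is vacuous.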


    Training classifiers: learning theory
    With high probability over the random draw of the N samples in the training set D, the expected accuracy on a random sample from P_XY(x, y) for the learnt classifier w will be high iff:
    it is accurate on the training set D, and N is large
    it satisfies intuitions held before seeing the data (prior, large margin, etc.)

    PAC-Bayes, VC theory, etc. rely on the IID assumption; a relaxation to exchangeability is being explored


    CAD: Correlations among candidate ROI


    Hierarchical Correlation Among Samples


    Additive Random Effects Models
    The classification is treated as IID, but only when conditioned on both:
    fixed effects (unique to each sample)
    random effects (shared among samples)

    A simple additive model to explain the correlations:
    P(y_i | x_i, w, r_i, v) = 1 / (1 + exp(−w^T x_i − v^T r_i))
    P(y_i | x_i, w, r_i) = ∫ P(y_i | x_i, w, r_i, v) p(v | D) dv
    Sharing v^T r_i among many samples → correlated predictions

    But only small improvements in real-life applications
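The conditional part of the additive model above can be sketched directly (the marginalization over p(v | D) is omitted here; all weights, features, and random-effect covariates are made-up numbers for illustration):

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def predict_random_effects(w, x, v, r):
    """P(y = 1 | x, r, w, v) = sigmoid(w^T x + v^T r): a fixed effect unique
    to the sample plus a random effect shared by every sample in its group."""
    fixed = sum(wi * xi for wi, xi in zip(w, x))
    shared = sum(vi * ri for vi, ri in zip(v, r))
    return sigmoid(fixed + shared)

# Two candidates from the same patient share the random-effect covariates r,
# so a shift in v^T r moves both of their predictions together:
w, v = [1.0, 0.5], [2.0]
r_patient = [0.8]  # shared within one patient
for x in [[0.2, -0.1], [1.0, 0.4]]:
    print(predict_random_effects(w, x, v, r_patient))
```

Because v^T r enters every prediction in the group, uncertainty about v after integrating it out is what induces the correlation between those predictions.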


    CAD detects early stage colon cancer


    Candidate-Specific Random Effects Model: Polyps [figure: sensitivity vs. specificity]


    CAD algorithms: domain-specific issues
    Multiple (correlated) views: one detection is sufficient

    Systemic treatment of diseases: e.g., detecting one pulmonary embolism (PE) is sufficient

    Modeling the data-acquisition mechanism
    Errors in guessing class labels for the training set


    The Multiple Instance Learning Problem
    A bag is a collection of many instances (samples)

    The class label is provided for bags, not instances

    A positive bag has at least one positive instance in it

    Examples of bag definitions for CAD applications:
    Bag = samples from multiple views of the same region
    Bag = all candidates referring to the same underlying structure
    Bag = all candidates from a patient
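The "at least one positive instance" rule has a standard probabilistic reading, the noisy-OR combination of instance scores. This is one common way to turn instance predictions into a bag prediction (the talk's CH-MIL algorithm itself uses a convex-hull formulation instead, shown on the following slides); the probabilities below are hypothetical:

```python
def bag_probability(instance_probs):
    """MIL bag rule: a bag is positive iff at least one instance is positive.
    With independent per-instance probabilities p_i, the noisy-OR gives
    P(bag = +1) = 1 - prod(1 - p_i)."""
    prod = 1.0
    for p in instance_probs:
        prod *= (1.0 - p)
    return 1.0 - prod

# Bag = all candidate ROIs from one patient; one strong candidate suffices:
print(bag_probability([0.05, 0.9, 0.1]))  # dominated by the 0.9 instance
print(bag_probability([0.05, 0.1, 0.1]))  # all weak, so the bag is likely negative
```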


    CH-MIL Algorithm: 2-D illustration


    CH-MIL Algorithm for Fisher's Discriminant
    Easy implementation via alternating optimization
    Scales well to very large datasets
    Convex problem with a unique optimum
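A toy sketch of the alternating-optimization idea: each positive bag is represented by a point in the convex hull of its instances, and we alternate between fitting a linear discriminant and updating the representatives. This is a simplified least-squares stand-in, not the talk's actual Fisher-discriminant formulation, and all data shapes are assumptions:

```python
import numpy as np

def ch_mil_sketch(neg, pos_bags, iters=10):
    """Alternating optimization in the spirit of CH-MIL (simplified):
    neg      -- array of negative instances, shape (n_neg, d)
    pos_bags -- list of arrays, one per positive bag, shape (n_i, d)
    Each positive bag is summarized by a point in its convex hull."""
    # start each bag representative at the bag mean (a convex combination)
    reps = [bag.mean(axis=0) for bag in pos_bags]
    w = None
    for _ in range(iters):
        X = np.vstack([neg] + [r[None, :] for r in reps])
        y = np.concatenate([-np.ones(len(neg)), np.ones(len(reps))])
        # (a) fit a least-squares linear discriminant to negatives vs. representatives
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        # (b) move each representative to its best-scoring instance, i.e. the
        #     vertex of the bag's convex hull that maximizes w^T x
        reps = [bag[np.argmax(bag @ w)] for bag in pos_bags]
    return w
```

Step (b) is where the MIL semantics enter: only the bag's most positive-looking point has to end up on the positive side, so the other instances in a positive bag are free to look negative.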


    Lung CAD / DR CAD: lung nodules & pulmonary emboli (computed tomography, AX). *Pending FDA approval


    CH-MIL: Pulmonary Embolisms


    CH-MIL: Polyps in Colon


    Classifying a Correlated Batch of Samples
    Let the classification of individual samples x_i be based on u_i
    E.g. linear: u_i = w^T x_i; or a kernel predictor: u_i = Σ_{j=1…N} α_j k(x_i, x_j)

    Instead of basing the classification on u_i, we base it on an unobserved (latent) random variable z_i

    Prior: even before observing any features x_i (and thus before u_i), the z_i are known to be correlated a priori: p(z) = N(z | 0, Σ)

    E.g. due to spatial adjacency: Σ = exp(−D), where the matrix D holds the pairwise distances between samples
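Building such a spatial-adjacency prior is a one-liner in practice; the candidate coordinates below are hypothetical:

```python
import numpy as np

def correlation_prior(coords):
    """Prior covariance from spatial adjacency: Sigma_ij = exp(-D_ij),
    where D_ij is the pairwise Euclidean distance between samples i and j.
    Nearby candidates get strongly correlated latent variables z."""
    X = np.asarray(coords, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise distance matrix
    return np.exp(-D)                      # elementwise exponential

# Three candidate ROIs: two adjacent, one far away
Sigma = correlation_prior([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
print(Sigma.round(3))  # near-1 off-diagonal for the adjacent pair, near-0 for the distant one
```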


    Classifying a Correlated Batch of Samples
    Prior: even before observing any features x_i (and thus before u_i), the z_i are known to be correlated a priori: p(z) = N(z | 0, Σ)

    Likelihood: let us claim that u_i is really a noisy observation of a random variable z_i: p(u_i | z_i) = N(u_i | z_i, σ²)

    Posterior: remains correlated, even after observing the features x_i:
    P(z | u) = N(z | (Σ⁻¹σ² + I)⁻¹ u, (Σ⁻¹ + σ⁻²I)⁻¹)
    Intuition: E[z_i] = Σ_{j=1…N} A_ij u_j, with A = (Σ⁻¹σ² + I)⁻¹
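These posterior formulas are standard Gaussian algebra (prior z ~ N(0, Σ), likelihood u_i | z_i ~ N(z_i, σ²)) and can be checked numerically; the covariance and score values below are made up:

```python
import numpy as np

def batch_posterior(Sigma, u, sigma2=1.0):
    """Posterior over latent z given scores u:
    mean = (sigma2 * Sigma^{-1} + I)^{-1} u
    cov  = (Sigma^{-1} + I / sigma2)^{-1}"""
    n = len(u)
    Sinv = np.linalg.inv(Sigma)
    mean = np.linalg.inv(sigma2 * Sinv + np.eye(n)) @ np.asarray(u, dtype=float)
    cov = np.linalg.inv(Sinv + np.eye(n) / sigma2)
    return mean, cov

# Two strongly correlated candidates: a confident score on the first
# pulls the posterior mean of the second toward it.
Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])
mean, cov = batch_posterior(Sigma, [2.0, 0.0])
print(mean)  # second entry is pulled well above its raw score of 0
```

With Σ = I the posterior mean reduces to u / (1 + σ²), i.e. each sample is classified on its own shrunken score, recovering the IID case.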


    SVM-like Approximate Algorithm
    Intuition: classify using E[z_i] = Σ_{j=1…N} A_ij u_j, with A = (Σ⁻¹σ² + I)⁻¹
    What if we used A = (Σ + I) instead?
    Reduces computation by avoiding the matrix inversion
    Not principled, but a heuristic for speed
    Yields an SVM-like mathematical-programming algorithm
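A small numerical sketch of the trade-off: both smoothers let correlated samples pull on each other's scores, but the heuristic skips the inversion (the covariance and scores are illustrative assumptions):

```python
import numpy as np

def smooth_scores(Sigma, u, exact=True, sigma2=1.0):
    """Smooth per-sample scores u using the correlations in Sigma.
    exact=True  -> posterior-mean matrix A = (Sigma^{-1} sigma^2 + I)^{-1}
                   (requires inverting Sigma)
    exact=False -> the slide's cheap heuristic A = Sigma + I (no inversion)."""
    n = len(u)
    if exact:
        A = np.linalg.inv(sigma2 * np.linalg.inv(Sigma) + np.eye(n))
    else:
        A = Sigma + np.eye(n)
    return A @ np.asarray(u, dtype=float)

# Samples 0 and 1 are strongly correlated; sample 2 is independent.
Sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
u = np.array([2.0, 0.0, -1.0])
print(smooth_scores(Sigma, u, exact=True))   # sample 1 pulled toward sample 0
print(smooth_scores(Sigma, u, exact=False))  # same qualitative effect, cheaper
```

The heuristic changes the magnitudes but preserves the key qualitative behavior: a sample's effective score is a Σ-weighted mixture of its neighbors' scores, which is what makes an SVM-like formulation over these smoothed scores tractable.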


    Detecting Polyps in Colon


    Detecting Pulmonary Embolisms


    Detecting Nodules in the Lung


    Conclusions
    The IID assumption is universal in ML
    It is often violated in real life, but ignored
    Explicit modeling can substantially improve accuracy
    Described 3 models in this talk, utilizing varying levels of information:
    Additive random effects models: weak correlation information
    Multiple instance learning: stronger correlations enforced
    Batch-wise classification models: explicit correlation information
    Statistically significant improvements in accuracy
    Only starting to scratch the surface; lots to improve!
