Advanced Machine Learning


TRANSCRIPT

  • Slide 1/36

    CS 221: Artificial Intelligence

    Lecture 6:

    Advanced Machine Learning

    Sebastian Thrun and Peter Norvig

    Slide credit: Marc Pollefeys, Dan Klein, Chris Manning

  • Slide 2/36

    Outline

    Clustering

    K-Means

    EM

    Spectral Clustering

    Dimensionality Reduction


  • Slide 3/36

    The unsupervised learning problem


    Many data points, no labels

  • Slide 4/36

    Unsupervised Learning?

    Google Street View

    http://maps.google.com/
  • Slide 5/36

    K-Means


    Many data points, no labels

  • Slide 6/36

    K-Means

    Choose a fixed number of clusters.

    Choose cluster centers and point-cluster allocations to minimize error. We can't do this by exhaustive search, because there are too many possible allocations.

    Algorithm: fix the cluster centers and allocate each point to the closest cluster; then fix the allocation and compute the best cluster centers.

    x could be any set of features for which we can compute a distance (be careful about scaling).

    The error being minimized is

    $$\Phi = \sum_{i \in \text{clusters}} \; \sum_{j \in \text{elements of } i\text{th cluster}} \left\| x_j - \mu_i \right\|^2$$

    * From Marc Pollefeys COMP 256 2003
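    A minimal NumPy sketch of this alternation (the function name and the initialization are illustrative, not from the lecture):

```python
import numpy as np

def kmeans(x, k, n_iters=100, seed=0):
    """Alternate between allocating points and refitting cluster centers."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]  # init from data points
    for _ in range(n_iters):
        # Fix centers: allocate each point to its closest cluster.
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Fix allocation: the best center for each cluster is its mean.
        new_centers = np.array([x[labels == j].mean(axis=0) if (labels == j).any()
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):  # no center moved: converged
            break
        centers = new_centers
    return centers, labels
```

    Each step can only lower the squared-error objective above, so the loop terminates, though at a local optimum that depends on the initialization.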

  • Slide 7/36

    K-Means

  • Slide 8/36

    K-Means

    * From Marc Pollefeys COMP 256 2003

  • Slide 9/36

    Results of K-Means Clustering:

    K-means clustering using intensity alone and color alone.

    (Figure panels: Image, Clusters on intensity, Clusters on color.)

    * From Marc Pollefeys COMP 256 2003

  • Slide 10/36

    K-Means

    K-Means is an approximation to EM.

    Model (hypothesis space): mixture of N Gaussians

    Latent variables: correspondence of data points and Gaussians

    We notice:

    Given the mixture model, it's easy to calculate the correspondence.

    Given the correspondence, it's easy to estimate the mixture models.

  • Slide 11/36

    Expectation Maximization: Idea

    Data generated from a mixture of Gaussians

    Latent variables: correspondence between data items and Gaussians

  • Slide 12/36

    Generalized K-Means (EM)

  • Slide 13/36

    Gaussians

    $$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

    $$p(\mathbf{x}) = (2\pi)^{-\frac{N}{2}}\, \left|\Sigma\right|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$
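    The same two densities written out in NumPy as a sanity check (a sketch; the function names are ours):

```python
import numpy as np

def gaussian_1d(x, mu, sigma):
    # p(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) * sigma)
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def gaussian_nd(x, mu, cov):
    # p(x) = (2 pi)^(-N/2) |Sigma|^(-1/2) exp(-(x - mu)^T Sigma^{-1} (x - mu) / 2)
    n = len(mu)
    diff = np.asarray(x) - np.asarray(mu)
    norm = (2 * np.pi) ** (-n / 2) * np.linalg.det(cov) ** -0.5
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(cov, diff))
```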

  • Slide 14/36

    ML Fitting Gaussians

    $$\mu = \frac{1}{M} \sum_i x_i, \qquad \Sigma = \frac{1}{M} \sum_i (x_i - \mu)(x_i - \mu)^T$$

    for the densities

    $$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad p(\mathbf{x}) = (2\pi)^{-\frac{N}{2}}\, \left|\Sigma\right|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$
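    Both maximum-likelihood estimates are one line each in NumPy (a sketch, with x holding one sample per row):

```python
import numpy as np

def fit_gaussian(x):
    """Maximum-likelihood mean and covariance for samples x of shape (M, N)."""
    mu = x.mean(axis=0)               # mu = (1/M) sum_i x_i
    diff = x - mu
    sigma = diff.T @ diff / len(x)    # Sigma = (1/M) sum_i (x_i - mu)(x_i - mu)^T
    return mu, sigma
```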

  • Slide 15/36

    Learning a Gaussian Mixture (with known covariance)

    E-Step:

    $$E[z_{ij}] = \frac{p(x_i \mid \mu_j)}{\sum_{n=1}^{k} p(x_i \mid \mu_n)} = \frac{\exp\left(-\frac{1}{2\sigma^2}(x_i - \mu_j)^2\right)}{\sum_{n=1}^{k} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu_n)^2\right)}$$

    M-Step:

    $$n_j = \sum_{i=1}^{M} E[z_{ij}], \qquad \mu_j \leftarrow \frac{1}{n_j} \sum_{i=1}^{M} E[z_{ij}]\, x_i, \qquad \Sigma_j \leftarrow \frac{1}{n_j} \sum_{i=1}^{M} E[z_{ij}]\,(x_i - \mu_j)(x_i - \mu_j)^T$$
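    A minimal sketch of these two updates, assuming 1-D data and a known, shared variance sigma2 (the slide's setting); the initialization and names are ours:

```python
import numpy as np

def em_mixture(x, k, sigma2, n_iters=50, seed=0):
    """EM for a k-component 1-D Gaussian mixture with known variance sigma2."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)       # initial guesses for the means
    for _ in range(n_iters):
        # E-step: E[z_ij] ~ exp(-(x_i - mu_j)^2 / (2 sigma2)), normalized over j.
        logp = -(x[:, None] - mu[None, :]) ** 2 / (2 * sigma2)
        w = np.exp(logp - logp.max(axis=1, keepdims=True))  # stable exponentials
        ez = w / w.sum(axis=1, keepdims=True)               # responsibilities (M, k)
        # M-step: n_j = sum_i E[z_ij];  mu_j = (1/n_j) sum_i E[z_ij] x_i
        nj = ez.sum(axis=0)
        mu = (ez * x[:, None]).sum(axis=0) / nj
    return mu, ez
```

    With hard 0/1 responsibilities in place of E[z_ij], the same loop reduces to K-means, which is why the previous slide calls EM a generalized K-means.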

  • Slide 16/36

    Expectation Maximization

    Converges!

    Proof [Neal/Hinton, McLachlan/Krishnan]:

    Each E/M step does not decrease the data likelihood.

    Converges at a local minimum or saddle point.

    But subject to local minima.

  • Slide 17/36

    EM Clustering: Results

    http://www.ece.neu.edu/groups/rpl/kmeans/

  • Slide 18/36

    Practical EM

    Number of clusters unknown

    Suffers (badly) from local minima

    Algorithm: start a new cluster center if many points are unexplained; kill a cluster center that doesn't contribute.

    (Use the AIC/BIC criterion for all this if you want to be formal; a sketch follows below.)
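    One concrete way to apply that AIC/BIC suggestion is to fit mixtures for a range of k and keep the best-scoring one; a sketch using scikit-learn's GaussianMixture (the library choice is ours, not the lecture's):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def pick_k_by_bic(x, k_max=10, seed=0):
    """Fit mixtures for k = 1..k_max on x (shape M x N); return the lowest-BIC k."""
    best_k, best_bic = 1, np.inf
    for k in range(1, k_max + 1):
        gm = GaussianMixture(n_components=k, n_init=5, random_state=seed).fit(x)
        bic = gm.bic(x)          # lower BIC = better fit/complexity trade-off
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k
```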

  • Slide 19/36

    Spectral Clustering


  • Slide 20/36

    Spectral Clustering


  • Slide 21/36

    The Two Spiral Problem


  • Slide 22/36

    Spectral Clustering: Overview

    Data → Similarities → Block-Detection

    * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University

  • Slide 23/36

    Eigenvectors and Blocks

    Block matrices have block eigenvectors:

    $$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix} \;\xrightarrow{\text{eigensolver}}\; e_1 = \begin{pmatrix} .71 \\ .71 \\ 0 \\ 0 \end{pmatrix},\; e_2 = \begin{pmatrix} 0 \\ 0 \\ .71 \\ .71 \end{pmatrix}; \quad \lambda_1 = 2,\; \lambda_2 = 2,\; \lambda_3 = 0,\; \lambda_4 = 0$$

    Near-block matrices have near-block eigenvectors: [Ng et al., NIPS 02]

    $$\begin{pmatrix} 1 & 1 & .2 & 0 \\ 1 & 1 & 0 & -.2 \\ .2 & 0 & 1 & 1 \\ 0 & -.2 & 1 & 1 \end{pmatrix} \;\xrightarrow{\text{eigensolver}}\; e_1 = \begin{pmatrix} .71 \\ .69 \\ .14 \\ 0 \end{pmatrix},\; e_2 = \begin{pmatrix} 0 \\ -.14 \\ .69 \\ .71 \end{pmatrix}; \quad \lambda_1 = 2.02,\; \lambda_2 = 2.02,\; \lambda_3 = -0.02,\; \lambda_4 = -0.02$$

    * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
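    These numbers are easy to reproduce; a short NumPy check (a sketch, not part of the slides):

```python
import numpy as np

A_block = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 0],
                    [0, 0, 1, 1],
                    [0, 0, 1, 1]], dtype=float)
A_near = np.array([[1, 1, .2, 0],
                   [1, 1, 0, -.2],
                   [.2, 0, 1, 1],
                   [0, -.2, 1, 1]])

for a in (A_block, A_near):
    vals, vecs = np.linalg.eigh(a)          # eigensolver for symmetric matrices
    order = np.argsort(vals)[::-1]          # largest eigenvalues first
    print(np.round(vals[order], 2))         # e.g. [2. 2. 0. 0.] for the block case
    print(np.round(vecs[:, order[:2]], 2))  # top two (near-)block eigenvectors,
                                            # possibly differing from the slide by sign
```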

  • Slide 24/36

    Spectral Space

    Can put items into blocks by eigenvectors:

    $$\begin{pmatrix} 1 & 1 & .2 & 0 \\ 1 & 1 & 0 & -.2 \\ .2 & 0 & 1 & 1 \\ 0 & -.2 & 1 & 1 \end{pmatrix}, \qquad e_1 = \begin{pmatrix} .71 \\ .69 \\ .14 \\ 0 \end{pmatrix},\; e_2 = \begin{pmatrix} 0 \\ -.14 \\ .69 \\ .71 \end{pmatrix}$$

    Resulting clusters independent of row ordering:

    $$\begin{pmatrix} 1 & .2 & 1 & 0 \\ .2 & 1 & 0 & 1 \\ 1 & 0 & 1 & -.2 \\ 0 & 1 & -.2 & 1 \end{pmatrix}, \qquad e_1 = \begin{pmatrix} .71 \\ .14 \\ .69 \\ 0 \end{pmatrix},\; e_2 = \begin{pmatrix} 0 \\ .69 \\ -.14 \\ .71 \end{pmatrix}$$

    * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
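    A quick NumPy check of the row-ordering claim: the second matrix above is the first with rows/columns interleaved, and its spectral coordinates are the original ones permuted the same way (a sketch; `spectral_coords` is our name):

```python
import numpy as np

def spectral_coords(a, d=2):
    """Coordinates of each item in spectral space: entries of the top-d eigenvectors."""
    vals, vecs = np.linalg.eigh(a)
    return vecs[:, np.argsort(vals)[::-1][:d]]   # one row per item

A = np.array([[1, 1, .2, 0],
              [1, 1, 0, -.2],
              [.2, 0, 1, 1],
              [0, -.2, 1, 1]])
perm = np.array([0, 2, 1, 3])                    # the interleaved ordering above
coords = spectral_coords(A)
coords_perm = spectral_coords(A[np.ix_(perm, perm)])
# Permuting rows/columns just permutes the spectral coordinates (up to sign),
# so grouping items by these coordinates yields the same two clusters.
print(np.round(coords[perm], 2))
print(np.round(coords_perm, 2))
```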

  • Slide 25/36

    The Spectral Advantage

    The key advantage of spectral clustering is the spectral space representation:

    * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University

  • Slide 26/36

    Measuring Affinity

    Intensity:

    $$\text{aff}(x, y) = \exp\left(-\frac{1}{2\sigma_i^2}\, \left\| I(x) - I(y) \right\|^2\right)$$

    Distance:

    $$\text{aff}(x, y) = \exp\left(-\frac{1}{2\sigma_d^2}\, \left\| x - y \right\|^2\right)$$

    Texture:

    $$\text{aff}(x, y) = \exp\left(-\frac{1}{2\sigma_t^2}\, \left\| c(x) - c(y) \right\|^2\right)$$

    * From Marc Pollefeys COMP 256 2003
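    A sketch of the distance version for an array of points (the intensity and texture versions just substitute I(x) or c(x) for x); the sigma here is the scale that the next slide varies:

```python
import numpy as np

def distance_affinity(points, sigma_d):
    """aff(x, y) = exp(-||x - y||^2 / (2 sigma_d^2)) for every pair of rows."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma_d ** 2))

# Example: a small point set; larger sigma_d makes distant points look more similar.
pts = np.random.default_rng(0).random((5, 2))
print(np.round(distance_affinity(pts, sigma_d=0.5), 2))
```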

  • Slide 27/36

    Scale affects affinity

    * From Marc Pollefeys COMP 256 2003

  • Slide 28/36

    * From Marc Pollefeys COMP 256 2003

  • Slide 29/36

    Dimensionality Reduction


  • Slide 30/36

    The Space of Digits (in 2D)


    M. Brand, MERL

  • Slide 31/36

    Dimensionality Reduction with PCA

    [email protected]

  • Slide 32/36

    Linear: Principal Components

    Fit a multivariate Gaussian

    Compute the eigenvectors of its covariance

    Project onto the eigenvectors with the largest eigenvalues
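    Those three steps, as a NumPy sketch (the function name is ours):

```python
import numpy as np

def pca_project(x, d):
    """Project samples x (shape M x N) onto the top-d principal components."""
    mu = x.mean(axis=0)                        # fit the Gaussian: its mean ...
    cov = np.cov(x, rowvar=False)              # ... and its covariance
    vals, vecs = np.linalg.eigh(cov)           # eigenvectors of the covariance
    top = vecs[:, np.argsort(vals)[::-1][:d]]  # those with the largest eigenvalues
    return (x - mu) @ top                      # project onto them
```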

  • Slide 33/36

    Other examples of unsupervised learning


    Mean face (after alignment)

    Slide credit: Santiago Serrano

  • Slide 34/36

    Eigenfaces

    Slide credit: Santiago Serrano

  • Slide 35/36

    Non-Linear Techniques

    Isomap

    Local Linear Embedding

    Isomap, Science, M. Balasubramanian and E. Schwartz

  • Slide 36/36

    SCAPE (Dragomir Anguelov et al.)