Source: web.cs.ucdavis.edu/~yjlee/teaching/ecs289g-fall2015/introduction.pdf

ECS 289G: Visual Recognition Fall 2015 Yong Jae Lee Department of Computer Science


  • ECS 289G: Visual Recognition, Fall 2015

    Yong Jae Lee, Department of Computer Science

  • Plan for today

    • Topic overview
    • Introductions
    • Course requirements and administrivia
    • Syllabus tour

  • Computer Vision

    • Let computers see!
    • Automatic understanding of visual data (i.e., images and video)

    ─ Computing properties of the 3D world from visual data (measurement)

    ─ Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities (perception and interpretation)

    Kristen Grauman

  • What does recognition involve?

    Fei-Fei Li

  • Detection: are there people?

  • Activity: What are they doing?

  • Object categorization

    mountain

    building
    tree

    banner

    vendor
    people

    street lamp

  • Instance recognition

    Potala Palace

    A particular sign

  • Scene categorization

    • outdoor
    • city
    • …

  • Attribute recognition

    flat

    gray
    made of fabric

    crowded

  • Why recognition?

    • Recognition is a fundamental part of perception
      ─ e.g., robots, autonomous agents
    • Organize and give access to visual content
    • Connect to information
    • Detect trends and themes

    • Why now?

    Kristen Grauman

  • Posing visual queries

    Kooaba, Bay & Quack et al.

    Yeh et al., MIT

    Belhumeur et al.

    Kristen Grauman

  • Interactive systems

    Shotton et al.

  • Autonomous agents

    Google self-driving car

    Mars rover

  • Exploring community photo collections

    Snavely et al.

    Simon & Seitz; Kristen Grauman

  • Discovering visual patterns

    Actions

    Categories

    Characteristic elements

    Wang et al.

    Doersch et al.

    Lee & Grauman

  • What are the challenges?

  • What we see vs. what a computer sees

  • Challenges: robustness

    Illumination Object pose

    Viewpoint
    Intra-class appearance

    Occlusions

    Clutter

    Kristen Grauman

  • Context cues

    Kristen Grauman


  • Challenges: context and human experience

    Context cues Function Dynamics

    Kristen Grauman

  • Challenges: context and human experience

    Fei-Fei Li, Rob Fergus, Antonio Torralba

  • Challenges: context and human experience

    Fei-Fei Li, Rob Fergus, Antonio Torralba

  • Challenges: context and human experience

    Fei-Fei Li, Rob Fergus, Antonio Torralba

  • Challenges: scale, efficiency

    • Half of the cerebral cortex in primates is devoted to processing visual information

  • 6 billion images · 70 billion images · 1 billion images served daily · 10 billion images · 100 hours of video uploaded per minute

    Almost 90% of web traffic is visual!

    Challenges: scale, efficiency

  • Biederman 1987

    Challenges: scale, efficiency

    Fei-Fei Li, Rob Fergus, Antonio Torralba

  • Challenges: learning with minimal supervision

    More ↔ Less

    Kristen Grauman

  • Slide from Pietro Perona, 2004 Object Recognition workshop


  • Inputs in 1963…

    L. G. Roberts, Machine Perception of Three-Dimensional Solids, Ph.D. thesis, MIT Department of Electrical Engineering, 1963.

    Kristen Grauman

    http://www.packet.cc/files/mach-per-3D-solids.html

  • Inputs in 1990s…

    Figure credit: Kristen Grauman

  • Inputs in 2000s…

    Figure credit: Kristen Grauman

  • Personal photo albums

    Surveillance and security

    Movies, news, sports

    Medical and scientific images

    … and inputs today

    Svetlana Lazebnik

    Understand, organize, and index all this data!

    http://www.picsearch.com/

  • 2012

  • ImageNet Challenge 2012

    • “AlexNet”: similar framework to LeCun ’98, but:

      ─ Bigger model (7 hidden layers, 650,000 units, 60,000,000 params)
      ─ More data (10^6 vs. 10^3 images)
      ─ GPU implementation (50x speedup over CPU); trained on two GPUs for a week
      ─ Better regularization for training (DropOut)

    A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012

    http://www.cs.toronto.edu/%7Efritz/absps/imagenet.pdf
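    The DropOut regularizer listed above can be sketched in a few lines. This is a generic "inverted dropout" illustration in NumPy, an assumption-laden toy rather than the authors' GPU implementation (AlexNet applied dropout with p = 0.5 to its fully-connected layers):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def dropout(x, p=0.5, train=True):
        """Inverted dropout: during training, zero each activation with
        probability p and rescale survivors by 1/(1-p), so the layer is
        simply the identity at test time."""
        if not train or p == 0.0:
            return x
        keep = rng.random(x.shape) >= p      # keep each unit w.p. 1 - p
        return x * keep / (1.0 - p)

    h = np.ones((4, 8))                      # a batch of hidden activations
    h_train = dropout(h, p=0.5)              # entries are 0.0 or 2.0
    h_test = dropout(h, train=False)         # unchanged
    ```

    Rescaling during training (rather than downscaling at test time, as in the original paper) is the variant most modern frameworks use; the expected activation is preserved either way.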

  • Using CNN for Image Classification

    “car”

    AlexNet

    Fixed input size: 224x224x3

  • ImageNet Challenge 2012

    • Krizhevsky et al. -- 16.4% error (top-5)• Next best (non-convnet) – 26.2% error

    [Bar chart: top-5 error rate (%) by team — SuperVision, ISI, Oxford, INRIA, Amsterdam; y-axis 0-35%]
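    The "top-5 error" used in these results counts an example as correct if the true label appears anywhere among the model's five highest-scoring classes. A small NumPy sketch with toy scores (the `top_k_error` helper name is my own):

    ```python
    import numpy as np

    def top_k_error(scores, labels, k=5):
        """Fraction of examples whose true label is NOT among the
        k highest-scoring classes (top-5 error for k=5)."""
        order = np.argsort(-scores, axis=1)[:, :k]      # top-k class indices
        hit = (order == labels[:, None]).any(axis=1)    # true label in top-k?
        return 1.0 - hit.mean()

    # Toy check: 3 examples over 6 classes (no tied scores)
    scores = np.array([[0.10, 0.90, 0.05, 0.02, 0.01, 0.03],
                       [0.25, 0.15, 0.30, 0.12, 0.05, 0.20],
                       [0.01, 0.02, 0.03, 0.04, 0.06, 0.95]])
    labels = np.array([0, 4, 5])
    # Example 2's label (4) has the lowest score, so it is the only
    # top-5 miss; at k=1 only example 3 is predicted correctly.
    ```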

  • ImageNet Challenge 2012-2014

    Team                              Year   Place   Error (top-5)   External data
    SuperVision – Toronto (7 layers)  2012   -       16.4%           no
    SuperVision                       2012   1st     15.3%           ImageNet 22k
    Clarifai – NYU (7 layers)         2013   -       11.7%           no
    Clarifai                          2013   1st     11.2%           ImageNet 22k
    VGG – Oxford (16 layers)          2014   2nd     7.32%           no
    GoogLeNet (19 layers)             2014   1st     6.67%           no
    Human expert*                                    5.1%

    Team                            Method                                   Error (top-5)
    DeepImage – Baidu               Data augmentation + multi-GPU            5.33%
    PReLU-nets – MSRA               Parametric ReLU + smart initialization   4.94%
    BN-Inception ensemble – Google  Reducing internal covariate shift        4.82%

    Best non-convnet in 2012: 26.2%

    http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
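    The "reducing internal covariate shift" entry in the table refers to batch normalization: standardizing each feature over the mini-batch before a learned affine transform. A minimal training-mode sketch in NumPy (the inference-time running statistics are omitted):

    ```python
    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        """Training-mode batch norm: normalize each feature (column) over
        the mini-batch, then scale and shift with learned gamma, beta."""
        mu = x.mean(axis=0)                    # per-feature batch mean
        var = x.var(axis=0)                    # per-feature batch variance
        x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
        return gamma * x_hat + beta

    rng = np.random.default_rng(0)
    x = rng.normal(loc=5.0, scale=3.0, size=(64, 10))   # an off-center mini-batch
    y = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
    # Each output feature now has (approximately) zero mean and unit variance.
    ```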

  • Introductions

  • Course Information

    • Website: https://sites.google.com/a/ucdavis.edu/ecs-289-visual-recognition/

    • Office hours: by appointment (email)

    • Email: start subject with “[ECS289G]”


  • This course

    • Survey state-of-the-art research in computer vision

    • High-level vision and learning problems with focus on recent deep learning techniques

  • Goals

    • Understand, analyze, and discuss state-of-the-art techniques

    • Identify interesting open questions and future directions

  • Prerequisites

    • Courses in computer vision and machine learning

  • Requirements

    • Class Participation [25%]

    • Paper Reviews [25%]

    • Paper Presentations (1-2 times) [25%]

    • Final Project [25%]

  • Class participation

    • Actively participate in discussions
    • Read all assigned papers before each class

  • Paper reviews

    • Detailed review of one paper before each class
    • One page in length:

      ─ A summary of the paper (2-3 sentences)
      ─ Main contributions
      ─ Strengths and weaknesses
      ─ Are the experiments convincing?
      ─ Possible extensions?
      ─ Additional comments, including unclear points, open research questions, and applications

    • Email by 11:59 pm the day before each class
    • Email subject: “[ECS 289G] paper review”

  • Paper presentations

    • 1 or 2 times during the quarter
    • ~20 minutes long, well-organized, and polished:

      ─ Clear statement of the problem
      ─ Motivate why the problem is interesting, important, and/or difficult
      ─ Describe key contributions and technical ideas
      ─ Experimental setup and results
      ─ Strengths and weaknesses
      ─ Interesting open research questions, possible extensions, and applications

    • Slides should be clear and mostly visual (figures, animations, videos)
    • Look at background papers as necessary

  • Paper presentations

    • Search for relevant material, e.g., from the authors' webpage, project pages, etc.
    • Each slide that is not your own must be clearly cited
    • Meet with me one day before your presentation with prepared slides; it is your responsibility to email me to schedule a time
    • Email slides on the day of your presentation
    • Email subject: "[ECS 289G] presentation slides"
    • Skip the paper review on the day you are presenting

  • Final project

    • One of:

      ─ Design and evaluation of a novel approach
      ─ An extension of an approach studied in class
      ─ In-depth analysis of an existing technique
      ─ In-depth comparison of two related existing techniques

    • Work in pairs
    • Talk to me if you need ideas

  • Final project

    • Final Project Proposal (5%), due TBD
      ─ 1 page
    • Final Project Presentation (10%), due TBD
      ─ 15 minutes
    • Final Project Paper (10%), due TBD
      ─ 6-8 pages

  • Image classification

    • Convolutional neural networks: AlexNet, VGGNet

  • Object detection

    • Regions with Convolutional Neural Networks

  • CNN Visualization and Analysis

    • What is encoded by a CNN?

  • CNN Visualization and Analysis

    • What is encoded by a CNN?

  • Fooling CNNs

    Intriguing properties of neural networks [Szegedy et al., ICLR 2014]

    http://arxiv.org/pdf/1312.6199v4.pdf
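    Adversarial examples like these are easy to reproduce on toy models. The sketch below uses the fast gradient sign method from follow-up work (Goodfellow et al., ICLR 2015), not the L-BFGS procedure of the cited paper, applied to a hypothetical two-feature logistic classifier:

    ```python
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fgsm(x, w, b, y, eps):
        """Perturb x by eps in the sign of the input-gradient of the
        cross-entropy loss for a logistic classifier p = sigmoid(w.x + b)."""
        p = sigmoid(x @ w + b)
        grad = (p - y) * w          # d(loss)/dx for binary cross-entropy
        return x + eps * np.sign(grad)

    # Toy classifier that confidently labels x as class 1 ...
    w = np.array([2.0, -1.0])
    b = 0.0
    x = np.array([1.0, 1.0])        # sigmoid(w.x) = sigmoid(1) ≈ 0.73 -> class 1
    x_adv = fgsm(x, w, b, y=1.0, eps=0.5)
    # ... but a small structured perturbation flips the decision:
    # sigmoid(w.x_adv) = sigmoid(1 - 3*eps) = sigmoid(-0.5) ≈ 0.38 -> class 0
    ```

    The same one-step, gradient-sign construction scales to deep networks, where the per-pixel perturbation can be made visually imperceptible.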

  • Semantic Segmentation

  • Human pose

    • Finding people and their poses

  • Supervision using spatial context, motion

  • Language and Images

    • Generating image descriptions

  • Neural Network Art

  • Coming up

    • Carefully read the class webpage
    • Sign up for papers you want to present
      - Need 2 volunteers to present on 10/1
    • Next class: overview of my research
