infolab.stanford.edu/.../Lecture1-Intro.pdf · 2012-06-06

Foundations of Large-Scale Multimedia Information Management and Retrieval

Lecture #1: Introduction

Edward Y. Chang

Foundations of LSMM 1 Edward Chang


Datasets

• YouTube
• Facebook Photos
• Netflix Movies
• Apple Music
• Archive.org
  – http://www.archive.org/details/audio
  – 400k recordings


Technical Challenges

• Volume: both too small and too large
• Variety: text, video, image, music, social, etc.
• Velocity


Key Applications (Image as an Example)

• Content-based Retrieval
  – Given an image, find perceptually similar ones
• Annotation
  – Given an image, identify what (objects, events), where (location), when (time), and who (people)
• Do the above both quickly and accurately
• Applicable to any multimedia data type
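
The content-based retrieval task above can be sketched as a nearest-neighbor search. This is only an illustration under stated assumptions: each image is taken to be already reduced to a fixed-length feature vector, the file names, vectors, and the `retrieve` helper are hypothetical, and plain Euclidean distance stands in for a perceptual similarity measure.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, database, k=2):
    """Return the names of the k images whose vectors are closest to the query."""
    return heapq.nsmallest(
        k, database, key=lambda name: euclidean(query, database[name])
    )


# Hypothetical feature vectors (for example, coarse color statistics).
database = {
    "sunset.jpg": [0.9, 0.4, 0.1],
    "beach.jpg": [0.8, 0.5, 0.2],
    "forest.jpg": [0.1, 0.8, 0.2],
}

nearest = retrieve([0.9, 0.45, 0.1], database, k=2)
print(nearest)
```

A linear scan like this is accurate but not quick at scale, which is why the later topics (indexing, parallel algorithms) matter.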


Key Subroutines

• Feature Extraction
• Machine Learning
  – Small sample pool
  – Large sample pool
• Similarity
• Multimodal Fusion
• Indexing
• Scalability
  – Parallel algorithms
  – Online algorithms


Feature Extraction

• Representative Methods
  – Color, texture, shape
  – SIFT
  – Bio-motivated HMAX
  – Deep Learning
• Model-Based vs. Data-Driven
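
Of the methods listed, a color feature is the simplest to illustrate. The sketch below is a hypothetical example, not the lecture's implementation: it assumes an image is given as a list of 8-bit RGB tuples and builds a coarse, normalized joint color histogram. The function name `color_histogram` and the bin count are assumptions.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Quantize each RGB channel into bins and return a normalized joint histogram."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = len(pixels)
    return [count / total for count in hist]


# Four hypothetical pixels: two reddish, one blue, one green.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (10, 200, 30)]
features = color_histogram(pixels)
print(features)
```

The resulting vector has a fixed length regardless of image size, which is what makes it usable as input to similarity functions and classifiers.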


Classification

• Query-Concept Learning
  – Online
  – Concept-dependent learning
  – Imbalanced data
  – Large D
  – D >> N
  – N- >> N+
• Annotation
  – Offline
  – Feature-to-semantics mapping
  – Large N
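
To make the imbalance condition concrete, the sketch below assumes the slide's notation means N+ positive and N- negative training instances (with D the feature dimensionality and N the number of instances). It computes inverse-frequency class weights, one common remedy for imbalanced data and not necessarily the one used in the lecture. The helper name `class_weights` and the sample counts are hypothetical.

```python
from collections import Counter


def class_weights(labels):
    """Weight each class by N / (num_classes * N_class) so rare classes count more."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / (len(counts) * count) for cls, count in counts.items()}


# Hypothetical training set with N+ = 5 and N- = 95.
labels = [+1] * 5 + [-1] * 95
weights = class_weights(labels)
print(weights)
```

With these weights each class contributes the same total weight to a training loss, so the few positive examples are not drowned out by the many negatives.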


Similarity

• Traditional Distance Function
• Dynamic Partial Function
• Learning Similarity from Data
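
The first two items can be contrasted in a few lines. The sketch assumes the Dynamic Partial Function (DPF) is defined as a Minkowski-style distance computed over only the m smallest per-feature differences; that reading, the parameter names `m` and `r`, and the sample vectors are assumptions made for illustration.

```python
def minkowski(a, b, r=2):
    """Traditional distance: every feature difference contributes."""
    return sum(abs(x - y) ** r for x, y in zip(a, b)) ** (1.0 / r)


def dpf(a, b, m, r=2):
    """Partial distance: only the m smallest feature differences contribute."""
    diffs = sorted(abs(x - y) for x, y in zip(a, b))
    return sum(d ** r for d in diffs[:m]) ** (1.0 / r)


# Two hypothetical feature vectors that agree closely except in one feature.
a = [0.0, 0.0, 0.0, 0.0]
b = [0.0, 3.0, 4.0, 12.0]

full = minkowski(a, b)      # dominated by the single large difference
partial = dpf(a, b, m=3)    # ignores the largest difference
print(full, partial)
```

Dropping the largest differences lets two objects count as similar when they match on most features, even if a few features disagree strongly.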


Multimodal Fusion

• Feature Fusion (Cross-Modality)
  – Weighted sum
  – PCA
  – Dimensionality curse
• Personalization (Cross-Domain)
  – Latent behavior
• Context + Content
  – Location, time, etc.
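
The weighted-sum option can be sketched as follows. This assumes each modality has already produced a similarity score in [0, 1] for a candidate item; the modality names, scores, weights, and the `fuse_scores` helper are all hypothetical.

```python
def fuse_scores(scores_by_modality, weights):
    """Combine per-modality scores into one score by a normalized weighted sum."""
    total_weight = sum(weights[m] for m in scores_by_modality)
    weighted = sum(weights[m] * s for m, s in scores_by_modality.items())
    return weighted / total_weight


# Hypothetical per-modality similarity scores for one candidate image.
scores = {"color": 0.9, "texture": 0.5, "text": 0.2}
weights = {"color": 0.5, "texture": 0.3, "text": 0.2}

fused = fuse_scores(scores, weights)
print(fused)
```

Choosing the weights is the hard part in practice; fixed weights like these are only a placeholder for weights that would be tuned or learned.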


Storage and Indexing

• Dimensionality Curse
• High-Dimensional Indexing
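
One way to see the dimensionality curse is to measure how the contrast between the farthest and nearest neighbor shrinks as the dimension grows. The sketch below uses uniformly random points, which is an assumption chosen for simplicity; the function name `distance_contrast` and the sample sizes are hypothetical.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of the farthest to the nearest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return max(dists) / min(dists)


for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 2))
```

As the ratio approaches 1, all points look almost equally far from the query, so tree-style indexes lose their ability to prune and degrade toward a full scan.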


Large-Scale Learning: 3V Problem

• Huge Volume of Data
• High Velocity
• High Variety


Key Parallel Algorithmic Work

• Parallel Spectral Clustering, W.-Y. Chen, Y. Song, H. Bai, C.-J. Lin, and E. Y. Chang, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2010.
• PLDA+: Parallel Latent Dirichlet Allocation with Data Placement and Pipeline Processing, ACM Transactions on Intelligent Systems and Technology, 2011.
• Parallel Support Vector Machines, E. Y. Chang, et al., NIPS 2007.
• PFP: Parallel FP-Growth for Query Recommendation, H. Li, Y. Wang, D. Zhang, M. Zhang, and E. Y. Chang, ACM Recommendation Systems, Lausanne, October 2008.

 


Comparative Study on Small Dataset

Large Dataset: 10,000 instances per class

Comparative Study on Large Dataset


Data  Center  (Cloud)


Sample Hierarchy

• Server
  – 16 GB DRAM; 160 GB SSD; 5 x 1 TB disk
• Rack
  – 40 servers
  – 48-port Gigabit Ethernet switch
• Warehouse
  – 10,000 servers (250 racks)
  – 2K-port Gigabit Ethernet switch
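
The aggregate capacities implied by this hierarchy follow from simple multiplication. The sketch below uses only the figures on the slide and assumes decimal units (1 TB = 1000 GB, 1 PB = 1000 TB); the variable names are illustrative.

```python
# Per-server figures from the slide.
DRAM_GB = 16
SSD_GB = 160
DISK_TB = 5 * 1  # five 1 TB disks

# Hierarchy sizes from the slide.
SERVERS_PER_RACK = 40
RACKS = 250

servers = SERVERS_PER_RACK * RACKS

# Assumption: decimal units (1 TB = 1000 GB, 1 PB = 1000 TB).
totals = {
    "servers": servers,
    "dram_tb": servers * DRAM_GB / 1000,
    "ssd_tb": servers * SSD_GB / 1000,
    "disk_pb": servers * DISK_TB / 1000,
}
print(totals)
```

The server count checks out against the slide (40 servers x 250 racks = 10,000), and the totals show how storage capacity grows from memory to SSD to disk across the warehouse.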


Syllabus

• Feature Extraction (Lecture #2)
• Machine Learning (Lecture #3)
  – Small sample pool
  – Large sample pool
• Similarity (Lecture #4)
• Multimodal Fusion (Lecture #5)
• Indexing (Lecture #6)
• Scalability (Lectures #7 & #8)
  – Parallel algorithms
  – Online algorithms


Reading

• Foundations of Large-Scale Multimedia Information Management and Retrieval, E. Y. Chang, Springer, 2011
  – Chapter #1
