thank you!christos/talks/10-kdd-award/faloutsos10ia.pdf · • ricardo baeza-yates (yahoo!...

79
CMU SCS Thank you! C. Faloutsos CMU

Upload: others

Post on 10-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Thank you!

C. Faloutsos CMU

Page 2: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Large Graph Mining

C. Faloutsos CMU

Page 3: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Large Graph Mining Data Mining for fun (and profit)

C. Faloutsos CMU

Page 4: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD'10 C. Faloutsos 4

Outline

•  Credit where credit is due •  Technical part – Data mining

– Can it be automated? – Research challenges

•  Non-technical part: `Listen’ – To the data – To non-experts

Page 5: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Nominator

•  Jian Pei

KDD'10 C. Faloutsos 5

Page 6: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Endorsers

•  Charu C. Aggarwal (IBM Research) •  Ricardo Baeza-Yates (Yahoo! Research) •  Albert-Laszlo Barabasi (Northeastern University) •  Denilson Barbosa (University of Alberta) •  Yixin Chen (Washington University at St. Louis)

KDD'10 C. Faloutsos 6

Page 7: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Endorsers, cont’d •  William Cohen (Carnegie Mellon University) •  Diane J. Cook (Washington State University) •  Gautam Das (University of Texas at Arlington) •  Inderjit S. Dhillon (University of Texas at Austin) •  Chris H. Q. Ding (University of Texas at

Arlington)

KDD'10 C. Faloutsos 7

Page 8: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Endorsers, cont’d •  Petros Drineas (Rensselaer Polytechnic Institute) •  Tina Eliassi-Rad (Lawrence Livermore National

Laboratory) •  Greg Ganger (Carnegie Mellon University) •  Minos Garofalakis (Technical University of Crete) •  James Garrett (Carnegie Mellon University)

KDD'10 C. Faloutsos 8

Page 9: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Endorsers, cont’d •  Dimitrios Gunopulos (University of Athens) •  Xiaofei He (Zhejiang University) •  Panagiotis G. Ipeirotis (New York University) •  Eamonn Keogh (UCR) •  Hiroyuki Kitagawa (University of Tsukuba) •  Tamara Kolda (Sandia Nat. Labs)

KDD'10 C. Faloutsos 9

Page 10: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Endorsers, cont’d •  Flip Korn (AT&T Research) •  Nick Koudas (University of Toronto) •  Hans-Peter Kriegel •  Ravi Kumar (Yahoo! Research) •  Laks Lakshmanan (UBC) •  Jure Leskovec (Stanford University)

KDD'10 C. Faloutsos 10

Page 11: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Endorsers, cont’d •  Nikos Mamoulis (Hong Kong University) •  Heikki Manilla (Aalto University, •  Dharmendra S. Modha (IBM Research) •  Mario Nascimento (University of Alberta) •  Jennifer Neville (Purdue University) •  Beng Chin Ooi (National University of Singapore)

KDD'10 C. Faloutsos 11

Page 12: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Endorsers, cont’d •  Dimitris Papadias (Hong Kong University of

Science and Technology) •  Spiros Papadimitriou (IBM Research) •  Jian Pei (Simon Fraser University) •  Foster Provost (New York University) •  Oliver Schulte (Simon Fraser University) •  Dennis Shasha (New York University) •  Srinivasan Parthasarathy (OSU)

KDD'10 C. Faloutsos 12

Page 13: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Endorsers, cont’d •  Jimeng Sun (IBM Research) •  Dacheng Tao (Nanyang University of Technology) •  Yufei Tao (The Chinese University of Hong Kong) •  Evimaria Terzi (Boston University) •  Alex Thomo (University of Victoria) •  Andrew Tomkins (Google Research)

KDD'10 C. Faloutsos 13

Page 14: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Endorsers, cont’d •  Caetano Traina (University of Sao Paulo) •  Vassilis Tsotras (University of California, Riverside) •  Alex Tuzhilin (New York University) •  Haixun Wang (Microsoft Research)

KDD'10 C. Faloutsos 14

Page 15: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Endorsers, cont’d

•  Wei Wang (University of North Carolina at Chapel Hill)

•  Philip S. Yu (University of Illinois, Chicago) •  Zhongfei Zhang (Binghamton University, State

University of New York)

KDD'10 C. Faloutsos 15

Page 16: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD committee

•  Ramasamy Uthurusamy, Chair •  Robert Grossman (University of Illinois at

Chicago) •  Jiawei Han (University of Illinois at

Urbana-Champaign) •  Tom Mitchell (Carnegie Mellon University) •  Gregory Piatetsky-Shapiro (KDnuggets)

KDD'10 C. Faloutsos 16

Page 17: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD committee cnt’d

•  Raghu Ramakrishnan (Yahoo! Research) •  Sunita Sarawagi (Indian Institute of

Technology, Bombay) •  Padhraic Smyth (University of California at

Irvine) •  Ramakrishnan Srikant (Google Research)

KDD'10 C. Faloutsos 17

Page 18: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD committee cnt’d

•  Xindong Wu (University of Vermont) •  Mohammed J. Zaki (Rensselaer Polytechnic

Institute)

KDD'10 C. Faloutsos 18

Page 19: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Family •  Parents Nikos & Sophia •  Siblings Michalis*, Petros*, Maria

•  Wife Christina#

(*) : and co-authors (#) : and research impact evaluator (‘grandpa’ test - see later…)

KDD'10 C. Faloutsos 19

Page 20: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD'10 C. Faloutsos 20

Academic ‘parents’

•  Christodoulakis, Stavros (T.U.C.)

•  Sevcik, Ken (U of T)

•  Roussopoulos, Nick (UMD)

Page 21: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD'10 C. Faloutsos 21

Academic ‘children’ •  King-Ip (David) Lin •  Ibrahim Kamel •  Flip Korn •  Byoung-Kee Yi •  Leejay Wu •  Deepayan Chakrabarti

Page 22: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD'10 C. Faloutsos 22

Academic ‘children’

•  Jia-Yu (Tim) Pan •  Spiros Papadimitriou •  Jimeng Sun •  Jure Leskovec •  Hanghang Tong

Page 23: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD'10 C. Faloutsos 23

Academic ‘children’

•  Mary McGlohon •  Fan Guo •  Lei Li •  Leman Akoglu •  Dueng Horng (Polo) Chau •  Aditya Prakash •  U Kang

Page 24: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD'10 C. Faloutsos 24

CMU colleagues

•  Tom Mitchell •  Garth Gibson •  Greg Ganger •  M. (Satya) Satyanarayanan •  Howard Wactlar •  Jeannette Wing •  + +

Page 25: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD'10 C. Faloutsos 25

Co-authors

•  [dblp 7/2010:] All 300 of you

•  Agma J. M. Traina (22) •  Caetano Traina Jr. (20) •  …

Page 26: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Funding agencies

•  NSF (Maria Zemankova, Frank Olken, ++) •  DARPA, LLNL, PITA •  IBM, MS, HP, INTEL, Y!, Google,

Symantec, Sony, Fujitsu, …

KDD'10 C. Faloutsos 26

Page 27: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD'10 C. Faloutsos 27

Outline

•  Credit where credit is due •  Technical part – Data mining

– Can it be automated? – Research challenges

•  Non-technical part: `Listen’ – To the data – To non-experts

Page 28: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Data mining = compression & …

KDD'10 C. Faloutsos 28

Christos Faloutsos, Vasileios Megalooikonomou: On data mining, compression, and Kolmogorov complexity. Data Min. Knowl. Discov. 15(1): 3-20 (2007)

Page 29: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Data mining = compression & …

KDD'10 C. Faloutsos 29

Christos Faloutsos, Vasileios Megalooikonomou: On data mining, compression, and Kolmogorov complexity. Data Min. Knowl. Discov. 15(1): 3-20 (2007)

Page 30: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Data mining = compression & …

KDD'10 C. Faloutsos 30

But: how can compression •  do forecasting? •  spot outliers?

•  do classification?

Page 31: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Data mining = compression & …

KDD'10 C. Faloutsos 31

OK – then, isn’t compression a solved problem (gzip, LZ)?

Page 32: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

… compression is undecidable!

Theorem*: for an arbitrary string x, computing its Kolmogorov complexity K(x) is undecidable

KDD'10 C. Faloutsos 32 (*) E.g., [T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley and Sons,1991, section 7.7]

A.N. Kolmogorov

EVEN WORSE than NP-hard!

Page 33: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

… compression is undecidable!

…which means there will always be better data mining tools/models/patterns to be discovered

-> job security -> job satisfaction

KDD'10 C. Faloutsos 33

Page 34: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Let’s see some examples of models

KDD'10 C. Faloutsos 34 Body weight

Response to new drug

Page 35: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Let’s see some examples of models

KDD'10 C. Faloutsos 35 income

$ spent

Page 36: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Let’s see some examples of models

KDD'10 C. Faloutsos 36 income

$ spent

Page 37: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Let’s see some examples of models

KDD'10 C. Faloutsos 37 income

$ spent

Page 38: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Let’s see some examples of models

KDD'10 C. Faloutsos 38 income

$ spent

3/4

Page 39: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD'10 C. Faloutsos 39

Metabolic rate

3/4

mass

Let’s see some examples of models

http://universe-review.ca /R10-35-metabolic.htm

Page 40: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD'10 C. Faloutsos 40

Metabolic rate

3/4

mass

Kleiberg’s law

http://universe-review.ca /R10-35-metabolic.htm

Page 41: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD'10 C. Faloutsos 41

Outline

•  Credit where credit is due •  Technical part – Data mining

– Can it be automated? NO! •  Always room for better models

– Research challenges

•  Non-technical part: `Listen’ – To the data – To non-experts

Page 42: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Always room for better models

•  Eg.: clustering – k-means (or our favorite clustering algo)

•  How many clusters are in the Sierpinski triangle?

KDD'10 C. Faloutsos 42

Page 43: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Always room for better models

KDD'10 C. Faloutsos 43

Page 44: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Always room for better models

KDD'10 C. Faloutsos 44

K=3 clusters?

Page 45: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Always room for better models

KDD'10 C. Faloutsos 45

K=3 clusters? K=9 clusters?

Page 46: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Always room for better models

KDD'10 C. Faloutsos 46

Piece-wise flat

Mixture of (Gaussian)

clusters

Page 47: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Always room for better models

KDD'10 C. Faloutsos 47

Piece-wise flat

Mixture of (Gaussian)

clusters

¾ Power law

??

Page 48: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Always room for better models

KDD'10 C. Faloutsos 48

Piece-wise flat

Mixture of (Gaussian)

clusters

¾ Power law

ONE, but Self-similar

‘cluster’

Page 49: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Always room for better models

KDD'10 C. Faloutsos 49

ONE, but Self-similar

‘cluster’

•  Barnsley’s method of IFS (iterated function systems) can easily generate it [Barnsley+Sloan, BYTE, 1988]

~100 lines of C code: www.cs.cmu.edu~/christos/www/SRC/ifs.tar

Page 50: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Always room for better models

KDD'10 C. Faloutsos 50

•  But, does self-similarity appear in real life?

Page 51: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Real, self similar dataset

KDD'10 C. Faloutsos 51

Page 52: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Real, self similar dataset

KDD'10 C. Faloutsos 52

Page 53: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Real, self similar dataset

KDD'10 C. Faloutsos 53

Page 54: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Real, self similar dataset

KDD'10 C. Faloutsos 54

Page 55: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD'10 C. Faloutsos 55

•  the red is true •  origin: Norway • but most other coastlines are ‘self-similar’, too!

Page 56: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

How can we find better models?

•  Obviously, an art (‘undecidable’!) •  Helps if we

– Listen to domain experts and – Listen to the data (next)

KDD'10 C. Faloutsos 56

Page 57: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD'10 C. Faloutsos 57

Outline

•  Credit where credit is due •  Technical part – Data mining

– Can it be automated? NO! – Research challenges

•  Listen to the data (the more, the better!)

•  Non-technical part: `Listen’ – To the data – To non-experts

Page 58: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD'10 C. Faloutsos 58

Scalability

•  Google: > 450,000 processors in clusters of ~2000 processors each [Barroso, Dean, Hölzle, “Web Search for a Planet: The Google Cluster Architecture” IEEE Micro 2003]

•  Yahoo: ~5Pb of data [Fayyad’07] •  ‘M45’: 4K proc’s, 3Tb RAM, 1.5 Pb disk

Page 59: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Promising research direction: scalability

•  challenges – Vast amounts of data; storing; cooling (!); …

•  … and opportunities: – DATA: Easier to collect (clickstreams, sensors

etc) – S/W: Hadoop, hbase, pig, … : open source – H/W: 1Tb disk: ~ US$ 100

KDD'10 C. Faloutsos 59

Page 60: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Promising research direction

•  The more data, the more subtle patterns we may discover

•  Examples of subtle patterns:

KDD'10 C. Faloutsos 60

Page 61: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

More data, more subtle patterns

KDD'10 C. Faloutsos 61

Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604

Duration (log scale)

PDF: fraction of customers

(log scale)

Page 62: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

More data, more subtle patterns

KDD'10 C. Faloutsos 62

Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604

Duration (log scale)

PDF: fraction of customers

(log scale)

(mixture of) Gaussians

Page 63: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

More data, more subtle patterns

KDD'10 C. Faloutsos 63

Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604

Page 64: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

More data, more subtle patterns

KDD'10 C. Faloutsos 64

Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604

Zipf (Pareto,

Power-law)

Page 65: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

More data, more subtle patterns

KDD'10 C. Faloutsos 65

Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604

Page 66: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

More data, more subtle patterns

KDD'10 C. Faloutsos 66

Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604

lognormal

Page 67: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

More data, more subtle patterns

KDD'10 C. Faloutsos 67

Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604

Page 68: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

More data, more subtle patterns

KDD'10 C. Faloutsos 68

Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604

dPln (=doubly

Pareto Lognormal)

Page 69: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

So, dPln is the answer?

KDD'10 C. Faloutsos 69

Page 70: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

So, dPln is the answer?

KDD'10 C. Faloutsos 70

Yes, for the moment…

Page 71: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

So, dPln is the answer?

KDD'10 C. Faloutsos 71

With more data, who knows?!

Page 72: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

KDD'10 C. Faloutsos 72

Outline

•  Credit where credit is due •  Technical part – Data mining

– Can it be automated? NO! – Research challenges

•  Listen to the data (the more, the better!)

•  Non-technical part: ‘Listen’ – To the data – To non-experts

Page 73: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Listen to non-experts

•  Explain ‘why’, to a non-expert (‘grandpa’) •  (and, even harder, explain ‘how’ – e.g.:

– Frobenious Perron for irreducible MC

KDD'10 C. Faloutsos 73

Page 74: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Listen to non-experts

•  Explain ‘why’, to a non-expert (‘grandpa’) •  (and, even harder, explain ‘how’ – e.g.:

– Frobenious Perron for irreducible MC -> pageRank -> random surfer

KDD'10 C. Faloutsos 74

Page 75: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Summary •  Data mining = compression = undecidable =

job security •  Hence: always room for better models/

patterns

– Listen to the data (Gb, Tb and Pb of them!) – Listen to domain experts (e.g., ¾ Kleiberg’s

law) •  Listen to non-experts (‘explain to grandpa’) KDD'10 C. Faloutsos 75

Page 76: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Compression, fun, recursion

•  The shortest, recursive joke: •  There are 3 types of data miners

KDD'10 C. Faloutsos 76

Page 77: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Compression, fun, recursion

•  The shortest, recursive joke: •  There are 3 types of data miners

– Those who can count

KDD'10 C. Faloutsos 77

Page 78: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Compression, fun, recursion

•  The shortest, recursive joke: •  There are 3 types of data miners

– Those who can count – And those who can not

KDD'10 C. Faloutsos 78

Page 79: Thank you!christos/TALKS/10-KDD-award/Faloutsos10IA.pdf · • Ricardo Baeza-Yates (Yahoo! Research) • Albert-Laszlo Barabasi (Northeastern University) • Denilson Barbosa (University

CMU SCS

Thank you! For the honor, and for making this wonderful research community

KDD'10 C. Faloutsos 79