Source: user.math.uzh.ch/hothorn/talks/big_data_science_UZH_2014.pdf

EBPI Epidemiology, Biostatistics and Prevention Institute

Big Data Science

Torsten Hothorn

2014-03-31

The end of theory

The End of Theory: The Data Deluge Makes the Scientific Method Obsolete (Chris Anderson, Wired Magazine 16.07)

Petabytes allow us to say: “Correlation is enough.”

University of Zurich, EBPI 2014-03-31 Big Data Science Page 2

Big data science

– Big data

– Data science

– Predictive modelling

– Business intelligence

– Machine learning

– (parts of) Artificial intelligence; neural networks

– (parts of) Pattern recognition

– Knowledge discovery in data (KDD)

– ...

Big data science

– Big data revolution

– Data science

– Predictive modelling

– Business intelligence

– Machine learning

– (parts of) Artificial intelligence; neural networks

– (parts of) Pattern recognition

– Knowledge discovery in data (KDD)

– ...

“Big data” in journal titles

[Figure: number of papers with “big data” in the title (y-axis, 0 to 300) by year published (x-axis, 2004 to 2013).]

(Source: Web of Science)

But what about ...

Statistics?

Interestingly, Anderson's article starts with the famous quote of George Box

All models are wrong, but some are useful.

Anderson uses the term “statistic*” 8 times in his 1336-word article.

So, what is the connection between big data etc. and statistics now, and what is the future of statistics?

Whoever wishes to foresee the future must consult the past. (Machiavelli)

Statistics

Statistics is the science (the art?) of collecting, analysing, interpreting and communicating data.

The word “statistics” refers to “state”

– statisticum (lat) regarding the state

– statista (ital) statesman, politician

So, originally (and, to a large extent, still today), statistics is concerned with data describing the population, economy, administration etc. of a state. This is where the “bean counter” connotation comes from.

Early Zurich statistics

Johann Heinrich Waser (statistician in Zurich, 1742-1780) published a book with the title “Swiss Blood and French Money”, containing data about the Zurich war fund, with a publisher in Göttingen (1780).

He was accused of treason, sentenced to death and executed in Zurich in 1780 (the year the NZZ was founded).

Rings a bell?

Statistics in academia

Scientists (working empirically) have

– a hypothesis/theory, and thus a (probabilistic) model

– an experiment, and thus data

Statistical methods

– use the data to estimate free parameters in the model

– assess their uncertainty

– and provide means to falsify a theory and/or to formulate a better theory

Estimation is performed by either optimisation (frequentists, this talk) or integration (Bayesians, not really today).

Models for conditional distributions

As an example, suppose a theory states that one or more explanatory variables X affect the distribution of a (so-called “response”) variable Y.

We are interested in whether and how the conditional distribution of Y given X = x

$$(Y \mid X = x) \sim P_{Y \mid X = x}$$

depends on x through a function f(x):

$$\underbrace{\xi(Y \mid X = x) = f(x)}_{\text{statistical model}} := \underbrace{\arg\min_f \mathbb{E}_{Y,X}\, \rho(Y, f(X))}_{\text{minimisation problem}}$$

Statistical decision theory

Abraham Wald (1902-1950) established statistical decision theory; in a nutshell, a statistical model is defined by the minimal expected loss $\mathbb{E}_{Y,X}(\rho(Y, f(X)))$.

Statistical decision theory is the common foundation of statistics, machine learning, neural networks, pattern recognition, KDD, etc. But the language is different in comput[er,ational] science and statistics.

Same thing, different name

Machine learning        Statistics                           Notation
supervised learning     regression                           $\xi(Y \mid X = x) = f(x)$
target variable         response variable                    $Y$
attribute, feature      explanatory variable, covariate      $X$
hypothesis              model, regression function           $f$

Machine learning        Statistics                              Notation
instances, examples     samples, observations, realisations     $(Y_i, X_i) \sim P_{Y,X},\ i = 1, \dots, n$
learning                estimation, fitting                     $\hat f = \arg\min_f \sum_{i=1}^n \rho(Y_i, f(X_i)) + \lambda\,\mathrm{pen}(f)$
classification          prediction                              $\hat f(x)$
generalisation error    risk                                    $\mathbb{E}_{Y,X}\, \rho(Y, f(X))$
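
The "learning / estimation" entry can be made concrete with a minimal sketch. It assumes squared-error loss ρ(y, f) = (y − f)², a one-parameter linear hypothesis f(x) = βx without intercept, and a ridge penalty pen(f) = β². None of these choices, nor the function names and toy data, come from the talk; they are only picked because the penalised empirical risk then has a closed-form minimiser.

```python
def penalised_risk(beta, xs, ys, lam):
    """Empirical risk sum_i rho(Y_i, f(X_i)) + lam * pen(f) for f(x) = beta * x,
    with squared-error loss and ridge penalty pen(f) = beta**2 (both assumed)."""
    loss = sum((y - beta * x) ** 2 for x, y in zip(xs, ys))
    return loss + lam * beta ** 2


def fit_ridge_1d(xs, ys, lam):
    """Minimiser of penalised_risk: setting the derivative to zero gives
    beta = sum(x * y) / (sum(x**2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Toy data (hypothetical), roughly y = 2 * x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

beta_unpenalised = fit_ridge_1d(xs, ys, lam=0.0)   # plain least squares
beta_penalised = fit_ridge_1d(xs, ys, lam=10.0)    # shrunk towards zero

print("lambda = 0: ", beta_unpenalised)
print("lambda = 10:", beta_penalised)
```

In machine-learning vocabulary this is "learning a hypothesis with regularisation"; in statistical vocabulary it is "fitting a penalised regression model". The computation is the same.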

So, what’s the difference?

ρ (and thus ξ, the optimisation problem and the optimiser) is often different, causing much confusion. For binary Y, the loss ρ may be the hinge loss, the exponential loss, or the (negative) log-density of the binomial distribution.

[Figure, two panels: loss plotted against the margin (2y − 1)f, from −3 to 3. Panel “monotone”: ρ_{0−1}, ρ_{SVM}, ρ_{exp}, ρ_{log−lik}. Panel “non-monotone”: ρ_{0−1}, ρ_{L2}, ρ_{L1}.]
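
The losses named on this slide can all be written as functions of the margin m = (2y − 1)f for y in {0, 1}. The sketch below is one common way to write them and is not taken from the talk: it assumes that f is on the log-odds scale for the binomial log-likelihood, that the L2 and L1 losses code the response as −1/+1, and that a margin of exactly 0 counts as a misclassification. The dictionary keys are illustrative labels.

```python
import math


def margin(y, f):
    """Margin (2y - 1) * f for a binary response y in {0, 1} and a real-valued score f."""
    return (2 * y - 1) * f


# Each loss is a function of the margin m = (2y - 1) * f.
LOSSES = {
    "0-1": lambda m: 1.0 if m <= 0 else 0.0,        # misclassification loss
    "SVM": lambda m: max(0.0, 1.0 - m),             # hinge loss
    "exp": lambda m: math.exp(-m),                  # exponential loss
    "log-lik": lambda m: math.log1p(math.exp(-m)),  # negative binomial log-likelihood (f = log-odds, assumed)
    "L2": lambda m: (1.0 - m) ** 2,                 # squared error, response coded -1/+1
    "L1": lambda m: abs(1.0 - m),                   # absolute error, response coded -1/+1
}

for m in (-2.0, -1.0, 0.0, 1.0, 2.0):
    row = ", ".join(f"{name}={loss(m):.3f}" for name, loss in LOSSES.items())
    print(f"margin {m:+.1f}: {row}")
```

The hinge, exponential and log-likelihood losses never increase as the margin grows, whereas L2 and L1 start to increase again once the margin exceeds 1, which is the monotone versus non-monotone distinction in the two panels.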

Traditionally, machine learners are more interested in black box classification, i.e. $\hat f(x)$ or even only $\hat Y$. They have a strong background in optimisation.

Statisticians focus on interpretation, i.e., look at

$$f(x) = x^\top \beta \quad \text{(linear model)}$$

or

$$f(x) = \sum_{j=1}^J f_j(x) \quad \text{(additive model)}$$

They have a strong background in modelling.

Some history

The median regression model

$$\rho(Y, f(X)) = |Y - f(X)| \;\Rightarrow\; f(x) = \mathrm{Median}(Y \mid X = x)$$

was suggested by Boscovic and Laplace in the late 18th century.

The optimisation problem $\hat f = \arg\min_f \sum_{i=1}^n |Y_i - f(X_i)|$ is (was?) hard to solve.
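
A minimal sketch of why absolute loss leads to the median, assuming the simplest case of a constant function f(x) = c with no covariates: a brute-force grid search over c lands on the sample median, not the mean. The toy data, the grid and the function name are illustrative assumptions, not part of the talk.

```python
import statistics


def absolute_risk(c, ys):
    """Empirical risk sum_i |Y_i - c| of the constant prediction f(x) = c."""
    return sum(abs(y - c) for y in ys)


ys = [1.0, 2.0, 2.5, 4.0, 20.0]            # toy data with one outlier
grid = [i / 100 for i in range(0, 2501)]   # candidate constants 0.00, 0.01, ..., 25.00

# Brute force: evaluate the risk at every grid point and keep the smallest.
c_best = min(grid, key=lambda c: absolute_risk(c, ys))

print("grid minimiser:", c_best)
print("sample median: ", statistics.median(ys))
print("sample mean:   ", statistics.mean(ys))
```

The absolute value has a kink at every data point, so there is no derivative to set to zero as with squared loss. That is one way to read the "hard to solve" remark; the grid search here only works because the problem is one-dimensional.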

The mean regression model

$$\rho(Y, f(X)) = |Y - f(X)|^2 \;\Rightarrow\; f(x) = \mathbb{E}(Y \mid X = x)$$

was suggested only a little later by Legendre and Gauß.

Why? Because $\hat f = \arg\min_f \sum_{i=1}^n |Y_i - f(X_i)|^2$ was relatively easy to compute with $f(x) = x^\top \beta$.

Carl-Friedrich Gauß (1777-1855), the great-grandfather of statistics, replaced a not-so-nice loss function with a nice one and suggested a fast optimisation algorithm (Gaussian elimination).

So he was actually a machine learner!

We see this pattern over and over again.
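
To illustrate the "nice loss plus fast algorithm" combination: with squared loss and a linear f, the minimisation reduces to the linear system (XᵀX)β = Xᵀy, known as the normal equations, which Gaussian elimination can solve. The following is a minimal sketch with hypothetical helper names and toy data. It illustrates the idea and is not a numerically robust solver (in practice a QR decomposition is usually preferred).

```python
def gaussian_elimination(a, b):
    """Solve the square linear system a x = b by Gaussian elimination
    with partial pivoting, followed by back substitution."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix (copy)
    for col in range(n):
        # Move the row with the largest pivot candidate into position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(X, y):
    """Least-squares coefficients beta for f(x) = x^T beta via the normal equations."""
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return gaussian_elimination(xtx, xty)


# Toy data (hypothetical): first column is the intercept, y = 1 + 2 * x exactly.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

beta = least_squares(X, y)
print("intercept and slope:", beta)
```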

Same model, different optimiser

Machine learning              Statistics
artificial neural networks    additive/nonlinear logistic regression
support vector machines       generalised mixed/additive models
boosting                      generalised additive models
decision trees                regression trees
random forests                random forests

random forests?

Working together

(cited 5189 times)

Talking to each other really helps.

What’s different in big data?

Doug Laney (2001), a META Group/Gartner (!) employee:

Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimisation.

Wikipedia has

Big data uses inductive statistics and concepts from nonlinear system identification to infer laws (regressions, nonlinear relationships, and causal effects) from large data sets to reveal relationships, dependencies, and to perform predictions of outcomes and behaviours.

In other words: Statistics for (large) data sets from multiple unplanned retrospective observational studies / sources.

Not much!

One of the most shattering examples of re-selling existing statistical technology under a new name is A/B testing.

(Source: smashingmagazine.com)

This is a permutation test, most of the time applied incorrectly. And with big data, the test will always be significant anyway.
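
A minimal sketch of a two-sample permutation test for an A/B comparison, assuming binary outcomes (1 = conversion, 0 = no conversion) and the absolute difference in conversion rates as the test statistic. The function names, the toy conversion counts and the number of permutations are illustrative assumptions, not taken from the talk. The same 10% versus 13% conversion rates are tested once with 100 and once with 2000 observations per arm.

```python
import random


def permutation_test(a, b, n_perm=500, seed=1):
    """Two-sided Monte Carlo permutation p-value for the difference in means of a and b."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of A and B under the null hypothesis
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def make_arm(n, conversions):
    """Hypothetical helper: n visitors, of which `conversions` converted."""
    return [1] * conversions + [0] * (n - conversions)


p_small = permutation_test(make_arm(100, 10), make_arm(100, 13))
p_large = permutation_test(make_arm(2000, 200), make_arm(2000, 260))

print("n = 100 per arm: ", p_small)
print("n = 2000 per arm:", p_large)
```

The effect size is identical in both runs and only the sample size changes. With enough data even a practically irrelevant difference yields a small p-value, so significance alone says little about whether variant B is worth shipping.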

What’s the technical challenge?

Problem: RAM too small for data; can’t load all the data to compute something.

This has been the rule with all data over the last 300 years, not the exception.

Solution: (Finite) sampling and assessment of variability: go back to STA101.

Good news for statisticians: you can bootstrap from the true instead of the empirical distribution.
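
One way to read the sampling remark, as a sketch under stated assumptions: treat the full data set as a stream that is never held in memory, draw a uniform random sample of fixed size with reservoir sampling (a standard technique, not named in the talk), compute the estimate on the sample, and repeat with fresh samples to see how much the estimate varies. The generator `data_stream` is a hypothetical stand-in for a large data source, and all sizes are arbitrary.

```python
import random
import statistics


def data_stream(n, seed=0):
    """Hypothetical stand-in for a data source too large for RAM: yields one value at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.gauss(10.0, 2.0)


def reservoir_sample(stream, k, rng):
    """Uniform random sample of k items from a stream of unknown length,
    using O(k) memory (reservoir sampling, Algorithm R)."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


rng = random.Random(42)
# Repeatedly sample from the full data and recompute the estimate (here: the mean).
estimates = [
    statistics.mean(reservoir_sample(data_stream(20_000), k=500, rng=rng))
    for _ in range(20)
]

print("mean of the estimates:  ", statistics.mean(estimates))
print("spread of the estimates:", statistics.stdev(estimates))
```

Because fresh samples come from the full data rather than from one small sample, the spread of the estimates directly approximates the sampling variability, without resampling from an empirical distribution.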

Bias & missings are much bigger problems

[Figure: data matrix with observations 1, 2, 3, …, i, …, n as rows and variables Y, X1, X2 as columns; legend: observed.]

Opportunities

– We may have enough data to model the whole conditional distribution $P_{Y \mid X = x}$ and not just some real-valued functional $\xi(P_{Y \mid X = x})$ like the mean, for example by conditional transformation models (Hothorn, Kneib, Bühlmann, 2014).

– This allows probabilistic forecasts (Gneiting & Katzfuss, 2014).

– Funny: In biometry, Kaplan-Meier estimates and (to a certain extent) the Cox model for survival times always looked at the whole conditional distribution!

– Big data instead of meta-analysis: The PRO-ACT database has time-course information of more than 8500 ALS patients from multiple clinical trials. Use this pooled data to model ALS disease progression (Hothorn & Jung, 2014) instead of somehow merging multiple analyses.

– Merge different data sources (police records, road information systems, weather records, satellite images, browsing surveys) to model spatial and temporal distribution of wildlife-vehicle collisions (Hothorn et al., 2012).

Can we learn something?

– Statisticians are rather hesitant about new models and techniques, partly because they are educated and employed for policing science (sample size? power? analysis plan? significance?).

– In the 1990s, statisticians lost track of microbiology; now there is bioinformatics.

– However, it seems statisticians are still needed. Think (lack of) reproducibility (Lancet Jan 11 series “Increasing value, reducing waste”; Hothorn & Leisch, 2011).

– Is p < .05 necessary and sufficient for reproducibility?

– Statistics needs better marketing. The trademark of my own field, biometry, was hijacked by people scanning fingerprints and irises.

And data science?

(from R-blogger Drew Conway)

Nate Silver @ twitter after his JSM 2013 talk

Data scientist is just a sexed up word for statistician.

Just do good work and call yourself whatever you want.

Just make sure your grant agency gets the point.

Thank you very much!

References

Hothorn & Leisch (2011) http://dx.doi.org/10.1093/bib/bbq084

Hothorn, Brandl & Müller (2012) http://dx.doi.org/10.1371/journal.pone.0029510

Gneiting & Katzfuss (2014) http://dx.doi.org/10.1146/annurev-statistics-062713-085831

Hothorn & Jung (2014) http://dx.doi.org/10.3109/21678421.2014.893361

Hothorn, Kneib & Bühlmann (2014) http://dx.doi.org/10.1111/rssb.12017
