statistics for data science · statistics for data science msc data science wise 2019/20 prof. dr....

26
Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1

Upload: others

Post on 18-Oct-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Statistics for Data Science

MSc Data Science WiSe 2019/20

Prof. Dr. Dirk Ostwald

1

Page 2: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Introduction

2

Page 3: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Data science

Computational Cognitive Neuroscience

3

Page 4: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Initial MSc program proposal 11/2016

1

2

Referenzen

4

Page 5: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Data science

Data science

The art of creating meaning from data

5

Page 6: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Data science

6

Page 7: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Data science

Data analysis is data reduction

Raw Data Reported Data

7

Page 8: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Data science

Data analysis is model-based

Model Formulation

Model estimation

Model Evaluation

DataReality

The Scienti-c Method

8

Page 9: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Data science

Data analysis is an interpretative device

9

Page 10: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Data science

Data science = Statistics = Machine learning = Artificial intelligence?

Statistics• Probabilistic models

• Theoretical analysis

• Optimality

• Asymptotics

• Science philosophy

Machine learning

• Deterministic models

• Classification

• Bayesian models

• Benchmarking

• Applications

AI (est. 2015)

• Deep learning

• Reinforcement

• Machine learning

• Data analysis

• Hype

10

Page 11: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Statistics

Statistics

The art of creating meaning from data

and quantifying its associated uncertainty

11

Page 12: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Statistics

Themes

12

Page 13: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Data science

Historical development of statistics

13

Page 14: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Statistics

Historical development of statistics

1900 Karl Pearson’s chi-square test

1908 Student’s t statistic

1925 Fisher’s Statistical Methods for Research Workers

1933 Neyman and Pearson’s optimal hypothesis testing

1937 Neyman’s confidence intervals

1950 Wald’s statistical decision theory

1950 Savage’s & de Finetti’s Bayesian decision theory

1961 Raiffa & Schlaifer’s Applied statistical decision theory

1962 Tukey’s The future of data analysis

1971 Lindley’s Bayesian statistics

1972 Cox’s proportional hazards

1979 Bootstrap and MCMC

1995 False discovery rate and LASSO

1996 Support vector machines

2000 Microarray and neuroimaging multiple testing

2010 Resurgence of neural networks as deep learning

2015 Data science

14

Page 15: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Statistics

Course outline

Section Topic Aims

Probability (1) Probability spaces A common language

(2) Random variables A common language

(3) Joint distributions A common language

(4) Random variable transformations Core mechanism of statistics

(5) Expectation, covariance, limits Making good predictions

Frequentism (6) Foundations, maximum likelihood Creating estimators

(7) Finite-sample estimator properties How good are estimators?

(8) Asymptotic estimator properties How good are estimators?

(9) Confidence intervals How good are estimates?

(10) Hypothesis testing Making good decisions

(11) Nonparametric inference Subduing parametrics

Bayesianism (12) Foundations, conjugate inference Probabilistic inference

(13) Bayesian filtering Dynamic inference

(14) Prior distributions Objective subjectivism

(15) Markov chain Monte Carlo Approximate inference

(16) Variational inference Approximate inference

15

Page 16: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Statistics

Some typical Machine Learning topics

• Principal and independent component analysis

• Logistic regression and linear discriminants

• Support vector machines

• Kernel methods

• Neural networks and Deep learning

• Gaussian process regression

Some typical Artificial Intelligence topics

• Reinforcement learning

• (Partially observable) Markov Decision Processes

16

Page 17: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Statistics

Community nomenclatures

Statistic Machine Learning Meaning

Data Training data Data

Estimation Learning, Training Using data to estimate parameters

Frequentist inference - Optimal many samples methods

Bayesian inference Bayesian inference Data-based uncertainty updating

Covariates Features Structural and known data predictors

17

Page 18: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Statistics

Central postulates of Probability theory

• Chance processes can be described mathematically.

• Mathematics can be used to make predictions about random events.

• Reasoning about uncertain events is naturally related to measuring volumes.

18

Page 19: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Statistics

Central postulates of Frequentist inference

• Probabilities are interpreted as limiting relative frequencies and are consid-

ered objective properties of the real world.

• Parameters are fixed, unknown constants, referred to as true, but unknown

values. No probability statements are made about parameters.

• Statistical procedures are designed to have good long run frequency prop-

erties and are typically assessed by studying their sampling distributions.

19

Page 20: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Statistics

Central postulates of Bayesian inference

• Probabilities are interpreted as degrees of belief, not limiting frequencies.

Statements like “the probability that it will rain this afternoon is 0.5” are

meaningful.

• Parameters are fixed, unknown constants, about which probabilistic state-

ments quantifying our uncertainty about their value can be made.

• Probabilistic statements about parameters are made with the help of prob-

ability distributions, from which further inferences, such as point or interval

estimates, can be derived.

20

Page 21: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Statistics

Mathematical preliminaries

• Sets and functions

• Matrix algebra

• Calculus

• Probabilistic foundations

• Gaussian distributionsProf. Dr. Dirk Ostwald

Statistical Methods 2019/20MSc Social, Cognitive, and Affective Neuroscience

21

Page 22: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Statistics

Key references

Applied Statistical Inference

Leonhard HeldDaniel Sabanés Bové

Likelihood and Bayes

22

Page 23: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Statistics

Course components

Component Aims

Vorlesung Core course content, knowledge acquisition, breadth

Ubung Active participation, knowledge consolidation, depth

Study questions Focus, memorization support, examination

Exercises (Theory) Theoretical depth, self-study, oral presentations

Exercises (Programming) Intuition, Python training, application

Exam Knowledge reproduction

23

Page 24: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Statistics

Course requirements

Vorlesung

• Written exam, pass or fail

• 30 questions, 90 min

• Study questions = exam question pool

• 17.02.2020, 06.04.2020, 20.07.20, 28.09.20

Ubung

• Attendance of 85% of the sessions

• Presentation of one theoretical or programming exercise in class

• Please sign up at bit.ly/2Vzgxx7

• Submission of Jupyter notebook of all programming exercises

24

Page 25: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Additional course of interest WiSe 2019/20

25

Page 26: Statistics for Data Science · Statistics for Data Science MSc Data Science WiSe 2019/20 Prof. Dr. Dirk Ostwald 1. Introduction 2. Data science Computational Cognitive Neuroscience

Additional course of interest SoSe 2020

26