
Post on 13-Jan-2016


Page 1: Applied Statistics  – Challenges and Reward

Applied Statistics – Challenges and Reward

Wenjiang Fu, Ph.D

Computational Genomics Lab, Department of Epidemiology

Michigan State University

[email protected] www.msu.edu/~fuw

Page 2: Applied Statistics  – Challenges and Reward

What is Statistics?

“Lies, Damned Lies, and Statistics”

“Figures fool when fools figure”

A branch of the mathematical sciences that studies data through probability distributions and modeling.

Fields: probability theory, actuarial science, biostatistics, financial statistics, industrial statistics, etc.

Related fields: biometrics, bioinformatics, geo-statistics, statistical mechanics, econometrics, etc.

Page 3: Applied Statistics  – Challenges and Reward

Grand challenges we are facing …

[Diagram labels: “Data”, Knowledge & Information, Decision, Statistics]

The 21st century will be the golden age of statistics!

Page 4: Applied Statistics  – Challenges and Reward

Grand challenges we are facing …

1. Data collection technology has advanced dramatically, but without sufficient statistical sampling design and experimental design.

2. Advancement of technology for discovering and retrieving useful information has been lagging and has become the bottleneck.

3. More sophisticated approaches are needed for decision making and risk management.

Page 5: Applied Statistics  – Challenges and Reward

Statistical Challenges – Massive Amount of Data

Page 6: Applied Statistics  – Challenges and Reward

Statistical Challenges – Image Data

Page 7: Applied Statistics  – Challenges and Reward

Statistical Challenges – Functional Data, Graph (Network) Data, and Shape Data

Page 8: Applied Statistics  – Challenges and Reward

Statistical Challenges – Click Stream Data

Page 9: Applied Statistics  – Challenges and Reward

Statistical Challenges – Data Fusion and Assimilation

Page 10: Applied Statistics  – Challenges and Reward

Statistics in Science

Cosmic microwave background radiation

High Energy Physics

Tick-by-tick stock data

Genomic/proteomic data

Page 11: Applied Statistics  – Challenges and Reward

Statistics in Science

Fingerprints

Microarray

Page 12: Applied Statistics  – Challenges and Reward

What do we do?

New ways of thinking and attacking problems

Finding sub-optimal but computationally feasible solutions.

New paradigm for new types of data

Be satisfied with ‘very rough’ approximations

Turn research results into easy-to-use, publicly available software and programs

Join forces with computer scientists.

Page 13: Applied Statistics  – Challenges and Reward

Some ‘hot’ research directions

Dimension reduction

Visualization

Dynamic systems

Simulation and real time computation

Uncertainty and risk management

Interdisciplinary research

Page 14: Applied Statistics  – Challenges and Reward

Example 1. Sociology data

Homicide Arrest Rate (per 10^5) (R. O'Brien, 2000)

Age     1960   1965   1970   1975   1980   1985   1990   1995
15      8.89   9.07  17.22  17.54  18.02  16.32  36.52  35.24
20     14.00  15.18  23.76  25.62  23.95  21.11  29.10  32.34
25     13.45  14.69  20.09  21.05  18.91  16.79  17.99  16.75
30     10.73  11.70  16.00  15.81  15.22  12.59  12.44  10.05
35      9.37   9.76  13.13  12.83  12.31   9.60   9.38   7.27
40      6.48   7.41  10.10  10.52   8.79   7.50   6.81   5.48
45      5.71   5.56   7.51   7.32   6.76   5.31   5.17   3.67
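The age-by-period layout of this table also fixes a birth cohort for every cell, since cohort = period − age. The sketch below regroups the rates by cohort. It assumes the row and column labels mark the start of each five-year interval, and the function name cohort_table is made up for illustration. The slides do not say which model produced the trend estimates, so this shows only the data layout, not the author's method.

```python
# Rates copied from the homicide arrest table (per 10^5).
ages = [15, 20, 25, 30, 35, 40, 45]
periods = [1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995]
rates = [
    [8.89, 9.07, 17.22, 17.54, 18.02, 16.32, 36.52, 35.24],
    [14.00, 15.18, 23.76, 25.62, 23.95, 21.11, 29.10, 32.34],
    [13.45, 14.69, 20.09, 21.05, 18.91, 16.79, 17.99, 16.75],
    [10.73, 11.70, 16.00, 15.81, 15.22, 12.59, 12.44, 10.05],
    [9.37, 9.76, 13.13, 12.83, 12.31, 9.60, 9.38, 7.27],
    [6.48, 7.41, 10.10, 10.52, 8.79, 7.50, 6.81, 5.48],
    [5.71, 5.56, 7.51, 7.32, 6.76, 5.31, 5.17, 3.67],
]


def cohort_table(ages, periods, rates):
    """Group (age, rate) cells by birth cohort, defined as period minus age."""
    by_cohort = {}
    for i, age in enumerate(ages):
        for j, period in enumerate(periods):
            by_cohort.setdefault(period - age, []).append((age, rates[i][j]))
    return dict(sorted(by_cohort.items()))


cohorts = cohort_table(ages, periods, rates)
for birth_year, cells in cohorts.items():
    print(birth_year, cells)
```

With 7 age groups and 8 periods there are 7 + 8 − 1 = 14 cohorts, each running along a diagonal of the table. Because cohort is an exact linear function of age and period, the linear trends of the three effects cannot be separated without an additional constraint, which is the well-known identification problem in age-period-cohort analysis.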

Page 15: Applied Statistics  – Challenges and Reward

Result through statistical modeling

[Figure: three panels. Age trend (age effect vs. age, 15 to 45); Period trend (period effect vs. period, 1960 to 1990); Cohort trend (cohort effect vs. cohort, 1920 to 1980). Effect axes run from -0.5 to 1.0.]

Page 16: Applied Statistics  – Challenges and Reward

Example 2. Epidemiological study data

Mortality from Cervical Cancer in Ontario 1960-94: Rate (per 10^5 person-years) and Frequency

Each cell shows the rate followed by the frequency in parentheses.

Age    1960-64      1965-69      1970-74      1975-79      1980-84      1985-89      1990-94
20-24  0.15 (2)     0.11 (2)     0.15 (3)     0.14 (3)     0.14 (3)     0.20 (4)     0.13 (1)
25-29  1.22 (14)    0.52 (8)     1.24 (23)    0.80 (16)    0.88 (20)    0.47 (11)    0.93 (8)
30-34  3.15 (35)    2.94 (37)    2.01 (32)    1.45 (27)    1.79 (38)    1.31 (32)    1.08 (11)
35-39  5.38 (62)    4.47 (52)    3.59 (46)    3.86 (61)    3.12 (60)    2.47 (55)    2.16 (21)
40-44  9.80 (116)   7.15 (84)    4.32 (51)    5.12 (66)    3.71 (60)    2.47 (63)    2.16 (33)
45-49  15.66 (160)  10.97 (130)  7.75 (91)    4.69 (55)    5.17 (67)    5.02 (83)    3.41 (27)
50-54  17.01 (151)  13.32 (138)  8.19 (97)    6.82 (80)    6.12 (72)    4.65 (61)    5.79 (35)
55-59  18.56 (141)  15.23 (133)  11.53 (118)  9.12 (107)   5.94 (70)    5.81 (69)    5.77 (29)
60-64  22.44 (144)  16.08 (121)  13.66 (117)  10.71 (108)  7.93 (92)    7.35 (86)    4.02 (19)
65-69  23.53 (128)  18.87 (119)  15.31 (112)  13.79 (115)  10.36 (102)  7.60 (86)    6.83 (31)
70-74  25.89 (116)  19.36 (97)   15.36 (89)   15.18 (103)  13.95 (108)  10.42 (96)   10.44 (44)
75-79  29.12 (94)   20.08 (75)   23.84 (102)  16.29 (82)   14.90 (88)   11.50 (78)   12.73 (38)
80-84  31.76 (62)   24.72 (59)   21.51 (60)   23.82 (79)   12.69 (50)   17.40 (81)   12.77 (27)
85+    33.16 (42)   28.95 (50)   22.90 (50)   24.94 (68)   15.23 (51)   13.88 (56)   10.42 (19)
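A rate per 10^5 person-years is the death count divided by the person-years at risk, times 10^5. The sketch below applies that relation to one cell of the table to recover the approximate person-years behind it. The function names are illustrative, and the result is only approximate because the published rates are rounded.

```python
def rate_per_100k(deaths, person_years):
    """Mortality rate per 10^5 person-years."""
    return deaths / person_years * 1e5


def implied_person_years(deaths, rate):
    """Person-years at risk implied by a death count and a (rounded) rate."""
    return deaths / rate * 1e5


# Ages 20-24, 1960-64: 2 deaths at a rate of 0.15 per 10^5 person-years.
person_years = implied_person_years(2, 0.15)
print(round(person_years))                             # about 1.33 million
print(round(rate_per_100k(2, person_years), 2))        # recovers the rate
```

Small counts such as these are one reason such tables are usually modeled with count-based methods rather than by comparing raw rates cell by cell.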

Page 17: Applied Statistics  – Challenges and Reward

Results from statistical modeling

[Figure: three panels, each with a 95% CI. Age trend (age effect vs. age, 20 to 80); Period trend (period effect vs. period, 1960 to 1990); Cohort trend (cohort effect vs. cohort, 1880 to 1960). Effect axes run from -3 to 1.]

Page 18: Applied Statistics  – Challenges and Reward

Example 3. Medical study data: Ob/Gyn

Modeling of PlGF: Placental Growth Factor

Page 19: Applied Statistics  – Challenges and Reward

SNP: Single Nucleotide Polymorphism

Homologous pairs of chromosomes

Paternal allele

Maternal allele

Paternal allele

Maternal allele

ACGAACAGCTTGCTTGTCGA

ACGAGCAGCT

TGCTCGTCGA

SNP A/G
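A SNP is a single position at which two otherwise identical sequences differ. The sketch below locates such positions by direct comparison. It assumes the first ten bases of the longer sequence on this slide align with the ten-base fragment ACGAGCAGCT, which is a guess about the original figure layout. The function name find_snps is illustrative.

```python
def find_snps(seq1, seq2):
    """Return (1-based position, base in seq1, base in seq2) for each mismatch."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (pos, b1, b2)
        for pos, (b1, b2) in enumerate(zip(seq1, seq2), start=1)
        if b1 != b2
    ]


# Assumed alignment of two fragments from the slide.
print(find_snps("ACGAACAGCT", "ACGAGCAGCT"))
```

Under that alignment the only mismatch is at position 5, A versus G, which agrees with the slide's "SNP A/G" label.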

Page 20: Applied Statistics  – Challenges and Reward

The International HapMap Consortium (Nature 2003)

Page 21: Applied Statistics  – Challenges and Reward

Allele, Haplotype and Diplotype

SNP 1: two alleles A and a

SNP 2: two alleles B and b

Haplotype [AB]

Haplotype [ab]

Diplotype [AB][ab]
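Genotyping reports the alleles at each SNP but not which chromosome carries them, so more than one diplotype can fit the same genotypes. The sketch below enumerates the diplotypes consistent with a set of unphased genotypes. The function name and the two-character genotype encoding are assumptions made for illustration.

```python
from itertools import product


def compatible_diplotypes(genotypes):
    """List the haplotype pairs consistent with unphased genotypes.

    genotypes: one two-character string per SNP, e.g. ["Aa", "Bb"].
    """
    diplotypes = set()
    # At each SNP, choose which of the two alleles sits on the first haplotype.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        # The order of the two haplotypes does not matter.
        diplotypes.add(tuple(sorted((hap1, hap2))))
    return sorted(diplotypes)


print(compatible_diplotypes(["Aa", "Bb"]))  # heterozygous at both SNPs
print(compatible_diplotypes(["AA", "Bb"]))  # heterozygous at one SNP only
```

An individual heterozygous at both SNPs is consistent with either [AB][ab] or [Ab][aB], while an individual heterozygous at only one SNP has a single possible diplotype. Resolving this ambiguity is the phasing problem that haplotype inference methods address.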

Page 22: Applied Statistics  – Challenges and Reward

Microarray Technology: 2 channels

Hybridization:

A T C G T A G

| | | | | | |

T A G C A T C
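The pairing shown above follows the Watson-Crick rule: A pairs with T and C pairs with G. A minimal sketch, with an illustrative function name, that derives the bound strand from the probe strand:

```python
# Watson-Crick base pairing for DNA.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand)


print(complement("ATCGTAG"))
```

For the strand on the slide this gives TAGCATC, the lower sequence in the diagram.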

Page 23: Applied Statistics  – Challenges and Reward

Microarray normalization: between slides

Boxplots of log ratios from 3 replicate self-self hybridizations. Left panel: before normalization. Middle panel: after within print-tip group normalization. Right panel: after a further between-slide scale normalization.
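One common way to do between-slide scale normalization is to rescale each slide's log ratios so that every slide has the same median absolute deviation (MAD). The sketch below does this under several assumptions: the log ratios are already centered near zero by within-slide normalization, the common target is the geometric mean of the per-slide MADs, and the input values are made up. The slides do not say which scale estimator was used for the boxplots.

```python
import math
from statistics import median


def mad(values):
    """Median absolute deviation of a list of numbers."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def scale_normalize(slides):
    """Rescale each slide's log ratios so that all slides share one MAD."""
    mads = [mad(slide) for slide in slides]
    # The geometric mean of the per-slide MADs is the common target scale.
    target = math.exp(sum(math.log(m) for m in mads) / len(mads))
    return [[v * target / m for v in slide] for slide, m in zip(slides, mads)]


# Hypothetical log ratios from three replicate slides (not real data).
slides = [
    [-0.2, -0.1, 0.0, 0.1, 0.2],
    [-0.4, -0.2, 0.0, 0.2, 0.4],
    [-0.1, -0.05, 0.0, 0.05, 0.1],
]
normalized = scale_normalize(slides)
print([round(mad(s), 3) for s in normalized])
```

After rescaling, the three slides have equal spread, which is what the right panel of the boxplot figure is meant to show.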

Page 24: Applied Statistics  – Challenges and Reward

Affymetrix SNP Array

Illustration of SNP annotation on Affymetrix SNP array.

Adapted from Matsuzaki et al. 2004.

‘AB’ coding for an A/C SNP: allele A is base A, allele B is base C.

Page 25: Applied Statistics  – Challenges and Reward

Computational Genomics Data: SNP Genotype

Error rate: 1–5%. GIGO: garbage in, garbage out.

Page 26: Applied Statistics  – Challenges and Reward

Computational Genomics Data: SNP Genotype

Page 27: Applied Statistics  – Challenges and Reward

Prospects I: Genome-oriented Medicine

Genetic variation influences

- disease susceptibility
- disease progression
- therapeutic response
- unwanted drug effects

Genetics is pointing the way to personalized medicine…

With the development of the human HapMap project, coupled with advanced statistical approaches, we are entering an era in which medicine can be personalized to an individual’s genetic profile.

Page 28: Applied Statistics  – Challenges and Reward

Whole Genome-wide Association Studies

Page 29: Applied Statistics  – Challenges and Reward

Whole Genome-wide Association Studies

Successful study:

Wellcome Trust Case-Control Consortium

GWAS on 7 diseases with 14,000 patients and 3,000 shared controls. (Nature 2007)

Hypertension, diabetes, etc.

Page 30: Applied Statistics  – Challenges and Reward

Recruiting Graduate Students

Epidemiology: study of the distribution of disease;

Biostatistics: data modeling, computation;

Quantitative Biology Initiative: MSU cross-disciplinary center.

Background: Mathematics, Statistics, Physics, Biology, Chemistry, and others.

Opportunity: Contact your department's graduate director or chairman about funding from the Ministry of Education. MSU Epi/Biostatistics provides partial funding and covers tuition fees.

Qualification: TOEFL, GRE, GPA, reference letters.

My contact: [email protected] www.msu.edu/~fuw

Application: WWW.MSU.EDU

Page 31: Applied Statistics  – Challenges and Reward


Thank you!

Q and A.

Office: CMS 415.