Applied Statistics – Challenges and Reward

Post on 13-Jan-2016

1

Applied Statistics – Challenges and Reward

Wenjiang Fu, Ph.D

Computational Genomics Lab, Department of Epidemiology

Michigan State University

fuw@msu.edu www.msu.edu/~fuw

2

What is Statistics?

“Lies, Damned Lies, and Statistics”

“Figures fool when fools figure”

A branch of mathematical science that studies data through probability distributions and modeling.

Fields: probability theory, actuarial science, biostatistics, finance statistics, industrial statistics, etc.

Related fields: biometrics, bioinformatics, geo-statistics, statistical mechanics, econometrics, etc.

3

Grand challenges we are facing …

[Diagram linking “Data”, Knowledge & Information, and Decision, labeled Statistics]

The 21st century will be the golden age of statistics!

4

Grand challenges we are facing …

1. Data collection technology has advanced dramatically, but without sufficient statistical sampling design and experimental design.

2. Advancement of technology for discovering and retrieving useful information has been lagging and has become the bottleneck.

3. More sophisticated approaches are needed for decision making and risk management.

5

Statistical Challenges -- Massive Amount of Data

6

Statistical Challenges – Image Data

7

Statistical Challenges – Functional Data, Graph (Network) Data, and Shape Data

8

Statistical Challenges – Click Stream Data

9

Statistical Challenges – Data Fusion and Assimilation

10

Statistics in Science

Cosmic microwave background radiation

High Energy Physics

Tick-by-tick stock data

Genomic/proteomic data

11

Statistics in Science

Fingerprints

Microarray

12

What do we do?

New ways of thinking and attacking problems

Finding sub-optimal but computationally feasible solutions.

New paradigm for new types of data

Be satisfied with ‘very rough’ approximations

Turn research results into easy-to-use, publicly available software and programs

Join forces with computer scientists.

13

Some ‘hot’ research directions

Dimension reduction

Visualization

Dynamic systems

Simulation and real time computation

Uncertainty and risk management

Interdisciplinary research

14

Example 1. Sociology data

Homicide Arrest Rate (per 10^5) (R. O'Brien, 2000)

Age    1960    1965    1970    1975    1980    1985    1990    1995
15     8.89    9.07   17.22   17.54   18.02   16.32   36.52   35.24
20    14.00   15.18   23.76   25.62   23.95   21.11   29.10   32.34
25    13.45   14.69   20.09   21.05   18.91   16.79   17.99   16.75
30    10.73   11.70   16.00   15.81   15.22   12.59   12.44   10.05
35     9.37    9.76   13.13   12.83   12.31    9.60    9.38    7.27
40     6.48    7.41   10.10   10.52    8.79    7.50    6.81    5.48
45     5.71    5.56    7.51    7.32    6.76    5.31    5.17    3.67
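Each cell of this table is indexed by an age (row) and a period (column), so a birth cohort can be read off each cell as period minus age. The sketch below illustrates that bookkeeping only; it assumes the row labels are ages in years, and the helper name `birth_cohort` is hypothetical rather than anything from the talk.

```python
# Ages (rows) and periods (columns) as listed in the homicide arrest table.
ages = [15, 20, 25, 30, 35, 40, 45]
periods = [1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995]


def birth_cohort(age, period):
    """Approximate birth year of people who are `age` years old in `period`.

    Hypothetical helper for illustration; assumes row labels are ages in years.
    """
    return period - age


# Cells on the same diagonal of the table share a birth cohort.
cohorts = sorted({birth_cohort(a, p) for a in ages for p in periods})

print(cohorts[0], cohorts[-1], len(cohorts))  # 1915 1980 14
```

Because cohort is an exact function of age and period, a model with separate age, period, and cohort effects needs an extra constraint before the three sets of effects can be estimated separately.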

15

Result through statistical modeling

[Figure, three panels with effects ranging from about -0.5 to 1.0: Age trend (age effect vs. age, 15 to 45); Period trend (period effect vs. period, 1960 to 1990); Cohort trend (cohort effect vs. cohort, 1920 to 1980)]

16

Example 2. Epidemiological study data

Mortality from Cervical Cancer in Ontario 1960-94: Rate (per 10^5 person-years) and Frequency

Each cell shows the rate followed by the frequency in parentheses.

Age     Year: 60-64  65-69        70-74        75-79        80-84        85-89        90-94
20-24   0.15 (2)     0.11 (2)     0.15 (3)     0.14 (3)     0.14 (3)     0.20 (4)     0.13 (1)
25-29   1.22 (14)    0.52 (8)     1.24 (23)    0.80 (16)    0.88 (20)    0.47 (11)    0.93 (8)
30-34   3.15 (35)    2.94 (37)    2.01 (32)    1.45 (27)    1.79 (38)    1.31 (32)    1.08 (11)
35-39   5.38 (62)    4.47 (52)    3.59 (46)    3.86 (61)    3.12 (60)    2.47 (55)    2.16 (21)
40-44   9.80 (116)   7.15 (84)    4.32 (51)    5.12 (66)    3.71 (60)    2.47 (63)    2.16 (33)
45-49   15.66 (160)  10.97 (130)  7.75 (91)    4.69 (55)    5.17 (67)    5.02 (83)    3.41 (27)
50-54   17.01 (151)  13.32 (138)  8.19 (97)    6.82 (80)    6.12 (72)    4.65 (61)    5.79 (35)
55-59   18.56 (141)  15.23 (133)  11.53 (118)  9.12 (107)   5.94 (70)    5.81 (69)    5.77 (29)
60-64   22.44 (144)  16.08 (121)  13.66 (117)  10.71 (108)  7.93 (92)    7.35 (86)    4.02 (19)
65-69   23.53 (128)  18.87 (119)  15.31 (112)  13.79 (115)  10.36 (102)  7.60 (86)    6.83 (31)
70-74   25.89 (116)  19.36 (97)   15.36 (89)   15.18 (103)  13.95 (108)  10.42 (96)   10.44 (44)
75-79   29.12 (94)   20.08 (75)   23.84 (102)  16.29 (82)   14.90 (88)   11.50 (78)   12.73 (38)
80-84   31.76 (62)   24.72 (59)   21.51 (60)   23.82 (79)   12.69 (50)   17.40 (81)   12.77 (27)
85 +    33.16 (42)   28.95 (50)   22.90 (50)   24.94 (68)   15.23 (51)   13.88 (56)   10.42 (19)
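The table pairs each rate per 10^5 person-years with a death count, and the two are tied together by the person-years at risk. The sketch below shows that arithmetic for the first cell (ages 20-24, years 60-64); the function names are hypothetical, and the implied person-years are only as precise as the rounded published rate.

```python
def rate_per_100k(deaths, person_years):
    """Crude mortality rate per 10^5 person-years."""
    return deaths / person_years * 1e5


def implied_person_years(rate_per_1e5, deaths):
    """Person-years at risk implied by a published rate and death count."""
    return deaths / rate_per_1e5 * 1e5


# First cell of the table: rate 0.15 per 10^5 person-years with 2 deaths.
py = implied_person_years(0.15, 2)
print(round(py))                       # roughly 1.33 million person-years
print(round(rate_per_100k(2, py), 2))  # recovers the published rate
```

With counts this small in the youngest age groups, the rates are noisy, which is one reason to model the counts rather than compare raw rates cell by cell.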

17

Results from statistical modeling

[Figure, three panels with effects ranging from about -3 to 1: Age trend, 95% CI (age effect vs. age, 20 to 80); Period trend, 95% CI (period effect vs. period, 1960 to 1990); Cohort trend, 95% CI (cohort effect vs. cohort, 1880 to 1960)]

18

Example 3. Medical study data: Ob/Gyn

Modeling of PlGF: Placental Growth Factor

19

SNP: Single Nucleotide Polymorphism

Homologous pairs of chromosomes, each with a paternal allele and a maternal allele

ACGAACAGCT   TGCTTGTCGA
ACGAGCAGCT   TGCTCGTCGA

SNP A/G
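A SNP is a single position at which two otherwise identical aligned sequences differ. The sketch below finds such positions, assuming the slide's sequences are aligned 10-base fragments (with the first sequence line holding two fragments side by side); `snp_sites` is a hypothetical helper name.

```python
def snp_sites(seq1, seq2):
    """Return (position, base1, base2) wherever two aligned sequences differ.

    Positions are 1-based. Hypothetical helper for illustration only.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i + 1, b1, b2)
        for i, (b1, b2) in enumerate(zip(seq1, seq2))
        if b1 != b2
    ]


# Assumption: each pair below is one fragment from each chromosome of a pair.
print(snp_sites("ACGAACAGCT", "ACGAGCAGCT"))  # [(5, 'A', 'G')]
print(snp_sites("TGCTTGTCGA", "TGCTCGTCGA"))  # [(5, 'T', 'C')]
```

The first comparison reproduces the A/G SNP named on the slide; under the same assumption, the second pair differs by T/C at the same position.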

20

The International HapMap Consortium (Nature 2003)

21

Allele, Haplotype and Diplotype

SNP 1: two alleles A and a

SNP 2: two alleles B and b

Haplotype [AB]

Haplotype [ab]

Diplotype [AB][ab]
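With two biallelic SNPs, the possible haplotypes and diplotypes can be enumerated directly. The sketch below does so and also lists the diplotypes that are consistent with a genotype that is heterozygous at both SNPs; the `genotype` helper is hypothetical and only illustrates the definitions on this slide.

```python
from itertools import combinations_with_replacement, product

snp1 = ["A", "a"]
snp2 = ["B", "b"]

# A haplotype carries one allele of each SNP on the same chromosome.
haplotypes = ["".join(h) for h in product(snp1, snp2)]

# A diplotype is an unordered pair of haplotypes, one per chromosome.
diplotypes = list(combinations_with_replacement(haplotypes, 2))


def genotype(diplotype):
    """Unphased genotype: the sorted pair of alleles observed at each SNP."""
    h1, h2 = diplotype
    return tuple("".join(sorted(h1[i] + h2[i])) for i in range(2))


# Diplotypes consistent with being heterozygous at both SNPs.
ambiguous = [d for d in diplotypes if genotype(d) == ("Aa", "Bb")]

print(haplotypes)       # ['AB', 'Ab', 'aB', 'ab']
print(len(diplotypes))  # 10
print(ambiguous)        # [('AB', 'ab'), ('Ab', 'aB')]
```

The last line shows why haplotypes generally have to be inferred statistically: the unphased genotype (Aa, Bb) fits both [AB][ab] and [Ab][aB].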

22

Microarray Technology: 2 channels

Hybridization:

A T C G T A G

| | | | | | |

T A G C A T C
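The pairing drawn above follows the base-pairing rule A with T and C with G. A minimal sketch of that rule, with the strand written position by position as on the slide (not reversed); the names `COMPLEMENT` and `complement` are illustrative.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the base that pairs with each position of `strand`."""
    return "".join(COMPLEMENT[base] for base in strand)


print(complement("ATCGTAG"))  # TAGCATC
```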

23

Microarray normalization: between slides

Boxplots of log ratios from 3 replicate self-self hybridizations.

Left panel: before normalization

Middle panel: after within print-tip group normalization

Right panel: after a further between-slide scale normalization.
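Between-slide scale normalization aims to give the log ratios of every slide a comparable spread. The sketch below is one common way to do this, rescaling each slide so that all slides share the same median absolute deviation (MAD); the talk does not say which estimator was used, and the log ratios here are made-up numbers, not data from the slides.

```python
from statistics import median


def mad(values):
    """Median absolute deviation from the median."""
    m = median(values)
    return median(abs(v - m) for v in values)


def scale_normalize(slides):
    """Rescale each slide's log ratios so that all slides share one MAD.

    Assumes every slide has a non-zero MAD.
    """
    mads = [mad(s) for s in slides]
    # Use the geometric mean of the per-slide MADs as the common scale.
    target = 1.0
    for m in mads:
        target *= m
    target **= 1.0 / len(mads)
    return [[v * target / m for v in s] for s, m in zip(slides, mads)]


# Hypothetical log ratios for three replicate slides (not from the talk).
slides = [
    [-0.2, -0.1, 0.0, 0.1, 0.2],
    [-0.8, -0.4, 0.0, 0.4, 0.8],
    [-0.4, -0.2, 0.0, 0.2, 0.4],
]
normalized = scale_normalize(slides)
print([round(mad(s), 3) for s in normalized])  # the three spreads now agree
```

In a boxplot, this corresponds to the boxes of the three slides having the same height after normalization.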

24

Affymetrix SNP Array

Illustration of SNP annotation on Affymetrix SNP array.

Adapted from Matsuzaki et al. 2004.

Example ‘AB’ SNP with alleles A/C: allele A is base A, allele B is base C.

25

Computational Genomics Data: SNP Genotype

Error rate: 1–5%. GIGO: garbage in, garbage out.

26

Computational Genomics Data: SNP Genotype

27

Prospects I: Genome-oriented Medicine

Genetic variation influences

- disease susceptibility
- disease progression
- therapeutic response
- unwanted drug effects

Genetics is pointing the way to personalized medicine…

With the development of the human HapMap project, coupled with advanced statistical approaches, we are entering an era of designing personalized medicine based on an individual’s genetic profile.

28

Whole Genome-wide Association Studies

29

Whole Genome-wide Association Studies

Successful study:

Wellcome Trust Case-Control Consortium

GWAS of 7 common diseases with 14,000 cases and 3,000 shared controls (Nature 2007).

Hypertension, diabetes, etc.

30

Recruiting Graduate Students

Epidemiology: study of the distribution of disease;

Biostatistics: data modeling, computation;

Quantitative Biology Initiative: MSU cross-disciplinary center.

Background: Mathematics, Statistics, Physics, Biology, Chemistry, and others.

Opportunity: Contact your department's graduate director or chair about funding from the Ministry of Education. MSU Epi/Biostatistics provides partial funding and covers tuition fees.

Qualifications: TOEFL, GRE, GPA, reference letters.

My contact: fuw@msu.edu www.msu.edu/~fuw

Application: WWW.MSU.EDU

31

Thank you!

Q and A.

Office: CMS 415.
