Using large data sets to study factors associated with the incidence of multiple sclerosis


Post on 24-Feb-2016



TRANSCRIPT

Page 1: Using large data sets to study factors associated with the incidence of multiple sclerosis

Using large data sets to study factors associated with the incidence of multiple sclerosis.

Tamah Fridman, David Glick, John Kidd

Page 2: Using large data sets to study factors associated with the incidence of multiple sclerosis

Multiple Sclerosis (MS)

• A complex autoimmune disease with both acute and chronic phases.

• Confounding factors include:
o genetic background
o viral infections including EBV and HSV
o nutritional factors
o environmental factors such as latitude and smoking

Page 3: Using large data sets to study factors associated with the incidence of multiple sclerosis

Multiple Sclerosis (MS)

• In a more general way, this module could be used to explore the difference between correlation and causation.

• For use in a course, the instructor will supply appropriate background information on the immune response as applied to MS.
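The difference between correlation and causation can be made concrete with a small simulation. The sketch below is illustrative only and uses no MS data: the hidden factor `z`, the sample size, and the noise levels are all assumptions chosen for the demonstration. Two variables that never influence each other still end up correlated because both depend on `z`.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=2000, seed=0):
    """Return (xs, ys) where a hidden factor z drives both; x never affects y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)             # hypothetical hidden confounder
        xs.append(z + rng.gauss(0, 1))  # x depends on z plus its own noise
        ys.append(z + rng.gauss(0, 1))  # y depends on z plus its own noise
    return xs, ys


xs, ys = simulate()
r = pearson(xs, ys)
print(f"correlation between x and y: {r:.2f}")
```

With these settings the theoretical correlation is 0.5 even though changing `x` would have no effect on `y`. A nonzero correlation between MS rates and another variable should be read with the same caution.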

Page 4: Using large data sets to study factors associated with the incidence of multiple sclerosis

Multiple Sclerosis (MS)

• There is a vast literature examining the effects of
o geography
o migration
o infectious diseases
o sunlight related to vitamin D levels
o cigarette smoking
o diet
o hormones

Page 5: Using large data sets to study factors associated with the incidence of multiple sclerosis

Multiple Sclerosis (MS)

• Over time a number of data sets have been published that explore relationships between environmental factors and MS.

• Many of these are single studies that were later included in one or more “meta-analysis” articles.

• In addition, there are incidence statistics available from a variety of sources such as CDC, World Life Expectancy.com, WHO, and others.

Page 6: Using large data sets to study factors associated with the incidence of multiple sclerosis

Multiple Sclerosis (MS)

• To demonstrate the module’s potential, we have constructed several example analyses linking MS incidence to rainfall and viral diseases, using:
o A GIS plot
o A scatter plot
o 3-D Principal Component Analysis (PCA)

• These are based on the same data to demonstrate that large data sets can be visualized and analyzed in a variety of ways.

Page 7: Using large data sets to study factors associated with the incidence of multiple sclerosis

Multiple Sclerosis (MS)

Page 9: Using large data sets to study factors associated with the incidence of multiple sclerosis

Multiple Sclerosis (MS)

• The Excel function CORREL was used to look for correlations between MS rates and a series of viral diseases, plus one “lifestyle” disease:
o Hepatitis C: -0.0152
o Cervical cancer: -0.34991
o Liver cancer: -0.25501
o HIV: -0.1451
o Lung cancer: 0.547928

Page 10: Using large data sets to study factors associated with the incidence of multiple sclerosis

Multiple Sclerosis (MS)

Country        MS rate  Hep C rate  Cervical cancer rate  Liver cancer rate  HIV rate  Lung cancer rate
Afghanistan    0.4      3.8         2.6                   3.8                0         7.2
Albania        2.8      0.1         1.5                   6.7                0.2       31
Algeria        0.1      0.1         3.4                   1.3                2         10.6
Andorra        0.4      0.6         0.8                   4.9                0         21.6
Angola         0.2      1           12.5                  9.6                79.2      2.3
Antigua/Bar.   0        0           5.4                   5.2                19.7      8.3

This slide is a sample—the complete spreadsheet contains 192 countries.
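The correlation step can be sketched outside Excel as well. The example below computes the Pearson coefficient, which is the quantity Excel's CORREL returns, between the MS rate and each disease column. It uses only the six sample rows shown on this slide, so its output will not match the coefficients reported for all 192 countries; the names `SAMPLE`, `DISEASES`, and `correl` are illustrative.

```python
import math

# Six sample rows copied from the slide; the full spreadsheet has 192 countries.
# Columns: country, MS, Hep C, cervical cancer, liver cancer, HIV, lung cancer.
SAMPLE = [
    ("Afghanistan", 0.4, 3.8, 2.6, 3.8, 0.0, 7.2),
    ("Albania", 2.8, 0.1, 1.5, 6.7, 0.2, 31.0),
    ("Algeria", 0.1, 0.1, 3.4, 1.3, 2.0, 10.6),
    ("Andorra", 0.4, 0.6, 0.8, 4.9, 0.0, 21.6),
    ("Angola", 0.2, 1.0, 12.5, 9.6, 79.2, 2.3),
    ("Antigua/Bar.", 0.0, 0.0, 5.4, 5.2, 19.7, 8.3),
]
DISEASES = ["Hep C", "Cervical cancer", "Liver cancer", "HIV", "Lung cancer"]


def correl(xs, ys):
    """Pearson correlation coefficient, the statistic Excel's CORREL computes."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


ms = [row[1] for row in SAMPLE]
results = {}
for k, name in enumerate(DISEASES):
    other = [row[k + 2] for row in SAMPLE]  # disease columns start at index 2
    results[name] = correl(ms, other)
    print(f"MS vs {name}: {results[name]:+.3f}")
```

With only six countries, a single row such as Albania can dominate the result, which is one reason the full data set is needed before drawing any conclusion.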

Page 11: Using large data sets to study factors associated with the incidence of multiple sclerosis

Multiple Sclerosis (MS)

• The above spreadsheet data were also used to construct scatter plots of MS rate versus Hepatitis C rate (a viral disease) and versus lung cancer rate (an environmental/lifestyle disease). These plots follow.

Page 12: Using large data sets to study factors associated with the incidence of multiple sclerosis

Multiple Sclerosis (MS)

[Scatter plot: ms rate (Y, axis 0 to 3) versus Hep C rate (X, axis 0 to 6), with linear trendline]

f(x) = −0.0482397115134393 x + 0.310970668054575, R² = 0.0110671587093064

Page 13: Using large data sets to study factors associated with the incidence of multiple sclerosis

Multiple Sclerosis (MS)

[Scatter plot: ms rate (Y, axis 0 to 3) versus lung cancer rate (X, axis 0 to 60), with linear trendline]

f(x) = 0.017306523346616 x + 0.0283145007695525, R² = 0.300225342134711
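The trendline and R² printed on these plots come from an ordinary least-squares fit. For a straight-line fit, R² equals the square of the correlation coefficient, and the lung cancer numbers agree with that: 0.547928² ≈ 0.3002, matching the R² on the plot. The sketch below reproduces the calculation on the six sample rows from the spreadsheet slide only, so its slope, intercept, and R² will differ from the 192-country values; `linear_fit` is an illustrative helper, not part of the original module.

```python
def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Six sample rows from the spreadsheet slide (not the full data set).
lung_ca_rate = [7.2, 31.0, 10.6, 21.6, 2.3, 8.3]  # X axis
ms_rate = [0.4, 2.8, 0.1, 0.4, 0.2, 0.0]          # Y axis

slope, intercept, r_squared = linear_fit(lung_ca_rate, ms_rate)
print(f"f(x) = {slope:.4f} x + {intercept:.4f}, R^2 = {r_squared:.4f}")
```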

Page 14: Using large data sets to study factors associated with the incidence of multiple sclerosis

Multiple Sclerosis (MS)

• The complete Excel spreadsheet was also used in Principal Component Analysis (PCA).

• The data were saved in a tab-delimited format and then imported into the NIA Array Analysis Tool for Principal Component Analysis.

• The results are password protected on this site: http://lgsun.grc.nia.nih.gov/ANOVA/index.html
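For readers without access to the protected results, the general idea of PCA on tab-delimited data can be sketched as follows. This is a generic illustration and not the NIA Array Analysis Tool's procedure: it standardizes each column, builds the covariance matrix, and finds only the first principal component by power iteration (the slides use a 3-D PCA). The column names and the embedded six-row sample are assumptions for the demonstration.

```python
import csv
import io
import math

# Six sample rows from the spreadsheet slide, as tab-delimited text.
TSV = (
    "Country\tms\thep_c\tcerv_ca\tliv_ca\thiv\tlung_ca\n"
    "Afghanistan\t0.4\t3.8\t2.6\t3.8\t0\t7.2\n"
    "Albania\t2.8\t0.1\t1.5\t6.7\t0.2\t31\n"
    "Algeria\t0.1\t0.1\t3.4\t1.3\t2\t10.6\n"
    "Andorra\t0.4\t0.6\t0.8\t4.9\t0\t21.6\n"
    "Angola\t0.2\t1\t12.5\t9.6\t79.2\t2.3\n"
    "Antigua/Bar.\t0\t0\t5.4\t5.2\t19.7\t8.3\n"
)


def load_rows(text):
    """Parse tab-delimited text into (column names, country names, numeric rows)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    names, rows = [], []
    for rec in reader:
        names.append(rec[0])
        rows.append([float(v) for v in rec[1:]])
    return header[1:], names, rows


def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=500):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric matrix."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue


columns, countries, raw = load_rows(TSV)
z = standardize(raw)
cov = covariance(z)
pc1, eigenvalue = first_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
scores = [sum(a * b for a, b in zip(row, pc1)) for row in z]  # projection on PC1

print("PC1 loadings:", dict(zip(columns, (round(x, 3) for x in pc1))))
print(f"share of variance on PC1: {explained:.2f}")
```

Further components could be obtained by removing the first component from the covariance matrix and repeating the iteration, or by using a linear algebra library.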

Page 15: Using large data sets to study factors associated with the incidence of multiple sclerosis

Multiple Sclerosis (MS)

• As something completely different, meta-analysis data were extracted into Excel and converted for use with PGPLOT, and a Fortran program was written to analyze and display these data.

• A great deal of difficulty was encountered fitting disparate data points into congruent categories, so the following graphs are shown with some reservation.

• However, students “inventing” their own analysis can be expected to encounter similar problems.

Page 16: Using large data sets to study factors associated with the incidence of multiple sclerosis

Multiple Sclerosis (MS)

Page 17: Using large data sets to study factors associated with the incidence of multiple sclerosis

Multiple Sclerosis (MS)

Page 18: Using large data sets to study factors associated with the incidence of multiple sclerosis

Multiple Sclerosis (MS)

• We are deeply indebted to:
o Ileana Betancourt and Colleen McLinn for help with GIS
o Jeff Lutgen and Bruce Wiggins for help with Excel.