Using large data sets to study factors associated with the incidence of multiple sclerosis
Tamah Fridman, David Glick, John Kidd
Multiple Sclerosis (MS)
• A complex autoimmune disease with both acute and chronic phases.
• Confounding factors include:
o genetic background
o viral infections, including EBV and HSV
o nutritional factors
o environmental factors such as latitude and smoking
Multiple Sclerosis (MS)
• In a more general way, this module could be used to explore the difference between correlation and causation.
• For use in a course, the instructor will supply appropriate background information on the immune response as applied to MS.
Multiple Sclerosis (MS)
• There is a vast literature examining the effects of:
o geography
o migration
o infectious diseases
o sunlight related to vitamin D levels
o cigarette smoking
o diet
o hormones
Multiple Sclerosis (MS)
• Over time a number of data sets have been published that explore relationships between environmental factors and MS.
• Many of these are single studies that were later included in one or more “meta-analysis” articles.
• In addition, there are incidence statistics available from a variety of sources such as CDC, World Life Expectancy.com, WHO, and others.
Multiple Sclerosis (MS)
• In order to demonstrate the module’s potential, we have constructed several example analyses that link MS incidence to rainfall and viral diseases using a variety of techniques:
o A GIS plot
o A scatter plot
o 3-D Principal Component Analysis (PCA)
• These are based on the same data to demonstrate that large data sets can be visualized and analyzed in a variety of ways.
Multiple Sclerosis (MS)
• Link to interactive ArcGIS plot: http://arcgis.com/explorer/?open=2e7723700ef942b7a5aa2f8cbd96a5fc&extent=37882315.9514645,2989772.13723539,44144037.3085845,6061929.17807238
Multiple Sclerosis (MS)
• The Excel function “CORREL” was used to look for correlations between MS rates and the rates of a series of viral diseases and one “lifestyle” disease:
o Hepatitis C: -0.0152
o Cervical cancer: -0.34991
o Liver cancer: -0.25501
o HIV: -0.1451
o Lung cancer: 0.547928
Multiple Sclerosis (MS)
Country        MS rate   Hep C rate   Cervical cancer rate   Liver cancer rate   HIV rate   Lung cancer rate
Afghanistan    0.4       3.8          2.6                    3.8                 0          7.2
Albania        2.8       0.1          1.5                    6.7                 0.2        31
Algeria        0.1       0.1          3.4                    1.3                 2          10.6
Andorra        0.4       0.6          0.8                    4.9                 0          21.6
Angola         0.2       1            12.5                   9.6                 79.2       2.3
Antigua/Bar.   0         0            5.4                    5.2                 19.7       8.3
This slide is a sample; the complete spreadsheet contains 192 countries.
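As a hedged illustration of what the CORREL step computes, the sketch below calculates the Pearson correlation coefficient between the MS rate and each other column, using only the six sample rows from the spreadsheet excerpt. The names `ROWS`, `LABELS`, and `pearson_r` are hypothetical, and because only six of the 192 countries are included, the printed values will not match the coefficients reported for the full spreadsheet.

```python
from math import sqrt

# Six sample rows copied from the spreadsheet excerpt:
# (country, ms, hep_c, cervical_ca, liver_ca, hiv, lung_ca)
ROWS = [
    ("Afghanistan", 0.4, 3.8, 2.6, 3.8, 0.0, 7.2),
    ("Albania", 2.8, 0.1, 1.5, 6.7, 0.2, 31.0),
    ("Algeria", 0.1, 0.1, 3.4, 1.3, 2.0, 10.6),
    ("Andorra", 0.4, 0.6, 0.8, 4.9, 0.0, 21.6),
    ("Angola", 0.2, 1.0, 12.5, 9.6, 79.2, 2.3),
    ("Antigua/Bar.", 0.0, 0.0, 5.4, 5.2, 19.7, 8.3),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient, the statistic Excel's CORREL returns."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / sqrt(sxx * syy)


ms = [row[1] for row in ROWS]
LABELS = ["Hep C", "Cervical cancer", "Liver cancer", "HIV", "Lung cancer"]
for col, label in enumerate(LABELS, start=2):
    r = pearson_r(ms, [row[col] for row in ROWS])
    print(f"{label}: r = {r:.4f}")
```

A coefficient near +1 or -1 indicates a strong linear association and a value near 0 indicates little linear association; in neither case does the coefficient by itself establish causation.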
Multiple Sclerosis (MS)
• The above spreadsheet data were also used to construct scatter plots of MS rate versus Hepatitis C rate (a viral disease) and MS rate versus lung cancer rate (an environmental/lifestyle disease). These plots follow.
Multiple Sclerosis (MS)
[Scatter plot: ms rate (Y) versus Hep C rate (X), with linear trend line f(x) = −0.0482397115134393 x + 0.310970668054575, R² = 0.0110671587093064]
Multiple Sclerosis (MS)
[Scatter plot: ms rate (Y) versus lung cancer rate (X), with linear trend line f(x) = 0.017306523346616 x + 0.0283145007695525, R² = 0.300225342134711]
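The trend lines and R² values on these plots come from an ordinary least-squares fit. The following is a minimal sketch of that calculation, assuming a straight-line model and using only the six sample rows from the spreadsheet excerpt, so its output will differ from the fits for the full 192-country data. The function name `linear_fit` is hypothetical. For a one-variable linear fit, R² equals the square of the Pearson correlation, which is why the lung cancer correlation of 0.547928 squares to roughly the R² of 0.3002 shown on the plot.

```python
def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus R squared."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("fit is undefined when x or y is constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 - ss_res / syy
    return slope, intercept, r_squared


# Sample values from the spreadsheet excerpt (six countries only).
lung_ca_rate = [7.2, 31.0, 10.6, 21.6, 2.3, 8.3]
ms_rate = [0.4, 2.8, 0.1, 0.4, 0.2, 0.0]

slope, intercept, r_squared = linear_fit(lung_ca_rate, ms_rate)
print(f"f(x) = {slope:.4f} x + {intercept:.4f}, R^2 = {r_squared:.4f}")
```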
Multiple Sclerosis (MS)
• The complete Excel spreadsheet was also used in Principal Component Analysis (PCA).
• The data were saved in a tab-delimited format and then imported into the NIA Array Analysis Tool for Principal Component Analysis.
• The results are password protected on this site: http://lgsun.grc.nia.nih.gov/ANOVA/index.html
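The slides do not describe how the NIA Array Analysis Tool performs its PCA, so the following is only a generic sketch of the technique, not that tool's method. It parses tab-delimited text, standardizes each column, and extracts three principal components from the resulting correlation matrix by power iteration with deflation. The six rows come from the spreadsheet excerpt; the short column labels and all function names are hypothetical.

```python
import csv
import io
from math import sqrt

# Six sample rows from the spreadsheet excerpt, as tab-delimited text.
# The short column labels are hypothetical.
TSV = (
    "country\tms\thep_c\tcerv_ca\tliv_ca\thiv\tlung_ca\n"
    "Afghanistan\t0.4\t3.8\t2.6\t3.8\t0\t7.2\n"
    "Albania\t2.8\t0.1\t1.5\t6.7\t0.2\t31\n"
    "Algeria\t0.1\t0.1\t3.4\t1.3\t2\t10.6\n"
    "Andorra\t0.4\t0.6\t0.8\t4.9\t0\t21.6\n"
    "Angola\t0.2\t1\t12.5\t9.6\t79.2\t2.3\n"
    "Antigua/Bar.\t0\t0\t5.4\t5.2\t19.7\t8.3\n"
)


def load_matrix(text):
    """Parse tab-delimited text into (row labels, numeric rows)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    rows = [row for row in reader if row]
    return [r[0] for r in rows], [[float(v) for v in r[1:]] for r in rows]


def standardize(data):
    """Center each column and scale it to unit sample standard deviation."""
    n = len(data)
    cols = list(zip(*data))
    means = [sum(c) / n for c in cols]
    sds = [sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in data]


def covariance(z):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]


def mat_vec(mat, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def top_components(cov, k, iters=1000):
    """Return k (eigenvalue, unit eigenvector) pairs via power iteration with deflation."""
    p = len(cov)
    mat = [row[:] for row in cov]
    found = []
    for _ in range(k):
        vec = [1.0 / sqrt(p)] * p
        for _ in range(iters):
            nxt = mat_vec(mat, vec)
            norm = sqrt(sum(v * v for v in nxt))
            if norm == 0:
                break
            vec = [v / norm for v in nxt]
        eig = sum(a * b for a, b in zip(vec, mat_vec(mat, vec)))
        found.append((eig, vec))
        # Deflation: remove the component just found before searching for the next.
        for i in range(p):
            for j in range(p):
                mat[i][j] -= eig * vec[i] * vec[j]
    return found


countries, data = load_matrix(TSV)
z = standardize(data)
components = top_components(covariance(z), k=3)
for country, row in zip(countries, z):
    # Each country's 3-D coordinates are its projections onto the components.
    scores = [sum(a * b for a, b in zip(row, vec)) for _, vec in components]
    print(country, [round(s, 3) for s in scores])
```

The sign of each component is arbitrary, so a plot of these scores may appear mirrored relative to the output of another PCA tool without changing the interpretation.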
Multiple Sclerosis (MS)
• As something completely different, meta-analysis data were extracted into Excel and transformed for plotting with PGPLOT, and a Fortran program was written to analyze and display these data.
• A great deal of difficulty was encountered fitting disparate data points into congruent categories, so the following graphs are shown with some reservation.
• However, students “inventing” their own analysis can be expected to encounter similar problems.
Multiple Sclerosis (MS)
• We are deeply indebted to:
o Ileana Betancourt and Colleen McLinn for help with GIS
o Jeff Lutgen and Bruce Wiggins for help with Excel