![Page 2: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/2.jpg)
Outline
I Course overview
I Course main topicsI Basics
I Common epidemiologic study designsI Basic genetics
![Page 3: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/3.jpg)
Acknowledgements
I Bhramar Mukherjee (U Michigan)
I Shawn Lee (U Michigan)
I James Dai
I Chongzhi Di
I Charles Kooperberg
I Ross Prentice
I Yingye Zheng
![Page 4: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/4.jpg)
Overview
I This course will review many (but selected) currentdevelopments in statistical methods for (genetic)epidemiologic studies
I Cover some basic models and standard analysis techniques butis not meant to be comprehensive
I Focus on opportunities and challenges
I Things that are not covered:I Data preprocessing and QC stepsI Software packages
![Page 5: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/5.jpg)
Main topics
I Various topics that are related to observational studiesI Why?
I A lot of data (genetics, environmental risk factor, biomarkers,various molecular data on the phenotypes)
I Lots of interesting questionsI Messy and complex
![Page 6: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/6.jpg)
Topics: Genetic association studies
I This course will review many statistical methods in geneticassociation studies
I Single variant analysis and confounding; effect size estimationand winner’s curse
I Meta-analysisI Set-based (genetic) association analysis
I No biology prerequisites
![Page 7: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/7.jpg)
Topics: Environmental risk factors
I Measurement error (Ross Prentice)
I Functional data analysis and applications to high-resolutionbio-signal data from wearable devices (Chongzhi Di)
![Page 8: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/8.jpg)
Topics: Genetics and Environment
I Gene-environment interaction, genome-wide search strategiesand machine learning approaches (Charles Kooperberg)
I Mendelian randomization and instrumental variable analysis(James Dai)
![Page 9: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/9.jpg)
Topics: Study design and risk estimation
I Current developments of sub-sampling study designs forepidemiologic and biomarker studies (Yingye Zheng)
I Absolute risk estimation from case-control and cohort studies
![Page 10: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/10.jpg)
Evaluation
I Credit/No Credit course; audit is welcome
I No homework, but there will be recommended readingsI Final will be a 20-30 minute presentation of a self-chosen
topicI An in-depth reading of a particular topic covered in this course
or related to the research interests of (guest) lecturers
![Page 11: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/11.jpg)
Complex Diseases
I Many complex diseases are contributed by both genes andenvironment
I Twin studies suggest that about 30% of colorectal cancer riskis due to genetic factors
![Page 12: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/12.jpg)
Observational Studies
I Powerful tools for studying disease etiology (risk factors ordisease causation)
I Investigator has no control over exposureI Two common study designs
I Case-control studiesI Cohort studies
![Page 13: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/13.jpg)
Case-control studies
I Identify risk factors for a disease/outcome
I Schulz KF and Grimes DA 2002. Case-control studies. Lancet
359:431-34.
![Page 14: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/14.jpg)
Example: Diet, Activity, and Lifestyle Study (DALS)
I DALS is a population-based case-control study of coloncancer.
I Case-definition Participants were recruited from threelocations: the Kaiser Permanente Medical Care Program(KPMCP) of California, Utah, and Minnesota. Eligibilitycriteria included age at diagnosis of primary colorectal cancerbetween 30 and 79 years in 1991-1994. Individuals withfamilial adenomatous polyposis, Crohns disease, or ulcerativecolitis were excluded.
I Control-definition Controls from KPMCP were randomlyselected from membership lists. In Utah and Minnesota,controls were randomly selected through random-digit dialingand driver license lists.
I Matching Cases and controls were matched by age (5-year)and sex.
![Page 15: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/15.jpg)
Case-Control Data Analysis
I Y : Disease status
I Z : Covariates
I Logistic regression model
Pr(Y = 1|Z ) =exp(β0 + β′Z )
1 + exp(β0 + β′Z )(1)
![Page 16: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/16.jpg)
Case-Control Data Analysis
I Bayes’ rule and logistic regression model (1) imply that
Pr(Z |Y = 1)
Pr(Z |Y = 0)=
Pr(Z ,Y = 1)/Pr(Y = 1)
Pr(Z ,Y = 0)/Pr(Y = 0)
=Pr(Y = 1|Z ) Pr(Y = 0)
Pr(Y = 0|Z ) Pr(Y = 1)
=Pr(Y = 0)
Pr(Y = 1)exp(β0 + β′Z ) (2)
![Page 17: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/17.jpg)
Case-Control Data Analysis
I Now imagine cases and controls are members of a second,hypothetical population of individuals whose diseaseprobability is π, the proportion of sampled subjects are cases,but the covariate distribution Pr(Z |Y = 1) and Pr(Z |Y = 0)still satisfy (2).
I In this hypothetical population, from Bayes’ rule
PZ ≡ Pr(Y = 1|Z ) =π Pr(Z |Y = 1)
π Pr(Z |Y = 1) + (1− π) Pr(Z |Y = 0)
=π/(1− π) Pr(Z |Y = 1)/Pr(Z |Y = 0)
1 + π/(1− π) Pr(Z |Y = 1)/Pr(Z |Y = 0)
=exp(β∗ + βZ )
1 + exp(β∗ + βZ )
I β∗ = β0 + log{π/(1− π)} − log{Pr(Y = 1)/Pr(Y = 0)}I β0: Baseline disease probability is not identifiable
![Page 18: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/18.jpg)
Case-Control Data AnalysisI Constraint ∫
PZ f (Z )dZ = π
I It happens that the constraints are satisfied when maximizingthe likelihood function based on PZ . The score equationcorresponding to β∗0 solves the same constraint. Suppose thedata consist of (Yi ,Zi ), i = 1, . . . , n
n∑i=1
Yi −n∑
i=1
PZi= 0
I Despite case-control studies are restrospective in nature, thedata can be analyzed as if they were prospectively collectedusing a logistic model and the odds ratio approximates therelative risk if the disease prevalence is low
Anderson JA(1972). Separate sample logistic discrimination. Biometrika, 59: 19-35.
Prentice RL & Pyke R(1979). Logistic disease incidence models and case-control
studies. Biometrika, 66: 403-411.
![Page 19: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/19.jpg)
Confounding
I Exposure of interest may be confounded by a third factor thatis associated with exposure and the disease.
![Page 20: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/20.jpg)
Control for confounding
I At the design phase:I Sampling cases and controls from the same targeted populationI Matching controls to cases on factors that are potentially
important for disease (e.g., age, sex)I If these factors are fixed to be the same in the cases and
controls then they can not confound the association
I At the analysis phase:I Multivariable adjustment or stratification
![Page 21: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/21.jpg)
Case-Control Designs
I Advantages:I Quick and cheap (relatively)I Best for rare diseasesI Evaluation of multiple exposures
I Concerns:I Selection bias: Cases and controls selected on criteria related
to the exposureI Recall bias: Presence of disease may affect ability to recall or
report the exposureI Beware of reverse causation: the disease results in a change in
behavior (exposure)
![Page 22: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/22.jpg)
(Prospective) Cohort Design
I Cohort is a group of people with something in common,usually an expoure or a defined population group, identifiedbefore occurrance of disease under investigation
I The study population is followed over a period of time todetermine the frequency of disease
I Study the association of exposures with diseases in this groupof people
![Page 23: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/23.jpg)
Example: Women’s Health Initiative (WHI)
I The Women’s Health Initiative (WHI) initiated in 1991consists of three clinical trials (hormone therapy, dietarymodification and calcium/vitamin D) and an observationalstudy
I Investigate major health issues causing morbidity andmortality (e.g., cardiovascular disease, cancer, andosteoporosis) in postmenopausal women
I WHI enrolled more than 160,000 postmenopausal womenaged 50 - 79 years (at time of study enrollment) over 15 years
![Page 24: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/24.jpg)
Cohort Data Analysis
I Survival analysis with time to event outcome
I Suppose T is time to the event of interests and Z arecovariates
λ(t|Z ) = λ0(t) exp(β′Z )
I λ(t|Z ) = lim∆t→0 Pr(t ≤ T < t + ∆t|T ≥ t,Z )
I Longitudinal analysis with repeated (or longitudinalmeasurements)
I Suppose Yij is the measurement (e.g., blood pressure) for theith subject at tij th time
E(Yij) = β0 + β1tij + β′2Zij
![Page 25: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/25.jpg)
Subsampling Design
I Case-cohort and nested case-control study designs.I Commonalities:
I Sampling from a prospective cohort where disease outcomesand some baseline information are known for all the individuals
I Include all individuals who develop the disease during follow-up(cases)
I Differences:I In case-cohort studies, controls come from a subcohort
sampled from the entire cohort at baseline, while in nestedcase-control designs, controls are sampled from individuals atrisk at the times when cases are identified.
I Dr. Yingye Zheng will cover (newest) methods developmentfor subsampling data
![Page 26: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/26.jpg)
Example: WHI
I Cases included colorectal cancer cases from 2009 database
I A control was selected for each case from the risk set at thetime of the case’s diagnosis
I Additional matching variables: age, race, trialarm/observation, and center
I Exclusion criteria: a prior history of colorectal cancer atbaseline, IRB approval not available for data submission intodbGaP, and not sufficient DNA available.
![Page 27: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/27.jpg)
Cohort Studies
I Advantages:I Temporality can be establishedI Several outcomes related to exposure can be studied
simultaneouslyI Best for relatively common diseases or rare exposures
I Concerns:I Large population is needed; time consuming and expensiveI Study itself may alter people’s behaviorI Cohort subjects may differ from the general population
because of eligibility criteria and characteristics related to theirself-selection
![Page 28: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/28.jpg)
Genetics-Central Dogma
I The central dogma describes the flow of genetic informationin cells from DNA to messenger RNA (mRNA) to protein.
![Page 29: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/29.jpg)
Human Genome
I 3 billion base pairs
I 22 autosomes and 2 sex chromosomes
I Approximately 20,000 protein-coding genes
I Protein-coding sequences account for only a very smallfraction of the genome ( ∼ 1.5%)
I Other types of genesI Non-coding RNA : ∼20, 000I Pseudogenes : ∼14, 000
![Page 30: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/30.jpg)
Human Chromosomes
![Page 31: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/31.jpg)
DNA Mutations
I DNA mutations can occur becauseI DNA damage from environmental agentsI Mistakes occur when a cell copies its DNA in preparation for
cell devision
I Mutations are essential to evoluation; they are the rawmaterial of genetic variation
![Page 32: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/32.jpg)
DNA Mutations
I Substitution
CTGGAG
CTGGTG
I Insertion
CTGGG
CTGGCTGG
I Deletion
CTGGCTGG
CTGGG
I Frameshift
ATG TCG AAT
TGT CGA AT
![Page 33: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/33.jpg)
Single Nucleotide Polymorphism (SNP)
I SNP is the most common form of polymorphisms
![Page 34: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/34.jpg)
SNP
I Locus : Specific location of variant on a chromosome.
I Allele : One of a number of alternative forms of the samegene (variant) in a specific locus.
I Previous example: A vs G
I Suppose G is a major allele and A is a minor alleleI Homozygote : individuals with identical pairs of alleles
I GG : major allele homozygoteI AA : minor allele homozygote
I Heterozygote: individuals with two different allelesI AG
![Page 35: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/35.jpg)
SNP
I Number of dbSNP > 60 millions
I The density plot of the minor allele frequencies
![Page 36: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/36.jpg)
Inheritance
I Inheritance mechanism makes the following characteristicsI One locus: Hardy-Weinberg EquilibriumI Multiple loci: Linkage Disequilibrium
![Page 37: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/37.jpg)
Hardy-Weinberg Law
I Hardy-Weinberg equilibrium (HWE) means the frequency of adiploid genotype is the product of the frequencies of itsconsitituent alleles
I Suppose a locus A has two alleles, A1 and A2
I p is the frequency of allele A1 and 1− p is the frequency ofallele A2
I Genotype frequency?
Pr(A1A1), Pr(A1A2), Pr(A2,A2)
I At HWEI p2 is the frequency of A1A1 homozygotesI 2p(1− p) is the frequency of A1A2 heterozygotesI (1− p)2 is the frequency of A2A2 homozygotes
![Page 38: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/38.jpg)
Properties of HWE
I HWE occurs when the two alleles of an individual are randomdraws from the population. One generation of random matingproduces HWE.
I Allele frequency of the next generation
Pr(A1) = Pr(A1A1) + Pr(A1A2)/2 = p2 + p(1− p) = p
I Allele frequency doesn’t changeI A variety of factors can disturb the equilibrium
I Inbreeding and other forms of non-random matingI Subdivision of the populationI Natural selection and genetic drifts
![Page 39: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/39.jpg)
HWE
I HWE had a profound effect in early geneticsI In genetic association studies, HWE is primarily used to check
genotyping qualityI Association studies assume that samples are unrelated
individuals in HWE.I Genotype calls are very precise with error rate ∼ 10−3.I Genotyping technology may be affected by sample preparation,
DNA quality, lab conditions, and SNP conditions. Badly calledSNPs may be out of HWE.
![Page 40: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/40.jpg)
Genotype QC
I Example: Suppose 10% of A1A2 was mis-called to A1A1.
I Probabilities of observed genotypes are
oPr(A1A1) = p2 + 2p(1−p)∗0.1,
oPr(A1A2) = 2p(1−p)∗0.9
I Based on the observed genotypes we can calculate the allelefrequency
p′ = (p2 + 2p(1− p) ∗ 0.1) + p(1− p) ∗ 0.9 = p + 0.1p(1− p)
I By HWE, the probability of A1A1 should bep2 + 2p(1− p) ∗ 0.1 + 0.12p2(1− p)2.
![Page 41: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/41.jpg)
Inferrinng population substructure
I HWE has also been used for deriving population substructure
I Suppose in the sample there are two sub-populations Pop1:Pop2 = 1:1
I The allele frequency is: 0.7(Pop1) and 0.3(Pop2)
Pr(A1A1) = (0.7 ∗ 0.7 + 0.3 ∗ 0.3)/2 = 0.29
Pr(A1A2) = (2 ∗ 0.3 ∗ 0.7 + 2 ∗ 0.3 ∗ 0.7)/2 = 0.42
I The allele frequency is 0.29+0.42/2 = 0.50
I Under HWE, Pr(A1A1) = 0.25 and Pr(A1A2) = 0.50
![Page 42: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/42.jpg)
Test for HWE
I Compare observed genotypes vs expected genotypes fromHWE
I Pearson χ2 test:
T =3∑j
(Oj − Ej)2
Ej
I Test statistic follows χ21 distribution
I Fisher exact test can be used
![Page 43: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/43.jpg)
Recombination
![Page 44: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/44.jpg)
Recombination
I Introduce genetic diversity.
I Crossovers more likely to occur between genes that are furtheraway; likelihood of a recombination event is proportional tothe distance
I Allows for mapping genes
![Page 45: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/45.jpg)
Linkage disequilibrium (LD)
I Two loci A (two alleles A1 and A2), B (two alleles B1 and B2)
I Haplotype (alleles that are located closely together and thattend to be inherited together)
I Frequency of haplotype and allele
Haplotype FrequencyA1B1 x11
A1B2 x12
A2B1 x21
A2B2 x22
Allele FrequencyA1 p1 = x11 + x12
A2 p2 = x21 + x22
B1 q1 = x11 + x21
B2 q2 = x12 + x22
I Under no LDD = x11 − p1q1 = 0
I When only genotypes are available, EM algorithm can be usedto estimate haplotype frequencies
![Page 46: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/46.jpg)
Linkage Disequilibrium (LD)
I LD block structure for a particular region.
![Page 47: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/47.jpg)
LD
I LD is an extremely important feature in genetic data
I It allows to investigate diseases with fewer markers
![Page 48: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/48.jpg)
LD
I LD makes it hard to adjust for the multiple tests and to findcausal variants.
![Page 49: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/49.jpg)
Summary
Today we cover
I Course overview
I Epidemiologic study design
I Genome, DNA-mutation
I Hardy-Weinberg equilibrium
I Linkage disequilibrium
![Page 50: Special Topics in (Genetic) Epidemiology · Overview I This course will review many (but selected) current developments in statistical methods for (genetic) epidemiologic studies](https://reader030.vdocument.in/reader030/viewer/2022041107/5f0af36e7e708231d42e2432/html5/thumbnails/50.jpg)
Recommended to read
I Thomas, D. C. (2004). Statistical methods in geneticepidemiology. Oxford University Press.
I Weiss, N. S., & Koepsell, T. D. (2014). Epidemiologicmethods: studying the occurrence of illness. Oxford UniversityPress.
I Breslow, N. E. & Day, N. E. (1980). Statistical Methods inCancer Research. International Agency for Research on Cancer