Statistical Genetics Using Sequence Data Dajiang J. Liu Department of Statistics

Upload: hadley-wickham

Post on 02-Dec-2014

TRANSCRIPT

Page 1: 10 Liu, Dajiang

Statistical Genetics Using Sequence Data

Dajiang J. Liu

Department of Statistics

Page 2: 10 Liu, Dajiang

Why We Study Statistical Genetics

• Statistics originated in genetics

• R.A. Fisher: “The Correlation Between Relatives on the Supposition of Mendelian Inheritance”

– Introduced the concept of variance in this article

• Francis Galton: Regression of human height toward the mean

– Introduced correlation and regression

• Karl Pearson:

– “Mendelism and the problem of mental defect”

– “Tuberculosis, heredity and environment”

• Why don’t we seek our roots?

• In order to find disease genes in the genome, statistics is a must

Page 3: 10 Liu, Dajiang

Statistical Genetics

• Disease gene mapping:

– The determination of the sequence of genes and their relative distances from one another on a specific chromosome

– Technology-driven field:

1. Mendel’s era: Segregation Analysis

- Patience: peas, fruit fly: inbreeding is necessary

Experimental Design

Page 4: 10 Liu, Dajiang

Statistical Genetics

• Modern era:

– Microsatellite markers:

• Genetic linkage analysis

– Extremely successful for mapping and identifying Mendelian traits

– Single nucleotide polymorphism (SNP) markers:

• Case-control studies:

– Genome-Wide Association Studies: to identify common variants involved in complex traits

Computational Techniques for Likelihood in Pedigrees

Statistics plays a major role
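As an illustration of the case-control testing that underlies a genome-wide association study, here is a minimal sketch of a single-SNP allelic chi-square test. The function name and the allele counts are assumptions made for this example, not taken from the slides. A real GWAS repeats a test of this kind across many SNPs and corrects for multiple testing.

```python
import math

def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are alternate and reference alleles.
    Returns the test statistic and its p-value.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical allele counts, chosen only for illustration.
chi2, p_value = allelic_chi_square(case_alt=300, case_ref=700,
                                   control_alt=200, control_ref=800)
print(f"chi2 = {chi2:.3f}, p = {p_value:.2e}")
```

The p-value uses the closed form for one degree of freedom, so the sketch needs only the standard library.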

Page 5: 10 Liu, Dajiang

Statistical Genetics

• Sequencing Era:

• Study of diseases due to rare variants is emerging

ABI SOLiD sequencer

Statistics is ALL for sequencing data

Page 6: 10 Liu, Dajiang

Statistical Genetics

• Data we work with

Human Genome Project

HapMap Project

1000 Genomes Project

Page 7: 10 Liu, Dajiang

Multifactorial Disease Etiology Hypothesis

• Common Disease Common Variants (CD/CV) Hypothesis:

– Common diseases are caused by a few common variants with moderate effect

– E.g. Age-related Macular Degeneration

• Common variants are likely to have lower odds ratios than rare variants
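For reference, the odds ratio compares the odds of carrying a variant among cases with the odds among controls. The sketch below computes it from a 2x2 table. The function name and all counts are hypothetical, chosen only to show a common variant with a modest odds ratio next to a rare variant with a larger one.

```python
def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers):
    """Odds ratio from a 2x2 table of carrier status by disease status."""
    return (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers)

# Hypothetical counts, for illustration only.
common_or = odds_ratio(600, 400, 500, 500)  # common variant, many carriers
rare_or = odds_ratio(20, 980, 5, 995)       # rare variant, few carriers

print(f"common variant OR = {common_or:.2f}")
print(f"rare variant OR   = {rare_or:.2f}")
```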

Page 8: 10 Liu, Dajiang

Multifactorial Disease Etiology Hypothesis

• Common Disease Rare Variants Hypothesis:

– Common diseases are caused by multiple rare variants with large effect sizes

– The discovery of rare variants will have high impact on public health since they will aid in risk prediction and treatment

• E.g. Multiple Rare Alleles Contribute to Low Plasma Levels of HDL Cholesterol

• E.g. Colorectal Adenomas

Page 9: 10 Liu, Dajiang

Challenges for Statistical Methodologies

• Variant misclassification:

– Non-causal variants included:

• Huge number of mutations in the genome

– Most of them do not cause the disease under study

– Causal variants excluded:

• Intronic mutations

• Intergenic regions

• Unknown patterns of interactions:

1. Within-gene interactions: e.g. Hirschsprung’s disease (RET gene)

2. Gene x gene interactions: e.g. breast cancer genes (BRCA1, BRCA2 x CHEK2)

Adaptive methods are needed
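To make the misclassification problem concrete, the sketch below collapses rare variants into one carrier indicator per individual and compares the odds ratio when only a causal variant is analysed with the odds ratio when a non-causal variant is wrongly included. This is a generic collapsing illustration, not the adaptive method these slides go on to present. The variant labels `c1` (causal) and `n1` (non-causal), the function names, and all counts are hypothetical.

```python
def carrier_table(individuals, variants):
    """Count carriers and non-carriers of any variant in `variants`,
    separately for cases and controls."""
    counts = {(status, carrier): 0
              for status in ("case", "control")
              for carrier in (True, False)}
    for status, carried in individuals:
        counts[(status, bool(carried & variants))] += 1
    return counts

def collapsed_odds_ratio(individuals, variants):
    """Odds ratio of disease for carriers versus non-carriers."""
    t = carrier_table(individuals, variants)
    return (t[("case", True)] * t[("control", False)]) / (
        t[("case", False)] * t[("control", True)])

# Hypothetical sample: 100 cases and 100 controls.
# "c1" is enriched in cases; "n1" is equally frequent in both groups.
individuals = (
    [("case", {"c1"})] * 10 + [("case", {"n1"})] * 5 + [("case", set())] * 85
    + [("control", {"c1"})] * 2 + [("control", {"n1"})] * 5
    + [("control", set())] * 93
)

or_causal_only = collapsed_odds_ratio(individuals, {"c1"})
or_with_noncausal = collapsed_odds_ratio(individuals, {"c1", "n1"})

print(f"causal variant only:       OR = {or_causal_only:.2f}")
print(f"non-causal variant added:  OR = {or_with_noncausal:.2f}")
```

In this toy data the non-causal variant pulls the collapsed odds ratio toward 1, which is the kind of signal dilution the slide points to.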

Page 10: 10 Liu, Dajiang

Kernel Based Adaptive Clustering

• Combine variant classification with association testing into a coherent framework

• Applicable to population-based case/control studies using unrelated individuals

• Robust against variant misclassification

• Can handle gene x gene interactions and gene x environment interactions