
DNA repair SNPs Associated with Breast Cancer

By: Brittany Duncan

Mentors: Janet Sinsheimer PhD (UCLA), Mary Sehl M.D. (UCLA)

What We Aim to Do

To ultimately determine:
• What SNP and environmental factors contribute to breast cancer
• Whether a combination of SNPs acting independently might be significant
• SNP-SNP interactions associated with breast cancer

Why is this Important?

Medical: Determining SNP associations with breast cancer would:
• Help predict and prevent future cases

Bioinformatics: Comparing two analysis techniques will:
• Help to create a generalized method for analyzing future SNP interactions

SNP-Single Nucleotide Polymorphism

www.dnalandmarks.com/.../marker_systems_snp.html

•A single nucleotide change at one particular locus

•Must be present in at least 1% of the population

•Can result in genotypic and phenotypic effects

ACCGTTGTGACCTGCAGTGGAAACAGTATGA

ACCATTGTGACATGCAGTGGAAACAGTGTGA
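As a rough illustration of the definition above, the two example sequences can be compared position by position to find where a single nucleotide differs. The helper name `snp_positions` is an assumption made for this sketch and does not come from the slides; positions are 0-based.

```python
def snp_positions(seq_a, seq_b):
    """Return 0-based positions where two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# The two example sequences shown on the slide.
seq_1 = "ACCGTTGTGACCTGCAGTGGAAACAGTATGA"
seq_2 = "ACCATTGTGACATGCAGTGGAAACAGTGTGA"

# Print each differing position with the base in each sequence.
for pos in snp_positions(seq_1, seq_2):
    print(pos, seq_1[pos], "->", seq_2[pos])
```

A difference found this way is only a candidate: as the bullets above note, it counts as a SNP only if the variant is present in at least 1% of the population.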


Mechanisms of DNA Repair

NER = nucleotide-excision repair, BER = base-excision repair, MMR = mismatch repair, DSBR = double-strand break repair, DRCCD = damage recognition cell cycle delay response, NHEJ = non-homologous end-joining, HR = homologous recombination

DSBR pathway

• Double-stranded break repair pathway
• One mechanism responsible for the repair and maintenance of the integrity of DNA
• BRCA1 and BRCA2 are key elements in this pathway
• Vulnerability to breast cancer may be due to an individual’s capability in repairing damaged DNA

Steps to Success

• Recreate data found in previous paper
• Implement Cordell and Clayton: stepwise regression method
• Write up results and create tables
• Future direction: compare results to Lasso method

UCLA Cancer Registry

UCLA familial cancer registry

Participants may have cancer or not but must meet these criteria:
• Be 18 yrs or older
• Two family members with the same type of cancer or related cancers
• Or must have a family history of cancer susceptibility
• Mutation in BRCA1 or BRCA2 gene

http://www.registry.mednet.ucla.edu/

Preliminary Work

Case/control study
• 399 Caucasian (unrelated) women were chosen for study
• 104 SNPs in 17 genes of the DSBR pathway were chosen
• Logistic regression analysis conducted on each SNP to determine associations with breast cancer
• Adjusted models to include covariates

Findings
• 12 significant SNPs

Confirming Data: The Process

First Step: Defining Variables

Example of SNP rs16889040 on RAD21 gene, Chromosome 5

Genotype   Frequency   DV (Additive)   DV (Dominant)
G – G      199         +0              +0
A – G      143         +1              +1
A – A      19          +2              +1

Additive
• A allele confers risk in having breast cancer, and A-A even more so

Dominant
• A allele confers risk in having breast cancer regardless of number of copies
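The additive and dominant codings described above can be sketched as a small function. This is a minimal illustration, assuming genotypes are written as two-letter strings such as "AG" and that A is the risk allele; the function name `encode_genotype` is hypothetical.

```python
def encode_genotype(genotype, risk_allele="A"):
    """Return (additive, dominant) codes for a two-letter genotype such as "AG"."""
    copies = genotype.upper().count(risk_allele)
    additive = copies                    # 0, 1 or 2 copies of the risk allele
    dominant = 1 if copies > 0 else 0    # one copy counts the same as two
    return additive, dominant


# Print the codes for the three genotypes of the example SNP.
for g in ("GG", "AG", "AA"):
    print(g, encode_genotype(g))
```

The only difference between the two codings is the A-A genotype, which is scored 2 under the additive model and 1 under the dominant model.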

Example output from Logistic Regression Dominant Model rs16889040

Coefficients:
                  Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)       -1.42388     0.72444   -1.965  0.049358
age                0.04464     0.01305    3.419  0.000628
brca1              0.49067     0.39063    1.256  0.209079
brca2             -0.11683     0.49631   -0.235  0.813896
EDUCATION1         0.08139     0.33849    0.240  0.809976
EDUCATION2         0.28671     0.34757    0.825  0.409424
Ashkenazi_status  -0.68789     0.28608   -2.405  0.016192
SNP               -0.76382     0.27855   -2.742  0.006104

$\operatorname{logit}(Y) = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$
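To show how the columns of the output relate to each other, the SNP row can be recomputed by hand. This is a sketch under the assumption that the reported z value is the usual Wald statistic (estimate divided by standard error) with a two-sided normal p-value; the variable names are illustrative.

```python
import math

estimate = -0.76382    # SNP coefficient from the output above
std_error = 0.27855    # its standard error

z_value = estimate / std_error
# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z_value) / math.sqrt(2.0))
# Exponentiating a logistic regression coefficient gives an odds ratio.
odds_ratio = math.exp(estimate)

print(round(z_value, 3), round(p_value, 6), round(odds_ratio, 4))
```

The recomputed z value and p-value should agree with the SNP row up to rounding, and the last number is the odds ratio implied by that coefficient.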

[Figure: DSBR pathway. A double-strand break is repaired by non-homologous end joining or homologous recombination, giving repaired DNA. Genes labeled in the diagram: MRE11A, NBS1, RAD50, ATM, BRCA1, XRCC6, XRCC5, DNA-PK, XRCC4, LIG4, ZNF350, BRIP1, RAD51, BRCA2, RAD54L, RAD52, XRCC2, XRCC3, TP53, H2AX, RAD21.]

Cordell and Clayton Method: Stepwise Logistic Regression

• Used 8 genes that had significant SNPs in them
• Ran forward regression analysis on each gene
• Performed LRT and from test found p-value
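The forward regression step can be sketched in general terms as follows. This is not the Cordell and Clayton implementation itself, only a generic forward selection driven by a 1-df likelihood ratio test. The function names, the `loglik` callable (assumed to return the maximized log-likelihood of a model containing a given set of SNPs), and the toy log-likelihood values are all hypothetical.

```python
import math


def lrt_pvalue_1df(loglik_null, loglik_alt):
    """P-value of a 1-df likelihood ratio test from two log-likelihoods."""
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))  # chi-squared(1) upper tail


def forward_select(candidates, loglik, alpha=0.05):
    """Add one SNP at a time while the best addition passes the LRT."""
    selected, remaining = [], list(candidates)
    current = loglik(frozenset(selected))
    while remaining:
        scored = []
        for snp in remaining:
            ll = loglik(frozenset(selected + [snp]))
            scored.append((lrt_pvalue_1df(current, ll), snp, ll))
        p_best, snp_best, ll_best = min(scored)
        if p_best >= alpha:      # no remaining SNP improves the fit enough
            break
        selected.append(snp_best)
        remaining.remove(snp_best)
        current = ll_best
    return selected


# Hypothetical log-likelihoods for illustration only (not study results).
toy_loglik = {
    frozenset(): -250.0,
    frozenset({"snpA"}): -246.0,
    frozenset({"snpB"}): -249.5,
    frozenset({"snpA", "snpB"}): -245.8,
}

print(forward_select(["snpA", "snpB"], toy_loglik.__getitem__))
```

In a real analysis `loglik` would refit the logistic regression, including covariates, for each candidate set; the lookup table here only stands in for that step.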

Cumulative Effects

• Cumulative effects: several SNPs are in the model but act independently

Findings:
• No accumulation of SNPs was found to be significant

Interactive Effects

• Multiplicative effects: interaction between SNPs

Findings:
• RAD21 gene: interesting, but not enough information to be considered significant
   SNPd:SNPf, SNPd:SNPg, SNPf:SNPg
• Three-way interaction was found to be not significant

SNPd = rs16888927
SNPf = rs16888997
SNPg = rs16889040

SNP Interactions

SNPs         OR (e^β)   p-value
SNPd:SNPf    1.81212    0.090404
SNPd:SNPg    1.76986    0.096392
SNPf:SNPg    1.78383    0.090659

Using a p-value threshold of 0.05, none of these interactions is significant
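The reading of the table above can be checked with a few lines of code: each odds ratio is converted back to its coefficient on the logit scale, and each p-value is compared with the 0.05 threshold. The variable names are illustrative; the numbers are the ones reported on the slide.

```python
import math

# (interaction, odds ratio, p-value) as reported in the table above.
interactions = [
    ("SNPd:SNPf", 1.81212, 0.090404),
    ("SNPd:SNPg", 1.76986, 0.096392),
    ("SNPf:SNPg", 1.78383, 0.090659),
]
ALPHA = 0.05

summary = []
for name, odds_ratio, p_value in interactions:
    beta = math.log(odds_ratio)          # coefficient on the logit scale
    significant = p_value < ALPHA
    summary.append((name, beta, significant))
    print(name, round(beta, 4), "significant" if significant else "not significant")
```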

Special Thanks

To my amazing mentors at UCLA:
• Janet Sinsheimer PhD, Biostatistics lab
• Mary Sehl M.D., Dr. Sinsheimer’s lab, UCLA

For making the SoCalBSI program possible:
• The wonderful mentors at California State Los Angeles: Dr. Momand, Dr. Warter Perez, Dr. Sharp, Dr. Johnston, Mr. Johnston, Dr. Huebach, Dr. Krilowicz
• Program Coordinator Ronnie Cheng

Funding:
• American Society of Clinical Oncology - Mary Sehl
• National Science Foundation - SoCalBSI
• National Institutes of Health - SoCalBSI
• Economic and Workplace Development - SoCalBSI

Question Slides

Recoding for Education

Why Use Education?

Why Only Caucasian Women?

LRT/Chi^2

NHEJ and HR

Multiple vs Independent

LRT Test

Three Way Interaction

OR

Lasso Method

Recoding for Education

Logistic Regression

Education: 1-8 answers in a survey
• 1-3: highest education high school (control)
• 4-5: some college
• 6-8: higher education

Answer   Educ1   Educ2   Model
1-3      0       0       $\mu_1 = \mu + 0 \cdot \alpha_1 + 0 \cdot \alpha_2$
4-5      1       0       $\mu_2 = \mu + 1 \cdot \alpha_1 + 0 \cdot \alpha_2$
6-8      0       1       $\mu_3 = \mu + 0 \cdot \alpha_1 + 1 \cdot \alpha_2$

Coded in 0 and 1; transformation from linear to logistic

Linear: $Y = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$

Logistic: $\ln\left[\frac{p_i}{1 - p_i}\right] = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$, with $Y \in \{0, 1\}$

Essentially the log of the odds
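The recoding of the survey answer into two 0/1 variables can be written as a short function. This is a minimal sketch of the scheme described above; the function name `recode_education` is an assumption for illustration.

```python
def recode_education(answer):
    """Map a survey answer (1-8) to the dummy variables (Educ1, Educ2)."""
    if not 1 <= answer <= 8:
        raise ValueError("answer must be between 1 and 8")
    if answer <= 3:       # high school or less: reference (control) group
        return 0, 0
    if answer <= 5:       # some college
        return 1, 0
    return 0, 1           # higher education


# Print the dummy coding for every possible survey answer.
for answer in range(1, 9):
    print(answer, recode_education(answer))
```

With the 1-3 group as reference, each dummy variable's coefficient measures the difference between that education group and the reference group.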


Why Use Education as a Covariate?

• Routinely include at least 1 socioeconomic covariate
• Education: not necessarily because it is statistically interesting, but because other studies have repeatedly found it significant

Why Only White Women?

• Homogeneous population
• In different populations (men and other ethnicities), different genes may be involved
• Not enough sampling of any other group
• How data was found:
   - Registry website and questionnaire in English
   - Location of UCLA
   - Etc.

LRT

The LRT statistic is roughly distributed as chi-squared

$\chi^2 = 3.84$ for 1 df

P-val = .05

http://www.union.edu/PUBLIC/BIODEPT/chi.html

Cell cycle with NHEJ and HR

NHEJ
• Alignment and ligation of termini at DSB

HR
• GC: use sister chromatid as template
• SSA: homologous sequences aligned, residues no longer present are deleted

http://www2.mrc-lmb.cam.ac.uk/personal/sl/Html/Graphics/CellCycle.gif

Lord, Garret, Ashworth Clin Cancer Res 2006; 12(15)

Multiple vs. Acting Independently

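Cumulative:

$\operatorname{logit}(P(Y)) = \alpha + \beta^{T} z + \gamma_1 \mathrm{SNP}_1 + \gamma_2 \mathrm{SNP}_2$

Multiplicative:

$\operatorname{logit}(P(Y)) = \alpha + \beta^{T} z + \gamma_1 \mathrm{SNP}_1 + \gamma_2 \mathrm{SNP}_2 + \gamma_3 \mathrm{SNP}_1 \cdot \mathrm{SNP}_2$

Here $\beta^{T} z$ are the covariates, $\gamma_1 \mathrm{SNP}_1$ and $\gamma_2 \mathrm{SNP}_2$ are the independent SNP terms, and $\gamma_3 \mathrm{SNP}_1 \cdot \mathrm{SNP}_2$ is the combination of the two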
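The difference between the two models can be seen by evaluating both linear predictors for the four combinations of two 0/1-coded SNPs. This is a sketch with hypothetical coefficients chosen only for illustration; the function names are assumptions, and the covariate part is passed in as a single precomputed number.

```python
def logit_cumulative(alpha, covariate_term, g1, g2, snp1, snp2):
    """Linear predictor with two SNPs acting independently."""
    return alpha + covariate_term + g1 * snp1 + g2 * snp2


def logit_multiplicative(alpha, covariate_term, g1, g2, g3, snp1, snp2):
    """Same model plus an interaction term for the two SNPs."""
    base = logit_cumulative(alpha, covariate_term, g1, g2, snp1, snp2)
    return base + g3 * snp1 * snp2


# Hypothetical coefficients, not estimates from the study.
a, cov, g1, g2, g3 = -1.0, 0.0, 0.3, 0.2, 0.5

for snp1, snp2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    cumulative = logit_cumulative(a, cov, g1, g2, snp1, snp2)
    multiplicative = logit_multiplicative(a, cov, g1, g2, g3, snp1, snp2)
    print(snp1, snp2, round(cumulative, 3), round(multiplicative, 3))
```

The two models give the same value unless both SNP variables are non-zero, in which case the multiplicative model adds the interaction coefficient.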


LRT Test

Testing for which model fits the data better

$\mathrm{LRT} = 2 \ln\left(\frac{L(H_A)}{L(H_0)}\right)$

• For 1 df, 3.84 or higher corresponds to a p-value of 0.05 or lower: the alternative model fits the data significantly better
• Less than 3.84: the alternative model does not fit significantly better, so the null model is retained
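A small numerical check of the 3.84 cutoff may help. The sketch below computes the LRT statistic from two log-likelihoods and its p-value for 1 degree of freedom, using the fact that a chi-squared variable with 1 df is the square of a standard normal. The function name and the log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_null, loglik_alt):
    """Return (LRT statistic, p-value) for nested models differing by 1 df."""
    stat = 2.0 * (loglik_alt - loglik_null)    # equals 2 ln(L(HA) / L(H0))
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods chosen so that the statistic is 3.84.
stat, p_value = likelihood_ratio_test(-250.0, -248.08)
print(round(stat, 2), round(p_value, 3))
```

For models that differ by more than one parameter the chi-squared distribution has more degrees of freedom, and this 1-df shortcut no longer applies.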


Three Way Interaction

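$\operatorname{logit}(P(Y)) = \alpha + \beta^{T} z + \gamma_1 \mathrm{SNP}_d + \gamma_2 \mathrm{SNP}_f + \gamma_3 \mathrm{SNP}_g + \gamma_4 \mathrm{SNP}_d \cdot \mathrm{SNP}_f \cdot \mathrm{SNP}_g$

Here $\beta^{T} z$ are the covariates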

ODDS RATIO

Coded in 0 and 1; transformation from linear to logistic

Linear: $Y = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$

Logistic: $\ln\left[\frac{p_i}{1 - p_i}\right] = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$, with $Y \in \{0, 1\}$

The odds ratio is $e^{\beta}$ because of logistic regression’s transformed form
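The step from the logistic form to the odds ratio can be spelled out as a short derivation. It assumes a single predictor $X_j$ is increased by one unit while all other predictors are held fixed; the notation $p(\cdot)$ for the probability at a given value of $X_j$ is introduced here for illustration.

```latex
\begin{aligned}
\ln\frac{p(X_j + 1)}{1 - p(X_j + 1)} - \ln\frac{p(X_j)}{1 - p(X_j)} &= \beta_j \\
\frac{p(X_j + 1) / \bigl(1 - p(X_j + 1)\bigr)}{p(X_j) / \bigl(1 - p(X_j)\bigr)} &= e^{\beta_j}
\end{aligned}
```

The left-hand side of the second line is the ratio of the odds after and before the one-unit change, which is why exponentiating a coefficient gives an odds ratio.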


Lasso Penalized Regression

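• Exploratory method used when there is a large number of predictors and a small amount of data
• Penalizes the model for having too many borderline significant predictors

$F(\theta) = \frac{1}{2} \sum_i \left( y_i - \mu - \sum_j x_{ij} \beta_j \right)^2 + \lambda \sum_j |\beta_j|$

The first term is the least squares term; the second is the penalty term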
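To make the two parts of the objective concrete, it can be evaluated directly for a small data set. This sketch only computes the value of the objective for given parameters and does not perform the minimization; the function name `lasso_objective` and the toy data are hypothetical.

```python
def lasso_objective(y, x, mu, beta, lam):
    """Half the residual sum of squares plus lam times the sum of |beta_j|."""
    least_squares = 0.0
    for y_i, row in zip(y, x):
        fitted = mu + sum(x_ij * b_j for x_ij, b_j in zip(row, beta))
        least_squares += (y_i - fitted) ** 2
    penalty = lam * sum(abs(b_j) for b_j in beta)
    return 0.5 * least_squares + penalty


# Hypothetical toy data: three observations, two predictors.
y = [1.0, 0.0, 1.0]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

value = lasso_objective(y, x, mu=0.0, beta=[0.5, -0.5], lam=0.1)
unpenalized = lasso_objective(y, x, mu=0.0, beta=[0.5, -0.5], lam=0.0)
print(value, unpenalized)
```

Raising the penalty weight increases the cost of every non-zero coefficient, which is what pushes borderline predictors out of the model.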

