Introductory Bayesian Analysis
Jaya M. Satagopan
Memorial Sloan-Kettering Cancer Center
Weill Cornell Medical College (Affiliate)
satagopj@mskcc.org
March 14, 2013
Bayesian Analysis
• Fit probability models to observed data
• Unknown parameters – Summarize using probability distribution – For example, P(mutation increases risk by 10% | data) – Posterior distribution
• Prior information – External data – Elicit from available data
This lecture
• Bayes theorem – Prior from external source
• Loss function, Expected loss
• Bayesian analysis with data-adaptive prior – Minimize squared error loss
• Bayesian penalized estimation – Prior to minimize other loss functions
• Software packages – WinBUGS, SAS
Part 1. Bayes Theorem
Bayes Theorem
• Random variables: Y and θ
• Prior distributions: P(Y), P(θ)
• Conditional distributions: P(Y | θ) and P(θ | Y)
• Know P(θ | Y), P(Y), and P(θ)
• Need P(Y | θ) [posterior distribution]

$$P(Y \mid \theta) = \frac{P(\theta \mid Y)\, P(Y)}{P(\theta)} = \frac{P(\theta \mid Y)\, P(Y)}{\int P(\theta \mid Y)\, P(Y)\, dY}$$
Example
Say, 5% of the population has a certain disease. When a person is sick, a particular test is used to determine whether (s)he has this disease. The test gives a positive result 2% of the time when a person does not actually have the disease. The test gives a positive result 95% of the time when the person does indeed have the disease. Now, one person gets a positive test. What is the probability that the person has this disease?
Example continued
• Y = 1 (disease) or 0 (no disease)
• θ = 1 (positive test) or 0 (negative test)

KNOWN:
• P(Y = 1) = 0.05, P(Y = 0) = 1 – P(Y = 1) = 0.95
• P(θ = 1 | Y = 0) = 0.02, P(θ = 1 | Y = 1) = 0.95

NEED:
• P(Y = 1 | θ = 1)

$$\begin{aligned}
P(Y = 1 \mid \theta = 1) &= \frac{P(\theta = 1 \mid Y = 1)\, P(Y = 1)}{P(\theta = 1)} \\
&= \frac{P(\theta = 1 \mid Y = 1)\, P(Y = 1)}{P(\theta = 1 \mid Y = 1)\, P(Y = 1) + P(\theta = 1 \mid Y = 0)\, P(Y = 0)} \\
&= \frac{0.95 \times 0.05}{0.95 \times 0.05 + 0.02 \times 0.95} \\
&= 0.714
\end{aligned}$$
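The calculation on this slide can be checked with a few lines of Python. This is a minimal sketch; the function name and argument names are illustrative choices, not part of the lecture.

```python
def posterior_disease_given_positive(prevalence, sensitivity, false_positive_rate):
    """Return P(Y = 1 | theta = 1) by Bayes theorem."""
    # P(theta = 1 | Y = 1) * P(Y = 1)
    true_positive_mass = sensitivity * prevalence
    # P(theta = 1 | Y = 0) * P(Y = 0)
    false_positive_mass = false_positive_rate * (1.0 - prevalence)
    return true_positive_mass / (true_positive_mass + false_positive_mass)


posterior = posterior_disease_given_positive(
    prevalence=0.05, sensitivity=0.95, false_positive_rate=0.02
)
print(round(posterior, 3))  # 0.714
```

Even with a fairly accurate test, the low prevalence keeps the posterior probability well below the 95% sensitivity.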
Example – Breast Cancer Risk
• Case-control sampling – Cases (Y = 1) have breast cancer – Controls (Y = 0) do not have breast cancer
• Record BRCA1/2 mutation – Mutation present (θ = 1) or absent (θ = 0)
• Observe P(θ = 1 | Y = 1) and P(θ = 1 | Y = 0) – Mutation frequency in cases and controls
• Need: P(Y = 1 | θ = 1) – Disease risk among mutation carriers
Satagopan et al (2001) CEBP, 10:467-473
Breast cancer risk (continued)
• Use Bayes theorem
• P(θ = 1 | Y = 1) = mutation frequency in cases • P(θ = 1 | Y = 0) = mutation frequency in controls
• P(Y = 1) = 1 – P(Y = 0) = prior information
• Get prior from external source (SEER Registry)
$$P(Y = 1 \mid \theta = 1) = \frac{P(\theta = 1 \mid Y = 1)\, P(Y = 1)}{P(\theta = 1 \mid Y = 1)\, P(Y = 1) + P(\theta = 1 \mid Y = 0)\, P(Y = 0)}$$
Breast cancer risk (continued)
Data for age group 40-49

| BRCA mutation | Case | Control |
|---|---|---|
| Present | 25 | 23 |
| Absent | 179 | 1090 |

• P(θ = 1 | Y = 1) = 25/204
• P(θ = 1 | Y = 0) = 23/1113
• P(Y = 1) = 0.0138 – Disease risk in the 40-49 age group (SEER registry)
• P(Y = 1 | θ = 1) = 7.6%

http://seer.cancer.gov
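As a rough check of the arithmetic, plugging the table counts and the SEER prior into Bayes theorem gives the following (intermediate values are rounded, so the last digit may differ slightly from the slide):

$$P(Y = 1 \mid \theta = 1) = \frac{(25/204)(0.0138)}{(25/204)(0.0138) + (23/1113)(1 - 0.0138)} \approx \frac{0.00169}{0.00169 + 0.02038} \approx 0.0766$$

This is in line with the value of about 7.6% reported on the slide.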
Part 2. Loss function, Bayes estimate
Loss Function and Expected Loss
• Parameter θ
• Decision (estimate) d(Y) based on data Y
• Loss incurred = L(d(Y), θ) ≥ 0
• Squared error loss L(d(Y), θ) = [d(Y) - θ]²
• Absolute deviation L(d(Y), θ) = |d(Y) - θ|
• Expected loss = Risk = R(d,θ) = E{L(d(Y), θ)}

$$R(d, \theta) = \int L(d(Y), \theta)\, f(Y \mid \theta)\, dY$$
Bayes Estimation
• There is no single d that has small R(d,θ) for all θ. – No uniformly best d
• Bayes approach
• Get d that minimizes the average risk W(d), where G is the prior distribution of θ. – W(d) is also known as the Bayes risk

$$W(d) = \int \int L(d(Y), \theta)\, f(Y \mid \theta)\, dY\, dG(\theta)$$

• Bayes estimate d_B: W(d_B) ≤ W(d) for every d
• For squared error loss, d_B is the posterior mean of θ – d_B(Y) = E(θ | Y)
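The link between the loss function and the Bayes estimate can be illustrated numerically. The sketch below assumes a made-up discrete posterior for θ and searches a grid of candidate estimates for the one with the smallest posterior expected loss; minimizing this quantity for each Y is what minimizes the Bayes risk. All names and numbers are illustrative.

```python
# Discrete posterior for theta: support points and their probabilities
# (made-up numbers, chosen only so that the mean and median differ).
theta_values = [0.0, 1.0, 2.0, 3.0, 10.0]
posterior_probs = [0.1, 0.3, 0.3, 0.2, 0.1]


def expected_loss(d, loss):
    """Posterior expected loss of reporting the estimate d."""
    return sum(p * loss(d, t) for t, p in zip(theta_values, posterior_probs))


def squared_error(d, theta):
    return (d - theta) ** 2


def absolute_deviation(d, theta):
    return abs(d - theta)


# Candidate estimates on a grid from 0 to 10 in steps of 0.01.
grid = [i / 100 for i in range(1001)]

best_squared = min(grid, key=lambda d: expected_loss(d, squared_error))
best_absolute = min(grid, key=lambda d: expected_loss(d, absolute_deviation))

posterior_mean = sum(t * p for t, p in zip(theta_values, posterior_probs))

print(best_squared, posterior_mean)  # grid minimizer vs. posterior mean
print(best_absolute)                 # grid minimizer for absolute deviation
```

Under squared error loss the grid minimizer coincides with the posterior mean; under absolute deviation it lands on the posterior median, which is the Bayes estimate used later for the Bayesian LASSO.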
Part 3. Bayesian analysis with data-adaptive prior parameters
GxE example
Bayesian analysis of GxE interactions
• Case-control study: Y = 1 (case), Y = 0 (control)
• Binary risk factors (say)
• Genetic factor: G = 0, 1
• Environmental exposure: E = 0, 1
• Is there a significant interaction between G and E?
• Estimate interaction odds ratio and standard error

Test:
• Is this odds ratio = 1? Is this log(odds ratio) = 0?
Mukherjee and Chatterjee (2008). Biometrics, 64: 685-694
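Interaction odds ratio (OR_GE)

Y = 0 (Control data)

| | E = 1 | E = 0 |
|---|---|---|
| G = 1 | N011 | N010 |
| G = 0 | N001 | N000 |

Y = 1 (Case data)

| | E = 1 | E = 0 |
|---|---|---|
| G = 1 | N111 | N110 |
| G = 0 | N101 | N100 |

OR_0 = Odds ratio of E associated with G among controls
OR_1 = Odds ratio of E associated with G among cases

$$OR_0 = \frac{N_{011}\, N_{000}}{N_{001}\, N_{010}}, \qquad OR_1 = \frac{N_{111}\, N_{100}}{N_{101}\, N_{110}}, \qquad OR_{GE} = \frac{OR_1}{OR_0}$$

$$\hat\beta_{GE} = \log(OR_{GE}) = \log(OR_1) - \log(OR_0) = \hat\beta_{\mathrm{case}} - \hat\beta_{\mathrm{control}}$$

$$\mathrm{Var}(\hat\beta_{GE}) = \mathrm{Var}(\hat\beta_{\mathrm{case}}) + \mathrm{Var}(\hat\beta_{\mathrm{control}})$$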
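The quantities on this slide can be computed directly from the two 2×2 tables. The sketch below uses hypothetical cell counts and assumes Woolf's large-sample variance for a log odds ratio (the sum of reciprocal cell counts), which the slide does not state.

```python
import math


def log_odds_ratio(n11, n10, n01, n00):
    """Log odds ratio of a 2x2 table and its Woolf variance estimate.

    n11: G = 1, E = 1;  n10: G = 1, E = 0
    n01: G = 0, E = 1;  n00: G = 0, E = 0
    """
    log_or = math.log((n11 * n00) / (n01 * n10))
    # Woolf's variance: sum of reciprocal cell counts (an assumption here,
    # not taken from the slide).
    variance = 1 / n11 + 1 / n10 + 1 / n01 + 1 / n00
    return log_or, variance


# Hypothetical counts, for illustration only.
beta_case, var_case = log_odds_ratio(n11=40, n10=60, n01=50, n00=150)
beta_control, var_control = log_odds_ratio(n11=20, n10=80, n01=40, n00=160)

beta_ge = beta_case - beta_control  # log(OR_GE) = log(OR_1) - log(OR_0)
var_ge = var_case + var_control     # cases and controls are independent samples

print(math.exp(beta_ge), beta_ge, var_ge)
```

With these counts OR_1 = 2 and OR_0 = 1, so OR_GE = 2, while the variance of the case-control estimate is larger than the variance of the case-only estimate.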
Gene-Environment independence in controls
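Y = 0 (Control data)

| | E = 1 | E = 0 |
|---|---|---|
| G = 1 | N011 | N010 |
| G = 0 | N001 | N000 |

$$OR_0 = \frac{N_{011}\, N_{000}}{N_{001}\, N_{010}} = 1$$

$$OR_{GE} = OR_1$$

$$\mathrm{Var}(\hat\beta_{GE}) = \mathrm{Var}(\hat\beta_{\mathrm{case}}) < \mathrm{Var}(\hat\beta_{\mathrm{case}}) + \mathrm{Var}(\hat\beta_{\mathrm{control}})$$

Independence of G and E in controls unknown. So …
• Test: β_control = 0
• If the hypothesis is rejected, estimate the interaction log(OR) as β_GE = β_case - β_control
• Otherwise, estimate it as β_GE = β_case
• Then test whether β_GE = 0 for interaction

Not a good idea !!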
Weighted estimate
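• Estimate based on preliminary test T for β0 = 0

$$\hat\beta_{GE,PT} = I(T < c)\, \hat\beta_{\mathrm{case}} + I(T > c)\, \hat\beta_{GE}$$

• Weighted average of case-only and case-control estimates. Weights are indicator functions
• Can do better without requiring preliminary test !!

$$\hat\beta_{GE,w} = w\, \hat\beta_{\mathrm{case}} + (1 - w)\, \hat\beta_{GE}$$

• Choose w to minimize squared error loss
• Bayes risk:

$$E_{\mathrm{data}}\, E_{\beta_{GE} \mid \mathrm{data}} \left\{ \left( \hat\beta_{GE,w} - \beta_{GE} \right)^2 \right\}$$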
Bayes estimate
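• w is a function of $\mathrm{Var}(\hat\beta_{GE})$ and $t^2 = \mathrm{Var}(\hat\beta_{GE} - \hat\beta_{\mathrm{case}})$

Alternative explanation:

$$\hat\beta_{GE,B} = \hat\beta_{\mathrm{case}} + e$$

– e is error due to assuming G and E independence in controls

• An estimate of e is: $\hat e = \hat\beta_{GE} - \hat\beta_{\mathrm{case}}$, with $\hat e \mid e \sim N(e, t^2)$
• Prior for e: N(0, σ²).
• Bayes estimate of e is

$$E(e \mid \hat e) = \frac{\sigma^2}{\sigma^2 + t^2}\, \hat e$$

• M & C (2008) suggest estimating σ² as $\mathrm{Var}(\hat\beta_{GE})$
• Empirical Bayes estimate:

$$\hat\beta_{GE,B} = \hat\beta_{\mathrm{case}} + E(e \mid \hat e)$$

Shrinkage estimation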
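The shrinkage behaviour of this estimate is easy to see in code. The sketch below takes σ² and t² as given inputs rather than estimating them, and the numeric values are hypothetical. It is equivalent to the weighted average w·β̂_case + (1 − w)·β̂_GE with w = t² / (σ² + t²).

```python
def shrinkage_estimate(beta_case, beta_ge, sigma2, t2):
    """Combine the case-only and case-control estimates.

    e_hat = beta_ge - beta_case estimates the error e from assuming G-E
    independence. With e ~ N(0, sigma2) and e_hat | e ~ N(e, t2), the
    posterior mean of e is sigma2 / (sigma2 + t2) * e_hat.
    """
    e_hat = beta_ge - beta_case
    shrink = sigma2 / (sigma2 + t2)
    return beta_case + shrink * e_hat


# Hypothetical inputs, for illustration only.
beta_case, beta_ge = 0.50, 0.90
for sigma2 in (0.0, 0.04, 4.0):
    print(sigma2, shrinkage_estimate(beta_case, beta_ge, sigma2, t2=0.04))
```

A prior variance of zero returns the case-only estimate, a large prior variance returns nearly the case-control estimate, and intermediate values fall in between.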
Advanced Colorectal Adenoma Example
• 610 cases and 605 controls
• G = NAT2 acetylation (yes, no)
• E = Smoking (never, past, current)
• Note: lack of G and E independence in controls – Need case-control estimate
• EB estimate, credible interval. Is 0 in interval?
Summary
• Uncertainty about underlying assumption
• Two possible estimates
• Bayes estimate: weighted average of the two
• Shrinkage estimation
• Data-adaptive estimation of prior parameters – Minimize squared error loss
Part 4. Bayesian penalized estimation
Prior to minimize various loss functions
Part 4a. Bayesian Ridge Regression
Minimize Squared Error Loss
Normal Prior
GWAS data (Chen and Witte 2007, AJHG, 81: 397-404)
• 57 unrelated individuals of European ancestry (CEU) – HapMap project
• Outcome = Expression of the CHI3L2 gene – Cheung et al 2005, Nature, 437: 1365-1369
• Risk factors = 39,186 SNPs from Chromosome 1 – Illumina 550K array from HapMap
• SNP rs755467 deemed causal for CHI3L2 expression
• Goal: How well are the neighboring SNPs ranked?
Application to GWAS
• Y = continuous (or binary) outcome, length N (subjects)
• Xm = m-th SNP, m = 1, 2, …, M (= 500K, say)
• For each SNP, model: Y = µm + Xmβm + error
• βm is the effect of SNP m: MLE, standard error, p-value
• Find the significant SNPs
• Find the SNPs having the 500 smallest p-values
Chen and Witte 2007. AJHG, 81: 397-404
Hierarchical modeling
• Incorporate external information about SNPs
• Bioinformatics data (Z matrix, user-specified) – conservation, various functional categories
• β = Zπ + U – β has length G, Z is G×K, π is K×1 – U is N(0, t²T), T is specified
• Improved estimation via second stage model
• Prior for β is N(Zπ, t²T) – Need {(β - Zπ)′ T⁻¹ (β - Zπ)}/t² to be small: Penalization
Posterior inference via MCMC
• Markov chain Monte Carlo approach to get βs
• Specify prior for β, π, σ²
• π ~ N(0, *), 1/σ² ~ Gamma(**, $$)
• Specify prior for t² or fix t²
• Generate samples from full conditional distributions
  – β | Y, π, σ², t², …
  – π | Y, β, σ², t², …
  – σ² | Y, β, π, t², … etc.

| Iteration | β parameters |
|---|---|
| 1 | β1, β2, …, βG |
| 2 | β1, β2, …, βG |
| … | … |
| G | β1, β2, …, βG |

Posterior Summaries
• Avg(β1), Stdev(β1)
• Avg(β2), Stdev(β2)
• Avg(βG), Stdev(βG)
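The cycle of drawing from full conditionals and then summarizing the draws can be sketched on a much smaller model than the one in the lecture. The toy example below is a Gibbs sampler for a normal mean and precision with made-up data and assumed priors; it is not the Chen and Witte hierarchical model, only an illustration of the mechanics.

```python
import random
import statistics

random.seed(1)

# Made-up data; the toy model is y_i ~ N(mu, 1 / tau).
y = [4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 4.8, 5.2]
n = len(y)

# Assumed priors: mu ~ N(0, 1 / prior_precision), tau ~ Gamma(a, rate=b).
prior_precision = 1e-6
a, b = 0.01, 0.01

mu, tau = 0.0, 1.0  # starting values
mu_draws, tau_draws = [], []

for iteration in range(6000):
    # Full conditional of mu given tau and y: normal.
    precision = n * tau + prior_precision
    mean = tau * sum(y) / precision
    mu = random.gauss(mean, precision ** -0.5)

    # Full conditional of tau given mu and y: gamma.
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    rate = b + 0.5 * sum((yi - mu) ** 2 for yi in y)
    tau = random.gammavariate(a + n / 2, 1 / rate)

    if iteration >= 1000:  # discard burn-in draws
        mu_draws.append(mu)
        tau_draws.append(tau)

# Posterior summaries: an average and a standard deviation per parameter.
print("mu :", statistics.mean(mu_draws), statistics.stdev(mu_draws))
print("tau:", statistics.mean(tau_draws), statistics.stdev(tau_draws))
```

Each retained iteration contributes one row of parameter draws, and the column averages and standard deviations play the role of the posterior summaries in the table above.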
Chen and Witte GWAS Example
• Plot “p-values” of top 500 SNPs
So, what is going on ?
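• Y = µm + Xmβm + error
• MLE of β's: $\hat\beta = (\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_G)$
• Variance: $\hat V$
• β = Zπ + U, U ~ N(0, t²T)
• MLE of π's:

$$\hat\pi = (Z^T S Z)^{-1} Z^T S \hat\beta, \qquad S = (\hat V + t^2 T)^{-1}$$

• Bayes estimate of β's:

$$\tilde\beta \approx (I - W)\, \hat\beta + W Z \hat\pi, \qquad W = \hat V S$$

• Large t²: S ≈ 0 ≈ W and $\tilde\beta \approx \hat\beta$
• Small t²: W ≈ I and $\tilde\beta = Z \hat\pi$

Shrinkage estimation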
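The two limiting cases can be checked numerically in a simplified setting. The sketch below assumes a diagonal V̂, T = I, and a Z consisting of a single column of ones, so that Zπ̂ is one shared value and all matrix operations reduce to element-wise arithmetic. The estimates and variances are hypothetical.

```python
def hierarchical_shrinkage(beta_hat, v_hat, t2):
    """Shrink per-SNP estimates toward a common second-stage mean.

    Simplifying assumptions: V-hat is diagonal (entries v_hat), T = I, and
    Z is a single column of ones, so Z * pi-hat is one shared value.
    """
    s = [1.0 / (v + t2) for v in v_hat]  # S = (V + t^2 T)^-1
    pi_hat = sum(si * b for si, b in zip(s, beta_hat)) / sum(s)
    w = [v * si for v, si in zip(v_hat, s)]  # W = V S
    beta_tilde = [(1 - wi) * b + wi * pi_hat for wi, b in zip(w, beta_hat)]
    return beta_tilde, pi_hat


# Hypothetical first-stage estimates and their variances.
beta_hat = [0.8, -0.1, 0.3, 1.5]
v_hat = [0.04, 0.09, 0.01, 0.25]

for t2 in (1e6, 0.05, 1e-6):
    beta_tilde, pi_hat = hierarchical_shrinkage(beta_hat, v_hat, t2)
    print(t2, [round(b, 3) for b in beta_tilde], round(pi_hat, 3))
```

With a very large t² the output reproduces the first-stage estimates, with a very small t² every coefficient collapses to the second-stage mean, and noisier estimates (larger variance) are pulled harder at intermediate values.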
Some Remarks
• Sensitivity to choice of prior parameters
• Instead of “p-value”, P(βm > 0), m = 1, …, G
• The Bayes estimate β̃ must ideally not be too sensitive to the choice of Z
• The estimated value of π will depend upon Z, but ideally the Bayes estimate should not.
Part 4b. Bayesian LASSO
Minimize Absolute Deviation
Laplace Prior
Diabetes data (Efron et al 2004, The Annals of Statistics, 32: 407-499)
Application to the diabetes study
• Y = continuous (or other type of) outcome (N×1)
• X = N×p matrix of risk factors
• β = p×1 vector of effects (parameters of interest)
• Find the significant risk factors
• Y = Xβ + error
• Large p, potentially correlated risk factors, etc.
• Estimate β to minimize |β - β0| for some β0 (LASSO)
• β0 = 0 or β0 = Zπ, Z given and π must be estimated
Park and Casella (2008). J Am Stat Assoc, 103: 681-686
Bayesian LASSO
• |β - β0| ≈ 1 – exp{ - |β - β0| }
• The term exp{ - |β - β0| } takes the form of a Laplace density
• Y = Xβ + error, error ~ N(0, σ²I)
• Laplace prior for β with mean β0
• Mixture of normal prior for β and an exponential prior for its variance
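$$\begin{aligned}
f(\beta_j) &= \frac{\lambda}{2\sigma} \exp\left\{ -\frac{\lambda}{\sigma} \left| \beta_j - \beta_{0j} \right| \right\} \\
&= \int_0^\infty \frac{1}{\sqrt{2\pi \sigma^2 t_j^2}} \exp\left\{ -\frac{(\beta_j - \beta_{0j})^2}{2 \sigma^2 t_j^2} \right\} \frac{\lambda^2}{2} \exp\left\{ -\frac{\lambda^2 t_j^2}{2} \right\} dt_j^2
\end{aligned}$$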
Bayesian LASSO setup
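$$\begin{aligned}
Y \mid \beta, \sigma^2 &\sim N(X\beta, \sigma^2 I) \\
\beta_j \mid \sigma^2, t_j^2 &\sim N(0, \sigma^2 t_j^2), \quad j = 1, \ldots, p, \ \text{independent} \\
t_j^2 &\sim \text{exponential}(\lambda^2 / 2), \quad j = 1, \ldots, p, \ \text{independent} \\
\sigma^2 &\sim \text{Inverse Gamma}(a_1, a_2)
\end{aligned}$$

• t_j² are latent variables to facilitate MCMC steps
• a1 and a2 are specified (check for sensitivity)
• λ²: empirical estimation from data or specify prior – Generally a Gamma(c1, c2) prior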
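The normal-exponential mixture behind this setup can be checked by simulation. The sketch below assumes the Park and Casella parameterization, in which t_j² is exponential with rate λ²/2 and β_j given t_j² is normal with variance σ²t_j²; the resulting marginal should then be Laplace with E|β_j| = σ/λ. The values of σ and λ are hypothetical.

```python
import random

random.seed(0)

sigma, lam = 2.0, 1.5  # hypothetical values of sigma and lambda
n_draws = 100_000

abs_draws = []
for _ in range(n_draws):
    # t_j^2 ~ exponential with rate lambda^2 / 2 (expovariate takes a rate).
    t2 = random.expovariate(lam ** 2 / 2)
    # beta_j | sigma^2, t_j^2 ~ N(0, sigma^2 * t_j^2); gauss takes a std dev.
    beta = random.gauss(0.0, sigma * t2 ** 0.5)
    abs_draws.append(abs(beta))

# For the Laplace density (lam / (2 sigma)) * exp(-lam |beta| / sigma),
# the mean absolute value is sigma / lam.
mc_mean_abs = sum(abs_draws) / n_draws
print(mc_mean_abs, sigma / lam)
```

The two printed numbers should agree up to Monte Carlo error, which is why the latent t_j² can be sampled alongside β in the MCMC steps without changing the implied Laplace prior.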
Parameter Estimation
• Get full conditionals, apply MCMC
• Bayes estimate of β – Posterior median
• Original LASSO: quadratic programming methods
Part 4c. Other Bayesian Penalization Methods
Brief survey
Bridge Regression
• Estimate β by minimizing

$$\sum_{j=1}^{p} \left| \beta_j - Z_j \pi \right|^{\gamma}$$

• γ is pre-specified
• γ = 1 is (Bayesian) LASSO
• γ = 2 is (Bayesian) Ridge
Fu 1998, JCGS, 7: 397-416
Bayesian Elasticnet
• Estimate β by minimizing

$$\lambda \sum_{j=1}^{p} \left| \beta_j - Z_j \pi \right| + (1 - \lambda) \sum_{j=1}^{p} \left( \beta_j - Z_j \pi \right)^2$$

• Compromise between LASSO and Ridge penalties
• Normal prior constrained within certain bounds
• Hans (2011). J Am Stat Assoc, 106: 1383-1393
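The bridge and elastic-net penalties from these two slides differ only in how the deviations of β from Zπ are aggregated. The sketch below evaluates them for hypothetical coefficients, with `prior_mean` standing in for Zπ; the function names are illustrative.

```python
def bridge_penalty(beta, prior_mean, gamma):
    """Sum over j of |beta_j - (Z pi)_j| ** gamma."""
    return sum(abs(b - m) ** gamma for b, m in zip(beta, prior_mean))


def elastic_net_penalty(beta, prior_mean, lam):
    """lam * (LASSO penalty) + (1 - lam) * (ridge penalty), 0 <= lam <= 1."""
    lasso = bridge_penalty(beta, prior_mean, gamma=1)
    ridge = bridge_penalty(beta, prior_mean, gamma=2)
    return lam * lasso + (1 - lam) * ridge


# Hypothetical coefficients; prior_mean plays the role of Z pi (all zeros here).
beta = [0.5, -2.0, 0.0, 1.0]
prior_mean = [0.0, 0.0, 0.0, 0.0]

print(bridge_penalty(beta, prior_mean, gamma=1))        # 3.5   (LASSO)
print(bridge_penalty(beta, prior_mean, gamma=2))        # 5.25  (ridge)
print(elastic_net_penalty(beta, prior_mean, lam=0.5))   # 4.375
```

The ridge penalty is dominated by the largest coefficient, while the LASSO penalty grows only linearly, which is one way to see why the two priors shrink large effects differently.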
Software Packages
• WinBUGS
  – Specify model for outcome
  – Specify priors
  – Output estimated values of β and other parameters
  – Uses MCMC methods
  – Diagnostic plots
  – http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml
• SAS Proc MCMC
  – http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#mcmc_toc.htm
References: Textbooks
• JS Maritz and T Lwin (1989). Empirical Bayes Methods. Chapman and Hall.
• JM Bernardo and AFM Smith (1993). Bayesian Theory. Wiley.
• BP Carlin and TA Louis (1996). Bayes and empirical Bayes methods for data analysis. Chapman and Hall.
• A Gelman, JB Carlin, HS Stern, DB Rubin (1996). Bayesian data analysis. Chapman and Hall.
• WR Gilks, S Richardson, DJ Spiegelhalter (1996). Markov chain Monte Carlo in practice. Chapman and Hall.
• T Hastie, R Tibshirani, J Friedman (2001). The Elements of Statistical Learning. Springer.
References: Some papers
• R Tibshirani (1996). Regression shrinkage and selection via the Lasso. JRSS – Series B, 58: 267-288.
• J Fu (1998). Penalized regression: The Bridge versus the Lasso. JCGS, 7: 397-416.
• MA Newton and Y Lee (2000). Inferring the location and effect of tumor suppressor genes by instability-selection modeling of allelic-loss data. Biometrics 56: 1088-1097.
• JM Satagopan, K Offit, W Foulkes, ME Robson, S Wacholder, CM Eng, SE Karp, CB Begg (2001). The lifetime risks of breast cancer in Ashkenazi Jewish carriers of BRCA1 and BRCA2 mutations. Cancer Epidemiology, Biomarkers and Prevention, 10: 467-473.
References: Some papers
• CM Kendziorski, MA Newton, H Lan, MN Gould (2003). On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine 22:3899-3914.
• D Conti, V Cortessis, J Molitor, DC Thomas (2003). Bayesian modeling of complex metabolic pathways. Human Heredity, 56: 83-93.
• B Efron, T Hastie, I Johnstone, R Tibshirani (2004). Least angle regression. The Annals of Statistics, 32: 407-451.
• B Mukherjee, N Chatterjee (2008). Exploiting gene-environment independence for analysis of case-control studies: An empirical Bayes-type shrinkage estimator to trade-off between bias and efficiency. Biometrics, 64: 685-694.
References: Some papers
• GK Chen, JS Witte (2007). Enriching the analysis of genome-wide association studies with hierarchical modeling. AJHG, 81: 397-404.
• T Park, G Casella (2008). The Bayesian Lasso. JASA, 103: 681-686.
• M Park, T Hastie (2008). Penalized logistic regression for detecting gene interactions. Biostatistics, 9: 30-50.
• C Hans (2011). Elastic net regression modeling with the orthant normal prior. JASA, 106: 1383-1393.
Many more: Bioinformatics, Genetic Epidemiology, JASA, JRSS – Series B and C, PLoS One, …