

Two novel applications of selective inference

Robert Tibshirani, Stanford University

April 20, 2015

1 / 36


Outline

1. Review of the polyhedral lemma for selective inference

2. Clustering followed by the lasso: the protolasso (joint with Stephen Reid)

3. Assessment of internally derived predictors (joint with Sam Gross and Jonathan Taylor)

2 / 36


Where selective inference started for me

"A significance test for the lasso" – Annals of Statistics 2014 (with discussion)

Richard Lockhart, Simon Fraser University, Vancouver; PhD student of David Blackwell, Berkeley, 1979.

Jonathan Taylor, Stanford University; PhD student of Keith Worsley, 2001.

Ryan Tibshirani, CMU; PhD student of Taylor, 2011.

Rob Tibshirani, Stanford.

3 / 36


What it’s like to work with Jon Taylor

4 / 36


References

- Reid, S. and Tibshirani, R. (2015). Sparse regression and marginal testing using cluster prototypes. arXiv:1503.00334.

- Lee, J. D., Sun, D. L., Sun, Y. and Taylor, J. E. (2014). Exact post-selection inference, with application to the lasso. arXiv:1311.6238.

- Gross, S., Taylor, J. and Tibshirani, R. (2015). In preparation.

- Hastie, T., Tibshirani, R. and Wainwright, M. Statistical Learning with Sparsity: The Lasso and Generalizations. June 26, 2015 (has a chapter on selective inference!)

5 / 36


Ongoing work on selective inference

- Optimal inference after model selection (Fithian, Sun, Taylor)

- LAR + forward stepwise (Taylor, Lockhart, Tibs ×2)

- Forward stepwise with grouped variables (Loftus and Taylor)

- Sequential selection procedures (G'Sell, Wager, Chouldechova, Tibs); to appear, JRSSB

- PCA (Choi, Taylor, Tibs)

- Marginal screening (Lee and Taylor)

- Many-means problem (Reid, Taylor, Tibs)

- Asymptotics (Tian and Taylor)

- Bootstrap (Ryan Tibshirani + friends); in preparation

6 / 36


R package in development

selectiveInference

Joshua Loftus, Jon Taylor, Ryan Tibshirani, Rob Tibshirani

- Forward stepwise regression, including an AIC stopping rule and categorical variables: forwardStepInf(x,y)

- Fixed-lambda lasso: fixedLassoInf(x,y)

- Least angle regression: larInf(x,y)

These compute p-values and selection intervals.

7 / 36


The polyhedral lemma

- Response vector y ∼ N(µ, Σ). Suppose we make a selection that can be written as

Ay ≤ b

with A, b not depending on y.

- Then for any vector η,

F^[V−, V+]_{η⊤µ, σ²η⊤η} (η⊤y) | {Ay ≤ b} ∼ Unif(0, 1)

(a truncated Gaussian distribution), where V−, V+ are (computable) constants that are functions of η, A, b.
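
Below is a minimal sketch in R of how the lemma yields a selective p-value: given the polyhedron {Ay ≤ b} and a contrast η, compute V− and V+ and evaluate the truncated-Gaussian distribution at η⊤y. This is not the selectiveInference package; the helper name selective_pvalue, the assumption Σ = σ²I, and the one-sided alternative are illustrative choices.

# Minimal sketch, assuming y ~ N(mu, sigma^2 I) and a selection event {A y <= b}
selective_pvalue <- function(y, A, b, eta, sigma, null_value = 0) {
  c_vec <- eta / sum(eta^2)                          # c = Sigma eta / (eta' Sigma eta); sigma^2 cancels
  z <- y - as.numeric(crossprod(eta, y)) * c_vec     # part of y orthogonal to the eta direction
  Ac    <- as.numeric(A %*% c_vec)
  slack <- b - as.numeric(A %*% z)
  vminus <- suppressWarnings(max((slack / Ac)[Ac < 0]))   # V- (Lee et al. 2014)
  vplus  <- suppressWarnings(min((slack / Ac)[Ac > 0]))   # V+
  s <- sigma * sqrt(sum(eta^2))                      # sd of eta'y
  Fcdf <- function(x) {                              # CDF of N(null_value, s^2) truncated to [V-, V+]
    (pnorm(x, null_value, s) - pnorm(vminus, null_value, s)) /
      (pnorm(vplus, null_value, s) - pnorm(vminus, null_value, s))
  }
  1 - Fcdf(as.numeric(crossprod(eta, y)))            # one-sided selective p-value
}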

8 / 36


[Figure: geometry of the polyhedral lemma. The selection event {Ay ≤ b} is a polyhedron; y is decomposed into Pη y and Pη⊥ y, and moving along the direction η inside the polyhedron gives the truncation limits V−(y) and V+(y) for η⊤y.]

9 / 36


The polyhedral lemma: some intuition

- Forward stepwise regression, with standardized predictors: suppose that we choose predictor 2 at step 1, with a positive sign.

- Then predictor 2 "won the competition", so we know that 〈x2, y〉 > ±〈xj, y〉 for all j ≠ 2.

- Hence to carry out inference for β2, we condition on the set of y such that 〈x2, y〉 > ±〈xj, y〉 for all j ≠ 2. This can be written in the form Ay ≤ b.

- From this, it turns out that V− = |〈xk, y〉|, where xk has the second-largest absolute inner product with y, and V+ = +∞. We carry out inference using the Gaussian distribution truncated to (V−, +∞).

- In subsequent steps ℓ > 1, we condition on the results of the "competitions" in all of the previous ℓ − 1 steps.
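
A toy numeric sketch of this step-1 recipe in R (simulated data; the design, the null response, and the one-sided p-value are illustrative, not from the talk):

# Step-1 "competition": the winning inner product is Gaussian, truncated below at the runner-up
set.seed(1)
n <- 50; p <- 10; sigma <- 1
X <- scale(matrix(rnorm(n * p), n, p))             # standardized predictors
X <- sweep(X, 2, sqrt(colSums(X^2)), "/")          # rescale columns to unit norm
y <- rnorm(n)                                      # a null response, for illustration
ip <- as.numeric(crossprod(X, y))                  # <xj, y> for each predictor
winner <- which.max(abs(ip)); s <- sign(ip[winner])
vminus <- max(abs(ip[-winner]))                    # V- = second-largest |<xj, y>|
eta <- s * X[, winner]                             # contrast for the selected predictor
stat <- as.numeric(crossprod(eta, y))
se <- sigma * sqrt(sum(eta^2))                     # here equals sigma, since ||eta|| = 1
# One-sided p-value from N(0, se^2) truncated to (V-, +Inf)
pval <- pnorm(stat / se, lower.tail = FALSE) / pnorm(vminus / se, lower.tail = FALSE)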

10 / 36


Protolasso: regression with correlated predictors

- Outcome y, predictors X1, X2, . . . , Xp.

- The predictors fall into K groups and are correlated within groups. The groups may be pre-defined, or discovered via clustering.

11 / 36


Protolasso: A picture

[Figure: coefficient estimates plotted against Variable index (0–100) for the ordinary lasso (top panel) and the protolasso (bottom panel).]

12 / 36


Protolasso

1. Cluster the predictors into disjoint groups C1, C2, . . . , CK. Any clustering method can be used; we use hierarchical clustering here.

2. Choose prototypes Pk ∈ Ck (one from each cluster):

Pk = xj with j = argmax_{j∈Ck} |corr(xj, y)|

for each cluster Ck ⊆ {1, 2, . . . , p}, k = 1, 2, . . . , K.

3. Run the lasso on the prototypes, giving selected prototypes M ⊆ P (steps 1–3 are sketched in code below).

4. Do inference on:
   - the selected prototypes;
   - other members of the selected clusters.
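
A sketch of steps 1–3 in R (assumes the glmnet package; the helper name protolasso and the choices of K and lambda are illustrative, and the inference step 4 is omitted):

library(glmnet)

protolasso <- function(X, y, K, lambda) {
  # Step 1: hierarchical clustering of the predictors on a correlation distance
  cluster <- cutree(hclust(as.dist(1 - abs(cor(X)))), k = K)
  # Step 2: prototype = member of each cluster most correlated with y
  proto <- sapply(1:K, function(k) {
    members <- which(cluster == k)
    members[which.max(abs(cor(X[, members, drop = FALSE], y)))]
  })
  # Step 3: lasso on the prototypes only (K >= 2 assumed)
  fit <- glmnet(X[, proto, drop = FALSE], y)
  selected <- proto[which(coef(fit, s = lambda)[-1] != 0)]
  list(clusters = cluster, prototypes = proto, selected = selected)
}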

13 / 36


Post-selection inference (Lee et al. 2014)

- Focus on the lasso as the selection procedure:

β̂λ = argmin_β { (1/2)||y − Xβ||₂² + λ||β||₁ }

- Define the set E ⊂ {1, 2, . . . , p} of selected variables and zE, their coefficient signs.

- Two important results:

  - Affine constraint:

  {E, zE} = {AL,λ y ≤ bL,λ}

  - Conditional distribution:

  F^[V−, V+]_{η⊤µ, σ²η⊤η} (η⊤y) | {AL,λ y ≤ bL,λ} ∼ Unif(0, 1)

- Lee and Taylor (2014) extend the results to marginal correlation screening as well.

14 / 36


Post-selection inference: extending to the protolasso

- Condition on the two-stage selection:
  - the prototype set P̂ = P, and the signs of the marginal correlations of the prototypes with y;
  - the subset of prototypes selected by the lasso, M̂ = M, and their signs.

- Stack the constraint matrices: one per cluster prototype and one for the second-stage lasso. Reid and Tibshirani (2015).

- Do inference on η⊤µ using the distribution of η⊤y conditional on the selection procedure.

15 / 36


Post-selection inference: extending to the protolasso

- p-values or selection intervals:
  - For selected prototypes: η = (X⊤_M)⁺_j (partial regression coefficients).
  - For other members of selected clusters: η = (X⊤_{M\{Pk}∪{k1}})⁺_j (swap the prototype out for another member of its cluster).

16 / 36


A picture

[Figure (same as page 12): coefficient estimates plotted against Variable index (0–100) for the ordinary lasso (top panel) and the protolasso (bottom panel).]

17 / 36


The knockoff method (Barber and Candès, 2014)

- FDR-controlling framework for variable selection.

- Construct a knockoff matrix X̃ of p fake predictors, with the properties

  1. corr(x̃j, x̃k) = corr(xj, xk)
  2. corr(x̃j, xk) = corr(xj, xk), ∀ j ≠ k
  3. corr(x̃j, xj) small

- Then run a competition: run your selection procedure on the 2p predictors (X, X̃) and see whether xj or x̃j comes in first.

- These are equally likely for noise predictors; xj should win if it's signal!

18 / 36


Knockoffs for the lasso (Barber and Candès, 2014)

- Statistics for each pair of original and knockoff variables: Wj = Zj − Z̃j, j = 1, 2, . . . , p, where Zj = sup{λ : β̂j(λ) ≠ 0} and similarly for Z̃j.

- Data-dependent threshold

  T = min{ t ∈ W : (1 + #{j : Wj ≤ −t}) / (#{j : Wj ≥ t} ∨ 1) ≤ q },

  where W = {|Wj| : j = 1, . . . , p} \ {0}.

- Select features with Wj ≥ T to control the FDR at level q.
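
A short sketch of this threshold rule in R (the helper name knockoff_threshold and the default q are illustrative; the "1 +" in the numerator matches the version on this slide):

knockoff_threshold <- function(W, q = 0.10) {
  ts <- sort(setdiff(abs(W), 0))                     # candidate thresholds t
  if (length(ts) == 0) return(Inf)
  fdp_hat <- sapply(ts, function(t)
    (1 + sum(W <= -t)) / max(1, sum(W >= t)))        # estimated FDP at threshold t
  ok <- which(fdp_hat <= q)
  if (length(ok) == 0) Inf else ts[min(ok)]          # smallest t with estimated FDP <= q
}
# selected <- which(W >= knockoff_threshold(W, q = 0.10))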

19 / 36


Knockoffs (Barber and Candès, 2014)

Lemma. Let ε ∈ {±1}^p be a sign sequence independent of W = (W1, W2, . . . , Wp), with εj = +1 for all non-null variables and εj i.i.d. uniform on {±1} for null variables. Then

(W1, W2, . . . , Wp) =d (ε1·W1, ε2·W2, . . . , εp·Wp)

20 / 36


Knockoff method for protolasso

1. Cluster the columns of X: {C1, C2, . . . , CK}.

2. Split the data by rows into two (roughly) equal parts:

   y = (y(1), y(2)) and X = (X(1), X(2)).

3. Extract prototypes P = {P1, P2, . . . , PK} using only y(1) and X(1).

4. Form the knockoff matrix X̃(2) from X(2).

5. Reduce to the matrices X(2)_P and X̃(2)_P.

6. Proceed with knockoff screening using y(2) and [X(2)_P, X̃(2)_P] (sketched below).
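
A structural sketch of steps 1–5 in R; make_knockoffs() is a hypothetical placeholder for any fixed-X knockoff construction (for example, from the knockoff package), and the prototype rule reuses the earlier protolasso sketch:

knockoff_protolasso <- function(X, y, K, make_knockoffs) {
  n <- nrow(X)
  i1 <- sample(n, floor(n / 2)); i2 <- setdiff(1:n, i1)          # step 2: row split
  cluster <- cutree(hclust(as.dist(1 - abs(cor(X)))), k = K)      # step 1: cluster the columns
  P <- sapply(1:K, function(k) {                                  # step 3: prototypes from part 1 only
    m <- which(cluster == k)
    m[which.max(abs(cor(X[i1, m, drop = FALSE], y[i1])))]
  })
  X2  <- X[i2, P, drop = FALSE]                                   # step 5: reduce to prototype columns
  X2k <- make_knockoffs(X2)                                       # step 4: knockoffs built on part 2 (placeholder)
  # Step 6: knockoff screening with y[i2] on cbind(X2, X2k), using the lasso
  # statistics W and the threshold T from the previous slides.
  list(prototypes = P, X2 = X2, X2_knockoff = X2k, y2 = y[i2])
}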

21 / 36


Knockoffs for protolasso

[Figure: schematic of the split. X(1), y(1) are used to choose the prototypes; X(2), y(2) and the knockoff copy X̃(2) are used for the screening step.]

22 / 36


FDR control

We show that Barber and Candès' lemmas are satisfied, and hence this procedure achieves FDR control.

23 / 36


Knockoffs for protolasso

- W(2)_k = Z(2)_k − Z̃(2)_k.

[Figure: average FDP and average power versus signal amplitude (1–9), shown for three simulation settings.]

24 / 36


Summary

- High correlation amongst variables.

- Cluster and prototype.

- Then test/regress, using the post-selection framework.

- Extensions to knockoffs and non-Gaussian models.

- Extension to marginal testing (Prototest).

25 / 36


Assessment of internally derived predictors

- In biomarker studies, an important problem is to compare a predictor of disease outcome derived from gene expression levels to standard clinical predictors.

- Comparing them on the same dataset that was used to derive the biomarker predictor can lead to results strongly biased in favor of the biomarker predictor.

- Efron and Tibshirani (2002) proposed a method called "pre-validation" for making a fairer comparison between the two sets of predictors. It doesn't exactly work.

- We can do better via selective inference.

26 / 36


[Figure: schematics of the Naive and Pre-validation approaches: X and y are used to build η, which is then compared with Z in a model for y.]

27 / 36


An example

An example of this problem arose in the paper of van 't Veer et al. (2002). Their microarray data has 4918 genes measured over 78 cases, taken from a study of breast cancer. There are 44 cases in the good-prognosis group and 34 in the poor-prognosis group. The microarray predictor was constructed as follows:

1. 70 genes were selected, having the largest absolute correlation with the 78 class labels.

2. Using these 70 genes, a nearest-centroid classifier was constructed.

3. Applying the classifier to the 78 microarrays gave a dichotomous predictor ηj for each case j.

4. η was then compared with clinical predictors for predicting good or bad prognosis (logistic regression).
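
A toy version of steps 1–3 in R, with hypothetical objects expr (cases × genes expression matrix) and label (0 = good, 1 = poor prognosis):

top_genes <- order(abs(cor(expr, label)), decreasing = TRUE)[1:70]   # step 1: top 70 genes
centroid0 <- colMeans(expr[label == 0, top_genes, drop = FALSE])     # step 2: class centroids
centroid1 <- colMeans(expr[label == 1, top_genes, drop = FALSE])
d0 <- rowSums(sweep(expr[, top_genes], 2, centroid0)^2)              # step 3: distance to each centroid
d1 <- rowSums(sweep(expr[, top_genes], 2, centroid1)^2)
eta <- as.integer(d1 < d0)                                           # dichotomous microarray predictor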

28 / 36


Pre-validation

1. Divide the cases up into K = 13 equal-sized parts of 6 cases each.

2. Set aside one of the parts. Using only the data from the other 12 parts, select the genes having absolute correlation at least 0.3 with the class labels, and form a nearest-centroid classification rule.

3. Use the rule to predict the class labels for the 13th part.

4. Do steps 2 and 3 for each of the 13 parts, yielding a "pre-validated" microarray predictor zj for each of the 78 cases.

5. Fit a logistic regression model to the pre-validated microarray predictor and the 6 clinical predictors.
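
A sketch of steps 1–4 in R, reusing the hypothetical expr and label objects from the previous page (the fold assignment and the 0.3 cutoff follow the recipe above):

set.seed(1)
fold <- sample(rep(1:13, length.out = nrow(expr)))                   # step 1: 13 parts
z <- numeric(nrow(expr))
for (k in 1:13) {
  tr <- fold != k; te <- fold == k                                   # step 2: hold out one part
  genes <- which(abs(cor(expr[tr, ], label[tr])) >= 0.3)             # genes chosen on the other 12 parts
  c0 <- colMeans(expr[tr & label == 0, genes, drop = FALSE])         # nearest-centroid rule
  c1 <- colMeans(expr[tr & label == 1, genes, drop = FALSE])
  d0 <- rowSums(sweep(expr[te, genes, drop = FALSE], 2, c0)^2)
  d1 <- rowSums(sweep(expr[te, genes, drop = FALSE], 2, c1)^2)
  z[te] <- as.integer(d1 < d0)                                       # step 3: predict the held-out part
}
# Step 5: glm(label ~ z + <clinical predictors>, family = binomial)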

29 / 36


P-values on Breast cancer data

Naive test: 1.1e-06 ;

Prevalidated test: 0.0378

30 / 36


Pre-validation doesn’t exactly work

Type I error is somewhat close to the nominal level, but anti-conservative.

31 / 36


Selective inference to the rescue!

- Response y, external predictors Z = (Z1, Z2, . . . , Zm), biomarkers X = (X1, X2, . . . , Xp).

- We model y ∼ X to give η, then we run the lasso of y on (Z, η).

- If the first-stage model is simple, like

  "choose η = Xj most correlated with y",

  then we can simply add constraints to the matrix A in the selection rule Ay ≤ b.

32 / 36


More generally

- Fit y ∼ X by least squares or the lasso to yield a prediction η.

- Then fit y ∼ (Z, η). We define the test statistic for the significance of η as

  T = (RSS0 − RSS)/RSS ∼ P

- If the first-stage model is least squares on p predictors, then P = F_{c,d} with c = p, d = n − (m + p).

- If the first-stage model is the lasso, then P is a union of truncated F distributions.
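
A small sketch of computing the statistic T in R, with hypothetical matrices X (biomarkers), Z (clinical predictors) and response y; the reference distribution P is not computed here, since it depends on the first-stage selection as described above:

eta  <- fitted(lm(y ~ X))                   # first stage: prediction from the biomarkers
full <- lm(y ~ Z + eta)                     # second stage, with eta included
null <- lm(y ~ Z)                           # second stage, without eta
RSS  <- sum(resid(full)^2)
RSS0 <- sum(resid(null)^2)
T_stat <- (RSS0 - RSS) / RSS                # compare to P: F-based for least squares, truncated F's for the lasso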

33 / 36


Simulation

n = 50, nx = 200, nz = 20; the coefficients of X are zero; λ set to 1.5

[Figure: QQ plots of observed versus expected null p-values, for pre-validation (left panel, "Null Simulation, Prevalidation") and for selective inference (right panel, "Null Simulation, Selective Inference").]

34 / 36


P-values on Breast cancer data

Naive test: 1.1e-06 ;

Prevalidated test: 0.0378

Selective test: 0.5028

35 / 36


Conclusions

- Selective inference is an exciting new area. Lots of potential research problems and generalizations (grad students take note)!!

- Practitioners: look for the selectiveInference R package.

- Coming soon: Deep Selective Inference.

36 / 36