semiparametric regression based on multiple sourcesbnk/aisc_2012.beam.pdfsemiparametric regression...

60
UMinformal Density Ratio Examples Semiparametric Statistical Formulation Combined semiparametric density estimators Semiparametric regression Application to Testicular Germ Cell Cancer Semiparametric Regression Based on Multiple Sources Benjamin Kedem Department of Mathematics & Inst. for Systems Research University of Maryland, College Park “Give me a place to stand and rest my lever on, and I can move the Earth”, (Archimedes, 287-212 B.C.) AISC, October 5-7, 2012 Benjamin Kedem AISC: October 5-7, 2012

Upload: doantruc

Post on 16-Apr-2018

220 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Semiparametric Regression Based on MultipleSources

Benjamin Kedem

Department of Mathematics & Inst. for Systems ResearchUniversity of Maryland, College Park

“Give me a place to stand and rest my lever on, and I can move the Earth”,(Archimedes, 287-212 B.C.)

AISC, October 5-7, 2012

Benjamin Kedem AISC: October 5-7, 2012

Page 2: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

1 Cheng, K.F., and Chu, C.K. (2004), “Semiparametric density estimation under a two-sample density ratiomodel,” Bernoulli, 10(4), 583-604.

2 Fokianos, K. (2004). Merging information for semiparametric density estimation. Journal of the RoyalStatistical Society, Series B, 66, 941-958.

3 Fokianos, K., Kedem, B., Qin, J., Short, D.A. (2001). A semiparametric approach to the one-way layout.Technometrics, 43, 56-65.

4 McGlynn K.A., Sakoda1 L.C., Rubertone M.V., Sesterhenn I.A., Lyu C., Graubard B.I., Erickson R.L. (2007).Body size, dairy consumption, puberty, and risk of testicular germ cell tumors. American Journal ofEpidemiology, 165, 355-363.

5 Qin, J., Zhang, B. (1997). A goodness of fit test for logistic regression models based on case-control data.Biometrika, 84, 609-618.

6 Qin, J., and Zhang, B. (2005). Density Estimation Under A Two-Sample Semiparametric Model.Nonparametric Statistics, 1, 665-683.

7 Kedem, B., Kim, E-Y, Voulgaraki, A., and Graubard, B. (2009). Two-Dimensional Semiparametric DensityRatio Modeling of Testicular Germ Cell Data. Statistics in Medicine, 2009, 28, 2147-2159.

8 Voulgaraki, A., Kedem, B., and Graubard, B.(2012). "Semiparametric Regression in Testicular Germ CellData", Annals of Applied Statistics, 6, 1185-1208Supplement: DOI:10.1214/12-AOAS552SUPP

Benjamin Kedem AISC: October 5-7, 2012

Page 3: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Abstract

1 Information is combined from multiple multivariate sources.

2 Density ratio model: Multivariate distributions are “regressed" on a reference distribution.

3 Random cocariates.

4 Kernel density estimator from many data sources: more efficient than the traditional single-sample kerneldensity estimator.

5 Each multivariate distribution and the corresponding conditional expectation (regression) of interest areestimated from the combined data using all sources.

6 Graphical and quantitative diagnostic tools are suggested to assess model validity.

7 The method is applied in quantifying the effect of height and age on weight of germ cell testicular cancerpatients.

8 Comparison with multiple regression, GAM, and nonparametric kernel regression.

Benjamin Kedem AISC: October 5-7, 2012

Page 4: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Main PointsDensity Ratio ExamplesSemiparametric Statistical FormulationCombined semiparametric density estimatorsSome Simulation ResultsApplication to Testicular Germ Cell CancerSummary

Benjamin Kedem AISC: October 5-7, 2012

Page 5: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Main PointsDensity Ratio ExamplesSemiparametric Statistical FormulationCombined semiparametric density estimatorsSome Simulation ResultsApplication to Testicular Germ Cell CancerSummary

Benjamin Kedem AISC: October 5-7, 2012

Page 6: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Main PointsDensity Ratio ExamplesSemiparametric Statistical FormulationCombined semiparametric density estimatorsSome Simulation ResultsApplication to Testicular Germ Cell CancerSummary

Benjamin Kedem AISC: October 5-7, 2012

Page 7: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Main PointsDensity Ratio ExamplesSemiparametric Statistical FormulationCombined semiparametric density estimatorsSome Simulation ResultsApplication to Testicular Germ Cell CancerSummary

Benjamin Kedem AISC: October 5-7, 2012

Page 8: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Main PointsDensity Ratio ExamplesSemiparametric Statistical FormulationCombined semiparametric density estimatorsSome Simulation ResultsApplication to Testicular Germ Cell CancerSummary

Benjamin Kedem AISC: October 5-7, 2012

Page 9: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Main PointsDensity Ratio ExamplesSemiparametric Statistical FormulationCombined semiparametric density estimatorsSome Simulation ResultsApplication to Testicular Germ Cell CancerSummary

Benjamin Kedem AISC: October 5-7, 2012

Page 10: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Outline

1 Density Ratio Examples

2 Semiparametric Statistical Formulation

3 Combined semiparametric density estimators

4 Semiparametric regression

5 Application to Testicular Germ Cell Cancer

Benjamin Kedem AISC: October 5-7, 2012

Page 11: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Multiple filtering of a signal

f1(ω) = |H1(ω)|2f (ω).

. (1)

.

fq(ω) = |Hq(ω)|2f (ω)

That is, q “distortions” or multiple “tilting” of the same referencespectral density f .

Benjamin Kedem AISC: October 5-7, 2012

Page 12: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Linear system with Gaussian noise

X t = αX t−1 + εt , ε = (ε1t , ...., εqt , εmt)′

εjt ∼ gj ∼ N(0, σ2j ), j = 1, ...,q,m. Choose gm ≡ g. We get

many distortions of the same reference g:

g1(x) = eα1+β1x2g(x)

.

.

.

gq(x) = eαq+βqx2g(x)

Benjamin Kedem AISC: October 5-7, 2012

Page 13: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Analysis of variance

Consider the classical one-way ANOVA with m = q + 1independent normal random samples:

x11, . . . , x1n1∼g1(x).

.

.

xq1, . . . , xqnq∼gq(x)xm1, . . . , xmnm∼gm(x)

gj(x)∼N( µj, σ2), j = 1, ...,m.

Benjamin Kedem AISC: October 5-7, 2012

Page 14: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Then, holding gm(x) ≡ g(x) as a reference:

g1(x) = exp(α1 + β1x)g(x).

.

.

gq(x) = exp(αq + βqx)g(x)

αj =µ2

m− µ2j

2 σ2 , β j =µj− µm

σ2 , j = 1, ...,q

Benjamin Kedem AISC: October 5-7, 2012

Page 15: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

k -parameter exponential families

g(x ,θ) = d(θ)S(x)exp

{k∑

i=1

ci(θ)Ti(x)

}Let gj(x) ≡ g(x ,θj), g(x) ≡ g(x ,θm).Again, q distortions of a reference g(x):

g1(x) = exp{α1 + β′1h(x)}g(x).

.

.

gq(x) = exp{αq + β′qh(x)}g(x)

Benjamin Kedem AISC: October 5-7, 2012

Page 16: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Case-control: Multinomial logistic regression (Prenticeand Pyke 1979).

RV y s.t. P(y = j) = πj ,∑m

j=1 πj = 1.Assume: For j = 1, ...,m, and any h(x),

P(y = j |x) =exp( α∗j + β jh(x))

1+∑q

k=1 exp( α∗k+ βkh(x))

Define: f (x |y = j) = gj(x), j = 1, ...,m

Then with αj = α∗j + log[ πm/πj ], j = 1, ...,q, and gm ≡ g,

gj(x) = exp( αj + β jh(x))g(x), j = 1, ...,q.

Benjamin Kedem AISC: October 5-7, 2012

Page 17: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Weighted Distributions

Rao(1965) unified the concept of weighted distributions of theform:

pw (x ; θ, α) =W (x ;α)p(x ; θ)

E [W (X ;α)]

where W (x ;α) is known. This is an example of “tilting” of areference distribution. By changing the weight W (x ;α) weobtain different distortions of the same reference p(x ; θ).

Benjamin Kedem AISC: October 5-7, 2012

Page 18: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Biased sampling

Length-biased Sampling: Vardi (1982) introduced

F (y) = 1/µ∫ y

0xdG(x), y ≥ 0,

where µ =∫∞

0 xdG(x) <∞. This is a tilt model.Biased Sampling/Selection Bias: Vardi (1985), and Gill,Vardi, Wellner (1988) considered the more general biasedsampling model

F (y) = W (G)−1∫ y

−∞w(x)dG(x),

where w(x) is known and W (G) =∫∞−∞w(x)dG(x). This is

a tilt model. F ,G are obtained by NPMLE.

Benjamin Kedem AISC: October 5-7, 2012

Page 19: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Biased sampling

Length-biased Sampling: Vardi (1982) introduced

F (y) = 1/µ∫ y

0xdG(x), y ≥ 0,

where µ =∫∞

0 xdG(x) <∞. This is a tilt model.Biased Sampling/Selection Bias: Vardi (1985), and Gill,Vardi, Wellner (1988) considered the more general biasedsampling model

F (y) = W (G)−1∫ y

−∞w(x)dG(x),

where w(x) is known and W (G) =∫∞−∞w(x)dG(x). This is

a tilt model. F ,G are obtained by NPMLE.

Benjamin Kedem AISC: October 5-7, 2012

Page 20: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Comparison Distributions (Parzen 1977,...,2009)

Sequence of cdf’s: {F1, ...,Fq} � G, with cont. densitiesf1, ..., fq,g.Comparison Distributions are defined as:

Dj(u;G,Fj) = Fj(G−1(u)), 0 < u < 1, j = 1, ...,q

Then by differentiation, with x = G−1(u):

f1(x) = d(G(x);G,F1)g(x)···

fq(x) = d(G(x);G,Fq)g(x)

Benjamin Kedem AISC: October 5-7, 2012

Page 21: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Outline

1 Density Ratio Examples

2 Semiparametric Statistical Formulation

3 Combined semiparametric density estimators

4 Semiparametric regression

5 Application to Testicular Germ Cell Cancer

Benjamin Kedem AISC: October 5-7, 2012

Page 22: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

m = q + 1 independent p-dim multivariate samples:

xij = (xij1, xij2, . . . , xijp) ∼ gi(x1, . . . , xp), i = 1, . . . ,q,m, j = 1, . . . ,ni

andxi1, xi2, . . . , xini

iid∼ gi

Reference or baseline probability density function (pdf):

g ≡ gm(x) ≡ gm(x1, . . . , xp)

Benjamin Kedem AISC: October 5-7, 2012

Page 23: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Density ratio model:

gi(x)g(x)

= w(x,θi) (2)

or equivalentlygi(x) = w(x,θi)g(x) (3)

1. gi(x), g(x) are unknown.2. w is a known positive and continuous function.3. θi are unknown d-dimensional parameters.4. pdf’s may be continuous or discrete.5. Normal assumption is not made.

Benjamin Kedem AISC: October 5-7, 2012

Page 24: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Problem:Use the combined data from all the samples

x11, ...,x1n1 , ...xm1, ...,xmnm

of size n = n1 + · · ·+ nm to estimate all the parameters θi ,i = 1, ...,q, and all the densities g1, ...,gq and gm = g.

Benjamin Kedem AISC: October 5-7, 2012

Page 25: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

G(x) ≡ Gm(x), pij = dG(xij) = dGm(xij).

The empirical log-likelihood is given by:

l = log L =m∑

i=1

ni∑j=1

log(pij) +

q∑i=1

ni∑j=1

log(w(xij ,θi)) (4)

subject to the constraints:

pij ≥ 0,m∑

i=1

ni∑j=1

pij = 1,m∑

i=1

ni∑j=1

pijw(xij ,θk ) = 1 for k = 1, . . . ,q.

(5)

Benjamin Kedem AISC: October 5-7, 2012

Page 26: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

pij =1n

1

1 +∑q

k=1 µk

[w(xij , θk )− 1

] , (6)

G(x) = Gm(x) =m∑

i=1

ni∑j=1

pij I(xij ≤ x) (7)

More generally, for l = 1, . . . ,m and w(xij , θm) ≡ 1,

Gl(x) =m∑

i=1

ni∑j=1

pijw(xij , θl)I(xij ≤ x) (8)

Benjamin Kedem AISC: October 5-7, 2012

Page 27: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Outline

1 Density Ratio Examples

2 Semiparametric Statistical Formulation

3 Combined semiparametric density estimators

4 Semiparametric regression

5 Application to Testicular Germ Cell Cancer

Benjamin Kedem AISC: October 5-7, 2012

Page 28: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Traditional single sample kernel density estimator:

f (x) =1

nhpn

n∑i=1

K(

x− xi

hn

)(9)

hn is a sequence of bandwidths such that:

1. hn → 0, nhpn →∞ as n→∞.

2. K (x) is defined for p-dimensional x.3 K (x) ≥ 0, symmetric around 0.4.∫

Rp K (x)dx = 1.

Under certain conditions, f (x) is a consistent estimator of f (x)(Parzen 1962, Shao 2003).

Benjamin Kedem AISC: October 5-7, 2012

Page 29: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Fokianos (2004), Cheng and Chu (2004), Qin and Zhang(2005), Voulgaraki, Kedem, Graubard (VKG) (2012), use thethe probabilities pij instead of 1/n:

gl(x) =1hp

n

m∑i=1

ni∑j=1

pijwl(xij)K(

x− xij

hn

)(10)

Benjamin Kedem AISC: October 5-7, 2012

Page 30: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

VKG 2012: Assume that K (·) is a nonnegative boundedsymmetric function with

∫K (x)dx = 1,

∫x′xK (x)dx = k2 > 0.

Assume that gl is continuous at x and twice differentiable in aneighborhood of x. If, as n→∞, hn = O(n−

14+p ), then√

nhpn

(gl(x)− gl(x)−

12

h2n

∫u′∂2gl(x∗)∂x∂x′

uK (u)du)

D→ N(0, σ2(x))

whereσ2(x) =

wl(x)gl(x)∑mk=1 ζkwk (x)

∫K 2(u)du

for any fixed x.

Benjamin Kedem AISC: October 5-7, 2012

Page 31: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

The mean integrated square error (MISE) is defined as:

MISE(gl(x)) = E(∫ ∣∣gl(x)− gl(x)

∣∣2dx)

(11)

Minimizing the MISE with respect to hn, the optimal bandwidthis:

h∗n =

(p/n)∫

wl(x)gl(x)/[∑m

k=1 ζkwk (x)]

dx∫

K 2(u)du∫ (∫u′(∂2gl(x)/∂x∂x′)uK (u)du

)2

dx

1

4+p

(12)

Benjamin Kedem AISC: October 5-7, 2012

Page 32: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

For sufficient large n, an alternative way to find hn is byminimizing

h−pn

n∑i=1

n∑i ′=1

p(ti)wl(ti)p(ti ′)wl(ti ′)

∫K (z)K

(z +

ti − ti ′

hn

)dz

− 2nl(nl − 1)hp

n

∑i 6=j

K(

xli − xlj

hn

). (13)

Benjamin Kedem AISC: October 5-7, 2012

Page 33: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

VKG 2012: If f is the classical single-sample multivariate kerneldensity estimator of gl , then As n→∞, hn → 0 and nhp

n →∞,(a)

AMISE(gl) ≤ AMISE(f )

(b) Using optimal bandwidths, the proposed semiparametricdensity estimator gl(x) is more efficient than f (x), i.e forevery l

eff (f , gl) ≡AMISE∗(gl)

AMISE∗(f )≤ 1

where AMISE∗ is the optimal AMISE .

Benjamin Kedem AISC: October 5-7, 2012

Page 34: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Diagnostic Plots and Goodness of Fit

Define

R2α,k = 1− exp

{−(

xαni − xα

)k}

(14)

ni : Sample size of i th sample.

xα: The number of times the estimated semiparametric cdf falls in theestimated 1 − α confidence interval obtained from the correspondingempirical cdf, both evaluated at the sample points.

k > 0 abd α chosen by the user.

R2α,k takes values between 0 and 1.

Computing R2α,k is both simple and fast.

Benjamin Kedem AISC: October 5-7, 2012

Page 35: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Qin and Zhang (1997):

R23 = exp(−

√n ·max |Gi − Gi |) (15)

Clearly, R23 takes values between 0 and 1.

Benjamin Kedem AISC: October 5-7, 2012

Page 36: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Simulation Results: Gi vs. Gi , i = 1,2Simulation 1: g1 ∼ N((0, 0)′,Σ)) (case), g2 ∼ N((0, 0)′,Σ)) (control),

Σ =

(4 22 3

), n1 = 40, n2 = 30.

0.0 0.2 0.4 0.6 0.8

0.0

0.2

0.4

0.6

0.8

Simulation 1: G1

Joint Empcdf case

Join

t S

em

. C

DF

case

0.0 0.2 0.4 0.6 0.8

0.0

0.2

0.4

0.6

0.8

Simulation 1: G2

Joint Empcdf control

Join

t S

em

. re

f. C

DF

contr

ol

Benjamin Kedem AISC: October 5-7, 2012

Page 37: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Simulation 2: g1 ∼ N((0, 0)′,Σ)) (case), g2 ∼ N((1, 1)′,Σ)) (control) with

Σ =

(3 11 2

), n1 = 200, n2 = 200.

0.0 0.4 0.8

0.0

0.2

0.4

0.6

0.8

Simulation 2: G1

Joint Empcdf case

Jo

int

Se

m.

CD

F c

ase

0.0 0.4 0.8

0.0

0.2

0.4

0.6

0.8

1.0

Simulation 2: G2

Joint Empcdf control

Jo

int

Se

m.

ref.

CD

F c

on

tro

l

Benjamin Kedem AISC: October 5-7, 2012

Page 38: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Simulation 3: g1 from standard two dimensional Multivariate Cauchy (case) and g2

from two dimensional Multivariate Cauchy (control) with µ = (1, 1)′, V =

(5 55 10

),

n1 = 200, n2 = 200.

0.0 0.4 0.8

0.0

0.2

0.4

0.6

0.8

1.0

Simulation 3: G1

Joint Empcdf case

Jo

int

Se

m.

CD

F c

ase

0.0 0.4 0.8

0.0

0.2

0.4

0.6

0.8

1.0

Simulation 3: G2

Joint Empcdf control

Jo

int

Se

m.

ref.

CD

F c

on

tro

l

Benjamin Kedem AISC: October 5-7, 2012

Page 39: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Simulation 4: g1 from standard two dimensional Multivariate Cauchy (case) and g2from uniform distribution on the triangle (0, 0), (6, 0), (−3, 4) (control), and n1 = 200,n2 = 200.

0.0 0.4 0.8

0.0

0.2

0.4

0.6

0.8

1.0

Simulation 4: G1

Joint Empcdf case

Jo

int

Se

m.

CD

F c

ase

0.0 0.1 0.2 0.3 0.4

0.1

0.2

0.3

0.4

0.5

Simulation 4: G2

Joint Empcdf control

Jo

int

Se

m.

ref.

CD

F c

on

tro

l

Benjamin Kedem AISC: October 5-7, 2012

Page 40: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Table: Comparison of R23 and R2

.05,2 for 100 repetitions of case andcontrol.

Simulation Group R23 R2

.05,2

(1) Case 0.6307 1Control 0.5976 1

(2) Case 0.3912 0.9353Control 0.3766 0.9718

(3) Case 0.1080 0.3342Control 0.1129 0.3324

(4) Case 0.0507 0.3361Control 0.0495 0.0033

Benjamin Kedem AISC: October 5-7, 2012

Page 41: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Bandwidth (BW) selection using cross validation.

Case ControlSame BW Diff. BW’s Same BW Diff. BW’s

h h1 h2 h h1 h2

Leave One OutSimulation 1 0.61 0.90 0.40 0.59 0.31 0.61Simulation 2 0.38 0.50 0.20 0.61 0.36 0.71

Alternative (12)Simulation 1 0.64 0.90 0.50 0.63 0.21 0.71Simulation 2 0.30 0.40 0.20 0.74 0.11 0.96

Benjamin Kedem AISC: October 5-7, 2012

Page 42: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Outline

1 Density Ratio Examples

2 Semiparametric Statistical Formulation

3 Combined semiparametric density estimators

4 Semiparametric regression

5 Application to Testicular Germ Cell Cancer

Benjamin Kedem AISC: October 5-7, 2012

Page 43: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Suppose we have m multivariate samples of p-dim vectors:

(Random Covariates, Response) = (x1, . . . , x(p−1), y )

We wish to estimate m conditional expectations:

E(y |x1, . . . , x(p−1))

Benjamin Kedem AISC: October 5-7, 2012

Page 44: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Data:i = 1, . . . ,q,m, j = 1, . . . ,ni ,

(xij1, xij2, . . . , xij(p−1), yij) ∼ gi(x1, . . . , x(p−1), y).

Reference:g ≡ gm(x1, . . . , x(p−1), y)

Distortions:

gi(x1, . . . , x(p−1), y), i = 1, . . . ,q

Benjamin Kedem AISC: October 5-7, 2012

Page 45: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Model:

As in ratio of pdf’s from N(µ1,Σ), N(µ2,Σ) assume

gi(x)g(x)

= w(x, αi ,βi) ≡ exp(αi + β′ix), i = 1, ...,q (16)

where x = (x1, . . . , x(p−1), y)′ and βi = (βi1, . . . , βip)′.

Benjamin Kedem AISC: October 5-7, 2012

Page 46: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Let t denote the vector of combined data of lengthn = n1 + n2 + · · ·+ nm. Following the method of constrainedempirical likelihood we obtain score equations for αj and βj :

∂l∂αj

= −n∑

i=1

ρjwj(ti)

1 + ρ1w1(ti) + · · ·+ ρqwq(ti)+ nj = 0 (17)

∂l∂βj

= −n∑

i=1

ρjwj(ti)ti

1 + ρ1w1(ti) + · · ·+ ρqwq(ti)+

nj∑i=1

(xji1, . . . , yji)′ = 0(18)

j = 1, . . . ,q andρj = nj/nm

wj(ti) = exp(αj + β′j ti)

Benjamin Kedem AISC: October 5-7, 2012

Page 47: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

pi =1

nm· 1

1 + ρ1w1(ti) + · · ·+ ρqwq(ti)(19)

G(t) =1

nm·

n∑i=1

I(ti ≤ t)1 + ρ1w1(ti) + · · ·+ ρqwq(ti)

(20)

wherewj(ti) = exp(αj + β

′j ti)

As n→∞, the estimators θ = (α1, · · · , αq, β1, · · · , βq)′ are

asymptotically normal (Qin and Zhang 1997, Lu 2007).

Benjamin Kedem AISC: October 5-7, 2012

Page 48: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Under the p-dimensional density ratio model we can predict theresponse y given the covariate information x1, x2, . . . , x(p−1) forany of the m data sets as follows:

Ej(y | x1, . . . , x(p−1)) =

nj∑i

yigj(x1, . . . , x(p−1), yi)∑yi

gj(x1, . . . , x(p−1), yi), j = 1, . . . ,q,m.

(21)The gj in (21) are the semiparametric kernel density estimates.

Benjamin Kedem AISC: October 5-7, 2012

Page 49: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Assume that the data are bounded. Then:(a) As n→∞,h→ 0 and nhp →∞,∫

|E(y |x)− E(y |x)|g(x)dx→ 0

in the mean square sense.(b) If, in addition 0 < A < g(x), then∫

|E(y |x)− E(y |x)|dx→ 0

in the mean square sense.

Benjamin Kedem AISC: October 5-7, 2012

Page 50: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Outline

1 Density Ratio Examples

2 Semiparametric Statistical Formulation

3 Combined semiparametric density estimators

4 Semiparametric regression

5 Application to Testicular Germ Cell Cancer

Benjamin Kedem AISC: October 5-7, 2012

Page 51: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Testicular germ cell tumor (TGCT) is a common cancer among U.S.men, mainly in the age group of 15-35 years (McGlynn et al 2003).

Increased risk is significantly related to height, whereas body massindex was not found significant (McGlynn et al 2007).

Using a two dimensional semiparametric model, it was shown thatjointly height and weight are significant risk factors (Kedem et al 2009).

We use TGCT data with 3 variables: age, height (cm), weight (kg).

Case: n1 = 763. Control: n2 = 928. n = 1691 individuals.

We consider two cases:2D with variables height and weight3D with variables height, weight and age.In both cases the control distribution is the reference distribution.

Benjamin Kedem AISC: October 5-7, 2012

Page 52: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

2D Problem: E(Weight | Height)

Diagnostic plots of Gi versus Gi , i = 1, 2 evaluated at (height,weight) pairs.

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Case Group

Joint Empirical CDF

Join

t S

em

. C

DF

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Control Group

Joint Empirical CDF

Join

t S

em

. C

DF

Benjamin Kedem AISC: October 5-7, 2012

Page 53: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

2D Case Regression

160 170 180 190 200

60

80

100

120

2D TGCT: E[Weight/Height] for G1

height

weig

ht

Actual points Ref.

Regression

Semiparametric

GAM

NW

70 75 80 85 90 95

−30

−20

−10

010

20

30

40

Residual plot for 2D TGCT Case

Sem. E[weight/height]

hat(

e)

Benjamin Kedem AISC: October 5-7, 2012

Page 54: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

2D Control Regression

150 160 170 180 190 200 210

40

60

80

100

120

2D TGCT: E[Weight/Height] for G2

height

weig

ht

Actual points Ref.

Regression

Semiparametric

GAM

NW

55 60 65 70 75 80 85 90

−40

−20

020

40

Residual plot for 2D TGCT Control

Sem. E[weight/height]

hat(

e)

Benjamin Kedem AISC: October 5-7, 2012

Page 55: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

3D Problem: E(Weight | Height, Age)Diagnostic plots of Gi versus Gi , i = 1, 2 evaluated at (age,height,weight)triplets.

0.0 0.2 0.4 0.6 0.8

0.0

0.2

0.4

0.6

0.8

Case Group

Joint Empirical CDF

Join

t S

em

. C

DF

0.0 0.2 0.4 0.6 0.8

0.0

0.2

0.4

0.6

0.8

Control Group

Joint Empirical CDF

Join

t S

em

. C

DF

Benjamin Kedem AISC: October 5-7, 2012

Page 56: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

3D Residuals

Residual plots for the semiparametric model in the 3D TGCTdata set.

70 75 80 85 90 95

−2

0−

10

01

02

03

04

0

Residual plot for 3D TGCT Case

Sem. E[weight/height, age]

ha

t(e

)

55 60 65 70 75 80 85 90

−4

0−

20

02

04

0

Residual plot for 3D TGCT Control

Sem. E[weight/height, age]

ha

t(e

)

Benjamin Kedem AISC: October 5-7, 2012

Page 57: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Some joint probabilities of age, height and weight in the case and control groups.

Probability Case ControlPr(A≤ 45, H≤ 152.40, W≤ 58.967 ) 0.000378 0.000767Pr(A≤ 26, H≤ 165.10, W≤ 58.967 ) 0.004502 0.007074Pr(A≤ 29, H≤ 177.80, W≤ 65.317 ) 0.042723 0.054313Pr(A≤ 33, H≤ 185.42, W≤ 70.307 ) 0.157968 0.184774Pr(A≤ 34, H≤ 180.34, W≤ 79.832 ) 0.316077 0.362967Pr(A≤ 37, H≤ 180.34, W≤ 89.811 ) 0.513664 0.575512Pr(A≤ 40, H≤ 187.96, W≤ 94.801 ) 0.797157 0.833803Pr(A≤ 43, H≤ 200.66, W≤ 99.790 ) 0.943058 0.956300Pr(A≤ 45, H≤ 203.20, W≤ 117.934 ) 0.995010 0.996560

Benjamin Kedem AISC: October 5-7, 2012

Page 58: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Case-control weight and E [weight|height, age]. Empty entries in the table correspond to subjects with the sameheight and age, but possibly different weights.

Control CaseAge Height Weight E [W | H, A] Weight E [W | H, A]

27 162.56 58.967 69.08335 58.967 68.5365228 162.56 77.111 69.05132 65.771 68.59858

68.03930 165.10 68.039 72.20524 72.575 72.002837 165.10 69.40 72.42138 63.503 71.850425 167.64 86.183 73.68129 72.575 73.69978

90.71863.503

Benjamin Kedem AISC: October 5-7, 2012

Page 59: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Case-control weight and E [weight|height, age]. Empty entries in the table correspond to subjects with the sameheight and age, but possibly different weights.

Control CaseAge Height Weight E [W | H, A] Weight E [W | H, A]

30 167.64 72.575 74.81333 88.451 74.9354318 170.18 61.235 73.67032 72.575 73.6751832 170.18 70.307 76.53351 81.647 76.64543

63.50337 172.72 74.843 77.88598 88.451 77.941740 172.72 70.307 77.97789 90.718 78.0441

77.11122 175.26 77.111 76.62195 86.183 76.70862

65.771 65.77179.379 86.18383.91565.771

25 175.26 68.039 77.14234 79.379 77.2175583.915 72.57574.843 83.91583.915 74.84379.379 72.57586.183 74.843

61.23561.23565.77179.379

Benjamin Kedem AISC: October 5-7, 2012

Page 60: Semiparametric Regression Based on Multiple Sourcesbnk/AISC_2012.beam.pdfSemiparametric regression Application to Testicular Germ Cell Cancer ... AISC, October 5-7, 2012 Benjamin Kedem

UMinformal

Density Ratio ExamplesSemiparametric Statistical Formulation

Combined semiparametric density estimatorsSemiparametric regression

Application to Testicular Germ Cell Cancer

Case-control weight and E [weight|height, age]. Empty entries in the table correspond to subjects with the sameheight and age, but possibly different weights.

Control CaseAge Height Weight E [W | H, A] Weight E [W | H, A]

21 185.42 86.183 82.46773 79.379 82.7814072.575 77.111

102.058 97.52222 190.50 97.522 85.23493 86.183 85.64845

95.254 71.66831 190.50 102.058 86.05980 104.326 86.27744

74.84322 193.04 86.183 86.73352 102.058 87.18440

80.73924 193.04 99.337 87.50020 108.862 88.23938

86.18399.790

108.86234 193.04 113.398 87.72937 88.451 88.58960

117.93434 195.58 83.915 88.81524 89.811 89.036535

Benjamin Kedem AISC: October 5-7, 2012