Advanced Statistical Methods: Beyond Linear Regression


  • Advanced Statistical Methods: Beyond Linear Regression

    John R. Stevens, Utah State University

    Notes 2. Statistical Methods I

    Mathematics Educators Workshop, 28 March 2009

    http://www.stat.usu.edu/~jrstevens/pcmi

  • What would your students know to do with these data?

    Obs  Flight  Temp  Damage
      1  STS1      66  NO
      2  STS9      70  NO
      3  STS51B    75  NO
      4  STS2      70  YES
      5  STS41B    57  YES
      6  STS51G    70  NO
      7  STS3      69  NO
      8  STS41C    63  YES
      9  STS51F    81  NO
     10  STS4      80
     11  STS41D    70  YES
     12  STS51I    76  NO
     13  STS5      68  NO
     14  STS41G    78  NO
     15  STS51J    79  NO
     16  STS6      67  NO
     17  STS51A    67  NO
     18  STS61A    75  YES
     19  STS7      72  NO
     20  STS51C    53  YES
     21  STS61B    76  NO
     22  STS8      73  NO
     23  STS51D    67  NO
     24  STS61C    58  YES

  • Two Sample t-test

    data: Temp by Damage
    t = 3.1032, df = 21, p-value = 0.005383
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval: 2.774344 14.047085
    sample estimates:
    mean in group NO  mean in group YES
            72.12500           63.71429
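
The t statistic above can be checked by hand. The following is a minimal sketch, assuming the equal-variance (pooled) form of the test, which is consistent with the reported df = 21. The function names are illustrative, and flight STS4 is left out because its Damage value is missing in the data slide.

```python
import math

# Launch temperatures from the data slide, split by Damage.
# Flight STS4 (Temp 80) is omitted because its Damage value is missing.
temp_no = [66, 70, 75, 70, 69, 81, 76, 68, 78, 79, 67, 67, 72, 76, 73, 67]
temp_yes = [70, 57, 63, 70, 75, 53, 58]

def mean(xs):
    return sum(xs) / len(xs)

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def pooled_t(a, b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    df = len(a) + len(b) - 2
    pooled_var = (sum_sq_dev(a) + sum_sq_dev(b)) / df
    se = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se, df

t_stat, df = pooled_t(temp_no, temp_yes)
print(f"mean NO = {mean(temp_no):.5f}, mean YES = {mean(temp_yes):.5f}")
print(f"t = {t_stat:.4f}, df = {df}")
```

The arithmetic reproduces the slide's output, but it does not settle whether comparing mean temperatures is the right question, which is the point of the next slide.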

  • Does the t-test make sense here?
    Traditional: treatment group mean vs. control group mean

    What is the response variable?
    Temperature? [Quantitative, Continuous]
    Damage? [Qualitative]

  • Traditional Statistical Model 1
    Linear Regression: predict continuous response from [quantitative] predictors
    Y = weight, X = height
    Y = income, X = education level
    Y = first-semester GPA, X = parents' income
    Y = temperature, X = damage (0 = no, 1 = yes)

    Can also control for other [possibly categorical] factors (covariates):
    Sex
    Major
    State of Origin
    Number of Siblings

  • Traditional Statistical Model 2
    Logistic Regression: predict binary response from [quantitative] predictors
    Y = 0 (graduate within 5 years) vs. Y = 1 (not); X = first-semester GPA
    Y = 0 (no damage) vs. Y = 1 (damage); X = temperature
    Y = 0 (survive) vs. Y = 1 (death); X = dosage (dose-response model)
    Can also control for other factors, or covariates: Race, Sex, Genotype
    p = P(Y=1 | relevant factors) = probability that Y = 1, given the state of the relevant factors

  • Traditional Dose-Response Model
    p = probability of death at dose d:

    $p = \dfrac{e^{\beta_0 + \beta_1 d}}{1 + e^{\beta_0 + \beta_1 d}}$

    Look at what affects the shape of the curve, the LD50 (the dose lethal to 50% of subjects), etc.
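
To make the role of the LD50 concrete, here is a minimal sketch assuming the usual logistic form p = exp(b0 + b1 d) / (1 + exp(b0 + b1 d)). Under that assumption p = 0.5 exactly when b0 + b1 d = 0, so LD50 = -b0 / b1. The coefficient values are hypothetical and chosen only for illustration.

```python
import math

def death_prob(dose, beta0, beta1):
    """Logistic dose-response curve: P(death) at a given dose."""
    eta = beta0 + beta1 * dose
    return 1.0 / (1.0 + math.exp(-eta))

def ld50(beta0, beta1):
    """Dose at which P(death) = 0.5, i.e. where beta0 + beta1 * dose = 0."""
    return -beta0 / beta1

# Hypothetical coefficients, not estimates from any data in these notes.
beta0, beta1 = -4.0, 800.0
d50 = ld50(beta0, beta1)
print("LD50:", d50)
print("P(death) at LD50:", death_prob(d50, beta0, beta1))
```

In this parameterization the intercept shifts the curve along the dose axis and the slope controls how quickly mortality rises, so both affect the LD50.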

  • Fitting the Dose-Response Model
    Why logistic regression? The log-odds of p are linear in the dose:

    $\log\dfrac{p}{1-p} = \beta_0 + \beta_1 d$

    $\beta_0$ = place-holder constant
    $\beta_1$ = effect of dosage d
    To estimate parameters: Newton-Raphson iterative process to maximize the likelihood of the model
    Compare Y=0 (no damage) with Y=1 (damage) groups

  • Likelihood Function (to be maximized)
    Likelihood for obs. i: $L_i = p_i^{y_i}(1-p_i)^{1-y_i}$
    Multiply probabilities (independence): $L = \prod_i p_i^{y_i}(1-p_i)^{1-y_i}$

  • Estimation by IRLS
    Iteratively Reweighted Least Squares

    equivalent: Newton-Raphson algorithm for iteratively solving score equations
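
The iteration can be sketched in a few lines for the two-parameter model logit(p) = b0 + b1 * Temp. This is an illustrative implementation, not the code behind the slides: the function name, the starting values (0, 0) and the stopping rule are assumptions, and flight STS4 is dropped because its Damage value is missing. The estimates can be compared with the coefficient table on the next slide.

```python
import math

# (Temp, Damage) from the data slide; Damage coded 0 = NO, 1 = YES.
# Flight STS4 is omitted because its Damage value is missing.
temps = [66, 70, 75, 70, 57, 70, 69, 63, 81, 70, 76, 68,
         78, 79, 67, 67, 75, 72, 53, 76, 73, 67, 58]
damage = [0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0,
          0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1]

def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson (equivalently IRLS) for logit(p) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]  # IRLS weights
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Fisher information X'WX (2 x 2, symmetric).
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2 x 2 system information * step = score.
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    # Standard errors from the inverse information of the final iteration.
    se0, se1 = math.sqrt(i11 / det), math.sqrt(i00 / det)
    return b0, b1, se0, se1

b0, b1, se0, se1 = fit_logistic(temps, damage)
print(f"(Intercept) {b0:.4f}  SE {se0:.4f}")
print(f"Temp        {b1:.4f}  SE {se1:.4f}")
```

Each pass is a weighted least-squares solve with weights p(1 - p), which is where the name "iteratively reweighted" comes from.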

  • Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  15.0429     7.3786   2.039   0.0415 *
    Temp         -0.2322     0.1082  -2.145   0.0320 *
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  • What if the data were even better?
    Complete separation of points

    What should happen to our slope estimate?

  • Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)    928.9   913821.4   0.001        1
    Temp           -14.4    14106.7  -0.001        1
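
Why the estimates blow up can be seen directly from the log-likelihood. The sketch below uses a small hypothetical data set (not the "even better" data behind the slide, which the transcript does not show) in which damage occurs exactly when Temp is below 65. Making the fitted curve steeper always increases the log-likelihood, so no finite maximum exists.

```python
import math

def log_lik(b0, b1, x, y):
    """Bernoulli log-likelihood for logit(p) = b0 + b1 * x, computed stably."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        # log p = -log(1 + e^(-eta)) and log(1 - p) = -log(1 + e^(eta)).
        z = -eta if yi == 1 else eta
        total -= max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return total

# Hypothetical, completely separated data: damage exactly when Temp < 65.
temps = [53, 57, 58, 63, 66, 67, 70, 75, 81]
damage = [1, 1, 1, 1, 0, 0, 0, 0, 0]

# Coefficients of the form eta = -s * (Temp - 64.5); larger s = steeper curve.
scales = [0.1, 1.0, 10.0, 100.0]
values = [log_lik(64.5 * s, -s, temps, damage) for s in scales]
for s, v in zip(scales, values):
    print(f"s = {s:6.1f}   log-likelihood = {v:.3e}")
```

The log-likelihood climbs toward its supremum of 0 without ever reaching it, which matches the huge estimates and standard errors in the output above.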

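  • Failure?
    Shape of likelihood function

    Large Standard Errors

    Solution only in 2006

    Rather than maximizing the likelihood $L(\beta)$, consider a penalty (written here in the Firth form, with $I(\beta)$ the information matrix):

    $L^*(\beta) = L(\beta)\,|I(\beta)|^{1/2}$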

  • Model fitted by Penalized ML
    Confidence intervals and p-values by Profile Likelihood

                      coef   se(coef)    Chisq            p
    (Intercept) 30.4129282 16.5145441 11.35235 0.0007535240
    Temp        -0.4832632  0.2528934 13.06178 0.0003013835
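
A penalized fit of this kind can be sketched as follows. The transcript does not show the penalty formula, so this assumes a Firth-type penalty (likelihood multiplied by the square root of the determinant of the information matrix), whose score equations replace the residual y - p by y - p + h(1/2 - p), with h the hat-matrix diagonal. The data are hypothetical and completely separated, and the function name is illustrative.

```python
import math

def firth_logistic(x, y, max_iter=200, tol=1e-8):
    """Firth-type penalized fit of logit(p) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX (2 x 2, symmetric).
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Hat-matrix diagonals h_i = w_i * (1, x_i) I^{-1} (1, x_i)'.
        h = [wi * (i11 - 2.0 * i01 * xi + i00 * xi * xi) / det
             for wi, xi in zip(w, x)]
        # Modified score: the residual y - p is shifted by h * (1/2 - p).
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        g0 = sum(r)
        g1 = sum(ri * xi for ri, xi in zip(r, x))
        # Scoring step: solve information * step = modified score.
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical, completely separated data (x centered at the separation point).
x = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
y = [0, 0, 0, 1, 1, 1]
b0, b1 = firth_logistic(x, y)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```

Ordinary maximum likelihood would push the slope toward infinity on these data. The penalty term pulls the fitted probabilities toward 1/2, so the slope settles at a finite value.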

  • Beetle Data

    Phosphine   Total
    Dosage      Receiving   Total     Total         Survivors Observed at Genotype
    (mg/L)      Dosage      Deaths    Survivors     -/B   -/H   -/A   +/B   +/H   +/A
    0               98          0        98          31    27    10     6    20     4
    0.003          100         16        84          18    26    10     6    20     4
    0.004          100         68        32          10     4     3     5     7     4
    0.005          100         78        22           1     4     7     2     6     2
    0.01           100         77        23           0     1     9     8     5     0
    0.05           300        270        30           0     0     0     5    20     5
    0.1            400        383        17           0     0     0     0    10     7
    0.2            750        740        10           0     0     0     0     0    10
    0.3            500        490        10           0     0     0     0     0    10
    0.4            500        492         8           0     0     0     0     0     8
    1.0          7,850      7,806        44           0     0     0     0     0    44
    Total       10,798     10,420       378

  • Dose-response model
    Recall simple model:

    $\log\dfrac{p}{1-p} = \beta_0 + \beta_1 d$

    $p_{ij}$ = Pr(Y=1 | dosage level j and genotype level i)

    But when is genotype (covariate $G_i$) observed?

  • Coefficients:
                     Estimate Std. Error   z value Pr(>|z|)
    (Intercept)    -2.657e+01  8.901e+04 -2.98e-04        1
    dose           -7.541e-26  1.596e+07 -4.72e-33        1
    G1+            -3.386e-28  1.064e+05 -3.18e-33        1
    G2B            -1.344e-14  1.092e+05 -1.23e-19        1
    G2H            -3.349e-28  1.095e+05 -3.06e-33        1
    dose:G1+        7.541e-26  1.596e+07  4.72e-33        1
    dose:G2B        3.984e-12  3.075e+07  1.30e-19        1
    dose:G2H        7.754e-26  2.760e+07  2.81e-33        1
    G1+:G2B         1.344e-14  1.465e+05  9.17e-20        1
    G1+:G2H         3.395e-28  1.327e+05  2.56e-33        1
    dose:G1+:G2B   -3.984e-12  3.098e+07 -1.29e-19        1
    dose:G1+:G2H   -7.756e-26  2.763e+07 -2.81e-33        1

    Before we fix this, first a little detour

  • A Multivariate Gaussian Mixture
    Component j is MVN($\mu_j$, $\Sigma_j$) with proportion $\pi_j$

  • The Maximum Likelihood Approach

  • A Possible Work-Around
    Keys here:
    the true group memberships are unknown (latent)
    statisticians specialize in unknown quantities

  • A reasonable approach
    1. Randomly assign group memberships $\Delta$, and estimate group means $\mu_j$, covariance matrices $\Sigma_j$, and mixing proportions $\pi_j$
    2. Given those values, calculate (for each obs.) $\gamma_j = E[\Delta_j \mid \theta]$ = P(obs. in group j)
    3. Update estimates for $\mu_j$, $\Sigma_j$, and $\pi_j$, weighting each observation by these $\gamma_j$
    4. Repeat steps 2 and 3 to convergence
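
The four steps can be sketched for a deliberately simplified case: two univariate Gaussian components instead of the multivariate mixture on the slide, so that plain Python suffices. The simulated data, the function names, the seeds and the fixed number of iterations are all assumptions made for illustration. The trace of log-likelihood values is kept because it should never decrease from one iteration to the next.

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_gaussians(data, n_iter=500, seed=1):
    rng = random.Random(seed)
    k = 2
    # Step 1: random group memberships, stored as 0/1 responsibilities.
    labels = [rng.randrange(k) for _ in data]
    gamma = [[1.0 if lab == j else 0.0 for j in range(k)] for lab in labels]
    trace = []
    for _ in range(n_iter):
        # Step 3 (also turns the initial memberships into first estimates):
        # responsibility-weighted means, variances and mixing proportions.
        mu, var, pi = [], [], []
        for j in range(k):
            nj = sum(g[j] for g in gamma)
            m = sum(g[j] * x for g, x in zip(gamma, data)) / nj
            v = sum(g[j] * (x - m) ** 2 for g, x in zip(gamma, data)) / nj
            mu.append(m)
            var.append(max(v, 1e-6))  # guard against a collapsing component
            pi.append(nj / len(data))
        # Step 2: responsibilities gamma_ij = P(obs. i in group j | estimates).
        gamma, loglik = [], 0.0
        for x in data:
            dens = [pi[j] * normal_pdf(x, mu[j], var[j]) for j in range(k)]
            total = sum(dens)
            gamma.append([d / total for d in dens])
            loglik += math.log(total)
        trace.append(loglik)  # observed-data log-likelihood at these estimates
    return mu, var, pi, trace

# Simulated data: 150 draws from N(0, 1) and 100 draws from N(5, 1).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(150)] + [rng.gauss(5.0, 1.0) for _ in range(100)]
mu, var, pi, trace = em_two_gaussians(data)
print("means:", mu)
print("variances:", var)
print("mixing proportions:", pi)
```

Starting from random memberships gives two nearly identical components, so the early iterations can move slowly. Running several random starts and keeping the one with the highest log-likelihood is a common safeguard.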

  • Plotting character and color indicate most likely component

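  • The EM (Baum-Welch) Algorithm
    Maximization made easier with $Z_m$ = latent (unobserved) data; $T = (Z, Z_m)$ = complete data
    1. Start with initial guesses $\hat\theta^{(0)}$ for the parameters
    2. Expectation: at the kth iteration, compute $Q(\theta', \hat\theta^{(k)}) = E[\ell_0(\theta'; T) \mid Z, \hat\theta^{(k)}]$, where $\ell_0$ is the complete-data log-likelihood
    3. Maximization: obtain the estimate $\hat\theta^{(k+1)}$ by maximizing $Q(\theta', \hat\theta^{(k)})$ over $\theta'$
    4. Iterate steps 2 and 3 to convergence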

  • Beetle Data Notation
    Observed values
    Unobserved (latent) values
    If $N_{ij}$ had been observed:

    How $N_{ij}$ can be [latently] considered:

  • Likelihood Function
    Parameters $\theta = (p, P)$ and complete data $T = (n, N)$
    After simplification:

    Mechanism of missing data suggests EM algorithm

  • Missing at Random (MAR)
    Necessary assumption for usual EM applications
    Covariate x is MAR if the probability of observing x does not depend on x or any other unobserved covariate, but may depend on the response and other observed covariates (Ibrahim 1990)
    Here genotype is observed only for survivors, and for all subjects at zero dosage

  • Initialization Step
    Two classes of marginal information here:
    For all dosage levels j, observe
    At zero dosage level, observe for genotype i
    Allows estimate of $P_i$
    Consider marginal distn. of missing categorical covariate (genotype)
    Using zero dosage level:

    This is the key: the marginal distribution of the missing categorical covariate
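
At zero dosage every beetle survived and was genotyped, so that row of the beetle data table gives a direct estimate of the genotype distribution. The sketch below computes the sample proportions. Treating them as the starting values for $P_i$ is an assumption, since the transcript has lost the slide's formula.

```python
# Survivor counts by genotype at zero dosage (first row of the beetle data table).
zero_dose_counts = {"-/B": 31, "-/H": 27, "-/A": 10, "+/B": 6, "+/H": 20, "+/A": 4}

total = sum(zero_dose_counts.values())  # all beetles at zero dosage survived
P_hat = {g: n / total for g, n in zero_dose_counts.items()}

print("beetles at zero dosage:", total)
for genotype, prop in P_hat.items():
    print(f"{genotype}: {prop:.3f}")
```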

  • Expectation Step
    Dropping constants and :

    Need to evaluate:

    (*)

  • Expectation Step
    Bayes Formula:

    Multinomial (*)

  • Expectation Step
    For :
    Not needed for maximization; only affects EM convergence rate
    Direct calculation from multinomial distn. is possible but computationally prohibitive
    Need to employ some approximation strategy
    Second-order Taylor series about , using Binet's formula
    (*)

  • Expectation Step
    Consider Binet's formula (like Stirling's):

    Have:

    Use a second-order Taylor series approximation taken about as a function of :
    (*)
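
The exact form of Binet's formula used on the slide is not recoverable from the transcript. As a rough illustration of why a Stirling-type expansion helps with the factorials in a multinomial, the sketch below compares the standard approximation log Gamma(x) ≈ (x - 1/2) log x - x + (1/2) log(2π) + 1/(12x) with Python's exact `math.lgamma`. The function name is illustrative.

```python
import math

def log_gamma_stirling(x):
    """Stirling-type approximation to log Gamma(x) with the 1/(12x) correction."""
    return ((x - 0.5) * math.log(x) - x
            + 0.5 * math.log(2.0 * math.pi) + 1.0 / (12.0 * x))

for x in [5.0, 10.0, 50.0]:
    exact = math.lgamma(x)
    approx = log_gamma_stirling(x)
    print(f"x = {x:5.1f}  exact = {exact:.8f}  approx = {approx:.8f}  "
          f"abs. error = {abs(exact - approx):.2e}")
```

The approximation is a smooth function of x, which is what makes a Taylor expansion of the log-factorial terms possible.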

  • Maximization Step
    Portion of related to :

    Portion of related to :
    by Lagrange multipliers
    by Newton-Raphson iterations, with some parameterization
    (*)
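
As a reminder of how a Lagrange-multiplier step typically works for proportions constrained to sum to one, here is a generic sketch. The weights $w_i$ (for instance, expected genotype counts from the expectation step) are an assumed stand-in. The slide's own objective function is not recoverable from the transcript, so this may differ from the author's derivation.

```latex
\begin{aligned}
\text{maximize}\quad & \sum_i w_i \log P_i
  \quad\text{subject to}\quad \sum_i P_i = 1,\\
\Lambda(P,\lambda) &= \sum_i w_i \log P_i - \lambda\Bigl(\sum_i P_i - 1\Bigr),\\
\frac{\partial \Lambda}{\partial P_i} &= \frac{w_i}{P_i} - \lambda = 0
  \;\Longrightarrow\; P_i = \frac{w_i}{\lambda},\\
\sum_i P_i = 1 &\;\Longrightarrow\; \lambda = \sum_k w_k,
  \qquad \hat P_i = \frac{w_i}{\sum_k w_k}.
\end{aligned}
```

Under these assumptions the constrained maximizer is simply each weight divided by the total weight, so that part of the maximization has a closed form and only the remaining parameters need Newton-Raphson iterations.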

  • Convergence

  • Dose Response Curves (log scale)

  • EM Results

    Genotype     LD50       L95       U95      t
    -/B        0.0035    0.0031    0.0039   3.99
    -/H        0.0033    0.0028    0.0038   4.98
    -/A        0.0290   -7.1862    7.2442   0.13
    +/B        0.0484    0.0123    0.0845   0.09
    +/H        0.0664    0.0407    0.0921   4.20
    +/A        0.7382    0.1428    1.3336   1.36

    L95, U95: confidence limits
    t: test statistic for H0: no dosage effect
    separation of points

  • Topics Used Here
    Calculus
        Differentiation & Integration (including vector differentiation)
        Lagrange Multipliers
        Taylor Series Expansions
    Linear Algebra
        Determinants & Eigenvalues
        Inverting [computationally/nearly singular] Matrices
        Positive Definiteness
    Probability
        Distributions: Multivariate Normal, Binomial, Multinomial
        Bayes Formula
    Statistics
        Logistic Regression
        Separation of Points
        [Penalized] Likelihood Maximization
        EM Algorithm
    Biology: a little time and communication
