Chapter 16 Logistic Regression Analysis

Upload: susanna-bishop

Post on 21-Jan-2016


  • Chapter 16 Logistic Regression Analysis

  • Content: Logistic regression; Conditional logistic regression; Application

  • Purpose: Fit a logistic regression equation that estimates the probability of the dependent variable (outcome) from the independent variables (risk factors). Logistic regression is a kind of nonlinear regression. Data: 1. The dependent variable is a binary categorical variable with two values, such as "yes" and "no". 2. Most, if not all, of the independent variables are categorical, although some may be numerical. Categorical variables must be coded as numbers.

  • Implication: Logistic regression can be used to study the quantitative relationship between the occurrence of a disease or phenomenon and multiple risk factors. The $\chi^2$ test (or u test) has two drawbacks for this purpose: 1. it can study only one risk factor at a time; 2. it yields only a qualitative conclusion.

  • Category: 1. Between-subjects (non-conditional) logistic regression equation. 2. Paired (conditional) logistic regression equation.

  • 1 Logistic regression (non-conditional logistic regression)

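  • I Basic concepts. The probability of a positive outcome under the action of m independent variables $X_1, X_2, \dots, X_m$ is written:

    $P = P(Y = 1 \mid X_1, X_2, \dots, X_m)$

  • Let $Z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m$. Regression model:

    $P = \dfrac{1}{1 + e^{-Z}}$

    Scale: the probability $P$ ranges from 0 to 1, while logit $P$ ranges from $-\infty$ to $+\infty$:

    $\operatorname{logit} P = \ln \dfrac{P}{1 - P} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m$

    where $\beta_0$ is the constant term and $\beta_1, \beta_2, \dots, \beta_m$ are the regression coefficients.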

  • [Figure: the logistic function, $P$ plotted against $Z$ for $Z$ from −4 to 4. The S-shaped curve rises from $P = 0.018$ at $Z = -4$ through $P = 0.5$ at $Z = 0$ to $P = 0.982$ at $Z = 4$.]
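The curve in the figure can be reproduced with a few lines of Python. This is a minimal sketch; the function names `logistic` and `logit` are chosen here for illustration and do not come from the slides.

```python
import math

def logistic(z: float) -> float:
    """Probability P for a linear predictor Z: P = 1 / (1 + e^(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Inverse transform: logit P = ln(P / (1 - P))."""
    return math.log(p / (1.0 - p))

# Tabulate a few points of the S-shaped curve over the plotted range.
for z in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(f"Z = {z:+.1f}  P = {logistic(z):.5f}")
```

The logit maps P back to Z, which is why the model is linear on the logit scale even though it is nonlinear in P.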

  • The meaning of the model parameters. The constant term $\beta_0$ is the natural logarithm of the odds (the probability of the event happening divided by the probability of it not happening) when the exposure dose is zero. The regression coefficient $\beta_j$ is the change in logit $P$ when the independent variable $X_j$ changes by one unit.

  • The odds ratio (OR) is the statistical indicator used in epidemiology to measure the effect of a risk factor. The formula is:

    $OR_j = \dfrac{P_1 / (1 - P_1)}{P_0 / (1 - P_0)}$

    In the formula, $P_1$ is the incidence of the disease when $X_j$ is $c_1$, and $P_0$ is the incidence of the disease when $X_j$ is $c_0$. $OR_j$ is called the adjusted odds ratio: it shows the effect of the risk factor free of the influence of the other independent variables.

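  • The relationship with logit $P$

    Comparing the disease status at two different exposure levels of one risk factor ($X_j = c_1$ and $X_j = c_0$), the natural logarithm of the odds ratio is:

    $\ln OR_j = \operatorname{logit} P_1 - \operatorname{logit} P_0 = \beta_j (c_1 - c_0)$, so that $OR_j = \exp[\beta_j (c_1 - c_0)]$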

  • $\beta_0$ is usually regarded as a nuisance parameter in risk factor analysis, because $OR_j$ does not depend on $\beta_0$.
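The claim that the odds ratio does not depend on the constant term can be checked numerically. The sketch below uses a hypothetical coefficient (0.7) and several hypothetical constants; none of these numbers come from the slides.

```python
import math

def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def odds(p: float) -> float:
    return p / (1.0 - p)

def odds_ratio(beta0: float, beta_j: float, c1: float, c0: float) -> float:
    """OR computed from the two probabilities, as in the definition."""
    p1 = logistic(beta0 + beta_j * c1)  # incidence at exposure level c1
    p0 = logistic(beta0 + beta_j * c0)  # incidence at exposure level c0
    return odds(p1) / odds(p0)

beta_j = 0.7  # hypothetical regression coefficient
for beta0 in (-3.0, -1.0, 0.0, 2.0):  # hypothetical constants
    from_probabilities = odds_ratio(beta0, beta_j, 1, 0)
    from_coefficient = math.exp(beta_j * (1 - 0))
    print(f"beta0 = {beta0:+.1f}  OR = {from_probabilities:.4f}  exp(beta_j) = {from_coefficient:.4f}")
```

Whatever value the constant takes, the two columns agree, because the constant cancels when the two logits are subtracted.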

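  • II Parameter estimation for the logistic regression model. 1. Parameter estimation. Theory: maximum likelihood estimation. For n observations with outcomes $Y_i$ and model probabilities $P_i$:

    $L = \prod_{i=1}^{n} P_i^{Y_i} (1 - P_i)^{1 - Y_i}$

    $\ln L = \sum_{i=1}^{n} [Y_i \ln P_i + (1 - Y_i) \ln (1 - P_i)]$

    The estimates $b_0, b_1, \dots, b_m$ are the values of the parameters that maximize $\ln L$.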
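Maximum likelihood estimation chooses the coefficients that make the log-likelihood as large as possible. The sketch below evaluates the log-likelihood of a binary-outcome model on a tiny hypothetical dataset (six invented observations with one 0/1 factor); the data and the function name `log_likelihood` are assumptions for illustration only.

```python
import math

def log_likelihood(beta, rows):
    """ln L = sum of y*ln(P) + (1 - y)*ln(1 - P); beta[0] is the constant term."""
    total = 0.0
    for x, y in rows:
        z = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
        p = 1.0 / (1.0 + math.exp(-z))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Hypothetical toy data: (risk factor values, outcome).
# Unexposed: 1 positive out of 3; exposed: 2 positives out of 3.
rows = [((0,), 0), ((0,), 0), ((0,), 1), ((1,), 0), ((1,), 1), ((1,), 1)]

# With a single 0/1 factor the maximum is available in closed form:
# b0 = ln(odds when unexposed), b1 = ln(odds ratio).
b0 = math.log((1 / 3) / (2 / 3))
b1 = math.log((2 / 3) / (1 / 3)) - b0

print(log_likelihood([0.0, 0.0], rows))   # all probabilities equal to 0.5
print(log_likelihood([b0, b1], rows))     # larger: these coefficients fit better
```

For more than one factor there is no closed form, and the maximum is found iteratively.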

  • 2. Estimation of OR. The estimated OR for two different levels $c_1$ and $c_0$ of one factor $X_j$ is:

    $\widehat{OR}_j = \exp[b_j (c_1 - c_0)]$

    If the independent variable $X_j$ has only two levels, exposed and non-exposed, the $1 - \alpha$ confidence interval of $OR_j$ is estimated by:

    $\exp(b_j \pm u_{\alpha/2} S_{b_j})$
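For a two-level factor, the odds ratio and its confidence interval follow directly from the coefficient and its standard error. The sketch below applies this to the estimates reported for Example 16-1 in this chapter; the function name `odds_ratio_ci` and the default multiplier 1.96 (for a 95% interval) are choices made here.

```python
import math

def odds_ratio_ci(b: float, se: float, u: float = 1.96):
    """Return (OR, lower, upper) for an exposed versus non-exposed factor."""
    return math.exp(b), math.exp(b - u * se), math.exp(b + u * se)

# Coefficients and standard errors reported for Example 16-1.
estimates = {"smoking": (0.8856, 0.1500), "drinking": (0.5261, 0.1572)}

for factor, (b, se) in estimates.items():
    or_hat, lower, upper = odds_ratio_ci(b, se)
    print(f"{factor}: OR = {or_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Neither interval contains 1, which agrees with the significance tests reported later for the same example.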

  • Example 16-1: Table 16-1 gives case-control data used to study the relationship among smoking, drinking, and esophageal cancer. Carry out a logistic regression analysis. First define the coding of each variable.

  • Table 16-1 Case-control data on the relationship between smoking, drinking, and esophageal cancer

    Stratum g   Smoking X1   Drinking X2   Observed n_g   Positive d_g   Negative n_g − d_g
    1           0            0             199            63             136
    2           0            1             170            63             107
    3           1            0             101            44             57
    4           1            1             416            265            151

  • Results of the logistic regression:

    $b_0 = -0.9099$, $S_{b_0} = 0.1358$

    $b_1 = 0.8856$, $S_{b_1} = 0.1500$

    $b_2 = 0.5261$, $S_{b_2} = 0.1572$

    The OR of smoking versus non-smoking: $\widehat{OR}_1 = \exp(b_1) = \exp(0.8856) = 2.42$

    95% confidence interval of $OR_1$: $\exp(0.8856 \pm 1.96 \times 0.1500) = (1.81, 3.25)$

    The OR of drinking versus non-drinking: $\widehat{OR}_2 = \exp(b_2) = \exp(0.5261) = 1.69$

    95% confidence interval of $OR_2$: $\exp(0.5261 \pm 1.96 \times 0.1572) = (1.24, 2.30)$
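The reported coefficients can be reproduced from Table 16-1 by maximum likelihood. The slides do not say which algorithm or software was used; the sketch below assumes Newton-Raphson iteration on the grouped data and uses only the standard library. The helper names `solve` and `fit` are invented for this example.

```python
import math

# Table 16-1: (X1 smoking, X2 drinking, n_g observed, d_g positive)
GROUPS = [(0, 0, 199, 63), (0, 1, 170, 63), (1, 0, 101, 44), (1, 1, 416, 265)]

def solve(a, y):
    """Solve the small linear system a x = y by Gauss-Jordan elimination."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(groups, iterations=25):
    """Newton-Raphson for grouped binary data; returns (coefficients, standard errors)."""
    k = 3
    b = [0.0] * k
    for _ in range(iterations):
        score = [0.0] * k                      # first derivatives of ln L
        info = [[0.0] * k for _ in range(k)]   # information matrix
        for x1, x2, n, d in groups:
            x = (1.0, x1, x2)
            p = 1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(b, x))))
            for i in range(k):
                score[i] += x[i] * (d - n * p)
                for j in range(k):
                    info[i][j] += x[i] * x[j] * n * p * (1.0 - p)
        b = [bj + sj for bj, sj in zip(b, solve(info, score))]
    # After convergence the last information matrix gives the variances:
    # each standard error is the square root of a diagonal entry of its inverse.
    se = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        se.append(math.sqrt(solve(info, unit)[i]))
    return b, se

coefficients, standard_errors = fit(GROUPS)
for name, bj, sj in zip(("constant", "smoking", "drinking"), coefficients, standard_errors):
    print(f"{name}: b = {bj:.4f}, S_b = {sj:.4f}")
```

Any statistical package with a logistic regression routine should give the same estimates; the hand-written solver is used here only to keep the example self-contained.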

  • III Hypothesis tests for the logistic regression model. 1. Likelihood ratio test. 2. Wald test: each parameter estimate is compared with zero, using its standard error as the yardstick. The statistics are:

    $u = \dfrac{b_j}{S_{b_j}}$ or $\chi^2 = \left( \dfrac{b_j}{S_{b_j}} \right)^2$, with 1 degree of freedom

    In Example 16-1 both $\chi^2$ values are greater than 3.84, so esophageal cancer is associated with both smoking and drinking. The conclusion is the same as above.
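The Wald statistics for Example 16-1 can be recomputed from the reported coefficients and standard errors. In this sketch the P value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable; the function name `wald_test` is chosen here.

```python
import math

def wald_test(b: float, se: float):
    """Return (chi-square, P) for testing a single coefficient against zero."""
    u = b / se
    chi2 = u * u
    # Upper tail of chi-square with 1 df equals the two-sided normal tail.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

for factor, b, se in (("smoking", 0.8856, 0.1500), ("drinking", 0.5261, 0.1572)):
    chi2, p_value = wald_test(b, se)
    print(f"{factor}: chi-square = {chi2:.2f}, P = {p_value:.2g}")
```

The threshold 3.84 quoted in the text is the 0.05 critical value of the chi-square distribution with 1 degree of freedom.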

  • IV Variable selection. Methods: forward selection, backward elimination, and stepwise regression. Test statistic: not the F statistic, but one of the likelihood ratio, Wald, or score test statistics. Example 16-2: To explore the risk factors for coronary heart disease, a case-control study was carried out on 26 coronary heart disease patients and 28 controls. Table 16-2 gives the definition and coding of the factors, and Table 16-3 gives the data. Use stepwise logistic regression to select the risk factors.

  • Table 16-2 Eight possible risk factors for coronary heart disease and their coding

    Factor   Variable   Coding
    Age      X1

  • Table 16-3 Case-control data on the risk factors for coronary heart disease

    Order   X1   X2   X3   X4   X5   X6   X7   X8   Y
    1       3    1    0    1    0    0    1    1    0
    2       2    0    1    1    0    0    1    0    0
    3       2    1    0    1    0    0    1    0    0
    4       2    0    0    1    0    0    1    0    0
    5       3    0    0    1    0    1    1    1    0
    6       3    0    1    1    0    0    2    1    0
    7       2    0    1    0    0    0    1    0    0
    8       3    0    1    1    1    0    1    0    0
    9       2    0    0    0    0    0    1    1    0
    10      1    0    0    1    0    0    1    0    0
    ...     ...  ...  ...  ...  ...  ...  ...  ...  ...
    51      2    0    1    1    0    1    2    1    1
    52      2    1    1    1    0    0    2    1    1
    53      2    1    0    1    0    0    1    1    1
    54      3    1    1    0    1    0    3    1    1

  • Table 16-4 Example 16-2: independent variables entered into the equation and the estimates of the related parameters. Learn how to read the results.

    Model      Regression coefficient b   Standard error S_b   Wald χ²   P        Standardized coefficient b'   OR
    constant   -4.705                     1.543                9.30      0.0023   --                            --
    X1         0.924                      0.477                3.76      0.0525   0.401                         2.52
    X5         1.496                      0.744                4.04      0.0443   0.406                         4.46
    X6         3.136                      1.249                6.30      0.0121   0.703                         23.00
    X8         1.947                      0.847                5.29      0.0215   0.523                         7.01

  • Finally, four risk factors enter the logistic regression model: increasing age ($X_1$), history of hyperlipidemia ($X_5$), animal fat intake ($X_6$), and type A personality ($X_8$).

    The standardized regression coefficient $b'_j$ can be used to compare the relative importance of the factors:

    $b'_j = \dfrac{b_j S_j}{\pi / \sqrt{3}}$

    where $S_j$ is the standard deviation of $X_j$ and $\pi = 3.1416$.
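The standardized coefficient rescales each coefficient by the spread of its variable, so factors measured on different scales can be compared. The sketch below assumes the usual definition, b' = b × S / (π/√3), where S is the standard deviation of the variable. Because the slides do not list the standard deviations, the code works backwards from the two coefficient columns of Table 16-4 to the standard deviation each row implies, and also recomputes the odds ratio column as exp(b).

```python
import math

SCALE = math.pi / math.sqrt(3.0)  # standard deviation of the standard logistic distribution

def standardized_coefficient(b: float, sd_x: float) -> float:
    return b * sd_x / SCALE

def implied_sd(b: float, b_std: float) -> float:
    """Standard deviation of the variable implied by a (b, b') pair."""
    return b_std * SCALE / b

# (b, b') as reported in Table 16-4
TABLE_16_4 = {"X1": (0.924, 0.401), "X5": (1.496, 0.406),
              "X6": (3.136, 0.703), "X8": (1.947, 0.523)}

for name, (b, b_std) in TABLE_16_4.items():
    print(f"{name}: OR = {math.exp(b):.2f}, implied SD = {implied_sd(b, b_std):.3f}")
```

Ranking by b' rather than b puts animal fat intake (X6) first and type A personality (X8) second, with age (X1) and hyperlipidemia history (X5) nearly tied.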

  • 2 Conditional logistic regression. I Principle

    In matched data, the most common design has one case and several controls in each matched set, that is, a 1:M matched study.

  • Table 16-5 Data format for 1:M conditional logistic regression (t = 0 is the case; the others are the controls)

    Matched group i   Number in group t   Dependent variable Y   Risk factors: X1   X2      ...   Xm
    1                 0                   1                      X101               X102    ...   X10m
                      1                   0                      X111               X112    ...   X11m
                      2                   0                      X121               X122    ...   X12m
                      ...
                      M                   0                      X1M1               X1M2    ...   X1Mm
    ...
    n                 0                   1                      Xn01               Xn02    ...   Xn0m
                      1                   0                      Xn11               Xn12    ...   Xn1m
                      2                   0                      Xn21               Xn22    ...   Xn2m
                      ...
                      M                   0                      XnM1               XnM2    ...   XnMm

  • The conditional logistic model

    $P_i = \dfrac{1}{1 + \exp[-(\beta_{0i} + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m)]}$, $i = 1, 2, \dots, n$

    $P_i$ is the probability of disease in stratum i (the i-th matched set) under the action of a group of risk factors; $\beta_{0i}$ is the effect of stratum i; $\beta_1, \beta_2, \dots, \beta_m$ are the parameters to estimate.

    The difference from the non-conditional logistic regression model lies in the constant term: the $\beta_{0i}$ may differ from one stratum to another, but the model assumes that each risk factor has the same effect on disease (the same $\beta_j$) in every matched set.
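Because the stratum effects differ between matched sets, conditional logistic regression works with a likelihood in which they cancel. The slides do not spell this likelihood out; the sketch below uses the standard form for 1:M matching, in which each matched set contributes the case's exponential term divided by the sum over the case and its M controls. The data values and the function name are hypothetical.

```python
import math

def matched_set_likelihood(beta, case_x, control_xs, beta0_i=0.0):
    """Conditional likelihood contribution of one 1:M matched set.

    beta0_i is the stratum effect; it is included only to show that it cancels.
    """
    def weight(x):
        return math.exp(beta0_i + sum(b * xj for b, xj in zip(beta, x)))

    case_weight = weight(case_x)
    return case_weight / (case_weight + sum(weight(x) for x in control_xs))

# Hypothetical 1:2 matched set with two risk factors.
case = (1, 3)
controls = [(0, 2), (1, 1)]
beta = [0.8, -0.5]  # hypothetical coefficients

print(matched_set_likelihood(beta, case, controls, beta0_i=0.0))
print(matched_set_likelihood(beta, case, controls, beta0_i=2.5))  # same value
```

The two printed values agree, which is why the stratum constants never need to be estimated: only the common coefficients remain in the likelihood.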

  • II Applied example

    Example 16-3: A study of the risk factors for laryngeal cancer in a northern city used a 1:2 matched case-control design. Six possible risk factors and 25 matched sets have been selected here; the coding is given in Table 16-6 and the data in Table 16-7.

    Table 16-6 Risk factors for laryngeal cancer and their coding

    Factor                    Variable   Coding
    Pharyngitis               X1         no = 1, occasionally = 2, often = 3
    Smoking (cig/day)         X2         0 = 1, 1–4 = 2, 5–9 = 3, 10–20 = 4, 20– = 5
    Hoarseness                X3         no = 1, occasionally = 2, often = 3
    Fresh vegetable intake    X4         little = 1, occasionally = 2, every day = 3
    Fruit intake              X5         rare = 1, little = 2, often = 3
    Family cancer history     X6         no = 0, yes = 1
    Laryngeal cancer          Y          case = 1, control = 0

  • Table 16-7 Data from the 1:2 matched case-control study of laryngeal cancer (see p. 344)

  • Table 16-8 Example 16-3: independent variables entered into the equation and the estimates of the related parameters. Stepwise variable selection was applied to the six risk factors, and four factors entered the equation; Table 16-9 shows the results.

    Entering variable   Regression coefficient b   Standard error S_b   Wald χ²   OR      P
    X2                  1.4869                     0.5506               7.29      4.42    0.0069
    X3                  1.9166                     0.9444               4.12      6.80    0.0424
    X4                  −3.7641                    1.8251               4.25      0.02    0.0392
    X6                  3.6321                     1.8657               3.79      37.79   0.0516

    The four risk factors that entered are smoking ($X_2$), hoarseness ($X_3$), fresh vegetable intake ($X_4$), and family history of cancer ($X_6$). Of these, fresh vegetable intake is a protective factor.

  • 3 Applications of logistic regression and points to note. I Applications of logistic regression. 1. Analysis of epidemiologic risk factors. One feature of logistic regression is that its parameters have a clear meaning, so it is well suited to epidemiologic studies.

  • 2. Analysis of clinical trials. The goal of a clinical trial is to assess the effect of a drug or treatment. If there are confounding factors that are not balanced between the groups, the final results will be biased, so these factors must be adjusted for in the analysis. When the dependent variable is binary, logistic regression can be used to obtain the adjusted results.

  • 3. Dose-response analysis of drugs or poisons. In dose-response studies of drugs or poisons, when the dose is expressed on the logarithmic scale the response follows a distribution close to the normal. The normal distribution function is very similar in shape to the logistic function, so the relationship can be expressed through the following model ($P$ is the positive rate; $X$ is the dose): $P = \dfrac{1}{1 + e^{-(\beta_0 + \beta \ln X)}}$

  • 4. Prediction and discrimination. Logistic regression is a probability model, so it can be used to predict the probability of an event. In clinical practice, for example, it can estimate the probability of a disease given a set of indicators. See Chapter 18 on discriminant analysis.
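For the dose-response application (item 3 above), the similarity between the normal distribution function and the logistic function can be checked numerically. The sketch below rescales the logistic argument by 1.702, a constant commonly used to match the two curves; that constant is an assumption brought in for illustration and does not appear in the slides.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

SCALE = 1.702  # assumed rescaling constant, not taken from the slides

# Largest vertical gap between the two curves on a fine grid from -5 to 5.
grid = [i / 100.0 for i in range(-500, 501)]
max_gap = max(abs(normal_cdf(z) - logistic(SCALE * z)) for z in grid)
print(f"largest difference: {max_gap:.4f}")
```

The gap stays below one percentage point over the whole range, which is why a logistic model on log dose gives results close to a normal (probit) model.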

  • II Points to note when applying logistic regression

    1. Coding of the variables (the same as in Chapter 15)

    2. Sample size and the number of independent variables

    3. Evaluation of the model

    4. Multi-category logistic regression

  • Summary. Purpose: Fit a logistic regression equation that estimates the probability of the dependent variable (outcome) from the independent variables (risk factors). Logistic regression is a probability-type, nonlinear regression. Data: 1. The dependent variable is a binary categorical variable with two values, such as "yes" and "no". 2. Most, if not all, of the independent variables are categorical, although some may be numerical. Categorical variables must be coded as numbers.

  • Implication: Logistic regression can be used to study the quantitative relationship between the occurrence of a disease or phenomenon and multiple risk factors. Category: 1. Between-subjects (non-conditional) logistic regression equation. 2. Paired (conditional) logistic regression equation.

  • Thinking: To analyze the factors that influence the outcome of rescue in AMI patients, a hospital collected five years of data on AMI patients, 200 cases in total. There are many related factors; only three are listed here for reasons of space. The data are shown in the following table. P = 0 means successful rescue and P = 1 means death. X1 = 1 means shock before rescue and X1 = 0 means no shock before rescue. X2 = 1 means heart failure before rescue and X2 = 0 means no heart failure before rescue. X3 = 1 means that more than 12 hours passed from the onset of AMI symptoms to rescue and X3 = 0 means that no more than 12 hours passed. Which analysis method is the best one, and why? What results can we get?

  • Data on the risk factors for rescue outcome in the AMI patients

    P = 0 (successfully rescued)      P = 1 (death)
    X1   X2   X3   N                  X1   X2   X3   N
    0    0    0    35                 0    0    0    4
    0    0    1    34                 0    0    1    10
    0    1    0    17                 0    1    0    4
    0    1    1    19                 0    1    1    15
    1    0    0    17                 1    0    0    6
    1    0    1    6                  1    0    1    9
    1    1    0    6                  1    1    0    6
    1    1    1    6                  1    1    1    6
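As a starting point for the exercise, the crude (unadjusted) odds ratio of death can be computed for each factor by collapsing the table over the other two. This is only a sketch of a first look at the data, not the answer to the exercise: each crude odds ratio ignores the other factors, which is the limitation that a multi-factor logistic regression is meant to remove. The variable and function names are chosen here.

```python
# (X1, X2, X3) -> (number successfully rescued, number died), from the table above.
COUNTS = {
    (0, 0, 0): (35, 4), (0, 0, 1): (34, 10), (0, 1, 0): (17, 4), (0, 1, 1): (19, 15),
    (1, 0, 0): (17, 6), (1, 0, 1): (6, 9),   (1, 1, 0): (6, 6),  (1, 1, 1): (6, 6),
}

def crude_odds_ratio(counts, index):
    """Odds of death at level 1 versus level 0 of one factor, ignoring the others."""
    rescued = [0, 0]
    died = [0, 0]
    for levels, (n_rescued, n_died) in counts.items():
        rescued[levels[index]] += n_rescued
        died[levels[index]] += n_died
    return (died[1] * rescued[0]) / (died[0] * rescued[1])

total = sum(r + d for r, d in COUNTS.values())
print(f"total patients: {total}")
for index, label in enumerate(("X1 shock", "X2 heart failure", "X3 delay over 12 h")):
    print(f"{label}: crude OR = {crude_odds_ratio(COUNTS, index):.2f}")
```

Comparing these crude values with the adjusted odds ratios from a fitted logistic regression shows how much each factor's apparent effect is shared with the other two.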