Post on 27-Jan-2021


  • SAS PROGRAMMING: ANALYTICAL

    Eng. Mohammad KHALAF

    Mobile: 00962-79-5880413

    Email: [email protected]

    Webpage: www.statanalysis.weebly.com


  • Mohammad KHALAF- [email protected] www.statanalysis.weebly.com page 2

    TABLE OF CONTENTS

    Graphics
    Univariate
    Correlation
    Kernel Density Estimate
    T-test
    Analysis of each variable separately
    T-test
        One sample t test
        Paired t-test
    Correlation Test
        Independent sample t-test
    ANOVA
    Regression analysis
    ODS in SAS
    Appendices
        Questionnaire
        SAS t-test Commands
        SAS Simple Linear Regression Example


    GRAPHICS

    To produce a simple scatterplot of two variables, we use proc gplot as follows:

    data graph;

    input x y;

    datalines;

    20 10

    15 23

    5 14

    ;

    run;

    proc print data=graph;

    run;

    proc gplot;

    plot y * x;

    run;

    The data listing from proc print appears in the Output window, and the graph is displayed in the Graph window.

    To join the points with a line, we add a symbol statement before proc gplot:

    symbol1 i=join;

    proc gplot;

    plot y * x;

    run;

    where i stands for interpolation.

    Further options can be added to the symbol statement:

    data graph;

    input x y;

    datalines;

    20 10

    15 23

    5 14

    ;

    run;

    proc print data=graph;

    run;

    symbol1 v=none i=join;

    symbol1 v=square i=join;

    symbol2 v=circle i=join;

    proc gplot;

    plot y * x;

    run;

    where v stands for value, the plotting symbol drawn at each point.

    data graph;

    input x y sex $;

    datalines;

    20 10 M

    15 23 F


    5 14 M

    ;

    run;

    symbol1 v=none i=join c=red;

    symbol2 v=none i=join c=red;

    proc gplot;

    plot y * x = sex;

    run;

    The plot can be repeated with a diamond symbol for the first group:

    symbol1 v=diamond i=join c=red;

    symbol2 v=none i=join c=red;

    proc gplot;

    plot y * x = sex;

    run;
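    The appendix of this handout notes that proc sgplot is available from SAS 9.2. As a minimal sketch, assuming the graph data set above with numeric x and y and a character variable sex, a comparable grouped line plot could be drawn without any symbol statements:

```sas
/* Sketch only: grouped line plot with proc sgplot (SAS 9.2 or later).    */
/* Assumes data set graph with numeric x, y and character variable sex.   */
proc sgplot data=graph;
   series x=x y=y / group=sex markers;
run;
```

    Here group= plays the role of the =sex suffix in the gplot plot statement, and markers asks for a symbol at each point.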


    UNIVARIATE

    data water;

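    length flag $ 1 Town $ 16;

    input Town $ Mortal Hardness;

    /* a leading asterisk marks a flagged town: move it into flag */

    if Town =: '*' then do; flag = '*'; Town = substr(Town, 2); end;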

    datalines;

    Bath 1247 105

    *Birkenhead 1668 17

    Birmingham 1466 5

    *Blackburn 1800 14

    *Blackpool 1609 18

    *Bolton 1558 10

    *Bootle 1807 15

    Bournemouth 1299 78

    *Bradford 1637 10

    Brighton 1359 84

    Bristol 1392 73

    *Burnley 1755 12

    Cardiff 1519 21

    Coventry 1307 78

    Croydon 1254 96

    *Darlington 1491 20

    *Derby 1555 39

    *Doncaster 1428 39

    EastHam 1318 122

    Exeter 1260 21

    *Gateshead 1723 44

    *Grimsby 1379 94

    *Halifax 1742 8

    *Huddersfield 1574 9

    *Hull 1569 91

    Ipswich 1096 138

    *Leeds 1591 16

    Leicester 1402 37

    *Liverpool 1772 15

    *Manchester 1828 8

    *Middlesbrough 1704 26

    *Newcastle 1702 44

    Newport 1581 14

    Northampton 1309 59

    Norwich 1259 133

    *Nottingham 1427 27

    *Oldham 1724 6

    Oxford 1175 107

    Plymouth 1486 5

    Portsmouth 1456 90

    *Preston 1696 6

    Reading 1236 101

    *Rochdale 1711 13

    *Rotherham 1444 14

    *StHelens 1591 49

    *Salford 1987 8

    *Sheffield 1495 14

    Southampton 1369 68

    Southend 1257 50

    *Southport 1587 75

    *SouthShields 1713 71

    *Stockport 1557 13

    *Stoke 1640 57

    *Sunderland 1709 71

    Swansea 1625 13


    *Wallasey 1625 20

    Walsall 1527 60

    WestBromwich 1627 53

    WestHam 1486 122

    Wolverhampton 1485 81

    *York 1378 71

    ;

    run;

    proc print data=water;

    run;

    proc univariate data=water normal;

    var mortal hardness;

    histogram mortal hardness /normal;

    probplot mortal hardness;

    run;

    The meanings of some of the other statistics printed in these displays are as follows:

    Abbreviation             Meaning
    Uncorrected SS           Uncorrected sum of squares; simply the sum of squares of the observations
    Corrected SS             Corrected sum of squares; simply the sum of squares of deviations of the observations from the sample mean
    Coeff Variation          Coefficient of variation; the standard deviation divided by the mean and multiplied by 100
    Std Error Mean           Standard deviation divided by the square root of the number of observations
    Range                    Difference between largest and smallest observation in the sample
    Interquartile Range      Difference between the 25% and 75% quantiles (see values of quantiles given later in display to confirm)
    Student's t              Student's t-test value for testing that the population mean is zero
    Pr>|t|                   Probability of a greater absolute value for Student's t under the hypothesis that the population mean is zero
    Sign Test                Nonparametric test statistic for testing whether the population median is zero
    Pr>|M|                   Approximation to the probability of a greater absolute value for the Sign test under the hypothesis that the population median is zero
    Signed Rank              Nonparametric test statistic for testing whether the population mean is zero
    Pr>=|S|                  Approximation to the probability of a greater absolute value for the Signed Rank statistic under the hypothesis that the population mean is zero
    Shapiro-Wilk W           Shapiro-Wilk statistic for assessing the normality of the data and the corresponding P-value (Shapiro and Wilk [1965])
    Kolmogorov-Smirnov D     Kolmogorov-Smirnov statistic for assessing the normality of the data and the corresponding P-value (Fisher and Van Belle [1993])
    Cramer-von Mises W-sq    Cramer-von Mises statistic for assessing the normality of the data and the associated P-value (Everitt [1998])
    Anderson-Darling A-sq    Anderson-Darling statistic for assessing the normality of the data and the associated P-value (Everitt [1998])
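    Several of the quantities defined above can also be requested by name. The following sketch, which assumes the water data set created earlier, asks proc means for the uncorrected and corrected sums of squares, the coefficient of variation, the standard error of the mean, the range and the interquartile range:

```sas
/* Sketch only: request selected statistics by keyword.            */
/* Assumes the water data set with numeric mortal and hardness.    */
proc means data=water n mean std uss css cv stderr range qrange maxdec=2;
   var mortal hardness;
run;
```

    As a quick check on the definitions, cv should equal 100 times std divided by mean, and stderr should equal std divided by the square root of n.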

    OUTPUTS

    proc gplot;

    plot mortal*hardness;

    run;


    CORRELATION

    proc corr data=water pearson spearman;

    var mortal hardness;

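    /* no by statement here: each town is a single observation, so the correlation is computed over the whole data set */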

    run;


    KERNEL DENSITY ESTIMATE

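    /* SAS 9 syntax: the bivar statement replaces the var statement and out= of older releases */

    proc kde data=water;

    bivar mortal hardness / out=bivest;

    run;

    /* in the bivar output data set, value1 holds mortal and value2 holds hardness */

    proc g3d data=bivest;

    plot value2*value1=density;

    run;

    where KDE stands for kernel density estimate.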

    T-TEST

    data water;

    set water;

    lhardnes=log(hardness);

    if hardness < 100 then T = 1;

    else T=2;

    proc ttest;

    class T;

    var mortal hardness lhardnes;

    proc npar1way wilcoxon;

    class T;

    var hardness;

    run;


    Example for application

    The questionnaire that is the source of this data is given in the appendices. The following data is part of the real data collected through the questionnaire.

    data sasuser.book3;

    input ser p1 p2 p3 p4 p5 q1 q2 q3 q4 q5

    q6 q7 q8 q9 q10 q11 q12 q13 q14;

    datalines;

    1 2 2 2 3 3 4 4 2 2 4 3

    3 3 3 4 4 3 3 3

    2 1 2 2 2 1 4 5 4 4 3 2

    1 5 3 4 4 4 3 1

    3 1 2 1 3 3 4 4 4 5 4 4

    4 4 4 4 4 4 4 4

    4 1 3 1 3 4 5 5 5 5 5 5

    5 5 5 5 5 5 5 5

    5 1 2 2 2 3 4 4 4 4 4 4

    4 4 4 5 4 4 4 5

    6 1 2 2 2 1 2 2 1 2 3 2

    1 2 1 2 3 2 3 2

    7 1 2 2 3 1 3 2 2 3 3 2

    3 2 3 3 3 3 2 3

    8 1 1 1 3 1 2 2 1 1 2 2

    2 2 2 1 1 2 1 1

    9 2 3 2 3 4 1 2 2 2 1 2

    1 2 2 2 1 1 2 3

    10 1 3 2 3 1 4 3 3 2 4 3

    2 5 5 2 4 3 5 4

    11 1 3 2 3 1 5 5 4 3 4 4

    4 4 5 5 5 4 4 4

    12 1 3 2 3 1 5 4 4 4 4 4

    4 4 4 3 3 3 2 2

    13 1 3 2 3 2 4 4 4 4 3 3

    3 4 4 4 4 4 4 4

    14 2 3 1 2 4 3 4 4 3 2 4

    1 5 3 2 3 3 3 2

    15 2 2 2 1 2 4 4 4 3 3 2

    3 3 3 3 3 4 3 4

    16 2 2 1 1 2 4 2 5 3 2 2

    3 5 3 4 4 1 3 2

    17 2 2 2 2 2 4 3 3 2 3 3

    4 4 4 4 5 5 5 3

    18 2 2 4 2 1 2 1 4 1 2 2

    2 1 2 2 3 3 2 1

    19 2 3 2 2 4 4 4 4 4 3 4

    3 4 4 5 4 4 4 5

    20 2 3 1 2 4 4 4 4 3 4 3

    2 4 3 4 4 4 4 3

    ;

    run;

    Alternatively, the data can be imported from an Excel file:

    proc import out=sasuser.book3

    datafile="C:\Users\Mohd KHALAF\Desktop\SAS_Training\samples\book3.xls"

    DBMS=Excel Replace;

    GETNAMES=YES;

    run;

    To see the file contents use the following procedure:

    proc contents data=sasuser.book3;


    run;

    The output gives complete information about the data set:

    ANALYSIS OF EACH VARIABLE SEPARATELY

    To analyze this data, we first describe each of the demographic variables on its own. The second stage describes the remaining questions, choosing the appropriate analysis to find the trends in the sample for each item. We start by finding the frequencies and percentages for each demographic question, then cross-tabulate the demographic variables against each other and test whether the association is significant.

    To produce frequencies for all variables in the data set, we use the following command:

    proc freq data=sasuser.book3;

    run;

    Output will be:

    However, the variables in the second part of the questionnaire are better analyzed with other tests, and it makes no sense to compute frequencies or any other analysis for the serial number variable (ser). The frequencies are therefore produced for p1 to p5 only, using the following procedure:

    proc freq data=sasuser.book3;

    tables p1 p2 p3 p4 p5;

    run;

    The output will be as follows:

    The previous tables make it possible to describe the first five demographic variables separately.

    To make the output more readable, variable labels and value labels should be added. Variable labels can be added with the following program:

    data sasuser.book3;

    set sasuser.book3;

    label p1 = "Sex"

    p2 = "Age"

    p3 = "Educational level"

    p4 = "Administrative level"

    p5 = "Years of experience";

    run;

    proc contents data=sasuser.book3;

    run;

    The output of the contents procedure will be as follows:

    The frequencies for p1 and p2 can then be requested with the procedure:

    proc freq data=sasuser.book3;

    tables p1 p2;

    run;

    The results will be as follows, showing the labels:

    To add value labels for variables, the procedure will be as follows:

    proc format;

    value p1f 1 = "Male"
              2 = "Female";

    value p2f 1 = "Under 25"
              2 = "25 to under 35"
              3 = "35 to under 45"
              4 = "45 to under 55"
              5 = "55 and over";

    value p3f 1 = "Intermediate diploma or less"
              2 = "Bachelor's degree"
              3 = "Higher diploma"
              4 = "Master's degree"
              5 = "Doctorate";

    value p4f 1 = "Top management"
              2 = "Middle management"
              3 = "Supervisory management";

    value p5f 1 = "Under 5 years"
              2 = "5 to under 10 years"
              3 = "10 to under 15 years"
              4 = "15 years and over";

    run;

    proc freq data=sasuser.book3;

    format p1 p1f.

    p2 p2f.

    p3 p3f.

    p4 p4f.

    p5 p5f.;

    tables p1 p2 p3 p4 p5;

    run;

    The output will be as follows:

    Cross-tabulation gives more information about the demographic features of the studied sample. With the formats p1f to p5f already defined, the following procedure cross-tabulates sex (p1) against each of the other demographic variables and requests a chi-square test for each table:

    proc freq data=sasuser.book3;

    format p1 p1f.

    p2 p2f.

    p3 p3f.

    p4 p4f.

    p5 p5f.;

    tables p1*p2 p1*p3 p1*p4 p1*p5 / chisq;

    run;

    The output for this analysis is as follows:
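    With only 20 respondents, many cells in these tables will have small expected counts, and the chi-square approximation may then be unreliable. As a hedged sketch (the choice of options is an illustration, not part of the original analysis), the expected option prints the expected count in each cell and fisher requests an exact test:

```sas
/* Sketch only: check expected counts and add an exact test.          */
/* Assumes sasuser.book3 with the demographic variables p1 and p2.    */
proc freq data=sasuser.book3;
   tables p1*p2 / chisq expected fisher;
run;
```

    The same options can be added to the other table requests (p1*p3, p1*p4, p1*p5).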

    To analyze the second part of the questionnaire, descriptive statistics will be used, concentrating on the mean and standard deviation for the questions q1-q14. The following procedure can be used:

    proc means data=sasuser.book3;

    run;

    This gives means for all variables in the data set, as follows:

    As it makes no sense to include the first part, which has already been analyzed, the following procedure gives the means for the second part only:

    proc means data=sasuser.book3;

    var q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13 q14_;

    run;

    The output will be:

    If the output needs to be limited to the mean or other selected statistics, the procedure will be as follows:

    proc means N mean std data=sasuser.book3;

    var q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13 q14_;

    run;

    Suppose the variable q14 was wrongly named q14_. The variable q14_ can be renamed to q14 using the following procedure:

    data sasuser.book4;

    set sasuser.book3;

    rename q14_=q14;

    run;

    Output with many unnecessary decimal places is hard to read. To limit the number of decimal places, the option maxdec=2 can be added to the procedure as follows:

    proc means N mean std maxdec=2 data=sasuser.book4;

    var q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13 q14;

    run;

    The output will be as follows:

    For the means to be interpreted correctly, the agreement scale should run from (5) for strongly agree to (1) for strongly disagree. In this data set the codes were entered in the opposite order, so the means for q1 to q14 are reversed. To correct this, the items are recoded so that 5 becomes 1, 4 becomes 2, 3 stays 3, 2 becomes 4 and 1 becomes 5. Subtracting each value from 6 does this:

    data sasuser.book4;

    set sasuser.book4;

    qq1 = 6- q1;

    qq2 = 6- q2;

    qq3 = 6- q3;

    qq4 = 6- q4;

    qq5 = 6- q5;

    qq6 = 6- q6;

    qq7 = 6- q7;

    qq8 = 6- q8;

    qq9 = 6- q9;

    qq10 = 6- q10;

    qq11 = 6- q11;

    qq12 = 6- q12;

    qq13 = 6- q13;

    qq14 = 6- q14;

    run;
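    The fourteen assignments above all apply the same rule, so they can also be written as a loop over two arrays. This is a minimal sketch, assuming the renamed variables q1 to q14 exist in sasuser.book4; the output data set work.recoded and the array names are hypothetical:

```sas
/* Sketch only: reverse-code q1-q14 with arrays.                      */
/* Assumes sasuser.book4 contains q1-q14 coded 1 to 5.                */
/* work.recoded, item and rev are illustrative names.                 */
data work.recoded;
   set sasuser.book4;
   array item{14} q1-q14;     /* original answers       */
   array rev{14}  qq1-qq14;   /* reverse-coded answers  */
   do i = 1 to 14;
      rev{i} = 6 - item{i};   /* 5->1, 4->2, 3->3, 2->4, 1->5 */
   end;
   drop i;
run;
```

    A missing answer stays missing, because 6 minus a missing value is missing in SAS.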

    proc freq data=sasuser.book4;

    tables q1-q9;

    run;

    A complete and comprehensive analysis requires the distribution of the results for questions q1 to q14 by the available demographic variables. The procedure used is as follows:

    proc sort data=sasuser.book4;

    by p1;

    run;

    proc means data=sasuser.book4;

    var q1;

    by p1;

    run;

    The output for the distribution will be:

    The previous analysis can be done for all variables q1-q14 in one step as follows:

    proc sort data=sasuser.book4;

    by p1;

    run;

    proc means data=sasuser.book4;

    var q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13 q14;

    by p1;

    run;

    The output will be:


    The same analysis will be conducted with p2 up to p3.

    T-TEST

    One sample t test

    There are three types of t-test that can be applied to the running example. The first type is the one-sample t-test.

    proc ttest h0=3 alpha=0.1 data=sasuser.book4;

    var q1;

    run;

    The output will be:

    The same test can be repeated for q2 to q14 to check whether the variables differ significantly from the hypothesized mean.

    proc ttest h0=3 alpha=0.1 data=sasuser.book4;

    var q1-q14;

    run;

    Paired t-test

    The second type of t-test that can be applied to the current questionnaire is the paired

    t-test.

    proc ttest data=sasuser.book4;

    paired q1*q4 q1*q2;

    run;
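    A paired t-test is equivalent to a one-sample t-test on the within-respondent differences, tested against zero. The sketch below illustrates this for the pair q1 and q4, assuming sasuser.book4 as above; the data set work.diff and the variable d14 are hypothetical names:

```sas
/* Sketch only: paired t-test written as a one-sample test on differences. */
/* Assumes sasuser.book4 with q1 and q4; work.diff and d14 are             */
/* illustrative names.                                                     */
data work.diff;
   set sasuser.book4;
   d14 = q1 - q4;          /* difference for each respondent */
run;

proc ttest data=work.diff h0=0;
   var d14;
run;
```

    The t statistic and p-value should agree with those reported for q1*q4 by the paired statement.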


    CORRELATION TEST

    To find out whether two variables are correlated with each other, the correlation test is used. The procedure for correlation will be:

    proc corr data=sasuser.book4;

    var q1 q2;

    run;

    The output will be:

    The result shows that there is a high correlation between the two variables, which matches the paired sample t-test.

    Independent sample t-test

    The third type of hypothesis test based on the t-test is the independent sample t-test. This test involves two variables: the first should be categorical with two groups, while the second should be continuous. So, the test can be done between sex (p1) and q1 to q14. The procedure applied will be as follows:

    proc ttest data=sasuser.book4;

    class p1;

    var q1-q14;

    run;


    ANOVA

    To test the effect of demographic variables such as educational level, employee position and experience on the attitudes for the questions q1 to q14, the analysis of variance will be used. The procedure below uses p2 as the class variable; the other demographic variables can be substituted in the same way:

    proc anova data=sasuser.book4;

    class p2;

    model q1 = p2;

    run;

    In general, the tests should not be run separately for each question. A look at the questionnaire in the appendices shows that q1-q7 represent one field, while q8-q14 represent another field in the survey. So, the mean for each field should be calculated in a new variable, which is then tested against the demographic variables. The process can be done as follows:

    data sasuser.book4;

    set sasuser.book4;

    q = (q1 + q2 + q3 + q4 + q5 + q6 + q7)/7;

    run;

    and for q8 to q14:

    data sasuser.book4;

    set sasuser.book4;

    qq = (q8 + q9 + q10 + q11 + q12 + q13 + q14)/7;

    run;

    Then the ANOVA analysis can be handled with q and qq with the demographic

    variables.
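    As a minimal sketch of this last step, the two field means can be computed with the mean function and passed to proc anova together. The data set work.fields, the variable names field1 and field2, and the choice of p3 (educational level) as the class variable are assumptions made for illustration:

```sas
/* Sketch only: field means and a one-way ANOVA on both fields.          */
/* Assumes sasuser.book4 with q1-q14 and p3; work.fields, field1 and     */
/* field2 are illustrative names.                                        */
data work.fields;
   set sasuser.book4;
   field1 = mean(of q1-q7);    /* mean of the first field  */
   field2 = mean(of q8-q14);   /* mean of the second field */
run;

proc anova data=work.fields;
   class p3;
   model field1 field2 = p3;
run;
quit;
```

    Unlike the sum divided by 7, the mean function ignores missing answers, so a respondent who skipped one item still receives a field mean based on the items answered.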


    REGRESSION ANALYSIS

    The effect of q on qq can be measured using regression analysis. The q variable will be the independent variable, while qq will be the dependent variable. The procedure is as follows:

    proc reg data=sasuser.book4;

    model qq=q;

    run;


    ODS IN SAS

    The Output Delivery System (ODS) provides a way to manage SAS output. SAS output can be directed to files that other software can read, such as Rich Text Format or HTML. The output can be sent to other software using the following procedure:

    ODS RTF;

    proc means N mean std maxdec=2 data=sasuser.book4;

    var q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13 q14;

    run;

    ODS RTF Close;

    The output is written to a Rich Text Format file that can be opened in Microsoft Office. The same output is also shown inside SAS.
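    To control where the file is written, a file= option can be added to the ODS statement, and other destinations such as HTML work the same way. The file names below are hypothetical and the sketch assumes sasuser.book4 as above:

```sas
/* Sketch only: send the same output to named RTF and HTML files.     */
/* means_output.rtf and means_output.html are illustrative names.     */
ods rtf file="means_output.rtf";
ods html file="means_output.html";

proc means n mean std maxdec=2 data=sasuser.book4;
   var q1-q14;
run;

ods html close;
ods rtf close;
```

    Each destination stays open until its close statement, so every procedure run in between is written to both files.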


    APPENDICES

    Questionnaire

    In the name of God, the Most Gracious, the Most Merciful

    Dear Sir/Madam,

    Greetings,

    This questionnaire aims to measure the effect of training on improving employee performance. The study aims to offer proposals for improving the training programs adopted by your esteemed institution, in order to raise its performance in a way that serves the achievement of your goals.

    Your care in providing the requested data and information accurately and objectively will help in reaching better results and, consequently, in offering more useful recommendations. We therefore kindly ask you to mark the items of the attached form in accordance with the training programs applied at your center.

    Please note that the data and information you provide for this study will be used only for scientific research purposes and will be treated with complete confidentiality. You will be provided with the results of the study once it is completed, if you wish to see them.

    Thank you for your kind cooperation.

    Part One: Personal and job characteristics

    Please mark (X) in the appropriate place.

    - Sex: ( ) Male ( ) Female
    - Age: ( ) Under 25 ( ) 25 to under 35 ( ) 35 to under 45 ( ) 45 to under 55 ( ) 55 and over
    - Educational level: ( ) Intermediate diploma or less ( ) Bachelor's degree ( ) Higher diploma ( ) Master's degree ( ) Doctorate
    - Administrative level: ( ) Top management ( ) Middle management ( ) Supervisory management
    - Years of experience: ( ) Under 5 years ( ) 5 to under 10 years ( ) 10 to under 15 years ( ) 15 years and over

    Part Two: Independent variables (training)

    Please mark (x) in the place you consider appropriate. Each item is answered on the scale: Strongly agree, Agree, Not sure, Disagree, Strongly disagree.

    First: Identifying training needs

    1- Identifying training needs effectively helps the training process succeed
    2- The objectives of the training program are stated clearly and precisely
    3- Training programs are selected according to the needs of the employees and the work
    4- Training programs are designed using a scientific methodology
    5- Management is keen to identify employees' training needs in order to improve their level of performance
    6- Management is keen to identify areas of weak performance among employees
    7- Management looks for the causes of errors in performance and works to eliminate them

    Second: Efficiency of training programs

    1- The training programs delivered are among the best means of improving work performance
    2- Training has helped reduce job-related burdens within the department
    3- The use of advanced scientific methods has helped increase the efficiency of training programs
    4- Training programs have helped employees acquire skills and knowledge that were applied in the institution
    5- Training programs have helped employees develop a competitive spirit
    6- Training programs give employees the opportunity for practical experience
    7- The company contributes by preparing training programs to develop distinguished staff


    SAS t-test Commands

    This handout illustrates how to read in raw data to SAS, set up missing values and create

    new variables using transformations and recodes. We illustrate independent samples t-tests,

    paired t-tests, and one-sample t-tests.

    Read in Raw Data

    In the first data step, we read in the raw data using an infile and input statement. We don't

    need to tell SAS the column location of each variable, because there is at least one blank

    between variables, so we can use a free-format input statement where the variables are

    simply listed in the order they appear in the raw data file.

    /*Read in the raw data*/

    data owen;

    infile "owen.dat" ;

    input family child age sex race w_rank income_c height weight hemo

    vit_c vit_a head_cir fatfold b_weight mot_age b_order

    m_height

    f_height ;

    run;

    Create a Permanent Dataset

    After reading in the raw data, we create a new permanent SAS dataset in which we set up

    missing values and create new variables using recodes and transformations. Note in setting

    up the missing value codes, a dot (.) is used for the missing value code and no quotes are

    employed, because all of these variables are numeric. Although we used two data steps in

    this example, all of this code could have been accomplished in a single data step.

    libname b510 "c:\documents and settings\kwelch\desktop\b510";

    data b510.owen;

    set owen;

    if height = 999 then height = .;

    if weight = 999 then weight = .;


    if vit_a = 99 then vit_a = .;

    if head_cir = 99 then head_cir = .;

    if fatfold = 99 then fatfold = .;

    if b_weight = 999 then b_weight= .;

    if mot_age = 99 then mot_age = .;

    if b_order = 99 then b_order = .;

    if m_height = 999 then m_height=.;

    if f_height = 999 then f_height=.;

    bwt_g = b_weight*10;

    if bwt_g not=. and bwt_g < 2500 then lowbwt=1;

    if bwt_g >=2500 then lowbwt=0;

    log_fatfold = log(fatfold);

    htdiff = f_height - m_height;

    bmi = weight /(height/100)**2;

    run;
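    As noted above, the two data steps could be combined into one. The following is a sketch of how that might look, using the same file, library and variable names as the handout; grouping the missing-value recodes into two arrays is a choice made here for brevity, not part of the original code:

```sas
/* Sketch only: read, clean and transform in a single data step.        */
/* File, library and variable names are those used in the handout;      */
/* the arrays code999 and code99 are illustrative additions.            */
libname b510 "c:\documents and settings\kwelch\desktop\b510";

data b510.owen;
   infile "owen.dat";
   input family child age sex race w_rank income_c height weight hemo
         vit_c vit_a head_cir fatfold b_weight mot_age b_order
         m_height f_height;

   /* variables whose missing-value code is 999 */
   array code999{*} height weight b_weight m_height f_height;
   /* variables whose missing-value code is 99 */
   array code99{*} vit_a head_cir fatfold mot_age b_order;

   do i = 1 to dim(code999);
      if code999{i} = 999 then code999{i} = .;
   end;
   do i = 1 to dim(code99);
      if code99{i} = 99 then code99{i} = .;
   end;
   drop i;

   /* derived variables, as in the two-step version */
   bwt_g = b_weight*10;
   if bwt_g not=. and bwt_g < 2500 then lowbwt=1;
   if bwt_g >= 2500 then lowbwt=0;
   log_fatfold = log(fatfold);
   htdiff = f_height - m_height;
   bmi = weight/(height/100)**2;
run;
```

    The missing-value recodes must come before the derived variables, so that bwt_g, log_fatfold, htdiff and bmi are computed from cleaned values.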

    Basic Descriptive Statistics

    It is always good practice to check a dataset after you have created it. Proc Means is useful

    for numeric variables. Be especially attentive to the number of observations (N) and the

    minimum and maximum value for each variable. Check to see that they are reasonable.

    /*Simple Descriptive Statistics on all Numeric Variables*/

    proc means data=b510.owen;

    run;

    The MEANS Procedure

    Variable        N          Mean       Std Dev       Minimum       Maximum
    -------------------------------------------------------------------------
    family       1006       4525.11       1634.03       2000.00       7569.00
    child        1006     1.3359841     0.5716672     1.0000000     3.0000000
    age          1006    44.0248509    16.6610452    12.0000000    73.0000000
    sex          1006     1.4890656     0.5001291     1.0000000     2.0000000
    race         1006     1.2823062     0.4503454     1.0000000     2.0000000
    w_rank       1006     2.2127237     0.9024440     1.0000000     4.0000000
    income_c     1006       1581.31   974.2279710    80.0000000       6250.00
    height       1001    99.0429570    11.4300111    70.0000000   130.0000000
    weight       1000    15.6290800     3.6523446     8.2400000    41.0800000
    hemo         1006    12.4606362     1.1578850     6.2000000    24.1000000
    vit_c        1006     1.1302187     0.6599121     0.1000000     3.5000000
    vit_a         763    36.0380079     8.8951237    15.0000000    78.0000000
    head_cir      999    49.3763764     2.0739057    39.0000000    56.0000000
    fatfold       993     4.4562941     1.6683194     2.6000000    42.0000000
    b_weight      986   325.0517241    59.5162936    91.0000000   544.0000000
    mot_age       981    29.2660550     6.2603025    17.0000000    51.0000000
    b_order       980     2.9479592     2.1939526     1.0000000    16.0000000
    m_height      980   163.7632653     6.3663343   122.0000000   199.0000000
    f_height      975   178.2194872     7.3821354   152.0000000   210.0000000
    bwt_g         986       3250.52   595.1629357   910.0000000       5440.00
    lowbwt        986     0.1075051     0.3099115             0     1.0000000
    log_fatfold   993     1.4599658     0.2396859     0.9555114     3.7376696
    htdiff        972    14.4218107     8.7834139   -12.0000000    56.0000000
    bmi           998    15.8124399     1.6634700    11.0247934    26.2912000
    -------------------------------------------------------------------------


    Descriptives for Subgroups using a Class Statement

    A Class statement can be used with Proc Means to get descriptive statistics for subgroups of

    cases. You don't have to sort the data when using a class statement.

    proc means data=b510.owen;

    class sex;

    var bwt_g bmi fatfold log_fatfold;

    run;

    The MEANS Procedure

    SEX  N Obs  Variable     Label       N          Mean       Std Dev       Minimum       Maximum
    ----------------------------------------------------------------------------------------------
    1      514  bwt_g                  497       3340.56   565.3268435       1360.00       5170.00
                bmi                    510    15.8982386     1.6074313    11.3795135    26.2912000
                FATFOLD      FATFOLD   507     4.2518738     0.9720458     2.6000000    10.2000000
                log_fatfold            507     1.4247028     0.2076417     0.9555114     2.3223877

    2      492  bwt_g                  489       3159.00   611.1350784   910.0000000       5440.00
                bmi                    488    15.7227732     1.7171565    11.0247934    24.4485835
                FATFOLD      FATFOLD   486     4.6695473     2.1489049     2.6000000    42.0000000
                log_fatfold            486     1.4967524     0.2643232     0.9555114     3.7376696
    ----------------------------------------------------------------------------------------------

    Descriptives for Subgroups using a By Statement

    A By statement is another way to get information for subgroups of cases. You need to sort

    the data first when using a By statement. The By statement is more generally applicable than

    the Class statement and can be used with most SAS procedures (e.g. Proc Reg, Proc Freq). To

    avoid too much output, use a By statement only for variables that have a limited number of

    levels.

    proc sort data=b510.owen;

    by sex;

    run;

    proc means data=b510.owen;

    by sex;

    var bwt_g bmi fatfold log_fatfold;

    run;

    -------------------------------------------- SEX=1 -----------------------------------

    The MEANS Procedure

    Variable     Label    N    Mean        Std Dev      Minimum      Maximum
    ---------------------------------------------------------------------------
    bwt_g                 497  3340.56     565.3268435  1360.00      5170.00
    bmi                   510  15.8982386  1.6074313    11.3795135   26.2912000
    FATFOLD      FATFOLD  507  4.2518738   0.9720458    2.6000000    10.2000000
    log_fatfold           507  1.4247028   0.2076417    0.9555114    2.3223877
    ---------------------------------------------------------------------------

    -------------------------------------------- SEX=2 -----------------------------------

    Variable     Label    N    Mean        Std Dev      Minimum      Maximum
    ---------------------------------------------------------------------------
    bwt_g                 489  3159.00     611.1350784  910.0000000  5440.00
    bmi                   488  15.7227732  1.7171565    11.0247934   24.4485835
    FATFOLD      FATFOLD  486  4.6695473   2.1489049    2.6000000    42.0000000
    log_fatfold           486  1.4967524   0.2643232    0.9555114    3.7376696
    ---------------------------------------------------------------------------
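
    As a minimal sketch of the point that a By statement also works with other procedures,
    the same sorted data could be passed to Proc Freq. Tabulating lowbwt is an assumption
    made here for illustration only; the variable appears in the earlier Proc Means listing
    for this dataset.

    ```sas
    /* Sketch only: By-group processing with a different procedure.   */
    /* Assumes b510.owen contains SEX and LOWBWT, as listed earlier.  */
    proc sort data=b510.owen;
      by sex;
    run;

    proc freq data=b510.owen;
      by sex;           /* one frequency table per level of SEX */
      tables lowbwt;
    run;
    ```

    Because the output is repeated for every By group, this is only practical when the By
    variable has few levels, as noted above.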

    Boxplots

    Boxplots are a nice way to visualize data when you wish to compare the value of a

    continuous variable for two or more groups. In SAS 9.2 and later, you can use Proc Sgplot to get

    boxplots. Proc Boxplot is also available, both in SAS 9.2 and in earlier versions of SAS.

    /*Boxplots*/

    proc sgplot data=b510.owen;

    vbox bwt_g / category=sex;

    run;

    proc sgplot data=b510.owen;

    vbox bmi / category=sex;

    run;

    proc sgplot data=b510.owen;

    vbox fatfold / category=sex;

    run;

    proc sgplot data=b510.owen;

    vbox log_fatfold / category=sex;

    run;

    The boxplots show the median, upper and lower quartiles, give an idea of skewness, and

    indicate outliers.
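
    For versions of SAS without Proc Sgplot, a comparable plot can be sketched with Proc
    Boxplot. This is an illustrative sketch rather than code from the handout; note that
    Proc Boxplot expects the data to be sorted by the group variable.

    ```sas
    /* Sketch only: side-by-side boxplots of birthweight by sex. */
    proc sort data=b510.owen;
      by sex;
    run;

    proc boxplot data=b510.owen;
      plot bwt_g*sex;   /* analysis variable * group variable */
    run;
    ```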

    Independent Samples t-test

    An independent samples t-test can be used to compare the mean of a continuous
    variable (e.g., birthweight) for two groups of cases. In this example, we are
    comparing the means of BWT_G, BMI, and LOG_FATFOLD for females vs. males.
    Notice that Proc ttest uses a class statement for an independent samples t-test,
    so no sorting of the data is necessary.

    The assumptions for the t-test are that the observations are independent (i.e., the
    values of individuals are not correlated), that the underlying distribution of the
    continuous variable is normal within the two groups, and that the variances in the two
    groups are equal. The t-test is robust to departures from the normality assumption if
    the sample size is large (e.g., 50 or more cases). The equality of variances is a more
    important assumption. SAS gives a test of equality of variances at the bottom of the
    t-test output. If equality of variances is a reasonable assumption, the F-test for
    equality of variances will not be significant. We often use a somewhat higher alpha
    level than usual for this equality of variances test (e.g., .10 rather than .05) to be
    more conservative (i.e., we don't want to wrongly assume equal variances when in fact
    they are unequal). SAS produces two different t-test results: the first (Pooled) assumes
    equality of variances and the second (Satterthwaite) does not. You can choose the test
    to use based on the results of the equality of variances test. By default, SAS always
    reports a two-sided p-value for the t-test.

    proc ttest data=b510.owen;

    class sex;

    var bwt_g bmi log_fatfold;

    run;

    Variable: bwt_g

    SEX         N    Mean    Std Dev  Std Err  Minimum  Maximum
    1           497  3340.6  565.3    25.3584  1360.0   5170.0
    2           489  3159.0  611.1    27.6365  910.0    5440.0
    Diff (1-2)       181.6   588.5    37.4840

    SEX         Method         Mean    95% CL Mean     Std Dev  95% CL Std Dev
    1                          3340.6  3290.7  3390.4  565.3    532.2  602.8
    2                          3159.0  3104.7  3213.3  611.1    575.1  652.0
    Diff (1-2)  Pooled         181.6   108.0   255.1   588.5    563.6  615.7
    Diff (1-2)  Satterthwaite  181.6   108.0   255.2

    Method  Variances  DF   t Value  Pr > |t|
    Pooled  Equal      984  4.84     <.0001
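
    As a worked check of the Pooled row for bwt_g, the pooled standard deviation and the t
    statistic can be reproduced from the group summaries reported by Proc Means
    (n1 = 497, s1 = 565.33; n2 = 489, s2 = 611.14). The symbols are introduced here for
    illustration and the figures are rounded.

    ```latex
    \[
    s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}
        = \sqrt{\frac{496\,(565.33)^2 + 488\,(611.14)^2}{984}} \approx 588.5
    \]
    \[
    t = \frac{\bar{x}_1-\bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}}
      = \frac{3340.56-3159.00}{588.5\sqrt{\dfrac{1}{497}+\dfrac{1}{489}}}
      \approx \frac{181.56}{37.48} \approx 4.84,
      \qquad \mathrm{df} = n_1+n_2-2 = 984
    \]
    ```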

    Variable: bmi

    SEX         N    Mean     Std Dev  Std Err  Minimum  Maximum
    1           510  15.8982  1.6074   0.0712   11.3795  26.2912
    2           488  15.7228  1.7172   0.0777   11.0248  24.4486
    Diff (1-2)       0.1755   1.6620   0.1052

    SEX         Method         Mean     95% CL Mean       Std Dev  95% CL Std Dev
    1                          15.8982  15.7584  16.0381  1.6074   1.5145  1.7126
    2                          15.7228  15.5700  15.8755  1.7172   1.6158  1.8322
    Diff (1-2)  Pooled         0.1755   -0.0311  0.3820   1.6620   1.5921  1.7383
    Diff (1-2)  Satterthwaite  0.1755   -0.0314  0.3823

    Method         Variances  DF     t Value  Pr > |t|
    Pooled         Equal      996    1.67     0.0958
    Satterthwaite  Unequal    984.1  1.66     0.0963

    Equality of Variances

    Method    Num DF  Den DF  F Value  Pr > F
    Folded F  487     509     1.14     0.1407
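
    The Folded F statistic for bmi can be checked by hand: it is the larger sample variance
    divided by the smaller one, with the degrees of freedom of the corresponding groups.
    The notation below is introduced for illustration and the values are rounded.

    ```latex
    \[
    F' = \frac{\max(s_1^2,\, s_2^2)}{\min(s_1^2,\, s_2^2)}
       = \frac{(1.7172)^2}{(1.6074)^2} \approx 1.14,
       \qquad \mathrm{df} = (488-1,\ 510-1) = (487,\ 509)
    \]
    ```

    Since the reported p-value (0.1407) is above the .10 level suggested earlier, equal
    variances is a reasonable assumption for bmi, and the Pooled row would be the one to
    read.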

    Variable: log_fatfold

    SEX         N    Mean     Std Dev  Std Err  Minimum  Maximum
    1           507  1.4247   0.2076   0.00922  0.9555   2.3224
    2           486  1.4968   0.2643   0.0120   0.9555   3.7377
    Diff (1-2)       -0.0720  0.2371   0.0151

    SEX         Method         Mean     95% CL Mean       Std Dev  95% CL Std Dev
    1                          1.4247   1.4066   1.4428   0.2076   0.1956  0.2213
    2                          1.4968   1.4732   1.5203   0.2643   0.2487  0.2821
    Diff (1-2)  Pooled         -0.0720  -0.1016  -0.0425  0.2371   0.2271  0.2480
    Diff (1-2)  Satterthwaite  -0.0720  -0.1017  -0.0424

    Method  Variances  DF   t Value  Pr > |t|
    Pooled  Equal      991  -4.79    <.0001
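
    The two-sided default mentioned earlier can be changed. The following is a minimal
    sketch, assuming a SAS release (9.2 or later) in which Proc ttest supports the SIDES=
    and ALPHA= options; the choice of an upper-tailed test and of alpha = 0.10 is purely
    illustrative.

    ```sas
    /* Sketch only: one-sided test that the SEX=1 mean of bwt_g   */
    /* exceeds the SEX=2 mean. SIDES=U requests the upper tail;   */
    /* ALPHA=0.10 produces 90% confidence limits.                 */
    proc ttest data=b510.owen sides=u alpha=0.10;
      class sex;
      var bwt_g;
    run;
    ```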

    N    Mean     Std Dev  Std Err  Minimum   Maximum
    972  14.4218  8.7834   0.2817   -12.0000  56.0000

    Mean     95% CL Mean       Std Dev  95% CL Std Dev
    14.4218  13.8689  14.9747  8.7834   8.4096  9.1923

    DF   t Value  Pr > |t|
    971  51.19    <.0001

    Mean     95% CL Mean       Std Dev  95% CL Std Dev
    14.4352  13.6374  15.2331  9.0257   8.4958  9.6266

    DF   t Value  Pr > |t|
    493  35.55    <.0001

    DF   t Value  Pr > |t|
    477  36.91    <.0001

    proc ttest data=b510.owen;

    var htdiff;

    run;

    The TTEST Procedure

    Variable: htdiff

    N    Mean     Std Dev  Std Err  Minimum   Maximum
    972  14.4218  8.7834   0.2817   -12.0000  56.0000

    Mean     95% CL Mean       Std Dev  95% CL Std Dev
    14.4218  13.8689  14.9747  8.7834   8.4096  9.1923

    DF   t Value  Pr > |t|
    971  51.19    <.0001

    DF   t Value  Pr > |t|
    971  -2.05    0.0404
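
    The last result above (t = -2.05, p = 0.0404) is consistent with a test against a null
    mean of 15 rather than 0, since (14.4218 - 15) / 0.2817 is about -2.05. The statements
    that produced it do not appear here; a minimal sketch, assuming the H0= option of Proc
    ttest was used, would be:

    ```sas
    /* Sketch only: one-sample t-test of H0: mean of htdiff = 15. */
    proc ttest data=b510.owen h0=15;
      var htdiff;
    run;
    ```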

    One-sample t-test using Proc Univariate

    Proc Univariate can also be used to carry out a one-sample t-test, to get more

    information about the distribution of a variable, and to look at a histogram of the

    distribution of the variable.

    proc univariate data=b510.owen;

    var htdiff;

    histogram / normal;

    run;

    The UNIVARIATE Procedure

    Variable: htdiff

    Moments

    N                       972    Sum Weights              972
    Mean             14.4218107    Sum Observations       14018
    Std Deviation    8.78341392    Variance          77.1483601
    Skewness         0.31703251    Kurtosis          0.56094005
    Uncorrected SS       277076    Corrected SS      74911.0576
    Coeff Variation  60.9036833    Std Error Mean    0.28172813

    Basic Statistical Measures

    Location              Variability
    Mean     14.42181     Std Deviation          8.78341
    Median   15.00000     Variance              77.14836
    Mode     15.00000     Range                 68.00000
                          Interquartile Range   12.00000

    Tests for Location: Mu0=0

    Test           -Statistic-     -----p Value------
    Student's t    t  51.19052     Pr > |t|    <.0001

    Missing Values

                        -----Percent Of-----
    Missing                          Missing
    Value      Count    All Obs          Obs
    .             34       3.38       100.00

    Fitted Normal Distribution for htdiff

    Parameters for Normal Distribution

    Parameter  Symbol  Estimate
    Mean       Mu      14.42181
    Std Dev    Sigma   8.783414

    Goodness-of-Fit Tests for Normal Distribution

    Test                 ----Statistic-----    ------p Value------
    Kolmogorov-Smirnov   D       0.07149425    Pr > D       <0.010

    Quantiles for Normal Distribution

    Percent   Observed   Estimated
    10.0       3.0000     3.16541
    25.0       8.0000     8.49749
    50.0      15.0000    14.42181
    75.0      20.0000    20.34613
    90.0      25.0000    25.67821
    95.0      29.0000    28.86924
    99.0      37.0000    34.85509

    One-sample t-test using Proc Univariate with a specified null hypothesis value for the mean

    We can also specify a null hypothesis value for the mean when using Proc Univariate by

    using the mu0 option.

    proc univariate data=b510.owen mu0=15;

    var htdiff;

    run;

    Tests for Location: Mu0=15

    Test -Statistic- -----p Value------

    Student's t t -2.0523 Pr > |t| 0.0404

    Sign M -40 Pr >= |M| 0.0071

    Signed Rank S -18300 Pr >= |S| 0.0121
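
    Both Student's t statistics reported for htdiff can be reproduced from the Moments table
    (mean 14.4218, standard deviation 8.7834, N = 972). The notation below is introduced for
    illustration; the values are rounded.

    ```latex
    \[
    t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}, \qquad
    \frac{s}{\sqrt{n}} = \frac{8.7834}{\sqrt{972}} \approx 0.28173
    \]
    \[
    \mu_0 = 0:\quad t \approx \frac{14.4218}{0.28173} \approx 51.19
    \qquad\qquad
    \mu_0 = 15:\quad t \approx \frac{14.4218-15}{0.28173} \approx -2.05
    \]
    ```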

    SAS Simple Linear Regression Example

    This handout gives examples of how to use SAS to generate a simple linear regression plot, check the correlation

    between two variables, fit a simple linear regression model, check the residuals from the model, and also shows

    some of the ODS (Output Delivery System) output in SAS.

    Read in Raw Data

    We first read in the raw data from the werner2.dat raw dataset, and set up the missing value codes using a data

    step, and then check descriptive statistics for the numeric variables, using Proc Means.

    OPTIONS FORMCHAR="|----|+|---+=|-/\*";

    libname b510 "C:\Users\kwelch\Desktop\B510";

    DATA b510.werner;

    INFILE "C:\Users\kwelch\Desktop\B510\werner2.dat";

    INPUT ID 1-4 AGE 5-8 HT 9-12 WT 13-16

    PILL 17-20 CHOL 21-24 ALB 25-28 .1

    CALC 29-32 .1 URIC 33-36 .1;

    IF HT = 999 THEN HT = .;

    IF WT = 999 THEN WT = .;

    IF CHOL = 600 THEN CHOL = .;

    IF ALB = 99 THEN ALB = .;

    IF CALC = 99 THEN CALC = .;

    IF URIC = 99 THEN URIC = .;

    run;

    /*Check the Data*/

    title "DESCRIPTIVE STATISTICS";

    proc means data=b510.werner;

    run;

    DESCRIPTIVE STATISTICS

    The MEANS Procedure

    Variable N Mean Std Dev Minimum Maximum

    -------------------------------------------------------------------------------

    ID 188 1598.96 1057.09 3.0000000 3519.00

    AGE 188 33.8191489 10.1126942 19.0000000 55.0000000

    HT 186 64.5107527 2.4850673 57.0000000 71.0000000

    WT 186 131.6720430 20.6605767 94.0000000 215.0000000

    PILL 188 1.5000000 0.5013351 1.0000000 2.0000000

    CHOL 187 235.1550802 44.5706219 50.0000000 390.0000000

    ALB 186 4.1112903 0.3579694 3.2000000 5.0000000

    CALC 185 9.9621622 0.4795556 8.6000000 11.1000000

    URIC 187 4.7705882 1.1572312 2.2000000 9.9000000

    -------------------------------------------------------------------------------

    Correlation

    We now check the correlation between the response (or dependent) variable, CHOL, and the predictor (or

    independent) variable, AGE. It is positive and significant (r = .369, p < .0001).
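
    The statements used for this step do not appear here. A minimal sketch that would
    produce simple statistics and a Pearson correlation for these two variables, assuming
    Proc Corr with its default options, is:

    ```sas
    /* Sketch only: Pearson correlation between AGE and CHOL. */
    proc corr data=b510.werner;
      var age chol;
    run;
    ```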

    Variable  N    Mean       Std Dev   Sum    Minimum   Maximum
    AGE       188  33.81915   10.11269  6358   19.00000  55.00000
    CHOL      187  235.15508  44.57062  43974  50.00000  390.00000

    Pearson Correlation Coefficients
    Prob > |r| under H0: Rho=0
    Number of Observations

              AGE       CHOL
    AGE       1.00000   0.36923

    Simple Linear Regression

    We now fit a linear regression model, with CHOL as the Y (dependent or outcome) variable and AGE as the X

    (independent or predictor) variable, using Proc Reg. We first illustrate the most basic Proc Reg syntax, and then

    show some useful options. The Quit statement is used to tell SAS that there are no more statements coming for

    this run of Proc Reg.

    The output shows that there is a positive relationship between these two variables. When age increases by one

    year, average cholesterol is predicted to increase by 1.62 units, and this is a significant relationship (t(185) = 5.40,

    p < .0001).
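
    The basic Proc Reg call described above does not appear here. A minimal sketch,
    assuming no options are used and taking the title from the output heading, would be:

    ```sas
    /* Sketch only: simple linear regression of CHOL on AGE. */
    title "Simple Linear Regression Model with no options";
    proc reg data=b510.werner;
      model chol = age;
    run;
    quit;   /* ends this run of Proc Reg */
    ```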

    Simple Linear Regression Model with no options

    The REG Procedure
    Model: MODEL1
    Dependent Variable: CHOL

    Number of Observations Read                   188
    Number of Observations Used                   187
    Number of Observations with Missing Values      1

    Analysis of Variance

                    Sum of    Mean
    Source   DF    Squares  Square  F Value  Pr > F
    Model     1      50373   50373    29.20  <.0001

    Parameter Estimates

                    Parameter   Standard
    Variable   DF    Estimate      Error  t Value  Pr > |t|
    Intercept   1   179.96174   10.65564    16.89  <.0001

    We now include some diagnostic plots using Proc Reg. We also generate a new dataset called OUTREG1 that

    contains all of the original variables, plus the predicted value for each observation (PREDICT), the residual (RESID),

    the studentized-deleted residual (RSTUD), and Cook's Distance (COOKD).

    ods graphics on;

    title "Simple Linear Regression with Diagnostic Plots";

    proc reg DATA=B510.werner;

    MODEL CHOL=AGE / stb clb;

    OUTPUT OUT=OUTREG1 P=PREDICT R=RESID RSTUDENT=RSTUD COOKD=COOKD

    LCL=LCL UCL=UCL LCLM=LCLM UCLM=UCLM;

    run;quit;

    ods graphics off;

    The partial output below shows the standardized estimate (obtained with the STB option), which shows the

    estimated change in Y (in standard deviation units) when X is increased by one standard deviation. This estimate

    is 0.369. We also see the 95% confidence limits for the parameter estimate, which are from 1.03 to 2.22.

    Parameter Estimates

                    Parameter   Standard                      Standardized
    Variable   DF    Estimate      Error  t Value  Pr > |t|       Estimate
    Intercept   1   179.96174   10.65564    16.89  <.0001
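
    The confidence limits for the AGE slope can be checked approximately by hand. The
    standard error used below is not taken from the output; it is inferred from the slope
    and t value quoted in the text (1.62 / 5.40, about 0.30), so this is a rough check under
    that assumption rather than an exact reproduction.

    ```latex
    \[
    \hat{\beta}_1 \pm t_{0.975,\,185}\;\mathrm{SE}(\hat{\beta}_1)
      \approx 1.62 \pm 1.973 \times 0.30
      \approx (1.03,\ 2.21)
    \]
    ```

    This agrees with the reported limits of 1.03 to 2.22 up to rounding of the inputs.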

    The diagnostic panel shows a series of diagnostic plots for this regression model.

    The residual plot below shows a scatterplot with the residuals on the Y-axis and AGE on the X-axis. We want to

    look for a lack of pattern in these residuals. We can see that there is one low outlier, at about age 25.

    The fit plot shown below shows the regression model fit, and summarizes some of the statistics for the model.

    Check the output dataset

    We now check the output dataset, using Proc Print. We also request that Proc Print display the labels for

    each variable, by using the Label option. We print selected variables for those observations with the absolute

    value of the studentized deleted residuals being greater than or equal to 3, using a Where statement.

    title "Partial Listing of Output Dataset";

    proc print data=outreg1;

    where abs(rstud) >=3;

    VAR ID AGE CHOL PREDICT RESID RSTUD COOKD LCL UCL LCLM UCLM;

    run;

    Partial Listing of Output Dataset

    Obs  ID    AGE  CHOL  PREDICT  RESID     RSTUD     COOKD     LCL      UCL      LCLM     UCLM
    4    1797  25   50    220.686  -170.686  -4.32214  0.081802  138.358  303.014  212.698  228.674
    182  3134  50   390   261.410  128.590   3.20326   0.094792  178.695  344.126  250.106  272.714

    Check the residuals for normality

    We now check the studentized residuals for normality, using Proc Univariate. This is similar to the output from

    the ODS graphics that was shown in the earlier panel.

    title "Checking Residuals for Normality";

    proc univariate data=outreg1 PLOT NORMAL;

    var rstud;

    histogram / normal;

    qqplot / normal(mu=est sigma=est);

    run;

    The residuals appear to be fairly normally distributed, but there is at least one very low outlier, which we

    identified earlier, when we checked the values in the output dataset.

    Refit the regression model without the cases in question

    We now refit the model without the two outliers, by using a Where statement.

    ods graphics on;

    title "Rerun the model without two obs";

    proc reg data=b510.WERNER;

    where id not in (1797, 3134);

    model chol=age;

    run;quit;

    ods graphics off;

    We can see the changes in the parameter estimates from the output below.

    [Figure: Checking Residuals for Normality. Histogram of Studentized Residual without Current Obs; vertical axis: Percent.]

    [Figure: Checking Residuals for Normality. Q-Q plot of Studentized Residual without Current Obs against Normal Quantiles.]

    Dependent Variable: CHOL

    Number of Observations Read                   186
    Number of Observations Used                   185
    Number of Observations with Missing Values      1

    Analysis of Variance

                    Sum of    Mean
    Source   DF    Squares  Square  F Value  Pr > F
    Model     1      38478   38478    25.82  <.0001

    Parameter Estimates

                    Parameter   Standard
    Variable   DF    Estimate      Error  t Value  Pr > |t|
    Intercept   1   186.70039    9.98091    18.71  <.0001
