
Data Analysis Using SPSS

EDU5950SEM1 2014-15

Assoc. Prof. Dr. Rohani Ahmad Tarmizi

Institute for Mathematical Research / Faculty of Educational Studies, UPM

LEARNING OUTCOMES

• First: students will be able to conceptualize the importance of choosing appropriate statistical analyses.
• Second: students will be able to conduct DATA ENTRY procedures.
• Third: students will be able to conduct descriptive statistical analysis and interpret the findings.
• Fourth: students will be able to conduct tests of hypotheses of differences and interpret the findings.
• Fifth: students will be able to conduct tests of correlation or relationship and interpret the findings.

Statistical Analyses: Some Background

► As we all know, human beings are complex entities, complete with knowledge, beliefs, feelings, opinions, attitudes, etc.

► Studying human subjects by examining a single independent variable (IV) and a single dependent variable (DV) is rarely practical, since these variables do not exist in isolation within the human mind or set of behaviors.

► These two variables (an IV and the examined DV) may affect or be affected by several other variables.

► In order to draw conclusions and offer accurate explanations of the phenomenon of interest, the researcher should be willing to examine many variables simultaneously.

Variables (diagram)

Independent variable: EXTERNAL REWARDS → Dependent variable: INTRINSIC MOTIVATION

Variables (diagram)

EXTERNAL REWARDS, TASK INTEREST, TASK STRUCTURE → Dependent variable: INTRINSIC MOTIVATION

Variables (diagram)

► Spiritual well-being
► Experience
► Training
► Demography: gender, educational levels
► Counseling competency: skills, knowledge, awareness

→ Dependent variable: Level of integration of religious perspectives

Variables: characteristics studied that assume different values for different elements.

► Demography: gender, job tenure, occupational status
► Job characteristics: work condition, job demand, job control
► Perceived quality of ICT facilities

→ Intervening variable: Career commitment

→ Dependent variable: Quality of work life

BASIC CONCEPTS: STATISTICAL ANALYSIS

MAJOR GROUPS OF HYPOTHESIS TESTING

• GROUP DIFFERENCES

• RELATIONSHIP BETWEEN VARIABLES

• PREDICTION OF GROUP MEMBERSHIP

• STRUCTURAL ANALYSES

Group Differences

1. t Test (independent t-test)

Compares the mean of an interval/ratio DV between the groups of a qualitative IV. It analyzes the difference between the means of two groups.

Example: There is a significant difference in mean literacy performance between male and female preschoolers.

2. t Test (dependent t-test)

Compares the mean of an interval/ratio DV based on paired or matched scores. It analyzes the difference between means that are paired or matched within the same group.

Example: There is a significant difference in mean literacy performance from pre to post remedial program among preschoolers who undergo the remedial program.

Group Differences

3. One-Way Analysis of Variance (ANOVA)

Compares the mean of an interval/ratio DV among the groups of a qualitative IV. It analyzes variation between and within the groups. Since ANOVA only determines whether the groups differ and does not identify which groups are significantly different, post hoc tests are usually conducted.

Example: There are significant differences in mean literacy performance between preschoolers from the low, middle and high SES groups.

Group Differences

4. One-Way Analysis of Covariance (ANCOVA)

Assesses group differences on a single metric DV after the effects of one or more covariates are statistically removed. Covariates are chosen because of their known relationship with the DV.

Research question: Do preschoolers of low, middle and high SES have different literacy test scores after adjusting for family type?

Hypothesis: There are significant differences in mean literacy performance between preschoolers from the low, middle and high SES groups after adjusting for family type.

Group Differences

5. Factorial Analysis of Variance (factorial ANOVA)

Compares differences in one metric DV among the groups of several nonmetric IVs, and tests the interactions among the IVs.

Example: Do ethnicity and learning preference (IVs) significantly affect reading achievement (DV) among primary school students?

Relationship and Prediction between Variables

6. Bivariate Correlation and Regression

Bivariate correlation assesses the degree of relationship between two metric variables.

Example: What is the relationship between motivation achievement and CGPA of UPM freshman students?

7. In contrast, bivariate regression uses the relationship between the IV and the DV to predict the score on the DV from the IV.

Example: To what extent do motivation achievement scores (IV) predict CGPA of UPM freshman students?

8. Multiple correlation: the degree of relationship between one metric DV and a set of metric IVs.

Example: What is the relationship of motivation achievement, learning preference and locus of control (IVs) with CGPA of UPM freshman students?

9. Multiple Regression

– Objective: to predict changes in the DV in response to changes in several IVs
– One metric DV
– One or more metric IVs

Example: To what extent do motivation achievement scores, learning preference and locus of control (IVs) predict CGPA of UPM freshman students?

Relationship and Prediction between Variables

Decision-making Tree – Test of Group Differences

No. & Type of DV   No. & Type of IV        Test              Purpose of Analysis
1 DV               1 IV (2 categories)     t-test            Determine significance of mean group differences
1 DV               1 IV (>2 categories)    One-way ANOVA     Determine significance of mean group differences
1 DV               ≥ 2 IVs                 Factorial ANOVA   Determine significance of mean group differences
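The decision tree above can be written as a small lookup function. This is only an illustrative sketch: the function name and its arguments are hypothetical, and it covers only the three rows of the table (one metric DV).

```python
def choose_group_difference_test(n_ivs: int, n_categories: int) -> str:
    """Pick a test of group differences for ONE metric DV.

    n_ivs        -- number of categorical independent variables
    n_categories -- number of groups in the IV (used when n_ivs == 1)
    """
    if n_ivs < 1 or n_categories < 2:
        raise ValueError("need at least one IV with at least two categories")
    if n_ivs >= 2:
        return "Factorial ANOVA"
    # Exactly one IV: two groups -> t-test, more than two -> one-way ANOVA.
    return "t-test" if n_categories == 2 else "One-way ANOVA"


print(choose_group_difference_test(1, 2))  # gender (male/female)
print(choose_group_difference_test(1, 3))  # SES (low/middle/high)
print(choose_group_difference_test(2, 3))  # ethnicity and learning preference
```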

Null Hypothesis Significance Testing

• This addresses:

• How likely it is to obtain an observed (i.e., sample) result given a specific assumption about the population.

• The assumption about the population is called the null hypothesis (e.g., there is no difference, there is no relationship, there is no predictive model) and the observed result is what the sample produces (e.g., there is a difference, there is a relationship, there is a predictive model).

Null Hypothesis Significance Testing

• Statistical tests such as z, t, F (ANOVA), etc., determine how likely the sample result or any result more distant from the null hypothesis would be if the null hypothesis were true.

• This probability (the p-value) is then compared to a preset criterion, the alpha value, which is the accepted Type I error rate.

• POWER ANALYSIS focuses on situations for which the expectation is that the null hypothesis is false.
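The comparison of the probability with alpha can be sketched as a one-line rule. The function name is hypothetical, and treating a p-value exactly equal to alpha as "fail to reject" is an assumption of this sketch, not something the slides specify.

```python
def nhst_decision(p_value: float, alpha: float = 0.05) -> str:
    """Return the decision of a null hypothesis significance test."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    # Reject H0 only when the p-value falls below the chosen alpha.
    return "reject H0" if p_value < alpha else "fail to reject H0"


print(nhst_decision(0.03))  # p below alpha
print(nhst_decision(0.40))  # p above alpha
```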

Levels of Measurement

• Which statistics you can use to analyze your data is determined by the level of measurement of each variable.

• Four levels of measurement:

– Nominal: a variable grouped into classes with no particular order (race, favorite color, etc.).

– Ordinal: categories that represent ranks, but you do not know how much higher or lower one category is than another (e.g., weight categories: underweight, normal, overweight, obese).

– Interval: scores with an inherent order and equal distances between adjacent values, so that differences represent a true magnitude, but with no true zero point.

– Ratio: scores with an inherent order, equal intervals and a true zero point.

• For purposes of choosing statistical analyses, the distinction between interval and ratio is unimportant.

Levels of Measurement - Quiz

1. IQ scores
2. Gender
3. Income (as a dollar amount)
4. Income (in 6 categories)
5. Agreement scores (1=strongly disagree, 2=slightly disagree, 3=neutral, 4=slightly agree, 5=strongly agree)
6. Cancer status (has cancer, does not have cancer)
7. Practice location (rural, urban)
8. Cigarette smoking (no. of cig/day)
9. Cigarette smoking (none, up to ½ ppd, ½ ppd-<1 ppd, 1 ppd+)

Statistical Tools For Descriptive Analyses

• Frequency/percentage table
• Pie or bar charts
• Histogram
• Frequency polygon
• Cross-tabulation
• Scatter diagram
• Mean, Median, Mode, Maximum, Minimum
• Range, Variance, Standard Deviation, Coefficient of Variation, Standard Scores

Statistical Tools For Inferential Statistics

• PARAMETRIC TESTS:
– Test of hypothesis of differences between means: Z-test, t-test, F-test, MANOVA
– Test of hypothesis of relationship: Pearson r, point-biserial, regression

• NON-PARAMETRIC TESTS:
– Mann-Whitney
– Kruskal-Wallis
– Spearman rho
– Chi-square, Cramer’s V, Lambda, etc.

In most research projects, it is likely that you will use quite a variety of different types of statistics, depending on the question you are addressing and the nature (level of measurement) of the data that you have.

It is therefore important that you have a basic understanding of the different statistical tools, the types of objectives, research questions and hypotheses they address, and their underlying assumptions and requirements.

• TO DESCRIBE MEASURED VARIABLES

• TO COMPARE MEANS or MEDIANS or FREQUENCIES – test of differences

• TO CORRELATE OR DETERMINE RELATIONSHIP OR ASSOCIATION – test of association or relationship

THREE MAJOR STATISTICAL TECHNIQUES

ACTIVITY 1- DATA ENTRY

INITIAL DATA FILE – VARIABLE VIEW

INITIAL DATA FILE – DATA VIEW

Go to Variable View to define the IVs and DVs. Use a separate line for each and give sensible names. Decide the specification/format of the data: NAME, TYPE, WIDTH, DECIMALS, LABEL, VALUES, MISSING, COLUMNS, ALIGN, MEASURE. For example, String = text and Numeric = numbers; Numeric is generally the best format.

Variable View: used to define or give specifications for the IVs and DVs.

Go to Data View to insert data, that is, the measured and collected responses for the variables.

Data are entered in columns under the appropriate variable names.

Each row designates one respondent of the study.

Data View: used to input data (respondents by rows and variables by columns).

EXAMPLE OF DATA SET IN SPSS DATA EDITOR – Variable view

EXAMPLE OF DATA SET IN SPSS DATA EDITOR – Data View

DATA TRANSFORMATION

• Used when variables need to be transformed as intended by the researcher or as stated in the objectives.
• TRANSFORM – COMPUTE: to compute or sum the scores
• TRANSFORM – RECODE:
– Recoding negatively worded scale items
– Collapsing continuous variables
– Replacing missing values

TO RECODE:

• CLICK TRANSFORM => RECODE
• YOU WILL GET THE RECODE DIALOG BOX
• MOVE THE VARIABLE TO THE EMPTY RIGHT-HAND BOX
• NAME THE NEW VARIABLE AND LABEL
• CLICK CHANGE
• CLICK THE OLD AND NEW VALUES BUTTON

To COMPUTE a score (TEACHER_EFFICACY):

• Click Transform => Compute
• You will get a Compute Variable dialog box
• Name your Target Variable
• Type in the required Numeric Expression
• Click OK
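What RECODE and COMPUTE do to the data can be sketched outside SPSS. The example below is a minimal illustration, assuming a 1 to 6 response scale and hypothetical item names (`item1` to `item3`); it reverse-codes a negatively worded item and then computes a mean scale score for one respondent.

```python
SCALE_MIN, SCALE_MAX = 1, 6  # assumed response scale


def reverse_code(score: int) -> int:
    """Recode a negatively worded item: 1 -> 6, 2 -> 5, ..., 6 -> 1."""
    return SCALE_MIN + SCALE_MAX - score


def compute_scale_mean(responses: dict, negative_items: set) -> float:
    """Mean of all items after reverse-coding the negative ones."""
    recoded = [
        reverse_code(value) if item in negative_items else value
        for item, value in responses.items()
    ]
    return sum(recoded) / len(recoded)


# One hypothetical respondent; item2 is negatively worded.
respondent = {"item1": 5, "item2": 2, "item3": 6}
score = compute_scale_mean(respondent, negative_items={"item2"})
print(round(score, 2))
```

Keeping the recoded values in a new variable, as the RECODE steps above do, leaves the original responses available for checking.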

DATA SET WITH NEW VARIABLES - TEACHER_ FACTOR

You should be able to calculate descriptive statistics such as frequencies, descriptives, and crosstabs, bar charts, scattergram, box plot, histogram, etc.

Remember: output appears in a separate window.

ACTIVITY 2- DESCRIPTIVE ANALYSIS

Use the following Menu:

– DESCRIPTIVE STATISTICS > FREQUENCIES
– DESCRIPTIVE STATISTICS > DESCRIPTIVES
– CUSTOM TABLES
– DISPLAY DATA: HISTOGRAM, BOXPLOT, STEM AND LEAF

TO DESCRIBE MEASURED VARIABLES

Gender

          Frequency   Percent   Valid Percent   Cumulative Percent
Male         22         34.4        34.4              34.4
Female       42         65.6        65.6             100.0
Total        64        100.0       100.0

Race

          Frequency   Percent   Valid Percent   Cumulative Percent
Malay        15         23.4        23.4              23.4
Chinese      41         64.1        64.1              87.5
Indian        8         12.5        12.5             100.0
Total        64        100.0       100.0

TO OBTAIN FREQUENCY DISTRIBUTION

Religion

                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid    Islam        25         39.1        39.7              39.7
         Buddhist     24         37.5        38.1              77.8
         Hindu         1          1.6         1.6              79.4
         Christian    13         20.3        20.6             100.0
         Total        63         98.4       100.0
Missing  System        1          1.6
Total                 64        100.0
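The difference between Percent and Valid Percent in the Religion table comes from the one missing case: Percent divides by all 64 respondents, Valid Percent by the 63 who answered. A minimal sketch that recomputes the columns from the frequencies (the function name is hypothetical; category labels are given in English):

```python
def frequency_table(counts: dict, n_missing: int = 0) -> list:
    """Percent, valid percent and cumulative percent for each category."""
    n_valid = sum(counts.values())
    n_total = n_valid + n_missing
    rows, running = [], 0
    for label, freq in counts.items():
        running += freq  # running count of valid cases so far
        rows.append({
            "label": label,
            "frequency": freq,
            "percent": round(100 * freq / n_total, 1),
            "valid_percent": round(100 * freq / n_valid, 1),
            "cumulative_percent": round(100 * running / n_valid, 1),
        })
    return rows


religion = {"Islam": 25, "Buddhist": 24, "Hindu": 1, "Christian": 13}
table = frequency_table(religion, n_missing=1)
for row in table:
    print(row)
```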

TO OBTAIN DESCRIPTIVE STATISTICS OF DATA

Descriptive Statistics

 N   Minimum  Maximum  Mean   Std. Deviation   Item
 60     1        6     3.75      1.580         My teacher wants us to enjoy learning maths
 36     1        6     3.89      1.833         My teacher understand our problems in learning maths
 64     1        6     4.00      1.533         My teacher try to make mathematics lessons interesting
 64     1        6     4.16      1.514         My teacher appreciates it when we try hard, even when our results are not so good
 63     2        6     4.25      1.534         My teacher show us step by step and how to solve maths problems
 64     1        6     4.16      1.185         My teacher listen carefully to what we say
 64     1        6     3.52      1.491         My teacher is friendly to us
 63     1        6     3.81      1.216         My teacher gives us time to explore new maths problems
 34                                            Valid N (listwise)
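Each item in the table has its own N because respondents skipped different items, while "Valid N (listwise)" counts only respondents who answered every item. The sketch below illustrates both ideas on made-up responses (the data and function names are hypothetical; `None` marks a missing answer).

```python
import statistics


def describe(values: list) -> dict:
    """N, minimum, maximum, mean and sample SD, ignoring missing values."""
    valid = [v for v in values if v is not None]
    return {
        "n": len(valid),
        "min": min(valid),
        "max": max(valid),
        "mean": statistics.mean(valid),
        "sd": statistics.stdev(valid),  # sample SD (n - 1 denominator)
    }


def listwise_n(*columns: list) -> int:
    """Number of respondents with no missing value on any item."""
    return sum(all(v is not None for v in row) for row in zip(*columns))


item_a = [4, 5, None, 3, 6]
item_b = [2, None, 4, 3, 5]
print(describe(item_a))
print(listwise_n(item_a, item_b))
```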

Report

Gender                     My teacher wants us to    My teacher understand our     My teacher try to make
                           enjoy learning maths      problems in learning maths    mathematics lessons interesting
Male     Mean                   3.74                      3.75                          3.91
         N                      19                        12                            22
         Std. Deviation         1.821                     1.913                         1.716
Female   Mean                   3.76                      3.96                          4.05
         N                      41                        24                            42
         Std. Deviation         1.480                     1.829                         1.447
Total    Mean                   3.75                      3.89                          4.00
         N                      60                        36                            64
         Std. Deviation         1.580                     1.833                         1.533

TO OBTAIN DESCRIPTIVE -COMPARE MEANS OF DIFFERENT GROUPS

Plot graphs – you should be able to plot bar charts for sets of scores & plot scattergrams of relationships between the two sets of scores.

Remember: Select Graphs then explore the alternatives.

TO OBTAIN DESCRIPTIVE -COMPARE MEANS OF DIFFERENT GROUPS

Report

Gender                     My instructor wants us    My instructor understand our   My instructor try to make
                           to enjoy learning maths   problems in learning maths     mathematics lessons interesting
Male     Mean                   3.71                      3.86                          3.91
         N                      21                        22                            22
         Std. Deviation         1.736                     1.521                         1.716
Female   Mean                   3.74                      4.07                          4.05
         N                      42                        42                            42
         Std. Deviation         1.466                     1.504                         1.447
Total    Mean                   3.73                      4.00                          4.00
         N                      63                        64                            64
         Std. Deviation         1.547                     1.501                         1.533

Summary of Statistical Tools For Descriptive Analyses

• Frequency/percentage table
• Pie or bar charts
• Histogram
• Frequency polygon
• Cross-tabulation
• Scatter diagram
• Mean, Median, Mode, Maximum, Minimum
• Range, Variance, Standard Deviation, Coefficient of Variation, Standard Scores

ACTIVITY 3- COMPARISON OF MEANS OF TWO GROUPS

EXPLORING DIFFERENCES BETWEEN TWO GROUPS

1. t-test

t-tests are used when you have two groups (e.g., males and females) or two sets of data (before and after), and you wish to compare the mean score on some continuous variable.

There are two main types of t-tests.

Paired-samples t-tests (also called repeated measures) are used when you are interested in changes in scores for subjects tested at Time 1 and then again at Time 2 (often after some intervention or event). The samples are ‘related’ because they are the same people tested each time.

Independent-samples t-tests are used when you have two different (independent) groups of people (e.g., males and females), and you are interested in comparing their scores. In this case, you collect information on only one occasion, but from two different sets of people.

• TO MAKE COMPARISONS BETWEEN GROUPS ON ANY MEASURED VARIABLE AT INTERVAL OR RATIO LEVEL

• CLICK ANALYZE => COMPARE MEANS

• You will get the following sub-menus:

– MEANS
– ONE-SAMPLE T-TEST
– INDEPENDENT SAMPLES T-TEST
– PAIRED SAMPLES T-TEST
– ONE-WAY ANOVA

PURPOSE: Comparing means of two groups

EXAMPLE OF RESEARCH QUESTION: Is there a difference in instructors’ efficacy in teaching and learning mathematics as perceived by students of different gender?

PARAMETRIC STATISTIC: Independent t-test

INDEPENDENT VARIABLE: One categorical independent variable, gender, with two levels (males and females)

DEPENDENT VARIABLE: One continuous dependent variable, students’ perception of instructors’ efficacy in teaching and learning

To Compare Means of Two Groups

• Click: Analyze > Compare Means > Independent T-test
• You will get an Independent T-test dialog box
• Select your variables: Test variables and Group variables
• Click OK

Independent Samples Test (INSTRUCTORS’ EFFICACY)

                              Levene's Test for       t-test for Equality of Means
                              Equality of Variances
                              F       Sig.            t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       .883    .351            -.094   60       .926              -.02315           .24740                  -.51803        .47173
Equal variances not assumed                           -.095   42.237   .925              -.02315           .24347                  -.51440        .46811

Group Statistics (INSTRUCTORS’ EFFICACY)

Gender    N     Mean     Std. Deviation   Std. Error Mean
Male      21    3.9490   .89190           .19463
Female    41    3.9721   .93662           .14628
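The two rows of the Independent Samples Test table can be reproduced, up to rounding, from the Group Statistics alone. The sketch below is not SPSS's own routine; it applies the standard pooled-variance and Welch formulas to the reported N, mean and standard deviation of each gender group (the function name is hypothetical).

```python
import math


def independent_t(n1, m1, s1, n2, m2, s2):
    """t statistics for two independent groups from summary statistics."""
    diff = m1 - m2
    # Equal variances assumed: pool the two sample variances.
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Equal variances not assumed (Welch): separate variances, adjusted df.
    v1, v2 = s1**2 / n1, s2**2 / n2
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return {
        "t_pooled": diff / se_pooled, "df_pooled": n1 + n2 - 2,
        "se_pooled": se_pooled,
        "t_welch": diff / se_welch, "df_welch": df_welch,
        "se_welch": se_welch,
    }


# Group Statistics from the output: male group first, female group second.
result = independent_t(21, 3.9490, 0.89190, 41, 3.9721, 0.93662)
for name, value in result.items():
    print(name, round(value, 3))
```

Small differences from the SPSS values in the third decimal place arise because the means entered here are already rounded.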

DECISION MATRIX (Levene's test for equality of variances)

HYPOTHESIS (null): There is no significant difference in the variance of students’ perception of instructors’ efficacy in T&L between genders.

HYPOTHESIS (alternative): There is a significant difference in the variance of students’ perception of instructors’ efficacy in T&L between genders.

ALPHA VALUE: 0.05

SIGNIFICANT VALUE (FROM THE SPSS OUTPUT): .351

EVALUATING: Sig. value > α

DECISION: Fail to reject the null hypothesis. Therefore, we read t from the "Equal variances assumed" row.

HYPOTHESIS (null): There is no significant difference in students’ mean perception of instructors’ efficacy in T&L by gender.

HYPOTHESIS (alternative): There is a significant difference in students’ mean perception of instructors’ efficacy in T&L by gender.

ALPHA VALUE: 0.05

SIGNIFICANT VALUE (FROM THE SPSS OUTPUT): .926

EVALUATING: The Sig. value is greater than α, meaning that a result like the sample's is quite likely if the null hypothesis is true.

DECISION: Fail to reject the null hypothesis.

CONCLUSION: There is no significant difference in students’ mean perception of instructors’ efficacy in T&L by gender, t(60) = -.094, p > .05 (p = .926).



PURPOSE: Comparing means of two groups

EXAMPLE OF RESEARCH QUESTION: Is there a difference between students’ perception of the mathematics instructors’ role in making students enjoy learning maths and in making maths lessons interesting?

PARAMETRIC STATISTIC: Dependent t-test

DEPENDENT VARIABLE: Two continuous dependent variables, students’ perception of the mathematics instructors’ role in making students enjoy learning maths and in making maths lessons interesting (Item 1 vs Item 3)

To Compare Means of Two Dependent Groups

• Click: Analyze > Compare Means > Paired Sample T-test
• You will get a Paired Sample T-test dialog box
• Select your variables: Paired variables
• Click OK

Paired Samples Correlations

Pair 1: "My instructor wants us to enjoy learning maths" with "My teacher try to make mathematics lessons interesting"

N     Correlation   Sig.
63    .708          .000

Paired Samples Test

Pair 1: "My instructor wants us to enjoy learning maths" with "My teacher try to make mathematics lessons interesting"

Paired Differences
Mean     Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df   Sig. (2-tailed)
-.238    1.174            .148              -.534          .058           -1.610   62   .112
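The t value in the Paired Samples Test is the mean of the paired differences divided by its standard error. A minimal sketch using the rounded figures from the output (the function name is hypothetical):

```python
import math


def paired_t(mean_diff: float, sd_diff: float, n: int):
    """t statistic, degrees of freedom and standard error for paired scores."""
    se = sd_diff / math.sqrt(n)  # standard error of the mean difference
    return mean_diff / se, n - 1, se


t, df, se = paired_t(-0.238, 1.174, 63)
print(round(t, 3), df, round(se, 3))
```

Because the inputs are rounded to three decimals, the result agrees with the SPSS value of -1.610 only to about two decimal places.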

DECISION MATRIX

HYPOTHESIS (null): There is no significant difference between students’ perception of the mathematics instructors’ role in making students enjoy learning maths and in making maths lessons interesting.

HYPOTHESIS (alternative): There is a significant difference between students’ perception of the mathematics instructors’ role in making students enjoy learning maths and in making maths lessons interesting.

ALPHA VALUE: 0.05

SIGNIFICANT VALUE (FROM THE SPSS OUTPUT): .112

EVALUATING: The Sig. value is greater than α, meaning that a result like the sample's is quite likely if the null hypothesis is true.

DECISION: Fail to reject the null hypothesis.

CONCLUSION: There is no significant difference between students’ perception of the mathematics instructors’ role in making students enjoy learning maths and in making maths lessons interesting, t(62) = -1.610, p > .05 (p = .112).

EXPLORING DIFFERENCES BETWEEN GROUPS

One-way analysis of variance

One-way analysis of variance is similar to a t-test, but is used when you have two or more groups and you wish to compare their mean scores on a continuous variable.

It is called one-way because you are looking at the impact of only one independent variable on your dependent variable.

A one-way analysis of variance (ANOVA) will let you know whether your groups differ, but it won’t tell you where the significant difference is (gp1/gp2, gp3/gp4, etc.).

You can conduct post-hoc comparisons to find out which groups are significantly different from one another.

You could also choose to test differences between specific groups, rather than comparing all the groups, by using planned comparisons. Similar to t-tests, there are two types of one-way ANOVA: repeated measures ANOVA (same people on more than two occasions), and between-groups (or independent samples) ANOVA, where you are comparing the mean scores of two or more different groups of people.



PURPOSE: Comparing means of three groups

EXAMPLE OF RESEARCH QUESTION: Is there a difference in students’ perception of instructors’ efficacy in T&L mathematics by race?

PARAMETRIC STATISTIC: One-way between-groups ANOVA

INDEPENDENT VARIABLE: One categorical independent variable (three levels of race)

DEPENDENT VARIABLE: One continuous dependent variable, students’ perception of instructors’ efficacy in T&L mathematics

To Compare Means of Three or More Groups

• Click: Analyze > Compare Means > One-Way ANOVA
• You will get a One-Way ANOVA dialog box
• Select your variables: Dependent variables, then Factor or Group variables
• Click: Options
• Click OK

Descriptives (INSTRUCTORS’_EFFICACY)

                                                    95% Confidence Interval for Mean
          N     Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
Malay     14    4.2704   .73282           .19586       3.8473        4.6935        3.07      5.36
Chinese   40    3.7339   .96118           .15198       3.4265        4.0413        2.21      5.71
Indian     8    4.5804   .46673           .16501       4.1902        4.9706        3.86      5.07
Total     62    3.9643   .91443           .11613       3.7321        4.1965        2.21      5.71

ANOVA (INSTRUCTORS’ EFFICACY)

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups    6.471            2   3.235         4.286   .018
Within Groups    44.537           59    .755
Total            51.008           61
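The ANOVA table can be rebuilt from the Descriptives: the between-groups sum of squares measures how far each group mean lies from the grand mean, and the within-groups sum of squares pools the group variances. The sketch below applies these standard formulas to the reported N, mean and standard deviation of the three race groups (function and variable names are hypothetical).

```python
def one_way_anova(groups: list) -> dict:
    """One-way between-groups ANOVA from (n, mean, sd) summaries."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between groups: weighted squared distance of group means from grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within groups: (n - 1) times each group's variance, summed.
    ss_within = sum((n - 1) * sd**2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_between": ss_between, "ss_within": ss_within,
            "df_between": df_between, "df_within": df_within, "F": f}


race_groups = [(14, 4.2704, 0.73282),   # Malay
               (40, 3.7339, 0.96118),   # Chinese
               (8, 4.5804, 0.46673)]    # Indian
anova = one_way_anova(race_groups)
for name, value in anova.items():
    print(name, round(value, 3))
```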

TEST OF DIFFERENCES BETWEEN GROUPS – BY RACE


DECISION MATRIX

HYPOTHESIS (null): There is no significant difference in mean students’ perception of instructors’ efficacy in T&L mathematics by race.

HYPOTHESIS (alternative): There is a significant difference in mean students’ perception of instructors’ efficacy in T&L mathematics by race.

ALPHA VALUE: 0.05

SIGNIFICANT VALUE (FROM THE SPSS OUTPUT): .018

EVALUATING: The Sig. value is smaller than α, meaning that a result like the sample's would be unlikely if the null hypothesis were true.

DECISION: Reject the null hypothesis; accept the alternative hypothesis.

CONCLUSION: There is a significant difference in mean students’ perception of instructors’ efficacy in T&L mathematics by race, F(2, 59) = 4.29, p < .05.

TEST OF DIFFERENCES BETWEEN GROUPS – BY RELIGION

ANOVA (TEACHER_FACTOR)

                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   14.849            2   7.424         11.982   .000
Within Groups    35.940           58    .620
Total            50.789           60

Descriptives (TEACHER_FACTOR)

                                                      95% Confidence Interval for Mean
            N     Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
Islam       24    4.3929   .98705           .20148       3.9761        4.8097        2.21      5.71
Buddhist    24    3.3601   .39376           .08038       3.1938        3.5264        2.71      4.14
Christian   13    4.3242   .91129           .25275       3.7735        4.8749        2.93      5.71
Total       61    3.9719   .92004           .11780       3.7363        4.2075        2.21      5.71


DECISION MATRIX

HYPOTHESIS (null): There is no significant difference in mean students’ perception of instructors’ efficacy in T&L mathematics by religion.

HYPOTHESIS (alternative): There is a significant difference in mean students’ perception of instructors’ efficacy in T&L mathematics by religion.

ALPHA VALUE: 0.05

SIGNIFICANT VALUE (FROM THE SPSS OUTPUT): .000

EVALUATING: The Sig. value is smaller than α, meaning that a result like the sample's would be unlikely if the null hypothesis were true.

DECISION: Reject the null hypothesis; accept the alternative hypothesis.

CONCLUSION: There is a significant difference in mean students’ perception of instructors’ efficacy in T&L mathematics by religion, F(2, 58) = 11.98, p < .05.

Pearson Product-Moment Correlation

• A measure of the linear relationship between two variables.

• Correlation analysis produces the Pearson correlation coefficient (r).

• It indicates the strength and the direction (+ve / -ve) of the relationship between the variables.

Significance of Relationship

• The significance of the relationship is expressed in probability levels p (e.g., significant at p = .05).

• The smaller the p-level, the more significant the relationship.

• The larger the absolute value of the correlation (r), the stronger the relationship.

Example 1: Correlations

                                                 Total life satisfaction   Total Self esteem
Total life satisfaction   Pearson Correlation    1                         .488**
                          Sig. (2-tailed)                                  .000
                          N                      436                       434
Total Self esteem         Pearson Correlation    .488**                    1
                          Sig. (2-tailed)        .000
                          N                      434                       436

**. Correlation is significant at the 0.01 level (2-tailed).
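For readers who want to see what lies behind the Pearson Correlation and Sig. rows, the sketch below computes r from raw scores and converts an r into the t statistic on which the significance test is based. The raw data are made up for illustration; only r = .488 and N = 434 come from Example 1. Turning t into a p-value requires the t distribution, which is not attempted here.

```python
import math


def pearson_r(x: list, y: list) -> float:
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r: float, n: int) -> float:
    """t statistic (df = n - 2) for testing H0: no linear relationship."""
    return r * math.sqrt((n - 2) / (1 - r**2))


# Hypothetical raw scores for five respondents.
r_demo = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r_demo, 4))

# Example 1: r = .488 with N = 434 pairs.
print(round(r_to_t(0.488, 434), 2))
```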

Example 2: Correlations

Each cell shows the Pearson correlation with its Sig. (2-tailed) value in parentheses; N = 80 for every pair.

                        Intimate Relationship   Friends         Common Sense    Academic Intelligence   General
Intimate Relationship   1                       .552** (.000)   .351** (.001)   .218 (.052)             .393** (.000)
Friends                 .552** (.000)           1               .462** (.000)   .244* (.029)            .546** (.000)
Common Sense            .351** (.001)           .462** (.000)   1               .400** (.000)           .525** (.000)
Academic Intelligence   .218 (.052)             .244* (.029)    .400** (.000)   1                       .261* (.019)
General                 .393** (.000)           .546** (.000)   .525** (.000)   .261* (.019)            1

** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).

Report the Output of a Pearson Product-Moment Correlation

• Report the value of the correlation coefficient, r, as well as the degrees of freedom (df)

• The degrees of freedom (df) is the number of data points minus 2 (N - 2).

Coefficient of Determination, r²

• How much of the variation in the DV (Y) is associated with change in the IV (X).

• It is often expressed as a percentage: the proportion of variance explained by the correlation.

• Example: r² = 0.36. Hence, 36% of the variation in Y is associated with the change in X; 64% of the variation in Y is due to other factors.
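The reporting rule (df = N - 2) and the coefficient of determination can be combined in a few lines. This is a sketch with hypothetical function names; the report format it prints is only one possible convention.

```python
def coefficient_of_determination(r: float) -> float:
    """Proportion of variance in Y associated with X."""
    return r**2


def report_correlation(r: float, n: int) -> str:
    """Format r with its degrees of freedom (N - 2) and r squared."""
    df = n - 2
    return f"r({df}) = {r:.3f}, r^2 = {coefficient_of_determination(r):.3f}"


print(coefficient_of_determination(0.6))   # approximately 0.36, i.e. 36%
print(report_correlation(0.488, 434))      # life satisfaction and self esteem
```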

Regression Analysis

• Regression analysis procedures have as their primary purpose the development of an equation that can be used for predicting values on some DV for all members of a population.

• A secondary purpose is to use regression analysis as a means of explaining causal relationships among variables.

Regression Analysis

• The most basic application of regression analysis is the bivariate situation, which is referred to as simple linear regression, or just simple regression.

• Simple regression involves a single IV and a single DV.

• Goal: to obtain a linear equation so that we can predict the value of the DV if we have the value of the IV.

• Simple regression capitalizes on the correlation between the DV and IV in order to make specific predictions about the DV.

• The correlation tells us how much information about the DV is contained in the IV.

• If the correlation is perfect (i.e r = ±1.00), the IV contains everything we need to know about the DV, and we will be able to perfectly predict one from the other.

• Regression analysis is the means by which we determine the best-fitting line, called the regression line.

• Regression line is the straight line that lies closest to all points in a given scatterplot

• The least-squares regression line always passes through the centroid of the scatterplot (the point defined by the mean of X and the mean of Y).

• Three important facts about the regression line must be known:
– The extent to which points are scattered around the line
– The slope of the regression line
– The point at which the line crosses the Y-axis

• The extent to which the points are scattered around the line is typically indicated by the degree of relationship between the IV (X) and DV (Y).

• This relationship is measured by a correlation coefficient – the stronger the relationship, the higher the degree of predictability between X and Y.

• The degree of slope is determined by the amount of change in Y that accompanies a unit change in X.

• It is the slope that largely determines the predicted values of Y from known values for X.

• It is important to determine exactly where the regression line crosses the Y-axis (this value is known as the Y-intercept).
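The slope and Y-intercept of the best-fitting (least-squares) line can be computed directly from the data. The sketch below uses made-up scores and a hypothetical function name; it also checks that the fitted line goes through the point (mean of X, mean of Y).

```python
def fit_line(x: list, y: list):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must vary to fit a line")
    # Slope: change in Y that accompanies a unit change in X.
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    # Intercept: where the line crosses the Y-axis (X = 0).
    intercept = my - slope * mx
    return slope, intercept


x = [1, 2, 3, 4, 5]   # hypothetical IV scores
y = [2, 4, 5, 4, 5]   # hypothetical DV scores
slope, intercept = fit_line(x, y)
print(round(slope, 3), round(intercept, 3))

# The predicted value at the mean of X equals the mean of Y.
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
print(round(intercept + slope * mean_x, 3), mean_y)
```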

What you will calculate

1. A linear regression equation.

2. The statistical significance of β1 (null hypothesis significance testing).

3. A measure of effect size.

4. Confidence and prediction intervals.

EXAMPLE USED

• A researcher decided to determine if cholesterol concentration was related to time spent watching TV in otherwise healthy 45 to 65 year old men (an at-risk category of people). They believed that there would be a positive relationship: the more time people spent watching TV, the greater their cholesterol concentration.

• The researcher also wished to be able to predict cholesterol concentration and to know the proportion of cholesterol concentration that time spent watching TV could explain.

Daily time spent watching TV was recorded in the variable time_tv. Cholesterol concentration was recorded in the variable cholesterol.

The following instructions show you how to produce a scatterplot in SPSS to establish whether a linear relationship exists:

• Click Graphs > Chart Builder... on the main menu.

• Select "Scatter/Dot" from the Choose from: box in the bottom-left-hand corner of the Chart Builder dialogue box.

• Selecting "Scatter/Dot" will present eight different scatter/dot options in the lower-middle section of the Chart Builder dialogue box.

• Drag-and-drop the top-left-hand option (labelled "Simple Scatter" when you hover your mouse over it) into the main chart preview pane.

• The main chart preview pane now shows a simple scatterplot with boxes for the y-axis ("Y-Axis?") and x-axis ("X-Axis?") for you to populate with the appropriate variables.

• Drag-and-drop the independent variable, time_tv, from the Variables: box into the "X-Axis?" box in the main chart preview pane, and do the same for the dependent variable, cholesterol, but into the "Y-Axis?" box.

• Click on "Y-Axis1 (Point1)" in the Element Properties dialogue box (the box on the right-hand side).

• Uncheck the Minimum option in the Scale Range area so that the Custom value is highlighted and has a value of 0 (zero).

• Click the Apply button to confirm these changes.

• Click the OK button in the Chart Builder dialogue box to generate the scatterplot.

For this example, you can conclude from visual inspection of the above scatterplot that there is a linear relationship between cholesterol concentration and time spent watching TV.

Click Analyze > Regression > Linear... on the main menu.

Click the Continue button. Click the OK button. This will generate the output.

Determining how well the model fits

The Model Summary table provides the information needed to determine how well the regression model fits the data:

R is the multiple correlation coefficient ("R" column). As there is only one independent variable, R is simply the absolute value of the Pearson correlation between the dependent variable and the independent variable. It indicates the strength of the association between the two variables.

• In this example, R = 0.389, which indicates a moderate correlation. However, you will not normally have to report this value.

The R² value ("R Square" column) represents the proportion of variance in the dependent variable that can be explained by our independent variable (technically, it is the proportion of variation accounted for by the regression model above and beyond the mean model).

• In this example, R² = 0.151, which means that the independent variable, time_tv, explains 15.1% of the variability of the dependent variable, cholesterol. However, R² is based on the sample and is a positively biased estimate of the proportion of the variance of the dependent variable accounted for by the regression model (i.e., it is too large).

• SPSS also prints out an adjusted R² value ("Adjusted R Square" column), which corrects for this positive bias to provide a value that would be expected in the population. Adjusted R² is also an estimate of the effect size, which at 0.143 (14.3%) is indicative of a medium effect size, according to Cohen's (1988) classification.

The ANOVA table informs you whether the regression model results in a statistically significantly better prediction of the dependent variable, cholesterol, than if you just used the mean value.

The general form of the line to predict cholesterol concentration from time spent watching TV, expressed in SPSS variable form (i.e., cholesterol and time_tv), is:

cholesterol = b0 + (b1 x time_tv)

where b0 is the intercept and b1 is the slope coefficient. You can ascertain these values by inspecting the Coefficients table:

A linear regression established that daily time spent watching TV could statistically significantly predict cholesterol concentration, F(1, 97) = 14.395, p < .001, and time spent watching TV accounted for 14.3% of the explained variability in cholesterol concentration. The regression equation was: predicted cholesterol concentration = -0.944 + 0.037 x (time spent watching TV).

Y’ = -0.944 + 0.037 X
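Once the intercept and slope are known, prediction is simple arithmetic. The sketch below uses the coefficients reported above; the function name and the example time_tv values are hypothetical, chosen only for illustration.

```python
B0 = -0.944   # intercept from the reported regression equation
B1 = 0.037    # slope for time_tv from the reported regression equation


def predict_cholesterol(time_tv: float) -> float:
    """Predicted cholesterol concentration for a given time spent watching TV."""
    return B0 + B1 * time_tv


# Hypothetical time_tv values, in whatever unit the variable was recorded in.
for time_tv in (100, 150, 200):
    print(time_tv, round(predict_cholesterol(time_tv), 3))
```

Each one-unit increase in time_tv raises the predicted value by the slope, 0.037, regardless of where on the line you start.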


                     Mean     Std. Deviation   N
Grade - PMR MATH     2.53     1.468            62
TEACHER_FACTOR       3.9643   .91443           62

Correlations

                                          Grade - PMR MATH   TEACHER_FACTOR
Pearson Correlation   Grade - PMR MATH    1.000              .571
                      TEACHER_EFF         .571               1.000
Sig. (1-tailed)       Grade - PMR MATH    .                  .000
                      TEACHER_EFF         .000               .
N                     Grade - PMR MATH    62                 62
                      TEACHER_EFF         62                 62

Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .571 (a)   .326       .315                1.215

a. Predictors: (Constant), TEACHER_FACTOR
b. Dependent Variable: Grade - PMR MATH

ANOVA (b)

Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   42.848           1    42.848        29.021   .000 (a)
    Residual     88.588           60   1.476
    Total        131.435          61

a. Predictors: (Constant), TEACHER_FACTOR
b. Dependent Variable: Grade - PMR MATH
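The entries of the ANOVA table are linked by simple arithmetic, which is worth checking once by hand. A minimal sketch using the sums of squares and degrees of freedom printed above (variable names are illustrative):

```python
ss_regression = 42.848
ss_residual = 88.588
df_regression = 1
df_residual = 60

# Mean Square = Sum of Squares / df
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the two mean squares.
f_ratio = ms_regression / ms_residual

# R Square is the share of the total sum of squares explained by the model.
r_squared = ss_regression / (ss_regression + ss_residual)

print(f"Mean Square (residual) = {ms_residual:.3f}")
print(f"F                      = {f_ratio:.3f}")
print(f"R Square               = {r_squared:.3f}")
```

Because the inputs are already rounded to three decimals, the recomputed F can differ from the SPSS value in the last digit.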

Coefficients (a)

                       Unstandardized Coefficients   Standardized Coefficients
Model                  B        Std. Error           Beta                        t        Sig.
1   (Constant)         -1.101   .692                                             -1.591   .117
    TEACHER_FACTOR     .917     .170                 .571                        5.387    .000

a. Dependent Variable: Grade - PMR MATH
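The columns of the Coefficients table can be related to each other and to the descriptive statistics. The sketch below assumes the usual simple-regression identities (t = B / Std. Error, Beta = B x SDx / SDy, and a fitted line that passes through the two means); it only re-uses rounded values printed in the output, so small rounding differences are expected.

```python
b_constant = -1.101       # unstandardized intercept
b_teacher = 0.917         # unstandardized slope for TEACHER_FACTOR
se_teacher = 0.170        # standard error of the slope

mean_grade, sd_grade = 2.53, 1.468          # Grade - PMR MATH
mean_teacher, sd_teacher = 3.9643, 0.91443  # TEACHER_FACTOR

# t statistic for the slope: coefficient divided by its standard error.
t_teacher = b_teacher / se_teacher

# Standardized coefficient: slope rescaled by the two standard deviations.
beta_teacher = b_teacher * sd_teacher / sd_grade

# The fitted line evaluated at the predictor mean gives the outcome mean.
predicted_at_mean = b_constant + b_teacher * mean_teacher

print(f"t    = {t_teacher:.3f}")
print(f"Beta = {beta_teacher:.3f}")
print(f"Predicted grade at mean TEACHER_FACTOR = {predicted_at_mean:.2f}")
```

With one predictor, the standardized coefficient equals the Pearson correlation between the two variables, which is why Beta matches the value in the Correlations table.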


                     Mean     Std. Deviation   N
Grade - PMR MATH     2.53     1.468            62
TEACHER_EFF          3.9643   .91443           62
Race                 1.90     .593             62

Correlations

                                          Grade - PMR MATH   TEACHER_FACTOR   Race
Pearson Correlation   Grade - PMR MATH    1.000              .571             -.015
                      TEACHER_EFF         .571               1.000            .019
                      Race                -.015              .019             1.000
Sig. (1-tailed)       Grade - PMR MATH    .                  .000             .453
                      TEACHER_EFF         .000               .                .440
                      Race                .453               .440             .
N                     Grade - PMR MATH    62                 62               62
                      TEACHER_EFF         62                 62               62
                      Race                62                 62               62

Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .572 (a)   .327       .304                1.225

a. Predictors: (Constant), Race, TEACHER_FACTOR

b. Dependent Variable: Grade - PMR MATH

ANOVA (b)

Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   42.939           2    21.469        14.313   .000 (a)
    Residual     88.497           59   1.500
    Total        131.435          61

a. Predictors: (Constant), Race, TEACHER_FACTOR
b. Dependent Variable: Grade - PMR MATH

Coefficients (a)

                       Unstandardized Coefficients   Standardized Coefficients
Model                  B        Std. Error           Beta                        t        Sig.
1   (Constant)         -.980    .853                                             -1.150   .255
    TEACHER_FACTOR     .917     .172                 .571                        5.349    .000
    Race               -.065    .265                 -.026                       -.246    .806

a. Dependent Variable: Grade - PMR MATH
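Comparing the one-predictor and two-predictor outputs shows how little Race adds. One way to quantify this is the F test for the change in the regression sum of squares, sketched below under the assumption that both models were fitted to the same 62 cases (as the tables indicate). For a single added predictor this F equals the square of that predictor's t value.

```python
# Regression sums of squares from the two ANOVA tables.
ss_regression_teacher_only = 42.848
ss_regression_with_race = 42.939

# Residual line of the larger (two-predictor) model.
ss_residual_with_race = 88.497
df_residual_with_race = 59

predictors_added = 1   # Race
t_race = -0.246        # t for Race from the Coefficients table

ss_gain = ss_regression_with_race - ss_regression_teacher_only
ms_residual = ss_residual_with_race / df_residual_with_race
f_change = (ss_gain / predictors_added) / ms_residual

print(f"Extra sum of squares explained by Race = {ss_gain:.3f}")
print(f"F change  = {f_change:.3f}")
print(f"t squared = {t_race ** 2:.3f}")
```

The extra explained variation is tiny, which agrees with the non-significant coefficient for Race (Sig. = .806) and with the drop in Adjusted R Square from .315 to .304.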

Performing the paired t-test

Use: Analyze > Compare Means > Paired-Samples T Test (this opens the dialogue box)

The paired-samples t-test dialogue box

Transfer the two levels of the IV to the ‘Paired Variables’ box. Both need to be highlighted.

The variables are then shown in the box as a pair. Click OK.

Output (1)

Mean for each condition

Number of paired scores

SD for each condition

Means suggest difference, but need to look at output of t-test to see if significant

Output (2)

t-value

p value

df

Mean difference score

Reporting

There was a significant effect of the statistics lecture on depression, t(18) = 5.86, p < .05. Depression scores recorded after the lecture were lower (mean = 13.0, SD = 2.33) than those recorded before the lecture (mean = 13.95, SD = 2.48).

Independent samples t-test

Used when different participants take part in each experimental condition.

Hypothesis: males can eat more chillies than females.

Eight males & eight females were tested on their chilli tolerance in a chilli eating competition.

Use arrow key to put IV here (Grouping Variable), then define the levels of the IV (Define Groups).

Use arrow key to put DV here (Test Variable).

Examine descriptive statistics first.

Group Statistics

CHILLIES
GENDER   N   Mean     Std. Deviation   Std. Error Mean
male     8   5.6250   1.4079           .4978
female   8   4.1250   1.1260           .3981

[Chart: Mean CHILLIES (y-axis, 3.5 to 6.0) by GENDER (x-axis: female, male)]

Results suggest that males could eat more chillies than females. But need to conduct t-test to determine if this difference is significant.

Independent Samples Test

CHILLIES                                      Equal variances   Equal variances
                                              assumed           not assumed
Levene's Test for Equality of Variances
  F                                           .443
  Sig.                                        .517
t-test for Equality of Means
  t                                           2.353             2.353
  df                                          14                13.355
  Sig. (2-tailed)                             .034              .035
  Mean Difference                             1.5000            1.5000
  Std. Error Difference                       .6374             .6374
  95% Confidence Interval of the Difference
    Lower                                     .1330             .1267
    Upper                                     2.8670            2.8733

Levene’s test - scores must have equal variance to use standard t-test techniques. Variances equal if p > 0.05

t-value, df & p shown here. Difference is significant if p < 0.05.
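Both rows of the Independent Samples Test table can be reproduced from the Group Statistics alone. The sketch below is a hand calculation, not SPSS's implementation: it uses the standard pooled-variance formula for the "Equal variances assumed" row and the Welch-Satterthwaite formula for the "Equal variances not assumed" row. Function names are illustrative.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """t and df assuming equal population variances (pooled variance)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


def welch_t(m1, s1, n1, m2, s2, n2):
    """t and df without assuming equal variances (Welch-Satterthwaite)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (m1 - m2) / se, df


# Group Statistics for CHILLIES: males first, then females.
males = (5.6250, 1.4079, 8)
females = (4.1250, 1.1260, 8)

t_pooled, df_pooled = pooled_t(*males, *females)
t_welch, df_welch = welch_t(*males, *females)

print(f"Equal variances assumed:     t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Equal variances not assumed: t = {t_welch:.3f}, df = {df_welch:.3f}")
```

With equal group sizes the two t values coincide and only the degrees of freedom differ, which is exactly the pattern in the SPSS table.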

Results section

We examined chilli tolerance in males and females. Eight males and eight females were tested on their ability to consume chillies. Males ate a mean of 5.63 chillies (s = 1.41) and females a mean of 4.13 (s = 1.13). Males ate significantly more chillies than females, t(14) = 2.35, p < 0.05.

The results suggest that males have greater chilli tolerance than females (or that males are foolish enough to try to win chilli eating contests).

Paired samples t-test

Used when the same or matched pairs of participants take part in each experimental condition.

Hypothesis: chilli tolerance is greater on cold days than on warm days.

Ten participants ate chillies on a warm day and then on a cold day.

Use arrow key to select variables that are to be compared.

Paired Samples Test

Pair 1: WARM - COLD

Paired Differences
  Mean                                        -2.3000
  Std. Deviation                              2.9078
  Std. Error Mean                             .9195
  95% Confidence Interval of the Difference
    Lower                                     -4.3801
    Upper                                     -.2199
t                                             -2.501
df                                            9
Sig. (2-tailed)                               .034

Mean difference between pairs of scores shown here.

T-value, df & p shown here. Difference is significant if p < 0.05.
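The paired-samples output follows directly from the mean and standard deviation of the difference scores. A minimal sketch of that arithmetic for the WARM - COLD example; the critical value 2.262 is an assumption taken from a standard t table (two-tailed, 5%, df = 9), not from the SPSS output.

```python
import math

mean_diff = -2.3000   # mean of WARM - COLD differences
sd_diff = 2.9078      # standard deviation of the differences
n = 10                # number of pairs

# ASSUMPTION: two-tailed 5% critical t for df = 9, from a standard t table.
t_critical = 2.262

se_diff = sd_diff / math.sqrt(n)      # Std. Error Mean
t_value = mean_diff / se_diff         # paired t statistic
df = n - 1

ci_lower = mean_diff - t_critical * se_diff
ci_upper = mean_diff + t_critical * se_diff

print(f"Std. Error Mean = {se_diff:.4f}")
print(f"t({df}) = {t_value:.3f}")
print(f"95% CI = ({ci_lower:.4f}, {ci_upper:.4f})")
```

The interval does not contain zero, which is the same conclusion as a two-tailed p below 0.05.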

Results section

We examined chilli tolerance on warm and cold days. Ten participants were tested on their ability to consume chillies. The mean difference was 2.30, with more chillies consumed on the cold day than on the warm day. Chilli tolerance was significantly greater on cold days than on warm days, t(9) = -2.501, p < 0.05.

The results suggest that individuals can consume more chillies on cold days than on warm days.

Paired Samples Statistics

                   Mean    N    Std. Deviation   Std. Error Mean
Pair 1   Doppler   .4714   21   .24276           .05297
         Cath      .5019   21   .25522           .05569

Paired Samples Correlations

                          N    Correlation   Sig.
Pair 1   Doppler & Cath   21   .888          .000

Paired Samples Test

Pair 1: Doppler - Cath

Paired Differences
  Mean                                        -.03048
  Std. Deviation                              .11864
  Std. Error Mean                             .02589
  95% Confidence Interval of the Difference
    Lower                                     -.08448
    Upper                                     .02353
t                                             -1.177
df                                            20
Sig. (2-tailed)                               .253
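The Paired Samples Correlations table matters because the standard deviation of the differences depends on how strongly the two measurements are correlated. The sketch below applies the general identity SD(X - Y)^2 = SDx^2 + SDy^2 - 2 r SDx SDy to the Doppler and Cath values; because r is rounded to three decimals in the output, the result only approximates the SPSS figure.

```python
import math

sd_doppler = 0.24276
sd_cath = 0.25522
r = 0.888   # Doppler & Cath correlation (rounded in the output)

# SD of the paired differences, allowing for the correlation.
sd_diff_paired = math.sqrt(
    sd_doppler ** 2 + sd_cath ** 2 - 2 * r * sd_doppler * sd_cath
)

# For comparison: what the SD would be if the two measures were uncorrelated.
sd_diff_uncorrelated = math.sqrt(sd_doppler ** 2 + sd_cath ** 2)

print(f"SD of differences with r = {r}: {sd_diff_paired:.4f}")
print(f"SD of differences with r = 0:   {sd_diff_uncorrelated:.4f}")
```

A high positive correlation shrinks the standard deviation of the differences considerably, which is why a paired design can detect smaller mean differences than an independent-groups design of the same size.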

Paired Sample T-test

Results section

We examined chilli tolerance based on two types of chillies. Twenty-one participants were tested on their ability to consume both types of chillies. The mean difference was 0.030. There was no significant difference in chilli tolerance between the two types of chillies, t(20) = -1.18, p > 0.05.

The results suggest that there is no difference in chilli tolerance between the two types of chillies.

Group Statistics

DV
group   N    Mean      Std. Deviation   Std. Error Mean
1.00    12   25.5673   5.04689          1.45691
2.00    12   31.1920   7.79554          2.25038

Independent Samples Test

DV                                            Equal variances   Equal variances
                                              assumed           not assumed
Levene's Test for Equality of Variances
  F                                           7.236
  Sig.                                        .013
t-test for Equality of Means
  t                                           -2.098            -2.098
  df                                          22                18.843
  Sig. (2-tailed)                             .048              .050
  Mean Difference                             -5.62476          -5.62476
  Std. Error Difference                       2.68082           2.68082
  95% Confidence Interval of the Difference
    Lower                                     -11.18443         -11.23894
    Upper                                     -.06508           -.01057

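Note: the group variances are about 25.5 and 60.8 (the squared standard deviations), and Levene's test is significant (p = .013), so the equal-variance assumption is doubtful here; compare the 'Equal variances not assumed' row (t = -2.098, df = 18.843, p = .050).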
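The rule given earlier for Levene's test (variances treated as equal if p > 0.05) determines which row of the Independent Samples Test table to read. A small sketch of that decision applied to this output; the function name and the dictionary layout are hypothetical, used only to make the rule explicit.

```python
def choose_t_test_row(levene_p: float, alpha: float = 0.05) -> str:
    """Pick the SPSS output row according to Levene's test."""
    if levene_p > alpha:
        return "Equal variances assumed"
    return "Equal variances not assumed"


# Values copied from the Independent Samples Test table for DV.
output_rows = {
    "Equal variances assumed": {"t": -2.098, "df": 22, "sig": 0.048},
    "Equal variances not assumed": {"t": -2.098, "df": 18.843, "sig": 0.050},
}

levene_p = 0.013
row_name = choose_t_test_row(levene_p)
row = output_rows[row_name]

print(f"Levene's p = {levene_p}: read the '{row_name}' row")
print(f"t({row['df']}) = {row['t']}, Sig. (2-tailed) = {row['sig']}")
```

For the chilli-by-gender example, where Levene's p was .517, the same function would point to the "Equal variances assumed" row.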

Results section

We examined chilli tolerance between two groups of participants. Twelve participants per group were tested on their ability to consume chillies. Group 1 had a mean of 25.57 (s = 5.05) and group 2 a mean of 31.19 (s = 7.80). The two groups differed significantly in their chilli consumption, t(22) = -2.10, p < 0.05.

The results suggest that group 2 has greater chilli tolerance than group 1.

Statistical Tools For Inferential Statistics

• PARAMETRIC TESTS:
– Test of hypothesis of differences between means: Z-test, t-test, F-test, MANOVA
– Test of hypothesis of relationship: Pearson r, Point-biserial, Regression

• NON-PARAMETRIC TESTS:
– Mann-Whitney
– Kruskal Wallis
– Spearman rho
– Chi-Square, Cramer’s V, Lambda, etc.

STATISTICAL DECISION

                                Reality
                                H0 : No difference         HA : Difference
Decision (fail to reject Ho)    1 – α                      β error (Type II error)
Decision (reject Ho)            α error (Type I error)     1 – β (Power)
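The four cells of the statistical decision table can be written as a small lookup. This is only an illustrative sketch of the table's logic; the function name and the wording of the returned labels are hypothetical.

```python
def classify_decision(h0_is_true: bool, reject_h0: bool) -> str:
    """Return the cell of the decision table for a given reality and decision."""
    if h0_is_true and reject_h0:
        return "Type I error (probability alpha)"
    if h0_is_true and not reject_h0:
        return "Correct decision (probability 1 - alpha)"
    if not h0_is_true and reject_h0:
        return "Correct decision, power (probability 1 - beta)"
    return "Type II error (probability beta)"


# Walk through all four combinations of reality and decision.
for h0_is_true in (True, False):
    for reject_h0 in (False, True):
        reality = "H0: no difference" if h0_is_true else "HA: difference"
        decision = "reject H0" if reject_h0 else "fail to reject H0"
        print(f"{reality:18} | {decision:17} | "
              f"{classify_decision(h0_is_true, reject_h0)}")
```

In practice the reality column is never observed, so the table describes the risks attached to a decision rather than something that can be checked for a single study.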
