Assignment 3

ASSIGNMENT 3: PN SITI NUR DIYANA

Why clean your data?

• Screening process
  ◦ Detect errors
    ▪ Missing data
    ▪ Outliers
  ◦ Make sure data meet the assumptions for analysis
    ▪ Normality

2 Types of Screening

1. Preliminary data screening
  ◦ Screen one variable at a time on the entire data set before any analysis
2. In conjunction with statistical analysis
  ◦ Dependent on the analysis being performed
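Preliminary screening looks at one variable at a time across the whole data set. The sketch below is a minimal illustration in Python, not an SPSS feature: it assumes each case is a dictionary, that `None` or a float NaN stands in for a missing value, and the variable names `hours` and `age` are hypothetical.

```python
import math


def is_missing(value):
    # Assumed coding: None or float NaN counts as missing
    return value is None or (isinstance(value, float) and math.isnan(value))


def screen_variable(cases, variable):
    """Summarise one variable across the entire data set before any analysis."""
    values = [case.get(variable) for case in cases]
    valid = [v for v in values if not is_missing(v)]
    return {
        "variable": variable,
        "n_valid": len(valid),
        "n_missing": len(values) - len(valid),
        # Out-of-range min/max values often reveal data-entry errors
        "min": min(valid) if valid else None,
        "max": max(valid) if valid else None,
    }


# Hypothetical data: hours spent on the Internet and age
cases = [
    {"hours": 3.5, "age": 21},
    {"hours": None, "age": 19},
    {"hours": 12.0, "age": 22},
]

for var in ("hours", "age"):
    print(screen_variable(cases, var))
```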

Data Cleaning Tips for Making Your Data Suitable for Analysis

Steps

1. Check for missing data
2. Check for normality
3. Remove outliers
4. Check for normality again
5. Transform data

Compute into a variable

Statistical methods for checking normality include diagnostic hypothesis tests and a rule of thumb: a variable is reasonably close to normal if its skewness and kurtosis both lie between -1.0 and +1.0.
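The rule of thumb can be checked with a few lines of code. This is a sketch that assumes the bias-adjusted sample formulas (G1 for skewness, G2 for excess kurtosis, where 0 means "normal"); these are the estimators commonly reported by SPSS and by Excel's SKEW/KURT, but the function names here are hypothetical.

```python
import math
import statistics


def _central_moment(values, mean, power):
    return sum((x - mean) ** power for x in values) / len(values)


def skewness(values):
    """Bias-adjusted sample skewness (G1)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = statistics.fmean(values)
    m2 = _central_moment(values, mean, 2)
    if m2 == 0:
        raise ValueError("all values are identical")
    g1 = _central_moment(values, mean, 3) / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def kurtosis(values):
    """Bias-adjusted sample excess kurtosis (G2); about 0 for normal data."""
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean = statistics.fmean(values)
    m2 = _central_moment(values, mean, 2)
    if m2 == 0:
        raise ValueError("all values are identical")
    g2 = _central_moment(values, mean, 4) / m2 ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))


def roughly_normal(values, limit=1.0):
    """Rule of thumb: skewness and kurtosis both within [-limit, +limit]."""
    return (-limit <= skewness(values) <= limit
            and -limit <= kurtosis(values) <= limit)


print(skewness([1, 2, 3, 4, 5]), kurtosis([1, 2, 3, 4, 5]))
```

For the values 1 to 5, skewness is 0 and kurtosis is about -1.2, so the rule of thumb flags this small, perfectly flat sample as not close to normal.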

What do we do when data are missing?

• Listwise (casewise) deletion: uses only complete cases, so any case with a missing value on a variable in the analysis is dropped
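Listwise deletion is simple to express in code. The sketch below assumes cases are dictionaries and that `None` or a float NaN marks a missing value; the field names `id`, `hours` and `age` are hypothetical.

```python
import math


def is_missing(value):
    # Assumed coding: None or float NaN counts as missing
    return value is None or (isinstance(value, float) and math.isnan(value))


def listwise_delete(cases, variables):
    """Keep only cases with a valid value on every listed variable."""
    return [case for case in cases
            if not any(is_missing(case.get(var)) for var in variables)]


cases = [
    {"id": 1, "hours": 3.5, "age": 21},
    {"id": 2, "hours": None, "age": 19},
    {"id": 3, "hours": 4.0, "age": float("nan")},
]
complete = listwise_delete(cases, ["hours", "age"])
print([case["id"] for case in complete])  # [1]
```

The trade-off is that every incomplete case is lost, even when only one of its values is missing, which shrinks the sample.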

Step 2 Check for normality

• Still using the output from “Explore” in SPSS (Analyze > Descriptive Statistics > Explore)
• Look at:
  1. Descriptives table
  2. Tests of Normality table
  3. Histogram
  4. Box plot


• Since the sample size is larger than 50, we use the Kolmogorov-Smirnov test. If the sample size were 50 or less, we would use the Shapiro-Wilk statistic instead.
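The sketch below encodes the sample-size rule from this slide and computes the Kolmogorov-Smirnov distance D: the largest gap between the sample's cumulative proportions and a normal curve with the sample's own mean and SD. It is an illustration only. It does not compute a p-value; SPSS's Tests of Normality table applies the Lilliefors significance correction for that, which is not reproduced here. The function names are hypothetical.

```python
import statistics


def choose_normality_test(n):
    """Rule of thumb from the slide: K-S when n > 50, Shapiro-Wilk otherwise."""
    return "Kolmogorov-Smirnov" if n > 50 else "Shapiro-Wilk"


def ks_statistic(values):
    """K-S distance D against a normal curve fitted with the sample mean and SD."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least 2 values")
    dist = statistics.NormalDist(statistics.fmean(data), statistics.stdev(data))
    d = 0.0
    for i, x in enumerate(data, start=1):
        cdf = dist.cdf(x)
        # Compare the normal CDF with the step function just after and just before x
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


print(choose_normality_test(120), ks_statistic([1, 2, 3, 4, 5]))
```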

The null hypothesis for the test of normality states that the actual distribution of the variable is equal to the expected distribution, i.e., the variable is normally distributed. Since the probability associated with the test of normality (p < 0.001) is less than or equal to the level of significance (0.01), we reject the null hypothesis and conclude that total hours spent on the Internet is not normally distributed. (Note: we report the probability as < 0.001 instead of .000 to make clear that the probability is not really zero.)
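The decision rule and the reporting convention above can be written as two small helpers. This is a sketch; the function names are hypothetical and the p-value in the example is made up for illustration, not taken from the SPSS output.

```python
def format_p(p):
    """Report very small p-values as '< 0.001' instead of '.000'."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return "p < 0.001" if p < 0.001 else f"p = {p:.3f}"


def normality_decision(p, alpha=0.01):
    """H0: the variable is normally distributed. Reject H0 when p <= alpha."""
    if p <= alpha:
        return "reject H0: not normally distributed"
    return "fail to reject H0: no evidence against normality"


# Hypothetical p-value that SPSS would display as .000
p = 0.0002
print(format_p(p), "->", normality_decision(p))
```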

Step 3 Remove outliers

• Remove data points highlighted in the box plot
  ◦ Not the best method
• “Schweinle Method”
  ◦ Remove data that lie more than 2.5 SD from the mean
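SPSS box plots flag a case as an outlier (circle) when it lies more than 1.5 box lengths (IQR) beyond the edge of the box, and as an extreme (asterisk) beyond 3 box lengths. The sketch below applies the 1.5 × IQR rule in Python. It assumes Python's `statistics.quantiles` with the "inclusive" method, which does not use exactly the same quartile definition as SPSS's Tukey hinges, so borderline cases may be classified differently. The data are hypothetical.

```python
import statistics


def boxplot_outliers(values, k=1.5):
    """Values more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical
print(boxplot_outliers(data))  # [100]
```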

• “Schweinle Method”
  1. SD × 2.5: 0.61463 × 2.5 = 1.536575
  2. Add that value to the mean: 1.536575 + 3.8026 = 5.339175
• Remove any values above 5.339175
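The arithmetic above can be checked with a short sketch that uses the slide's mean (3.8026) and SD (0.61463). The lower cutoff (mean minus 2.5 SD) is returned only for completeness; the slide applies the upper cutoff alone. The function names and the sample values are hypothetical.

```python
def schweinle_cutoffs(mean, sd, k=2.5):
    """Return (lower, upper) cutoffs k standard deviations from the mean."""
    margin = k * sd
    return mean - margin, mean + margin


def remove_above(values, upper):
    """Keep values at or below the upper cutoff, as on the slide."""
    return [v for v in values if v <= upper]


lower, upper = schweinle_cutoffs(3.8026, 0.61463)
print(round(upper, 6))  # 5.339175
print(remove_above([3.2, 4.1, 5.9, 3.7], upper))  # [3.2, 4.1, 3.7]
```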

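Step 3 Remove outliers

• SPSS: Data > Select Cases
  ◦ Select “If condition is satisfied” and click “If...”
  ◦ Enter the condition: variable <= 5.339175 (using the name of the variable being cleaned)
    ▪ Cases above 5.339175 are filtered out, so SPSS will not analyze them
  ◦ Click “Continue” and “OK”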
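Selecting cases this way filters rather than deletes: SPSS marks each case as selected or not, and, as far as I understand the default behaviour, stores the flag in a filter variable (named filter_$). The Python sketch below is only an analogy for that idea; the function names are hypothetical.

```python
def filter_flags(values, upper):
    """1 = selected (value <= upper), 0 = filtered out; the data stay intact."""
    return [1 if v <= upper else 0 for v in values]


def selected(values, flags):
    """Values that later analyses would use."""
    return [v for v, flag in zip(values, flags) if flag == 1]


hours = [3.1, 5.339175, 6.2, 4.4]  # hypothetical
flags = filter_flags(hours, 5.339175)
print(flags, selected(hours, flags))
```

Because the original values are kept, the filter can be switched off later to restore the full data set.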


Step 4 Check for normality – again!

Step 5 Transform data

• Transform data – square root


SPSS: Transform > Compute Variable

• Target Variable: enter a new name
  ◦ Ex: sqrt
• Click on “Arithmetic” under Function group
• Click on “Sqrt” under Functions and Special Variables
• Click on the up arrow to bring SQRT(?) into the Numeric Expression box
• Highlight the variable to be transformed and click the right arrow to replace the (?)
• Explore the data again to check for normality
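A square-root transform pulls in a long right tail. The sketch below applies it in Python and compares skewness before and after, using the bias-adjusted skewness formula (G1). The data are hypothetical, and the sketch simply rejects negative values rather than imitating how SPSS handles them.

```python
import math
import statistics


def sqrt_transform(values):
    """Square-root transform; defined here only for non-negative values."""
    if any(v < 0 for v in values):
        raise ValueError("square root needs non-negative values")
    return [math.sqrt(v) for v in values]


def skewness(values):
    """Bias-adjusted sample skewness (G1)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


hours = [1, 1, 4, 4, 9, 16, 100]  # hypothetical right-skewed data
print(skewness(hours), skewness(sqrt_transform(hours)))
```

The transformed variable is still skewed here, which is why the slide says to explore the data again after transforming.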

Reliability

Creating graphic illustrations

Descriptive analysis

Inferential statistics

T-test

Correlation

Two-way ANOVA

Regression
