data screening
TRANSCRIPT
![Page 1: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/1.jpg)
DATA SCREENING
Wei-Jiun, Shen Ph. D.
![Page 2: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/2.jpg)
Anything that can go wrong will go wrong
![Page 3: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/3.jpg)
Why do we need to screen data?
![Page 4: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/4.jpg)
Purpose
- Detect and correct data errors
- Detect and treat missing data
- Detect and handle insufficiently sampled variables
- Conduct transformations and standardizations
- Detect and handle outliers
![Page 5: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/5.jpg)
First concern
- Accuracy of data file
  - Descriptive statistics
  - Graphic representations
- Honest correlations
- Missing data
  - Pattern or amount
  - Random or not
- Outliers
![Page 6: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/6.jpg)
MISSING DATA
The “blank” part of a data set
![Page 7: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/7.jpg)
Why is missing data a problem?
- Systematic problem
  - Biased sampling
    - Demographic variables
  - Inappropriate measuring procedure
    - Behavioral items
- Insufficient amount for analysis
  - Small sample
- Misleading research results
  - Biased data in, _______ out
![Page 8: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/8.jpg)
Probability distribution of missingness
- Consider the probability of missingness
  - Are certain groups more likely to have missing values?
    - Are female respondents less likely to report age?
  - Are certain responses more likely to be missing?
    - Are respondents with high SPA less likely to report anxiety?
- Certain analysis methods assume a certain probability distribution of missingness
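The first question on this slide (are certain groups more likely to have missing values?) can be checked by tabulating the missing rate per group. The sketch below is only an illustration: the rows and the helper name `missing_rate_by_group` are made up for the example and are not from the slides.

```python
from collections import defaultdict

# Hypothetical survey rows: (gender, age); None marks a missing age.
rows = [("F", None), ("F", 31), ("F", None), ("F", 27),
        ("M", 45), ("M", 38), ("M", None), ("M", 52)]

def missing_rate_by_group(rows):
    """Return {group: proportion of rows whose value is missing}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [missing, total]
    for group, value in rows:
        counts[group][0] += value is None
        counts[group][1] += 1
    return {g: missing / total for g, (missing, total) in counts.items()}

rates = missing_rate_by_group(rows)
print(rates)
```

A large gap between groups suggests that missingness depends on the grouping variable, which is the situation the MCAR/MAR distinction addresses.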
![Page 9: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/9.jpg)
Missing completely at random (MCAR)
- Missing data are independent of any other measured variable (y2) and independent of the variable itself (y1)
- E.g., SES = y2; depression = y1. If participants dropped out across the whole range of SES levels, then missingness on depression would be independent of SES
- Little’s MCAR test in MVA indicates whether the data are MCAR (want a nonsignificant result)
![Page 10: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/10.jpg)
Missing at random (MAR)
- Missing data may depend on another measured variable (y2) but are independent of the variable itself (y1)
- E.g., SES = y2; depression = y1. If only participants from high levels of SES dropped out, then missingness on depression would be dependent on SES
- MAR can be inferred if Little’s test is significant but missingness is predictable from other variables (other than the variable itself), as tested by the Separate Variance Test. MNAR is indicated if this test reveals missingness related to the DV
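The idea behind a separate variance test can be sketched as follows: split the cases by whether y1 (depression) is missing, then compare the two groups on y2 (SES) with a Welch (unequal variance) t statistic. This is a minimal sketch with invented data; the function name `welch_t` is hypothetical, and only the t statistic and its approximate degrees of freedom are computed, not a p value.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch (separate-variance) t statistic and approximate df."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical records: (SES, depression); None marks a missing depression score.
records = [(1, 12), (2, 15), (2, 11), (3, 14), (3, 13), (4, 10),
           (5, None), (6, None), (6, 9), (7, None), (7, None), (8, None)]

ses_missing = [ses for ses, dep in records if dep is None]
ses_observed = [ses for ses, dep in records if dep is not None]

t, df = welch_t(ses_missing, ses_observed)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large |t| here would mean that missingness on depression is related to SES, which is consistent with MAR rather than MCAR.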
![Page 11: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/11.jpg)
Treatment for missing data
- Deleting cases or variables
  - Descriptive statistics
- Estimating missing data
- Using a missing data correlation matrix
- Treating missing data as data
- Repeating analyses with and without missing data
- Choosing among methods for dealing with missing data
  - Pattern or amount
![Page 12: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/12.jpg)
Deletion or preservation?
- Deletion
  - <5% missing
  - MCAR/MAR
- Preservation
  - MNAR
  - Small sample
- Replacement
  - Mean (grand or group)
  - Regression (predict missing value from other IVs)
  - Expectation Maximization (form the missing data r matrix from an assumed distribution)
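Mean replacement, the simplest of the replacement options above, can be sketched in a few lines. The data and the function names `impute_grand_mean` and `impute_group_mean` are hypothetical, chosen only to contrast the grand-mean and group-mean variants.

```python
from statistics import mean

# Hypothetical data: (group, score); None marks a missing score.
rows = [("a", 4.0), ("a", None), ("a", 6.0),
        ("b", 10.0), ("b", 12.0), ("b", None)]

def impute_grand_mean(rows):
    """Replace each missing score with the mean of all observed scores."""
    grand = mean(s for _, s in rows if s is not None)
    return [(g, grand if s is None else s) for g, s in rows]

def impute_group_mean(rows):
    """Replace each missing score with the mean of its own group's observed scores."""
    observed = {}
    for g, s in rows:
        if s is not None:
            observed.setdefault(g, []).append(s)
    means = {g: mean(v) for g, v in observed.items()}
    return [(g, means[g] if s is None else s) for g, s in rows]

print(impute_grand_mean(rows))
print(impute_group_mean(rows))
```

Either variant leaves the relevant mean unchanged but shrinks the variance, because every imputed case sits exactly at a mean.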
![Page 13: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/13.jpg)
OUTLIER
Cases with extreme values on variables
![Page 14: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/14.jpg)
Why are outliers a problem?
- Systematic problem
  - Biased sampling
  - Wrong population
- Statistical problem
  - ↑ error variance
  - ↓ statistical power
  - ↑ Type I and Type II error
  - ↓ normality
- Misleading research results
  - Biased data in, _______ out
![Page 15: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/15.jpg)
Influence of outlier
Leverage × discrepancy
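One common way to combine leverage and discrepancy into a single influence score is Cook's distance. The slide does not name a specific measure, so Cook's distance is an assumption here, and the data and the function name `influence` are invented for the example. The sketch covers simple regression with one predictor.

```python
from statistics import mean

def influence(xs, ys):
    """Leverage, residual (discrepancy) and Cook's distance for y = a + b*x."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Leverage: how far a case lies from the mean of the predictor.
    leverage = [1 / n + (x - mx) ** 2 / sxx for x in xs]
    mse = sum(e ** 2 for e in residuals) / (n - 2)
    # Cook's distance with p = 2 estimated parameters (intercept and slope).
    cooks = [e ** 2 / (2 * mse) * h / (1 - h) ** 2
             for e, h in zip(residuals, leverage)]
    return leverage, residuals, cooks

# Hypothetical data: nine cases on the line y = x, plus one distant case (20, 5).
xs = list(range(1, 10)) + [20]
ys = list(range(1, 10)) + [5]

leverage, residuals, cooks = influence(xs, ys)
most_influential = cooks.index(max(cooks))
print(xs[most_influential], ys[most_influential], round(max(cooks), 2))
```

The distant case has both the highest leverage and a large residual, so it dominates the fitted line.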
![Page 16: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/16.jpg)
Treatment for outliers
- Detecting outliers
  - Standardized score (|z| > 2, 2.5, or 3)
  - Graphical methods (P-P, Q-Q plot)
  - Mahalanobis distance (χ² test)
- Deletion or transformation
  - Critical to analysis or not
  - Preservation
- Transformation
  - Score alteration
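The two numerical detection methods above can be sketched for the two-variable case. The data are invented, the function names are hypothetical, and the closed-form p value `exp(-d²/2)` applies only because two variables give a χ² distribution with df = 2; with more variables a general χ² routine would be needed.

```python
import math
from statistics import mean, stdev

def z_scores(xs):
    """Standardized scores using the sample standard deviation."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each (x, y) point from the centroid."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Inverse of the 2x2 covariance matrix applied to each centered point.
    return [((x - mx) ** 2 * syy - 2 * (x - mx) * (y - my) * sxy
             + (y - my) ** 2 * sxx) / det for x, y in points]

# Hypothetical data: 20 cases near the line y = x, plus one case (3, 18)
# that is unremarkable on each variable alone but far from the joint pattern.
points = [(i, i + (1 if i % 2 else -1)) for i in range(1, 21)] + [(3, 18)]
xs, ys = zip(*points)

d2 = mahalanobis_sq(points)
# Upper-tail chi-square probability for df = 2.
p_values = [math.exp(-d / 2) for d in d2]
worst = max(range(len(points)), key=lambda i: d2[i])

print("largest |z| on x:", max(abs(z) for z in z_scores(xs)))
print("largest |z| on y:", max(abs(z) for z in z_scores(ys)))
print("most distant case:", points[worst], d2[worst], p_values[worst])
```

The example shows why both checks appear on the slide: a multivariate outlier can pass every univariate z-score cutoff.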
![Page 17: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/17.jpg)
NORMALITY, LINEARITY & HOMOSCEDASTICITY
Basic assumptions
![Page 18: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/18.jpg)
Key assumptions in GLM
- Normality
- Linearity
- Homogeneity of variance
- Interval level data
- Independence of observations
![Page 19: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/19.jpg)
Normality
Normal distribution
![Page 20: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/20.jpg)
Test for normality
Skewness & Kurtosis
![Page 21: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/21.jpg)
Test for normality
- T-test for skewness & kurtosis scores
- Kolmogorov-Smirnov test & Shapiro-Wilk test
- Test statistics: Z, W
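A significance test for skewness and kurtosis is commonly run by dividing each statistic by its standard error and treating the ratio as a z score. The sketch below assumes the large-sample approximations SE(skewness) ≈ √(6/N) and SE(kurtosis) ≈ √(24/N); statistical packages use exact small-sample formulas, so their values will differ slightly. The function name and data are hypothetical.

```python
import math

def skew_kurtosis_z(xs):
    """Return (z for skewness, z for excess kurtosis) using approximate SEs."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5      # 0 for a symmetric distribution
    kurt = m4 / m2 ** 2 - 3    # 0 for a normal distribution
    return skew / math.sqrt(6 / n), kurt / math.sqrt(24 / n)

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 4, 6, 9, 15]

print(skew_kurtosis_z(symmetric))
print(skew_kurtosis_z(right_skewed))
```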
![Page 22: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/22.jpg)
Test for normality
Plotting cumulative distribution function
![Page 23: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/23.jpg)
Test for normality
P-P plot (probability) & Q-Q plot (quantile)
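The points of a normal Q-Q plot can be computed without any plotting library: sort the observed scores and pair each with the quantile a normal distribution would give at the same cumulative probability. This sketch assumes the plotting position (i − 0.5)/n, which is one of several conventions, and fits the normal distribution with the sample mean and standard deviation. The function name and data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def qq_points(xs):
    """Pairs of (theoretical normal quantile, observed value) for a Q-Q plot."""
    n = len(xs)
    dist = NormalDist(mean(xs), stdev(xs))
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    return [(dist.inv_cdf(p), x) for p, x in zip(probs, sorted(xs))]

scores = [2, 4, 4, 5, 6, 6, 8]
for theoretical, observed in qq_points(scores):
    print(f"{theoretical:6.2f}  {observed}")
```

If the scores are roughly normal, the pairs fall close to the line theoretical = observed.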
![Page 24: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/24.jpg)
Linearity
Straight-line relationship between 2 variables
![Page 25: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/25.jpg)
Homoscedasticity
- Homogeneity of variance
- Homogeneity of variance-covariance matrix
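Homogeneity of variance across groups is often checked with Levene's test. The slide does not name a test, so Levene's test is an assumption here. The sketch computes only the W statistic (mean-centered version), which would then be compared against an F distribution with (k − 1, N − k) degrees of freedom; the function name and data are hypothetical.

```python
from statistics import mean

def levene_w(*groups):
    """Levene's W statistic (mean-centered) for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each score from its own group mean.
    z = []
    for g in groups:
        m = mean(g)
        z.append([abs(x - m) for x in g])
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

equal_spread = levene_w([1, 2, 3], [11, 12, 13])
unequal_spread = levene_w([4, 5, 6], [0, 5, 10])
print(equal_spread, unequal_spread)
```

Groups with the same spread give W near zero regardless of their means; W grows as the spreads diverge.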
![Page 26: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/26.jpg)
Homoscedasticity
Residual
![Page 27: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/27.jpg)
COMMON DATA TRANSFORMATIONS
![Page 28: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/28.jpg)
Data transformations
| Direction | Skewness | Treatment |
| --- | --- | --- |
| + | Moderate | New X = SQRT(X) |
| + | Substantial | New X = LG10(X) |
| + | Substantial with zero | New X = LG10(X + C) |
| + | Severe | New X = 1/X |
| + | L-shaped with zero | New X = 1/(X + C) |
| - | Moderate | New X = SQRT(K - X) |
| - | Substantial | New X = LG10(K - X) |
| - | J-shaped | New X = 1/(K - X) |

C = a constant added to each score so that the smallest score is 1.
K = a constant from which each score is subtracted so that the smallest score is 1; usually equal to the largest score + 1.
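The table can be sketched as a small helper that applies the reflection (K − X) or shift (X + C) first and then the square root, log, or reciprocal. The function names, the `severity` labels, and the sample scores are hypothetical; "severe" here stands for the reciprocal rows of the table (severe, L-shaped, J-shaped).

```python
import math

def reflect(xs):
    """K - X, with K = largest score + 1, so the smallest reflected score is 1."""
    k = max(xs) + 1
    return [k - x for x in xs]

def shift(xs):
    """X + C, with C chosen so the smallest shifted score is 1."""
    c = 1 - min(xs)
    return [x + c for x in xs]

TRANSFORMS = {
    "moderate": math.sqrt,        # SQRT
    "substantial": math.log10,    # LG10
    "severe": lambda x: 1 / x,    # reciprocal
}

def transform(xs, severity, negative_skew=False, has_zero=False):
    """Apply the table's treatment for the given direction and severity."""
    if negative_skew:
        xs = reflect(xs)
    elif has_zero:
        xs = shift(xs)
    return [TRANSFORMS[severity](x) for x in xs]

print(transform([1, 4, 9], "moderate"))
print(transform([0, 9, 99], "substantial", has_zero=True))
print(transform([7, 9, 10], "moderate", negative_skew=True))
```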
![Page 29: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/29.jpg)
PRACTICE
![Page 30: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/30.jpg)
Check list
- Descriptive statistics
  - Range
  - Mean & SD
  - Skewness & kurtosis
- Missing data (missing value analysis)
- Normal distribution
  - Kolmogorov-Smirnov test (n > 50)
  - Shapiro-Wilk test (n < 50)
  - Skewness & kurtosis
  - P-P plot
- Outliers (single/multiple variables: z-score/Mahalanobis distance)
- Linearity
- Homoscedasticity
- Multicollinearity
![Page 31: Data screening](https://reader030.vdocument.in/reader030/viewer/2022021423/58885a1d1a28ab951c8b74d9/html5/thumbnails/31.jpg)
Report
Try