Brian Lukoff, Stanford University, October 13, 2006
DESCRIPTION
The decision tree method and its applications to faking. Evaluating decision tree performance. Three studies evaluating the method: Study 1 (low-stakes noncognitive assessments), Study 2 (experimental data), Study 3 (real-world selection). Implications and conclusions.
TRANSCRIPT
![Page 1: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/1.jpg)
Brian Lukoff
Stanford University
October 13, 2006
![Page 2: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/2.jpg)
Based on a draft paper that is joint work with Eric Heggestad, Patrick Kyllonen, and Richard Roberts
![Page 3: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/3.jpg)
The decision tree method and its applications to faking
Evaluating decision tree performance
Three studies evaluating the method
▪ Study 1: Low-stakes noncognitive assessments
▪ Study 2: Experimental data
▪ Study 3: Real-world selection
Implications and conclusions
![Page 4: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/4.jpg)
A technique from machine learning for predicting an outcome variable from (a possibly large number of) predictor variables
Outcome variable can be categorical (classification tree) or continuous (regression tree)
Algorithm builds the decision tree based on empirical data
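Example decision tree:
Is it snowing?
▪ Yes: drive
▪ No: Is it raining?
  ▪ Yes: drive
  ▪ No: walk

TRAINING SET

| Day | Snowing? | Raining? | Method |
|---|---|---|---|
| 1 | yes | yes | drive |
| 2 | yes | no | drive |
| 3 | no | yes | drive |
| 4 | no | yes | walk |
| 5 | no | no | walk |
| 6 | no | no | walk |
| 7 | no | yes | drive |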
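
One way an algorithm could arrive at this tree from the training set is to pick, at each step, the question that leaves the purest groups. The sketch below assumes a CART-style Gini impurity criterion, which the slides do not specify; the function names are illustrative only. On the seven training days it prefers "Is it snowing?" as the first question, which agrees with the tree shown.

```python
from collections import Counter

# Training set from the slide: (snowing, raining, method)
TRAINING = [
    ("yes", "yes", "drive"),
    ("yes", "no", "drive"),
    ("no", "yes", "drive"),
    ("no", "yes", "walk"),
    ("no", "no", "walk"),
    ("no", "no", "walk"),
    ("no", "yes", "drive"),
]
FEATURES = {"snowing": 0, "raining": 1}


def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, col):
    """Size-weighted Gini impurity after splitting rows on a yes/no column."""
    yes = [row[-1] for row in rows if row[col] == "yes"]
    no = [row[-1] for row in rows if row[col] == "no"]
    n = len(rows)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)


# Assumed criterion: the first question is the one with the lowest impurity.
impurities = {name: split_impurity(TRAINING, col) for name, col in FEATURES.items()}
best_first_split = min(impurities, key=impurities.get)
print(impurities, best_first_split)
```

Snow days are all "drive" days, so splitting on snow leaves one perfectly pure group; that is why it scores better than splitting on rain.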
![Page 5: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/5.jpg)
Not all cases are accounted for correctly
▪ Wrong decision on Day 4
Need to choose variables predictive enough of the outcome
(Decision tree and training set repeated from the previous slide.)
![Page 6: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/6.jpg)
Not all cases are predicted correctly
▪ Maybe the decision to drive or walk is determined by more than just the snow and rain?
(Same decision tree as on the previous slides, now applied to new data.)

TEST SET

| Day | Snowing? | Raining? | Method | Prediction |
|---|---|---|---|---|
| 8 | yes | yes | drive | drive |
| 9 | no | yes | walk | drive |
| 10 | no | yes | drive | drive |
| 11 | yes | no | drive | drive |
| 12 | no | no | walk | walk |
| 13 | no | no | walk | walk |
| 14 | yes | yes | drive | drive |
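
The toy example can be replayed in a few lines. This is a minimal sketch that hard-codes the snow/rain tree and the two data sets from the slides; `predict`, `accuracy`, and `misclassified_days` are names chosen here for illustration.

```python
def predict(snowing, raining):
    """The slide's tree: snow means drive; otherwise rain decides."""
    if snowing == "yes":
        return "drive"
    return "drive" if raining == "yes" else "walk"


# day -> (snowing, raining, actual method), copied from the slides
TRAINING = {
    1: ("yes", "yes", "drive"),
    2: ("yes", "no", "drive"),
    3: ("no", "yes", "drive"),
    4: ("no", "yes", "walk"),
    5: ("no", "no", "walk"),
    6: ("no", "no", "walk"),
    7: ("no", "yes", "drive"),
}
TEST = {
    8: ("yes", "yes", "drive"),
    9: ("no", "yes", "walk"),
    10: ("no", "yes", "drive"),
    11: ("yes", "no", "drive"),
    12: ("no", "no", "walk"),
    13: ("no", "no", "walk"),
    14: ("yes", "yes", "drive"),
}


def accuracy(days):
    """Fraction of days on which the tree's prediction matches the record."""
    hits = sum(predict(s, r) == method for s, r, method in days.values())
    return hits / len(days)


def misclassified_days(days):
    """Day numbers where the prediction disagrees with the recorded method."""
    return [d for d, (s, r, method) in days.items() if predict(s, r) != method]


print(misclassified_days(TRAINING))  # [4]
print(misclassified_days(TEST))      # [9]
print(accuracy(TRAINING), accuracy(TEST))
```

Both sets contain one day where it rained without snow and the person walked anyway, so the tree gets 6 of 7 days right on each.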
![Page 7: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/7.jpg)
Ease of interpretation
Simplicity of use
Flexibility in variable selection
Functionality to build decision trees readily available in software (e.g., the R statistical package)
![Page 8: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/8.jpg)
Outcome variable = faking status (“faking” or “honest”)
▪ Training set = an experimental data set where some participants were instructed to fake
▪ Training set = a data set where some respondents are known to have faked
Outcome variable = lie scale score
▪ Training set = a data set where the target lie scale was administered to some subjects
![Page 9: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/9.jpg)
So far, we have used individual item responses only
Other possibilities:
▪ Variance of item responses
▪ Number of item responses in the highest (or lowest) category
▪ Modal item response
Decision tree method permits some sloppiness in variable selection
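
The additional predictors listed above are simple summaries of one respondent's answers. A minimal sketch, assuming responses are integers on a 1-5 scale (the `low` and `high` defaults) and using population variance; both choices, and the function name, are assumptions made for illustration.

```python
from statistics import multimode, pvariance


def response_features(responses, low=1, high=5):
    """Summarize one respondent's item responses as candidate predictors."""
    return {
        # Spread of the responses (population variance, an assumed choice)
        "variance": pvariance(responses),
        # How often the respondent used the top or bottom category
        "n_highest": sum(r == high for r in responses),
        "n_lowest": sum(r == low for r in responses),
        # Most frequent response; ties are broken toward the lower value
        "mode": min(multimode(responses)),
    }


print(response_features([5, 5, 4, 5, 1]))
```

Each value could then be supplied to the tree-building algorithm as one more column alongside the raw item responses.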
![Page 10: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/10.jpg)
Classification trees (dichotomous outcome case, e.g., predicting faking or not faking)
▪ Accuracy rate
▪ False positive rate
▪ Hit rate
Regression trees (continuous outcome case)
▪ Average absolute error
▪ Correlation between actual and predicted scores
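
The slides name these metrics without defining them. The sketch below assumes the conventional definitions: accuracy is the share of all cases classified correctly, the false positive rate is the share of honest respondents flagged as faking, and the hit rate is the share of actual fakers flagged as faking. Function names and the example labels are illustrative.

```python
from math import sqrt


def classification_metrics(actual, predicted, positive="faking"):
    """Accuracy, false positive rate, and hit rate for a two-class outcome."""
    pairs = list(zip(actual, predicted))
    preds_for_positives = [p for a, p in pairs if a == positive]
    preds_for_negatives = [p for a, p in pairs if a != positive]
    return {
        "accuracy": sum(a == p for a, p in pairs) / len(pairs),
        "false_positive_rate":
            sum(p == positive for p in preds_for_negatives) / len(preds_for_negatives),
        "hit_rate":
            sum(p == positive for p in preds_for_positives) / len(preds_for_positives),
    }


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted scores."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


actual = ["faking", "faking", "honest", "honest", "honest"]
predicted = ["faking", "honest", "honest", "faking", "honest"]
print(classification_metrics(actual, predicted))
```

Dividing the average absolute error by the standard deviation of the actual scores would express it in SD units.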
![Page 11: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/11.jpg)
Algorithm can “overfit” to the training data, so performance metrics computed on the training data are not indicative of future performance
Thus we will often partition the data:
▪ Training set (data used to build tree)
▪ Test set (data used to compute performance metrics)
![Page 12: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/12.jpg)
Training/test set split leaves a lot to the chance selection of the training and test sets
Instead, partition the data into k equal subsets
▪ Use each subset as a test set for the tree trained on the rest of the data
▪ Average the resulting performance metrics to get better estimates of performance on new data
Here we will report cross-validation estimates
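
The k-fold procedure can be sketched as follows. This is an illustration, not the code used in the studies: a majority-class predictor stands in for the tree learner to keep the example short, rows are assumed to already be in random order (no shuffling is done), and when the data do not divide evenly the fold sizes differ by at most one. All names are hypothetical.

```python
from collections import Counter


def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) pairs; each index is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(rows, k, fit, score):
    """Average a performance metric over k train/test partitions."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(rows), k):
        model = fit([rows[j] for j in train_idx])
        scores.append(score(model, [rows[j] for j in test_idx]))
    return sum(scores) / len(scores)


def fit_majority(train_rows):
    """Stand-in learner: always predict the most common training label."""
    return Counter(label for _, label in train_rows).most_common(1)[0][0]


def accuracy(model, test_rows):
    """Share of test rows whose label equals the constant prediction."""
    return sum(label == model for _, label in test_rows) / len(test_rows)


# Each row is (predictors, label); predictors are unused by the stand-in.
rows = [(None, "honest")] * 6 + [(None, "faking")] * 3
estimate = cross_validate(rows, 3, fit_majority, accuracy)
print(estimate)
```

Replacing `fit_majority` and `accuracy` with a real tree learner and any of the performance metrics gives the cross-validation estimate of that metric.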
![Page 13: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/13.jpg)
Data sets
▪ Two sets of students (N = 431 and N = 824) that took a battery of noncognitive assessments as well as two lie scales as part of a larger study
Measures
▪ Predictor variables
  ▪ IPIP (“Big Five” personality measure) items
  ▪ Social Judgment Scale items
▪ Outcomes (lie scales)
  ▪ Overclaiming Questionnaire
  ▪ Balanced Inventory of Desirable Responding
Method
▪ Build regression trees to predict scores on each lie scale based on students’ item responses
![Page 14: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/14.jpg)
Varying performance, depending on the items used for prediction and the lie scale used as the outcome
Correlations between actual lie scale scores and predicted scores ranged from -.02 to .49
Average prediction errors ranged from .74 to .95 SD
![Page 15: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/15.jpg)
Low-stakes setting: how much faking was there to detect?
Nonexperimental data set: students with high scores on the lie scales may or may not have actually been faking
![Page 16: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/16.jpg)
Data sets
▪ An experimental data set of N = 590 students in two conditions (“honest” and “faking”)
Measures
▪ Predictor variables
  ▪ IPIP (“Big Five” personality assessment) items
Method
▪ Build decision trees to classify students as honest or faking based on their personality test item responses
![Page 17: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/17.jpg)
Decision trees correctly classified students into experimental condition with varying success
▪ Accuracy rates of 56% to 71%
▪ False positive rates of 25% to 41%
▪ Hit rates of 52% to 68%
![Page 18: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/18.jpg)
Two items on a 1-5 scale form a decision tree:
▪ Item 19: “I always get right to work”
▪ Item 107: “Do things at the last minute” (reversed)
Extreme values of either one are indicative of faking
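
The tree itself was shown as a figure and its split points are not recoverable from this transcript. The sketch below is therefore purely hypothetical: it assumes Item 107 is reverse-scored as 6 minus the raw response, and it uses an invented cutoff of 5 to stand for an "extreme" value. It only illustrates the shape of a two-item rule, not the tree from the study.

```python
def reverse_score(raw, low=1, high=5):
    """Reverse-key a response on a low..high scale (1 <-> 5, 2 <-> 4)."""
    return low + high - raw


def flag_faking(item19, item107_raw, cutoff=5):
    """Hypothetical rule: flag when either keyed response reaches the cutoff.

    The cutoff is an assumption for illustration, not a value from the slides.
    """
    item107 = reverse_score(item107_raw)
    if item19 >= cutoff or item107 >= cutoff:
        return "faking"
    return "honest"


print(flag_faking(item19=5, item107_raw=3))
print(flag_faking(item19=3, item107_raw=3))
```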
![Page 19: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/19.jpg)
Many successful trees utilized few item responses
Range of tree performance
Laboratory (not real-world) data
Although an experimental study, still don’t know:
▪ If students in the faking condition really faked
▪ If the degree to which they faked is indicative of how people fake in an operational setting
▪ If any of the students in the honest condition faked
![Page 20: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/20.jpg)
Data set
▪ N = 264 applicants for a job
Measures
▪ Predictor variables
  ▪ Achievement striving, assertiveness, dependability, extroversion, and stress tolerance items of the revised KeyPoint Job Fit Assessment
▪ Outcome (lie scale)
  ▪ Candidness scale of the revised KeyPoint Job Fit Assessment
Method
▪ Build decision trees predicting the candidness (lie scale) score from the other item responses
![Page 21: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/21.jpg)
Correlations between actual and predicted candidness (lie scale) scores ranged from .26 to .58
Average prediction errors ranged from .61 to .78 SD
![Page 22: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/22.jpg)
Items are on a 1-5 scale, where 5 indicates the highest level of Achievement Striving
Note that most tests are for extreme item responses
![Page 23: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/23.jpg)
Similar methodology to Study 1, but better results (e.g., stronger correlations)
Difference in results likely due to the fact that motivation to fake was higher in this real-world, high-stakes setting
![Page 24: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/24.jpg)
Wide variety in decision tree quality between groups of variables (e.g., conscientiousness scale vs. openness scale)
Examining trees can give insight into the structure of the assessment
![Page 25: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/25.jpg)
Some decision trees in each study used only a small number of items and achieved a moderate level of accuracy
Use decision trees for real-time faking detection on computer-administered noncognitive assessments
▪ Real-time “warning” system
▪ Need to study how this changes the psychometric properties of the assessment
![Page 26: Brian Lukoff Stanford University October 13, 2006](https://reader036.vdocument.in/reader036/viewer/2022062503/5a4d1b097f8b9ab0599898a4/html5/thumbnails/26.jpg)
Address whether decision trees can be effective in an operational setting: are current decision trees accurate enough to reduce faking?
Comparisons of decision tree faking/honest classification with classifications from IRT mixture models
Develop additional features to be used as predictor variables
Explore other machine learning techniques