
The variability of assessment centre validities: Subject to purpose?

Kim Dowdeswell, Senior Research Consultant & Industrial Psychologist

30th Assessment Centre Study Group Conference, 18th March 2010

©2010 SHL Group Limited

Presentation Outline: Signposting the discussion

• Introduction

• Assessment centre validation research over the years

• Assessment vs. development centre usage trends

• Comparing validities for overall assessment ratings vs. dimension ratings

• Potential differences in validities between assessment centre purposes and approach

• Questions?

Introduction: On assessment centres

• Assessment centre methods incorporate three features:
– The use of multiple assessment techniques
– Standardised methods of making inferences from such techniques
– Pooled judgements of multiple assessors in rating each candidate’s behaviour

• ACs are used for three major purposes (International Task Force on Assessment Center Guidelines, 2009):
– To predict future behaviour for decision-making
– To diagnose development needs
– To develop candidates on dimensions of interest

AC validation research over the years

An overview of meta-analytic research evidence

AC Validation Research: The ‘gold standard’

Gaugler et al. (1987):

• Corrected mean validity coefficient of 0.37
– Meta-analysis of 50 assessment centre studies
– Relation between overall assessment ratings and various criteria
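For orientation, a ‘corrected mean validity coefficient’ is a meta-analytic estimate. A minimal sketch of the basic validity-generalisation arithmetic, in its simplest form correcting for criterion unreliability only (the published meta-analyses apply fuller artifact corrections):

\[
\bar{r} \;=\; \frac{\sum_i N_i\, r_i}{\sum_i N_i},
\qquad
\hat{\rho} \;=\; \frac{\bar{r}}{\sqrt{\overline{r_{yy}}}}
\]

where \(N_i\) and \(r_i\) are the sample size and observed validity of study \(i\), and \(\overline{r_{yy}}\) is the mean reliability of the criterion measure.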

The Validity Ladder (Schmidt & Hunter, 1998)
Criterion: Overall Job Performance

+1.00   PERFECT PREDICTION
 .63    Ability and Structured Interview
 .60    Ability and Work Sample
 .54    Work Sample Tests
 .51    Structured Interviews
 .51    Ability Tests
 .40    Personality Tests
 .37    Assessment Centres
 .35    Biodata
 .26    References
 .18    Years Job Experience
 .02    Graphology
 .00    RANDOM PREDICTION
-.01    Age

AC Validation Research: The ‘gold standard’ (continued)

Gaugler et al. (1987):

• Differences observed in validity coefficients in terms of the criterion used and the AC’s purpose:

By criterion:
Job performance        0.36
Potential ratings      0.53
Dimension ratings      0.33
Training performance   0.35
Career advancement     0.36

By AC purpose:
Promotion              0.30
Early identification   0.46
Selection              0.41
Research               0.48

AC Validation Research: Over the years

• Validity coefficients of assessment centres seem to be dropping; the 1987 -> 2007 comparison shows a statistically significant drop:

Study             Year   Validity coefficient   95% CI band
Gaugler et al.    1987   0.37                   0.30 ≤ ρ ≤ 0.42
Hermelin et al.   2007   0.28                   0.24 ≤ ρ ≤ 0.32
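To make the ‘statistically significant drop’ concrete, here is a rough back-of-the-envelope check one could run against the reported point estimates and CI bands. A sketch under simplifying assumptions only: it treats the two intervals as symmetric, normal, and independent, which corrected-validity CIs are not exactly.

```python
# Approximate each meta-analytic estimate as normal, recovering a
# standard error from its reported 95% confidence interval.
import math

def se_from_ci(lower, upper):
    """Approximate SE from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * 1.96)

rho_1987, se_1987 = 0.37, se_from_ci(0.30, 0.42)
rho_2007, se_2007 = 0.28, se_from_ci(0.24, 0.32)

# Two-sample z test for the difference between independent estimates.
z = (rho_1987 - rho_2007) / math.sqrt(se_1987**2 + se_2007**2)
p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value
print(f"z = {z:.2f}, p = {p:.3f}")    # roughly z = 2.45, p = 0.014
```

Under those assumptions the difference does clear the conventional 0.05 threshold, consistent with the claim on the slide.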

AC Validation Research: Challenges to conducting AC validation

• Sampling error: AC time and cost = small samples

• Moderate to severe levels of range restriction
– Starting with small samples -> even fewer appointments
» Hermelin et al. (2007) put indirect range restriction forward as an explanation for the lower results observed; with cost considerations, ‘modern’ AC participants are subject to more pre-selection than previously

• Reliability of supervisor ratings of performance / potential
– A notoriously common problem in validation research
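Since range restriction is the headline explanation here, a minimal sketch of the classic direct (Thorndike Case II) correction may help fix ideas. Note that Hermelin et al. (2007) actually invoke indirect range restriction, whose correction is more involved; the function and example values below are illustrative only.

```python
# Thorndike Case II: correct an observed validity r for direct range
# restriction on the predictor. u is the ratio of the restricted
# (incumbent) SD to the unrestricted (applicant) SD, so u < 1 under
# restriction. Direct case only; the indirect case needs more machinery.
import math

def correct_direct_range_restriction(r: float, u: float) -> float:
    """Return the estimated unrestricted correlation (Case II)."""
    return r / math.sqrt(u**2 + r**2 * (1 - u**2))

# Example: an observed validity of .20 in a heavily pre-selected group
# (restricted SD at 60% of the applicant SD) corrects to about .32.
print(correct_direct_range_restriction(0.20, 0.60))
```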

Assessment vs. development centre usage trends

The influence of purpose

AC vs. DC Usage Trends: The influence of purpose

• The use of ACs for development has been increasing over the years

• Popular purposes for utilising ACs in surveyed US organisations (Spychalski et al., 1997):
– Selection (50.0%) / Promotion (45.8%)
– Development planning (39.2%)

• Popular purposes for utilising ACs in surveyed SA organisations (Krause et al., 2010):
– Selection & Development (65%)
– Selection alone (22%)
– Development alone (13%)

AC vs. DC Usage Trends: The influence of purpose: Selection ACs

• Selection ACs are designed to identify the best candidate for a job

• Features in ‘Selection Centres’ (Spychalski et al., 1997):
– Assessors serve many times annually; ample opportunity to keep skills current
– Assessors are almost always asked to compile overall performance ratings for selection purposes
– ‘Selection Centre’ data validated more frequently than in centres used for other purposes

Might validation be more frequent in order to be prepared for possible legal challenges to selection decisions based on AC data?

AC vs. DC Usage Trends: The influence of purpose: Development ACs

• Goals of developmental assessment centres vary:
– Identifying training needs
– Formulating personalised developmental needs & action plans
– Developing skills on the basis of immediate feedback and practice

• Features in ‘Development Planning Centres’ (Spychalski et al., 1997):
– Fewer candidate selection mechanisms used, with heavy reliance on supervisor data
– Assessors conduct lengthy discussion sessions, with other assessors and with candidates in feedback sessions
– ‘Development Planning Centres’ infrequently validated

AC vs. DC Usage Trends: The influence of purpose: Assessor evaluations

• Possible implications of AC focus shifting from selection to development?

• “Assessors may evaluate candidates differently, depending on whether their ratings will serve a selection purpose (i.e., ‘yes/no’ decision) or a developmental purpose (i.e., identification of strengths and weaknesses).”

(Lievens & Klimoski, 2001)

The authors noted (in 2001) that they knew of no assessment centre research manipulating such variables

Comparing validities for overall assessment ratings vs. dimension ratings

Which yield higher validities?

Comparing Validities: Overall assessment vs. dimension ratings

• ‘Selection Centres’ typically use an overall assessment rating (OAR) to inform selection / promotion decisions
– OAR-based validity evidence: e.g. Gaugler et al. (1987); Hermelin et al. (2007)

• ‘Development Centres’ typically use dimension ratings to facilitate detailed feedback with participants about their strengths and weaknesses
– Dimension-based validity evidence: e.g. Arthur et al. (2003)

Comparing Validities: Overall assessment vs. dimension ratings

Arthur et al. (2003)

• Criterion-related validity of AC dimensions compared to OARs:

Dimension                            Validity coefficient
Problem Solving                      0.39
Influencing Others                   0.38
Organizing & Planning                0.37
Communication                        0.33
Drive                                0.31
Consideration/awareness of others    0.25

OAR validity coefficient: 0.37 (Gaugler et al., 1987)

Comparing Validities: Overall assessment vs. dimension ratings

Arthur et al. (2003)

• Criterion-related validity of a regression-based composite of AC dimensions compared to OARs (cumulative multiple R as each dimension is added):

Dimensions entered (cumulative)       R
Problem Solving                       0.39
+ Influencing Others                  0.43
+ Organizing & Planning               0.44
+ Communication                       0.45
+ Drive                               0.45
+ Consideration/awareness of others   0.45

OAR validity coefficient: 0.37 (Gaugler et al., 1987)

Comparing Validities: Overall assessment vs. dimension ratings

Arthur et al. (2003)

• The use of OARs may result in an underestimate of the criterion-related validity of assessment ratings

• The predictive validity of AC composite scores derived from dimension weights can be enhanced if dimension intercorrelations are reduced

• ACs may not need as many dimensions as have typically been used

• AC dimensions can be combined into a single composite score in the same manner as is typically done for multipredictor test batteries (see the sketch below)
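To make the composite-scoring idea concrete, here is a minimal sketch contrasting a unit-weighted overall rating with a regression-weighted composite. All data and weights below are fabricated for illustration; only the scoring logic mirrors the multipredictor-battery approach.

```python
# Contrast a unit-weighted OAR stand-in with a regression-weighted
# composite of dimension ratings. Numbers are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ratings for 200 candidates on six AC dimensions.
n, k = 200, 6
dimensions = rng.normal(size=(n, k))
# Hypothetical job-performance criterion loosely related to the ratings.
true_weights = np.array([0.5, 0.4, 0.3, 0.2, 0.2, 0.1])
performance = dimensions @ true_weights + rng.normal(scale=1.0, size=n)

# Approach 1: unit-weighted composite (a stand-in for an OAR).
oar = dimensions.mean(axis=1)

# Approach 2: regression-based composite (least-squares weights).
X = np.column_stack([np.ones(n), dimensions])
beta, *_ = np.linalg.lstsq(X, performance, rcond=None)
composite = X @ beta

print("unit-weight validity:", np.corrcoef(oar, performance)[0, 1])
print("regression validity: ", np.corrcoef(composite, performance)[0, 1])
```

One caveat worth keeping in mind: regression weights capitalise on chance in the small samples typical of AC validation, so an apparent gain over unit weights needs cross-validation or shrinkage correction before it can be trusted.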

Potential differences in validities between AC purposes & approach

What can we learn about increasing AC validity?

Differences in Approach & Purpose: What can we learn about increasing validity?

• We have seen validity findings are influenced by:
– The criteria used
» E.g. Dimension ratings (0.33) vs. Potential ratings (0.53)
– The purpose of the AC
» E.g. Promotion (0.30) vs. Early Identification of Potential (0.46)
– The approach used in compiling candidate scores
» E.g. OARs (0.37) vs. individual dimensions (0.25-0.39) vs. a regression-based composite of dimensions (0.45)

• Good research design & methodology is key

Differences in Approach & Purpose: What can we learn about increasing validity?

• The literature is fairly consistent in terms of other characteristics of ‘more highly valid’ ACs:
– Using psychologists as assessors rather than managers
» Concern: Spychalski et al. (1997) found that only 5.7% of cases utilised psychologists as assessors
– Limiting the number of dimensions evaluated
» Concern: Krause et al. (2010) found only 20% of SA organisations evaluated 5 dimensions or fewer in ACs
– Evaluating the ‘right’ dimensions: those critical for success as identified by role analysis
» Further: Arthur et al. (2003) found some dimensions are more valid than others

Differences in Approach & Purpose: A final point on ACs vs. DCs & validities…

• When considering several studies reporting on developmental assessment centres, Lievens & Klimoski (2001) noted as a limitation that the majority of studies did not relate DC ratings to external criteria

• “However, in assessment centers conducted for developmental purposes other constructs might serve as more relevant criteria”

• “When validating or otherwise evaluating DACs, the appropriate criterion is change in participants’ understanding, behavior, and proficiency on targeted dimensions”

(International Task Force on Assessment Center Guidelines, 2009)

• We should not lose sight of the purpose of the Assessment Centre (Howard, 2009):
– Selection/Promotion:
» Help find the right person for the job
– Diagnosis/Development:
» Better choice of development activities?
– Succession/Placement:
» Help find the right job for the person?

Questions…


References

Arthur Jr, W., Day, E.A., McNelly, T.L. & Edens, P.S. (2003). A meta-analysis of the criterion-related validity of assessment center dimensions. Personnel Psychology, 56, 125-154.

Gaugler, B.B., Rosenthal, D.B., Thornton, G.C. & Bentson, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72, 493-511.

Hermelin, E., Lievens, F. & Robertson, I.T. (2007). The validity of assessment centres for the prediction of supervisory performance ratings: A meta-analysis. International Journal of Selection & Assessment, 15(4), 405-411.

Howard, A. (2009). Making assessment centers work the way they are supposed to. Keynote address at the 29th Assessment Centre Study Group Conference, Stellenbosch, South Africa, March 2009.

International Task Force on Assessment Center Guidelines. (2009). Guidelines and ethical considerations for assessment center operations. International Journal of Selection & Assessment, 17(3), 243-253.


Krause et al. (2010). State of the art assessment centre practices in South Africa: Survey results, challenges, and suggestions for improvement. Keynote address at the 30th Assessment Centre Study Group Conference, Stellenbosch, South Africa, 18-19 March 2010.

Lievens, F. & Conway, J.M. (2001). Dimension and exercise variance in assessment center scores: A large-scale evaluation of multitrait-multimethod studies. Journal of Applied Psychology, 86(6), 1202-1222.

Lievens, F. & Klimoski, R.J. (2001). Understanding the assessment center process: Where are we now? In C.L. Cooper & I.T. Robertson (Eds.), International Review of Industrial and Organizational Psychology, vol. 16 (pp. 245-286). Chichester: John Wiley & Sons, Ltd.
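Schmidt, F.L. & Hunter, J.E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262-274.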

Spychalski, A.C., Quinones, M.A., Gaugler, B.B. & Pohley, K. (1997). A survey of assessment center practices in organizations in the United States. Personnel Psychology, 50, 71-90.