

Assessment Center: A Critical Mechanism for Assessing HRD Effectiveness and Accountability

Hsin-Chih Chen

The problem and the solution. Assessment center (AC), driven by job analysis or competency development, has long been shown to have strong content-related and criterion-related validities, and some construct-related validity, for assessing behavior, which is one important dimension for demonstrating training effectiveness and human resource development (HRD) accountability. Yet AC has not been effectively utilized in HRD research and practice for such purposes. This article reviews research and practice of AC, discusses validity issues of ACs, identifies conceptual and evidence-based factors that affect AC validity, and discusses strengths, weaknesses, threats, and opportunities if AC is applied in HRD. The author suggests wider adoption of AC in HRD practice. However, to make AC most useful for HRD and able to integrate with other HR functions, HRD researchers should focus on how to improve the construct-related validity of ACs, particularly through the design and development aspects of ACs and through the view of a unified concept of construct validity.

Keywords: assessment center; training effectiveness; competency-based training; validity; HRD effectiveness

Advances in Developing Human Resources, Vol. 8, No. 2, May 2006, 247-264. DOI: 10.1177/1523422305286155. Copyright 2006 Sage Publications.

This article is the author's independent work and is not funded by the author's current and former employers, Amedisys, Inc. and Louisiana State University, respectively. The opinions expressed in the article are the author's and do not necessarily reflect the views of the author's employers. Correspondence concerning this article should be addressed to Dr. Hsin-Chih Chen, 11100 Mead Road, #300, Baton Rouge, LA 70816; e-mail: [email protected].

Competency-based training has been a major human resource development (HRD) activity. A current research trend in HRD is to assess outcomes of competency-based training to determine and document the effectiveness and accountability of HRD. Reflective practitioners and thought-provoking researchers in HRD have devoted considerable attention and effort to HRD effectiveness (e.g., return on investment, transfer of learning, on-the-job performance improvement) in recent years. However, relatively few have focused on mechanisms such as the assessment center (AC), which has long been proven to have strong content-related and criterion-related validities, as a means of assessing one important dimension of training outcomes: behavior change.

Thornton and Rupp (2004) defined AC as "a method of evaluating performance in a set of assessment techniques at least one of which is a simulation" (p. 319). A common practice in ACs is job analysis or competency development and/or modeling to identify dimensions, which are often conceptualized as equivalent to competencies. It is through these dimensions (or competencies) that behavioral indicators are determined and then assessed through various simulation tactics (e.g., in-basket, leaderless group, oral presentation, fact-finding). As a result, AC is different from simulation itself because it involves a complex process of job analysis and competency development.

AC has been empirically (Arthur, Day, McNelly, & Edens, 2003) and conceptually (Thornton, 1992) linked to various human resource (HR) functions (e.g., selection, promotion, training and development, and performance feedback) in three HRD-related fields: public administration, industrial-organizational psychology, and HR management. Empirical AC research and practice has largely focused on selection and promotion purposes, but the use of AC in training and development (or HRD), commonly termed developmental AC or a development center, is primarily in its conceptual stage (Spychalski, Quinones, Gaugler, & Pohley, 1997; Woehr & Arthur, 2003). It is surprising that AC is scarcely utilized for assessing training effectiveness (Halman & Fletcher, 2000). To the author's knowledge, and within the literature the author can access, it has recently been used in some universities (e.g., the California State University Fullerton Business School) and in corporate settings, such as Development Dimensions International (DDI; Mayes, Belloli, Riggio, & Aguirre, 1997; Riggio, Aguirre, Mayes, Belloli, & Kubiak, 1997).

Purposes and Objectives

The sparse use of AC in HRD, along with the solidly grounded AC research documented in other fields, has led the author to inquire how HRD researchers and practitioners can utilize AC to demonstrate HRD effectiveness and accountability. Therefore, the purpose of this article is to conduct a critical review of the AC literature and to provide implications drawn from the review for HRD research and practice. Specifically, this article seeks to answer the following questions:

(a) How has the AC mechanism historically been used in research and practice in organizations around the world?

(b) What validity (content-related, construct-related, and criterion-related) issues have been discussed in the AC literature?


(c) What are the major conceptual and evidence-based factors that support and hinder AC validity in organizations? and

(d) What implications can be drawn from this study to help advance the field of HRD to be more effective and accountable?

Method

A literature search was conducted through two electronic databases, Academic Premier and Business Source Premier, to collect relevant information. The keyword assessment center was paired or combined with other keywords such as survey, practice, training and development, HR development, training effectiveness, training evaluation, competency-based training, content validity, construct validity, and criterion validity through advanced search in the databases. The condition used for these keyword searches was set to "AB abstract or author-supplied abstract." Additional references were collected through secondary sources cited in the relevant literature found in the databases.

AC Practice and Research

AC apparently derives from, and is closely related to, the theory of performance tests, which focuses on assessing behavior change. Two classical works on AC, Thornton and Byham (1982) and Kraut (1973), vividly described the history and development of AC around the world. They found that performance tests were used as early as the 1900s to assess differences in individual behaviors and, in some cases, to predict job performance. Several key features of AC (e.g., multiple assessors, complex realistic situations, and measurement of individual characteristics) as identified in the Guidelines and Ethical Considerations for Assessment Center Operations (Joiner, 2000) emerged in military settings around World War I, and the German government used these features to select capable military leaders in the 1930s. Similar procedures were conducted to assess military personnel's leadership potential in Great Britain (War Office Selection Boards) and in the United States (Office of Strategic Services and Veterans Administration Clinical Psychology Studies) in World War II, along with other tests (e.g., intellectual and personality tests) in Australia and Canada in the 1970s.

In nonmilitary settings, as described in the two above-mentioned classical texts, AC procedures of the type conducted in the Harvard Psychological Clinic were used to assess effects of individual characteristics and environmental factors on individual behavior in 1938. In 1948, an Australian manufacturing plant conducted group observations to select executive trainees, and in 1950 the British Civil Service Commission conducted AC in selecting civil servants for all middle- or high-level jobs. In 1964, American Telephone and Telegraph Company (AT&T) conducted a large-scale, longitudinal study (the Management Progress Study) using multiple assessment procedures to study developmental processes in consideration of both characteristics of individuals and organizational settings. This study became a milestone and sparked the success of AC development in nonmilitary settings.

Since AT&T's study, research and practice of AC in nonmilitary settings started to grow in the United States and all over the world. Several leading industrial companies applied AC techniques for selection and promotion purposes. These companies included Caterpillar Tractor, Eastman Kodak, Ford Motor Company, General Electric, General Motors, International Business Machines, J. C. Penney, Olin, Sears, Shell Oil, Standard Oil of Ohio, Syntex, Unilever, Union Carbide, and many other organizations in Australia, Brazil, Great Britain, Denmark, Germany, France, Finland, Japan, Mexico, the Netherlands, Taiwan, and the United States (Alexander, 1979; Cook & Herche, 1994; Kraut, 1973; Lievens, Harris, van Keer, & Bisqueret, 2003; Lin & Wang, 2000; Shackleton & Newell, 1991; Woodruffe, 1993). At present, the most globally recognized HR consulting firm specializing in AC design, development, and implementation is DDI.

Over the past two decades, the use of AC in industries has accelerated, particularly in Great Britain and the United States. In Great Britain, only 7% of organizations used AC in 1973 (Gill, Ungerson, & Thakur, 1973). However, the use of AC increased to 21% in 1986 (Robertson & Makin, 1986) and 59% in 1991 (Shackleton & Newell, 1991). In the United States, 44% of metropolitan police and fire departments used AC for promotion purposes in the 1980s (Fitzgerald & Quaintance, 1982). In 1997, 74% of organizations used AC for selection, promotion, or development purposes (Spychalski et al., 1997).

Understandably, the practice or operation of AC varies across disciplines and settings, and it was not standardized until 1975. In that year, the International Congress on the Assessment Center Method, held in Quebec, Canada, formed an international task force to develop guidelines for AC practice. These became the well-known Guidelines and Ethical Considerations for Assessment Center Operations (Joiner, 2000). The guidelines were set forth in an attempt to incorporate existing best practices of AC to guide practice. Over the years, these guidelines have been revised three times, and the International Public Management Association published the latest version of the guidelines for HR in 2000 (Joiner, 2000). The latest guidelines identify 10 key features of AC, as follows:

• conduct a job analysis of relevant behaviors,
• classify behaviors into meaningful and relevant dimensions or competencies,
• establish a link between the classified dimensions or competencies and assessment techniques,
• conduct multiple assessments,
• develop and implement job-related simulations to elicit behaviors related to classified dimensions or competencies,
• use multiple assessors to observe and evaluate each participant,
• train assessors with performance standards,
• record specific behavior observations,
• report observations made during each exercise before the integration discussion, and
• integrate data through a statistical integration process validated in accordance with professionally accepted standards.

Validity Issues in AC Literature

AC literature investigating validity issues has focused on three streams: content-related, criterion-related, and construct-related validities. Content-related and criterion-related validity issues of AC were raised and discussed by researchers in the late 1970s. Norton (1977) alleged that AC could appropriately be utilized to select candidates for managerial positions even if empirical validation is absent. He argued that behaviors solicited in AC simulations are samples rather than signs; a sample is validated through content-related validity, whereas a sign is validated through criterion-related validity. Dreher and Sackett (1981) stated that sign and sample are not mutually exclusive and contended that AC should be considered both a sample of behavior and a sign of future job performance. In terms of the sources of contamination of content-related validity, Dreher and Sackett (1981) and Sackett and Dreher (1982) argued that the sources originated from inadequate job analysis or lack of fidelity (a close match between job activities and AC dimensions and exercises), whereas Norton (1981) contended they were results of poor design and implementation of AC.

Debates about construct-related validities, such as convergent and discriminant validities, have centered on issues of design and implementation of AC.

TABLE 1: Sample Dimensions-Exercise Matrix. [The table maps dimensions (oral communication, interpersonal skills, problem solving, conflict management, team building, decisiveness, goal setting, analytic skills, adaptability, coaching skills, customer service, and workflow design) to the exercises in which each is assessed: in-basket, leaderless group, oral presentation, fact-finding, and role-playing.]


As widely discussed in the AC literature, one of the key AC design steps is to structure a dimensions-exercise matrix. Table 1 shows a sample dimensions-exercise matrix.

In terms of the design aspect, dimensions are built into a variety of exercises for assessors to evaluate AC participants' behaviors in demonstrating these dimensions or competencies. A dimension is often assessed through different exercises, and an exercise includes several different dimensions to be assessed. Accordingly, ratings of the same dimension across exercises are expected to be highly correlated (convergent validity), whereas ratings of different dimensions within an exercise should be less correlated or uncorrelated (discriminant validity). For instance, in the sample provided in Table 1, using the problem-solving dimension as an example, to have construct-related validity one would expect the ratings of problem solving in the in-basket, leaderless group, fact-finding, and role-playing exercises to be highly correlated. On the other hand, the rating of problem solving should correlate less with other dimensions (i.e., oral communication, interpersonal skills, conflict management, team building, decisiveness, and coaching skills) within the same exercise (i.e., leaderless group).
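To make this expected pattern concrete, the sketch below simulates hypothetical post-exercise dimension ratings and computes the average convergent correlation (same dimension, different exercises) and the average discriminant correlation (different dimensions, same exercise). It is a minimal sketch assuming Python with NumPy and pandas; the participants, exercise names, dimension names, and scores are illustrative and are not drawn from any study cited in this article.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
exercises = ["in_basket", "leaderless_group", "fact_finding"]
dimensions = ["problem_solving", "decisiveness"]
n = 200

# Each participant has a stable latent standing on each dimension;
# each exercise adds its own measurement noise (all values hypothetical).
latent = {d: rng.normal(size=n) for d in dimensions}
ratings = pd.DataFrame({
    (ex, d): latent[d] + rng.normal(scale=1.0, size=n)
    for ex in exercises
    for d in dimensions
})

corr = ratings.corr()

# Convergent correlations: same dimension rated in different exercises.
convergent = [corr.loc[(e1, d), (e2, d)]
              for d in dimensions
              for i, e1 in enumerate(exercises)
              for e2 in exercises[i + 1:]]

# Discriminant correlations: different dimensions rated in the same exercise.
discriminant = [corr.loc[(e, d1), (e, d2)]
                for e in exercises
                for i, d1 in enumerate(dimensions)
                for d2 in dimensions[i + 1:]]

print(f"mean convergent r:   {np.mean(convergent):.2f}")   # expected to be relatively high
print(f"mean discriminant r: {np.mean(discriminant):.2f}")  # expected to be noticeably lower
```

With ratings structured like these, construct-related validity would be supported when the mean convergent correlation is clearly higher than the mean discriminant correlation.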

Over the years, considerable research investigating construct-related validity has consistently found low to moderate convergent validity of AC dimensions across exercises and weak discriminant validity of dimensions within exercises (Archambeau, 1979; Bycio, Alvares, & Hahn, 1987; Fleenor, 1996; Gorham, 1978; Herriot, 1986; Highhouse & Harris, 1993; Jackson, Stillman, & Atkins, 2005; Joyce, Thayer, & Pond, 1994; Kauffman, Jex, Love, & Libkuman, 1993; Kleinmann & Koller, 1997; Klimoski & Brickner, 1987; Lance et al., 2000; Lance, Lambert, Gewin, Lievens, & Conway, 2004; Lowry, 1995, 1997; Robertson, Gratton, & Sharpley, 1987; Sackett & Dreher, 1982; Sackett & Harris, 1988; Schneider & Schmitt, 1992; Silverman, Dalessio, Woods, & Johnson, 1986; Turnage & Muchinsky, 1982). These research findings have put construct-related validity in doubt, and many researchers have suggested that AC ratings should be based on exercises instead of dimensions.

In contrast, some recent research has reported encouraging findings as opposed to traditional evidence of failure of construct-related validity in AC (Arthur, Woehr, & Maldegen, 2000; Kudisch, Ladd, & Dobbins, 1997; Lievens, 2001; Lievens & Conway, 2001; Reilly, Henry, & Smither, 1990; Russell & Domm, 1995; Thornton, Tziner, Dahan, Clevenger, & Meir, 1997; Woehr & Arthur, 2003). These studies pointed out that construct-related validity may not be as troubling an issue as it has been perceived. The studies suggested that issues of development, implementation, design, and methods together contributed to construct-related validity of AC.

Issues related to the criterion-related validity of AC do not appear as complicated as those of construct-related validity. Specifically, evidence shows that AC has consistently proved capable of effectively predicting various outcome criteria (e.g., selection, promotion, and development). For example, Gaugler, Rosenthal, Thornton, and Benton (1987) conducted a meta-analysis of AC validity to examine the predictive (criterion-related) validity of AC. The study meta-analyzed relationships between AC and its outcome variables from 109 research references. The average validity coefficient was .37, which was calculated from the overall AC rating and corrected for sampling error, restriction of range, and criterion unreliability. Recently, Arthur et al. (2003) conducted a similar meta-analysis of the criterion-related validity of AC dimensions. The study examined 179 research articles, including published (87%) and unpublished (13%) references. In contrast to Gaugler et al. (1987), who used the overall AC rating as a single predictor, Arthur et al. (2003) collapsed 168 dimensions into six overarching constructs as predictors. The results showed a range of estimated criterion-related validities from .25 to .39. Although methods varied, both studies synthesized a large body of knowledge in establishing criterion-related validity that evidently demonstrated the generalizability of AC.
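For readers unfamiliar with how such corrected coefficients are produced, the sketch below applies two standard psychometric artifact corrections, disattenuation for criterion unreliability and Thorndike's Case II correction for direct range restriction, to an observed validity coefficient. The numeric inputs are hypothetical illustrations and are not the actual values or procedures used by Gaugler et al. (1987) or Arthur et al. (2003).

```python
import math

def correct_for_criterion_unreliability(r_xy: float, r_yy: float) -> float:
    """Disattenuate an observed validity for unreliability in the criterion."""
    return r_xy / math.sqrt(r_yy)

def correct_for_range_restriction(r_xy: float, u: float) -> float:
    """Thorndike Case II correction for direct range restriction on the
    predictor, where u = unrestricted SD / restricted SD (u > 1 when the
    studied sample is range restricted)."""
    return (u * r_xy) / math.sqrt((u ** 2 - 1) * r_xy ** 2 + 1)

# Hypothetical inputs, for illustration only.
observed_r = 0.25             # observed AC rating / job performance correlation
criterion_reliability = 0.70  # reliability of the performance criterion
u_ratio = 1.3                 # applicant-pool SD relative to selected-group SD

r = correct_for_range_restriction(observed_r, u_ratio)
r = correct_for_criterion_unreliability(r, criterion_reliability)
print(f"corrected validity estimate: {r:.2f}")
```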

Factors Affecting the Validity of AC

The issues discussed in relation to the validity of AC can assist in identifying factors that support or hinder the success of AC, where success is understood through effective planning, design, development, and implementation that demonstrate the modern, unified concept of construct validity. Two types of factors are included in this section: conceptual factors and evidence-based factors.

Conceptual Factors

Caldwell, Thornton, and Gruys (2003) identified 10 common assessment errors: (a) poor planning, (b) inadequate job analysis, (c) weakly defined dimensions, (d) poor exercises, (e) no pretest evaluations, (f) unqualified assessors, (g) inadequate assessor training, (h) inadequate candidate preparation, (i) sloppy behavior documentation and scoring, and (j) misuse of results. These common errors are well documented in the Guidelines and Ethical Considerations for Assessment Center Operations. In addition, Thornton (1992) found that the number of dimensions to be assessed in an AC could also affect assessors' accuracy in determining participants' performance scores. The rationales behind these errors (factors) are summarized below.

Poor planning. Planning AC initiatives requires dedicated work and sufficient resources and experts to ensure that effective AC practice is carried out. Poor planning may occur when key upper management is not involved in, and not committed to supporting, preliminary AC planning.

Inadequate job analysis. Job analysis is a prerequisite for AC design. In practice, many organizations, particularly within the public sector, rely solely on the content-related validity of AC because of limited resources and time constraints. This has made job analysis extremely critical in producing an effective AC. Without adequate job analysis, the process carried out by the AC will not be justifiable.

Weakly defined dimensions. Sound definition of the dimensions to be assessed in an AC is essential, because it requires articulating performance conditions, along with levels of effectiveness, that address the behaviors expected to be demonstrated on a target job.

Poor exercises. Design, development, and administration of AC exercises are sophisticated processes that require carefully crafted skills in all phases. In the design and development phases, it is essential to build work samples into exercises and to ensure the exercises are designed so that participants' behaviors can be elicited to reflect the defined dimensions. In the administration phase, carefully standardized exercises and clear specification of what role individuals should play will make the process effective, as interactive simulations are often utilized in AC.

No pretest evaluation. The design of AC should be valid for both content and construct measures. Specifically, not only should the measurement instruments appropriately match the work domain of the position, but the knowledge and skills to be observed should also be valid. Pretest evaluation should not be based solely on prior experience. Rather, it should closely align with procedures such as test construction, job analysis, job validation, statistics, and technical factors.

Unqualified assessors. Assessor selection should be based upon technical knowledge and expertise in the profession being assessed. It is inevitable that managers may play the assessor role, but qualified assessors who are also managers of participants must be able to demonstrate objective judgments in ratings.

Inadequate assessor training. Adequate assessor training has been confirmed to have a dramatic impact on the validity of AC because assessor training is critical to ensure that proper classification of dimensions and behaviors is well understood by assessors. The contents of adequate assessor training should include context information for effective judgments, detailed information, examples of effective and ineffective performance, and knowledge of assessment techniques, evaluation, policies, rating procedures, and feedback procedures.

Inadequate candidate preparation. Candidates' preparation or readiness to attend an AC is also a factor in AC effectiveness. Orientation given to AC candidates and/or participants will help them grasp an understanding of the "how" and "what" of the process being carried out. Contents to assist candidate preparation may include what the objective is, how individuals are selected, what options candidates may have, what key staff are involved, how the materials and results will be used, when and what kind of feedback will be given, who will have access to the reports, and who will be the contact person.

Sloppy behavior documentation and scoring. The process of behavior documentation is vital, because the description of behavior serves as a foundation for subsequent behavioral assessment. On the other hand, the process of behavior scoring is equally critical because it underpins the accuracy with which observed behaviors are assessed. AC validity can be achieved when both schema-driven and behavior-driven approaches are taken into account in behavior documentation and scoring, where the schema-driven approach assists in evaluating knowledge globally and the behavior-driven approach helps provide evidence-based judgment.

Misuse of results. Misuse of AC results has been a practical concern of AC researchers. AC practitioners should effectively communicate with participants about how the results will be used and consistently adhere to the standards and criteria set forth at the beginning of AC implementation. By so doing, positive responses to AC can be expected, so the AC process can be continually implemented and sustained.

Number of dimensions to be assessed in an exercise. The mental constraints of assessors have been a great concern in the AC literature, which proposes that too many dimensions assessed in an AC may limit an assessor's ability to accurately classify the dimensions candidates exhibit (Lievens & Klimoski, 2001). In this case, the assessor may rate dimensions inconsistently across exercises because of such constraints. In addition, some literature (e.g., Sackett & Hakel, 1979) has consistently found that many of the dimensions AC designers originally identify are highly intercorrelated, so they can subsequently be collapsed into a smaller number of global dimensions. Although suggestions vary for the appropriate number of dimensions to be assessed in an AC, the rule of thumb is to minimize the number of dimensions if possible. Five to 10 dimensions assessed in an exercise are deemed adequate (Thornton, 1992).

Evidence-Based Factors

The above-mentioned factors are mostly administrative, descriptive, or conceptual in nature. Literature that empirically examines AC issues can also substantially contribute to our understanding of the factors affecting AC success or validity. Mostly, the empirical literature on AC has focused on design, development, and implementation aspects to improve AC validity. These factors include (a) rating approach, (b) transparent dimension, (c) assessor training strategy, (d) assessing technique, and (e) variety of exercises.

Rating approach. Robie, Osburn, Morris, Etchegaray, and Adams (2000) examined the effect of the rating process on the construct-related validity of AC. In their study, AC dimensions were rated using either a within-exercise rating process or a within-dimension rating process; in the former, all dimensions are rated within one exercise, and in the latter, one dimension is rated across all exercises. Their findings suggested that the within-exercise rating process results in exercise factors, whereas the within-dimension rating process results in dimension factors, implying that the within-dimension rating approach could enhance the construct-related validity of AC.

Transparent dimension. Revealing dimensions to candidates before the AC is conducted can increase candidates' readiness for AC assessment. Kolk, Born, and Van der Flier (2003) examined the effect of dimension transparency through two independent studies, one with a student sample and the other with actual job applicants. The results showed a significant improvement in construct-related validity for the transparent group with actual job applicants.

Assessor training strategy. Although the importance of careful assessor training has been repeatedly stressed in the AC literature, relatively limited research has empirically examined the effect of assessor training strategies. Schleicher, Day, Mayes, and Riggio (2002) used frame-of-reference assessor training to examine the construct validity of AC. Frame-of-reference training is intended to eliminate rater bias by establishing a common frame of reference for rating. The results of their study showed that frame-of-reference assessor training is effective in improving the reliability, accuracy, construct-related validity, and criterion-related validity of assessment ratings.

Assessing technique. The cognitive demands placed on assessors may limit their ability to classify dimension ratings in an AC. Similar to controlling the number of dimensions to be assessed in an AC, using assessing techniques such as behavior checklists can help assessors reduce cognitive demands and improve the construct validity of AC. Some research results suggested that the behavior checklist technique can improve construct-related validity (Donahue, Truxillo, Cornwell, & Gerrity, 1997; Reilly et al., 1990).

Variety of exercises. Assessing human behavior is a complex undertaking; therefore, the dimensions to be assessed in an AC are inherently complicated. Determining the number of exercises to be used in an AC can be a contingent decision. However, Gaugler et al. (1987) found that AC shows stronger criterion-related validity as a greater number of different types of exercises are used in an AC. The finding suggested that a single exercise or a limited number of exercises used in an AC may not result in solid criterion-related validity.

Figure 1 summarizes the factors affecting AC validity reviewed in this article. An overarching factor, systematic planning, drives the whole process of AC. Individual factors are categorized into four groups: assessment-related, design-related, development-related, and implementation-related factors.

FIGURE 1: Factors Affecting Validity of Assessment Center. [The figure depicts systematic planning as the overarching factor driving AC validities (content-related, construct-related, and criterion-related) through assessment-related, design-related, development-related, and implementation-related factors, including job analysis/competency development, dimension definitions, assessor selection, assessor training strategies, exercise design, rating approach, assessing techniques, variety of exercises, pre-test, exercise development, exercise implementation, candidate preparation, behavior documentation and scoring, and purposes articulation and result reporting.]


Implications for HRD Research and Practice: Strengths, Weaknesses, Opportunities, and Threats (SWOT) Analysis

The author adopts a SWOT analysis, which has often been used for organizational strategic planning, to develop implications of adopting AC for HRD research and practice. SWOT refers to strengths, weaknesses, opportunities, and threats. Although the four components are interrelated, the strengths and weaknesses are often used to diagnose the internal environment within an organization, whereas the opportunities and threats are associated with the external environment. In the context of this article, the scope of the SWOT analysis is not focused on using AC in a single organization. Rather, it is conceptualized to understand how AC can affect HRD research and practice, particularly the adoption of AC to assess competency-based training outcomes or the effectiveness of HRD interventions.

Strengths of Using AC in HRD

As this article shows, it is clear that existing AC literature from other fields has been informative, allowing HRD practitioners and researchers to benefit from their learning experiences (e.g., known models, wide adoption in different disciplines across the globe, historical development, ethical guidelines, issues of validity, and key factors influencing AC success) to help demonstrate HRD effectiveness. The author provides six key points explaining why AC has strong implications for assessing competency-based training and HRD effectiveness and accountability. First, AC is a behavior-based assessment, which is conceptually a more reliable indicator than cognitive assessment for measuring performance. Second, AC is a valid mechanism, as its content-related and criterion-related validities have been well established, with partial support from construct-related validities. Third, implementation of AC is more controllable because one can conduct pre- and post-AC evaluations to compare learning gains through behavior exposure. Fourth, AC can help demonstrate unique characteristics of an organization's human capital and match organizational needs because the AC development process is rooted in job analysis and/or competency development. Fifth, AC simulations commonly use work samples, so the behaviors demonstrated in AC are closely related to real job performance. Last but not least, AC can measure competencies at the individual (e.g., oral presentation), process (e.g., in-basket), and group and organization levels (e.g., leaderless group discussion).

HRD practice has exercised a multidimensional approach (e.g., on-the-job performance, return on investment, organizational results) to assessing the effectiveness of HRD interventions. AC is conceptually instrumental in integrating with other dimensions of HRD effectiveness, particularly learning transfer, and could be used as a moderator to link to these dimensions. For example, AC participant ratings can be evidence of short-term learning transfer, whereas on-the-job performance is related to long-term learning transfer. On the other hand, because AC has the potential to assess HRD interventions across different levels (e.g., individual, group, and organization), it can be used as an independent dimension to represent HRD effectiveness.

Weaknesses of Using AC in HRD

A key weakness of AC is the issue of construct-related validity, which is only partially supported by the literature and has been highly debated over the years. This is particularly an issue if one conducts a competency development effort and in turn develops a competency-based AC from a system perspective. Specifically, most of the AC literature assessing construct-related validity has found low to moderate convergent validity of dimensions across exercises and low discriminant validity within each exercise. The weakness of construct-related validity has somewhat limited the use of AC for multiple purposes, even if doing so is conceptually sound (Thornton et al., 1997). For instance, in terms of the review of this article, research examining whether AC ratings should be based on dimensions or exercises has not come to a steadfast conclusion. The fact is that mainstream research results are in favor of the exercise-rating approach. Yet such a practice is not suitable for HRD, nor does it have the capacity to integrate HR-related functions. If AC ratings are based on exercises, the AC loses its ability to identify strengths and weaknesses of the dimensions or competencies that individuals demonstrate in the AC. This is because exercise ratings can only provide information on how well individuals perform in an AC but will be unable to determine what dimensions are effective predictors of job performance. Following this logic, it is also apparent that the adoption of the AC exercise-rating approach is inadequate for linking to other HRD-related functions.

A further example follows. Suppose a leaderless group simulation is designed to assess communication, decision-making, and negotiation competencies for promotion purposes. If an individual receives a score higher than the standard in the simulation, he or she will be promoted. Conversely, if the individual does not achieve the desired performance level, he or she will require competency training. When an exercise-rating approach is adopted in this case, the training need becomes undetermined, because the exercise-rating approach is unable to provide adequate information. Moreover, if an exercise-rating approach were best suited for AC, the job analysis/competency development in AC design would become an irrelevant or unnecessary procedure, because job analysis/competency development is the process of identifying competencies and dimensions rather than exercises.

Threats to and Opportunities for Using AC in HRD

Understandably, the weakness of AC will become a major threat once AC is adopted in HRD practice. Nevertheless, a modern conceptualization of these various validities has been described as a unified, comprehensive concept of construct validity. Messick (1995) stated, "This comprehensive view of [construct] validity integrates content, criterion, and consequences into a construct framework for empirically testing rational hypotheses about score meaning and utility" (p. 742). He contended that the unified concept of construct validation is more than an attempt to define or understand the construct. Rather, construct validation should adhere to meaningful measurement and its utility. This unified concept of construct validity has recently been adopted to explain the construct-related paradox in the AC literature. For example, Russell and Domm (1995) asserted that although the construct-related validity of assessment has been questionable, the consistent criterion-related validity of AC implies that some valid construct must exist; the problem is that we do not know what the construct is. In addition, Woehr and Arthur (2003) found that research examining construct-related and criterion-related validities has been done separately, and only a few studies have assessed both simultaneously. A lack of both construct-related and criterion-related validities was found in the few studies that examined both. In other words, when construct-related and criterion-related validities are examined together, the findings for both validities tend to be consistent. Following this logic, the modern conceptualization of unified construct validity seems to be a reasonable explanation as to why conventional evidence is lacking for the construct-related validity of AC, because, to some extent, the three types of validities had historically been considered in isolation. It is possible that research showing a lack of construct-related validity of AC would also show a lack of criterion-related validity if both were examined together.

Moreover, in terms of the review of this article, the sources of construct-related validity issues of AC are related to design, development, and implementation. Because HRD has a great tradition in dealing with these issues, the challenge of the AC construct-related validity issue becomes an opportunity for HRD to make contributions to AC research and practice. Improving the construct-related validity of AC not only helps demonstrate the effectiveness of HRD interventions but also has the potential to put HRD in a key position in organizations for designing competency-based assessment systems (e.g., competency modeling, performance appraisal, competency-based selection), further recognizing the accountability of HRD.

Contributions to HRD and Future Research Direction

Proven content-related and criterion-related validities and some construct-related validity documented in the literature have confirmed that AC is a valid mechanism to assess behaviors. Unfortunately, research and practice in HRD have not paid enough attention to this important area. Perhaps the complexity of the AC process and concerns about cost-effectiveness have caused the neglect or hesitant adoption of AC in HRD. However, as discussed, if AC can enhance the effectiveness and accountability of HRD, such concerns turn out to be unnecessary. By exposing the wide use of AC in the world, demonstrating evidence of AC validity, identifying influential factors of AC validity, and identifying strengths, weaknesses, threats, and opportunities of using AC in HRD, the author hopes that this article will encourage reflective practitioners of HRD to adopt AC and provoke researchers of HRD to pursue research initiatives in AC in order to enhance HRD effectiveness and accountability. Finally, to make AC most appropriate for use in HRD, future research, as done in other fields, should continually focus on how to improve the construct-related validity of AC, particularly through the design and development perspectives of AC and through the unified concept of construct validity.

References

Alexander, L. D. (1979). An exploratory study of the utilization of assessment center results. Academy of Management Journal, 22(1), 152-157.
Archambeau, D. J. (1979). Relationships among skill ratings in an assessment center. Journal of Assessment Center Technology, 2, 7-20.
Arthur, W., Jr., Day, E. A., McNelly, T. L., & Edens, P. S. (2003). A meta-analysis of the criterion-related validity of assessment center dimensions. Personnel Psychology, 56, 125-154.


Arthur, W., Jr., Woehr, D. J., & Maldegen, R. (2000). Convergent and discriminant validity of assessment center dimensions: An empirical re-examination of the assessment center construct-related validity paradox. Journal of Management, 26, 813-835.
Bycio, P., Alvares, K. M., & Hahn, J. (1987). Situation specificity in assessment center ratings: A confirmatory analysis. Journal of Applied Psychology, 72, 463-474.
Caldwell, C., Thornton, G. C., III, & Gruys, M. L. (2003). Ten classic assessment center errors: Challenges to selection validity. Public Personnel Management, 32, 73-88.
Cook, R., & Herche, J. (1994). Assessment centers: A contrast of usage in diverse environments. The International Executive, 36(5), 645-656.
Donahue, L. M., Truxillo, D. M., Cornwell, J. M., & Gerrity, M. J. (1997). Assessment center validity and behavioral checklists: Some additional findings. Journal of Social Behavior and Personality, 12, 85-108.
Dreher, G. F., & Sackett, P. R. (1981). Some problems with applying content validity evidence to assessment center procedures. The Academy of Management Review, 6, 551-560.
Fitzgerald, L. F., & Quaintance, M. K. (1982). Survey of assessment center use in state and local government. Journal of Assessment Center Technology, 5, 9-21.
Fleenor, J. W. (1996). Constructs and developmental assessment centers: Further troubling empirical findings. Journal of Business and Psychology, 3, 319-333.
Gaugler, B., Rosenthal, D., Thornton, G., & Benton, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72, 492-510.
Gill, D., Ungerson, B., & Thakur, M. (1973). Performance appraisal in perspective: A survey of current practice. London: Institute of Personnel Management.
Gorham, W. A. (1978). Federal executive agency guidelines and their impact on the assessment center method. Journal of Assessment Center Technology, 1, 2-8.
Halman, F., & Fletcher, C. (2000). The impact of development centre participation and the role of individual differences in changing self-assessments. Journal of Occupational and Organizational Psychology, 73, 423-442.
Herriot, P. (1986). Assessment centers revisited. Guidance and Assessment Review, 2, 7-8.
Highhouse, S., & Harris, M. M. (1993). The measurement of assessment center situations. Journal of Applied Social Psychology, 23, 140-155.
Jackson, D. J., Stillman, J. A., & Atkins, S. G. (2005). Rating tasks versus dimensions in assessment centers: A psychometric comparison. Human Performance, 18(3), 213-241.
Joiner, D. A. (2000). Guidelines and ethical considerations for assessment center operations: International task force on assessment center guidelines. Public Personnel Management, 29, 315-331.
Joyce, L. W., Thayer, P. W., & Pond, S. B., III. (1994). Managerial functions: An alternative to traditional assessment center dimensions? Personnel Psychology, 47, 109-121.
Kauffman, J. R., Jex, S. M., Love, K. G., & Libkuman, T. M. (1993). The construct validity of assessment center performance dimensions. International Journal of Selection and Assessment, 1, 213-223.
Kleinmann, M., & Koller, O. (1997). Construct validity of assessment centers: Appropriate use of confirmatory factor analysis and suitable construction principles. Journal of Social Behavior and Personality, 12, 65-84.
Klimoski, R., & Brickner, M. (1987). Why do assessment centers work? The puzzle of assessment center validity. Personnel Psychology, 40, 243-260.
Kolk, N. J., Born, M. P., & Van der Flier, H. (2003). The transparent assessment center: The effect of revealing dimensions to applicants. Applied Psychology: An International Review, 52(4), 648-668.


Kraut, A. (1973). Management assessment in international organizations. Industrial Relations, 12(2), 172-182.
Kudisch, J. D., Ladd, R. T., & Dobbins, G. H. (1997). New evidence on the construct validity of diagnostic assessment centers: The findings may not be so troubling after all. Journal of Social Behavior and Personality, 12, 129-144.
Lance, C. E., Lambert, T. A., Gewin, A. G., Lievens, F., & Conway, J. M. (2004). Revised estimates of dimension and exercise variance components in assessment center post-exercise dimension ratings. Journal of Applied Psychology, 89, 377-385.
Lance, C. E., Newbolt, W. H., Gatewood, R. D., Foster, M. R., French, N. R., & Smith, D. E. (2000). Assessment center exercise factors represent cross-situational specificity, not method bias. Human Performance, 13(4), 323-353.
Lievens, F. (2001). Assessors and use of assessment center dimensions: A fresh look at a troubling issue. Journal of Organizational Behavior, 22, 203-221.
Lievens, F., & Conway, J. M. (2001). Dimension and exercise variance in assessment center scores: A large-scale evaluation of multitrait-multimethod studies. Journal of Applied Psychology, 86, 1202-1222.
Lievens, F., Harris, M. M., van Keer, E., & Bisqueret, C. (2003). Predicting cross-cultural training performance: The validity of personality, cognitive ability, and dimensions measured by an assessment center and a behavior description interview. Journal of Applied Psychology, 88, 476-489.
Lievens, F., & Klimoski, R. (2001). Understanding of assessment center process: Where are we now? In C. L. Cooper & I. T. Robertson (Eds.), International review of industrial and organizational psychology (Vol. 16). New York: John Wiley.
Lin, T.-Y. S., & Wang, S.-M. (2000, September). The application of assessment center method on assessing technological and vocational teachers. Paper presented at the International Conference of Scholars on Technology Education, Braunschweig, Germany.
Lowry, P. E. (1995). The assessment center process: Assessing leadership in the public sector. Public Personnel Management, 24, 443-450.
Lowry, P. E. (1997). The assessment center process: New directions. Journal of Social Behavior and Personality, 12, 53-62.
Mayes, B. T., Belloli, C. A., Riggio, R. E., & Aguirre, M. (1997). Assessment centers for course evaluations: A demonstration. Journal of Social Behavior and Personality, 12, 303-320.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performance as scientific inquiry into score meaning. American Psychologist, 9, 741-749.
Norton, S. D. (1977). The empirical and content validity of assessment centers vs. traditional methods for predicting managerial success. Academy of Management Review, 2, 442-453.
Norton, S. D. (1981). The assessment center process and content validity: A reply to Dreher and Sackett. Academy of Management Review, 6, 561-566.
Reilly, R. R., Henry, S., & Smither, J. W. (1990). An examination of the effects of using behavior checklists on the construct validity of assessment center dimensions. Personnel Psychology, 43, 71-84.
Riggio, R. E., Aguirre, M., Mayes, B. T., Belloli, C., & Kubiak, C. (1997). The use of assessment center methods for students' outcome assessment. Journal of Social Behavior and Personality, 12, 273-288.


Robertson, I. T., Gratton, L., & Sharpley, D. (1987). The psychometric properties of managerial assessment centers: Dimension into exercises won't go. Journal of Occupational Psychology, 60, 187-195.
Robertson, I. T., & Makin, P. J. (1986). Management selection in Britain: A survey and critique. Journal of Occupational Psychology, 59, 45-57.
Robie, C., Osburn, H. G., Morris, M. A., Etchegaray, J. M., & Adams, K. A. (2000). Effects of the rating process on the construct validity of assessment center dimension evaluations. Human Performance, 13(4), 355-370.
Russell, C. J., & Domm, D. R. (1995). Two field tests of an explanation of assessment center validity. Journal of Occupational and Organizational Psychology, 68, 25-47.
Sackett, P. R., & Dreher, G. F. (1981). Some misconceptions about content-oriented validation: A rejoinder to Norton. Academy of Management Review, 6, 567-568.
Sackett, P. R., & Dreher, G. F. (1982). Constructs and assessment center dimensions: Some troubling empirical findings. Journal of Applied Psychology, 67, 401-410.
Sackett, P. R., & Hakel, M. D. (1979). Temporal stability and individual differences in using assessment center information to form overall ratings. Organizational Behavior and Human Performance, 23, 120-137.
Sackett, P. R., & Harris, M. (1988). A further examination of the constructs underlying assessment center ratings. Journal of Business and Psychology, 3, 214-229.
Schleicher, D. J., Day, C. V., Mayes, B. T., & Riggio, R. E. (2002). A new frame for frame-of-reference training: Enhancing the construct validity of assessment centers. Journal of Applied Psychology, 87, 735-746.
Schneider, J. R., & Schmitt, N. (1992). An exercise design approach to understanding assessment center dimension and exercise constructs. Journal of Applied Psychology, 77, 32-41.
Shackleton, V., & Newell, S. (1991). Management selection: A comparative survey of methods used in top British and French companies. Journal of Occupational Psychology, 64, 23-36.
Silverman, W. H., Dalessio, A., Woods, S. B., & Johnson, R. L., Jr. (1986). Influence of assessment center methods on assessors' ratings. Personnel Psychology, 39, 565-578.
Spychalski, A. C., Quinones, M., Gaugler, B. B., & Pohley, K. (1997). A survey of assessment centre practices in organizations in the United States. Personnel Psychology, 50, 71-90.
Thornton, G. C., III. (1992). Assessment centers in human resource management. Reading, MA: Addison-Wesley.
Thornton, G. C., III, & Byham, W. C. (1982). Assessment centers and managerial performance. New York: Academic Press.
Thornton, G. C., III, & Rupp, D. E. (2004). Simulations and assessment centers. In M. Hersen (Ed.), Comprehensive handbook of psychological assessment: Vol. 4. Industrial and organizational assessment (J. C. Thomas, Vol. Ed., pp. 319-344). Hoboken, NJ: John Wiley.
Thornton, G. C., III, Tziner, A., Dahan, M., Clevenger, J. P., & Meir, E. (1997). Construct validity of assessment center judgments: Analyses of the behavioral reporting method. Journal of Social Behavior and Personality, 12, 109-128.
Turnage, J. J., & Muchinsky, P. M. (1982). Transitional variability in human performance within assessment centers. Organizational Behavior and Human Performance, 30, 174-200.


Woehr, D. J., & Arthur, W., Jr. (2003). The construct-related validity of assessment center ratings: A review and meta-analysis of the role of methodological factors. Journal of Management, 29, 231-258.
Woodruffe, C. (1993). Assessment center: Identifying and developing competence. London: Institute of Personnel Management.

Hsin-Chih Chen, PhD, is a statistician/research analyst at Amedisys, Inc., a leading provider of home health care services, where he conducts data-driven research on quality of services, market analyses, and corporate strategies across all levels. Prior to joining Amedisys, Inc., he served as a postdoctoral researcher at Louisiana State University. He has published a number of research articles in peer-reviewed human resource development journals, and currently serves as associate editor for the 2006 International Conference Proceedings of the Academy of Human Resource Development. His recent research interests include competency-based development, assessment center, transfer of learning, and effectiveness, strategy, and philosophy of human resource development. His doctorate was completed in human resource development at Louisiana State University.

Chen, H.-C. (2006). Assessment center: A critical mechanism for assessing HRD effectiveness and accountability. Advances in Developing Human Resources, 8(2), 247-264.
