Scale development
Eva Cools Brown bag 16 May 2011
Upcoming Research Brown Bags
Innovation research at Vlerick: The state of the art, chaired by Walter Van Dyck and Marion Debruyne, Monday 6 June 2011 from 12.30 – 14.00
Getting an A*: some lessons learned and experiences to share, chaired by Jan Lepoutre and Katleen De Stobbeleir, Friday 23 September 2011 from 12.30 – 14.00
Research Brown Bag on Entrepreneurship research at Vlerick, chaired by Miguel Meuleman, Thursday 20 October 2011 from 12.30 – 14.00
Epistemological foundations of transdisciplinary research: The case of the management of innovation, chaired by Walter Van Dyck, Thursday 24 November 2011 from 12.30 – 14.00
Creating high-performing and risk aware organisations, chaired by Regine Slagmulder and Maria Boicova, Tuesday 13 December 2011 from 12.30 – 14.00
More information?
© Vlerick Leuven Gent Management School
OVERVIEW
Scale development in theory
Scale development in practice
Exchange of best practices
Q&A
Scale development in theory
Measurement
= the assignment of numerals to objects or events according to rules (Stevens, 1951)
= the process of linking abstract concepts to empirical indicants (Zeller & Carmines)
How can we determine the extent to which a particular empirical indicator (or set of empirical indicators) represents a given theoretical concept?
Reliability
Validity
Reliability
Definition
= the degree to which the measurement agrees with itself (Kerlinger & Lee)
= has nothing to do with the truthfulness of the measurement, but with the accuracy with which a measuring instrument measures whatever it measures (Kerlinger & Lee)
Reliability
Four approaches (DeVellis; Carmines & Zeller)
Internal consistency reliability (homogeneity): concerned with the homogeneity of the items comprising a scale; the most used index is Cronbach's alpha
Alternate forms reliability: possible when two strictly parallel forms of a scale exist; compute the correlation between them, provided the same people complete both forms (with a time interval in between)
Split-half reliability: same logic as alternate forms, but the items of a single scale are split into two subsets (there are different ways to split)
Test-retest reliability (temporal stability): concerned with how constant scores remain from one occasion to another
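As an illustration of the first and third approaches, here is a minimal sketch of Cronbach's alpha and an odd-even split-half coefficient (stepped up with the Spearman-Brown formula). The response data are made up for the example, and the function names are hypothetical helpers, not taken from any of the cited sources.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half(rows):
    """Correlate odd- and even-item half scores, then apply Spearman-Brown."""
    odd = [sum(r[0::2]) for r in rows]
    even = [sum(r[1::2]) for r in rows]
    r = pearson(odd, even)
    return 2 * r / (1 + r)


# Hypothetical data: 6 respondents answering a 4-item scale (1-5 Likert).
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 4, 3, 3],
]
alpha = cronbach_alpha(responses)
half = split_half(responses)
print(f"alpha = {alpha:.2f}, split-half = {half:.2f}")
```

The odd-even split is only one of the possible splits; a different split of the same items will generally give a slightly different coefficient.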
Validity
Definition
= degree to which any measuring instrument measures what it is intended to measure (Carmines & Zeller)
= are we measuring what we think we are measuring? (Kerlinger & Lee)
Different aspects (DeVellis)
Validity is inferred from the manner in which a scale was constructed (content validity), its ability to predict specific events (criterion-related validity), or its relationship to measures of other constructs (construct validity)
Validity
Kinds of validity (DeVellis; Carmines & Zeller; Price)
Content validity: concerns item sampling adequacy, that is, the extent to which a specific set of items reflects a content domain
Criterion-related validity: the degree of correspondence between the measure and some other accepted measure, the criterion
Concurrent validity: when criterion and predictor are assessed at the same point in time
Predictive validity: when the measure is expected to be highly related to some future event or behavior
Validity
Construct validity: concerned with the extent to which the empirical relationships based on using the measure are consistent with theory
3 steps (Carmines & Zeller):
1. theoretical relationships between concepts must be specified
2. empirical relationships between the measures of the concepts must be examined
3. the empirical evidence must be interpreted in terms of how it clarifies the construct validity of the particular measure
Convergent validity: evidence of similarity between measures of theoretically related constructs
Discriminant validity: the absence of correlation between measures of unrelated constructs
Scale development guidelines from DeVellis (1991)
Steps
Step 1: determine clearly what it is you want to measure
Step 2: generate an item pool
Step 3: determine the format for measurement
Step 4: have the initial item pool reviewed by experts (cf. content validity)
Step 5: consider inclusion of validation items (cf. construct validity)
Step 6: administer items to a development sample
Step 7: evaluate the items
Step 8: optimize scale length
Some specific advice given
Step 1: theory first
Step 2: advice on the number and kind of items (ideally, start with 3 to 4 times as many items as in the final measure)
Step 3: advice on response format: the best choice depends on the purpose of the measure and the theory
Step 6: advice on sample size and representativeness of this initial sample (Nunnally: 300)
Step 7: look for high inter-item correlations, high item-scale correlations, relatively high variance, item means close to the center of the range, and a high Cronbach's alpha
Step 8: trade-off between reliability and brevity; cross-validation in a large sample
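The step 8 trade-off between reliability and brevity can be made concrete with the standardized form of Cronbach's alpha. This formula is not on the slides; it is the usual textbook expression, and it assumes standardized items, where k is the number of items and \bar{r} the average inter-item correlation:

```latex
\alpha_{\text{std}} = \frac{k\,\bar{r}}{1 + (k - 1)\,\bar{r}}
\qquad\text{e.g. } \bar{r} = .30:\quad
k = 10 \Rightarrow \alpha_{\text{std}} = \frac{3.0}{3.7} \approx .81,
\qquad
k = 5 \Rightarrow \alpha_{\text{std}} = \frac{1.5}{2.2} \approx .68
```

With the same average inter-item correlation, halving a 10-item scale lowers alpha from about .81 to about .68, which is why shortening a scale has to be weighed against the reliability that is lost.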
Scale development process according to Hinkin (1998)
Step 1: item generation
Key: well-articulated theoretical foundation
Item generation: deductive or inductive
Content validity assessment (pretest)
Advice on item wording, scaling and number of items (at least twice as many as needed in the final scale)
Step 2: questionnaire administration
Advice on sample size (min. 200) and type
Nomological network
Step 3: initial item reduction
Check inter-item correlation (higher than .40)
Exploratory factor analysis (ideally principal axis factoring; eigenvalue greater than 1 and scree test of the percentage of variance explained; factor loadings higher than .40)
Internal consistency assessment (Cronbach's alpha)
Hinkin (1998) (cont.)
Step 4: confirmatory factor analysis
Criteria about reporting (minimum: chi-square, degrees of freedom, recommended goodness-of-fit indices) and how to conduct the analyses
Step 5: convergent/discriminant validity
Most used: Multi-Trait Multi-Method (MTMM)
Also check criterion-related validity
Step 6: replication (Back to step 4)
Independent sample to increase generalisability
… and one more time (Hinkin, 1995; Schwab, 1980)
Step 1: item generation
Content validity
Inductive or deductive approach
Step 2: scale development
Step 2a: design of the developmental study
Sample type? Sample size? Reverse items? Number of items? Scaling of items?
Step 2b: scale construction
EFA and CFA
Step 2c: reliability assessment
Step 3: scale evaluation
Criterion-related validity
Construct validity
Scale development: evaluative criteria
According to Robinson, Shaver & Wrightsman (1991)
Item construction criteria: sampling of relevant content; wording of items, item analysis
Response set criteria: controlling the spurious effects of acquiescence/agreement and social desirability response sets
Psychometric criteria: representative sampling; presentation of normative data; reliability (both test-retest reliability and internal consistency); and validity (both convergent and discriminant)
Scale development: references
Cattell, R.B. (1974). How good is the modern questionnaire? General principles of evaluation. Journal of Personality Assessment, 38, 115-129.
Clark, L.A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7, 3, 309-319.
Cronbach, L.J., & Meehl, P.E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 4, 281-302.
DeVellis, R.F. (1991). Scale development: Theory and applications. Newbury Park, CA: Sage Publications.
Haynes, S.N., Richard, D.C.S., & Kubany, E.S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7, 3, 238-247.
Hinkin, T.R. (1995). A review of scale development practices in the study of organizations. Journal of Management, 21, 5, 967-988.
Hinkin, T.R. (1998). A brief tutorial on the development of measures for use in survey questionnaires. Organizational Research Methods, 1, 1, 104-121.
Kerlinger, F.N., & Lee, H.B. (2000). Foundations of behavioral research (fourth edition). Fort Worth, TX: Harcourt College Publishers (Part 8: Measurement).
Lewis-Beck, M.S. (1994) (Ed.). Basic measurement. Thousand Oaks, CA: Sage Publications (Part 1: Reliability and validity assessment).
Nunnally, J.C., & Bernstein, I.H. (1994). Psychometric theory (third edition). New York: McGraw-Hill.
Robinson, J.P., Shaver, P.R., & Wrightsman, L.S. (1991). Criteria for scale selection and evaluation. In: J.P. Robinson, P.R. Shaver & L.S. Wrightsman (Eds.), Measures of Personality and Social Psychological Attitudes (Chapter 1). San Diego, CA: Academic Press.
Schwab, D.P. (1980). Construct validity in organizational behavior. In: B.M. Staw & L.L. Cummings (Eds.). Research in organizational behavior, volume 2 (pp. 3-43). Greenwich, CT: JAI Press.
Schriesheim, C.A., Powers, K.J., Scandura, T.A., Gardiner, C.C., & Lankau, M.J. (1993). Improving construct measurement in management research: Comments and a quantitative approach for assessing the theoretical content adequacy of paper-and-pencil survey-type instruments. Journal of Management, 19, 2, 385-417.
Scale development in practice: steps in the paper
Cools, E. & Van den Broeck, H. (2007). Development and validation of the Cognitive Style Indicator. The Journal of Psychology, 141, 4, 359-387.
Step 1: item generation
Content validity
Inductive and/or deductive approach
Step 2: scale development
Design of the developmental study (sample,…)
Scale construction: based on EFA and CFA
Reliability assessment
Step 3: scale evaluation
Construct and criterion-related validity
Cools & Van den Broeck (2007): research design
Item generation: inductive and deductive approach
Pilot study (N = 15,616)
Three validation studies
Sample 1 (part of career decision survey): N = 5,924
Sample 2 (competence indicator tool): N = 1,580
Sample 3 (MBA students): N = 635
Cools & Van den Broeck (2007): research design

                              Sample 1     Sample 2     Sample 3
Scale development
  Item analysis               Yes          Yes          Yes
  Factor analysis             Yes          Yes          Yes
    EFA                       N = 2,970    N = 763      N = 321
    CFA                       N = 2,954    N = 817      N = 314
Scale evaluation
  Construct validity          No           No           Yes
    KAI                                                 N = 66
    REI                                                 N = 70
    MBTI                                                N = 296
    SIMP                                                N = 98
    Academic performance                                N = 443
  Criterion-related validity
    Hierarchical level        N = 5,885
    Study/job function        N = 2,013    N = 713      N = 233 / N = 446
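In the design above, the EFA and CFA subsamples add up to the full sample (for example 321 + 314 = 635 in Sample 3), so each sample was divided in two for the two-stage factor analysis. The slides do not say how the division was made. The sketch below assumes a simple random split into two roughly equal halves; `split_sample` and the seed are hypothetical.

```python
import random


def split_sample(respondent_ids, seed=2007):
    """Randomly divide respondents into an EFA half and a CFA half.

    Assumption: a plain random split; the original study may have
    used a different procedure (its halves are not exactly equal).
    """
    ids = list(respondent_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Hypothetical respondent ids for a sample of 635 people.
efa_ids, cfa_ids = split_sample(range(635))
print(len(efa_ids), len(cfa_ids))
```

Keeping the two halves disjoint is the point: the factor structure found with EFA is then tested with CFA on respondents who played no part in finding it.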
Scale development: item analysis
Related to content validity: are the items a randomly chosen subset of the universe of appropriate items? (DeVellis). This is difficult to assess given the lack of well-defined, objective criteria.
Content validity consists essentially of judgement. Alone or with others one judges the representativeness of the items. (Kerlinger & Lee)
Scale development: item analysis (cont.)
Diverse criteria are used: examples
Remove items with extreme response values and low variability in responses (Lawson, 2004)
Item-total correlation of > .55 and lack of significant correlation with Social Desirability Scale (Tziner et al., 1996)
Remove items with low inter-item and item-total correlations (Arnold et al., 2000)
Check item-scale correlation and Cronbach's alpha (effect on alpha if item removed) (Scheier & Carver, 1985)
Cronbach's alpha and average inter-item correlation between .20 and .40 (Bateman & Crant, 1993)
Standard deviations of more than .40 and reasonably high item-scale correlation (Towler & Dipboye, 2003)
Inter-item correlation average of .30 or better (Robinson et al., 1991)
Cools & Van den Broeck (2007): item analysis
Checking means, standard deviations, item-scale and item-total correlations, average inter-item correlations, and Cronbach's alpha coefficients (DeVellis, 1991)
Criteria:
Item-total correlation of more than 0.30
Standard deviation of more than 0.40
Average inter-item correlation of 0.30 or better
Reliability: Cronbach's alpha of more than 0.70
(Towler & Dipboye, 2003; Robinson et al., 1991)
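A minimal sketch of how the item-level criteria above could be checked. It assumes the item-total correlation is the corrected one (each item against the sum of the other items), which the slide does not specify. The data and the helper names `item_analysis` and `flag_items` are made up for illustration.

```python
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_analysis(rows):
    """Per-item mean, SD and corrected item-total r, plus the average inter-item r."""
    k = len(rows[0])
    cols = [[r[i] for r in rows] for i in range(k)]
    report = []
    for i, col in enumerate(cols):
        rest = [sum(r) - r[i] for r in rows]  # total score without item i
        report.append({
            "item": i + 1,
            "mean": mean(col),
            "sd": stdev(col),
            "item_total_r": pearson(col, rest),
        })
    pair_rs = [pearson(cols[i], cols[j]) for i in range(k) for j in range(i + 1, k)]
    return report, mean(pair_rs)


def flag_items(report, min_r=0.30, min_sd=0.40):
    """Items that miss the 0.30 item-total or 0.40 standard deviation criterion."""
    return [row["item"] for row in report
            if row["item_total_r"] <= min_r or row["sd"] <= min_sd]


# Hypothetical data: 6 respondents, 5 items; item 5 is unrelated to the rest.
responses = [
    [4, 5, 4, 5, 3],
    [2, 2, 3, 2, 3],
    [3, 3, 3, 4, 4],
    [5, 4, 5, 5, 3],
    [1, 2, 1, 2, 3],
    [3, 4, 3, 3, 4],
]
report, avg_inter_item_r = item_analysis(responses)
flagged = flag_items(report)
print("flagged items:", flagged)
```

Flagged items are candidates for removal, not automatic deletions; the theoretical importance of an item still has to be weighed.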
Scale development: factor analysis
Diverse approaches and criteria are used: examples
Factor loading of > .50 (Becker & Bos, 1979; Tziner et al., 1996; …)
Factor loading of minimum .40 and not loading on more than one factor (Lawson, 2004)
Factor loading of more than .40 and no cross-loadings higher than .30 (Towler & Dipboye, 2003)
Diverse fit measures (Rybowiak et al., 1999; Towler & Dipboye, 2003; Judge et al., 2003)
1.0 eigenvalue criterion and scree plot procedure (Cattell, 1966)
Factor loading of .60 or greater and no secondary loading higher than .40 (Garrison & Pate, 1977)
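Loading rules such as these are easy to apply mechanically once a rotated loading matrix is available. The sketch below screens items with the ".40 primary, no cross-loading above .30" rule from the list above. The loadings are invented for the example, and `screen_loadings` is a hypothetical helper; the thresholds are parameters so that the stricter .60 / .40 rule can be applied as well.

```python
def screen_loadings(loadings, primary_min=0.40, cross_max=0.30):
    """Split items into kept and dropped, based on absolute factor loadings.

    loadings: dict mapping item name -> list of loadings, one per factor.
    An item is kept when its largest loading exceeds primary_min and
    its second-largest loading does not exceed cross_max.
    """
    kept, dropped = [], []
    for item, row in loadings.items():
        ranked = sorted((abs(v) for v in row), reverse=True)
        primary = ranked[0]
        secondary = ranked[1] if len(ranked) > 1 else 0.0
        if primary > primary_min and secondary <= cross_max:
            kept.append(item)
        else:
            dropped.append(item)
    return kept, dropped


# Hypothetical rotated loadings on three factors.
example = {
    "item_a": [0.72, 0.10, -0.05],   # clean primary loading
    "item_b": [0.45, 0.38, 0.02],    # cross-loads above .30
    "item_c": [0.08, -0.61, 0.12],   # negative but strong primary loading
    "item_d": [0.31, 0.22, 0.18],    # no loading above .40
}
kept, dropped = screen_loadings(example)
print("kept:", kept, "dropped:", dropped)
```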
Cools & Van den Broeck (2007): factor analysis
Two-stage approach
(Gerbing & Hamilton, 1996; Hurley et al., 1997)
Exploratory factor analysis:
Checking eigenvalue-greater-than-one, scree plot, factor loadings and percentage of explained variance
Criteria: primary factor loading of at least 0.40 and no secondary loadings of more than 0.30 (Towler & Dipboye, 2003)
Confirmatory factor analysis:
Checking various fit indices, taking into account the large sample sizes (Hair et al., 1998; Kline, 1998; MacCallum & Austin, 2000)
Criteria: RMSR (< 0.05), RMSEA (< 0.08), NNFI and NFI (> 0.85)
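Sample size matters here because the chi-square statistic grows with N, whereas an index such as RMSEA adjusts for it. As a sketch, RMSEA can be computed from the model chi-square, its degrees of freedom and the sample size with the commonly used formula below. Some software divides by N rather than N - 1, and the chi-square values in the example are invented, not taken from the paper.

```python
from math import sqrt


def rmsea(chi_square, df, n):
    """Root mean square error of approximation (N - 1 variant)."""
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def meets_rmsea_cutoff(chi_square, df, n, cutoff=0.08):
    """Compare against the RMSEA < 0.08 criterion listed above."""
    return rmsea(chi_square, df, n) < cutoff


# Hypothetical CFA result: chi-square = 250 on 100 df, N = 501.
value = rmsea(250.0, 100, 501)
print(f"RMSEA = {value:.3f}, acceptable: {meets_rmsea_cutoff(250.0, 100, 501)}")
```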
Scale evaluation: construct validity
Considered the most important kind of validity, and also the most widely used
Most often checked through intercorrelations with other instruments
Other possibility: factor analysis together with other questionnaires (Scheier & Carver, 1985)
To be valid, a test has to be related to conceptually similar measures (convergent validity) and unrelated to conceptually dissimilar constructs (discriminant validity) (MTMM: Campbell & Fiske, 1959)
Nomological network: describe the relationship with conceptually similar and dissimilar constructs (Cronbach & Meehl, 1955).
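In its simplest form, the intercorrelation check comes down to comparing two correlations: one with a conceptually similar measure (expected to be high) and one with a conceptually dissimilar measure (expected to be near zero). The scores below are made up for the illustration; a full MTMM analysis would involve several traits and several methods.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores for six respondents.
new_scale = [1, 2, 3, 4, 5, 6]          # the scale being validated
similar_measure = [2, 1, 4, 3, 6, 5]    # conceptually similar construct
unrelated_measure = [3, 5, 1, 1, 5, 3]  # conceptually dissimilar construct

r_convergent = pearson(new_scale, similar_measure)
r_discriminant = pearson(new_scale, unrelated_measure)
print(f"convergent r = {r_convergent:.2f}, discriminant r = {r_discriminant:.2f}")
```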
Cools & Van den Broeck (2007): convergent and discriminant validity
Measures:
Kirton Adaption-Innovation Inventory (KAI) (Kirton, 1976)
Rational-Experiential Inventory (REI) (Pacini & Epstein, 1999)
Myers-Briggs Type Indicator (MBTI) (Myers & Myers, 1998)
Single-Item Measures of Personality (SIMP) (Woods & Hampson, 2005)
Academic performance (Armstrong, 2000)
Cools & Van den Broeck (2007): hypotheses

                              Knowing style   Planning style   Creating style
Category 1: hypothesized as strongly related
  KAI                               -               -                +
  Rationality (REI)                 +               +                -
  Sensing (MBTI)                    +               +                -
  Intuiting (MBTI)                  -               -                +
  Judging (MBTI)                    +               +                -
  Perceiving (MBTI)                 -               -                +
Category 2: hypothesized as showing weaker and less significant correlations
  Thinking (MBTI)                   +               +                -
  Extraversion (SIMP)               -               -                +
  Introversion (SIMP)               +               +                -
  Agreeableness (SIMP)              -               -                +
  Conscientiousness (SIMP)          +               +                -
  Openness                          -               -                +
Category 3: hypothesized as independent of cognitive style
  Experientiality (REI), Feeling (MBTI), Emotional stability (SIMP), Academic performance
Scale evaluation: criterion-related validity
Most often used in psychology or education, for example to analyse the validity of certain types of tests or selection procedures
Less often used in organizational research (Price) and in the social sciences, as there is not always a criterion to evaluate the scale against
Depending on the purpose, the same correlation can be used to demonstrate construct and criterion-related validity
For example: link of cognitive style and academic performance at Vlerick or score on selection test
Cools & Van den Broeck (2007): criterion-related validity
Hierarchical level:
People with management function score significantly higher on knowing and creating style than clerical staff
No significant differences with professional employees
Job function:
People with financial function score significantly higher on knowing style than people with a function in sales and marketing and personnel
Financial employees score significantly lower on creating style than people in sales and marketing
Personnel employees score significantly lower on planning style than people in sales and marketing
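Group differences like these are typically tested by comparing mean scale scores across groups. The slide does not name the test that was used; the sketch below assumes a one-way ANOVA and computes only the F statistic, since a p-value would need an F distribution table or a statistics library. The scores are invented.

```python
from statistics import mean


def one_way_f(groups):
    """F statistic for a one-way ANOVA over a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical knowing-style scores by hierarchical level.
managers = [4, 5, 4, 5]
professionals = [4, 4, 5, 3]
clerical = [2, 3, 3, 2]

f_value, df1, df2 = one_way_f([managers, professionals, clerical])
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

A significant overall F only says that the groups differ somewhere; pairwise statements such as "managers score higher than clerical staff" need post hoc comparisons.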
Exchange of best practices
Conclusion: some recommendations
Theory first!
Follow the recommended steps for scale development and validation as closely as possible
Carefully write up the different steps you took: which samples you used and why, how many items were kept or dropped, on what basis, …
Keep track of the choices you make along the way, as this will help you to write them up and justify them at a later stage
Look at example articles to help you write up your development and validation work; reporting practice is not consistent across articles, which is an advantage and a disadvantage at the same time
Thank you for listening!