2011.05.16 scale development slides eva/media/corporate/pdf... · scale development in practice:...

Scale development

Eva Cools Brown bag 16 May 2011

Upcoming Research Brown Bags

Innovation research at Vlerick: The state of the art, chaired by Walter Van Dyck and Marion Debruyne, Monday 6 June 2011 from 12.30 – 14.00

Getting an A*: some lessons learned and experiences to share, chaired by Jan Lepoutre and Katleen De Stobbeleir, Friday 23 September 2011 from 12.30 – 14.00

Research Brown Bag on Entrepreneurship research at Vlerick, chaired by Miguel Meuleman, Thursday 20 October 2011 from 12.30 – 14.00

Epistemological foundations of transdisciplinary research: The case of the management of innovation, chaired by Walter Van Dyck, Thursday 24 November 2011 from 12.30 – 14.00

Creating high-performing and risk aware organisations, chaired by Regine Slagmulder and Maria Boicova, Tuesday 13 December 2011 from 12.30 – 14.00

More information?

[email protected]

© Vlerick Leuven Gent Management School

OVERVIEW

Scale development in theory

Scale development in practice

Exchange of best practices

Q&A


Scale development in theory

Measurement

= the assignment of numerals to objects or events according to rules (Stevens, 1951)

= the process of linking abstract concepts to empirical indicants (Zeller & Carmines)

How to determine the extent to which a particular empirical indicator (or set of empirical indicators) represents a given theoretical concept?

Reliability

Validity


Reliability

Definition

= the degree to which the measurement agrees with itself (Kerlinger & Lee)

= has nothing to do with the truthfulness of the measurement, but with the accuracy with which a measuring instrument measures whatever it measures (Kerlinger & Lee)


Reliability

4 possibilities (DeVellis; Carmines & Zeller)

Internal consistency reliability (homogeneity): concerned with the homogeneity of the items comprising a scale – most used: Cronbach's alpha

Alternate forms reliability: possible when two strictly parallel forms of a scale exist – compute the correlation between them, provided that people complete both forms (with a time interval in between)

Split half reliability: same logic as alternate forms, but split the items of a single scale into two subsets (there are different ways to split)

Test-retest reliability (temporal stability): concerned with how constant scores remain from one occasion to another
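To make the internal consistency option concrete, here is a minimal sketch of Cronbach's alpha using the usual formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The function name `cronbach_alpha` and the response data are hypothetical and serve only as an illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 3 items on a 5-point scale
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 1],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Alpha rises when the items covary strongly relative to their individual variances, which is why it is read as a measure of homogeneity.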


Validity

Definition

= degree to which any measuring instrument measures what it is intended to measure (Carmines & Zeller)

= are we measuring what we think we are measuring? (Kerlinger & Lee)

Different aspects (DeVellis)

Validity is inferred from the manner in which a scale was constructed (content validity), its ability to predict specific events (criterion-related validity), or its relationship to measures of other constructs (construct validity)


Validity

Kinds of validity (DeVellis; Carmines & Zeller; Price)

Content validity: concerns item sampling adequacy, that is, the extent to which a specific set of items reflects a content domain

Criterion-related validity: the degree of correspondence between the measure and some other accepted measure, the criterion

Concurrent validity: when criterion and predictor are assessed at the same point in time

Predictive validity: when the measure is expected to be highly related to some future event or behavior


Validity

Construct validity: concerned with the extent to which the empirical relationships based on using the measure are consistent with theory

3 steps (Carmines & Zeller):

1. theoretical relationships between concepts must be specified

2. empirical relationships between the measures of the concepts must be examined

3. the empirical evidence must be interpreted in terms of how it clarifies the construct validity of the particular measure

Convergent validity: evidence of similarity between measures of theoretically related constructs

Discriminant validity: the absence of correlation between measures of unrelated constructs


Scale development guidelines from DeVellis (1991)

Steps

Step 1: determine clearly what it is you want to measure

Step 2: generate an item pool

Step 3: determine the format for measurement

Step 4: have the initial item pool reviewed by experts (cf. content validity)

Step 5: consider inclusion of validation items (cf. construct validity)

Step 6: administer items to a development sample

Step 7: evaluate the items

Step 8: optimize scale length

Some specific advice given

Step 1: theory first

Step 2: advice about the number and kind of items (ideally: start with 3 to 4 times as many items as in the final measure)

Step 3: advice about response format: the best choice depends on the purpose of the measure and the theory

Step 6: advice about sample size and representativeness of this initial sample (Nunnally: 300)

Step 7: high inter-item correlations, high item-scale correlations, relatively high variance, item means close to the center of the range, high Cronbach alpha

Step 8: trade-off between reliability and brevity; cross-validation in a large sample


Scale development process according to Hinkin (1998)

Step 1: item generation

Key: well-articulated theoretical foundation

Item generation: deductive or inductive

Content validity assessment (pretest)

Advice about item wording, scaling and number of items (min. x 2)

Step 2: questionnaire administration

Advice about sample size (min. 200) and type

Nomological network

Step 3: initial item reduction

Check inter-item correlation (higher than .40)

Exploratory factor analysis (ideally: principal axis – eigenvalue greater than 1 and scree test of percentage variance explained – factor loading higher than .40)

Internal consistency assessment (Cronbach alpha)


Hinkin (1998) (cont.)

Step 4: confirmatory factor analysis

Criteria about reporting (minimum: chi-square, degrees of freedom, recommended goodness-of-fit indices) and how to conduct the analyses

Step 5: convergent/discriminant validity

Most used: Multi-Trait Multi-Method (MTMM)

Also check criterion-related validity

Step 6: replication (Back to step 4)

Independent sample to increase generalisability


… and one more time (Hinkin, 1995; Schwab, 1980)


Content validity

Inductive or deductive approach

Step 2: scale development

Step 2a: design of the developmental study

Sample type? Sample size? Reverse items? Number of items? Scaling of items?

Step 2b: scale construction

EFA and CFA

Step 2c: reliability assessment

Step 3: scale evaluation

Criterion-related validity

Construct validity


Scale development: evaluative criteria

According to Robinson, Shaver & Wrightsman (1991)

Item construction criteria: sampling of relevant content; wording of items, item analysis

Response set criteria: controlling the spurious effects of acquiescence/agreement and social desirability response sets

Psychometric criteria: representative sampling; presentation of normative data; reliability (both test-retest reliability and internal consistency); and validity (both convergent and discriminant)


Scale development: references

Cattell, R.B. (1974). How good is the modern questionnaire? General principles of evaluation. Journal of Personality Assessment, 38, 115-129.

Clark, L.A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7, 3, 309-319.

Cronbach, L.J., & Meehl, P.E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 4, 281-302.

DeVellis, R.F. (1991). Scale development: Theory and applications. Newbury Park, CA: Sage Publications.

Haynes, S.N., Richard, D.C.S., & Kubany, E.S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7, 3, 238-247.

Hinkin, T.R. (1995). A review of scale development practices in the study of organizations. Journal of Management, 21, 5, 967-988.

Hinkin, T.R. (1998). A brief tutorial on the development of measures for use in survey questionnaires. Organizational Research Methods, 1, 1, 104-121.

Kerlinger, F.N., & Lee, H.B. (2000). Foundations of behavioral research (fourth edition). Fort Worth, TX: Harcourt College Publishers (Part 8: Measurement).

Lewis-Beck, M.S. (1994) (Ed.). Basic measurement. Thousand Oaks, CA: Sage Publications (Part 1: Reliability and validity assessment).

Nunnally, J.C., & Bernstein, I.H. (1994). Psychometric theory (third edition). New York: McGraw-Hill.

Robinson, J.P., Shaver, P.R., & Wrightsman, L.S. (1991). Criteria for scale selection and evaluation. In: J.P. Robinson, P.R. Shaver & L.S. Wrightsman (Eds.), Measures of Personality and Social Psychological Attitudes (Chapter 1). San Diego, CA: Academic Press.

Schwab, D.P. (1980). Construct validity in organizational behavior. In: B.M. Staw & L.L. Cummings (Eds.). Research in organizational behavior, volume 2 (pp. 3-43). Greenwich, CT: JAI Press.

Schriesheim, C.A., Powers, K.J., Scandura, T.A., Gardiner, C.C., & Lankau, M.J. (1993). Improving construct measurement in management research: Comments and a quantitative approach for assessing the theoretical content adequacy of paper-and-pencil survey-type instruments. Journal of Management, 19, 2, 385-417.


Scale development in practice: steps in the paper

Cools, E. & Van den Broeck, H. (2007). Development and validation of the Cognitive Style Indicator. The Journal of Psychology, 141, 4, 359-387.


Content validity

Inductive and/or deductive approach

Step 2: scale development

Design of the developmental study (sample,…)

Scale construction: based on EFA and CFA

Reliability assessment

Step 3: scale evaluation

Construct and criterion-related validity


Cools & Van den Broeck (2007): research design

Item generation: inductive and deductive approach

Pilot study (N = 15,616)

Three validation studies

Sample 1 (part of career decision survey): N = 5,924

Sample 2 (competence indicator tool): N = 1,580

Sample 3 (MBA students): N = 635


Cools & Van den Broeck (2007): research design

                              Sample 1      Sample 2      Sample 3

Scale development
  Item analysis               Yes           Yes           Yes
  Factor analysis             Yes           Yes           Yes
    EFA                       N = 2,970     N = 763       N = 321
    CFA                       N = 2,954     N = 817       N = 314

Scale evaluation
  Construct validity          No            No            Yes
    KAI                                                   N = 66
    REI                                                   N = 70
    MBTI                                                  N = 296
    SIMP                                                  N = 98
    Academic performance                                  N = 443
  Criterion-related validity
    Hierarchical level        N = 5,885
    Study/job function        N = 2,013     N = 713       N = 233 / N = 446
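In the research design, the EFA and CFA subsamples add up to each full sample (for Sample 3: 321 + 314 = 635), so each sample was divided into two non-overlapping parts. The slides do not say how the split was made; the sketch below assumes a simple random split, and `split_sample` and the seed are hypothetical.

```python
import random


def split_sample(respondent_ids, n_first, seed=0):
    """Randomly split respondents into two non-overlapping subsamples.

    Assumption: a simple random split. The slides do not state how the
    EFA and CFA subsamples were actually drawn.
    """
    ids = list(respondent_ids)
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    rng.shuffle(ids)
    return ids[:n_first], ids[n_first:]


# Sample 3 from the slides: N = 635, with 321 respondents used for the EFA
efa_ids, cfa_ids = split_sample(range(635), n_first=321)
print(len(efa_ids), len(cfa_ids))
```

Running the exploratory and confirmatory analyses on separate respondents keeps the CFA from simply confirming a structure derived from the same data.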


Scale development: item analysis

Related to content validity: are the items a randomly chosen subset of the universe of appropriate items? (DeVellis) – difficult to assess given the lack of well-defined, objective criteria

Content validity consists essentially of judgement. Alone or with others one judges the representativeness of the items. (Kerlinger & Lee)


Scale development: item analysis (cont.)

Diverse criteria are used: examples

Remove: items with extreme response values and low variability in responses (Lawson, 2004)

Item-total correlation of > .55 and lack of significant correlation with Social Desirability Scale (Tziner et al, 1996)

Remove: those items with low inter-item and item-total correlations (Arnold et al, 2000)

Check item-scale correlation and Cronbach alpha (effect on alpha if item removed) (Scheier & Carver, 1985)

Cronbach alpha and average inter-item correlation between .20 and .40 (Bateman & Crant, 1993)

Standard deviations of more than .40 and reasonably high item-scale correlation (Towler & Dipboye, 2003)

Inter-item correlation average of .30 or better (Robinson et al., 1991)


Cools & Van den Broeck (2007): item analysis

Checking mean, standard deviation, item-scale and item-total correlations, average inter-item correlations, Cronbach alpha coefficients (DeVellis, 1991)

Criteria:

Item-total correlation of more than 0.30

Standard deviation of more than 0.40

Average inter-item correlation of 0.30 or better

Reliability: Cronbach alpha of more than 0.70

(Towler & Dipboye, 2003; Robinson et al., 1991)
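The item-level criteria above can be checked mechanically. The sketch below flags items against the item-total (> 0.30) and standard deviation (> 0.40) thresholds and reports the average inter-item correlation. It assumes a corrected item-total correlation (the item against the sum of the other items), which the slides do not specify; `item_analysis` and the data are hypothetical.

```python
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def item_analysis(rows, min_item_total=0.30, min_sd=0.40):
    """Return per-item statistics and the average inter-item correlation."""
    k = len(rows[0])
    items = [[r[j] for r in rows] for j in range(k)]
    report = []
    for j, item in enumerate(items):
        # Assumption: corrected item-total, i.e. item vs. sum of the other items
        rest = [sum(r) - r[j] for r in rows]
        r_it = pearson(item, rest)
        sd = stdev(item)
        keep = r_it > min_item_total and sd > min_sd
        report.append({"item": j, "sd": sd, "item_total": r_it, "keep": keep})
    pairs = [pearson(items[a], items[b]) for a in range(k) for b in range(a + 1, k)]
    return report, mean(pairs)


# Hypothetical responses: 5 respondents x 4 items; the last item barely varies
rows = [
    [4, 5, 4, 3],
    [2, 2, 3, 3],
    [5, 4, 5, 3],
    [3, 3, 3, 4],
    [1, 2, 1, 3],
]
report, avg_inter_item = item_analysis(rows)
for entry in report:
    print(entry)
print("average inter-item correlation:", round(avg_inter_item, 2))
```

In this made-up data set the fourth item passes the standard deviation threshold but fails the item-total threshold, so it would be a candidate for removal.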


Scale development: factor analysis

Diverse approaches and criteria are used: examples

Factor loading of > .50 (Becker & Bos, 1979; Tziner et al, 1996; …)

Factor loading of at least .40 and no loading on more than one factor (Lawson, 2004)

Factor loading of more than .40 and no cross-loadings higher than .30 (Towler & Dipboye, 2003)

Diverse fit-measures (Rybowiak et al, 1999; Towler & Dipboye, 2003; Judge et al, 2003)

1.0 eigenvalue criterion and scree plot procedure (Cattell, 1966)

Factor loading of .60 or greater and no secondary loading higher than .40 (Garrison & Pate, 1977)
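The loading criteria listed above all follow one pattern: a minimum primary loading and a maximum secondary (cross-) loading. A minimal sketch of such a screen follows; `screen_loadings`, the default thresholds (.40 and .30) and the loading matrix are hypothetical, and treating values exactly on a threshold as acceptable is an assumption.

```python
def screen_loadings(loadings, min_primary=0.40, max_secondary=0.30):
    """Split items into kept and dropped based on their factor loadings.

    loadings: dict mapping an item name to its loadings, one per factor.
    """
    kept, dropped = [], []
    for item, row in loadings.items():
        # Rank absolute loadings so the sign of a loading does not matter
        ranked = sorted((abs(v) for v in row), reverse=True)
        primary = ranked[0]
        secondary = ranked[1] if len(ranked) > 1 else 0.0
        if primary >= min_primary and secondary <= max_secondary:
            kept.append(item)
        else:
            dropped.append(item)
    return kept, dropped


# Hypothetical rotated loadings on three factors
example = {
    "item_a": [0.72, 0.10, 0.05],   # clean primary loading
    "item_b": [0.45, 0.38, 0.02],   # cross-loading above .30
    "item_c": [0.31, 0.12, 0.08],   # primary loading below .40
    "item_d": [0.05, -0.66, 0.21],  # clean, negatively keyed
}
kept, dropped = screen_loadings(example)
print("kept:", kept)
print("dropped:", dropped)
```

Calling `screen_loadings(example, min_primary=0.60, max_secondary=0.40)` would apply the stricter Garrison & Pate (1977) thresholds instead.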


Cools & Van den Broeck (2007): factor analysis

Two-stage approach

(Gerbing & Hamilton, 1996; Hurley et al., 1997)

Exploratory factor analysis:

Checking eigenvalue-greater-than-one, scree plot, factor loadings and percentage of explained variance

Criteria: primary factor loading of at least 0.40 and no secondary loadings of more than 0.30 (Towler & Dipboye, 2003)

Confirmatory factor analysis:

Checking various fit indices, taking into account the large sample sizes (Hair et al., 1998; Kline, 1998; MacCallum & Austin, 2000)

Criteria: RMSR (< 0.05), RMSEA (< 0.08), NNFI and NFI (> 0.85)
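The CFA cut-offs above can be applied to a model's fit output as a simple checklist. The sketch below also computes RMSEA from the chi-square statistic with the common formula sqrt(max(chi2 - df, 0) / (df * (N - 1))); some software divides by N instead of N - 1. All numbers are hypothetical and are not results from the paper.

```python
from math import sqrt


def rmsea(chi_square, df, n):
    """Root mean square error of approximation (N - 1 variant)."""
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def meets_criteria(fit):
    """Compare fit indices with the cut-offs listed on the slide."""
    return {
        "RMSR": fit["RMSR"] < 0.05,
        "RMSEA": fit["RMSEA"] < 0.08,
        "NNFI": fit["NNFI"] > 0.85,
        "NFI": fit["NFI"] > 0.85,
    }


# Hypothetical CFA output for illustration only
fit = {
    "RMSR": 0.04,
    "RMSEA": rmsea(chi_square=520.0, df=132, n=300),
    "NNFI": 0.91,
    "NFI": 0.88,
}
checks = meets_criteria(fit)
print(round(fit["RMSEA"], 3), checks)
```

In this made-up case three indices pass and RMSEA does not, which shows why the indices are reported together rather than relying on a single one.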


Scale evaluation: construct validity

Considered to be the most important kind of validity, also most used

Most often checked through intercorrelations with other instruments

Other possibility: factor analysis with other questionnaires (Scheier & Carver, 1985)

To be valid, a test has to be related to conceptually similar measures (convergent validity) and unrelated to conceptually dissimilar constructs (discriminant validity) (MTMM: Campbell & Fiske, 1959)

Nomological network: describe the relationship with conceptually similar and dissimilar constructs (Cronbach & Meehl, 1955).
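A minimal sketch of the intercorrelation check: correlate scores on the new scale with a conceptually similar measure (expected to be high) and with a conceptually dissimilar one (expected to be near zero). The scores and variable names are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


# Hypothetical scores for eight respondents
new_scale = [12, 15, 9, 20, 17, 11, 14, 18]
similar_measure = [30, 35, 22, 44, 40, 27, 31, 41]  # conceptually similar
dissimilar_measure = [5, 4, 5, 4, 6, 6, 3, 7]       # conceptually dissimilar

r_convergent = pearson(new_scale, similar_measure)
r_discriminant = pearson(new_scale, dissimilar_measure)
print("convergent r:", round(r_convergent, 2))
print("discriminant r:", round(r_discriminant, 2))
```

With real data these correlations would be compared with the relationships specified beforehand in the nomological network, not judged in isolation.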


Cools & Van den Broeck (2007): convergent and discriminant validity

Measures:

Kirton Adaption-Innovation Inventory (KAI) (Kirton, 1976)

Rational-Experiential Inventory (REI) (Pacini & Epstein, 1999)

Myers-Briggs Type Indicator (MBTI) (Myers & Myers, 1998)

Single-Item Measures of Personality (SIMP) (Woods & Hampson, 2005)

Academic performance (Armstrong, 2000)


Cools & Van den Broeck (2007): hypotheses

                              Knowing style   Planning style   Creating style

Category 1: hypothesized as strongly related
  KAI                         -               -                +
  Rationality (REI)           +               +                -
  Sensing (MBTI)              +               +                -
  Intuiting (MBTI)            -               -                +
  Judging (MBTI)              +               +                -
  Perceiving (MBTI)           -               -                +

Category 2: hypothesized as showing weaker and less significant correlations
  Thinking (MBTI)             +               +                -
  Extraversion (SIMP)         -               -                +
  Introversion (SIMP)         +               +                -
  Agreeableness (SIMP)        -               -                +
  Conscientiousness (SIMP)    +               +                -
  Openness                    -               -                +

Category 3: hypothesized as independent of cognitive style
  Experientiality (REI), Feeling (MBTI), Emotional stability (SIMP), Academic performance


Scale evaluation: criterion-related validity

Most often used in psychology or education, for example to analyse the validity of certain types of tests or selection procedures

Less often used in organizational research (Price) and in the social sciences, as there is not always a criterion to evaluate the scale against

Depending on the purpose, the same correlation can be used to demonstrate construct and criterion-related validity

For example: link of cognitive style and academic performance at Vlerick or score on selection test


Cools & Van den Broeck (2007): criterion-related validity

Hierarchical level:

People with management function score significantly higher on knowing and creating style than clerical staff

No significant differences with professional employees

Job function:

People with financial function score significantly higher on knowing style than people with a function in sales and marketing and personnel

Financial employees score significantly lower on creating style than people in sales and marketing

Personnel employees score significantly lower on planning style than people in sales and marketing
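Group differences like these are typically tested by comparing mean scale scores across groups. The slides do not state which test was used; the sketch below assumes a one-way ANOVA and computes only the F statistic. The scores are hypothetical knowing-style scores, not data from the paper.

```python
from statistics import mean


def one_way_f(groups):
    """F statistic of a one-way ANOVA for a list of score lists."""
    all_scores = [s for g in groups for s in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    group_means = [mean(g) for g in groups]
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean
    ss_within = sum((s - m) ** 2
                    for g, m in zip(groups, group_means) for s in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical knowing-style scores per job function
finance = [4.2, 4.0, 4.5, 4.1]
sales_marketing = [3.4, 3.6, 3.1, 3.5]
personnel = [3.3, 3.0, 3.6, 3.2]

f_value = one_way_f([finance, sales_marketing, personnel])
print(round(f_value, 2))
```

A significant F only shows that the groups differ somewhere; pairwise statements such as "finance scores higher than personnel" need follow-up comparisons.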


Exchange of best practices

| 30-06-2010 | ELSIN conference|


Conclusion: some recommendations

Theory first!

Try to follow the steps that are recommended in scale development and validation as closely as possible

Carefully write up the different steps you took: which samples you used and why, how many items were kept or dropped, on what basis, …

Keep track of the choices you make along the way, as this will help you write them up and justify them at a later stage

Look at example articles to help you in writing up your development and validation work – there is no consistency among them, which is an advantage and a disadvantage at the same time
