scaling, validity, and reliability - … validity, and reliability. r. s. balkin, 2008 2 measuring a...

27
R. S. Balkin, 2008 1 Scaling, Validity, and Reliability

Upload: hoangthuy

Post on 29-Apr-2018

230 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 1

Scaling, Validity, and Reliability

Page 2: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 2

Measuring a Construct

Not every phenomena of interest can bemeasured through direct observation

Constructs are phenomena that existbut cannot be directly observed

Page 3: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 3

Measuring a Construct

Instruments can be created to measureconstructs

Researchers interested in a particularconstruct (e.g. substance abuse) orcharacteristic (e.g. sex) refer to thesephenomena as variables.

Page 4: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 4

Scaling

Nominal: classified and counted (ethnicity)

Ordinal: Ranked in order (as in a race)

Interval: Equal interval, no true zero

Ratio: Equal intervals, true zero, ratio relationship

Discrete(categorical)

Continuous

Page 5: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 5

Nominal Method of categorization No label is quantitatively higher or lower

than another label—it is simply a label For example:

African American Caucasian Asian Latino/Latina Native American

Page 6: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 6

Ordinal

Denotes an order or ranking You cannot make comparisons beyond

the knowledge of the order Both nominal and ordinal variables are

known as discrete variables

Page 7: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 7

Interval

Denotes equality between levels with notrue zero Fahrenheit temperature

Social science often uses quasi-intervalscales mood

Page 8: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 8

Ratio

Denotes equality between levels with atrue zero Number of words spelled correctly

Interval and ratio variables aresometimes referred to as continuousvariables

Page 9: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 9

Importance

The ability to analyze validity andreliability is the cornerstone toidentifying whether an experimentutilized proper instrumentation Proper procedure Achieved meaningful results

Page 10: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 10

So, who comes up with this stuff?

American Educational ResearchAssociation (AERA)

American Psychological Association(APA)

National Council on Measurement inEducation (NCME)

Standards for Educational andPsychological Testing

Page 11: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 11

1999 Standards “Validity refers to the degree to which

evidence and theory support theinterpretations of test scores entailed byproposed uses of tests. Validity is, therefore,the most fundamental consideration indeveloping and evaluating tests. The processof validation involves accumulating evidenceto provide a sound scientific basis for theproposed score interpretations” (AERA, APA,& NCME, 1999, p. 9).

Page 12: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 12

Evidence based on test content

Logical analyses and experts’ evaluations ofthe content of the measure Items Tasks Formats Wording Processes required of examinees

Addresses whether the content of a measurerepresents a specified content domain

Page 13: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 13

Evidence based on test content

Based on a review of the literature Expert opinion

Page 14: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 14

Evidence based on responseprocesses

Extent to which the tasks or types ofresponses required of the examinees fitthe intended, defined construct e.g. A self-report, paper-pencil inventory

measures an affective construct

Page 15: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 15

Evidence based on internalstructure Constructs are traits that cannot be observed.

They are hypothesized characteristics. Theories are developed related to specific

constructs Tests items load on specifics aspects of the

theory The extent to which the internal components

of a test match the defined construct Principle component analysis Confirmatory factor analysis

Page 16: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 16

Principle Component Analysis andFactor Analysis (Stat Soft, Inc)

Both are data reduction procedures They combine 2 or more correlated

variables in to a single factor But there are differences (we will not go

into the computational aspects)

Page 17: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 17

Evidence based on relations toother variables

Utilizes correlation studies to investigategroup comparisons Convergent and discriminant validity Concurrent and predictive validity

Page 18: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 18

Criterion related evidence

Evidence may be predictive The test is designed and used to predict

other variables (i.e. freshman gpa). ACT, SAT, GRE

This can be demonstrated via correlation orregression analyses

Page 19: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 19

Criterion related evidence

Evidence may be concurrent The test is related to other measure (i.e.

other tests, grades, ratings, etc). K-BIT is a shortened version of the WISC-III,

but has strong criterion related evidence

Often demonstrated through correlationalanalyses

Page 20: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 20

Criterion related evidence

Evidence may be concurrent When we demonstrate that a test

measures similar constructs, we say theevidence is convergent

When we demonstrate that a testmeasures different constructs, we say theevidence is discriminate

Page 21: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 21

The MTMM Assesses convergent and discriminate

evidence Convergent evidence is the degree to

which concepts that should be relatedtheoretically are interrelated in reality.

Discriminant evidence is the degree towhich concepts that should not be relatedtheoretically are, in fact, not interrelated inreality.

Page 22: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 22

Multi-trait method

Page 23: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 23

Evidence based on consequencesof testing

Some benefit should be realized fromthe intended use of scores Placement Effectiveness

Previously not considered as evidencefor validity

Few guidelines on how to measure

Page 24: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 24

Reliability

The extent to which a test measures aconstruct accurately and consistently

Substantiated by high correlations andreduced error

Page 25: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 25

Reliability

Various methods may be utilized toproduce strong correlations: Stability over time (test-retest) Parallel forms Internal consistency

Split half Cronbach’s Coefficient alpha

Inter-rater reliability

Page 26: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 26

Computing Coefficient Alpha (# of items/(# of items - 1)) * ((Variance of

the total score - Sum of the Variances of theitems)/Variance of the total score)

!!

"

#

$$

%

& '(

( 2

22

1 tot

tot

s

ss

n

n i

Page 27: Scaling, Validity, and Reliability - … Validity, and Reliability. R. S. Balkin, 2008 2 Measuring a Construct ... Scaling, Reliability, and Validity.ppt Author: Rick Balkin Created

R. S. Balkin, 2008 27

Types of Instruments Achievement Aptitude Personality Projective Observations

Questionnaire Interest, Attitude,

Values Likert Scale

Interview