
R. S. Balkin, 2008 1

Scaling, Validity, and Reliability


Measuring a Construct

Not every phenomenon of interest can be measured through direct observation

Constructs are phenomena that exist but cannot be directly observed


Measuring a Construct

Instruments can be created to measure constructs

Researchers interested in a particular construct (e.g., substance abuse) or characteristic (e.g., sex) refer to these phenomena as variables.


Scaling

Nominal: classified and counted (ethnicity)

Ordinal: Ranked in order (as in a race)

Interval: Equal interval, no true zero

Ratio: Equal intervals, true zero, ratio relationship

Discrete (categorical)

Continuous


Nominal

Method of categorization

No label is quantitatively higher or lower than another label; it is simply a label

For example: African American, Caucasian, Asian, Latino/Latina, Native American


Ordinal

Denotes an order or ranking

You cannot make comparisons beyond the knowledge of the order

Both nominal and ordinal variables are known as discrete variables


Interval

Denotes equality between levels with no true zero (e.g., Fahrenheit temperature)

Social science often uses quasi-interval scales (e.g., mood)


Ratio

Denotes equality between levels with a true zero (e.g., number of words spelled correctly)

Interval and ratio variables are sometimes referred to as continuous variables
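The difference between interval and ratio scales can be illustrated with a short sketch. It assumes two hypothetical temperatures and two hypothetical word counts (none of these numbers come from the slides) and shows that a ratio computed on Fahrenheit values changes once the scale has a true zero, while a ratio of counts is meaningful as it stands.

```python
# Sketch: ratios are meaningful on a ratio scale but not on an interval scale.
# All values below are hypothetical illustrations.

def fahrenheit_to_kelvin(f):
    """Convert Fahrenheit (interval, no true zero) to Kelvin (ratio, true zero)."""
    return (f - 32.0) * 5.0 / 9.0 + 273.15

# Interval scale: differences are meaningful, ratios are not.
cool_f, warm_f = 40.0, 80.0
naive_ratio = warm_f / cool_f  # arithmetic gives 2, but 80 F is not "twice as hot"
kelvin_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

# Ratio scale: a true zero makes the ratio interpretable.
words_a, words_b = 10, 20  # words spelled correctly by two examinees
words_ratio = words_b / words_a  # twice as many words spelled correctly

print(f"Fahrenheit ratio: {naive_ratio:.2f}")
print(f"Kelvin ratio:     {kelvin_ratio:.2f}")
print(f"Word-count ratio: {words_ratio:.2f}")
```

The Fahrenheit ratio depends on where the scale happens to place its zero, which is why ratio statements are reserved for ratio-level variables.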


Importance

The ability to analyze validity and reliability is the cornerstone of identifying whether an experiment

Utilized proper instrumentation

Followed proper procedure

Achieved meaningful results


So, who comes up with this stuff?

American Educational Research Association (AERA)

American Psychological Association (APA)

National Council on Measurement in Education (NCME)

Standards for Educational and Psychological Testing


1999 Standards

“Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests. Validity is, therefore, the most fundamental consideration in developing and evaluating tests. The process of validation involves accumulating evidence to provide a sound scientific basis for the proposed score interpretations” (AERA, APA, & NCME, 1999, p. 9).


Evidence based on test content

Logical analyses and experts’ evaluations of the content of the measure: items, tasks, formats, wording, and processes required of examinees

Addresses whether the content of a measure represents a specified content domain


Evidence based on test content

Based on a review of the literature

Expert opinion


Evidence based on response processes

Extent to which the tasks or types of responses required of the examinees fit the intended, defined construct

e.g., a self-report, paper-and-pencil inventory measures an affective construct


Evidence based on internal structure

Constructs are traits that cannot be observed. They are hypothesized characteristics.

Theories are developed related to specific constructs

Test items load on specific aspects of the theory

The extent to which the internal components of a test match the defined construct

Principal component analysis

Confirmatory factor analysis


Principal Component Analysis and Factor Analysis (StatSoft, Inc.)

Both are data reduction procedures

They combine two or more correlated variables into a single factor

But there are differences (we will not go into the computational aspects)
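The idea of combining two correlated variables into a single factor can be sketched for the simplest case. For two standardized variables with correlation r, the first principal component is their scaled sum (or difference, if r is negative) and it accounts for (1 + |r|) / 2 of the total variance. The item scores and function names below are hypothetical, and this is only a two-variable illustration, not a general PCA or factor analysis routine.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance

def standardize(values):
    """Convert raw scores to z-scores using the population standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def pearson_r(x, y):
    """Pearson correlation computed as the mean product of z-scores."""
    zx, zy = standardize(x), standardize(y)
    return mean([a * b for a, b in zip(zx, zy)])

def first_component(x, y):
    """Return (component scores, proportion of variance) for two variables."""
    zx, zy = standardize(x), standardize(y)
    r = pearson_r(x, y)
    sign = 1.0 if r >= 0 else -1.0  # negatively correlated items are subtracted
    scores = [(a + sign * b) / sqrt(2) for a, b in zip(zx, zy)]
    return scores, (1 + abs(r)) / 2

# Hypothetical scores on two correlated items.
item_a = [2, 4, 5, 7, 9]
item_b = [1, 3, 6, 6, 10]

r = pearson_r(item_a, item_b)
scores, proportion = first_component(item_a, item_b)
print(f"r = {r:.3f}, variance explained by one component = {proportion:.3f}")
```

The stronger the correlation between the two items, the less information is lost by replacing them with the single component.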


Evidence based on relations to other variables

Utilizes correlation studies to investigate group comparisons

Convergent and discriminant validity

Concurrent and predictive validity


Criterion-related evidence

Evidence may be predictive

The test is designed and used to predict other variables (e.g., freshman GPA).

ACT, SAT, GRE

This can be demonstrated via correlation or regression analyses
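A correlation and a simple regression of this kind can be sketched as follows. The admission scores and freshman GPAs are invented for illustration, and the function names are hypothetical; the point is only to show how a predictor test is related to a later criterion.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between predictor x and criterion y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simple_regression(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data: admission test scores and later freshman GPA.
admission_scores = [18, 21, 24, 27, 30, 33]
freshman_gpa = [2.1, 2.6, 2.8, 3.1, 3.3, 3.8]

r = pearson_r(admission_scores, freshman_gpa)
slope, intercept = simple_regression(admission_scores, freshman_gpa)
predicted = intercept + slope * 25  # predicted GPA for a score of 25

print(f"r = {r:.3f}")
print(f"predicted GPA for a score of 25: {predicted:.2f}")
```

A strong positive correlation between the test and the later criterion is the kind of result that would count as predictive evidence.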



Evidence may be concurrent

The test is related to other measures (e.g., other tests, grades, ratings).

The K-BIT is much shorter than the WISC-III but has strong criterion-related evidence

Often demonstrated through correlational analyses



Evidence may be concurrent

When we demonstrate that a test measures similar constructs, we say the evidence is convergent

When we demonstrate that a test measures different constructs, we say the evidence is discriminant


The MTMM (multitrait-multimethod matrix)

Assesses convergent and discriminant evidence

Convergent evidence is the degree to which concepts that should be related theoretically are interrelated in reality.

Discriminant evidence is the degree to which concepts that should not be related theoretically are, in fact, not interrelated in reality.


Multi-trait method


Evidence based on consequences of testing

Some benefit should be realized from the intended use of scores

Placement

Effectiveness

Previously not considered as evidence for validity

Few guidelines on how to measure


Reliability

The extent to which a test measures a construct accurately and consistently

Substantiated by high correlations and reduced error


Reliability

Various methods may be utilized to produce strong correlations:

Stability over time (test-retest)

Parallel forms

Internal consistency (split half, Cronbach’s coefficient alpha)

Inter-rater reliability
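As one illustration of these methods, a split-half estimate can be sketched by correlating scores on the odd-numbered and even-numbered items. The responses below are hypothetical. The Spearman-Brown step, 2r / (1 + r), is not mentioned on the slide; it is included here as the adjustment commonly paired with split-half estimates, since each half is only half as long as the full test.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def split_half(responses):
    """Return (half-test correlation, Spearman-Brown corrected estimate)."""
    odd_totals = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    corrected = 2 * r_half / (1 + r_half)  # assumed adjustment, not on the slide
    return r_half, corrected

# Hypothetical data: six respondents (rows) answering six Likert-type items.
responses = [
    [4, 5, 4, 5, 4, 5],
    [2, 3, 2, 2, 3, 2],
    [5, 5, 4, 5, 5, 4],
    [1, 2, 1, 2, 1, 1],
    [3, 3, 4, 3, 3, 4],
    [2, 1, 2, 1, 2, 2],
]

r_half, corrected = split_half(responses)
print(f"half-test r = {r_half:.3f}, corrected estimate = {corrected:.3f}")
```

Test-retest and parallel-forms estimates would use the same correlation, applied to two administrations or two forms instead of two halves.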


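Computing Coefficient Alpha

(# of items / (# of items - 1)) * ((Variance of the total score - Sum of the variances of the items) / Variance of the total score)

$$\alpha = \frac{n}{n-1}\left(\frac{s_{tot}^{2} - \sum s_{i}^{2}}{s_{tot}^{2}}\right)$$

where n is the number of items, s_tot^2 is the variance of the total score, and s_i^2 is the variance of item i.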
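The coefficient alpha computation described on this slide can be sketched in a few lines. The response data and the function name `cronbach_alpha` are hypothetical. The sketch uses the sample variance throughout; because alpha is built from a ratio of variances, using the population variance consistently would give the same value.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Coefficient alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    # Variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (
        (total_variance - sum(item_variances)) / total_variance
    )

# Hypothetical data: six respondents (rows) answering six Likert-type items.
responses = [
    [4, 5, 4, 5, 4, 5],
    [2, 3, 2, 2, 3, 2],
    [5, 5, 4, 5, 5, 4],
    [1, 2, 1, 2, 1, 1],
    [3, 3, 4, 3, 3, 4],
    [2, 1, 2, 1, 2, 2],
]

alpha = cronbach_alpha(responses)
print(f"coefficient alpha = {alpha:.3f}")
```

When every item ranks respondents identically, the item variances account for as little of the total variance as possible and alpha reaches its maximum of 1.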


Types of Instruments

Achievement

Aptitude

Personality

Projective

Observations

Questionnaire

Interest, Attitude, Values

Likert Scale

Interview
