
© 2011 Pearson Prentice Hall, Salkind.

Measurement, Reliability and Validity


Explain why measurement is important to the research process.

Discuss the four levels of measurement and provide an example of each.

Explain the concept of reliability in terms of observed score, true score, and error.

Describe the two elements that can make up an error score.

List methods for increasing reliability.

Discuss four ways in which reliability can be examined.

Provide a conceptual definition of validity.

List the three traditional types of validity.

Explain the relationship between reliability and validity.


The Measurement Process

Levels of Measurement

Reliability and Validity: Why They Are Very, Very Important

Validity

The Relationship Between Reliability and Validity

Closing (and Very Important) Thoughts


Two definitions
◦ Stevens—"assignment of numerals to objects or events according to rules."
◦ "…the assignment of values to outcomes."

Chapter foci
◦ Levels of measurement
◦ Reliability and validity


Variables are measured at one of these four levels.

Qualities of one level are characteristic of the next level up.

The more precise (higher) the level of measurement, the more accurate the measurement process.

Level of Measurement | For Example | Quality of Level
Ratio | Rachael is 5' 10" and Gregory is 5' 5" | Absolute zero
Interval | Rachael is 5" taller than Gregory | An inch is an inch is an inch
Ordinal | Rachael is taller than Gregory | Greater than
Nominal | Rachael is tall and Gregory is short | Different from


Nominal

Qualities: Assignment of labels
Example: Gender (male or female); Preference (like or dislike); Voting record (for or against)
What You Can Say: Each observation belongs in its own category
What You Can't Say: An observation represents "more" or "less" than another observation


Ordinal

Qualities: Assignment of values along some underlying dimension
Example: Rank in college; Order of finishing a race
What You Can Say: One observation is ranked above or below another
What You Can't Say: The amount by which one observation is more or less than another


Interval

Qualities: Equal distances between points
Example: Number of words spelled correctly; Intelligence test scores; Temperature
What You Can Say: One score differs from another on a measure with equal-appearing intervals
What You Can't Say: That the amount of difference is an exact representation of differences in the variable being studied


Ratio

Qualities: A meaningful and non-arbitrary zero
Example: Age; Weight; Time
What You Can Say: One value is twice as much as another, or that none of the variable exists
What You Can't Say: Not much!


Continuous variables
◦ Values can range along a continuum
◦ E.g., height

Discrete (categorical) variables
◦ Values are defined by category boundaries
◦ E.g., gender


Measurement should be as precise as possible

In psychology, most variables are probably measured at the nominal or ordinal level

But—how a variable is measured can determine the level of precision


Reliability—the tool is consistent

Validity—the tool measures what it should

Good assessment tools allow:
◦ Rejection of null hypotheses OR
◦ Acceptance of research hypotheses


Observed Score = True Score + Error Score
Error Score = Method Error + Trait Error


Observed score
◦ The score actually observed
◦ Consists of two components: true score and error score


True score
◦ A perfect reflection of the true value for an individual
◦ A theoretical score


Error score
◦ The difference between the observed and true scores


Method error is due to characteristics of the test or testing situation.

Trait error is due to characteristics of the individual.

Conceptually:

Reliability = True Score / (True Score + Error Score)

Reliability of the observed score becomes higher if error is reduced!


Increase sample size

Eliminate unclear questions

Standardize testing conditions

Moderate the degree of difficulty of the tests

Minimize the effects of external events

Standardize instructions

Maintain consistent scoring procedures


Reliability is measured using a correlation coefficient: r_test1•test2

Reliability coefficients
◦ Indicate how scores on one test change relative to scores on a second test
◦ Can range from -1.00 to +1.00

+1.00 = perfect reliability
0.00 = no reliability


Type of Reliability | What It Is | How You Do It | What the Reliability Coefficient Looks Like
Test-Retest | A measure of stability | Administer the same test/measure at two different times to the same group of participants | r_test1•test1
Parallel Forms | A measure of equivalence | Administer two different forms of the same test to the same group of participants | r_form1•form2
Inter-Rater | A measure of agreement | Have two raters rate behaviors and then determine the amount of agreement between them | Percentage of agreements
Internal Consistency | A measure of how consistently each item measures the same underlying construct | Correlate performance on each item with overall performance across participants | Cronbach's alpha; Kuder-Richardson
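Cronbach's alpha, the internal-consistency coefficient named in the table, can be computed from item-level data as a sketch (the responses below are hypothetical):

```python
import statistics

# Hypothetical data: five participants answer a four-item scale.
# Each inner list is one participant's responses.
scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

k = len(scores[0])                       # number of items
items = list(zip(*scores))               # one column of scores per item
item_vars = [statistics.variance(col) for col in items]
total_var = statistics.variance([sum(row) for row in scores])

# Cronbach's alpha: (k / (k-1)) * (1 - sum of item variances / variance of totals)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))   # → 0.95
```

Alpha is high here because every item rises and falls together across participants; items that behave inconsistently with the rest of the scale pull alpha down.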


A valid test does what it was designed to do.

A valid test measures what it was designed to measure.


Validity refers to the test’s results, not to the test itself

Validity ranges from low to high; it is not "either/or"

Validity must be interpreted within the testing context


Type of Validity | What Is It? | How Do You Establish It?
Content | A measure of how well the items represent the entire universe of items | Ask an expert whether the items assess what you want them to
Criterion (Concurrent) | A measure of how well a test estimates a criterion | Select a criterion and correlate scores on the test with scores on the criterion in the present
Criterion (Predictive) | A measure of how well a test predicts a criterion | Select a criterion and correlate scores on the test with scores on the criterion in the future
Construct | A measure of how well a test assesses some underlying construct | Assess the underlying construct on which the test is based and correlate these scores with the test scores


Ways to establish construct validity:

Correlate the new test with an established test

Show that people with and without certain traits score differently

Determine whether the tasks required on the test are consistent with the theory guiding the test's development


Convergent validity—different methods yield similar results

Discriminant validity—different methods yield different results

Example: two traits (Trait 1: Impulsivity; Trait 2: Activity Level), each measured by two methods (Method 1: Paper and Pencil; Method 2: Activity Level Monitor)
◦ Correlations between the two methods measuring the same trait are moderate (convergent validity)
◦ Correlations between measures of different traits are low (discriminant validity)


A valid test must be reliable.

But a reliable test need not be valid.


You must define a reliable and valid dependent variable; otherwise you cannot know whether there truly is a difference between groups!

Use a test with established and acceptable levels of reliability and validity.

If you cannot do this, develop such a test for your thesis or dissertation (and do no more than that) OR change what you are measuring.


Explain why measurement is important to the research process.

Discuss the four levels of measurement and provide an example of each.

Explain the concept of reliability in terms of observed score, true score, and error.

Describe the two elements that can make up an error score.

List methods for increasing reliability.

Discuss four ways in which reliability can be examined.

Provide a conceptual definition of validity.

List the three traditional types of validity.

Explain the relationship between reliability and validity.