Research Methods Knowledge Base

Quality is one of the most important issues in research. We introduce the idea of validity to refer to the quality of various conclusions you might reach based on a research project. Here's where I've got to give you the pitch about validity. When I mention validity, most students roll their eyes, curl up into a fetal position or go to sleep. They think validity is just something abstract and philosophical (and I guess it is at some level). But I think if you can understand validity -- the principles that we use to judge the quality of research -- you'll be able to do much more than just complete a research project. You'll be able to be a virtuoso at research, because you'll have an understanding of why we need to do certain things in order to assure quality. You won't just be plugging in standard procedures you learned in school -- sampling method X, measurement tool Y -- you'll be able to help create the next generation of research technology. Enough for now -- more on this later.

Introduction to Validity

Validity: the best available approximation to the truth of a given proposition, inference, or conclusion

The first thing we have to ask is: "validity of what?" When we think about validity in research, most of us think about research components. We might say that a measure is a valid one, or that a valid sample was drawn, or that the design had strong validity. But all of those statements are technically incorrect. Measures, samples and designs don't 'have' validity -- only propositions can be said to be valid. Technically, we should say that a measure leads to valid conclusions or that a sample enables valid inferences, and so on. It is a proposition, inference or conclusion that can 'have' validity.


We make lots of different inferences or conclusions while conducting research. Many of these are related to the process of doing research and are not the major hypotheses of the study.
Nevertheless, like the bricks that go into building a wall, these intermediate process and methodological propositions provide the foundation for the substantive conclusions that we wish to address. For instance, virtually all social research involves measurement or observation. And, whenever we measure or observe we are concerned with whether we are measuring what we intend to measure or with how our observations are influenced by the circumstances in which they are made. We reach conclusions about the quality of our measures -- conclusions that will play an important role in addressing the broader substantive issues of our study. When we talk about the validity of research, we are often referring to the many conclusions we reach about the quality of different parts of our research methodology.

We subdivide validity into four types. Each type addresses a specific methodological question. In order to understand the types of validity, you have to know something about how we investigate a research question. Because all four validity types are really only operative when studying causal questions, we will use a causal study to set the context.

The figure shows that there are really two realms that are involved in research. The first, on the top, is the land of theory. It is what goes on inside our heads as researchers. It is where we keep our theories about how the world operates. The second, on the bottom, is the land of observations. It is the real world into which we translate our ideas -- our programs, treatments, measures and observations. When we conduct research, we are continually flitting back and forth between these two realms, between what we think about the world and what is going on in it. When we are investigating a cause-effect relationship, we have a theory (implicit or otherwise) of what the cause is (the cause construct). For instance, if we are testing a new educational program, we have an idea of what it would look like ideally.
Similarly, on the effect side, we have an idea of what we are ideally trying to affect and measure (the effect construct). But each of these, the cause and the effect, has to be translated into real things, into a program or treatment and a measure or observational method. We use the term operationalization to describe the act of translating a construct into its manifestation. In effect, we take our idea and describe it as a series of operations or procedures. Now, instead of it only being an idea in our minds, it becomes a public entity that anyone can look at and examine for themselves. It is one thing, for instance, for you to say that you would like to measure self-esteem (a construct). But when you show a ten-item paper-and-pencil self-esteem measure that you developed for that purpose, others can look at it and understand more clearly what you intend by the term self-esteem.

Now, back to explaining the four validity types. They build on one another, with two of them (conclusion and internal) referring to the land of observation on the bottom of the figure, one of them (construct) emphasizing the linkages between the bottom and the top, and the last (external) being primarily concerned about the range of our theory on the top.

Imagine that we wish to examine whether use of a World Wide Web (WWW) Virtual Classroom improves student understanding of course material. Assume that we took these two constructs, the cause construct (the WWW site) and the effect (understanding), and operationalized them -- turned them into realities by constructing the WWW site and a measure of knowledge of the course material.
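The idea of operationalization described above lends itself to a loose programming analogy. The sketch below is entirely hypothetical (the item wordings and scoring rule are invented for illustration, not taken from any published scale): the construct "self-esteem" lives in your head, while its operationalization is a public, repeatable procedure that anyone can inspect and critique.

```python
# Hypothetical ten-item self-esteem measure (two example items shown).
# The construct is the private idea; this scored questionnaire is its
# public operationalization -- wording and scoring are open to inspection.
ITEMS = [
    "I feel that I have a number of good qualities.",
    "I am able to do things as well as most other people.",
    # ...eight more items would round out a ten-item scale...
]

def score(responses):
    """Sum responses rated on a 1-5 agreement scale.

    Making the scoring rule explicit is what turns the private idea into
    something others can examine for themselves.
    """
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("each response must be on a 1-5 scale")
    return sum(responses)

print(score([4, 5]))  # two sample responses -> total of 9
```

Once the procedure is written down like this, critics can argue about whether the items really capture self-esteem -- which is exactly the construct validity question taken up below.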
Here are the four validity types and the question each addresses:

Conclusion Validity: In this study, is there a relationship between the two variables?

In the context of the example we're considering, the question might be worded: in this study, is there a relationship between the WWW site and knowledge of course material? There are several conclusions or inferences we might draw to answer such a question. We could, for example, conclude that there is a relationship. We might conclude that there is a positive relationship. We might infer that there is no relationship. We can assess the conclusion validity of each of these conclusions or inferences.

Internal Validity: Assuming that there is a relationship in this study, is the relationship a causal one?

Just because we find that use of the WWW site and knowledge are correlated, we can't necessarily assume that WWW site use causes the knowledge. Both could, for example, be caused by the same factor. For instance, it may be that wealthier students who have greater resources would be more likely to have access to a WWW site and would excel on objective tests.
When we want to make a claim that our program or treatment caused the outcomes in our study, we can consider the internal validity of our causal claim.

Construct Validity: Assuming that there is a causal relationship in this study, can we claim that the program reflected well our construct of the program and that our measure reflected well our idea of the construct of the measure?

In simpler terms, did we implement the program we intended to implement and did we measure the outcome we wanted to measure? In yet other terms, did we operationalize well the ideas of the cause and the effect? When our research is over, we would like to be able to conclude that we did a credible job of operationalizing our constructs -- we can assess the construct validity of this conclusion.

External Validity: Assuming that there is a causal relationship in this study between the constructs of the cause and the effect, can we generalize this effect to other persons, places or times?

We are likely to make some claims that our research findings have implications for other groups and individuals in other settings and at other times. When we do, we can examine the external validity of these claims.

Notice how the question that each validity type addresses presupposes an affirmative answer to the previous one. This is what we mean when we say that the validity types build on one another. The figure shows the idea of cumulativeness as a staircase, along with the key question for each validity type.

For any inference or conclusion, there are always possible threats to validity -- reasons the conclusion or inference might be wrong. Ideally, one tries to reduce the plausibility of the most likely threats to validity, thereby leaving as most plausible the conclusion reached in the study. For instance, imagine a study examining whether there is a relationship between the amount of training in a specific technology and subsequent rates of use of that technology.
Because the interest is in a relationship, it is considered an issue of conclusion validity. Assume that the study is completed and no significant correlation between amount of training and adoption rates is found. On this basis it is concluded that there is no relationship between the two. How could this conclusion be wrong -- that is, what are the "threats to validity"? For one, it's possible that there isn't sufficient statistical power to detect a relationship even if it exists. Perhaps the sample size is too small or the measure of amount of training is unreliable. Or maybe assumptions of the correlational test are violated given the variables used. Perhaps there were random irrelevancies in the study setting or random heterogeneity in the respondents that increased the variability in the data and made it harder to see the relationship of interest. The inference that there is no relationship will be stronger -- have greater conclusion validity -- if one can show that these alternative explanations are not credible. The distributions might be examined to see if they conform with assumptions of the statistical test, or analyses conducted to determine whether there is sufficient statistical power.

The theory of validity, and the many lists of specific threats, provide a useful scheme for assessing the quality of research conclusions. The theory is general in scope and applicability, well-articulated in its philosophical suppositions, and virtually impossible to explain adequately in a few minutes. As a framework for judging the quality of evaluations it is indispensable and well worth understanding.

External Validity

External validity is related to generalizing. That's the major thing you need to keep in mind. Recall that validity refers to the approximate truth of propositions, inferences, or conclusions. So, external validity refers to the approximate truth of conclusions that involve generalizations.
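A quick aside before going further: the statistical power threat raised in the training-and-adoption example above can actually be quantified. The sketch below uses the standard Fisher z approximation for the sample size needed to detect a true correlation; the function and the illustrative correlation values are mine, not the book's.

```python
from math import atanh, ceil
from statistics import NormalDist

def required_n(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a true correlation r
    with a two-sided test, via the Fisher z transformation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)

# A modest true correlation needs a surprisingly large sample; a study that
# found "no relationship" with, say, n = 30 may simply have been underpowered.
print(required_n(0.20))  # roughly 190-200 respondents
print(required_n(0.50))  # roughly 30 respondents
```

If the study's actual sample falls well short of the number this kind of calculation suggests, the "no relationship" conclusion has weak conclusion validity.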
Put in more pedestrian terms, external validity is the degree to which the conclusions in your study would hold for other persons in other places and at other times.

In science there are two major approaches to how we provide evidence for a generalization. I'll call the first approach the Sampling Model. In the sampling model, you start by identifying the population you would like to generalize to. Then, you draw a fair sample from that population and conduct your research with the sample. Finally, because the sample is representative of the population, you can automatically generalize your results back to the population. There are several problems with this approach. First, perhaps you don't know at the time of your study who you might ultimately like to generalize to. Second, you may not be easily able to draw a fair or representative sample. Third, it's impossible to sample across all times that you might like to generalize to (like next year).

I'll call the second approach to generalizing the Proximal Similarity Model. 'Proximal' means 'nearby' and 'similarity' means... well, it means 'similarity'. The term proximal similarity was suggested by Donald T. Campbell as an appropriate relabeling of the term external validity (although he was the first to admit that it probably wouldn't catch on!). Under this model, we begin by thinking about different generalizability contexts and developing a theory about which contexts are more like our study and which are less so. For instance, we might imagine several settings that have people who are more similar to the people in our study or people who are less similar. This also holds for times and places. When we place different contexts in terms of their relative similarities, we can call this implicit theoretical dimension a gradient of similarity. Once we have developed this proximal similarity framework, we are able to generalize.
How? We conclude that we can generalize the results of our study to other persons, places or times that are more like (that is, more proximally similar to) our study. Notice that here, we can never generalize with certainty -- it is always a question of more or less similar.

Threats to External Validity

A threat to external validity is an explanation of how you might be wrong in making a generalization. For instance, you conclude that the results of your study (which was done in a specific place, with certain types of people, and at a specific time) can be generalized to another context (for instance, another place, with slightly different people, at a slightly later time). There are three major threats to external validity because there are three ways you could be wrong -- people, places or times. Your critics could come along, for example, and argue that the results of your study are due to the unusual type of people who were in the study. Or, they could argue that it might only work because of the unusual place you did the study in (perhaps you did your educational study in a college town with lots of high-achieving, educationally-oriented kids). Or, they might suggest that you did your study in a peculiar time. For instance, if you did your smoking cessation study the week after the Surgeon General issued the well-publicized results of the latest smoking and cancer studies, you might get different results than if you had done it the week before.

Improving External Validity

How can we improve external validity? One way, based on the sampling model, suggests that you do a good job of drawing a sample from a population. For instance, you should use random selection, if possible, rather than a nonrandom procedure. And, once selected, you should try to assure that the respondents participate in your study and that you keep your dropout rates low. A second approach would be to use the theory of proximal similarity more effectively.
How? Perhaps you could do a better job of describing the ways your contexts and others differ, providing lots of data about the degree of similarity between various groups of people, places, and even times. You might even be able to map out the degree of proximal similarity among various contexts with a methodology like concept mapping. Perhaps the best approach to criticisms of generalizations is simply to show them that they're wrong -- do your study in a variety of places, with different people and at different times. That is, your external validity (ability to generalize) will be stronger the more you replicate your study.

Construct Validity

Construct validity refers to the degree to which inferences can legitimately be made from the operationalizations in your study to the theoretical constructs on which those operationalizations were based. Like external validity, construct validity is related to generalizing. But, where external validity involves generalizing from your study context to other people, places or times, construct validity involves generalizing from your program or measures to the concept of your program or measures. You might think of construct validity as a "labeling" issue. When you implement a program that you call a "Head Start" program, is your label an accurate one? When you measure what you term "self esteem" is that what you were really measuring?

I would like to tell two major stories here. The first is the more straightforward one. I'll discuss several ways of thinking about the idea of construct validity, several metaphors that might provide you with a foundation in the richness of this idea. Then, I'll discuss the major construct validity threats, the kinds of arguments your critics are likely to raise when you make a claim that your program or measure is valid. In most research methods texts, construct validity is presented in the section on measurement.
And, it is typically presented as one of many different types of validity (e.g., face validity, predictive validity, concurrent validity) that you might want to be sure your measures have. I don't see it that way at all. I see construct validity as the overarching quality with all of the other measurement validity labels falling beneath it. And, I don't see construct validity as limited only to measurement. As I've already implied, I think it is as much a part of the independent variable -- the program or treatment -- as it is the dependent variable. So, I'll try to make some sense of the various measurement validity types and try to move you to think instead of the validity of any operationalization as falling within the general category of construct validity, with a variety of subcategories and subtypes.

The second story I want to tell is more historical in nature. During World War II, the U.S. government involved hundreds (and perhaps thousands) of psychologists and psychology graduate students in the development of a wide array of measures that were relevant to the war effort. They needed personality screening tests for prospective fighter pilots, personnel measures that would enable sensible assignment of people to job skills, psychophysical measures to test reaction times, and so on. After the war, these psychologists needed to find gainful employment outside of the military context, and it's not surprising that many of them moved into testing and measurement in a civilian context. During the early 1950s, the American Psychological Association began to become increasingly concerned with the quality or validity of all of the new measures that were being generated and decided to convene an effort to set standards for psychological measures. The first formal articulation of the idea of construct validity came from this effort and was couched under the somewhat grandiose idea of the nomological network. The nomological network
provided a theoretical basis for the idea of construct validity, but it didn't provide practicing researchers with a way to actually establish whether their measures had construct validity. In 1959, an attempt was made to develop a method for assessing construct validity using what is called a multitrait-multimethod matrix, or MTMM for short. In order to argue that your measures had construct validity under the MTMM approach, you had to demonstrate that there was both convergent and discriminant validity in your measures. You demonstrated convergent validity when you showed that measures that are theoretically supposed to be highly interrelated are, in practice, highly interrelated. And, you showed discriminant validity when you demonstrated that measures that shouldn't be related to each other in fact were not. While the MTMM did provide a methodology for assessing construct validity, it was a difficult one to implement well, especially in applied social research contexts and, in fact, has seldom been formally attempted.

When we examine carefully the thinking about construct validity that underlies both the nomological network and the MTMM, one of the key themes we can identify in both is the idea of "pattern." When we claim that our programs or measures have construct validity, we are essentially claiming that we as researchers understand how our constructs or theories of the programs and measures operate in theory and we claim that we can provide evidence that they behave in practice the way we think they should. The researcher essentially has a theory of how the programs and measures relate to each other (and other theoretical terms), a theoretical pattern if you will. And, the researcher provides evidence through observation that the programs or measures actually behave that way in reality, an observed pattern.
When we claim construct validity, we're essentially claiming that our observed pattern -- how things operate in reality -- corresponds with our theoretical pattern -- how we think the world works. I call this process pattern matching, and I believe that it is the heart of construct validity. It is clearly an underlying theme in both the nomological network and the MTMM ideas. And, I think that we can develop concrete and feasible methods that enable practicing researchers to assess pattern matches -- to assess the construct validity of their research. The section on pattern matching lays out my idea of how we might use this approach to assess construct validity.

Measurement Validity Types

There's an awful lot of confusion in the methodological literature that stems from the wide variety of labels that are used to describe the validity of measures. I want to make two cases here. First, it's dumb to limit our scope only to the validity of measures. We really want to talk about the validity of any operationalization. That is, any time you translate a concept or construct into a functioning and operating reality (the operationalization), you need to be concerned about how well you did the translation. This issue is as relevant when we are talking about treatments or programs as it is when we are talking about measures. (In fact, come to think of it, we could also think of sampling in this way. The population of interest in your study is the "construct" and the sample is your operationalization. If we think of it this way, we are essentially talking about the construct validity of the sampling!)
Second, I want to use the term construct validity to refer to the general case of translating any construct into an operationalization. Let's use all of the other validity terms to reflect different ways you can demonstrate different aspects of construct validity.

With all that in mind, here's a list of the validity types that are typically mentioned in texts and research papers when talking about the quality of measurement:

Construct validity
  - Translation validity
      - Face validity
      - Content validity
  - Criterion-related validity
      - Predictive validity
      - Concurrent validity
      - Convergent validity
      - Discriminant validity

I have to warn you here that I made this list up. I've never heard of "translation" validity before, but I needed a good name to summarize what both face and content validity are getting at, and that one seemed sensible. All of the other labels are commonly known, but the way I've organized them is different than I've seen elsewhere.

Let's see if we can make some sense out of this list. First, as mentioned above, I would like to use the term construct validity to be the overarching category. Construct validity is the approximate truth of the conclusion that your operationalization accurately reflects its construct. All of the other terms address this general issue in different ways. Second, I make a distinction between two broad types: translation validity and criterion-related validity. That's because I think these correspond to the two major ways you can assure/assess the validity of an operationalization. In translation validity, you focus on whether the operationalization is a good reflection of the construct. This approach is definitional in nature -- it assumes you have a good detailed definition of the construct and that you can check the operationalization against it. In criterion-related validity, you examine whether the operationalization behaves the way it should given your theory of the construct. This is a more relational approach to construct validity.
It assumes that your operationalization should function in predictable ways in relation to other operationalizations based upon your theory of the construct. (If all this seems a bit dense, hang in there until you've gone through the discussion below -- then come back and re-read this paragraph.) Let's go through the specific validity types.

Translation Validity

I just made this one up today! (See how easy it is to be a methodologist?) I needed a term that described what both face and content validity are getting at. In essence, both of those validity types are attempting to assess the degree to which you accurately translated your construct into the operationalization, and hence the choice of name. Let's look at the two types of translation validity.

Face Validity

In face validity, you look at the operationalization and see whether "on its face" it seems like a good translation of the construct. This is probably the weakest way to try to demonstrate construct validity. For instance, you might look at a measure of math ability, read through the questions, and decide that yep, it seems like this is a good measure of math ability (i.e., the label "math ability" seems appropriate for this measure). Or, you might observe a teenage pregnancy prevention program and conclude that, "Yep, this is indeed a teenage pregnancy prevention program." Of course, if this is all you do to assess face validity, it would clearly be weak evidence because it is essentially a subjective judgment call. (Note that just because it is weak evidence doesn't mean that it is wrong. We need to rely on our subjective judgment throughout the research process. It's just that this form of judgment won't be very convincing to others.) We can improve the quality of face validity assessment considerably by making it more systematic.
For instance, if you are trying to assess the face validity of a math ability measure, it would be more convincing if you sent the test to a carefully selected sample of experts on math ability testing and they all reported back with the judgment that your measure appears to be a good measure of math ability.

Content Validity

In content validity, you essentially check the operationalization against the relevant content domain for the construct. This approach assumes that you have a good detailed description of the content domain, something that's not always true. For instance, we might lay out all of the criteria that should be met in a program that claims to be a "teenage pregnancy prevention program." We would probably include in this domain specification the definition of the target group, criteria for deciding whether the program is preventive in nature (as opposed to treatment-oriented), and lots of criteria that spell out the content that should be included, like basic information on pregnancy, the use of abstinence, birth control methods, and so on. Then, armed with these criteria, we could use them as a type of checklist when examining our program. Only programs that meet the criteria can legitimately be defined as "teenage pregnancy prevention programs." This all sounds fairly straightforward, and for many operationalizations it will be. But for other constructs (e.g., self-esteem, intelligence), it will not be easy to decide on the criteria that constitute the content domain.

Criterion-Related Validity

In criterion-related validity, you check the performance of your operationalization against some criterion. How is this different from content validity? In content validity, the criteria are the construct definition itself -- it is a direct comparison. In criterion-related validity, we usually make a prediction about how the operationalization will perform based on our theory of the construct.
The differences among the criterion-related validity types are in the criteria they use as the standard for judgment.

Predictive Validity

In predictive validity, we assess the operationalization's ability to predict something it should theoretically be able to predict. For instance, we might theorize that a measure of math ability should be able to predict how well a person will do in an engineering-based profession. We could give our measure to experienced engineers and see if there is a high correlation between scores on the measure and their salaries as engineers. A high correlation would provide evidence for predictive validity -- it would show that our measure can correctly predict something that we theoretically think it should be able to predict.

Concurrent Validity

In concurrent validity, we assess the operationalization's ability to distinguish between groups that it should theoretically be able to distinguish between. For example, if we come up with a way of assessing manic-depression, our measure should be able to distinguish between people who are diagnosed manic-depressive and those diagnosed paranoid schizophrenic. If we want to assess the concurrent validity of a new measure of empowerment, we might give the measure to both migrant farm workers and to the farm owners, theorizing that our measure should show that the farm owners are higher in empowerment. As in any discriminating test, the results are more powerful if you are able to show that you can discriminate between two groups that are very similar.

Convergent Validity

In convergent validity, we examine the degree to which the operationalization is similar to (converges on) other operationalizations that it theoretically should be similar to. For instance, to show the convergent validity of a Head Start program, we might gather evidence that shows that the program is similar to other Head Start programs.
Or, to show the convergent validity of a test of arithmetic skills, we might correlate the scores on our test with scores on other tests that purport to measure basic math ability, where high correlations would be evidence of convergent validity.

Discriminant Validity

In discriminant validity, we examine the degree to which the operationalization is not similar to (diverges from) other operationalizations that it theoretically should not be similar to. For instance, to show the discriminant validity of a Head Start program, we might gather evidence that shows that the program is not similar to other early childhood programs that don't label themselves as Head Start programs. Or, to show the discriminant validity of a test of arithmetic skills, we might correlate the scores on our test with scores on tests of verbal ability, where low correlations would be evidence of discriminant validity.

Idea of Construct Validity

Construct validity refers to the degree to which inferences can legitimately be made from the operationalizations in your study to the theoretical constructs on which those operationalizations were based. I find that it helps me to divide the issues into two broad territories that I call the "land of theory" and the "land of observation." The land of theory is what goes on inside your mind, and your attempt to explain or articulate this to others. It is all of the ideas, theories, hunches and hypotheses that you have about the world. In the land of theory you will find your idea of the program or treatment as it should be. You will find the idea or construct of the outcomes or measures that you believe you are trying to affect. The land of observation consists of what you see happening in the world around you and the public manifestations of that world. In the land of observation you will find your actual program or treatment, and your actual measures or observational procedures.
Presumably, you have constructed the land of observation based on your theories. You developed the program to reflect the kind of program you had in mind. You created the measures to get at what you wanted to get at.

Construct validity is an assessment of how well you translated your ideas or theories into actual programs or measures. Why is this important? Because when you think about the world or talk about it with others (land of theory) you are using words that represent concepts. If you tell someone that a special type of math tutoring will help their child do better in math, you are communicating at the level of concepts or constructs. You aren't describing in operational detail the specific things that the tutor will do with their child. You aren't describing the specific questions that will be on the math test that their child will do better on. You are talking in general terms, using constructs. If you based your recommendation on research that showed that the special type of tutoring improved children's math scores, you would want to be sure that the type of tutoring you are referring to is the same as what that study implemented and that the type of outcome you're saying should occur was the type they measured in their study. Otherwise, you would be mislabeling or misrepresenting the research. In this sense, construct validity can be viewed as a "truth in labeling" kind of issue.

There really are two broad ways of looking at the idea of construct validity. I'll call the first the "definitionalist" perspective because it essentially holds that the way to assure construct validity is to define the construct so precisely that you can operationalize it in a straightforward manner. In a definitionalist view, you have either operationalized the construct correctly or you haven't -- it's an either/or type of thinking. Either this program is a "Type A Tutoring Program" or it isn't.
Either you're measuring self esteem or you aren't.

The other perspective I'd call "relationalist." To a relationalist, things are not either/or or black-and-white -- concepts are more or less related to each other. The meaning of terms or constructs differs relatively, not absolutely. The program in your study might be a "Type A Tutoring Program" in some ways, while in others it is not. It might be more that type of program than another program. Your measure might be capturing a lot of the construct of self esteem, but it may not capture all of it. There may be another measure that is closer to the construct of self esteem than yours is. Relationalism suggests that meaning changes gradually. It rejects the idea that we can rely on operational definitions as the basis for construct definition.

To get a clearer idea of this distinction, you might think about how the law approaches the construct of "truth." Most of you have heard the standard oath that a witness in a U.S. court is expected to swear. They are to tell "the truth, the whole truth and nothing but the truth." What does this mean? If we only had them swear to tell the truth, they might choose to interpret that as "make sure that what you say is true." But that wouldn't guarantee that they would tell everything they knew to be true. They might leave some important things out. They would still be telling the truth. They just wouldn't be telling everything. On the other hand, they are asked to tell "nothing but the truth." This suggests that we can say simply that Statement X is true and Statement Y is not true.

Now, let's see how this oath translates into a measurement and construct validity context. For instance, we might want our measure to reflect "the construct, the whole construct, and nothing but the construct." What does this mean? Let's assume that we have five distinct concepts that are all conceptually related to each other -- self esteem, self worth, self disclosure, self confidence, and openness.
Most people would say that these concepts are similar, although they can be distinguished from each other. If we were trying to develop a measure of self esteem, what would it mean to measure "self esteem, all of self esteem, and nothing but self esteem?" If the concept of self esteem overlaps with the others, how could we possibly measure all of it (that would presumably include the part that overlaps with others) and nothing but it? We couldn't! If you believe that meaning is relational in nature -- that some concepts are "closer" in meaning than others -- then the legal model discussed here does not work well as a model for construct validity. In fact, we will see that most social research methodologists have (whether they've thought about it or not!) rejected the definitionalist perspective in favor of a relationalist one.

In order to establish construct validity you have to meet the following conditions:

You have to set the construct you want to operationalize (e.g., self esteem) within a semantic net (or "net of meaning"). This means that you have to tell us what your construct is more or less similar to in meaning.

You need to be able to provide direct evidence that you control the operationalization of the construct -- that your operationalizations look like what they should theoretically look like. If you are trying to measure self esteem, you have to be able to explain why you operationalized the questions the way you did. If all of your questions are addition problems, how can you argue that your measure reflects self esteem and not adding ability?

You have to provide evidence that your data support your theoretical view of the relations among constructs. If you believe that self esteem is closer in meaning to self worth than it is to anxiety, you should be able to show that measures of self esteem are more highly correlated with measures of self worth than with ones of anxiety.
Convergent & Discriminant Validity

Convergent and discriminant validity are both considered subcategories or subtypes of construct validity. The important thing to recognize is that they work together -- if you can demonstrate that you have evidence for both convergent and discriminant validity, then you've by definition demonstrated that you have evidence for construct validity. But neither one alone is sufficient for establishing construct validity.

I find it easiest to think about convergent and discriminant validity as two interlocking propositions. In simple words I would describe what they are doing as follows:

measures of constructs that theoretically should be related to each other are, in fact, observed to be related to each other (that is, you should be able to show a correspondence or convergence between similar constructs)

and

measures of constructs that theoretically should not be related to each other are, in fact, observed to not be related to each other (that is, you should be able to discriminate between dissimilar constructs)

To estimate the degree to which any two measures are related to each other we typically use the correlation coefficient. That is, we look at the patterns of intercorrelations among our measures. Correlations between theoretically similar measures should be "high" while correlations between theoretically dissimilar measures should be "low."

The main problem that I have with this convergent-discriminant idea has to do with my use of the quotations around the terms "high" and "low" in the sentence above. The question is simple -- how "high" do correlations need to be to provide evidence for convergence and how "low" do they need to be to provide evidence for discrimination? And the answer is -- we don't know! In general we want convergent correlations to be as high as possible and discriminant ones to be as low as possible, but there is no hard and fast rule. Well, let's not let that stop us.
One thing that we can say is that the convergent correlations should always be higher than the discriminant ones. At least that helps a bit.

Before we get too deep into the idea of convergence and discrimination, let's take a look at each one using a simple example.

Convergent Validity

To establish convergent validity, you need to show that measures that should be related are in reality related. In the figure below, we see four measures (each is an item on a scale) that all purport to reflect the construct of self esteem. For instance, Item 1 might be the statement "I feel good about myself" rated using a 1-to-5 Likert-type response format. We theorize that all four items reflect the idea of self esteem (this is why I labeled the top part of the figure Theory). On the bottom part of the figure (Observation) we see the intercorrelations of the four scale items. This might be based on giving our scale out to a sample of respondents. You should readily see that the item intercorrelations for all item pairings are very high (remember that correlations range from -1.00 to +1.00). This provides evidence that our theory that all four items are related to the same construct is supported.

Notice, however, that while the high intercorrelations demonstrate that the four items are probably related to the same construct, that doesn't automatically mean that the construct is self esteem. Maybe there's some other construct that all four items are related to (more about this later). But, at the very least, we can assume from the pattern of correlations that the four items are converging on the same thing, whatever we might call it.

Discriminant Validity

To establish discriminant validity, you need to show that measures that should not be related are in reality not related. In the figure below, we again see four measures (each is an item on a scale).
Here, however, two of the items are thought to reflect the construct of self esteem while the other two are thought to reflect locus of control. The top part of the figure shows our theoretically expected relationships among the four items. If we have discriminant validity, the relationship between measures from different constructs should be very low (again, we don't know how low "low" should be, but we'll deal with that later). There are four correlations between measures that reflect different constructs, and these are shown on the bottom of the figure (Observation). You should see immediately that these four cross-construct correlations are very low (i.e., near zero) and certainly much lower than the convergent correlations in the previous figure.

As above, just because we've provided evidence that the two sets of two measures each seem to be related to different constructs (because their intercorrelations are so low) doesn't mean that the constructs they're related to are self esteem and locus of control. But the correlations do provide evidence that the two sets of measures are discriminated from each other.

Putting It All Together

OK, so where does this leave us? I've shown how we go about providing evidence for convergent and discriminant validity separately. But as I said at the outset, in order to argue for construct validity we really need to be able to show that both of these types of validity are supported. Given the above, you should be able to see that we could put both principles together into a single analysis to examine both at the same time. This is illustrated in the figure below.

The figure shows six measures, three that are theoretically related to the construct of self esteem and three that are thought to be related to locus of control. The top part of the figure shows this theoretical arrangement. The bottom of the figure shows what a correlation matrix based on a pilot sample might show.
To understand this table, you need to first be able to identify the convergent correlations and the discriminant ones. There are two sets or blocks of convergent coefficients (in green), one 3x3 block for the self esteem intercorrelations and one 3x3 block for the locus of control correlations. There are also two 3x3 blocks of discriminant coefficients (shown in red), although if you're really sharp you'll recognize that they are the same values in mirror image (Do you know why? You might want to read up on correlations to refresh your memory).

How do we make sense of the patterns of correlations? Remember that I said above that we don't have any firm rules for how high or low the correlations need to be to provide evidence for either type of validity. But we do know that the convergent correlations should always be higher than the discriminant ones. Take a good look at the table and you will see that in this example the convergent correlations are always higher than the discriminant ones. I would conclude from this that the correlation matrix provides evidence for both convergent and discriminant validity, all in one analysis!

But while the pattern supports discriminant and convergent validity, does it show that the three self esteem measures actually measure self esteem or that the three locus of control measures actually measure locus of control? Of course not. That would be much too easy.

So, what good is this analysis? It does show that, as you predicted, the three self esteem measures seem to reflect the same construct (whatever that might be), the three locus of control measures also seem to reflect the same construct (again, whatever that is) and that the two sets of measures seem to be reflecting two different constructs (whatever they are).
That's not bad for one simple analysis.

OK, so how do we get to the really interesting question? How do we show that our measures are actually measuring self esteem or locus of control? I hate to disappoint you, but there is no simple answer to that (I bet you knew that was coming). There are a number of things we can do to address that question. First, we can use other ways to address construct validity to help provide further evidence that we're measuring what we say we're measuring. For instance, we might use a face validity or content validity approach to demonstrate that the measures reflect the constructs we say they are (see the discussion on types of construct validity for more information).

One of the most powerful approaches is to include even more constructs and measures. The more complex our theoretical model (if we find confirmation of the correct pattern in the correlations), the more we are providing evidence that we know what we're talking about (theoretically speaking). Of course, it's also harder to get all the correlations to give you the exact right pattern as you add lots more measures. And, in many studies we simply don't have the luxury to go adding more and more measures because it's too costly or demanding. Despite the impracticality, if we can afford to do it, adding more constructs and measures will enhance our ability to assess construct validity using approaches like the multitrait-multimethod matrix and the nomological network.

Perhaps the most interesting approach to getting at construct validity involves the idea of pattern matching. Instead of viewing convergent and discriminant validity as differences of kind, pattern matching views them as differences in degree. This seems a more reasonable idea, and helps us avoid the problem of how high or low correlations need to be to say that we've established convergence or discrimination.
Threats to Construct Validity

Before we launch into a discussion of the most common threats to construct validity, let's recall what a threat to validity is. In a research study you are likely to reach a conclusion that your program was a good operationalization of what you wanted and that your measures reflected what you wanted them to reflect. Would you be correct? How will you be criticized if you make these types of claims? How might you strengthen your claims? The kinds of questions and issues your critics will raise are what I mean by threats to construct validity.

I take the list of threats from the discussion in Cook and Campbell (Cook, T.D. and Campbell, D.T. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Houghton Mifflin, Boston, 1979). While I love their discussion, I do find some of their terminology less than straightforward -- a lot of what I'll do here is try to explain this stuff in terms that the rest of us might hope to understand.

Inadequate Preoperational Explication of Constructs

This one isn't nearly as ponderous as it sounds. Here, preoperational means before translating constructs into measures or treatments, and explication means explanation -- in other words, you didn't do a good enough job of defining (operationally) what you mean by the construct. How is this a threat? Imagine that your program consisted of a new type of approach to rehabilitation. Your critic comes along and claims that, in fact, your program is neither new nor a true rehabilitation program. You are being accused of doing a poor job of thinking through your constructs. Some possible solutions: think through your concepts better; use methods (e.g., concept mapping) to articulate your concepts; get experts to critique your operationalizations.

Mono-Operation Bias

Mono-operation bias pertains to the independent variable, cause, program or treatment in your study -- it does not pertain to measures or outcomes (see Mono-Method Bias below). If you only use a single version of a program in a single place at a single point in time, you may not be capturing the full breadth of the concept of the program. Every operationalization is flawed relative to the construct on which it is based. If you conclude that your program reflects the construct of the program, your critics are likely to argue that the results of your study only reflect the peculiar version of the program that you implemented, and not the actual construct you had in mind. Solution: try to implement multiple versions of your program.

Mono-Method Bias

Mono-method bias refers to your measures or observations, not to your programs or causes. Otherwise, it's essentially the same issue as mono-operation bias. With only a single version of a self esteem measure, you can't provide much evidence that you're really measuring self esteem. Your critics will suggest that you aren't measuring self esteem -- that you're only measuring part of it, for instance. Solution: try to implement multiple measures of key constructs and try to demonstrate (perhaps through a pilot or side study) that the measures you use behave as you theoretically expect them to.

Interaction of Different Treatments

You give a new program designed to encourage high-risk teenage girls to go to school and not become pregnant. The results of your study show that the girls in your treatment group have higher school attendance and lower birth rates. You're feeling pretty good about your program until your critics point out that the targeted at-risk
treatment group in your study is also likely to be involved simultaneously in several other programs designed to have similar effects. Can you really label the program effect as a consequence of your program? The "real" program that the girls received may actually be the combination of the separate programs they participated in.

Interaction of Testing and Treatment

Does testing or measurement itself make the groups more sensitive or receptive to the treatment? If it does, then the testing is in effect a part of the treatment; it's inseparable from the effect of the treatment. This is a labeling issue (and, hence, a concern of construct validity) because you want to use the label "program" to refer to the program alone, but in fact it includes the testing.

Restricted Generalizability Across Constructs

This is what I like to refer to as the "unintended consequences" threat to construct validity. You do a study and conclude that Treatment X is effective. In fact, Treatment X does cause a reduction in symptoms, but what you failed to anticipate was the drastic negative consequences of the side effects of the treatment. When you say that Treatment X is effective, you have defined "effective" as only the directly targeted symptom. This threat reminds us that we have to be careful about whether our observed effects (Treatment X is effective) would generalize to other potential outcomes.

Confounding Constructs and Levels of Constructs

Imagine a study to test the effect of a new drug treatment for cancer. A fixed dose of the drug is given to a randomly assigned treatment group and a placebo to the other group. No treatment effects are detected. Perhaps the result that's observed is only true for that dosage level. Slight increases or decreases of the dosage may radically change the results. In this context, it is not "fair" for you to use the label for the drug as a description for your treatment because you only looked at a narrow range of doses.
Like the other construct validity threats, this is essentially a labeling issue -- your label is not a good description for what you implemented.

The "Social" Threats to Construct Validity

I've set aside the other major threats to construct validity because they all stem from the social and human nature of the research endeavor.

Hypothesis Guessing

Most people don't just participate passively in a research project. They are trying to figure out what the study is about. They are "guessing" at what the real purpose of the study is. And they are likely to base their behavior on what they guess, not just on your treatment. In an educational study conducted in a classroom, students might guess that the key dependent variable has to do with class participation levels. If they increase their participation not because of your program but because they think that's what you're studying, then you cannot label the outcome as an effect of the program. It is this labeling issue that makes this a construct validity threat.

Evaluation Apprehension

Many people are anxious about being evaluated. Some are even phobic about testing and measurement situations. If their apprehension makes them perform poorly (and not your program conditions) then you certainly can't label that as a treatment effect. Another form of evaluation apprehension concerns the human tendency to want to "look good" or "look smart" and so on. If, in their desire to look good, participants perform better (and not as a result of your program!) then you would be wrong to label this as a treatment effect. In both cases, the apprehension becomes confounded with the treatment itself and you have to be careful about how you label the outcomes.

Experimenter Expectancies

These days, when we engage in lots of non-laboratory applied social research, we generally don't use the term "experimenter" to describe the person in charge of the research.
So, let's relabel this threat "researcher expectancies." The researcher can bias the results of a study in countless ways, both consciously and unconsciously. Sometimes the researcher can communicate what the desired outcome for a study might be (and participant desire to "look good" leads them to react that way). For instance, the researcher might look pleased when participants give a desired answer. If this is what causes the response, it would be wrong to label the response as a treatment effect.

The Nomological Network

The nomological network is an idea that was developed by Lee Cronbach and Paul Meehl in 1955 (Cronbach, L. and Meehl, P. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302). They argued that a construct is defined by the network of lawful relations in which it participates, and that establishing construct validity means elaborating and testing that network.

The Multitrait-Multimethod Matrix

The multitrait-multimethod matrix (hereafter labeled MTMM) is an approach to assessing construct validity developed in 1959 by Campbell and Fiske (Campbell, D. and Fiske, D. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81-105), in part as an attempt to provide a practical methodology that researchers could actually use (as opposed to the nomological network idea, which was theoretically useful but did not include a methodology). Along with the MTMM, Campbell and Fiske introduced two new types of validity -- convergent and discriminant -- as subcategories of construct validity. Convergent validity is the degree to which concepts that should be related theoretically are interrelated in reality. Discriminant validity is the degree to which concepts that should not be related theoretically are, in fact, not interrelated in reality. You can assess both convergent and discriminant validity using the MTMM. In order to be able to claim that your measures have construct validity, you have to demonstrate both convergence and discrimination.

The MTMM is simply a matrix or table of correlations arranged to facilitate the interpretation of the assessment of construct validity.
The MTMM assumes that you measure each of several concepts (called traits by Campbell and Fiske) by each of several methods (e.g., a paper-and-pencil test, a direct observation, a performance measure). The MTMM is a very restrictive methodology -- ideally you should measure each concept by each method.

To construct an MTMM, you need to arrange the correlation matrix by concepts within methods. The figure shows an MTMM for three concepts (traits A, B and C), each of which is measured with three different methods (1, 2 and 3). Note that you lay the matrix out in blocks by method. Essentially, the MTMM is just a correlation matrix between your measures, with one exception -- instead of 1's along the diagonal (as in the typical correlation matrix) we substitute an estimate of the reliability of each measure as the diagonal.

Before you can interpret an MTMM, you have to understand how to identify the different parts of the matrix. First, you should note that the matrix consists of nothing but correlations. It is a square, symmetric matrix, so we only need to look at half of it (the figure shows the lower triangle). Second, these correlations can be grouped into three kinds of shapes: diagonals, triangles, and blocks. The specific shapes are:

The Reliability Diagonal (monotrait-monomethod)

Estimates of the reliability of each measure in the matrix. You can estimate reliabilities a number of different ways (e.g., test-retest, internal consistency). There are as many correlations in the reliability diagonal as there are measures -- in this example there are nine measures and nine reliabilities. The first reliability in the example is the correlation of Trait A, Method 1 with Trait A, Method 1 (hereafter, I'll abbreviate this relationship A1-A1). Notice that this is essentially the correlation of the measure with itself. In fact such a correlation would always be perfect (i.e., r=1.0). Instead, we substitute an estimate of reliability.
You could also consider these values to be monotrait-monomethod correlations.

The Validity Diagonals (monotrait-heteromethod)

Correlations between measures of the same trait measured using different methods. Since the MTMM is organized into method blocks, there is one validity diagonal in each method block. For example, look at the A1-A2 correlation of .57. This is the correlation between two measures of the same trait (A) measured with two different methods (1 and 2). Because the two measures are of the same trait or concept, we would expect them to be strongly correlated. You could also consider these values to be monotrait-heteromethod correlations.

The Heterotrait-Monomethod Triangles

These are the correlations among measures that share the same method of measurement. For instance, A1-B1 = .51 in the upper left heterotrait-monomethod triangle. Note that what these correlations share is method, not trait or concept. If these correlations are high, it is because measuring different things with the same method results in correlated measures. Or, in more straightforward terms, you've got a strong "methods" factor.

The Heterotrait-Heteromethod Triangles

These are correlations that differ in both trait and method. For instance, A1-B2 is .22 in the example. Generally, because these correlations share neither trait nor method, we expect them to be the lowest in the matrix.

The Monomethod Blocks

These consist of all of the correlations that share the same method of measurement. There are as many blocks as there are methods of measurement.

The Heteromethod Blocks

These consist of all correlations that do not share the same methods. There are (K(K-1))/2 such blocks, where K = the number of methods. In the example, there are 3 methods and so there are (3(3-1))/2 = (3(2))/2 = 6/2 = 3 such blocks.

Principles of Interpretation

Now that you can identify the different parts of the MTMM, you can begin to understand the rules for interpreting it.
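The anatomy just described can be captured in a few lines of code. This sketch (the classifier is my own illustration, not Campbell and Fiske's; it assumes the trait-within-method layout from the figure) labels every cell of a 3-trait, 3-method MTMM and counts the heteromethod blocks with the (K(K-1))/2 formula:

```python
from collections import Counter

TRAITS, METHODS = 3, 3  # traits A, B, C; methods 1, 2, 3

def cell_kind(row, col):
    """Classify one cell of an MTMM laid out trait-within-method."""
    m1, t1 = divmod(row, TRAITS)  # which method block, which trait in it
    m2, t2 = divmod(col, TRAITS)
    if row == col:
        return "reliability (monotrait-monomethod)"
    if t1 == t2:
        return "validity (monotrait-heteromethod)"
    if m1 == m2:
        return "heterotrait-monomethod"
    return "heterotrait-heteromethod"

# Tally the lower triangle plus the diagonal, as shown in the figure.
size = TRAITS * METHODS
counts = Counter(cell_kind(r, c) for r in range(size) for c in range(r + 1))
for kind, k in sorted(counts.items()):
    print(f"{kind}: {k}")

heteromethod_blocks = METHODS * (METHODS - 1) // 2  # (K(K-1))/2
print("heteromethod blocks:", heteromethod_blocks)
```

Running this reproduces the bookkeeping in the text: nine reliabilities on the diagonal, three validity correlations in each of the three heteromethod blocks, and (3(3-1))/2 = 3 heteromethod blocks.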
You should realize that MTMM interpretation requires the researcher to use judgment. Even though some of the principles may be violated in an MTMM, you may still wind up concluding that you have fairly strong construct validity. In other words, you won't necessarily get perfect adherence to these principles in applied research settings, even when you do have evidence to support construct validity. To me, interpreting an MTMM is a lot like a physician's reading of an x-ray. A practiced eye can often spot things that the neophyte misses! A researcher who is experienced with MTMM can use it to identify weaknesses in measurement as well as for assessing construct validity.

To help make the principles more concrete, let's make the example a bit more realistic. We'll imagine that we are going to conduct a study of sixth grade students and that we want to measure three traits or concepts: Self Esteem (SE), Self Disclosure (SD) and Locus of Control (LC). Furthermore, let's measure each of these in three different ways: a Paper-and-Pencil (P&P) measure, a Teacher rating, and a Parent rating. The results are arrayed in the MTMM. As the principles are presented, try to identify the appropriate coefficients in the MTMM and make a judgment yourself about the strength of construct validity claims.

The basic principles or rules for the MTMM are:

Coefficients in the reliability diagonal should consistently be the highest in the matrix. That is, a trait should be more highly correlated with itself than with anything else! This is uniformly true in our example.

Coefficients in the validity diagonals should be significantly different from zero and high enough to warrant further investigation. This is essentially evidence of convergent validity. All of the correlations in our example meet this criterion.

A validity coefficient should be higher than values lying in its column and row in the same heteromethod block. In other words, (SE P&P)-(SE Teacher) should be greater than (SE P&P)-(SD Teacher), (SE P&P)-(LC Teacher), (SE Teacher)-(SD P&P) and (SE Teacher)-(LC P&P). This is true in all cases in our example.

A validity coefficient should be higher than all coefficients in the heterotrait-monomethod triangles. This essentially emphasizes that trait factors should be stronger than methods factors. Note that this is not true in all cases in our example -- for instance, the (LC P&P)-(LC Teacher) validity coefficient is exceeded by some heterotrait-monomethod correlations, which suggests a methods factor.

Despite its conceptual strengths, the MTMM has received relatively little use since its introduction. There are several reasons. First, in its purest form, MTMM requires that you have a fully-crossed measurement design -- each of several traits is measured by each of several methods. While Campbell and Fiske explicitly recognized that one could have an incomplete design, they stressed the importance of multiple replication of the same trait across methods. In some applied research contexts, it just isn't possible to measure all traits with all desired methods (would you use an "observation" of weight?). In most applied social research, it just wasn't feasible to make methods an explicit part of the research design. Second, the judgmental nature of the MTMM may have worked against its wider adoption (although it should actually be perceived as a strength). Many researchers wanted a test for construct validity that would result in a single statistical coefficient that could be tested -- the equivalent of a reliability coefficient. It was impossible with MTMM to quantify the degree of construct validity in a study. Finally, the judgmental nature of MTMM meant that different researchers could legitimately arrive at different conclusions.

A Modified MTMM -- Leaving out the Methods Factor

As mentioned above, one of the most difficult aspects of MTMM from an implementation point of view is that it required a design that included all combinations of both traits and methods. But the ideas of convergent and discriminant validity do not require the methods factor.
To see this, we have to reconsider what Campbell and Fiske meant by convergent and discriminant validity.

What is convergent validity? It is the principle that measures of theoretically similar constructs should be highly intercorrelated. We can extend this idea further by thinking of a measure that has multiple items, for instance, a four-item scale designed to measure self-esteem. If each of the items actually does reflect the construct of self-esteem, then we would expect the items to be highly intercorrelated, as shown in the figure. These strong intercorrelations are evidence in support of convergent validity.

And what is discriminant validity? It is the principle that measures of theoretically different constructs should not correlate highly with each other. We can see that in the example that shows two constructs -- self-esteem and locus of control -- each measured in two instruments. We would expect that, because these are measures of different constructs, the cross-construct correlations would be low, as shown in the figure. These low correlations are evidence for discriminant validity.

Finally, we can put this all together to see how we can address both convergent and discriminant validity simultaneously. Here, we have two constructs -- self-esteem and locus of control -- each measured with three instruments. The red and green correlations are within-construct ones. They are a reflection of convergent validity and should be strong. The blue correlations are cross-construct and reflect discriminant validity. They should be uniformly lower than the convergent coefficients.

The important thing to notice about this matrix is that it does not explicitly include a methods factor as a true MTMM would. The matrix examines both convergent and discriminant validity (like the MTMM) but it only explicitly looks at construct intra- and interrelationships. We can see in this example that the MTMM idea really had two major themes.
The first was the idea of looking simultaneously at the pattern of convergence and discrimination. This idea is similar in purpose to the notions implicit in the nomological network -- we are looking at the pattern of interrelationships based upon our theory of the nomological net. The second idea in MTMM was the emphasis on methods as a potential confounding factor.

While methods may confound the results, they won't necessarily do so in any given study. And, while we need to examine our results for the potential for methods factors, it may be that combining this desire to assess the confound with the need to assess construct validity is more than one methodology can feasibly handle. Perhaps if we split the two agendas, we will find that the possibility that we can examine convergent and discriminant validity is greater. But what do we do about methods factors? One way to deal with them is through replication of research projects, rather than trying to incorporate a methods test into a single research study. Thus, if we find a particular outcome in a study using several measures, we might see if that same outcome is obtained when we replicate the study using different measures and methods of measurement for the same constructs. The methods issue is then considered more as an issue of generalizability (across measurement methods) than one of construct validity.

When viewed this way, we have moved from the idea of an MTMM to that of the multitrait matrix that enables us to examine convergent and discriminant validity, and hence construct validity. We will see that when we move away from the explicit consideration of methods and when we begin to see convergence and discrimination as differences of degree, we essentially have the foundation for the pattern matching approach to assessing construct validity.
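Treating convergence and discrimination as differences of degree can be made concrete by comparing average within-construct and cross-construct correlations in a multitrait matrix. The sketch below uses an invented six-item correlation matrix (items 0-2 standing in for self-esteem, items 3-5 for locus of control); all values are illustrative:

```python
import numpy as np

# Invented inter-item correlations: items 0-2 tap one construct (SE),
# items 3-5 tap another (LC).
r = np.array([
    [1.00, 0.70, 0.60, 0.20, 0.10, 0.20],
    [0.70, 1.00, 0.65, 0.15, 0.20, 0.10],
    [0.60, 0.65, 1.00, 0.10, 0.15, 0.20],
    [0.20, 0.15, 0.10, 1.00, 0.70, 0.60],
    [0.10, 0.20, 0.15, 0.70, 1.00, 0.65],
    [0.20, 0.10, 0.20, 0.60, 0.65, 1.00],
])

idx = np.arange(6)
same = (idx[:, None] < 3) == (idx[None, :] < 3)   # same-construct mask
off = ~np.eye(6, dtype=bool)                      # ignore the diagonal

convergent = r[same & off].mean()    # within-construct: should be high
discriminant = r[~same].mean()       # cross-construct: should be low
print(convergent > discriminant)     # True for this made-up matrix
```

Here convergent validity shows up as the higher within-construct average and discriminant validity as the lower cross-construct average, with no methods factor in sight.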
Pattern Matching for Construct Validity

The idea of using pattern matching as a rubric for assessing construct validity is an area where I have tried to make a contribution (Trochim, W. (1985). Pattern matching, validity, and conceptualization in program evaluation. Evaluation Review, 9(5), 575-604; and Trochim, W. (1989). Outcome pattern matching and program theory. Evaluation and Program Planning, 12, 355-366.), although my work was very clearly foreshadowed, especially in much of Donald T. Campbell's writings. Here, I'll try to explain what I mean by pattern matching with respect to construct validity.

The Theory of Pattern Matching

A pattern is any arrangement of objects or entities. The term "arrangement" is used here to indicate that a pattern is by definition non-random and at least potentially describable. All theories imply some pattern, but theories and patterns are not the same thing. In general, a theory postulates structural relationships between key constructs. The theory can be used as the basis for generating patterns of predictions. For instance, E = mc² can be considered a theoretical formulation. A pattern of expectations can be developed from this formula by generating predicted values for one of these variables given fixed values of the others. Not all theories are stated in mathematical form, especially in applied social research, but all theories provide information that enables the generation of patterns of predictions.

Pattern matching always involves an attempt to link two patterns where one is a theoretical pattern and the other is an observed or operational one. The top part of the figure shows the realm of theory. The theory might originate from a formal tradition of theorizing, might be the ideas or "hunches" of the investigator, or might arise from some combination of these. The conceptualization task involves the translation of these ideas into a specifiable theoretical pattern, indicated by the top shape in the figure.
The bottom part of the figure indicates the realm of observation. This is broadly meant to include direct observation in the form of impressions, field notes, and the like, as well as more formal objective measures. The collection or organization of relevant operationalizations (i.e., relevant to the theoretical pattern) is termed the observational pattern and is indicated by the lower shape in the figure. The inferential task involves the attempt to relate, link or match these two patterns, as indicated by the double arrow in the center of the figure. To the extent that the patterns match, one can conclude that the theory -- and any other theories which might predict the same observed pattern -- receives support.

It is important to demonstrate that there are no plausible alternative theories that account for the observed pattern, and this task is made much easier when the theoretical pattern of interest is a unique one. In effect, a more complex theoretical pattern is like a unique fingerprint which one is seeking in the observed pattern. With more complex theoretical patterns it is usually more difficult to construe sensible alternative patterns that would also predict the same result. To the extent that theoretical and observed patterns do not match, the theory may be incorrect or poorly formulated, the observations may be inappropriate or inaccurate, or some combination of both states may exist.

All research employs pattern matching principles, although this is seldom done consciously. In the traditional two-group experimental context, for instance, the typical theoretical outcome pattern is the hypothesis that there will be a significant difference between treated and untreated groups. The observed outcome pattern might consist of the averages for the two groups on one or more measures. The pattern match is accomplished by a test of significance such as the t-test or ANOVA.
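This simplest pattern match -- a predicted two-group difference checked with a t-test -- can be sketched with simulated data. The group means, standard deviations, and sample sizes below are arbitrary choices for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated outcome scores: the theoretical pattern predicts that the
# treated group scores higher than the control group.
control = rng.normal(50, 10, 200)
treated = rng.normal(56, 10, 200)

# Independent-samples t statistic (equal-variance form).
n1, n2 = len(treated), len(control)
sp2 = ((n1 - 1) * treated.var(ddof=1) +
       (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
t = (treated.mean() - control.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
print(t > 1.97)   # exceeds the .05 critical value: the patterns match
```

The theoretical pattern here is nothing more than "treated > control"; the next sections show how much richer the theoretical pattern can be made.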
In survey research, pattern matching forms the basis of generalizations across different concepts or population subgroups. In qualitative research, pattern matching lies at the heart of any attempt to conduct thematic analyses.

While current research methods can be described in pattern matching terms, the idea of pattern matching implies more, and suggests how one might improve on these current methods. Specifically, pattern matching implies that more complex patterns, if matched, yield greater validity for the theory. Pattern matching does not differ fundamentally from traditional hypothesis testing and model building approaches. A theoretical pattern is a hypothesis about what is expected in the data. The observed pattern consists of the data that are used to examine the theoretical model. The major differences between pattern matching and more traditional hypothesis testing approaches are that pattern matching encourages the use of more complex or detailed hypotheses and treats the observations from a multivariate rather than a univariate perspective.

Pattern Matching and Construct Validity

While pattern matching can be used to address a variety of questions in social research, the emphasis here is on its use in assessing construct validity. The accompanying figure shows the pattern matching structure for an example involving five measurement constructs -- arithmetic, algebra, geometry, spelling, and reading. In this example, we'll use concept mapping to develop the theoretical pattern among these constructs. In the concept mapping we generate a large set of potential arithmetic, algebra, geometry, spelling, and reading questions. We sort them into piles of similar questions and develop a map that shows each question in relation to the others. On the map, questions that are more similar are closer to each other; those less similar are more distant.
From the map, we can find the straight-line distances between all pairs of points (i.e., all questions). This is the matrix of interpoint distances. We might use the questions from the map in constructing our measurement instrument, or we might sample from these questions. On the observed side, we have one or more test instruments that contain a number of questions about arithmetic, algebra, geometry, spelling, and reading. We analyze the data and construct a matrix of inter-item correlations.

What we want to do is compare the matrix of interpoint distances from our concept map (i.e., the theoretical pattern) with the correlation matrix of the questions (i.e., the observed pattern). How do we achieve this? Let's assume that we had 100 prospective questions on our concept map, 20 for each construct. Correspondingly, we have 100 questions on our measurement instrument, 20 in each area. Thus, both matrices are 100x100 in size. Because both matrices are symmetric, we actually have N(N-1)/2 = (100(99))/2 = 9900/2 = 4950 unique pairs (excluding the diagonal). If we "string out" the values in each matrix we can construct a vector or column of 4950 numbers for each matrix. The first number is the value comparing pair (1,2), the next is (1,3) and so on to (N-1, N) or (99, 100). Now, we can compute the overall correlation between these two columns, which is the correlation between our theoretical and observed patterns -- the "pattern matching correlation." In this example, let's assume it is -.93.
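The "string out and correlate" step can be sketched directly. This toy version uses 10 invented items rather than 100, so there are 10(9)/2 = 45 unique pairs rather than 4950; both matrices are fabricated so that larger map distances go with lower correlations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Theoretical pattern: interpoint distances from invented "map" positions.
n = 10
pts = rng.normal(size=(n, 2))                 # item positions on the map
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)

# Observed pattern: fabricated inter-item correlations that fall off
# with distance, plus a little noise.
corr = 0.9 - 0.2 * dist + rng.normal(0, 0.05, (n, n))
corr = (corr + corr.T) / 2                    # keep the matrix symmetric

# "String out" the n(n-1)/2 unique off-diagonal pairs of each matrix
# and correlate the two resulting vectors.
iu = np.triu_indices(n, k=1)
match = np.corrcoef(dist[iu], corr[iu])[0, 1]
print(match < 0)   # negative: greater distance, lower correlation
```

With real data the distances would come from the concept map and the correlations from the test instrument, but the stringing-out and the single overall correlation are exactly as above.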
Why would it be a negative correlation? Because we are correlating distances on the map with the similarities in the correlations, and we expect that greater distance on the map should be associated with lower correlation and less distance with greater correlation.

The pattern matching correlation is our overall estimate of the degree of construct validity in this example because it estimates the degree to which the operational measures reflect our theoretical expectations.

Advantages and Disadvantages of Pattern Matching

There are several disadvantages of the pattern matching approach to construct validity. The most obvious is that pattern matching requires that you specify your theory of the constructs rather precisely. This is typically not done in applied social research, at least not to the level of specificity implied here. But perhaps it should be done. Perhaps the more restrictive assumption is that you are able to structure the theoretical and observed patterns the same way so that you can directly correlate them. We needed to quantify both patterns and, ultimately, describe them in matrices that had the same dimensions. In most research as it is currently done it will be relatively easy to construct a matrix of the inter-item correlations. But we seldom currently use methods like concept mapping that enable us to estimate theoretical patterns that can be linked with observed ones. Again, perhaps we ought to do this more frequently.

There are a number of advantages of the pattern matching approach, especially relative to the multitrait-multimethod matrix (MTMM). First, it is more general and flexible than MTMM. It does not require that you measure each construct with multiple methods. Second, it treats convergence and discrimination as a continuum. Concepts are more or less similar and so their interrelations would be more or less convergent or discriminant.
This moves the convergent/discriminant distinction away from the simplistic dichotomous categorical notion to one that is more suitably post-positivist and continuous in nature. Third, the pattern matching approach makes it possible to estimate the overall construct validity for a set of measures in a specific context. Notice that we don't estimate construct validity for a single measure. That's because construct validity, like discrimination, is always a relative metric. Just as we can only ask whether you have distinguished something if there is something to distinguish it from, we can only assess construct validity in terms of a theoretical semantic or nomological net, the conceptual context within which it resides. The pattern matching correlation tells us, for our particular study, whether there is a demonstrable relationship between how we theoretically expect our measures will interrelate and how they do in practice. Finally, because pattern matching requires a more specific theoretical pattern than we typically articulate, it requires us to specify what we think about the constructs in our studies. Social research has long been criticized for conceptual sloppiness, for re-packaging old constructs in new terminology and failing to develop an evolution of research around key theoretical constructs. Perhaps the emphasis on theory articulation in pattern matching would encourage us to be more careful about the conceptual underpinnings of our empirical work. And, after all, isn't that what construct validity is all about?

Reliability

Reliability has to do with the quality of measurement. In its everyday sense, reliability is the "consistency" or "repeatability" of your measures. Before we can define reliability precisely we have to lay the groundwork. First, you have to learn about the foundation of reliability, the true score theory of measurement.
Along with that, you need to understand the different types of measurement error, because errors in measures play a key role in degrading reliability. With this foundation, you can consider the basic theory of reliability, including a precise definition of reliability. There you will find out that we cannot calculate reliability -- we can only estimate it. Because of this, there are a variety of different types of reliability, each with multiple ways to estimate reliability for that type. In the end, it's important to integrate the idea of reliability with the other major criterion for the quality of measurement -- validity -- and develop an understanding of the relationships between reliability and validity in measurement.

True Score Theory

True score theory is a theory about measurement. Like all theories, you need to recognize that it is not proven -- it is postulated as a model of how the world operates. Like many very powerful models, true score theory is a very simple one. Essentially, true score theory maintains that every measurement is an additive composite of two components: the true ability (or the true level) of the respondent on that measure, and random error. We observe the measurement -- the score on the test, the total for a self-esteem instrument, the scale value for a person's weight. We don't observe what's on the right side of the equation (only God knows what those values are!); we assume that there are two components to the right side.

The simple equation

X = T + eX

has a parallel equation at the level of the variance or variability of a measure. That is, across a set of scores, we assume that:

var(X) = var(T) + var(eX)

In more human terms this means that the variability of your measure is the sum of the variability due to true score and the variability due to random error.
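The variance decomposition can be illustrated with a quick simulation. The true-score and error distributions below are arbitrary choices for the demonstration; the reliability ratio var(T)/var(X) anticipates the classical definition of reliability discussed later:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate true score theory: observed score X = T + eX.
T = rng.normal(100, 15, 100_000)    # true ability (arbitrary mean/sd)
e = rng.normal(0, 5, 100_000)       # random measurement error
X = T + e

# Because T and e are independent, var(X) ≈ var(T) + var(e).
print(round(X.var() / (T.var() + e.var()), 2))   # ≈ 1.0

# Classical reliability ratio: here ≈ 225 / (225 + 25) = 0.9.
reliability = T.var() / X.var()
print(round(reliability, 1))
```

Shrinking the error standard deviation toward 0 drives the reliability ratio toward 1 (all true score); inflating it drives the ratio toward 0 (all noise), matching the limiting cases described below.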
This will have important implications when we consider some of the more advanced models for adjusting for errors in measurement.

Why is true score theory important? For one thing, it is a simple yet powerful model for measurement. It reminds us that most measurement has an error component. Second, true score theory is the foundation of reliability theory. A measure that has no random error (i.e., is all true score) is perfectly reliable; a measure that has no true score (i.e., is all random error) has zero reliability. Third, true score theory can be used in computer simulations as the basis for generating "observed" scores with certain known properties.

You should know that the true score model is not the only measurement model available. Measurement theorists continue to come up with more and more complex models that they think represent reality even better. But these models are complicated enough that they lie outside the boundaries of this document. In any event, true score theory should give you an idea of why measurement models are important at all and how they can be used as the basis for defining key research ideas.

Measurement Error

True score theory is a good simple model for measurement, but it may not always be an accurate reflection of reality. In particular, it assumes that any observation is composed of the true value plus some random error value. But is that reasonable? What if all error is not random? Isn't it possible that some errors are systematic, that they hold across most or all of the members of a group? One way to deal with this notion is to revise the simple true score model by dividing the error component into two subcomponents, random error and systematic error. Here, we'll look at the differences between these two types of errors and try to diagnose their effects on our research.

What is Random Error?

Random error is caused by any factors that randomly affect measurement of the variable across the sample.
For instance, each person's mood can inflate or deflate their performance on any occasion. In a particular testing, some children may be feeling in a good mood and others may be depressed. If mood affects their performance on the measure, it may artificially inflate the observed scores for some children and artificially deflate them for others. The important thing about random error is that it does not have any consistent effects across the entire sample. Instead, it pushes observed scores up or down randomly. This means that if we could see all of the random errors in a distribution they would have to sum to 0 -- there would be as many negative errors as positive ones. The important property of random error is that it adds variability to the data but does not affect average performance for the group. Because of this, random error is sometimes considered noise.

Reliability & Validity

We often think of reliability and validity as separate ideas but, in fact, they're related to each other. Here, I want to show you two ways you can think about their relationship.

One of my favorite metaphors for the relationship between reliability and validity is that of the target. Think of the center of the target as the concept that you are trying to measure. Imagine that for each person you are measuring, you are taking a shot at the target. If you measure the concept perfectly for a person, you are hitting the center of the target. If you don't, you are missing the center. The more you are off for that person, the further you are from the center.

The figure above shows four possible situations. In the first one, you are hitting the target consistently, but you are missing the center of the target. That is, you are consistently and systematically measuring the wrong value for all respondents. This measure is reliable, but not valid (that is, it's consistent but wrong). The second shows hits that are randomly spread across the target.
You seldom hit the center of the target but, on average, you are getting the right answer for the group (but not very well for individuals). In this case, you get a valid group estimate, but you are inconsistent. Here, you can clearly see that reliability is directly related to the variability of your measure. The third scenario shows a case where your hits are spread across the target and you are consistently missing the center. Your measure in this case is neither reliable nor valid. Finally, we see the "Robin Hood" scenario -- you consistently hit the center of the target. Your measure is both reliable and valid (I bet you never thought of Robin Hood in those terms before).

Another way we can think about the relationship between reliability and validity is shown in the figure below. Here, we set up a 2x2 table. The columns of the table indicate whether you are trying to measure the same or different concepts. The rows show whether you are using the same or different methods of measurement. Imagine that we have two concepts we would like to measure, student verbal and math ability. Furthermore, imagine that we can measure each of these in two ways. First, we can use a written, paper-and-pencil exam (very much like the SAT or GRE exams). Second, we can ask the student's classroom teacher to give us a rating of the student's ability based on their own classroom observation.

The first cell on the upper left shows the comparison of the verbal written test score with the verbal written test score. But how can we compare the same measure with itself? We could do this by estimating the reliability of the written test through a test-retest correlation, parallel forms, or an internal consistency measure (see Types of Reliability). What we are estimating in this cell is the reliability of the measure.

The cell on the lower left shows a comparison of the verbal written measure with the verbal teacher observation rating.
Because we are trying to measure the same concept, we are looking at convergent validity (see Measurement Validity Types).

The cell on the upper right shows the comparison of the verbal written exam with the math written exam. Here, we are comparing two different concepts (verbal versus math).

If you add a control group, you no longer have a single group design. And you will still have to deal with two major types of threats to internal validity: the multiple-group threats to internal validity and the social threats to internal validity.

Minimizing Threats to Validity

Good research designs minimize the plausible alternative explanations for the hypothesized cause-effect relationship. But such explanations may be ruled out or minimized in a number of ways other than by design. The discussion which follows outlines five ways to minimize threats to validity, one of which is by research design:

1. By Argument: The most straightforward way to rule out a potential threat to validity is to simply argue that the threat in question is not a reasonable one. Such an argument may be made either a priori or a posteriori, although the former will usually be more convincing than the latter. For example, depending on the situation, one might argue that an instrumentation threat is not likely because the same test is used for pre and post test measurements and did not involve observers who might improve, or other such factors. In most cases, ruling out a potential threat to validity by argument alone will be weaker than the other approaches listed below. As a result, the most plausible threats in a study should not, except in unusual cases, be ruled out by argument only.

2. By Measurement or Observation: In some cases it will be possible to rule out a threat by measuring it and demonstrating that either it does not occur at all or occurs so minimally as to not be a strong alternative explanation for the cause-effect relationship.
Consider, for example, a study of the effects of an advertising campaign on subsequent sales of a particular product. In such a study, history (i.e., the occurrence of other events which might lead to an increased desire to purchase the product) would be a plausible alternative explanation. For example, a change in the local economy, the removal of a competing product from the market, or similar events could cause an increase in product sales. One might attempt to minimize such threats by measuring local economic indicators and the availability and sales of competing products. If there is no change in these measures coincident with the onset of the advertising campaign, these threats would be considerably minimized. Similarly, if one is studying the effects of special mathematics training on math achievement scores of children, it might be useful to observe everyday classroom behavior in order to verify that students were not receiving any additional math training beyond that provided in the study.

3. By Design: Here, the major emphasis is on ruling out alternative explanations by adding treatment or control groups, waves of measurement, and the like. This topic will be discussed in more detail below.

One factor in such an analysis would be the original group (i.e., program vs. comparison group), while the other factor would be attrition (i.e., dropout vs. non-dropout group). The dependent measure could be the pretest or other available pre-program measures. A main effect on the attrition