Technical Adequacy Session One Part Three

Upload: river

Post on 22-Feb-2016


TRANSCRIPT

Page 1: Technical Adequacy

Technical Adequacy

Session One Part Three

Page 2: Technical Adequacy

Reliability

• We all have friends; some are reliable and some are not

• With your partner, discuss what a reliable friend is

• List three qualities you would use to describe a reliable friend

Page 3: Technical Adequacy

Reliability

• In layman's terms, reliability means being able to depend on the results of a test being accurate.

• If you did it again, would you get the same score?

• There are many factors that affect reliability

Page 4: Technical Adequacy

Error in measurement

• Two types of error in measurement

– Systematic (bias)

– Random

Page 5: Technical Adequacy

Bias

• Generally, bias refers to raising a person's score because they were advantaged in some way

• However, the group that was not advantaged is affected negatively by the bias

• For example, if boys score better than girls on multiple-choice questions because of the format, the boys were advantaged and the girls were disadvantaged

Page 6: Technical Adequacy

Random error

• Random error is very different

• It is hard to predict who it is affecting

• It is hard to predict by how much, or by what magnitude

• Reliable tests try to eliminate most types of error

Page 7: Technical Adequacy

Reliability Coefficient

• We can measure how reliable tests are by the reliability coefficient

– A test free from error has a perfect 1.0

– A test filled with error has a 0

– Since every test has some error, look for a reliability of around .85 or above

Page 8: Technical Adequacy

Types of reliability

• Item reliability

• Stability

• Inter-rater reliability or interobserver agreement

Page 9: Technical Adequacy

Item reliability

• Item reliability affects how well a test predicts a student's understanding of the knowledge

– Imagine a study trying to predict how the population of a country or state will vote in the next election

– The prediction is only as good as the sample it selects; if it selects from only one area, it will not be representative of the population

– This same concept applies to developing a test

– Test developers cannot possibly select all the items they need to test; the more accurately the sample of items represents the total knowledge, the more reliable the test

Page 10: Technical Adequacy

Item reliability

• Your goal is for the student's performance on the sample items to be the same as if he/she took all of the items (if that were a possibility)

• The goal of the test is to be able to generalize from the student's performance to what they know of the entire realm of knowledge in that area

• When we overestimate (or underestimate) their ability, our test is unreliable

Page 11: Technical Adequacy

Item reliability

• There are two main approaches to determining item reliability

– Alternate form reliability

– Internal consistency

Page 12: Technical Adequacy

Item reliability

• Alternate form reliability

– Two forms of a test are developed, each from the same knowledge base but each with different questions

– You then test a large sample with both forms

– Half take one form first, half take the other form first

– They should have similar scores on the two forms

– Scores from the two forms are correlated, and the result is the correlation coefficient
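
The correlation step described above can be sketched in a few lines. This is only an illustration, assuming each student has a score on both forms; the scores and the function name `pearson_r` are hypothetical and not taken from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores: the same six students on Form A and on Form B.
form_a = [12, 15, 9, 18, 14, 11]
form_b = [13, 14, 10, 17, 15, 10]

alternate_form_reliability = pearson_r(form_a, form_b)
print(round(alternate_form_reliability, 2))
```

A coefficient close to 1.0 means students who did well on one form also did well on the other.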

Page 13: Technical Adequacy

Item reliability

• Internal consistency

– There are many ways to test internal consistency

– One popular way is to develop a test that can be split into halves with a similar level of difficulty

– Administer the test and see how the students did

– Say the test was split by first half and second half: grade half of the class on the first half and the other half on the second half, and compare scores

– Can also do it for specific items
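
One common way to put a number on the split-half idea is sketched below. It assumes every student is scored on both halves of the test, which is how a split-half correlation is usually computed, and it adds the Spearman-Brown correction, which the slides do not mention, to estimate the reliability of the full-length test. The item data and function names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r_half):
    """Estimate full-test reliability from the correlation of two halves."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one list of 0/1 item results per student, in test order."""
    first_half, second_half = [], []
    for row in item_scores:
        middle = len(row) // 2
        first_half.append(sum(row[:middle]))   # score on the first half
        second_half.append(sum(row[middle:]))  # score on the second half
    return spearman_brown(pearson_r(first_half, second_half))


# Hypothetical results for five students on a six-item test (1 = correct).
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 0],
]
reliability = split_half_reliability(responses)
print(round(reliability, 2))
```

The correction is needed because each half is only half as long as the real test, and shorter tests tend to be less reliable.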

Page 14: Technical Adequacy

Stability

• In many cases, we expect our tests to produce information that, when tested later, will yield the same results

• A child who tests as colorblind should still test as colorblind later in life, since the problem is not curable; if not, the test was unreliable because it is unstable

Page 15: Technical Adequacy

Stability

• A test should produce similar results

• If you give a set of students a test, wait a while, and then readminister the test, it should produce similar results

• The more similar the results, the more stable and the more reliable the test

Page 16: Technical Adequacy

Stability

• Stability is not affected by interventions

• If you test a child and it shows he is weak in a certain area, then you provide an intervention and the child does better on the next test, that is not considered a weakness in stability

Page 17: Technical Adequacy

Inter-rater reliability

• Inter-observer/inter-rater reliability

• The concept is simple and easy to understand

– It is analogous to a piece of music, a book or a movie

– Two people hear, read or watch the same thing and have a different opinion

– Watch the next clip; what do you think?

Page 18: Technical Adequacy
Page 19: Technical Adequacy

Inter-rater reliability

– Now watch the next clip, and count how many people test the mattress

– Do people have similar answers?

Page 20: Technical Adequacy
Page 21: Technical Adequacy

Inter-rater reliability

– Inter-rater reliability needs to be developed in several places and can be measured in several ways

– Different raters/observers need to be trained on what to watch for, and need clear criteria for what counts as a positive incident of what they are observing

– If you are looking for out-of-seat behavior, is it standing, squirming, leaning over, or being two feet from the desk?

Page 22: Technical Adequacy

Inter-rater reliability

– Inter-rater reliability can be measured in several ways: by comparing two people's scores from the same observation

– Or by doing an item-by-item analysis and comparing the different observations
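
An item-by-item comparison of two observers might look like the sketch below. It assumes both raters coded the same ten observation intervals as out of seat (1) or in seat (0); the data and function names are hypothetical. Cohen's kappa is not mentioned in the slides and is included only as one widely used way to adjust simple agreement for chance.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which the two raters gave the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length lists of codes")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same code at random,
    # given how often each rater used each code.
    p_chance = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if p_chance == 1:
        return 1.0  # both raters used one identical code throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes for ten intervals (1 = out of seat, 0 = in seat).
observer_1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
observer_2 = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

agreement = percent_agreement(observer_1, observer_2)
kappa = cohens_kappa(observer_1, observer_2)
print(agreement, round(kappa, 2))
```

Low agreement usually points back to the training issue above: the observers are not using the same definition of the behavior.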

Page 23: Technical Adequacy

Standard Error of Measurement

– Standard error

Page 24: Technical Adequacy
Page 25: Technical Adequacy

Standard Error of Measurement

– Imagine you gave a test to a kindergarten student on his letter-sound recognition

– You developed 100 tests of ten items each

– After giving the child about ten of these tests, the scores would be about the same

– On some of the tests he would know the sounds, on some he would not, but the average would be accurate

– SEM tries to predict what that error between the tests would be if you only gave him one test; remember, it could be a test he scored well on, or it could be a test he scored poorly on

– It is a similar concept to standard deviation, but related specifically to error
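
The slides do not give a formula, but the SEM is conventionally computed from the test's standard deviation and its reliability coefficient as SEM = SD × √(1 − reliability). A minimal sketch, with hypothetical values for both inputs:

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)


# Hypothetical test: standard deviation of 15 points, reliability of .91.
sem = standard_error_of_measurement(15, 0.91)
print(round(sem, 1))
```

A perfectly reliable test (1.0) would have an SEM of zero, and the SEM grows toward the full standard deviation as reliability drops toward 0.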

Page 26: Technical Adequacy

Estimate of True Scores

• This is more of a concept than a statistical unit

• Imagine you take a fifty-question test and you do not know the answers to ten questions

• You guess on them and, being a very lucky person, you get 8 right. These eight answers are not really part of your true score

• If you are unlucky, you get a lower score
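
The guessing example can be made concrete with a little arithmetic. This sketch assumes four answer choices per question, which the slide does not specify; the function name is hypothetical. In classical test theory, the true score is the average a person would earn over many repeated attempts, so luck on any single attempt pushes the observed score above or below it.

```python
def expected_score(known, guessed, n_choices):
    """Long-run average score: known answers plus the average yield of guessing."""
    return known + guessed / n_choices


known_answers = 40      # questions the test taker actually knows
guessed_answers = 10    # questions answered by pure guessing
lucky_hits = 8          # the lucky outcome described in the slide

observed_score = known_answers + lucky_hits
long_run_average = expected_score(known_answers, guessed_answers, n_choices=4)
luck = observed_score - long_run_average

print(observed_score, long_run_average, luck)
```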

Page 27: Technical Adequacy

Confidence Intervals

• Given the fact that true scores are difficult to obtain, the concept of confidence intervals was created.

• When a confidence interval is combined with the SEM, it reports a range that is likely to contain the true score

• The level of confidence tells us how certain we are that the score is within the range

Page 28: Technical Adequacy

Confidence Intervals

• If a child has a score of 90 ± 5 (a range built from the SEM), then we are saying the child's score is somewhere between 85 and 95.

• If we say that a child's score of 90 has a range of 85 to 95 with a 95% confidence level, we are saying that there is only a 5% chance that the child's score is below 85 or above 95.

• The lower the confidence level, the smaller the range: at an 80% confidence level, the same child's score is somewhere between roughly 87 and 93.
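
How the range changes with the confidence level can be sketched as follows. This assumes measurement error is normally distributed, so the range is the score plus or minus a multiplier times the SEM; the SEM of 2.5 and the function name are hypothetical, chosen so that the 95% range lands close to 85 to 95.

```python
from statistics import NormalDist


def confidence_interval(score, sem, confidence):
    """Range around an observed score for a given confidence level (0 to 1)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided multiplier from the standard normal curve (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sem
    return score - margin, score + margin


# Hypothetical child: observed score of 90 on a test with an SEM of 2.5.
low_95, high_95 = confidence_interval(90, 2.5, 0.95)  # about 85.1 to 94.9
low_80, high_80 = confidence_interval(90, 2.5, 0.80)  # narrower range

print(round(low_95, 1), round(high_95, 1))
print(round(low_80, 1), round(high_80, 1))
```

Asking for more confidence always widens the range, which is the trade-off the last bullet describes.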

Page 29: Technical Adequacy

Validity

• This refers to the degree to which evidence and theory support the interpretation of test scores for the proposed uses of the test

• Often tests are interpreted for uses they were not designed for.

• Therefore, validity is a fundamental consideration

Page 30: Technical Adequacy

Validity

• The fundamental question that you need to ask is: does the testing process lead to the correct inferences about a specific person?

Page 31: Technical Adequacy

Validity

• First, assume you give an IQ test in English to a non-English-speaking person

• You give a test that measures cultural items that a person was not exposed to

• You use a test designed for national standards that does not align to local standards (social studies)

Page 32: Technical Adequacy

Validity

• Content validity: is the content of the measure representative of the domain of content it is supposed to assess? Experts look at the content and compare it to what they feel it should contain.

Page 33: Technical Adequacy

Validity

• Appropriateness of included items

– Should the questions be here?

– Do they represent what the test is trying to measure? (This is different from content validity.) Are the questions from too high a grade level, like middle school material on an elementary test?

– Is the presentation of the items appropriate? Are the questions worded properly?

Page 34: Technical Adequacy

Validity

• Content not included: is there important content missing that should be there?

• How are the items measured?

– Are they multiple choice?

– Open ended, where you must show work?

Page 35: Technical Adequacy

Validity

• Criterion Referenced Validity refers to a test's ability to describe a test taker's ability in two ways

– Present: Concurrent Criterion Referenced Validity

– Future: Predictive Criterion Referenced Validity

Page 36: Technical Adequacy

Validity

• Concurrent Criterion Referenced Validity

– Is the test/assessment a good indicator of what the students currently know, based on the criterion of the knowledge base?

– If a child takes an achievement test, is it a valid measure of how well he did in fourth grade?

Page 37: Technical Adequacy

Validity

• Predictive Criterion Referenced Validity

– Does the test have the ability to predict what it says it will predict?

– A reading readiness test: if a student scores high, does he learn to read easily?

– If a child scores poorly, does he struggle to learn to read?

Page 38: Technical Adequacy

Validity

• Construct validity refers to the extent to which a procedure or test measures a theoretical trait or characteristic

• Construct validity refers to whether a scale measures or correlates with the theorized psychological construct (such as intelligence) that it purports to measure.