
MEASUREMENT AND EVALUATION

KEY CONCEPTS

Measurement: the process by which information about the attributes or characteristics of things is determined and differentiated.

It is also a systematic procedure of determining the quantity or extent of all the measurable dimensions in the educative process.

Measurable Dimensions: intelligence, interest, aptitudes, values, health, personality traits, and scholastic achievements.

Evaluation: is a process of summing up the results of measurements, giving them some meaning based on some value judgments.

Test: is a type of measuring instrument designed to measure any quality, ability, skill, knowledge or attitude of students.

Quiz: is a relatively short test given periodically to measure achievement in material recently taught or on any small, newly completed unit of work.

Ex: 5 to 10 minute test or a 10-item test.

Item: is a part of a test that elicits a specific response. Ex: multiple choice question, a true-false question, and the like.

Assessment: A process of gathering and organizing data into an interpretable form to have a basis for decision making.

PRINCIPLES OF EVALUATION

Each principle below is paired with its application in classroom testing and measurement.

1. Significance: Evaluation is an essential component of the teaching-learning process.

The evaluation of learning outcomes necessitates careful planning and the use of appropriate measuring instruments.

2. Continuity: Evaluation is a continuous process. It takes place before, during, and after instruction.

Placement, formative, diagnostic and summative evaluation should be conducted.

3. Scope: Evaluation should be comprehensive and as varied as the scope of objectives.

The areas to be evaluated should include the cognitive (thinking skills, knowledge, and abilities), psychomotor (physical and motor skills), and affective (social skills, attitudes, and values) domains.

4. Congruency: Evaluation must be compatible with the stated objectives.

The lesson objectives should be clearly stated. Appropriate evaluation measures should match these objectives.

5. Validity: There must be a close relationship between what an evaluation instrument actually measures and what it is supposed to measure.

Using a combination of evaluation procedures is likely to yield results that will provide a reliable picture of the learner’s performance.

6. Objectivity: Although effective evaluation should use all the available information, it is generally believed that this information is more worthwhile if it is objectively obtained.

The data and information needed for evaluation should be obtained in an unbiased manner.

7. Reliability: Evaluation instruments should be consistent in measuring what they measure.

The classroom teacher should construct and use tests that will enable him/her to achieve specific lesson objectives consistently.

8. Diagnostic value: Effective evaluation should distinguish not only between levels of learners' performance but also between the processes which result in acceptable performance.

Provisions should be made for diagnostic evaluation to determine the strengths as well as the weaknesses and learning problems of the students.

9. Participation: Evaluation should be a cooperative effort of school administrators, teachers, students, and parents.

School administrators, teachers, students, and parents should be involved in the evaluation program. Specifically, students as well as their parents should be oriented on the evaluation policies of the school.

10. Variety: Evaluation procedures are of different types, namely: standardized tests and teacher-made tests; systematic observation and recording; rating scales, inventories, checklists, and questionnaires; sentence completion; and sociometry.

Using a combination of evaluation procedures is likely to yield results that will provide a reliable picture of the learner’s performance.

TYPES OF EVALUATION

When Conducted | Type | Purpose | Sample Measures

Prior to Instruction

Placement or pre-assessment (pre-test) evaluation (not graded)

Determine entry knowledge and skills of learners.

Place learners in appropriate learning groups.

Serve as a basis for planning relevant instruction.

Sample measures: pre-test, aptitude test, readiness test

During Instruction

Formative evaluation (Usually not graded)

Reinforces successful learning

Provides continuous feedback to both students and teachers concerning learning successes and failures.

Identifies learning errors that are in need of correction.

Sample measures: teacher-made tests, homework, classroom performance, observation

Diagnostic Test (usually not graded)

Determines recurring or persistent difficulties.

Searches for the underlying causes of these problems that do not respond to first aid treatment.

Plan for detailed remedial instruction.

After Instruction

Summative Evaluation

Determine the extent to which instructional objectives have been attained.

Achievement Tests

MODES OF ASSESSMENT

Mode | Description | Examples | Advantages | Disadvantages

TRADITIONAL

Paper and pencil test which usually assesses low level thinking skills.

Standardized and Teacher-made tests

Advantages: scoring is objective; administration is easy because students can take the test at the same time.

Disadvantages: preparation of the instrument is time consuming; prone to cheating.

PERFORMANCE

A mode of assessment that requires actual demonstration of skills or creation of products of learning.

Practical tests, oral tests, projects

Advantages: preparation of the instrument is relatively easy; measures behavior that cannot easily be faked.

Disadvantages: scoring tends to be subjective without rubrics; administration is time consuming.

PORTFOLIO

A process of gathering multiple indicators of students' progress to support course goals in a dynamic, ongoing, and collaborative process.

Working portfolios, show portfolios, documentary portfolios

Advantages: measures students' growth and development; intelligence-fair.

Disadvantages: development is time consuming; rating tends to be subjective without rubrics.

EVALUATION MEASURES

Purposes of a Test

A test provides useful data for making the following decisions:

Instructional: identifying areas of specific weaknesses of learners.

Grading: identifying learners who pass or fail in a given subject.

Selection: Accepting or rejecting applicants for admission into a group, program, or institution.

Counseling and Guidance: Identifying learners who need assistance in personal and academic concerns.

Curriculum: assessing the strengths and weaknesses of a curriculum program.

Administrative Policy: determining the budget allocation for a particular school program.

Types of Test

Main Points of Comparison

Purpose: Psychological vs. Educational

Psychological test:
- Aims to measure students' intelligence or mental ability to a large degree, without reference to what the student has learned.
- Administered before the instructional process.

Educational test:
- Aims to measure the results of instruction and learning.
- Administered after the instructional process.

Scope of Content: Survey vs. Mastery

Survey test:
- Covers a broad range of objectives
- Measures general achievement in certain subjects
- Is constructed by trained professionals

Mastery test:
- Covers a specific objective
- Measures fundamental skills and abilities
- Is typically constructed by teachers

Interpretation: Norm-referenced vs. Criterion-referenced

Norm-referenced test:
- Results are interpreted by comparing one student with other students
- Some will surely pass
- There is competition for a limited percentage of high scores
- Describes a pupil's performance compared to the others

Criterion-referenced test:
- Results are interpreted by comparing a student against a predefined standard
- All or none may pass
- There is no competition for a limited percentage of high scores
- Describes a pupil's mastery of the course objectives

Language Mode: Verbal vs. Non-verbal

Verbal test: students use words in attaching meaning to or responding to test items.

Non-verbal test: students do not use words in attaching meaning to or responding to test items (e.g., graphs, numbers, and 3-D objects).

Construction: Standardized vs. Informal

Standardized test:
- Constructed by a professional item writer
- Covers a broad range of content in a subject area
- Uses mainly multiple-choice items
- Items are screened and the best items are chosen for the final instrument
- Can be scored by a machine
- Interpretation of results is usually norm-referenced

Informal (teacher-made) test:
- Constructed by a classroom teacher
- Covers a narrow range of content
- Uses various types of items
- The teacher picks or writes items as needed for the test
- Scored by the teacher
- Interpretation of results is usually criterion-referenced

Manner of Administration: Individual vs. Group

Individual test:
- Mostly given orally or requires actual demonstration of skill
- One-on-one situation, hence many opportunities for clinical observation
- Chance to follow up the examinee's responses in order to clarify or comprehend them more clearly

Group test:
- A paper-and-pencil test
- Loss of rapport with, and insight and knowledge about, each examinee
- Gathers information from many students in the same amount of time needed to test one student individually

Effect of Biases: Objective vs. Subjective

Objective test:
- The scorer's personal judgment does not affect the scoring
- Only one answer satisfies the requirement of the statement
- Little or no disagreement on what is the correct answer

Subjective test:
- Affected by the scorer's personal opinion, bias, or judgment
- Several answers are possible
- Possible disagreement on what is the correct answer

Time Limit and Level of Difficulty: Power vs. Speed

Power test:
- Consists of a series of items arranged in ascending order of difficulty
- Measures a student's ability to answer more difficult items

Speed test:
- Consists of items approximately equal in difficulty
- Measures a student's speed or rate and accuracy in responding

Format: Selective vs. Supply

Selective test:
- There are choices for the answer
- Multiple choice, true-false, matching type
- Can be answered quickly
- Prone to guessing
- Time consuming to construct
- Guessing is a problem

Supply test:
- There are no choices for the answer
- Short answer, completion, restricted and extended-response essay
- Preparation of items is relatively easy because only a few questions are needed
- Lessens the chance of guessing the correct answer
- Time consuming to score
- Bluffing is a problem

Classification of Teacher-made Tests

Objective Test

Recall Types (Supply Test):
- Simple recall
- Completion/fill in the blanks
- Identification
- Labeling
- Enumeration

Recognition Types (Selective Test):
- Alternative response
- Multiple choice
- Matching type

Rearrangement of Elements

ESSAY EXAMINATION

Unrestricted or Uncontrolled type

Restricted or Controlled type

Objective Test: generally calls for single words, phrases, numbers, letters, and other symbols as responses to items.

Objective tests are classified as follows:

Simple Recall: defined as a test in which each item appears as a direct question, a stimulus word or phrase, or a specified direction. The response is recalled rather than selected from a list given by the teacher. The question should ask only for an important aspect of a fact.

Example: Answer the following questions. Write your answer in the spaces provided at the left.

_____________ 1. Who was known as the "Tagalog Joan of Arc" because of her exploits during the revolution?
_____________ 2. The Katipunan in Cavite was divided into two factions, the Magdiwang and the Magdalo. While the Magdiwang was led by Mariano Alvarez, who led the Magdalo faction?
_____________ 3. What is the primary source of the objectives of education in the Philippines at present?

Completion Test (Fill in the Blanks): defined as a series of sentences in which certain important words or phrases have been omitted for the pupils to fill in. A sentence may contain one or more blanks, and the sentences may be disconnected or organized into a paragraph.

Example: Supply the missing word or words to complete the meaning of the statement. Write your answer at the left.

______ 1. A thermometer measures ____.
______ 2. _____ is the process that occurs when dry ice (solid CO2) is changed to CO2 gas.

______ 3. Radium was discovered by ____.

Identification Test: a form of completion test in which a term is defined, described, explained, or indicated by a picture, diagram, or concrete object, and the term referred to is supplied by the pupil or student.

Example: Identify the following. Write the answers at the left.

________ 1. The conqueror of Magellan.
________ 2. The first Filipino cannon maker.
________ 3. The Rajah who led the fight against Legaspi in 1571.

Labeling Test: is a type of test in which the names of parts of a diagram, map, drawing, or picture are to be indicated.

Example: Give the name of each island indicated. (The original item refers to a map with five numbered islands.)

1._______  2._______  3._______  4._______  5._______

Enumeration: An enumeration test is a type of completion test in which there are two or more responses to an item.

Alternative Response: made up of items, each of which admits only one of two possible responses. Varieties of this test are true or false, right or wrong, correct or incorrect, yes or no, fact or opinion, and the like.

Example: Before each statement, write True if the statement is true and False if the statement is false.

______ 1. Andy gained knowledge and skills by talking and sharing his experiences with his parents, teachers, older siblings, and older cousins. This situation illustrates Psychosocial Theory.
______ 2. In Piaget's theory of cognitive development, a child between birth and two years, that is, during the sensorimotor period, does not see things in abstract forms.
______ 3. The Law of Readiness by Edward Lee Thorndike explains that any connection is strengthened in proportion to the number of times it occurs and in proportion to the average vigor and duration of the connection.

Multiple Choice Test: made up of items each of which presents two or more responses, only one of which is correct or definitely better than the others. Each item must be in the form of a complete sentence, question, incomplete statement, or a stimulus word or phrase.

Example: Select the letter of the correct answer. Write the letter that corresponds to the correct option at the left of the statement.

_______ 1. A student collapsed in her social studies class. It was found out that she did not eat her lunch. What principle is shown in this situation?
a. Physiological need   b. Security need   c. Safety need   d. Psychological need

______ 2. Which of the following refers to the repetition of facts and skills which the teacher wishes to reinforce for mastery?
a. Drill   b. Review   c. Recitation   d. Mastery

- The main advantage of the multiple choice test is its superior capacity to measure higher levels of knowledge, judgment, reasoning and understanding compared to other types of objective tests.

Matching Type of Test: A matching type test is composed of two columns; one is called stimulus and the other, the response column. Each item in one column is to be matched with another item to which it corresponds in the other column.

Example: Match the items in Column B with the items in Column A. Write the letter of your answer on the space provided before each number.

Column A                                        Column B
____ 1. Patron Saint of Teachers                a. Paolo Freire
____ 2. Wrote the "Pedagogy of the Oppressed"   b. Edward Lee Thorndike
____ 3. Authored the laws of learning           c. Friedrich August Froebel
____ 4. Father of Kindergarten                  d. St. John Baptiste de la Salle
                                                e. Herbert Spencer

Rearrangement of Elements: consists of ordering items on some basis. Ordering measures memory of relationship and concepts of organization in many subjects.

Example: Arrange the following events in chronological order. Write their corresponding letters at the left.

_____ 1.    a. Execution of Dr. Jose Rizal
_____ 2.    b. Declaration of the first Philippine Independence
_____ 3.    c. The EDSA Revolution
_____ 4.    d. World War II

ADVANTAGES OF OBJECTIVE EXAMINATIONS

1. The sampling of the objective examination is more representative, and so measurement is more extensive. This is so because more items are included in the test.

2. Handicaps such as poor vocabulary, poor handwriting, poor spelling, poor grammar and the like do not adversely affect the ability to make a reply.

3. Scoring is not subjective because the responses are single words, phrases, numbers, letters, and other symbols with definite value points and hence, the personal element of the scorer is removed.

DISADVANTAGES OF OBJECTIVE EXAMINATIONS

1. Generally, it measures factual knowledge only. It hardly measures higher levels of knowledge or complex ideas.

2. It does not help in nor encourage the development of the ability of the students to organize and express their ideas.

3. It encourages memory work even without understanding.

4. It is easy to cheat in an objective examination.

5. It is hard to prepare.

Essay Examinations: The essay type test is a type of examination in which the subject or examinee is made to discuss, enumerate, compare, state, explain, analyze or criticize. 

Classification of Essay Examinations or Questions

Unrestricted or Uncontrolled Type: In this type, the students have a very wide latitude of expression. They have wide freedom to organize their ideas in the way they want.

Example: Discuss the economic condition of the country.

Restricted or Controlled Type: In this type, the student is limited in organizing his response. There are guides in making a response.

Example: Give and discuss the causes of the Philippine Revolution, starting with the remote causes followed by the immediate ones.

ADVANTAGES OF ESSAY EXAMINATION

1. The essay examination measures the higher levels of knowledge. It measures the ability to interpret, evaluate, apply principles, create, organize thoughts and ideas, contrast, etc.

2. The essay test helps students organize their thoughts and ideas logically.

3. The essay test is easy to prepare. It can be prepared in a short time.

4. It is harder to cheat in an essay test. 

DISADVANTAGES OF ESSAY EXAMINATION

1. Essay tests are usually not well prepared. Some questions are vague.

2. There is difficulty in giving the right weight to each question.

3. Its usability is low. It takes a long time to score because no one except the teacher handling the subject on which the test is based can check the test papers. It cannot be mechanically scored.

4. Sampling is limited. Few questions can be included and hence, the coverage is limited.

5. Scoring is subjective. The causes of subjectivity are:
- Different standards of excellence of different teachers scoring the papers, or different standards of the same teacher checking the papers at different times.
- The physical and mental condition of the person checking a paper may also affect the scoring.

THE CLASSROOM TESTING PROCESS

1. Specifying the objectives
2. Preparing the table of specifications
3. Determining the item format, the number of test items, and the difficulty level of the test
4. Writing the test items that match the objectives
5. Editing, revising, and finalizing test items
6. Administering the test
7. Scoring
8. Tabulating and analyzing the results
9. Assigning grades

GENERAL SUGGESTIONS IN CONSTRUCTING WRITTEN EXAMINATIONS

1. Prepare a table of specifications or a test blue print and use it as a guide for writing test items.

2. Match the test item with the instructional objectives.

3. Keep the vocabulary level of the test items as simple as possible. Ensure that the test directions are clear and direct.

4. State each test item so that only one answer is correct.

5. See to it that one test item does not provide help or give clues in answering other test items.

GUIDELINES IN WRITING SPECIFIC TEST ITEMS

Completion Test (Fill-in the Blanks)

1. Omit only words that are essential to the meaning of the statement or sentence.

Example: The founder of the Katipunan was ____________________.

2. Do not omit too many words in a statement; the statement may lose its meaning.

Example (wrong): _______ was _____________ the first ________ of the __________.

3. Make the blanks equal in length to avoid clues. Long blanks suggest a long answer; short ones suggest a short answer.

Example:
1. The brain of the Katipunan was __________.
2. The brain of the revolution was __________.
3. The founder of La Liga Filipina was __________.

4. Answers should be written before the item number for easy checking.

5. Avoid equivocal questions. Equivocal questions admit two or more interpretations.

Example (wrong): Rizal was born in ___________. (The answer may refer to a place or a time.)

Better: Rizal was born in the year _____________.

Identification:

1. The definition, description, or explanation of the term may be given by means of a phrase or an incomplete statement if it is not indicated by a picture, diagram, or concrete object.

Example: Identify the following.
______ 1. The hero of the Battle of Mactan
______ 2. The longest Filipino revolt
______ 3. A triangle with a right angle

2. The statement should be phrased so that there is only one response.

Example: Identify the following.
Wrong:  ____________ 1. Manuel L. Quezon
Better: ____________ 2. The first President of the Commonwealth.

Labeling Test:

1. Make the diagram, map, drawing, or picture to be labeled very clear and recognizable, especially the parts to be labeled.

2. The parts to be labeled should be indicated by arrows so that the labels can be written in a vertical column in a definite place and not on the face of the diagram, map, drawing or picture. 

Alternative Response Test (True or False)

1. Avoid the use of absolute modifiers such as all, none, no, always, never, nothing, only, alone, and the like unless they are a part of a fact or truth. These terms tend to make the statement false.

Example:
(1) All Filipinos are hardworking. (This is of course false.)
(2) All players in athletics are strong. (Again, this is false.)
(3) All first-place winners in the Olympics received gold medals. (This is true because this is a fact.)

2. Vague qualifiers such as usually, seldom, much, little, few, small, large, and the like should be used only when they are a part of a fact or truth.

Example:
(1) Some Filipinos are thrifty. (Of course this is true.)
(2) Many stars are already very old. (True. This is a fact.)

3. The number of true statements and false statements should be approximately equal.

4. The correct responses should not follow a pattern; otherwise the students may be able to give the right symbols although they do not know the real answers.

5. Start with a false statement, since it is a common observation that the first statement in this type of test is almost always true.

Multiple Choice Test

CONSTRUCTING/IMPROVING THE MAIN STEM

1. The main stem of the test item may be constructed in question form, completion form, or direction form.

Examples:
Question form: Which is the same as four hundred seventy? (a. b. c. d.)
Completion form: Four hundred seventy is the same as ___________. (a. b. c. d.)
Direction form: Add 22 + 43. (a. b. c. d.)

* Three alternatives (for grades I-III) or four alternatives (for grades IV-VI) should be provided in each case.

2. The question should not be trivial. There should be a consensus on its answer.

Example of a trivial question: What time does the sun rise in the morning?
a. 4 o'clock   b. 5 o'clock   c. 6 o'clock   d. 7 o'clock

3. Each question should have only one answer, not several possible answers.

4. Highlight negative words in the stem for emphasis.

Example: One of the strengths of the Filipino character is "pakikipagkapwa-tao". This is manifested in all of the following EXCEPT:
a. "Malasakit"   b. "Pakikiramay"   c. "Lakas ng Loob"   d. "Pakikiramdam"

CONSTRUCTING/IMPROVING THE ALTERNATIVES

1. Alternatives should be as closely related to each other as possible.

Example:
Poor alternatives: 74 + 23 = ______.
a. 87   b. 97   c. 100

-Pupils’ mistakes should be anticipated. Such possible mistakes should be given among the alternatives.

2. Alternatives should be arranged in natural order.

Example:
Poor: Pedro is ten years old. How many trips has the earth made around the sun since he was born?
a. 365   b. 12   c. 10   d. 30

Improved alternatives (arranged in natural order):
a. 10   b. 12   c. 30   d. 365     or     a. 365   b. 30   c. 12   d. 10

3. Alternatives should have grammatical parallelism.

Example:
Poor: Clay can be used for:
a. making hollow blocks
b. making pots
c. garden soil

Improved: Clay can be used for:
a. making hollow blocks
b. growing vegetables
c. making pots

4. Arrangement of correct answers should not follow any pattern.

Example of poor (patterned) answer keys:

1-6: b, c, b, c, b, c      1-6: a, a, b, b, c, c      1-6: a, b, a, b, a, b

Matching Type:

1. Use only homogeneous material in a single matching exercise.

Example (poor):
Column A                   Column B
___ 1. Quezon              a. MIMAROPA
___ 2. Pampanga            b. Cagayan
___ 3. Camarines Norte     c. Daet
___ 4. Region IV-A         d. CALABARZON
___ 5. Region IV-B         e. Lucena
                           f. San Fernando

2. There should be two columns written side by side: the stimulus (question) column on the left and the response column on the right. There should be a short blank before each stimulus question in which to write the symbol of the response.

Example:
Column A (Stimulus)                                   Column B (Response)
___ 1. Proponent of Psychosocial Theory               a. Jean Piaget
___ 2. Major contributor to the Theory of             b. Erik Erikson
       Cognitive Development

3. Directions should clearly state how the items in the response column are to be matched with the items in the stimulus column, and vice versa. In addition, there should be an unequal number of responses and premises (stimuli).

Direction: Match the capital towns in Column B with the provinces in Column A and write the letter of each town in the space provided before each number in Column A.

Column A                   Column B
_____ 1. Cagayan           a. Malolos
_____ 2. Pampanga          b. Pasig
_____ 3. Rizal             c. Vigan
_____ 4. Bulacan           d. Lucena
_____ 5. Ilocos Sur        e. Tuguegarao
_____ 6. Isabela           f. Lingayen
_____ 7. Zambales          g. San Fernando
_____ 8. Pangasinan        h. Daet
_____ 9. Batangas          i. Laoag
_____ 10. Ilocos Norte     j. Bangued
                           k. Ilagan
                           l. Iba
                           m. Batangas

Rearrangement of Elements

When this type of test is used, the basis of arrangement should be stated clearly. The bases are:

Chronological Order: arranging items in the order in which they occurred.

Example: Arrange the following Presidents in chronological order. Write your answer at the right.
Garcia       1. ___________
Marcos       2. ___________
Magsaysay    3. ___________
Macapagal    4. ___________
Osmeña       5. ___________
Quezon       6. ___________
Quirino      7. ___________
Roxas        8. ___________

Geographical Order: arranging things according to their geographical location.

Example: Arrange the following provinces from north to south. Write your answer at the right.
Bulacan          1. ___________
Cagayan          2. ___________
Nueva Ecija      3. ___________
Sorsogon         4. ___________
Nueva Vizcaya    5. ___________
Batangas         6. ___________
Isabela          7. ___________
Albay            8. ___________
Camarines Sur    9. ___________
Romblon          10. __________

Arrangement According to Magnitude: The basis of this arrangement is size, which may be height, width, or distance.

Example: List the following biological classifications from the most general to the most specific.
Family       1. __________
Genus        2. __________
Phylum       3. __________
Order        4. __________
Class        5. __________
Species      6. __________
Subphylum    7. __________

Alphabetical Order: This is arranging words according to the alphabet or according to their appearance in the dictionary.

Example: Arrange the following words alphabetically.
loud      1. __________
tone      2. __________
music     3. __________
song      4. __________
duet      5. __________
alto      6. __________
chorus    7. __________
melody    8. __________
opera     9. __________

Arrangement According to Importance, Quality, etc.

Example: Arrange the following cities according to their contribution to the country's foreign trade.
Davao        1. __________
Zamboanga    2. __________
Manila       3. __________
Cebu         4. __________
Iloilo       5. __________

Essay Type Tests:

1. State questions that elicit the desired cognitive skills specified in the learning outcomes.
2. Write the questions in such a way that the specific task is clearly understood by the examinee.
3. Ask all students to answer the same questions. Avoid using optional questions.
4. Indicate the number of points or the amount of time to be spent on each question.
5. Ask a colleague to critique the questions.
6. Prepare a model answer to each question.

Non-test Measures

Performance-Based Evaluation Measures

Restricted-type Tasks
- Measure a narrowly defined skill
- Require a relatively brief response
- The task is structured and specific

Examples:
- Constructing a histogram from data provided
- Writing a term paper about the significance of the EDSA Revolution

Extended-type Tasks
- More complex, elaborate, and time consuming
- Involve collaborative work with small groups of learners

Examples:
- Composing a poem
- Making a commercial advertisement

Affective Evaluation Measures

Teacher Observation
- Unstructured: open-ended; does not require a checklist or rating scale for recording purposes.
- Structured: uses a checklist or rating scale for recording purposes.

Learner Self-report
- Autobiography: the learner describes his/her own life as he/she has experienced and viewed it.
- Self-expression: the learner responds to a particular question, issue, or concern in essay form.
- Self-description: the learner paints a picture of himself/herself in his/her own words.

Peer Ratings
- Sociometric technique: shows the interpersonal relationships among the members of a group.
- Socio-distance scale: measures the degree of acceptance or rejection of a learner in relation to the other group members.

RUBRICS

A rubric is a scoring tool for subjective assessments. It is a set of criteria and standards linked to learning objectives that is used to assess a student's performance on papers, projects, essays, and other assignments. Rubrics allow for standardized evaluation according to specified criteria, making grading simpler and more transparent.

Rubrics for Class Debate

CATEGORY (descriptors below are listed in order for scores of 4, 3, 2, and 1)

Understanding of Topic

The team clearly understood the topic in-depth and presented their information forcefully and convincingly.

The team clearly understood the topic in-depth and presented their information with ease.

The team seemed to understand the main points of the topic and presented those with ease.

The team did not show an adequate understanding of the topic.

ORGANIZATION

All arguments were clearly tied to an idea (premise) and organized in a tight, logical fashion.

Most arguments were clearly tied to an idea (premise) and organized in a tight, logical fashion.

All arguments were clearly tied to an idea (premise) but the organization was sometimes not clear or logical.

Arguments were not clearly tied to an idea (premise).

PRESENTATION STYLE

Team consistently used gestures, eye contact, tone of voice and a level of enthusiasm in a way that kept the attention of the audience.

Team usually used gestures, eye contact, tone of voice and a level of enthusiasm in a way that kept the attention of the audience.

Team sometimes used gestures, eye contact, tone of voice and a level of enthusiasm in a way that kept the attention of the audience.

One or more members of the team had a presentation style that did not keep the attention of the audience.

Use of Facts / Statistics

Every major point was well supported with several relevant facts, statistics and/or examples.

Every major point was adequately supported with relevant facts, statistics and/or examples.

Every major point was supported with facts, statistics and/or examples, but the relevance of some was questionable.

Every point was not supported.

INFORMATION

All information presented in the debate was clear, accurate and thorough.

Most information presented in the debate was clear, accurate and thorough.

Most information presented in the debate was clear and accurate, but was not usually thorough.

Information had several inaccuracies OR was usually not clear.

Respect for Other Team

All statements, body language, and responses were respectful and were in appropriate language.

Statements and responses were respectful and used appropriate language, but once or twice body language was not.

Most statements and responses were respectful and in appropriate language, but there was one sarcastic remark.

Statements, responses and/or body language were consistently not respectful.

REBUTTAL

All counter-arguments were accurate, relevant and strong.

Most counter-arguments were accurate, relevant, and strong.

Most counter-arguments were accurate and relevant, but several were weak.

Counter-arguments were not accurate and/or relevant

GUIDELINES IN DEVELOPING RUBRICS

1. Identify the important and observable features or criteria of an excellent performance or quality product.

2. Clarify the meaning of each trait or criterion and the performance levels.

3. Describe the gradations of quality product or excellent performance.

4. Keep the number of criteria reasonable enough to be observed or judged.

5. Arrange the criteria in the order in which they are likely to be observed.

6. Determine the weight of each criterion and the whole work or performance in the final grade. 
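As a hedged illustration of guideline 6, the sketch below combines per-criterion ratings into a single weighted score. The criterion names, weights, and the 4-point scale are hypothetical examples, not values taken from the rubric above.

```python
# Minimal sketch (hypothetical criteria and weights): combining per-criterion
# rubric ratings into one weighted score, as suggested by guideline 6.

criteria_weights = {                 # weights should sum to 1.0
    "Understanding of Topic": 0.30,
    "Organization": 0.25,
    "Use of Facts/Statistics": 0.25,
    "Presentation Style": 0.20,
}

ratings = {                          # ratings on a 1-4 rubric scale
    "Understanding of Topic": 4,
    "Organization": 3,
    "Use of Facts/Statistics": 4,
    "Presentation Style": 2,
}

max_rating = 4
weighted = sum(criteria_weights[c] * ratings[c] for c in criteria_weights)
percentage = weighted / max_rating * 100   # express as a percentage of the maximum

print(f"Weighted rubric score: {weighted:.2f} of {max_rating} ({percentage:.1f}%)")
```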

ESTABLISHING CONTENT VALIDITY

- The degree of validity is the single most important aspect of a test.

Validity: can be defined as the degree to which a test is capable of achieving certain aims. Validity is sometimes defined as truthfulness: Does the test measure what it intends to measure?

KINDS OF VALIDITY

1. Face Validity: is done by examining the physical appearance of the test.

2. Content Validity: is related to how adequately the content of the test samples the domain about which inferences are to be made. It has to do with the appropriateness of the test to the curricular objectives.

3. Criterion-related Validity: pertains to the empirical technique of studying the relationship between a predictor (test scores) and some independent external measure (the criterion).

Kinds of Criterion-related Validity

3.1 Concurrent Validity: describes the present status of the individual by correlating the sets of scores obtained from two measures given concurrently.

3.2 Predictive Validity: describes the future performance of an individual by correlating the sets of scores obtained from two measures given at a longer time interval.
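As an illustration (not from the source), criterion-related validity is commonly expressed as a correlation between the new test's scores and the criterion scores. The sketch below computes a Pearson r for two invented score lists; the data and the helper function are assumptions for demonstration only.

```python
# Minimal sketch (invented data): a concurrent-validity coefficient as the Pearson
# correlation between scores on a new test and scores on an established criterion
# measure taken at about the same time.
from statistics import mean, pstdev

new_test  = [12, 15, 9, 20, 17, 14, 11, 18]
criterion = [48, 55, 40, 70, 66, 52, 45, 64]

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

print(f"validity coefficient r = {pearson_r(new_test, criterion):.2f}")
```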

4. Construct Validity: is established statistically by comparing psychological traits or factors that theoretically influence scores in a test.

Kinds of Construct Validity

4.1 Convergent Validity: is established if the instrument relates to another, similar trait in addition to the one it is intended to measure (e.g., a Critical Thinking Test may be correlated with a Creative Thinking Test).

4.2 Divergent Validity: is established if the instrument describes only the intended trait and not other traits (e.g., a Critical Thinking Test may not be correlated with a Reading Comprehension Test).

FACTORS INFLUENCING VALIDITY

1. Appropriateness of the Test: it should measure the abilities, skills, and information it is supposed to measure.

2. Directions: it should indicate how the learners should answer and record their answers.

3. Reading Vocabulary and Sentence Structures: it should be based on the intellectual level of maturity and background experience of the learners.

4. Difficulty of Items: it should have items that are neither too difficult nor too easy, so as to discriminate the bright from the slow pupils.

5. Construction of Test Items: items should not provide clues (so the test does not become a test of finding clues) nor be ambiguous (so it does not become a test of interpretation).

6. Length of the Test: it should be of sufficient length to measure what it is supposed to measure; a test that is too short cannot adequately measure the performance we want to measure.

7. Arrangement of Items: items should be arranged in ascending level of difficulty, starting with the easy ones, so that pupils will persevere in taking the test.

8. Patterns of Answers: it should not allow the creation of patterns in answering the test.

ESTABLISHING TEST RELIABILITY

Reliability: refers to consistency; it is the degree to which measurements of the content knowledge or of cognitive development ability are consistent each time the test is given.

CONSIDERATIONS IN ESTABLISHING TEST RELIABILITY

1. Length of the Test: Generally speaking, the longer the test, the more reliable it will be.

2. The quality of the individual questions or test items: The test maker must see to it that each item is as precise and as understandable as possible.

3. Interpretability (Scorability): The interpretability of an evaluation device refers to how readily scores may be derived and understood.

4. Usability (Practicability, Economy): Usability is the degree to which the evaluation instrument can be successfully employed by classroom teachers and school administrators without an undue expenditure of time and energy.

Five factors upon which usability depends:
a. Ease of administration
b. Ease of scoring
c. Ease of interpretation and application
d. Low cost

5. Objectivity: can be obtained by eliminating the bias, opinions or judgments of the person who checks the test.

6. Authenticity: the test should simulate real-life situations.

DESCRIBING EDUCATIONAL DATA

ITEM ANALYSIS

Item analysis is the process of testing the effectiveness of the items in an examination.

Item analysis gives information concerning each of the following points:
1. The difficulty of the item
2. The discriminating power of the item
3. The effectiveness of each item

SEVERAL BENEFITS OF ITEM ANALYSIS

1. It gives useful information for class discussion of the test.
2. It gives data for helping the students improve their learning methods.
3. It gives insights and skills which lead to the construction of better test items for future use.

SIMPLIFIED ITEM-ANALYSIS PROCEDURE (UL Method)

Only the upper group (U) and lower group (L) scores are considered. The middle or average group is held in abeyance.

Results for item #5, taken by 30 students in a Mathematics test, which is subjected to item analysis (E is the correct answer):

Student No.  Score  Answer      Student No.  Score  Answer
1            86     D           16           60     D
2            81     A           17           80     A
3            73     E           18           50     C
4            82     E           19           80     B
5            85     D           20           89     C
6            74     C           21           90     E
7            94     A           22           77     E
8            74     B           23           63     A
9            75     C           24           57     B
10           76     D           25           70     E
11           75     E           26           95     A
12           79     A           27           72     E
13           65     D           28           79     E
14           87     E           29           83     B
15           98     E           30           97     E

STEPS:

1. Arrange the test scores from the highest to the lowest.

98-E   82-E   74-C
97-E   81-A   74-B
95-A   80-A   73-E
94-A   80-B   72-E
90-E   79-E   70-E
89-C   79-A   65-D
87-E   77-E   63-A
86-D   76-D   60-D
85-D   75-E   57-B
83-B   75-C   50-C

2. Separate the top 27% and the bottom 27% of the papers. The former is called the upper group (U) and the latter, lower group (L). Set aside the middle group.

30 students × 27% ≈ 8, so 8 papers from the upper group and 8 papers from the lower group should be analyzed.

Upper 27% Lower 27%

98-E 73-E

97-E 72-E

95-A 70-E

94-A 65-D

90-E 63-A

89-C 60-D

87-E 57-B

86-D 50-C

3. Record the frequency of each option for the two groups (E* is the correct answer):

Options        A    B    C    D    E*
Upper (27%)    2    0    1    1    4
Lower (27%)    1    1    1    2    3

4. Compute the percentage of the upper group that got the item right and call it U.

U = 4/8 × 100 = 50%

where 4 = number of right responses in the upper group and 8 = number of cases in the upper 27%.

5. Compute the percentage of the lower group that got the item right and call it L.

L = 3/8 × 100 = 37.5%

where 3 = number of right responses in the lower group and 8 = number of cases in the lower 27%.

6. Average U and L; the result is the difficulty index of the item.

Difficulty index = (U + L) / 2 = (50% + 37.5%) / 2 = 87.5% / 2 = 43.75%

7. Use the table of equivalents below in interpreting the difficulty index:

.00 - .20     Very difficult
.21 - .80     Moderately difficult    Retained
.81 - 1.00    Very easy

8. Estimate the discrimination index. In the foregoing sample, four students in the upper group and three students in the lower group chose the correct answer. This shows positive discrimination, since the upper group got the item right more frequently than the lower group.

Negative discriminating power is obtained when more students in the lower group got the right answers than the upper group.

Index of discrimination = (RU - RL) / NG

where:
RU = right responses of the upper group
RL = right responses of the lower group
NG = number of students in each group

To illustrate: index of discrimination = (4 - 3) / 8 = 1/8 = .125

9. Refer to the table of equivalents below in interpreting the discrimination index:

.00 - .19    Poor item                          Rejected
.20 - .29    Moderate (reasonably good item)    Revised or rejected
.30 - up     Very good item                     Retained

10. Determine the effectiveness of the distracters.

A good distracter attracts students in the lower group more than in the upper group.

Hence, for our illustrative item analysis data in step 3:

Options        A    B    C    D    E*
Upper (27%)    2    0    1    1    4
Lower (27%)    1    1    1    2    3

B and D: Good distracters, because more students from the lower group were attracted to them.
A: Poor, since it attracted more students from the upper group.
C: Fair, because the upper and lower groups have the same frequency.
E (the correct answer): Good, because more students from the upper group chose the correct answer.
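The sketch below reworks the UL item-analysis procedure above in Python, using the item #5 data and the 27% rule; it reproduces the difficulty index (43.75%), the discrimination index (.125), and the distracter comparison. The variable names are illustrative.

```python
# Minimal sketch of the UL (upper-lower 27%) item-analysis procedure described
# above, using the item #5 data from the text (correct answer = E).

responses = [  # (score, option chosen) for the 30 examinees
    (86, "D"), (81, "A"), (73, "E"), (82, "E"), (85, "D"), (74, "C"),
    (94, "A"), (74, "B"), (75, "C"), (76, "D"), (75, "E"), (79, "A"),
    (65, "D"), (87, "E"), (98, "E"), (60, "D"), (80, "A"), (50, "C"),
    (80, "B"), (89, "C"), (90, "E"), (77, "E"), (63, "A"), (57, "B"),
    (70, "E"), (95, "A"), (72, "E"), (79, "E"), (83, "B"), (97, "E"),
]
correct = "E"

ranked = sorted(responses, key=lambda r: r[0], reverse=True)  # step 1: highest to lowest
n_group = round(len(ranked) * 0.27)                           # step 2: 27% of 30 = 8
upper, lower = ranked[:n_group], ranked[-n_group:]

ru = sum(1 for _, ans in upper if ans == correct)   # right answers, upper group
rl = sum(1 for _, ans in lower if ans == correct)   # right answers, lower group

U = ru / n_group * 100                              # steps 4-5: percentages
L = rl / n_group * 100
difficulty = (U + L) / 2                            # step 6: difficulty index (43.75%)
discrimination = (ru - rl) / n_group                # step 8: (RU - RL) / NG = .125

print(f"U = {U:.1f}%, L = {L:.1f}%")
print(f"difficulty index = {difficulty:.2f}%")
print(f"discrimination index = {discrimination:.3f}")

# step 10: a distracter works if it attracts more lower-group than upper-group students
for option in "ABCD":
    u_count = sum(1 for _, ans in upper if ans == option)
    l_count = sum(1 for _, ans in lower if ans == option)
    verdict = "good" if l_count > u_count else "fair" if l_count == u_count else "poor"
    print(f"distracter {option}: upper={u_count}, lower={l_count} -> {verdict}")
```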

SKEWNESS AND KURTOSIS

Skewness: is the degree of asymmetry, or departure from symmetry of a distribution.

Skewed to the Right (positive skewness): the frequency curve of the distribution has a longer "tail" to the right of the central maximum than to the left. Most scores are below the mean.

[Illustration] Positive skewness indicates low performance: the mean is greater than the mode.

Skewed to the Left (negative skewness): if the frequency curve of a distribution has a longer “tail” to the left of the central maximum than to the right. Most scores are above the mean and there are extremely low scores.

[Illustration] Negative skewness indicates high performance: the mean is lower than the mode.

 

Kurtosis: is the degree of peakedness of a distribution, usually taken relative to a normal distribution.

Leptokurtic: A distribution having a relatively high peak

Platykurtic: a relatively flat-topped distribution.

Mesokurtic: a moderately peaked distribution.
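A minimal sketch, using invented scores, of how skewness and kurtosis can be computed as moment coefficients; the interpretation notes in the comments follow the definitions above (kurtosis is compared with that of a normal curve).

```python
# Minimal sketch (invented scores): moment coefficients of skewness and kurtosis.
# Positive skewness -> longer right tail; kurtosis here is compared with the value
# for a normal curve (about 3): higher = leptokurtic, lower = platykurtic.
from statistics import mean, pstdev

scores = [10, 12, 12, 13, 14, 15, 15, 16, 18, 25, 32, 40]   # right-skewed example

m = mean(scores)
s = pstdev(scores)
n = len(scores)

skewness = sum((x - m) ** 3 for x in scores) / (n * s ** 3)
kurtosis = sum((x - m) ** 4 for x in scores) / (n * s ** 4)

print(f"mean = {m:.2f}, sd = {s:.2f}")
print(f"skewness = {skewness:.2f}  (positive: tail to the right, most scores below the mean)")
print(f"kurtosis = {kurtosis:.2f}  (compare with about 3 for a normal curve)")
```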

MEASURES OF CENTRAL TENDENCY AND VARIABILITY

Assumptions Used | Statistical Tools

Measures of central tendency describe the representative value of a set of data; measures of variability describe the degree of spread or dispersion of a set of data.

When the frequency distribution is regular/symmetrical/normal:

Mean: the computational average; affected by extreme scores; the most reliable among the measures of central tendency.

Standard Deviation: the root mean square of the deviations from the mean; the most reliable measure of variability.

When the frequency distribution is irregular/skewed:

Median: the positional average; the middle score; a measure of location; the 50th percentile; the most stable measure of central tendency because it is not affected by the magnitude of the scores.

Quartile Deviation: the average deviation of Q1 and Q3; the most stable measure of variability; commonly used as a measure of dispersion or variability.

When the distribution of scores is normal and a quick answer is needed:

Mode: the nominal average; the score with the highest frequency.

Range: the difference between the highest and lowest values in a set of observations.
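As a hedged illustration of the table above, the sketch below computes the three measures of central tendency and the three measures of variability for a small invented score set; the positional quartile method used for the quartile deviation is only one of several acceptable conventions.

```python
# Minimal sketch (invented scores): the measures summarized in the table above,
# computed for ungrouped data with Python's standard library.
from statistics import mean, median, mode, pstdev

scores = [20, 21, 23, 25, 25, 26, 28, 30, 34, 38]

def quartile_deviation(data):
    """Half the distance between the first and third quartiles (simple positional method)."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2 :])
    return (q3 - q1) / 2

print(f"mean   = {mean(scores):.2f}")
print(f"median = {median(scores):.2f}")
print(f"mode   = {mode(scores)}")
print(f"range  = {max(scores) - min(scores)}")
print(f"quartile deviation = {quartile_deviation(scores):.2f}")
print(f"standard deviation = {pstdev(scores):.2f}")
```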

Computation of the Measures of Central Tendency

Mean

Ungrouped Data: used for few cases (N < 30)

Formula: x̄ = Σx ÷ N

where:
x̄ = mean
Σx = sum of the scores
N = number of scores

Example: the scores are 6, 8, 3, 9, and 12.

Solution: x̄ = Σx ÷ N = (6 + 8 + 3 + 9 + 12) ÷ 5 = 38 ÷ 5 = 7.6

Grouped Data: used for large cases (N ≥ 30)

Formula: x̄ = Σfx ÷ N

where:
f = class frequency
x = class midpoint
N = sum of the frequencies

Example: The following are the scores of 50 students in College Algebra.

Unarranged data:
41  47  29  28  25
23  26  46  38  28
37  46  28  23  28
27  20  44  26  37
36  29  26  43  21
18  27  29  34  42
43  29  34  14  27
40  25  28  32  14
32  29  40  13  24
11  41  31  24  27

Procedure:

1. Arrange the given data into an array. (An array is the arrangement of data from highest to lowest or from lowest to highest.)

Arranged data:
47  40  29  27  24
46  38  29  27  23
46  37  29  27  23
44  37  29  26  21
43  36  29  26  20
43  34  28  26  19
42  34  28  26  18
41  32  28  25  14
41  32  28  25  13
40  31  28  24  11

2. Construct a frequency distribution:

a. Find the range of the scores in the above data.

R = H - L, where R = range, H = highest score, and L = lowest score

The range is 36 (47 - 11 = 36).

b. Find the number of classes.

Number of classes = (range ÷ desired class width) + 1

The number of classes is 13 (36 ÷ 3 + 1 = 13), giving the class intervals 45-47, 42-44, 39-41, 36-38, 33-35, 30-32, 27-29, 24-26, 21-23, 18-20, 15-17, 12-14, and 9-11.
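A minimal sketch of steps 2a and 2b: computing the range, the number of classes, and the class intervals. The text chooses 9 as the lowest class limit, which the code below assumes.

```python
# Minimal sketch: range, number of classes, and class intervals for the data above
# (class width 3; the text starts the lowest class limit at 9).
lowest_score, highest_score = 11, 47   # from the arranged data
width = 3

score_range = highest_score - lowest_score     # 47 - 11 = 36
n_classes = score_range // width + 1           # 36 // 3 + 1 = 13
print(f"range = {score_range}, number of classes = {n_classes}")

lowest_limit = 9                               # lowest class limit chosen in the text
for k in range(n_classes - 1, -1, -1):         # list the classes from highest to lowest
    lower = lowest_limit + k * width
    print(f"{lower}-{lower + width - 1}")
```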

3. Get the frequency and midpoint of each class interval.

Frequency Distribution of College Algebra Scores for 50 Students

Class Interval    f (frequency)    mp (midpoint)

45-47 3 46

42- 44 4 43

39-41 4 40

36-38 4 37

33-35 2 34

30-32 3 31

27-29 13 28

24-26 7 25

21-23 3 22

18-20 3 19

15-17 0 16

12-14 2 13

9-11 1 10

4. Multiply the midpoints by their corresponding frequencies.

Class Interval    f     x (midpoint)    fx
45-47             3     46              138
42-44             4     43              172
39-41             4     40              160
36-38             4     37              148
33-35             2     34              68
30-32             3     31              93
27-29             13    28              364
24-26             7     25              175
21-23             3     22              66
18-20             3     19              57
15-17             0     16              0
12-14             2     13              26
9-11              1     10              10

N = 50          Σfx = 1,477

5. Solve using the formula:

x̄ = Σfx ÷ N = 1,477 ÷ 50 = 29.54
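The sketch below reproduces the grouped-data mean from the table above, taking N = 50 as given in the text.

```python
# Minimal sketch: grouped-data mean using the frequencies and midpoints tabulated
# above, with N = 50 as given in the text.
table = [  # (class interval, frequency f, midpoint x)
    ("45-47", 3, 46), ("42-44", 4, 43), ("39-41", 4, 40), ("36-38", 4, 37),
    ("33-35", 2, 34), ("30-32", 3, 31), ("27-29", 13, 28), ("24-26", 7, 25),
    ("21-23", 3, 22), ("18-20", 3, 19), ("15-17", 0, 16), ("12-14", 2, 13),
    ("9-11", 1, 10),
]

N = 50
sum_fx = sum(f * x for _, f, x in table)   # 1,477
print(f"sum of fx = {sum_fx}")
print(f"mean = {sum_fx / N:.2f}")          # 29.54
```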

MEDIAN

Ungrouped Data

Case 1: The total number of cases is an odd number.

Procedure:
1. Arrange the scores from highest to lowest or vice versa.
2. Get the middle score. That is the median.

Example (N = 11):

99, 98, 96, 95, 93, 92, 91, 85, 84, 83, 80

The middle (6th) score is 92, so the median is 92.

Case 2: The total number of cases is an even number (N = 10).

Procedure:
1. Arrange the scores from highest to lowest or vice versa.
2. Get the two middlemost scores.
3. Compute the average of the two middlemost scores. The average is the median.

Example: 98, 96, 95, 91, 90, 85, 84, 83, 82, 80

The middlemost scores are 90 and 85, so the median = (90 + 85) / 2 = 87.5.

Case 3: The middlemost score occurs two or more times.

Procedure:
1. Get the middlemost score(s), its/their identical score(s), and its/their counterparts either above or below the middlemost score(s).
2. Compute their average; the average is the median.

Example a: N is odd (N = 7): 86, 84, 75, 75, 73, 69, 67

The middlemost scores are 75, 75, and 73, so the median = (75 + 75 + 73) / 3 = 223 / 3 = 74.33.

Example b: N is even (N = 8): 84, 81, 75, 75, 73, 71, 67, 60

The middlemost scores are 75, 75, 73, and 71, so the median = (75 + 75 + 73 + 71) / 4 = 294 / 4 = 73.5.

Grouped Data

1. Add up (accumulate) the frequencies starting from the lowest to the highest class limit. Call this the cumulative frequency (CF).
2. Find one half of the number of cases in the distribution (N/2).
3. Find the cumulative frequency that is equal to or closest to but higher than half of the number of cases. The class containing this frequency is the median class.
4. Find the lower real limit (LL) of the median class.
5. Get the cumulative frequency of the class below the median class.
6. Subtract this from half of the number of cases in the distribution (N/2 - CFb).
7. Get the frequency of the median class (FMdn).
8. Find the class interval (i), then apply the formula below.

Formula: Median = LL + [(N/2 - CFb) / FMdn] × i

where:
LL = lower real limit of the median class
i = class interval
N/2 = half of the number of cases
CFb = cumulative frequency below the median class
FMdn = frequency of the median class

 

Example (i = 5, N = 50):

Class Limits    f     CF
45-49           2     50
40-44           0     48
35-39           12    48
30-34           13    36    <- median class (FMdn = 13, LL = 29.5)
24-29           10    23    <- CFb = 23
20-24           5     13
15-19           4     8
10-14           4     4

Solution:

Median = LL + [(N/2 - CFb) / FMdn] × i
       = 29.5 + [(25 - 23) / 13] × 5
       = 29.5 + 0.77
       = 30.27
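A minimal sketch of the grouped-data median formula applied to the example above; the class limits and frequencies are copied from the table, and the lower real limit is taken as the lower class limit minus 0.5.

```python
# Minimal sketch: grouped-data median, Median = LL + ((N/2 - CFb) / FMdn) * i,
# applied to the distribution in the example above.

table = [  # (class limits, frequency), listed from lowest to highest class
    ((10, 14), 4), ((15, 19), 4), ((20, 24), 5), ((24, 29), 10),
    ((30, 34), 13), ((35, 39), 12), ((40, 44), 0), ((45, 49), 2),
]
i = 5
N = sum(f for _, f in table)        # 50

cum = 0
for (low, high), f in table:        # accumulate frequencies to find the median class
    if cum + f >= N / 2:
        LL = low - 0.5              # lower real limit of the median class: 29.5
        CFb = cum                   # cumulative frequency below the median class: 23
        FMdn = f                    # frequency of the median class: 13
        break
    cum += f

median = LL + ((N / 2 - CFb) / FMdn) * i
print(f"median = {median:.2f}")     # 29.5 + (2/13) * 5 = 30.27
```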

 

Mode

Ungrouped Data: Get the most frequent score.

Example 1 (one mode, unimodal): 28, 21, 25, 25, 22, and 20. The mode is 25.

Example 2 (two modes, bimodal): 23, 23, 21, 20, 19, and 19. The modes are 23 and 19.

Example 3 (three modes, trimodal): 15, 16, 14, 12, 11, 11, 12, and 16. The modes are 11, 12, and 16.

When there are more than three modes, they are called polymodal or multimodal. 

Grouped Data

Crude Mode: the midpoint of the class interval with the highest frequency.

Procedure:
1. Find the class interval with the highest frequency.
2. Get the midpoint of that class interval.
3. The midpoint of the class interval with the highest frequency is the crude mode.

Example:
Class limits    f (frequency)
45-47           3
42-44           4
39-41           4
36-38           4
33-35           2
30-32           3
27-29           13   <- highest frequency
24-26           7
21-23           3
18-20           3
15-17           0

The modal class is 27-29, so the crude mode is its midpoint, 28.

Refined Mode: the mode obtained from an ordered arrangement or a class frequency distribution.

Procedure:
1. Get the mean and the median of the grouped data.
2. Multiply the median by three (3Mdn).
3. Multiply the mean by two (2Mn).
4. Subtract 2Mn from 3Mdn to get the mode (Md).

Formula: Md = 3Mdn - 2Mn

Example: Md = 3(30.27) - 2(29) = 90.81 - 58 = 32.81
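The sketch below computes the crude mode from the frequency distribution above and the refined mode from the median and mean values used in the example.

```python
# Minimal sketch: crude mode (midpoint of the class with the highest frequency)
# and refined mode Md = 3*Mdn - 2*Mn, using the values from the examples above.

table = [  # (class limits, frequency) from the crude-mode example
    ((45, 47), 3), ((42, 44), 4), ((39, 41), 4), ((36, 38), 4),
    ((33, 35), 2), ((30, 32), 3), ((27, 29), 13), ((24, 26), 7),
    ((21, 23), 3), ((18, 20), 3), ((15, 17), 0),
]

(modal_low, modal_high), _ = max(table, key=lambda row: row[1])  # class 27-29
crude_mode = (modal_low + modal_high) / 2                        # midpoint = 28
print(f"crude mode = {crude_mode:.0f}")

median, mean = 30.27, 29                    # values used in the refined-mode example
refined_mode = 3 * median - 2 * mean        # 90.81 - 58 = 32.81
print(f"refined mode = {refined_mode:.2f}")
```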

GRADING AND REPORTING

Grading: the process of assigning value to a performance.

Purposes of Grading:
1. Certifying learners' mastery of specific content or level of achievement
2. Identifying, selecting, and grouping learners for particular academic programs
3. Providing information for diagnosis and planning
4. Helping learners improve their school performance

Approaches to Grading

Letter Grades (A, B, C, D)
- Generally used in marking learners' performance on products other than objective tests
- Correspond to verbal descriptions such as excellent or outstanding, good, average, or acceptable
- Provide an overall indication of performance

Percentage (70%, 75%, 80%)
- Indicates the percentage of items answered correctly
- Gives a finer discrimination of learners' performance than letter grades
- Communicates only a general indication of learners' performance

Pass/Fail
- Shows mastery or non-mastery of learning objectives
- Does not clearly reflect the learners' actual level of performance

Two Methods of Interpreting Scores

Absolute or Criterion-Referenced Grading
Grading is based on fixed or absolute standards: a grade is assigned based on how well a student has met the criteria or the well-defined objectives of a course that were spelled out in advance.

Advantages:
- Matches learner performance with clearly defined objectives
- Discourages competition

Disadvantages:
- Difficulty in establishing clearly defined learning outcomes and setting standards that indicate mastery
- Subject to leniency error
- Scores depend on the difficulty of the test

Relative or Norm-Referenced Grading
Also known as grading on the curve; based on comparing learners' performance to each other.

Advantages:
- May result in higher-level or more complex assessments that can be challenging to learners
- May ensure the distribution of grades on the basis of scores in relation to one another, regardless of the difficulty of the test

Disadvantages:
- Encourages competition among learners
- May affect learners' social relations

GUIDELINES IN GRADING STUDENTS

1. Explain your grading system to the students early in the course and remind them of the grading policies regularly.
2. Base grades on a predetermined and reasonable set of standards.
3. Base your grades on as much objective evidence as possible.
4. Base grades on the student's attitude as well as achievement, especially at the elementary and high school level.
5. Base grades on the student's relative standing compared to classmates.
6. Base grades on a variety of sources.
7. As a rule, do not change grades.
8. Become familiar with the grading policy of your school and with your colleagues' standards.
9. When failing a student, closely follow school procedures.

The end

65. Study this group of tests, which was administered with the following results, and then answer the question that follows.

Compute the z score. Formula: z = (score - mean) / SD
(The resulting z scores are -1.3, -1.1, and 1.8.)

Subject    Mean    SD    Rommy's Score
Math       56      10    43
Physics    41      9     31
English    80      16    109

100. Study this group of tests which was administered with the following results, and then answer the questions.

Compute the z score: z = (score - mean) / SD

Subject    Mean    SD    John's Score    z score
Math       56      10    43              -1.3
Physics    41      9     31              -1.1
English    80      16    109             1.8

115. Study this group of tests which was administered with the following results, and then answer the question.

In which subject(s) were the scores most homogeneous?

The LOWER the STANDARD DEVIATION (SD), the more HOMOGENEOUS the scores.

Subject    Mean    SD    Jamil's Score
Math       40      3     58
Physics    38      4     45
English    75      5     90
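A minimal sketch tying the last sample items together: it applies z = (score - mean) / SD to John's scores and picks the subject with the lowest SD as the one with the most homogeneous scores.

```python
# Minimal sketch: the z-score formula z = (score - mean) / SD applied to the data
# in item 100, plus the "lowest SD = most homogeneous" check from item 115.

john = {  # subject: (mean, SD, John's score)
    "Math":    (56, 10, 43),
    "Physics": (41, 9, 31),
    "English": (80, 16, 109),
}
for subject, (mean, sd, score) in john.items():
    z = (score - mean) / sd
    print(f"{subject}: z = {z:.2f}")        # -1.30, -1.11, 1.81

jamil_sd = {"Math": 3, "Physics": 4, "English": 5}
most_homogeneous = min(jamil_sd, key=jamil_sd.get)
print(f"Most homogeneous scores: {most_homogeneous} (lowest SD)")
```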
