ITEM ANALYSIS ON THE VALIDITY OF ENGLISH SUMMATIVE

TEST FOR THE FIRST YEAR STUDENTS

(A Case Study at the First Year of SMP YPPUI Ciledug Tangerang, School Year 2005/2006)

A skripsi

Submitted to the English Teachers Training Program as Partial Fulfillment

of the Requirements for the Degree of Sarjana Pendidikan

By:

Ade Rosita

Reg. No. 102014023778

FACULTY OF TARBIYAH AND TEACHERS' TRAINING

STATE ISLAMIC UNIVERSITY

SYARIF HIDAYATULLAH JAKARTA

2006

ITEM ANALYSIS ON THE VALIDITY OF ENGLISH SUMMATIVE

TEST FOR THE FIRST YEAR STUDENTS

(A Case Study at the First Year of SMP YPPUI Ciledug Tangerang, School Year 2005/2006)

A Skripsi Presented to the Tarbiyah and Teachers' Training Faculty in Partial Fulfillment of the Requirements for the Sarjana Degree (S1)

By:

Ade Rosita NIM. 102014023778

Approved by:

Drs. Nasrun Mahmud, MPd. Advisor

ENGLISH DEPARTMENT

FACULTY OF TARBIYAH AND TEACHERS' TRAINING

SYARIF HIDAYATULLAH STATE ISLAMIC UNIVERSITY

JAKARTA

2006/1427 H

CHAPTER II

THEORETICAL FRAMEWORK

A. The Meaning of Test

One of the evaluation instruments is a test, and the term has been defined in many ways. M. Buchari, as quoted by Suharsimi Arikunto, said that "a test is a trial held to find out the results of a certain subject from a student or a group of students".11 By testing, teachers can find out how much students have learned.

According to Anthony J. Nitko, a test is "a systematic procedure for observing and describing one or more characteristics of a person with the aid of either a numerical or category system".12 Gronlund defined evaluation as "a systematic process of determining the extent to which instructional objectives are achieved by a student".13

By testing, teachers can find out how much students have learned. Mochtar Buchari said that "a test is a trial held to find out the results of a certain subject from a student or a group of students".14

11 Dr. Suharsimi Arikunto, Dasar-dasar Evaluasi Pendidikan, (Jakarta: Bina Aksara, 1987), p. 199

12 Anthony J. Nitko, Educational Tests and Measurement: An Introduction, (New York: Harcourt Brace Jovanovich, Inc., 1983), p. 6

13 N.E. Gronlund, Measurement and Evaluation in Teaching, (USA: Macmillan Publishing Company, 1985), p. 25

14 M. Buchari, M.Ed., Tehnik-tehnik dalam Evaluasi Pendidikan, (Bandung, 1980), p. 119

B. Types of Test

There are many types of tests used to measure students' achievement. Four types of achievement test are commonly used by teachers in the classroom:

1. Placement test

A placement test is designed to determine the pupils' performance at the beginning of instruction.

2. Formative test

It is given at the end of a unit in the course book or after a lesson. Its results also give the students immediate feedback.

3. Diagnostic test

A diagnostic test is intended to diagnose learning difficulties during instruction. Thus, the main aim of a diagnostic test is to determine the causes of learning difficulties and then to formulate a plan for remedial action.

4. Summative test

The summative test is intended to show the standard that the students have

now reached in relation to other students at the same stage. Therefore it

typically comes at the end of a course or unit of instruction.15

15 Drs. Wilmar Tinambunan, Evaluation of Students' Achievement, (Jakarta: Depdikbud, 1998), pp. 7-9

Based on the statements above, the writer concludes that a test is, in general, a systematic and objective procedure for finding out the knowledge and ability that a person has gained from learning.

While there are a number of tests that teachers usually carry out in the classroom, for practical purposes the writer presents only two of them, both of which are directly related to the analysis in this skripsi: formative and summative tests. Norman E. Gronlund states that "a formative test is used to monitor the learning process during the instructional program; usually teachers make the test by themselves".16

Furthermore, by giving a summative test to the students, teachers not only obtain a final report on the program's achievement but can also compare their individual students' ability and achievement of the instructional objectives of the teaching and learning activities.

C. Types of Item

The questions, exercises and tasks appearing on a test are called items. The kinds of items on a test are:

1. Choice items, which include true-false items, multiple-choice items, and matching exercises.

2. Completion items, which present an incomplete sentence; the examinee is required to supply a word or short phrase that best completes the sentence.

16 Norman E. Gronlund, Measurement and Evaluation in Teaching, (New York: Macmillan Publishing Co., Inc., 1981), 4th ed., p. 6

3. Short answer items. In this type of item, the student is usually not free to give expression to creative and imaginative thoughts.

4. Essay items, which permit the testing of a student's ability to organize ideas and thoughts and allow for creative verbal expression.

D. The Criteria of a Good Test

1. Validity

J.B. Heaton said, "The validity of a test is the extent to which it measures what it is supposed to measure and nothing else".17 The validity of a test must be considered in measurement: we must check whether the test really measures what it is supposed to measure. Briefly, the validity of a test is the extent to which the test measures what it is intended to measure. There are four types of validity:

a. Face validity

Face validity refers to the way the test looks to the testees, teachers, moderators, and administrators. Therefore, it is useful to show a test to colleagues or friends in order to discover absurdities and ambiguities in it.

b. Content validity

Content validity is concerned with the materials that the students have learned. The test should cover samples of the teaching materials given. To achieve this, the teacher should base the test on the teaching syllabus. J.B. Heaton says, "Content validity depends on careful analysis of the language being tested and of the particular course objectives; the test should be so constructed as to contain a representative sample of the course".

17 J.B. Heaton, Writing English Language Tests, (Longman, 1988), p. 153

c. Construct validity

Construct validity deals with the constructs and underlying theory of language learning and testing. J.B. Heaton states, "If a test has construct validity, it is capable of measuring certain specific characteristics in accordance with a theory of language behavior and learning".

d. Empirical validity

There are two kinds of empirical validity, concurrent validity and predictive validity, depending on whether the test scores are correlated with concurrent or subsequent criterion measures.

If we use a test of English as a second language to screen university applicants and then correlate test scores with grades made at the end of the first semester, we are attempting to determine the predictive validity of the test. If, on the other hand, we follow up the test immediately by having an English teacher rate each student's English proficiency on the basis of his class performance during the first week, and correlate the two measures, we are seeking to establish the concurrent validity of the test.18

18 Ibid., p. 154-155

2. Reliability

A test should be reliable as a measuring instrument. A test cannot measure anything well unless it measures consistently. According to J. Charles Alderson, Caroline Clapham and Dianne Wall, "a test cannot be valid unless it is reliable".19

If the test is administered to the same students on different occasions and the results do not differ, the test can be said to be reliable.

3. Practicality

The third characteristic of a good test is practicality, or usability. In preparing a new test, the teacher must keep in mind a number of very practical considerations, which involve economy, ease of administration and scoring, and ease of interpretation of results.

Economy means that the test is not costly. The teacher must take into account the cost per copy and how many scorers will be needed (for the more personnel who must be involved in giving and scoring a test, the more costly the process becomes). The teacher must also consider how long administering and scoring the test will take, choosing a short test rather than a longer one.

Ease of administration and scoring means that the test administrator can perform his task quickly and efficiently. We must also consider the ease with which the test can be administered.

19 J. Charles Alderson, Caroline Clapham and Dianne Wall, Language Test Construction and Evaluation, (Cambridge: Cambridge University Press, 1995), p. 187

Ease of interpretation and application: J.B. Heaton states, "The final point concerns the presentation of the test paper itself." Where possible, it should be printed or typewritten and appear neat, tidy and aesthetically pleasing. Nothing is worse and more disconcerting to the testee than an untidy test paper, full of misspellings, omissions and corrections. If the paper is well presented, it will be easy for the students or testees to interpret the test items.20

Besides meeting these criteria, a test has another characteristic that is more important and more specific: the quality of its items. To find out the quality of the test items, teachers should use a method called item analysis.

E. Item Analysis

There are several definitions of item analysis. Anthony J. Nitko states in his book that "item analysis refers to the process of collecting, summarizing, and using information about individual test items, especially information about pupils' responses to items".21

Item analysis is an important and necessary step in the preparation of a good multiple-choice test. Because of this, it is suggested that every classroom teacher who uses multiple-choice test data should know something of item analysis: how it is done and what it means.22

20 Op. cit., p. 161

21 Anthony J. Nitko, Educational Tests and Measurement: An Introduction, (New York: Harcourt Brace Jovanovich, Inc., 1983), p. 284

22 John W. Oller, Language Tests at School, (London: Longman Group, 1979), p. 245

For teacher-made tests, the important uses of item analysis are: determining whether an item functions as the teacher intended, feedback to students about their performance and a basis for class discussion, feedback about pupils' difficulties, areas for curriculum improvement, revising items, and improving item-writing skills.

Item analysis usually provides two kinds of information on items:23

1. Item facility, which helps us decide whether the test items are at the right level for the target group, and

2. Item discrimination, which allows us to see whether the individual items are providing information on candidates' abilities consistent with that provided by the other items on the test.

Item facility expresses the proportion of the people taking the test who got a given item right. (Item difficulty is sometimes used to express similar information, in this case the proportion that got an item wrong.) Where the purpose of the test is to make distinctions between candidates, to spread them out in terms of their performance on the test, the items should be neither too easy nor too difficult. If the items are too easy, then people with differing levels of ability or knowledge will all get them right, and the differences in ability or knowledge will not be revealed by the item. Similarly, if the items are too hard, then able and less able candidates alike will get them wrong, and the item will not help us in distinguishing between them.

23 H.G. Widdowson, Language Testing, (Oxford: Oxford University Press, 2000), p. 60
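Item facility, as defined above, is a simple proportion. A minimal Python sketch, assuming each testee's answer to one multiple-choice item is recorded as an option letter; the answers and the function names `item_facility` and `item_difficulty` are hypothetical.

```python
def item_facility(responses, key):
    """Proportion of testees who answered the item correctly (chose the key)."""
    if not responses:
        raise ValueError("no responses given")
    correct = sum(1 for answer in responses if answer == key)
    return correct / len(responses)


def item_difficulty(responses, key):
    """The complementary figure: proportion of testees who got the item wrong."""
    return 1 - item_facility(responses, key)


# Hypothetical answers of ten testees to one four-option item whose key is "B".
answers = ["B", "B", "A", "B", "C", "B", "D", "B", "B", "A"]
facility = item_facility(answers, "B")      # 6 of 10 chose the key
difficulty = item_difficulty(answers, "B")
print(f"facility = {facility:.2f}, difficulty = {difficulty:.2f}")
```

A value near 1.0 marks an item that nearly everyone gets right (too easy to separate candidates), and a value near 0.0 marks one that nearly everyone gets wrong.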

Analysis of item discrimination addresses a different target: consistency of performance by candidates across items. The usual method for calculating item discrimination involves comparing performance on each item by different groups of test takers: those who have done relatively well on the test as a whole and those who have done relatively poorly. For example, as items get harder, we would expect those who do best on the test overall to be the ones who in the main get them right. Poor item discrimination indices are a signal that an item deserves revision.
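The upper-group versus lower-group comparison can be sketched as a discrimination index D: the proportion correct in the top-scoring group minus the proportion correct in the bottom-scoring group. This is a minimal sketch, not the procedure used in this study; the 27% group size is an assumed convention, and the scores and the function name `discrimination_index` are hypothetical. A negative D corresponds to the situation described for item no. 2 in Chapter III, where the lower group answers with the key while the upper group answers wrongly.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-lower discrimination index D = p_upper - p_lower.

    item_correct: 1 if a testee answered the item correctly, otherwise 0.
    total_scores: each testee's total score on the whole test.
    fraction: share of testees placed in each extreme group; 0.27 is an
              assumed convention, not a value given in the source.
    """
    n = len(item_correct)
    if n != len(total_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least two testees")
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    # Group size, capped at n // 2 so the upper and lower groups never overlap.
    size = max(1, min(round(n * fraction), n // 2))
    # Rank testees from the highest to the lowest total score.
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = ranked[:size], ranked[-size:]
    p_upper = sum(item_correct[i] for i in upper) / size
    p_lower = sum(item_correct[i] for i in lower) / size
    return p_upper - p_lower


# Hypothetical results for ten testees (not data from this study).
totals = [40, 38, 35, 33, 30, 28, 25, 20, 18, 15]
good_item = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]    # strong testees tend to get it right
faulty_item = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]  # weak testees tend to get it right
print(discrimination_index(good_item, totals))    # → 1.0
print(discrimination_index(faulty_item, totals))  # → -1.0
```

Values near zero or below zero are the "poor indices" that the paragraph above treats as a signal for revision.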

If there are a lot of items with problems of discrimination, the information coming out of the test is confusing: some items suggest that certain candidates are relatively better, while other items suggest that other individuals are better, so no clear picture of the candidates' abilities emerges from the test. (The scores, in other words, are misleading and are not reliable indicators of the underlying abilities of the candidates.) Such a test will need considerable revision.24

24 Ibid., p.61

CHAPTER III

RESEARCH METHODOLOGY AND FINDINGS

A. RESEARCH METHODOLOGY

1. Research Design

a. Research Method

To solve the problem presented in the statement of the problem and the limitation of the study, the writer carried out both library research, by reading books on the characteristics of a good English test, and field research, by analyzing the test paper, conducting an interview, and collecting the test instrument to be analyzed as data.

b. Time and Location

The writer obtained the test paper and the test instruments on 21st December 2005. The school used for the case study is SMP YPPUI Ciledug, which is located at Jl. Raden Fatah No. 36, Sudimara Barat, Ciledug, Tangerang.

c. Techniques of Sample Taking

In this research, the writer took the sample from the first year students of SMP YPPUI Ciledug Tangerang. The sample consists of 40 students.

d. Techniques of Data Collecting

To collect the data she needed, the writer took the following steps:

1). Observation

In carrying out her observation, the writer visited the school to ask for the results of the English summative test, in order to know the students' summative test scores, and to ask for the English question sheet to be analyzed.

The writer also interviewed the English teacher of the first-year class of SMP YPPUI Ciledug and obtained a statement of research from the headmaster of SMP YPPUI Ciledug, Tangerang.

2). Documentation

Documentation means collecting files and data of related information, including the first year students' examination results for the odd semester.

e. Techniques of Data Analysis

To analyze the data of this research, the writer used descriptive analysis and a quantitative research method. She processed and analyzed the data using the following product-moment formula:20

$$
r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}
$$

Where:

r = validity coefficient of the item

X = score on the item (1 for a correct answer, 0 for a wrong one)

Y = total test score of each testee

N = total number of testees

20 Dr. Sumarna Surapranata, Analisis, Validitas, Reliabilitas, dan Interpretasi Hasil Tes: Implementasi Kurikulum 2004, (Bandung: PT Remaja Rosdakarya, 2004), p. 74
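The product-moment formula can be computed directly from raw scores. The Python sketch below assumes that X is a testee's score on one item (1 for choosing the key, 0 otherwise) and Y is that testee's total test score; the six testees and the function name `product_moment` are hypothetical.

```python
from math import sqrt


def product_moment(x, y):
    """Raw-score product-moment correlation r for paired lists x and y."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when every value of x or of y is the same")
    return numerator / denominator


# Hypothetical data for six testees (not taken from this study).
total_scores = [38, 35, 30, 22, 18, 15]  # Y: total test score
item_scores = [1, 1, 1, 0, 0, 0]         # X: 1 = chose the key answer
reversed_item = [0, 0, 0, 1, 1, 1]       # key chosen only by weaker testees

print(round(product_moment(item_scores, total_scores), 3))
print(round(product_moment(reversed_item, total_scores), 3))
```

Reversing which testees chose the key flips the sign of r, which is how a negative value such as the one reported for item no. 2 arises.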

2. Research Findings

a. Description of Data

The test studied by the writer is a summative test: the final test of the odd semester for the first year students of the junior high school in the academic year 2005-2006. It consists of 40 multiple-choice items. Each item consists of a stem and four options, one of which is the key while the others are distracters. The test was held on Wednesday, 21st December 2005, and the testees were given 90 minutes to answer the test items.

According to the explanation given by the first-year English teachers of SMP YPPUI CILEDUG (the school where this case study took place) in the interview with the writer, the test was prepared and set by the Regional Office of National Education.

b. Analysis of Data

The product-moment calculation gives the validity of item no. 1, computed on the key answer (the key answer of this item is B). The result is 0.914. The positive value shows that the item is as useful as it should be (see table 6 in the appendix).

The validity value of item no. 2 is negative (-0.302), which shows that the key answer is not as useful as it should be. That means the lower group answered with the key, but the upper group answered wrongly (see table 7 in the appendix).

The product-moment calculation gives the validity of item no. 3, computed on the key answer (the key answer of this item is C). The result is 0.386. The positive value shows that the item is as useful as it should be.

computed on the key answer (the key answer of this item is A). The result is

The product-moment calculation gives the validity of item no. 10, computed on the key answer (the key answer of this item is A). The result is 0.562. The positive value shows that the item is as useful as it should be.

The product-moment calculation gives the validity of item no. 11, computed on the key answer (the key answer of this item is B). The result is

computed on the key answer (the key answer of this item is D). The result

be (see table 17 in the appendix).

answered with the key, but the upper group answered wrongly (see table 18 in the appendix).

computed on the key answer (the key answer of this item is B). The result

The product-moment calculation gives the validity of item no. 20,

computed on the key answer (the key answer of this item is C). The result

The validity value of item no. 28 is negative (-0.315), which shows that the

The validity value of item no. 35 is negative (-0.070), which shows that the



c. The Interpretation of Data

Based on the data analysis, the writer concludes the following about the item validity of the English summative test: 6 items (no. 2, 13, 25, 26, 35, and 39) must be revised, because their negative values show that the items are not as useful as they should be.

The other items have positive values, which show that the items are as useful as they should be. Based on the computation of the validity analysis, 6 items (no. 5, 7, 10, 17, 18, and 22) have enough quality, 11 items (no. 3, 4, 6, 16, 20, 21, 24, 29, 31, and 32) have low quality, and 17 items (no. 1, 8, 11, 12, 14, 15, 19, 23, 26, 17, 30, 33, 36, 37, 38, and 40) have the lowest quality.
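The grouping above can be sketched as a small classification routine. The rule that a negative coefficient means the item must be revised comes from the text; the positive cut-offs in `BANDS` are hypothetical, because the study does not state the boundaries it used. The sample coefficients are the values reported for items no. 2, 3, 10 and 35.

```python
from collections import Counter

# Hypothetical cut-offs as (lower bound, label); the study does not state the
# boundaries it used, so these values are assumptions for illustration.
BANDS = [(0.60, "high"), (0.40, "enough"), (0.20, "low"), (0.00, "lowest")]


def classify_validity(r, bands=BANDS):
    """Label an item validity coefficient r (expected range -1 to 1)."""
    if r < 0:
        # Stated in the text: a negative value means the item must be revised.
        return "must be revised"
    for lower_bound, label in bands:
        if r >= lower_bound:
            return label
    return bands[-1][1]


def share_by_category(coefficients):
    """Percentage of items falling into each category."""
    counts = Counter(classify_validity(r) for r in coefficients)
    total = len(coefficients)
    return {label: 100 * count / total for label, count in counts.items()}


# Coefficients reported in the analysis for items no. 2, 3, 10 and 35.
sample = [-0.302, 0.386, 0.562, -0.070]
print({r: classify_validity(r) for r in sample})
print(share_by_category(sample))
```

With these assumed cut-offs, the four sample items fall into the same groups the writer reports for them; other cut-offs would need to be checked against the source used for the analysis.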

CHAPTER IV

CONCLUSION AND SUGGESTION

A. CONCLUSION

Based on the data analysis and data interpretation in the previous chapter, the writer concludes that the item validity of the English test for the odd semester for the first year students of SMP YPPUI CILEDUG is as follows:

Based on the statistical calculation of item validity (empirical validity), the writer interprets the empirical validity of the test as being at the level of "badness", because 6 items (15%) must be revised, 6 items (15%) have enough quality, 11 items (27.5%) have low quality, and 17 items (42.5%) have the lowest quality.

B. SUGGESTION

From the conclusion above, the writer would like to give the following suggestions:

1. To fulfill the characteristics of a good test, the test items should first be examined and analyzed.

2. The content of the test should be in line with the curriculum and the GBPP and should not deviate from the material given to the testees.

3. The test maker should set particular objectives for the test that are related to the curriculum, so that the test is sufficiently representative of the curriculum.

BIBLIOGRAPHY

Anas, Sudijono, Prof. Drs., Pengantar Evaluasi Pendidikan, Jakarta: PT. Raja Grafindo Persada, 2003, 4th printing.

Anas, Sudijono, Prof. Drs., Pengantar Evaluasi Pendidikan, Jakarta: PT. Raja Grafindo Persada, 2005, 5th printing.

Bailey, Kathleen M., Learning about Language Assessment: Dilemmas, Decisions, and Directions, United States of America: ITP, an International Thomson Publishing Company, 1998.

Brown, H. Douglas, Teaching by Principles: An Interactive Approach to Language Pedagogy, Second Edition, Longman: San Francisco University, 2001.

Celce-Murcia, Marianne, Teaching English as a Second or Foreign Language, Second Edition, United States of America, 1991.

Gronlund, N.E., Measurement and Evaluation in Teaching, Fifth Edition, New York: Macmillan Publishing Co., 1985.

Harris, David P., Testing English as a Second Language, New York: Tata McGraw-Hill, Inc., 1969.

Heaton, J.B., Writing English Language Tests, Longman, 1988.

Nitko, Anthony J., Educational Tests and Measurement: An Introduction, New York: Harcourt Brace Jovanovich, Inc., 1983.

Nunnally, Jum C., Educational Measurement and Evaluation, New York: McGraw-Hill Book Company, 1964.

Nurkancana, Wayan and Sumartana, Evaluasi Pendidikan, Surabaya: Usaha Nasional, 1986.

Oller, John W., Language Tests at School: A Pragmatic Approach, London: Longman, 1979.

Purwanto, Ngalim, Prinsip-prinsip dan Tehnik Evaluasi Pengajaran, Bandung: PT Remaja Rosdakarya, 1989.

Slameto, Drs., Evaluasi Pendidikan, Jakarta: Bumi Aksara, 2001, 3rd printing.

Sudjana, Nana, Drs., Penilaian Hasil Proses Belajar Mengajar, Bandung: PT Remaja Rosdakarya, 1991.

Surapranata, Sumarna, Dr., Analisis, Validitas, Reliabilitas dan Interpretasi Hasil Tes: Implementasi Kurikulum 2004, Bandung: PT Remaja Rosdakarya, 2004.

Tim Penyusun, Pedoman Penulisan Skripsi, Tesis, dan Disertasi, 2nd printing, Jakarta: UIN Jakarta Press, 2002.

Tinambunan, Wilmar, Drs., Evaluation of Students' Achievement, Jakarta: Depdikbud, 1998.

Widdowson, H.G., Language Testing, Oxford: Oxford University Press, 2000.
