
Educational Measurement: Issues and Practice, Summer 2012, Vol. 31, No. 2, pp. 38–44

Introduction of External, Independent Testing in “New Countries”: Successes and Defeats of the Introduction of Modern Educational Assessment Techniques in Former Soviet and Socialist Countries

Steven Bakker, dutchTest

A particular trait of the educational system under socialist rule was accountability at the input side—appropriate facilities, a centrally decided curriculum, approved textbooks, and uniformly trained teachers—but no control on the output. It was simply assumed that the output met the agreed standards, which was, in turn, proven by the statistics provided by the authorities. The introduction of professional testing methods for national large-scale assessment efforts and participation in international surveys such as the Trends in International Mathematics and Science Study (TIMSS) and the Programme for International Student Assessment (PISA) unveiled the myth of the traditional “all is good” message spread by the former party rulers. But another aspect of the introduction of external independent testing has probably had an even bigger impact: the fight against the pervasive corruption in student admission that it became part of. If anything, the successful use of independent external testing in this struggle has done a lot to have its methods accepted by the audience at large. The article describes the experiences of the author as a senior consultant to ministries and newly established testing institutes in former Soviet and socialist republics.

Keywords: admission testing, corruption, former Soviet and socialist republics, consultancy, assessment of higher-order skills

Steven Bakker, dutchTest, Vermeerstraat 16, 6521 LW Nijmegen, The Netherlands; [email protected]

In the early 1990s a large number of “new countries” arrived on the world scene, due to the collapse of the Soviet Union and Yugoslavia. Part of breaking away from old traditions, and of preparing to accede to the global market economy, was the innovation of educational systems. For this, international donors such as the World Bank, the Asian Development Bank, the European Union (EU), the US Agency for International Development (USAID), and numerous other national foreign aid organizations devoted large sums of money. Curricula and teaching methods proved to be change-resistant, and the introduction of new concepts was also not helped by the poor state of school facilities and the lack of modern textbooks and equipment. The introduction of modern educational assessment methods and the establishment of institutes providing independent external testing, however, was believed to make a strong difference in the short term, and for that reason became a major component of many educational reform projects in the new countries during the 1990s (Bakker, Van Lent, & De Knecht-van Eekelen, 2005).

Exceeding the Plan

In the early nineties the Dutch Ministry of Education negotiated a deal with the Russian Federal Ministry for “Cooperation on Educational Standards and Assessment.” As a coworker at the Dutch Institute for Educational Measurement, and one of the few at the institute with some previous experience in the former USSR, I was invited to contribute to the standards and testing component (Bakker, 1998). I was excited by this opportunity to study the way Russian colleagues had organized the assessment of students in general education and how they made use of test results for managing the process of education. Focusing on the use of tests for certification at the end of secondary education and admission to higher education, my first findings were a bit disappointing, however. I found federal math tests, which were basically references to numbers of items in a handbook with several thousand math items, available at each school and drilled extensively during the last year of education. For other subjects I found school-based oral exams according to the so-called “Biljet” system: a student is presented with a box or a table full of tickets (“biljets”), each mentioning a certain topic, for instance “Mendeleev’s Periodic System of the Elements.” A student draws a ticket and, after some time for preparation, is invited to the testing room, where he or she demonstrates his or her understanding of the given topic to the examiners (Figure 1). While the tickets were prepared by federal specialists, the evaluation of the student’s understanding was entirely at the discretion of the examiners.

The performance of students was expressed on a scale running from 1 to 5, 5 being highest. It was the usage of this scale for educational management purposes that was a major worry for Arkady, my Russian counterpart. It did not at all support his need to use testing results for obtaining an objective and detailed view of the performance of the educational system. It could also not play any role in admissions to higher education—one of the intentions of the Russian federal government at that time—as all universities were aware that pass scores didn’t mean much. It basically was a ritual rudiment from Soviet times, with their strict input control and rather formal output control. The idea at that time was very much that if textbooks were OK and available, teachers trained, and the curriculum under control—to the extent that each Soviet student would be taught the same content at the same time—the output would be adequately trained citizens. The exams only needed to prove that, very much in the same way as industrial output figures needed to prove the success and superiority of the socialist system, meeting or even exceeding the challenging objectives of the 5-year plans. So everybody should pass, except for a few who refused to cooperate.

FIGURE 1. Biljet system in Chui, Kyrgyzstan. The girl in the front has just drawn her “biljet.” The girl in the back is demonstrating her understanding of the topic on her “biljet.”

During one session Arkady drew a frequency distribution of exam results and indicated the cut score with a dotted line. “This is what I always see here,” he said, “and this is what I want to see,” adding a second curve (Figure 2).

FIGURE 2. Arkady’s wish. (Two curves over the mark scale 1–5, labeled “This is what I always see here…” and “And this is what I want to see!”)

FIGURE 3. The camel-shaped frequency distribution. (Its two humps are labeled “our future lorry drivers” and “our future HE students.”)

“The way the scale is used now conceals the true situation. Mark 1 is never given, mark 2 is for those who didn’t do anything at all, 3 for those who cooperated, 4 for those who achieved something, and 5 for the few best students,” he explained.

In an effort to replace the traditional way of evaluating student achievement, multiple choice tests for several subjects were developed and piloted in a region north of Moscow. The results of these tests indeed created a more meaningful differentiation between students, as demonstrated in the frequency distribution we obtained. The camel shape (Figure 3) revealed the situation prevailing in many Russian classrooms at that time, as was confirmed by teachers we interviewed.

There were actually two populations: a regular one that studied and received the attention of the teacher, and one that was left behind. “Our future lorry drivers and shop assistants,” as one teacher described them. These were also the students who simply didn’t answer many of the test items.
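In distributional terms (the notation here is mine, not the author’s), a camel-shaped score distribution is exactly what a mixture of two populations produces: the observed density is $f(x) = w\,f_1(x) + (1-w)\,f_2(x)$, with $f_1$ centered at the low end (the students left behind, who omit many items), $f_2$ centered at a higher score level, and $w$ the proportion of the first group. When the two components are well separated, the mixture shows two humps rather than the single bell curve that score distributions are often assumed to follow.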

Cramming and Trivia Testing

As in many other projects aiming at introducing modern approaches in educational assessment, a lot of effort in this Russian–Dutch project went into training subject specialists in item writing. Traditional written exams consisted of essay questions for which students could only be awarded the maximum score, the tiniest mistake being fatal and leading to a zero score. Partial credit was ideologically wrong: “Would you want your bus driver to take you only half way to your destination?” At that time multiple choice tests were also fairly uncommon in education in former socialist states. The word “test” primarily referred to psychological testing, and when we announced that we wanted to introduce tests as part of the school exams we were advised that these could only be administered by a certified and licensed school psychologist. Once we had made it clear that there was little “psychological” about our tests we came upon another hurdle, one that I would come across in the years to come in each and every project in the “new countries”: the tendency to assess reproduction of curricular details, setting items of the “Gotcha!” type such as the pyramid item in Figure 4, and the absence of items assessing application of knowledge and problem solving in new, realistic situations.

Which Pharaoh built the highest pyramid?

A. Amenhotep IV
B. Ramses I
C. Ramses II
D. Tut-Anch-Ammon

Reviewer’s comments: The item reduces the history of Egypt to a pyramid-building contest! This is “trivia-testing”: knowledge of facts that are irrelevant in themselves. It is the context that matters: why did Pharaohs build these massive monuments, why was the pyramid shape used, etc.

FIGURE 4. The pyramid contest.

FIGURE 5. The Antarctica item.

This tendency was also reflected in the criticism from “new countries” taking part in international surveys such as PISA—usually in reaction to their poor results on these tests. PISA tests are not designed to cover some common denominator of the curricula of participating countries. Rather, they “assess how far students near the end of compulsory education have acquired some of the knowledge and skills that are essential for full participation in society” (OECD, 2011). The type of items that were developed to operationalize this construct caused huge problems for students from new countries. Despite the instructions at the start to pick an option for each multiple-choice item, students left many of them unanswered, trained as they were to answer only those questions whose right answer they were absolutely sure of. Open-ended items often resulted in blank answer sheets.

It was items such as the Antarctica item in Figure 5 (Vendramel Tamassia, Schleicher, & Kirsch, 2002) that gave rise to statements like “This is not what we teach them.” The idea that students should use their knowledge of how to calculate the area of a circle to estimate the area of this irregularly shaped continent, or divide it up into rectangles, with parts sticking out compensating for parts not covered, was a straight threat to, if not a disapproval of, the general way of teaching in the “new countries.”
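To make the kind of estimate the item invites concrete (the diameter below is an assumed reading from a map scale, chosen for illustration, not taken from the actual PISA materials): approximating the continent by a circle of diameter roughly $4{,}500$ km gives

$A \approx \pi r^2 = \pi \times (2{,}250\ \text{km})^2 \approx 1.6 \times 10^7\ \text{km}^2$,

which is of the same order as Antarctica’s actual area of about $1.4 \times 10^7\ \text{km}^2$. Nothing beyond the standard circle-area formula is needed, only the willingness to apply it to an unfamiliar, irregular shape.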

Often it turned out that dissemination of good test writing practices resulted in bringing up groups of item writers perfectly capable of producing technically sound items, but testing the—often trivial—facts that students were supposed to cram and regurgitate. And every so often it appeared that the same item writers might be willing to try their hand at more sophisticated items, testing higher order skills, but were forced to give up on it by their superiors, who were afraid that they would not be able to defend such items against allegations that they would be “outside the curriculum.” In one particular case only items were accepted that referred directly to certain paragraphs in one of the approved textbooks, using the same wording and pictures.

In an effort to show how traditional curricular knowledge could be used in unfamiliar situations, and thereby would require something more from candidates than just reproduction, I developed a series of examples for science subjects, one of which is shown in the box below. While the series was a real eye-opener for subject specialists and was kindly welcomed by authorities, it did, in most cases, not lead to radical change of the ubiquitous trivia testing.

The formula of table salt: classical approach

• The correct formula of table salt is

A. KCl
B. KCl2
C. NaCl
D. NaCl2

In context: Is it safe for me to drink this water?

Nino is pregnant and was advised to go on a salt-free diet. She loves Nabeghlavi mineral water. She checks the label to decide whether it is safe for her to drink it. This is what she finds (see picture).

• Explain if it is safe for Nino to drink Nabeghlavi.

Marking scheme

• Maximum score 2

Example of a right answer: “There are Na+ and Cl− ions in this water, that means NaCl, which is the formula of table salt. So Nino should probably consult her doctor before drinking this mineral water in larger quantities.”

• For noting that Nabeghlavi mineral water contains Na+ and Cl− ions … 1
• For concluding that this means Nabeghlavi mineral water contains salt (so it may not be safe) … 1

For consultants this predilection for fact-testing poses a difficult dilemma. In high-stakes exams, with their strong backwash effect on education, such items will stifle any attempt at introducing teaching methods that aim at developing critical thinking skills, communication, and problem solving in everyday situations. Giving in to the pressure understandably exerted by local subject experts to bring testing in line with prevailing teaching practices might be directly opposed to efforts in curricular innovation projects, funded by the same donors, directed at the development of general competences. On the other hand, pushing too hard may lead to items that are misunderstood by students, simply because for them producing the right answer is remembering it, not constructing it, and to frustrated item writers whose efforts are not really appreciated by their superiors.

The Stalemate Revealed

Another problem consultants come across is the fact that not just the results of international surveys are alarmingly bad in the “new countries,” but also the results on newly introduced national school leaving and university admission exams. Often highly esteemed academicians and teachers of top-ranking schools are invited as item writers. They tend to seriously overestimate the average ability of students, producing needlessly complicated items. This problem may be mitigated to a certain extent by selecting more teachers from “average” schools for item development teams. But even then the proportion of students achieving the basic level or above is often disappointingly low. In standard-setting sessions, teams of judges recommend cut scores before they have seen the actual results; the recommended cut scores would make more than 70% of the students fail. In an effort not to get stuck with large quantities of students without a diploma, the responsible authorities sometimes decide on cut scores close to the guessing score. This, and the outcomes of sample-based national assessments, in fact expose the dramatic situation in many schools where, as Georgian president Saakashvili put it shortly after the Rose Revolution in 2003: “Students had stopped learning and teachers had stopped teaching.” One of the driving forces behind this stalemate is corruption, teaching and learning being replaced by the selling and buying of access to higher education or of higher grades in prestigious schools.
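To put the “guessing score” mentioned above in concrete terms (the 40-item test is hypothetical, chosen only for illustration): on a test of $n$ four-option multiple-choice items, a candidate who guesses blindly earns an expected score of

$E[X] = n \times \tfrac{1}{4}$, so for $n = 40$, $E[X] = 10$.

A cut score set at or near 10 out of 40 therefore certifies essentially nothing beyond chance performance.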

The introduction of modern assessment technology, and the establishment of national, and in some cases private, testing institutions in the last two decades helped reveal serious problems in the quality of education. Independent external testing services becoming available also meant the start of the decline of the power of schools and higher education institutions to run their own selection procedures, which were invariably riddled with corruption. “Can corruption be fought with tests?” When I started as a consultant, coming from a culture built on mutual trust and transparency, I would not have known how. But soon it became a main reason for international donors to invest even more in building educational assessment capacity and to make it one of the spearheads in the war against corruption.

Why Don’t They Want Our Tests?

“Travelling from here to the east,” a colleague once explained to me, “there is a simple rule of thumb for salaries in education. They decrease by a factor of 4–5 going from one sector to the next. We start here in Holland with, say, 3,000 Euro per month. In the former Eastern bloc they go down to 600 Euro. As soon as we are in Russia we find salaries of 150–200 Euro, and once you’ve got to the ‘stans’¹ you will find that 50 Euro per month is not uncommon.” This is not sufficient to make a living, not even in Kyrgyzstan. This puts teachers in the same position as the underpaid police officer who needs the fines to supplement his wages. It is as simple as that, and everybody should understand the rest of the story.

Nevertheless, I wasn’t aware of all this the first time I encountered “professors with a side job.” I was working on a UNESCO project with the Moscow Mendeleev University for Chemical Technology on a teaching module on the chemical industry. I had prepared a collection of test items and familiarized Mendeleev staff with western approaches in educational assessment, which were quite new for them. The rector of the institution then invited me to help modernize the Mendeleev admission exams. “For over a month each year most of my staff is unapproachable, taken up as they are by the oral admission exams,” he complained. A general written test would be quite a relief. “And more than that,” he added meaningfully.

The project failed before it even started, because of total disinterest at the teachers’ end: the same teachers who shortly before had attended my seminars full of enthusiasm. I didn’t understand the signal, though.

Paying at the Gate

The first time that the reality dawned upon me in full was at the start of a job I did in Georgia in 1999 as a consultant for the World Bank. In a report written by two of my colleagues I found a list of coaching fees. Coaching was a common phenomenon already in Soviet times, usually carried out by the university professors who set the “Biljets” and administered the oral admission tests. The first worry of every parent with a child leaving high school would be to find a middleman who could put them in touch with the right coach, taking into account the means of the parents. The list in the report mentioned fees ranging between 200 and 1,500 USD, but also some much higher than that, running up to 20,000 USD. The latter came with a “degree guarantee.” In many universities coaching was very much part of the regular activities of the school. The rector of Georgia’s largest university, Tbilisi State, owned a bank account on which he accrued all coaching fees paid by students, and from which he transferred bonuses to his staff, keeping a nice percentage for himself. Once admitted, there was little reason for students to engage in their studies. The next exam results, too, had to be bought. Giorgi Kandelaki, the leader of the student opposition movement Kmara, described the stifling effect of corruption on education in a conversation I had with him in 2003: “A student survey we recently conducted indicates that of all Tbilisi State students, 60% haven’t touched a single textbook.”

Hesitation in Ukraine

Ukraine, too, had opened its borders to western donor agencies. After the “Orange Revolution” of 2004, Viktor Yushchenko replaced the fraudulently elected Yanukovich and immediately declared the eradication of corruption in higher education a top priority (Cabinet of Ministers of Ukraine, 2004). USAID was invited to design a training and consultancy program for the introduction of standardized external testing at the end of secondary education. New life was breathed into an initiative that had failed dramatically before 2004: replacing all university-conducted entrance exams by one single national admission exam. Due to continuing political instability it took till 2008 before the first of these exams was administered. In his foreword to the report of this campaign the then minister of education described the far-reaching, devastating, and demoralizing effects of corruption on education. He wrote in the past tense, but whether these practices indeed belong to the past remains the question. Prestigious universities are still permitted to use additional means of selection, and find increasing support in the parliament. The Ukrainian Centre for Evaluation of the Quality of Education (UCEQA), established in 2005 by the same parliament (Cabinet of Ministers of Ukraine, 2005) and charged with the task of administering the exams for 11 subjects in five languages to half a million candidates, is weak and continuously under political attack. In December 2010 its director was replaced by a former member of the Yanukovich cabinet who, as a vice-minister, had declared herself a firm opponent of external testing.

Momentum in Georgia

Initiatives to replace the university entrance exams by external testing came up in almost all of the former Soviet and socialist states, and in many of them fighting corruption was a major reason. National centers for testing and examinations were established, some as a department of the ministry of education, some independent but funded by the state, and sometimes even as private organizations delivering testing services to universities. Some have become an indispensable part of the educational system, e.g., the centers in the Baltic states, Slovenia, and Poland; some are constantly under threat of marginalization and have to fight hard for their survival, like CEATM (the Centre for Educational Assessment and Teaching Methods) in Kyrgyzstan (Clark, 2005).

While in Ukraine the process is slow and faltering, Georgia has a remarkable success story to tell. Already in 1999, still under Shevardnadze’s rule, Georgia signed an agreement with the World Bank for a large-scale, long-term project for educational reform. In 2002 this reform led to the establishment of the National Assessment and Examination Centre. Soon, however, one of the major goals of the reform program came under threat: the replacement of the system in which each university conducted its own student admission procedures, using internally set and administered instruments, by a national system of external testing and optimal allocation of students to available study places. The Shevardnadze regime, itself more and more giving off the unpleasant smell of corruption, didn’t seem willing to take a firm line with the almighty group of university rectors, and postponed introduction till 2007.

The Rose Revolution of November 2003, however, gave the educational reform project back its momentum. The newly appointed minister of education committed himself to an immediate introduction of the national entrance tests. While the rectors gave smiling interviews to the press, stating that they had already implemented this initiative by creating an immense computerized item bank and an advanced algorithm that would allow them to generate any test with one mouse click—only one person having access to that mouse, so security guaranteed—the National Examination Centre (NAEC) was working on setting up a national testing system with the help of a small group of international consultants.

Days of Truth

The first administration took place on July 11, 2005, in 14 test centers spread over the territory of Georgia. Inside the centers many hundreds of well-trained administrators and proctors steered the session in the right direction, while a hundred national and international observers monitored the process. Outside, about 500 policemen patrolled the centers in a very visible way, and 35 medical doctors were standing by, keeping an eye on the physical condition of the examinees.

An important aspect was the visibility of the security measures. After all, it wasn’t just necessary to exclude all possible fraud, but also to show all stakeholders that this had indeed successfully happened. The CCTV cameras played a special role in this. Presumably their presence already kept students and administrators from engaging in unethical practices. But in actuality they produced another and probably even more important effect. The CCTV images that were monitored by NAEC staff in the control room (Figure 6) were also displayed on screens outside the test centers. Large groups of parents and relatives had gathered in front of these screens and watched the process in the testing rooms with increasing enthusiasm. “Now we can see that our children are treated in a friendly and polite way, and not like cattle, as before. We see with our own eyes that they are seated properly, at a good desk on a proper chair, and that each is given a fair chance,” a parent told me. “I was expecting the same games that were played before, but this is totally different!” For them it was a relief no longer to be caught in a system where everybody was cheating and where playing along was the only option.

FIGURE 6. Admission exams in Georgia—the control room in one of the major testing centers. Standing in the middle: the minister of education.

Soon the opposing rectors and faculty started to understand that they had lost the battle and decided to focus on profits that could be made from students after they had been admitted. This eventually led to the dismissal and replacement of a large number of corrupt rectors in the following year, and to the introduction of a national test for admission to master’s programs, the “Georgian GRE.” And while the Saakashvili administration was at the same time losing much of its initial credit, friend and foe agreed that the introduction of independent, external admission testing was a most important step forward for Georgia.

Conclusions

The alignment of former socialist states with the global market economy is supported by international donor organizations, among other things through investments in educational reform projects. In some cases, for instance accession to the EU, such reforms have been a precondition. These reform projects have brought a rapid and wide dissemination of primarily Anglo-Saxon approaches in educational assessment, including research-based standard-setting, the use of multiple-choice questions, attention to validity and reliability issues, and advanced models for statistical analyses and interpretation of student outcomes.

With international donor organizations such as the World Bank granting loans or gifts for establishing educational testing centers, a new consultancy market emerged. This new market was not without hurdles and dilemmas, as described above. In several cases the capacity built was lost when new regimes came to power and reverted to “old” politics. For some larger testing firms such problems gave rise to second thoughts about their involvement. Profits from consultancy fees are usually much lower than on the home market, and expectations that consultancy would serve as an introduction to new markets opening up for international test publishers did not come true; education is, after all, too much a national matter.

But testing firms may also profit from reaching out to “new countries” in other ways than through immediate and sizeable profits. Working as a consultant abroad certainly helps staff to take distance from what they have always taken for granted working in their own culture and environment, allowing them to become more receptive to new ideas and to anticipate more effectively the changing needs of the home market. And, last but not least, countries that have no testing culture and no extensive investments in existing testing machinery are more open to new approaches that the home market is not yet ready to absorb. The recent introduction of computer-adaptive testing for the large-scale, high-stakes school leaving exams in the Republic of Georgia (National Examinations Centre, 2011) is a striking example of that. And in our industry, the “Dialectics of Progress” apply (Figure 7)!


FIGURE 7. Dialectics of progress.

Note

1. The Central Asian republics: Kazakhstan, Kyrgyzstan, Uzbekistan, and Tajikistan.

References

Bakker, S., Van Lent, G., & De Knecht-van Eekelen, A. (2005). East meets West in assessment development: Western technical assistance to Eastern needs. Educational Measurement: Issues and Practice, 20(1), 33–35.

Bakker, S. (1998). Educational assessment in the Russian Federation. In J. Voogt & T. Plomp (Eds.), Education standards and assessment in the Russian Federation: Results from Russian-Dutch cooperation in education (pp. 113–124). Leuven, Belgium: Uitgeverij Acco.

Cabinet of Ministers of Ukraine. (2004). Resolution 1095, 25.08.2004.Kyiv, Ukraine: Author.

Cabinet of Ministers of Ukraine. (2005). Resolution 1312, 31.12.2005.Kyiv, Ukraine: Author.

Clark, N. (2005, December). Education reform in the former Soviet Union. World Education News & Reviews, 18(6). Available at http://www.wes.org/ewenr/05dec/feature.htm

National Examinations Centre. (2011). The tense exam period is over. Tbilisi, Georgia: Author. Available at http://www.naec.ge/?lang=en-GB

OECD. (2011). Programme for International Student Assessment (PISA). Paris: Author. Available at http://www.pisa.oecd.org/

Vendramel Tamassia, C., Schleicher, A., & Kirsch, I. S. (2002). Continent area. In Sample tasks from the PISA 2000 assessment (p. 93). Paris: OECD.
