
Post on 13-Jan-2016


Wolfgang Glänzel (1,2), Koenraad Debackere (1)

On the “multi-dimensionality” of ranking

Some methodological and mathematical questions to be solved in university assessment

(1) K.U.Leuven, Centre for R&D Monitoring (ECOOM), Leuven (Belgium)
(2) Hungarian Academy of Sciences, IPRS, Budapest (Hungary)

STRUCTURE OF THE PRESENTATION

1. Objectives of the presentation

2. What is ranking?

3. Three fundamental pillars of ranking

3.1 Reliability of data

3.2 Validity of constructs and methodology

3.3 Correctness of the mathematical-statistical approach

4. Conclusions


1. OBJECTIVES OF THE PRESENTATION

According to the Berlin Principles compiled by the International Ranking Expert Group [IREG, 2006], ranking methodology should be based on reliable sources and valid measures, and must provide robust and reproducible results.

We will therefore focus mainly on these issues. We will, however, also devote attention to some mathematical-statistical aspects of ranking: whenever we measure objects and compare and gauge the outcomes, we enter the domain of mathematics and have to make sure that the fundamental criteria of mathematical correctness are met as well.


The aim of this presentation is neither to give an exhaustive and profound methodological approach nor to evaluate existing ranking exercises.

We merely wish to contribute to the discussion on how the ranking of higher education institutions (HEIs) might be improved, offering some thoughts on the cognitive and conceptual approach to ranking exercises in general and to HEI ranking in particular.

The discussion is supplemented by examples taken from the world of bibliometrics.


2. WHAT IS RANKING?

Definition: Ranking is the positioning of comparable objects on an ordinal scale based on a non-strict order relation among (statistical) functions of measures or scores associated with those objects.
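Because the order relation is non-strict, objects with equal scores cannot be separated and must share a position. The minimal Python sketch below illustrates this; the institutions and scores are hypothetical, and the tie convention used (standard competition ranking, 1, 2, 2, 4) is our assumption, one of several possible choices rather than the authors' prescription.

```python
from typing import Dict, List, Tuple


def rank_with_ties(scores: Dict[str, float]) -> List[Tuple[int, str, float]]:
    """Position objects on an ordinal scale by descending score.

    The order relation is non-strict, so objects with equal scores share
    a position (standard competition ranking: 1, 2, 2, 4, ...).
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranking: List[Tuple[int, str, float]] = []
    for index, (name, score) in enumerate(ordered):
        if ranking and score == ranking[-1][2]:
            position = ranking[-1][0]  # tie: reuse the previous position
        else:
            position = index + 1  # positions taken by earlier ties are skipped
        ranking.append((position, name, score))
    return ranking


# Hypothetical scores of four hypothetical institutions
example = {"A": 1.30, "B": 0.95, "C": 0.95, "D": 0.80}
for position, name, score in rank_with_ties(example):
    print(position, name, score)
```

Only exactly equal scores are treated as ties here; scores that differ by less than their statistical uncertainty raise the separate problem discussed under 3.2.4.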


GENERALISATION


Remark: For k = 1, the previous case is obtained with Y1 = Y according to Eq. (1), while for k = n the partition is formed by the original individual variables with j = i: each subset contains exactly one element, i.e., Yi = Xi.
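The equations referred to in the remark are not reproduced in this transcript. Assuming, as the remark suggests, that the variables X1 to Xn are partitioned into k subsets and each subset is aggregated into one partial indicator Yj, the construction can be sketched as follows. The weighted mean used as the aggregating function, the function name and all numbers are hypothetical illustrations.

```python
from typing import List, Sequence


def partial_indicators(x: Sequence[float], partition: List[List[int]],
                       weights: Sequence[float]) -> List[float]:
    """Aggregate variables X_1..X_n into k partial indicators Y_1..Y_k.

    partition[j] holds the 0-based indices of the variables in subset j.
    Each Y_j is (by assumption) the weighted mean of its subset.
    """
    covered = sorted(i for subset in partition for i in subset)
    if covered != list(range(len(x))):
        raise ValueError("the subsets must cover every variable exactly once")
    ys = []
    for subset in partition:
        total_weight = sum(weights[i] for i in subset)
        ys.append(sum(weights[i] * x[i] for i in subset) / total_weight)
    return ys


x = [2.0, 4.0, 6.0]  # hypothetical values of X_1, X_2, X_3
w = [1.0, 1.0, 2.0]  # hypothetical weights

print(partial_indicators(x, [[0, 1, 2]], w))      # k = 1: one composite Y
print(partial_indicators(x, [[0], [1], [2]], w))  # k = n: Y_i = X_i
print(partial_indicators(x, [[0, 1], [2]], w))    # 1 < k < n
```

The two limiting cases of the remark appear directly: a single subset yields one composite score, and singleton subsets return the original variables unchanged.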

PROBLEMS IN USING COMPOSITE INDICATORS

The following problems usually occur if several variables are (at least partially) bundled into a single measure.

• Possible interdependence of components

• Altering weights can result in different ranking outcomes

• Results might be obscure and irreproducible

• Random errors of statistical functions are usually ignored

• The multi-dimensional space is collapsed into linearity, and information is inevitably lost
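The second problem in the list, that altering weights can change the ranking outcome, is easy to reproduce. In the minimal sketch below, three hypothetical institutions with hypothetical normalised scores on two components are ranked by a weighted sum; merely swapping the weights reverses the league table. The weighted sum is an assumed, common form of composite indicator, not one taken from a specific ranking.

```python
from typing import Dict, List, Sequence


def composite(components: Dict[str, Sequence[float]],
              weights: Sequence[float]) -> Dict[str, float]:
    """Weighted sum of (already normalised) components for each institution."""
    return {name: sum(w * c for w, c in zip(weights, comps))
            for name, comps in components.items()}


def league_table(scores: Dict[str, float]) -> List[str]:
    """Institutions ordered by descending composite score."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical normalised components: (research, education)
data = {"U1": (0.9, 0.4), "U2": (0.6, 0.8), "U3": (0.7, 0.6)}

print(league_table(composite(data, (0.7, 0.3))))  # research-heavy weights
print(league_table(composite(data, (0.3, 0.7))))  # education-heavy weights
```

Nothing in the data changes between the two calls; only the weights do, which is why arbitrary weighting undermines the robustness and reproducibility demanded by the Berlin Principles.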


3. THREE FUNDAMENTAL PILLARS OF RANKING

3.1 Reliability of data (cleanness and completeness): data sources are surveys, databases or data mining.

3.2 Validity of constructs and methodology (soundness and appropriateness): this comprises conceptual questions, the definition of variables, measures, weights etc. as well as the time dimension.

3.3 Correctness of the mathematical-statistical approach


3.1 RELIABILITY OF DATA

This is the precondition for all ranking exercises. The best methodology and the soundest mathematical approach cannot correct what data collection might have distorted.

All data sources have their specific advantages and limitations. Data mining may leave the greatest degree of uncertainty, whereas databases provide pre-processed data.

The following example from bibliometrics illustrates that, even if the exercise is based on large databases, cleanness and completeness are not a trivial matter.

Data collection for large-scale ranking remains a challenge, if it is feasible at all.


EXAMPLE OF SPELLING VARIANTS OF A GERMAN UNIVERSITY IN THE WEB OF SCIENCE DATABASE

UNIV ERLANGEN NURNBERGFRIEDRICH ALEXANDER UNIVUNIV HOSP ERLANGENUNIV ERLANGENUNIV ERLANGEN NUREMBERGFRIEDRICH ALEXANDER UNIV ERLANGEN NURNBERGUNIV HOSP ERLANGEN NURNBERGUNIVERSITAT ERLANGEN NURNBERGKLINIKUM UNIV ERLANGEN NURNBERGFRIEDRICH ALEXANDER UNIVERSITAT ERLANGEN NURNBERGUNIV ERLANGEN NURNBERG POLIKLINPOLIKLIN UNIV ERLANGEN NURNBERGUNIV ERLANGEN NURNBERG KLINIKUMFRIEDRICH ALEXANDER UNIV ERLANGENERLANGEN UNIV HOSPFAU UNIV ERLANGENKLINIKUM FRIEDRICH ALEXANDER UNIV ERLANGEN NURNBEUNIV ERLANGEN NURNBERG KLINUNIV KLINIKUM ERLANGEN NURNBERGERLANGEN UNIVUNIV ERLANGEN NURNBERG HOSPFRIEDRICH ALEXANDER UNIV ERLANGEN NUREMBERGFRIEDRICH ALEXANDER UNIV POLIKLINUNIV CLIN ERLANGEN NURNBERGUNIV ERLANGEN NURNEMBERGUNIV ERLANGER NURNBERG

DR REMEIS STERNWARTE UNIV ERLANGEN NURNBERGERLANGEN NUREMBERG UNIVERLANGEN NUREMBURG UNIVFAU UNIVFREDRICH ALEXANDER UNIV ERLANGEN NURNBERGFRIEDRICH ALEXANDER UNIV ERLANGEN NUERNBERGFRIEDRICH ALEXANDER UNIV NURNBERGFRIEDRICH ALEXANDER UNIV NURNBERG ERLANGENHOSP UNIV ERLANGEN NURNBERGKLIN FRIEDRICH ALEXANDER UNIV ERLANGEN NURNBERGKOPFKLIN UNIV ERLANGEN NURNBERGKUNIV ERLANGEN NURNBERGUB ERLANGEN NURNBERGUNIV ERLANDGEN NURNBERGUNIV ERLANGEN NURNBERG KINDERKLINUNIV ERLANGEN NURNBERG KOPFKLINIKUMUNIV ERLANGEN NURNBERG PSYCHIAT KLINUNIV ERLANGEN NURNBURGUNIV ERLANTEN NURNBERGUNIV EYE CLIN ERLANGEN NURNBERGUNIV EYE HOSPITAL ERLANGEN NURNBERGUNIV GRLANGEN NURNBERGUNIV KLIN ERLANGEN NURNBERGUNIV LIB ERLANGEN NURNBERGUNIV NURNBERG ERLANGEN

FRIEDRICH ALEXANDER UNIV ERLANGEN NURNBERG (#69)
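A common first step in cleaning such data is to retrieve candidate variants automatically and then verify and unify them manually. The sketch below is our own illustration, not the procedure used by the authors: the keyword lists, the function names and the 0.8 similarity cutoff are hypothetical, and it relies only on Python's standard `difflib` and `re` modules. City names are matched fuzzily so that misspellings such as those listed above are still caught; short acronyms are matched exactly to limit false hits.

```python
import difflib
import re
from typing import Iterable, List

# Hypothetical keyword lists for one institution: city names are matched
# fuzzily to catch misspellings, short acronyms only exactly.
FUZZY_KEYWORDS = ["ERLANGEN", "NURNBERG", "NUREMBERG"]
EXACT_KEYWORDS = {"FAU"}


def tokens(address: str) -> List[str]:
    """Upper-case an address string and split it into alphabetic tokens."""
    return re.findall(r"[A-Z]+", address.upper())


def is_candidate(address: str, cutoff: float = 0.8) -> bool:
    """Flag an address as a possible variant of the institution's name."""
    for token in tokens(address):
        if token in EXACT_KEYWORDS:
            return True
        if difflib.get_close_matches(token, FUZZY_KEYWORDS, n=1, cutoff=cutoff):
            return True
    return False


def candidates(addresses: Iterable[str]) -> List[str]:
    """Keep the addresses that need manual checking and unification."""
    return [a for a in addresses if is_candidate(a)]


records = ["UNIV GRLANGEN NURNBERG", "UNIV ERLANGER NURNBERG",
           "FAU UNIV", "UNIV HAMBURG"]
print(candidates(records))
```

Such a filter only narrows the search: a city-based match will also flag other institutions located in Erlangen or Nuremberg, so every candidate still has to be checked by hand, which illustrates why complete and clean data remain costly.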


3.2 VALIDITY OF METHODOLOGY

The main conceptual issues in university ranking are as follows.

3.2.1 Selective vs. integrated (i.e., “holistic”) approach

3.2.2 Global vs. “local” (i.e., national/regional) approach

3.2.3 Multidimensional vs. linear ranking

3.2.4 Scalar scoring vs. grouping into classes

• In the case of classes: pre-defined vs. self-adjusting classes


3.2.1 SELECTIVE VS. INTEGRATED RANKING

Evaluation of education

In 1993, a national education-related university ranking was published in Germany. It was survey-based: questionnaires were sent to students and professors.

A breakdown by fields was presented as well to give a more differentiated picture, to reveal “strengths and weaknesses” and to help students and academic staff make a selection.


The German university ranking in 1993 …

… and the Hungarian version in 2007


Research performance

With the “Shanghai Ranking”, published since 2003, the focus was shifted to research assessment. This world-wide ranking was to a large extent facilitated by the availability of the multidisciplinary bibliographic databases SCIE, SSCI and their derivatives.

“Holistic” approach

The broader approach chosen by THES-QS, which relies largely on peer-review scores, could not overcome the limitations of previous attempts and has remained controversial as well.


3.2.2 GLOBAL VS. LOCAL APPROACH

Different approaches have their specific clientele. Thus “education ranking” can help parents, students and academic staff make a choice for studies or a career in higher education.

Ranking on “research performance” can, for example, provide valuable information for science policy and funding agencies.

Ranking on important “third stream activities” might provide useful information to industry, agriculture and all non-academic entities that might become potential partners of HEIs.


The evaluation of education and third-stream activities is hampered by the differences and peculiarities of national educational and accreditation systems, as well as by problems in the technical execution of such rankings. Such endeavours are therefore restricted to the regional and national level.

Integrated quantification of university performance and a world-wide ranking based on all HEI activities, including education, research and third-stream activities remains – at least for the present – utopian.


3.2.3 MULTIDIMENSIONAL VS. LINEAR RANKING


We consider a ranking “multidimensional” if it consists of multiple components that cannot simply be expressed as functions of each other, i.e., according to Eq. (2) we assume Yi ≠ f(Yj) for i ≠ j. In the ideal case, {Yi} forms a set of independent variables, but in practice this is the exception rather than the rule.
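One way to see what a linear ranking discards is to compare objects component by component. If one profile is at least as good in every dimension and better in one, it "dominates" the other and any positively weighted ranking agrees; otherwise the pair's order depends entirely on the weights. The sketch below uses this standard dominance (Pareto) relation as our own illustration; the profiles and function names are hypothetical and not part of the original analysis.

```python
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every dimension and better in one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def weight_dependent_pairs(profiles: Dict[str, Sequence[float]]) -> Set[Tuple[str, str]]:
    """Pairs whose relative order in a weighted linear ranking depends on the weights.

    If two differing profiles do not dominate each other, each of them can be
    placed first by choosing suitable positive weights.
    """
    pairs = set()
    for p, q in combinations(sorted(profiles), 2):
        a, b = profiles[p], profiles[q]
        if tuple(a) != tuple(b) and not dominates(a, b) and not dominates(b, a):
            pairs.add((p, q))
    return pairs


# Hypothetical profiles over two dimensions, e.g. (Y_1, Y_2)
profiles = {"U1": (0.9, 0.4), "U2": (0.6, 0.8), "U3": (0.5, 0.3)}
print(weight_dependent_pairs(profiles))
```

Here only U3 can be placed unambiguously; whether U1 or U2 comes first is a matter of weighting, not of measurement.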

The practice of selective ranking as such indicates the existence of different dimensions in university ranking.

Bibliometrics can serve as one of these components, but itself forms a multidimensional space.

Figure: THE FUNCTION OF BIBLIOMETRICS IN QUANTIFYING AND MEASURING ACTIVITIES OF HEIS (diagram elements: university activities comprising education, research and third mission; bibliometrics; research output; scientific communication)


The underlying variables (e.g., Xi or Yj) represent factors influencing performance. These factors are not always separable and they are often interdependent.

For instance, the variables academic personnel, publication output and citation impact can be expected to be interdependent.

In such cases, weighting can hardly be controlled and may have unpredictable effects on the other variables defining the composite indicator.

Another example: science fields can form the components of the multidimensional approach. They strongly influence academic staff, publication activity and citation impact, and therefore cause interdependencies among the corresponding variables.

Figure: EXAMPLE OF DIFFERENT NATIONAL CLUSTER PROFILES OF EUROPEAN RESEARCH INSTITUTIONS (percentage scale from 0% to 40%; profile clusters BIO, AGR, MDS, GSS, TNS, CHE, GRM, SPM; countries Belgium, Finland, Spain)

Source: Thijs & Glänzel, 2008, based on WoS, Thomson Reuters


The following example illustrates that not all variables used for ranking exercises are indeed independent.

Twenty-four institutions from twelve European countries were selected. The papers were published in the period 2001-2003, and citations were counted for each paper in a 3-year citation window beginning with the publication year.
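The citation window described above can be stated precisely in code: a paper published in year t collects the citations received in years t, t+1 and t+2. The minimal sketch below assumes that each paper is represented by its publication year and the publication years of its citing papers; this data layout and all numbers are hypothetical.

```python
from typing import Iterable, List, Tuple


def window_citations(pub_year: int, citing_years: Iterable[int], window: int = 3) -> int:
    """Citations received in the publication year and the following window - 1 years."""
    return sum(1 for year in citing_years if pub_year <= year < pub_year + window)


# Hypothetical papers: (publication year, years of the citing papers)
papers: List[Tuple[int, List[int]]] = [
    (2001, [2001, 2002, 2004]),        # 2004 falls outside 2001-2003
    (2002, [2003, 2004, 2004, 2005]),  # 2005 falls outside 2002-2004
    (2003, []),
]
counts = [window_citations(year, cites) for year, cites in papers]
print(counts, sum(counts) / len(counts))
```

Using the same window length for every publication year gives older and younger papers the same amount of time to be cited, which keeps the institutions' citation rates comparable.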

If profile clusters and/or subject fields are not separated and used as different “dimensions”, subject normalisation might help eliminate subject-related biases.
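Subject normalisation can be sketched as the ratio of observed citations to the citations expected from each paper's field. The sketch below is one simple variant, assumed here for illustration; the field reference values, field labels and institutional profiles are hypothetical. Two institutions with the same raw citation rate of 6 citations per paper end up with different normalised scores because their publications are spread differently over fields with different citation habits.

```python
from typing import Dict, List, Tuple


def normalised_citation_ratio(papers: List[Tuple[str, int]],
                              field_means: Dict[str, float]) -> float:
    """Observed citations divided by the citations expected from each paper's field.

    1.0 means impact at the reference level of the fields involved.
    """
    observed = sum(citations for _, citations in papers)
    expected = sum(field_means[field] for field, _ in papers)
    return observed / expected


# Hypothetical reference values: mean citations per paper in the same window
field_means = {"BIO": 8.0, "MATH": 2.0}

# Two hypothetical institutions with the same raw mean (6 citations per paper)
bio_heavy = [("BIO", 8), ("BIO", 8), ("MATH", 2)]
math_heavy = [("MATH", 3), ("MATH", 3), ("BIO", 12)]

print(normalised_citation_ratio(bio_heavy, field_means))
print(normalised_citation_ratio(math_heavy, field_means))
```

The first institution is exactly at the reference level of its fields, while the second is 50% above it, a difference that the raw citation rate hides.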


VARIABLES MIGHT NOT BE INDEPENDENT

Figure: fitted line y = 6.2x; r = 0.946

Source: WoS, Thomson Reuters
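The slide reports a fitted line y = 6.2x with r = 0.946, but the underlying data and the variables on the axes are not recoverable from this transcript. The sketch below only shows how such figures are typically computed, assuming a least-squares line through the origin (the reported equation has no intercept) and Pearson's correlation coefficient; the function name and any data passed to it are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_through_origin(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope b of y = b * x (no intercept) and Pearson's r."""
    n = len(x)
    # Slope of the regression line forced through the origin
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    # Pearson's correlation coefficient, computed around the means
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return slope, r
```

A correlation this close to 1 means that the two variables carry largely the same information; weighting both in a composite indicator then effectively counts one factor twice.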


NORMALISATION AND SEPARATION OF VARIABLES

Source: WoS, Thomson Reuters

3.2.4 SCALAR SCORING VS. CLASSES


The main disadvantage of linear scoring is the problem of ties. Because of the standard errors of the underlying statistics, even different positions in the ranking list must sometimes be interpreted as ties.
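A rough way to decide whether two different scores should nevertheless be read as a tie is to compare their confidence intervals. The minimal sketch below assumes that the score is a mean (e.g., citations per paper) and uses the normal approximation mean ± 1.96 standard errors; overlap of the intervals is treated as a statistical tie. This is a deliberately conservative, illustrative criterion: citation distributions are skewed, so a proper test would be preferable in practice. All data are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_with_interval(values: Sequence[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Mean and an approximate normal-theory confidence interval (mean +/- z * SE)."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - z * se, m + z * se


def statistically_tied(a: Sequence[float], b: Sequence[float], z: float = 1.96) -> bool:
    """Treat two scores as tied if their approximate confidence intervals overlap."""
    _, lo_a, hi_a = mean_with_interval(a, z)
    _, lo_b, hi_b = mean_with_interval(b, z)
    return lo_a <= hi_b and lo_b <= hi_a


# Hypothetical citation counts of papers from three institutions
inst_a = [0, 1, 2, 3, 10, 4, 2, 1]
inst_b = [1, 2, 2, 3, 5, 3, 2, 4]
inst_c = [20, 22, 25, 21, 24, 23, 19, 26]
print(statistically_tied(inst_a, inst_b), statistically_tied(inst_a, inst_c))
```

Institutions A and B would occupy different positions in a scalar ranking (mean 2.875 vs. 2.75), yet their intervals overlap widely, so the difference in rank carries no real information.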

Instead of scalar scoring (i.e., positioning objects on an ordinal scale according to their indicator values Yj) objects with similar scores can be grouped into classes.

Classes might be pre-defined according to some criteria (e.g., poor, fair, good, very good, excellent etc.), or be self-adjusting according to some mathematical rules.
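Pre-defined classes amount to fixed boundaries on the indicator scale. In the sketch below, the class labels are those named in the text, while the boundaries, and the assumption that the indicator is scaled so that 1.0 is a reference level, are hypothetical. Objects whose scores differ only slightly fall into the same class instead of receiving different ranks.

```python
import bisect
from typing import Dict, Sequence

# Hypothetical boundaries for an indicator scaled so that 1.0 is the reference level
BOUNDS = [0.8, 1.0, 1.2, 1.5]
LABELS = ["poor", "fair", "good", "very good", "excellent"]


def assign_class(score: float, bounds: Sequence[float] = BOUNDS,
                 labels: Sequence[str] = LABELS) -> str:
    """Map a score to a pre-defined class; a score on a boundary goes to the upper class."""
    return labels[bisect.bisect_right(bounds, score)]


scores: Dict[str, float] = {"U1": 1.05, "U2": 1.15, "U3": 0.70, "U4": 1.60}
print({name: assign_class(s) for name, s in scores.items()})
```

The same label set can be reused for every dimension, but the choice of boundaries remains arbitrary, which is exactly the trade-off weighed below.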


Advantages of pre-defined classes: the number of classes can be chosen according to the needs; the same categories can be used for all dimensions.

Disadvantages of pre-defined classes: arbitrariness and difficulty in finding common criteria for classification.

Advantages of self-adjusting classes: they are less arbitrary and adjust themselves whenever the underlying system changes (important for longitudinal approaches).

Disadvantages of self-adjusting classes: difficult algorithms (see the pinprick and separation problem below).
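The slides do not say which self-adjusting algorithm is meant. One scheme known in bibliometrics is characteristic scores and scales (CSS), where the class boundaries are iteratively truncated means: the first boundary is the mean of all values, and each further boundary is the mean of the values at or above the previous one. The sketch below implements that scheme purely as an illustration; the citation counts are hypothetical.

```python
import bisect
from typing import List, Sequence

LABELS = ["poorly cited", "fairly cited", "remarkably cited", "outstandingly cited"]


def characteristic_scores(values: Sequence[float], k: int = 3) -> List[float]:
    """Class boundaries beta_1..beta_k from iteratively truncated means.

    beta_1 is the mean of all values; each further beta is the mean of the
    values at or above the previous boundary.
    """
    betas: List[float] = []
    current = list(values)
    for _ in range(k):
        if not current:
            break
        beta = sum(current) / len(current)
        betas.append(beta)
        current = [v for v in current if v >= beta]
    return betas


def css_class(value: float, betas: Sequence[float]) -> str:
    """Assign a value to the class whose lower boundary it reaches."""
    return LABELS[bisect.bisect_right(betas, value)]


citations = [0, 0, 1, 1, 2, 3, 4, 5, 8, 16]  # hypothetical citation counts
betas = characteristic_scores(citations)
print(betas, [css_class(c, betas) for c in citations])

# The boundaries follow the data: doubling every count doubles every boundary
print(characteristic_scores([2 * c for c in citations]))
```

The last line shows the self-adjusting property: if the whole system shifts (for instance, through a general increase in citation counts), the class boundaries move with it and the class assignments stay the same.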

PINPRICK AND SEPARATION PROBLEM (SKETCH)


We can summarise the above thoughts as follows.

• The idea of ranking HEIs according to simple, seemingly objective and robust indicators is perhaps tempting, but robustness is easily lost by building composite indicators with partially interdependent or even incompatible components and arbitrary weights. Reality is too complex to be described this way.

• Rather than yielding to the temptation of a holistic approach to HEI ranking, a more sophisticated and complex analysis is needed to grasp and reflect several important aspects of performance among the manifold university activities.

4. CONCLUSIONS


• For the reasons discussed, multi-dimensional approaches should be preferred to linear ranking, and score classes to scalar scoring.

• Standardisation and normalisation help eliminate biases and facilitate longitudinal ranking analysis as well.

• However, comparing HEIs with completely different profiles still remains an exercise of comparing apples and oranges since those might form “complementary subspaces”.
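Standardisation, as recommended in the conclusions, can take several forms. One simple option, assumed here only for illustration, is to convert each year's indicator values to z-scores, so that a general increase of the indicator over time (for example, growing citation counts) is not mistaken for improved relative performance. The values below are hypothetical.

```python
import statistics
from typing import Dict


def z_scores(values: Dict[str, float]) -> Dict[str, float]:
    """Standardise one year's indicator values to mean 0 and standard deviation 1.

    Assumes the values are not all equal (otherwise the deviation is zero).
    """
    data = list(values.values())
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return {name: (v - mean) / sd for name, v in values.items()}


# Hypothetical raw indicator values; all values double between the two years
year_a = {"A": 10.0, "B": 20.0, "C": 30.0}
year_b = {"A": 20.0, "B": 40.0, "C": 60.0}

print(z_scores(year_a))
print(z_scores(year_b))
```

Although every raw value doubles, the standardised values are identical in both years, so a longitudinal comparison shows no change in the institutions' relative positions.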
