On ranking merit: applying the page-rank algorithm to the electoral process
Robert Spekkens
Perimeter Institute for Theoretical Physics
October 19, 2010, WICI seminar


The pagerank algorithm provides a way of ranking the members of a community by merit, using the aggregate opinions of the community and without any prior ranking.

Call this the merit-rank algorithm. It should be considered a module to be incorporated into broader systems for collective decision-making.

Ex: appointment of the most meritorious members of a community to a particular set of offices:

• the most trustworthy as decision-makers
• the most fair as jurors
• the most expert as policy-makers

I had this idea during Michael Nielsen's course on the Google technology stack in fall 2008.

Outline

• Shortcomings of current schemes
• How the merit-rank algorithm works
• Case study: Google’s search engine
• Case study: Citation networks
• Criticisms and possible failure modes
• Beyond pagerank

Two schemes for identifying merit and their shortcomings

By majority vote
- Popular opinion may be less reliable than that of a better-qualified minority (the pitfalls of rule by referendum)
- Each voter has a short horizon of deep familiarity

By authority
- Requires a prior notion of who is best qualified to judge merit
- Susceptible to corruption
- Doesn’t scale well
- Each authority has a short horizon of deep familiarity

Merit-rank can hope to avoid some of these shortcomings

How the merit-rank algorithm works

Pagerank as a slogan: Important webpages are those that are linked to by other important webpages

Merit-rank as a slogan: Meritorious individuals are those who are judged to have merit by other meritorious individuals

What kinds of merit will the algorithm work for?

• Auto-indicating merit: An individual having merit is better able to assess merit in others

Or equivalently,

• Merit that is transitive: If Alice esteems Bob, then she would also esteem those who are esteemed by Bob.

From a majority vote system to the merit-rank algorithm

Majority vote:
- Ranking a slate of candidates
- One vote per person

Merit-rank:
- Ranking the entire community
- One unit of voting power per person, either split equally among targets or split arbitrarily among targets

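[Figure: an example network of 11 people, each holding one unit of voting power; tallying the votes received gives 4.0, 3.83, 1.0, 0.5, 0.33 and 0.33, and 0 for the remaining five]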
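A one-step tally like the one in the figure (the weighted in-degree) can be sketched as follows. The sketch assumes each voter's single unit of voting power is split equally among the people they name. The ballots and names are hypothetical. To split arbitrarily instead, replace `1.0 / len(targets)` with the voter's own chosen fractions.

```python
# Hypothetical ballots: each voter lists the people they esteem.
ballots = {
    "Alice": ["Bob", "Charlie"],
    "Bob": ["Charlie"],
    "Charlie": ["Alice", "Bob", "Dana"],
    "Dana": ["Charlie"],
}


def weighted_in_degree(ballots):
    """One-step tally: each voter's single unit of voting power is split
    equally among their targets; a person's score is the total received."""
    scores = {person: 0.0 for person in ballots}
    for voter, targets in ballots.items():
        for target in targets:
            scores[target] = scores.get(target, 0.0) + 1.0 / len(targets)
    return scores


scores = weighted_in_degree(ballots)
for person, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{person}: {score:.2f}")
```

Charlie comes out on top with 2.5 of the 4 units cast.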

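Beyond majority vote: adding recursion

Primitive version of the merit-rank algorithm: iterate the calculation of individual ranks.
- At step 0, everyone has equal merit-rank.
- Alice’s merit-rank at step k = Bob’s merit-rank at step k−1 × (fraction of Bob’s vote cast for Alice) + Charlie’s merit-rank at step k−1 × (fraction of Charlie’s vote cast for Alice) + …
- If the calculation converges, the final ranking is the merit-rank.

In symbols, writing $r_B^{(k)}$ for B’s merit-rank at step k and $w_{B\to A}$ for the fraction of B’s vote cast for A:

$$ r_A^{(k)} = \sum_{B} r_B^{(k-1)}\, w_{B \to A} $$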
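A minimal sketch of the primitive iteration, assuming votes are split equally among named targets. The ballots and names are hypothetical. The example also previews the two problems discussed next. Merit-rank that flows to Dana, who casts no vote, is lost at each step. Charlie, who receives no votes, has no voting power after step 1.

```python
# Hypothetical ballots. Dana receives votes but casts none;
# Charlie casts a vote but receives none.
ballots = {
    "Alice": ["Bob", "Dana"],
    "Bob": ["Alice", "Dana"],
    "Charlie": ["Alice"],
    "Dana": [],
}


def primitive_step(ballots, rank):
    """Primitive merit-rank update: each voter passes on their step k-1
    merit-rank, split equally among the people they vote for."""
    new_rank = {person: 0.0 for person in ballots}
    for voter, targets in ballots.items():
        for target in targets:
            new_rank[target] += rank[voter] / len(targets)
    return new_rank


rank = {person: 1.0 for person in ballots}  # step 0: everyone has equal merit-rank
totals = [sum(rank.values())]
for step in range(1, 6):
    rank = primitive_step(ballots, rank)
    totals.append(sum(rank.values()))
    print(step, {person: round(r, 3) for person, r in rank.items()})

# The total halves from step 2 on: whatever reaches Dana is never passed on.
print("total merit-rank per step:", totals)
```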

[Figure: merit-rank in the example network under the primitive version. Step 0: everyone has 1. Step 1: 4.0, 3.83, 1.0, 0.5, 0.33, 0.33, and 0 for the remaining five. Step 2: 3.83, 2.66, 1.33, 1.33, 0.33, 0.17, and 0 for five. Step 3: 5.22, 2.66, 0.67, 0.67, 0.06, 0.06, and 0 for five.]

Problems with the primitive version: people who receive votes but do not cast any are sinks for merit-rank.


Solution: distribute their vote uniformly over the whole community.
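A sketch of this fix, using hypothetical ballots in which Dana casts no vote. Here a non-voter's merit-rank is spread over everyone. Including the non-voter themselves in that spread is an assumption of this sketch. With the fix, the total merit-rank no longer drains away.

```python
# Hypothetical ballots: Dana receives votes but casts none.
ballots = {
    "Alice": ["Bob", "Dana"],
    "Bob": ["Alice", "Dana"],
    "Charlie": ["Alice"],
    "Dana": [],
}


def step_with_sink_fix(ballots, rank):
    """Primitive merit-rank update, except that anyone who casts no vote has
    their merit-rank spread uniformly over everyone (themselves included)."""
    everyone = list(ballots)
    new_rank = {person: 0.0 for person in everyone}
    for voter, targets in ballots.items():
        recipients = targets if targets else everyone
        for target in recipients:
            new_rank[target] += rank[voter] / len(recipients)
    return new_rank


rank = {person: 1.0 for person in ballots}  # step 0
for step in range(1, 21):
    rank = step_with_sink_fix(ballots, rank)

# The total stays at 4, the number of people, instead of shrinking each step.
print("total after 20 steps:", round(sum(rank.values()), 6))
```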


Problems with the primitive version: people who receive votes but cast none except for themselves, and groups who receive votes but cast none except for their own members, are also sinks for merit-rank.

Solution: distribute a fraction of their vote uniformly.

Problems with the primitive version: people who receive no votes are left with no voting power after the first step.


Solution: distribute a fraction of every vote uniformly (“taxing votes for the common good”):
- fraction X of each vote is distributed uniformly over the community
- fraction 1−X is distributed at the voter’s discretion (uniformly if unspecified)

Standard choice: X = 0.15
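A minimal sketch of one taxed update with X = 0.15. It assumes the discretionary part is split equally among the named targets, and spread uniformly if the voter names nobody. The ballots are hypothetical. Charlie receives no votes, but instead of dropping to 0 he keeps a voting power of 0.15 at every step.

```python
X = 0.15  # fraction of every vote that is "taxed" and spread uniformly

# Hypothetical ballots: nobody votes for Charlie.
ballots = {
    "Alice": ["Bob"],
    "Bob": ["Alice"],
    "Charlie": ["Alice", "Bob"],
}


def merit_rank_step(ballots, rank, x=X):
    """One merit-rank step with a fraction x of every vote distributed uniformly."""
    people = list(ballots)
    n = len(people)
    # Taxed part: fraction x of everyone's merit-rank, shared equally by all.
    new_rank = {p: x * sum(rank.values()) / n for p in people}
    for voter, targets in ballots.items():
        # Discretionary part: fraction 1 - x, split equally among the voter's
        # targets (or uniformly over everyone if the voter names nobody).
        recipients = targets if targets else people
        for target in recipients:
            new_rank[target] += (1 - x) * rank[voter] / len(recipients)
    return new_rank


rank = {p: 1.0 for p in ballots}  # step 0
for step in range(1, 4):
    rank = merit_rank_step(ballots, rank)
    print(step, {p: round(r, 3) for p, r in rank.items()})
```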

The merit-rank algorithm

[Figure: merit-rank at step 0: everyone in the example network starts with merit-rank 1]

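Merit-rank at step k = 0.15 × (an equal share of the community’s total merit-rank) + 0.85 × (votes received, each weighted by the voter’s merit-rank at step k−1).

In symbols, with N people, $r_B^{(k)}$ for B’s merit-rank at step k and $w_{B\to A}$ for the fraction of B’s vote cast for A:

$$ r_A^{(k)} = 0.15 \cdot \frac{1}{N}\sum_{B} r_B^{(k-1)} + 0.85 \sum_{B} r_B^{(k-1)}\, w_{B \to A} $$

Every unit of voting power is passed on (unspecified votes are spread uniformly). So if everyone starts at merit-rank 1, the total stays N and the first term is simply 0.15. Applying the rule to the step-0 values gives the merit-rank at step 1; applying it again gives step 2, and so on.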

Iterating until the values stop changing yields the final merit-rank. For any X > 0, the algorithm always converges to a unique solution.

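Compare the final merit-rank with the weighted in-degree (the one-step vote tally).

[Figure: final merit-rank of the example network shown beside its weighted in-degree: 4.0, 3.83, 1.0, 0.5, 0.33, 0.33, and 0 for the remaining five]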
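The convergence claim and the comparison can be checked on a small hypothetical network. The names are invented, X = 0.15, and votes are split equally. "Gem" receives a single vote, from the well-supported "Hub". "Pop" receives two votes from people nobody votes for. Weighted in-degree ranks Pop above Gem, while merit-rank reverses the order. Starting from a very different assignment with the same total gives the same answer. The stopping rule (no value changes by more than `tol`) is an assumption of this sketch.

```python
def weighted_in_degree(ballots):
    """One-step tally with each voter's unit split equally among targets."""
    scores = {p: 0.0 for p in ballots}
    for voter, targets in ballots.items():
        for t in targets:
            scores[t] += 1.0 / len(targets)
    return scores


def merit_rank(ballots, x=0.15, start=None, tol=1e-12, max_steps=10_000):
    """Iterate the taxed update until no merit-rank changes by more than tol."""
    people = list(ballots)
    n = len(people)
    rank = dict(start) if start is not None else {p: 1.0 for p in people}
    for _ in range(max_steps):
        total = sum(rank.values())
        new_rank = {p: x * total / n for p in people}
        for voter, targets in ballots.items():
            recipients = targets if targets else people
            for t in recipients:
                new_rank[t] += (1 - x) * rank[voter] / len(recipients)
        if max(abs(new_rank[p] - rank[p]) for p in people) < tol:
            return new_rank
        rank = new_rank
    raise RuntimeError("merit-rank did not converge")


# Hypothetical ballots (9 people).
ballots = {
    "P1": ["Hub"], "P2": ["Hub"], "P3": ["Hub"], "P4": ["Hub"],
    "Hub": ["Gem"],
    "Gem": ["Hub"],
    "Q1": ["Pop"], "Q2": ["Pop"],
    "Pop": ["Hub"],
}

indeg = weighted_in_degree(ballots)
rank = merit_rank(ballots)

other_start = {p: 0.0 for p in ballots}
other_start["P1"] = 9.0  # same total, very different starting point
rank2 = merit_rank(ballots, start=other_start)

for p in sorted(ballots, key=rank.get, reverse=True):
    print(f"{p:>3}  in-degree {indeg[p]:.2f}  merit-rank {rank[p]:.3f}")
```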

Case study: Google’s search engine

Sergey Brin and Lawrence Page (1998), "The anatomy of a large-scale hypertextual Web search engine," at http://www-db.stanford.edu/~backrub/google.html

Webpages vote for one another by linking to one another

Every webpage has a unit of voting power which is divided equally among the webpages to which it links

The random surfer picture: at each step, the surfer

• with probability 0.85, follows a random link from the webpage she is currently on; or
• with probability 0.15, “teleports” to a completely random webpage.

The long-time probability of ending up on a given webpage converges to a fixed value: the pagerank of that webpage.
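The random surfer picture can be checked by simulation. This is a sketch on a hypothetical three-page web (pages A, B, C are invented). The fraction of time a simulated surfer spends on each page should approach the pagerank obtained by iterating the update with X = 0.15. Here pagerank is normalized to sum to 1 so that it reads as a probability. A page with no links is treated as teleporting with certainty, which is an assumption of the sketch.

```python
import random

# Hypothetical three-page web: page -> pages it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
X = 0.15  # teleport probability


def random_surfer(links, steps=200_000, x=X, seed=0):
    """Fraction of time a simulated random surfer spends on each page."""
    rng = random.Random(seed)
    pages = list(links)
    visits = {p: 0 for p in pages}
    page = rng.choice(pages)
    for _ in range(steps):
        if rng.random() < x or not links[page]:
            page = rng.choice(pages)        # teleport to a completely random page
        else:
            page = rng.choice(links[page])  # follow a random link on this page
        visits[page] += 1
    return {p: v / steps for p, v in visits.items()}


def pagerank(links, x=X, iters=200):
    """Pagerank by iteration, normalized so the values sum to 1."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: x / n for p in pages}  # teleport share of the total
        for p, outs in links.items():
            targets = outs if outs else pages
            for q in targets:
                new[q] += (1 - x) * pr[p] / len(targets)
        pr = new
    return pr


freq = random_surfer(links)
pr = pagerank(links)
for p in links:
    print(f"{p}: simulated {freq[p]:.3f}, pagerank {pr[p]:.3f}")
```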

Note: the average number of steps before a teleport is 1/0.15 ≈ 6.7. Could X = 0.15 be effective because of six degrees of separation?
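The 1/0.15 figure follows from treating each step as an independent chance X = 0.15 of teleporting. The number of steps up to and including the first teleport is then geometrically distributed:

$$ \mathbb{E}[\text{steps}] = \sum_{n=1}^{\infty} n\,(1-X)^{n-1}X = \frac{X}{\big(1-(1-X)\big)^{2}} = \frac{1}{X} = \frac{1}{0.15} \approx 6.7 $$

Of these steps, on average (1−X)/X ≈ 5.7 are link-follows.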

Google’s dominance over other search engines is perhaps the strongest recommendation of pagerank

Case study: Citation networks

Applying pagerank to a citation network

P. Chen, H. Xie, S. Maslov, S. Redner, “Finding Scientific Gems with Google,” J. Informetrics 1, 8–15 (2007)

353,268 nodes = all publications in the Physical Review family of journals from 1893–2003

3,110,839 links = all citations to Physical Review articles from other Physical Review articles

Because citations always point backward in time, a value X > 0 is required to prevent all votes from sinking to the oldest papers.

Chosen value: X=0.5.
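A toy illustration of why X > 0 matters. It assumes a hypothetical citation network that is a single chain of 10 papers, each citing only its predecessor. Rank flows toward the oldest paper. In this chain, the oldest paper's pagerank relative to the newest one's works out to (1 − (1 − X)^10)/X. That ratio stays below 1/X, but it would grow to the chain length as X → 0. The sketch normalizes ranks to sum to 1 and spreads the share of papers that cite nothing uniformly. Both choices are assumptions, not necessarily Chen et al.'s exact procedure.

```python
def pagerank(links, x, iters=2000):
    """Damped pagerank normalized to total 1; a paper that cites nothing
    spreads all of its rank uniformly over every paper."""
    nodes = list(links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: x / n for v in nodes}
        for v, cited in links.items():
            targets = cited if cited else nodes
            for t in targets:
                new[t] += (1 - x) * pr[v] / len(targets)
        pr = new
    return pr


# Hypothetical chain: paper i cites only paper i - 1; paper 0 is the oldest.
n_papers = 10
links = {i: ([i - 1] if i > 0 else []) for i in range(n_papers)}

ratios = {}
for x in (0.15, 0.5):
    pr = pagerank(links, x)
    ratios[x] = pr[0] / pr[n_papers - 1]  # oldest relative to newest
    print(f"X = {x}: oldest/newest = {ratios[x]:.2f}")
```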

The distribution of # of citations is reasonably approximated by a power law

Strong correlation between # of citations and pagerank

However, outliers constitute exceptional papers

Slater’s paper introduced the determinant form for the many-body wavefunction. This form is so ubiquitous in current literature that very few articles actually cite the original work when the Slater determinant is used. The Google PageRank algorithm identifies this hidden gem primarily because the average Google contribution of the children of S is $\langle G_j/k_j \rangle = 3.51 \times 10^{-6}$, which is a factor 2.3 larger than the contribution of the children of C. That is, the children of Slater’s paper were both influential and Slater loomed as a very important father figure to his children.
In RM, a model that is essentially diffusion-limited aggregation is introduced. Although these authors had stumbled upon a now-famous model, they focused on the kinetics of the system and apparently did not appreciate its wonderful geometrical features. This discovery was left to one of the children of RM: the famous paper by T. Witten and L. Sander, “Diffusion-Limited Aggregation, a Kinetic Critical Phenomenon,” Phys. Rev. Lett. 47, 1400 (1981), with 680 citations as of June 2003. Furthermore, the Witten and Sander article has only 10 references; thus a substantial fraction of its fame is exported to RM by the Google PageRank algorithm. The appearance of RM on the list of top-100 Google-ranked papers occurs precisely because of the mechanics of the Google PageRank algorithm in which being one of the few references of a famous paper makes a huge contribution to the Google number.
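The mechanism described for RM, being one of the few references of a famous paper, can be reproduced in a hypothetical toy network. All node names are invented. 20 papers cite `famous`, which in turn cites only `gem`. Meanwhile `popular` collects 3 citations from papers nobody cites. With X = 0.15, `gem` ends up with about 16 times the rank of an uncited paper, well above `popular`, despite having a third as many citations.

```python
def pagerank(links, x=0.15, iters=500):
    """Damped pagerank normalized to total 1; a paper that cites nothing
    spreads all of its rank uniformly over every paper."""
    nodes = list(links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: x / n for v in nodes}
        for v, cited in links.items():
            targets = cited if cited else nodes
            for t in targets:
                new[t] += (1 - x) * pr[v] / len(targets)
        pr = new
    return pr


# Hypothetical citation network: paper -> papers it cites.
links = {f"p{i}": ["famous"] for i in range(20)}
links.update({f"q{i}": ["popular"] for i in range(3)})
links["famous"] = ["gem"]  # a famous paper with a single reference
links["gem"] = []
links["popular"] = []

citations = {v: 0 for v in links}
for cited in links.values():
    for c in cited:
        citations[c] += 1

pr = pagerank(links)
for v in ("famous", "gem", "popular"):
    # Rank shown relative to an uncited paper (p0).
    print(v, "citations:", citations[v], "relative rank:", round(pr[v] / pr["p0"], 2))
```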

Benefits of merit-rank

• Identifies a set of individuals that are more exceptional than the set that majority vote would identify

• Completely democratic yet gives more weight to the opinions of the best qualified

• Plays to our strengths by permitting us to assess only those we know well

Criticisms and possible failure modes

Unrecognized merit

Not necessarily a problem: the algorithm actually ranks people by their degree of vetted merit.

Requirements for high merit-rank: having merit, plus the opportunity for other meritorious individuals to recognize it. Malcolm Gladwell’s “connectors” will probably fare well.

Disenfranchisement

Proportional representation. See: Xie, Yan and Maslov, “Optimal ranking in networks with community structure”, arXiv:physics/0510107

Merit-ranking of groups: if the algorithm works for individuals, it should work for groups

Voting based on ideology rather than on merit

It is not necessarily a failure of the algorithm if an individual chooses to judge merit primarily in terms of ideology

The network may partition into ideologically homogeneous groups

Still, we have proportional representation for different ideologies

The celebrity failure mode

A large imbalance in degree of recognition can trump considerations of merit

Note: Merit-rank fares better than majority vote (consider the difficulty of gaming Google)

Possible fix: A weighting factor in proportion to the depth of a relationship

Are the relevant kinds of merit really auto-indicating?

- The trustworthy can be naive
- Experts can fall prey to groupthink

Response: the unreliability of assessments of merit increases in proportion to the superficiality of the relationship

Possible fix: A weighting factor in proportion to the depth of a relationship
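One hypothetical way to implement such a weighting is to split each voter's unit of voting power among the people they assess in proportion to a "depth" score for each relationship. These fractions would then replace the equal split of each voter's vote in the merit-rank update. How depth would actually be measured is left open. The scores below are invented.

```python
def vote_fractions(depth):
    """Hypothetical rule: split one unit of voting power among the people a
    voter assesses, in proportion to the depth of each relationship."""
    total = sum(depth.values())
    if total <= 0:
        raise ValueError("at least one relationship must have positive depth")
    return {person: d / total for person, d in depth.items()}


# Hypothetical depth scores for a single voter.
fractions = vote_fractions({"Bob": 6.0, "Charlie": 2.0, "Dana": 0.0})
print(fractions)  # {'Bob': 0.75, 'Charlie': 0.25, 'Dana': 0.0}
```

A superficial acquaintance (depth 0) then contributes nothing, however the voter might otherwise have rated them.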

Few will understand the algorithm

How can a community that doesn’t understand an algorithm ever come to endorse it?

Answer: Trust based on past performance


No secret ballot

The algorithm needs to know how everyone voted

But how can one trust the institution that calculates the outcome without making public all of the information and thereby opening the door to bribery and coercion?

Possible Fix: A cryptographic scheme

Beyond pagerank

The HITS algorithm – Hubs and Authorities

Jon Kleinberg, “Authoritative sources in a hyperlinked environment,” Journal of the ACM 46 (5): 604–632 (1999).

Recall: Pagerank as a slogan: Important webpages are those that are linked to by other important webpages

HITS algorithm as a slogan: Hubs are webpages that link to authorities; authorities are webpages that are linked to by hubs


The HITS algorithm returns two numbers for a webpage:

• Authority value = the value of the content of the page
• Hub value = the value of its links to other pages

Start with authority value = in-degree and hub value = out-degree. Weight in-degree by hub values; weight out-degree by authority values. Iterate.
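The iteration just described can be sketched as follows. This is a minimal version on a hypothetical five-page graph. Each round computes the new authority values from the previous hub values and the new hub values from the previous authority values. Both are then scaled to unit length so they stay bounded. This is one reasonable reading of the slide; other orderings of the updates converge to the same scores.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (leave an all-zero vector alone)."""
    norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
    return {v: s / norm for v, s in scores.items()}


def hits(links, iters=100):
    """Return (hub, authority) scores for a graph {page: [pages it links to]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    out_links = {v: links.get(v, []) for v in nodes}
    # Starting values: authority = in-degree, hub = out-degree.
    auth = {v: 0.0 for v in nodes}
    for targets in out_links.values():
        for t in targets:
            auth[t] += 1.0
    hub = {v: float(len(out_links[v])) for v in nodes}
    for _ in range(iters):
        # Authority: in-links weighted by the hub value of the linking page.
        new_auth = {v: 0.0 for v in nodes}
        for src, targets in out_links.items():
            for t in targets:
                new_auth[t] += hub[src]
        # Hub: out-links weighted by the authority value of the linked page.
        new_hub = {v: sum(auth[t] for t in out_links[v]) for v in nodes}
        auth, hub = normalize(new_auth), normalize(new_hub)
    return hub, auth


# Hypothetical graph: h1..h3 link to a1..a3.
links = {"h1": ["a1", "a2"], "h2": ["a1"], "h3": ["a2", "a3"]}
hub, auth = hits(links)
print({v: round(s, 3) for v, s in sorted(auth.items()) if s > 0})
print({v: round(s, 3) for v, s in sorted(hub.items()) if s > 0})
```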

Suppose some nodes have only incoming links (pure authorities) and others only outgoing links (pure hubs).
• Let the pure hubs be the experts.
• Let the pure authorities be the beliefs of the experts.

Experts are the people who have the right beliefs. The right beliefs are the ones believed by the experts.

Such a scheme can overcome the problem of unrecognized merit
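A sketch of this idea on a hypothetical bipartite network in which experts link to the beliefs they hold. All names and beliefs are invented. Nobody links to (votes for) any expert, yet a HITS-style calculation still gives every expert a hub score. An expert whose beliefs are shared by the other experts scores highest. In this toy example, an expert holding only a belief nobody else holds ends up near zero; that is a property of the toy, not a claim from the talk. The function is a compact HITS iteration started from all-ones values.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (leave an all-zero vector alone)."""
    norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
    return {v: s / norm for v, s in scores.items()}


def hits(links, iters=100):
    """Compact HITS iteration: returns (hub, authority) for {node: [linked nodes]}."""
    nodes = set(links) | {t for ts in links.values() for t in ts}
    out_links = {v: links.get(v, []) for v in nodes}
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iters):
        new_auth = {v: 0.0 for v in nodes}
        for src, targets in out_links.items():
            for t in targets:
                new_auth[t] += hub[src]
        new_hub = {v: sum(auth[t] for t in out_links[v]) for v in nodes}
        auth, hub = normalize(new_auth), normalize(new_hub)
    return hub, auth


# Hypothetical experts (pure hubs) linking to the beliefs they hold (pure authorities).
holds = {
    "E1": ["b1", "b2"],
    "E2": ["b1", "b2"],
    "E3": ["b3"],
    "E4": ["b1"],
}
hub, auth = hits(holds)
for expert in sorted(holds):
    print(expert, "expertise (hub score):", round(hub[expert], 3))
```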

Using a HITS-like algorithm to rank expertise

Outlook

Consider more complicated schemes (ex: negative votes, combination of HITS and pagerank, etc.)

Numerical tests of the various failure modes

Find a good forum for a real-world trial:
- online gaming community?
- Facebook applications?