The Layered World of Scientific Conferences Michael Kuhn Roger Wattenhofer APWEB 2008 Shenyang, China Distributed Computing Group

Page 1

The Layered World of Scientific Conferences

Michael Kuhn, Roger Wattenhofer

APWEB 2008, Shenyang, China

Distributed Computing Group

Page 2

Michael Kuhn, ETH Zurich @ APWEB 2008 2

The Proximity of Scientific Conferences

• The web around APWeb
  – What does the proximity of conferences look like?

• Different aspects of proximity
  – Scope
  – Quality

• Why do we care about conference proximity?

APWEB

1. WAIM
2. WISE
3. GCC
4. DASFAA
5. SKG
6. ISPA
7. PDCAT
8. DEXA
9. ICDF
10. PAKDD

Page 3

Application: Conference Search

• Different search types
  – For related conferences
  – By keywords
  – By author

• Based on DBLP
  – Freely available

• Wiki-Approach for some attributes
  – Important dates
  – Location
  – Link to website

Try it at www.confsearch.org!


Page 9

„Social similarity“ and the Conference Graph

• A single author tends to submit to similar conferences
  – Conferences C1 and C2 are similar if many authors often submit to both of them
  – Data available from DBLP

• Problem: Conferences have unequal „size“
  – Just counting the number of authors overestimates the proximity of large venues
  – Normalization required:

$$T = \sum_i \min\left(\frac{p_{i1}}{s_1}, \frac{p_{i2}}{s_2}\right)$$

[Figure: authors A1, A2, A3 linked to conferences C1 and C2. Edge weights towards C1: p11/s1 = 3/25, 1/25, 5/25. Edge weights towards C2: 4/10, 2/10, 1/10. Per-author minima: A1 = 1/10, A2 = 1/25, A3 = 5/25, giving T = 17/50.]
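The normalized measure on this slide (for each shared author, take the smaller of the author's shares in the two conferences, then sum over authors) can be sketched in a few lines of Python. This is only an illustration: the function name, the reading of p as a per-author publication count and s as the conference size, and the assignment of the 2/10 and 4/10 edges to A2 and A3 are assumptions, since the transcript does not spell them out.

```python
from fractions import Fraction


def social_similarity(pubs_c1, pubs_c2, size_c1, size_c2):
    """Return T = sum_i min(p_i1 / s_1, p_i2 / s_2) over shared authors.

    pubs_c1 and pubs_c2 map an author to a publication count (assumed
    meaning of p); size_c1 and size_c2 are the conference sizes (assumed
    meaning of s).
    """
    total = Fraction(0)
    # Only authors present in both conferences contribute a non-zero minimum.
    for author in pubs_c1.keys() & pubs_c2.keys():
        share_c1 = Fraction(pubs_c1[author], size_c1)
        share_c2 = Fraction(pubs_c2[author], size_c2)
        total += min(share_c1, share_c2)
    return total


# Values taken from the slide's figure; which of A2/A3 carries 2/10 or 4/10
# is an assumption (both assignments give the same total).
c1 = {"A1": 3, "A2": 1, "A3": 5}
c2 = {"A1": 1, "A2": 2, "A3": 4}
print(social_similarity(c1, c2, size_c1=25, size_c2=10))  # 17/50
```

Exact fractions are used so that the result can be compared directly with the T = 17/50 shown on the slide.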


Page 11

Some Examples

AAAI (National Conference on Artificial Intelligence)

IJCAI 0.76
ATAL 0.37 (Agent Theories, Architectures, and Languages)
ICML 0.33
AGENTS 0.32
AIPS 0.31
ECAI 0.26 (European Conference on Artificial Intelligence)
KR 0.25
UAI 0.25
CP 0.23
FLAIRS 0.20

PODC (Principles of Distributed Computing)

DISC 1.00
OPODIS 0.49
SPAA 0.46 (Symposium on Parallel Algorithms & Architectures)
SIROCCO 0.36 (Structural Information & Communication Complexity)
ICDCS 0.32 (Int. Conference on Distributed Computing Systems)
SRDS 0.30
STOC 0.27
SODA 0.24
FOCS 0.22
DIAL-M 0.21

Proximity is not purely thematic!

Page 12

The Concept of Layers

• Layers correspond to different reasons (catalysts) for edges
  – Thematic scope and quality are such reasons
  – Similar to the concept of „social dimensions“ of Watts, Dodds, Newman (2002)

• Total graph is the sum of its layers:

$$T_{uv} = \sum_i x_i \, w^{(i)}_{uv}$$

Page 13

Thematic Layer

• Comparing publication titles makes it possible to estimate the thematic similarity of conferences
  – Score for each conference-keyword pair

• TF-IDF (Term-Frequency Inverse-Document-Frequency)
  – Similarity: cardinality of the intersection of the top-50 keywords

AAAI: 1. Learning, 2. Planning, 3. Robot, 4. Reasoning, 5. Knowledge, 6. Search, 7. Agent, 8. Constraint, 9. AI, 10. Reinforcement, ...

PODC: 1. Byzantine, 2. Consensus, 3. Quorum, 4. Wait, 5. Exclusion, 6. Detectors, 7. Distributed, 8. Networks, 9. Asynchronous, 10. Stabilizing, ...

ICDCS: 1. Distributed, 2. Networks, 3. Wireless, 4. Exclusion, 5. Multicast, 6. Consistency, 7. Mobile, 8. Hoc, 9. Protocol, 10. ad, ...

SPAA: 1. Parallel, 2. Scheduling, 3. Routing, 4. Oblivious, 5. Adversarial, 6. Networks, 7. Memory, 8. Load, 9. Stealing, 10. Algorithms, ...
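The thematic score (TF-IDF per conference-keyword pair, then the size of the intersection of the top-ranked keyword sets) could be sketched as follows. This is a minimal sketch under assumptions not stated on the slide: each conference is treated as one "document" made of all its titles, tokenization is a simple lowercase word split, and the TF-IDF variant is the plain tf * log(N / df) form. The function names and toy titles are hypothetical.

```python
import math
import re
from collections import Counter


def top_keywords(titles_by_conf, k=50):
    """Rank the title words of each conference by TF-IDF and keep the top k."""
    term_counts = {
        conf: Counter(
            word for title in titles for word in re.findall(r"[a-z]+", title.lower())
        )
        for conf, titles in titles_by_conf.items()
    }
    n_confs = len(term_counts)
    # Document frequency: in how many conferences does a word occur at all?
    doc_freq = Counter(word for counts in term_counts.values() for word in counts)
    top = {}
    for conf, counts in term_counts.items():
        total = sum(counts.values())
        scores = {
            word: (count / total) * math.log(n_confs / doc_freq[word])
            for word, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        top[conf] = set(ranked[:k])
    return top


def thematic_similarity(top, conf_a, conf_b):
    """Cardinality of the intersection of the two top-keyword sets."""
    return len(top[conf_a] & top[conf_b])


# Toy titles, invented purely for illustration.
toy_titles = {
    "A": ["Byzantine consensus", "Consensus with quorum"],
    "B": ["Consensus protocol", "Wireless networks"],
    "C": ["Learning and planning"],
    "D": ["Parallel scheduling"],
}
toy_top = top_keywords(toy_titles, k=5)
print(thematic_similarity(toy_top, "A", "B"))
```

With real data the slide's k = 50 would be used; the toy example uses k = 5 only because the invented titles contain so few words.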

Page 14

Layer Separation by Subtraction

• Assumption: 2 major layers: thematic layer (t) and quality layer (q)
  – Total weight T = x1t + x2q + x3r
  – Remainder r is neglected

• The qualitative similarity q can be determined from T and t!
  – Result is only a rough estimate due to considerable simplifications (independence of layers, neglecting r, etc.)

q ≈ T - αt

where q is the quality layer, T the social similarity (total weight), and t the thematic layer.
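As a rough illustration of the subtraction q ≈ T - αt: dropping r from T = x1t + x2q + x3r and solving for q gives x2q = T - x1t, so up to a constant scale factor the quality weight is the total weight minus a multiple of the thematic weight. The sketch below applies this per conference pair. The function name, the dictionary layout, and the thematic values in the example are hypothetical, and it assumes both layers were brought to comparable scales beforehand.

```python
def quality_layer(total, thematic, alpha):
    """Estimate q ≈ T - alpha * t for every conference pair in `total`.

    total and thematic map a pair (u, v) to a weight; pairs missing from
    `thematic` are treated as having thematic weight 0 (an assumption).
    """
    return {
        pair: weight - alpha * thematic.get(pair, 0.0)
        for pair, weight in total.items()
    }


# Total weights for AAAI are taken from the examples slide; the thematic
# weights are invented for illustration only.
total_weights = {("AAAI", "IJCAI"): 0.76, ("AAAI", "ICML"): 0.33}
thematic_weights = {("AAAI", "IJCAI"): 0.5, ("AAAI", "ICML"): 0.25}
print(quality_layer(total_weights, thematic_weights, alpha=0.4))
```

Setting alpha to 0 leaves the total graph unchanged, which matches the α = 0 case mentioned later in the slides.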

Page 15

Example: Thematic and Quality Layer for AAAI

Page 16

Proximity Based Conference Rating (1)

• In the quality layer a tier-1 conference is supposed to have many tier-1 conferences in its proximity (the same holds for tier-2 and tier-3)
  – Unknown ratings can be „interpolated“
  – Initial ratings taken from Libra (MSR Asia)
  – Existing approaches are mostly citation based (initiated by Garfield in 1972)

[Figure: a conference with unknown rating (?) is assigned the median of the ratings in its proximity]
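One way to read the interpolation idea is to take the median tier of the rated conferences closest to the unrated one in the quality layer. The sketch below follows that reading. How many neighbours are used, how ties are handled, and the function and variable names are all assumptions; the slides only indicate that a median is involved.

```python
from statistics import median_low


def interpolate_tier(conf, quality, known_tiers, k=5):
    """Estimate the tier of `conf` as the median tier of its k closest rated neighbours.

    quality maps a conference to {neighbour: quality-layer weight};
    known_tiers maps rated conferences to tiers 1, 2 or 3.
    Returns None if no rated neighbour exists.
    """
    rated = [other for other in quality.get(conf, {}) if other in known_tiers]
    # Highest quality-layer weight first, i.e. closest neighbours first.
    rated.sort(key=lambda other: quality[conf][other], reverse=True)
    nearest = rated[:k]
    if not nearest:
        return None
    # median_low keeps the result an actual tier value for even-sized inputs.
    return median_low([known_tiers[other] for other in nearest])


# Hypothetical weights and ratings, for illustration only.
example_quality = {"X": {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.1}}
example_tiers = {"A": 1, "B": 1, "C": 2, "D": 3}
print(interpolate_tier("X", example_quality, example_tiers, k=3))  # 1
```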

Page 17

Proximity Based Conference Rating (2)

• Initial ratings taken from Libra
  – Libra vs. „Internet List“: „Error“ rate 34.5%
  – Conference rating is difficult and partly subjective
  – Tier-1 vs. Tier-3: 4.5% error (α = 0)

[Plot: error (fraction, 0.3 to 0.7) versus Alpha (0 to 1), with curves for Tier-1, Tier-2, Tier-3 and Total]

1) Roughly detect tier (1,2 vs. 2,3)

2) Use specific Alpha for fine separation

Recall: q ≈ T - αt

Page 18

Proximity Based Conference Rating (3)

Rows: tier according to Libra. Columns: estimated tier.

Tier (Libra)   T1    T2    T3    % Correct
T1             54    28     3    64%
T2             38   112    48    57%
T3             19    92   172    61%

• Diagonal elements dominate
• Few „serious“ errors: 22 of 567 = 3.9%
• Total error drops from 50.5% to 40.3%

Error rates for comparison:
  – Random: 66.7%
  – Total graph: 50.5%
  – After „thematic correction“: 40.3%
  – Libra vs. „Internet List“: 34.5%
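The percentages on this slide can be recomputed from the table cells. The sketch below does so; the dictionary layout and function name are illustrative choices. Note that the cells as transcribed sum to 566 rather than the 567 quoted on the slide; the rounded percentages agree either way.

```python
# Rows: tier according to Libra; columns: estimated tier (values from the slide).
confusion = {
    "T1": {"T1": 54, "T2": 28, "T3": 3},
    "T2": {"T1": 38, "T2": 112, "T3": 48},
    "T3": {"T1": 19, "T2": 92, "T3": 172},
}


def summarize(matrix):
    """Return total count, per-tier accuracy, total error and serious-error count."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[tier][tier] for tier in matrix)
    per_tier = {tier: matrix[tier][tier] / sum(matrix[tier].values()) for tier in matrix}
    # "Serious" errors: tier-1 estimated as tier-3 or the other way round.
    serious = matrix["T1"]["T3"] + matrix["T3"]["T1"]
    return {
        "total": total,
        "per_tier": per_tier,
        "error": 1 - correct / total,
        "serious": serious,
    }


stats = summarize(confusion)
for tier, accuracy in stats["per_tier"].items():
    print(f"{tier}: {accuracy:.0%} correct")
print(f"total error: {stats['error']:.1%}")
print(f"serious errors: {stats['serious']} of {stats['total']}")
```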

Page 19

Conclusion and Future Work

• We have seen that
  – „Social similarity“ is a good measure to relate conferences
  – „Social similarity“ consists of a thematic and a quality layer
  – The thematic layer can be estimated using publication titles
  – The quality layer can be emphasized by subtracting the thematic component
  – These ideas can be used for conference rating and search

• www.confsearch.org

• It would be interesting to look at
  – A generic method for layer separation (that works on various graphs)
  – Combinations of the presented conference rating ideas with citation based approaches

Page 20

Thanks for Your Attention

• Questions?