Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses


Page 1: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Introduction Methodology Experiments Conclusions

Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Gabriel Perri Gimenes (1), Hugo Gualdron (1), Jose F Rodrigues Jr (1), Mario Gazziro (2)

(1) University of Sao Paulo, Av Trab Sao-carlense, 400, Sao Carlos, SP, Brazil - 13566-590, {ggimenes,gualdron,junio}@icmc.usp.br

(2) Fed. University of Santo Andre, Av dos Estados, 500, Santo Andre, SP, Brazil - 09210-580, [email protected]

This work has financial support from Fapesp (2013/10026-7)

http://www.icmc.usp.br/pessoas/junio/Site/index.htm

The 30th ACM/SIGAPP Symposium On Applied Computing, 2015 1/21

Page 2: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Summary

1 Introduction

2 Methodology

3 Experiments

4 Conclusions

Page 4: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Introduction

High demand for information about the behavior of scientists: authors, editors, funding agencies and society

Combining analytical techniques - multimodal approach

Page 5: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Problem

Finding non-evident facts about DBLP is a non-trivial task

Single-technique approaches - limited analytical potential

Systematic process - can be applied to similar data from other domains

Page 6: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Hypothesis

The use of multiple analytical techniques, through a well-defined process, is capable of revealing important aspects of the scientific community in computer science

Page 8: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Materials

Cardinality of the entities extracted from DBLP - XML

Entity         Number
Authors        1,060,221
Articles       1,801,576
Events         14,654
Publications   4,262

Page 9: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Data migration

Semi-structured format ⇒ Relational model

Specific software is needed for the migration

Definition of the entity-relationship model:
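As an illustration of the semi-structured to relational step, here is a minimal sketch that loads DBLP-style XML records into SQLite. The three-table schema (`author`, `article`, `authorship`), the function name `migrate`, and the sample records are assumptions made for this example; the slides do not show the authors' actual entity-relationship model or migration software.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the style of the DBLP XML dump (not real data).
SAMPLE_XML = """
<dblp>
  <article key="journals/x/A1">
    <author>Alice</author>
    <author>Bob</author>
    <title>First paper</title>
    <year>2001</year>
  </article>
  <inproceedings key="conf/y/B2">
    <author>Bob</author>
    <title>Second paper</title>
    <year>2003</year>
  </inproceedings>
</dblp>
"""

def migrate(xml_text, conn):
    """Load DBLP-style records into three tables (hypothetical schema)."""
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE article (id INTEGER PRIMARY KEY, dblp_key TEXT UNIQUE,
                              title TEXT, year INTEGER);
        CREATE TABLE authorship (author_id INTEGER REFERENCES author(id),
                                 article_id INTEGER REFERENCES article(id),
                                 PRIMARY KEY (author_id, article_id));
    """)
    root = ET.fromstring(xml_text.strip())
    for record in root:
        cur = conn.execute(
            "INSERT INTO article (dblp_key, title, year) VALUES (?, ?, ?)",
            (record.get("key"), record.findtext("title"),
             int(record.findtext("year"))),
        )
        article_id = cur.lastrowid
        for author in record.findall("author"):
            # Reuse the author row if this name was already seen
            conn.execute("INSERT OR IGNORE INTO author (name) VALUES (?)",
                         (author.text,))
            author_id = conn.execute("SELECT id FROM author WHERE name = ?",
                                     (author.text,)).fetchone()[0]
            conn.execute("INSERT INTO authorship VALUES (?, ?)",
                         (author_id, article_id))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(SAMPLE_XML, conn)
```

The full DBLP dump is far larger than this sample, so a real migration would probably stream the file (for instance with `ET.iterparse`) instead of parsing it in one call.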

Page 10: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Extracted relationships

Relationship    Description
Co-authorship   Authors that published an article together.
Co-edition      Authors that appear as editors in the same event or journal.
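One way to derive the co-authorship relationship from per-article author lists is sketched below. The function name `coauthorship_edges` and the sample names are made up for illustration; weighting each edge by the number of shared articles is an assumption, not something the slides specify.

```python
from collections import Counter
from itertools import combinations

def coauthorship_edges(author_lists):
    """Count, for every unordered author pair, how many articles they share."""
    weights = Counter()
    for authors in author_lists:
        # sorted(set(...)) removes duplicates and fixes the pair orientation
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights

# Illustrative author lists, one per article (made-up names).
articles = [["Alice", "Bob"], ["Alice", "Bob", "Carol"], ["Dave"]]
edges = coauthorship_edges(articles)
```

The same function would yield co-edition edges if it were fed one list of editors per event or journal.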

Page 12: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Multimodal Analysis - WCC

Weakly-connected components distribution - Co-authorship

13% small components with up to 30 nodes

Giant component with 87% of the authors

44,000 sub-networks of co-authorship - occasional researchers, industry white papers
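The component-size distribution can be computed with a union-find structure, as in this minimal sketch. The toy graph and the name `component_sizes` are assumptions for illustration; the slides do not say which tool or algorithm the authors used.

```python
from collections import Counter

def component_sizes(nodes, edges):
    """Return the sizes of the weakly-connected components, largest first."""
    parent = {v: v for v in nodes}

    def find(v):
        # Path halving keeps the trees shallow
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        # Edge direction is ignored, which is what "weakly" connected means
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    return sorted(Counter(find(v) for v in nodes).values(), reverse=True)

# Toy graph: one component of four nodes and one of two.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
sizes = component_sizes(nodes, edges)
giant_share = sizes[0] / sum(sizes)  # share of nodes in the largest component
```

On the real co-authorship graph, `giant_share` would correspond to the 87% figure reported on this slide.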

Page 13: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Multimodal Analysis - ACC

Node degree × average clustering coefficient - Co-authorship

High coefficient values are found in nodes with degree < 10

Coefficient value decreases as the node degree increases - $\mathrm{ACC} \propto \mathrm{degree}^{-1.06}$

Authors tend to collaborate with the co-authors of their co-authors - triangles

Young authors vs. older authors
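The degree versus average clustering coefficient curve can be sketched as follows, using the standard local clustering coefficient (links among a node's neighbors divided by the number of neighbor pairs). The toy edge list and the name `clustering_by_degree` are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

def clustering_by_degree(edges):
    """Map each degree to the average local clustering coefficient of its nodes."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    per_degree = defaultdict(list)
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeff = 0.0  # no neighbor pair exists, so no triangle is possible
        else:
            links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
            coeff = links / (k * (k - 1) / 2)
        per_degree[k].append(coeff)
    return {k: sum(c) / len(c) for k, c in per_degree.items()}

# Toy graph: a triangle a-b-c with one extra collaborator d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
acc = clustering_by_degree(edges)
```

In this toy graph the degree-2 nodes sit in a closed triangle, while the degree-3 node has only one of its three neighbor pairs connected, which mirrors the decreasing trend described above.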

Page 14: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Multimodal Analysis - Densification

Degree distribution - Co-authorship

As new authors appear new edges also appear - $e(t) \propto n(t)^{1.47}$ - densification

Edges appear exponentially vs. publication of elaborated articles

Master and Ph.D. as regular courses

Funding agencies - numbers

More authors per paper
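A densification exponent such as the one above is commonly estimated as the slope of a least-squares line in log-log space. The sketch below shows that approach on synthetic snapshots; the data are generated from a known power law, not taken from DBLP, and the slides do not confirm that this is the fitting procedure the authors used.

```python
import math

def fit_power_law_exponent(n_values, e_values):
    """Least-squares slope of log(e) against log(n)."""
    xs = [math.log(n) for n in n_values]
    ys = [math.log(e) for e in e_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic snapshots generated from e = 2 * n^1.5, NOT DBLP measurements.
nodes_over_time = [100, 200, 400, 800, 1600]
edges_over_time = [2 * n ** 1.5 for n in nodes_over_time]
exponent = fit_power_law_exponent(nodes_over_time, edges_over_time)
```

An exponent above 1 means edges grow faster than nodes, which is the densification behavior the slide refers to.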

Page 15: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Multimodal Analysis - Diameter

Effective diameter evolution - Co-edition

Peaked near 1995 - beginning of a shrinking period

Before that - new editors/publication vehicles vs. after that - same editors/same vehicles

Densification period: more new edges than new nodes - editorial committees rotate among the same members

Editor: experience and expertise - limitations for new researchers
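The slides do not define the effective diameter they measured. A common definition, assumed here, is the smallest number of hops within which 90% of all connected node pairs can reach each other. The sketch below computes it exactly with one breadth-first search per node; the function name and the toy path graph are illustrative.

```python
from collections import Counter, defaultdict, deque

def effective_diameter(edges, percent=90):
    """Smallest hop count d such that `percent`% of connected pairs are within d hops."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    hops = Counter()  # hops[d] = number of ordered pairs at distance d
    for source in list(adj):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    hops[dist[v]] += 1
                    queue.append(v)

    total = sum(hops.values())
    covered = 0
    for d in sorted(hops):
        covered += hops[d]
        if covered * 100 >= percent * total:
            return d
    return 0  # graph without edges

# Toy path graph 0-1-2-3-4: full diameter is 4 hops.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
d_eff = effective_diameter(path)
```

Running one search per node is quadratic in the worst case, so on a graph the size of DBLP a sampled or approximate variant would likely be needed. Tracking the value over yearly snapshots of the co-edition graph gives an evolution curve like the one discussed on this slide.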

Page 16: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Multimodal Analysis - Predictability

Predictability analysis - Co-authorship

Can we predict new interactions in the DBLP network?

Extraction of topological features → supervised learning

Figure: Results - Intervals G[1995, 2005], G[2006, 2007]
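A possible shape for the feature-extraction step is sketched below: features are computed on the earlier interval and labels come from the later one. The three features (common neighbors, Jaccard coefficient, preferential attachment) are widely used in link prediction but are an assumption here, since the slides do not list the features or the classifier the authors used. All names and the toy edges are illustrative.

```python
from collections import defaultdict
from itertools import combinations

def build_adjacency(edges):
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def pair_features(adj, u, v):
    """Topological features of a node pair, computed on the training graph only."""
    common = adj[u] & adj[v]
    union = adj[u] | adj[v]
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "preferential_attachment": len(adj[u]) * len(adj[v]),
    }

def labeled_examples(train_edges, test_edges):
    """One (pair, features, label) row per unlinked pair of the training graph."""
    adj = build_adjacency(train_edges)
    future = {frozenset(e) for e in test_edges}
    rows = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        label = 1 if frozenset((u, v)) in future else 0
        rows.append(((u, v), pair_features(adj, u, v), label))
    return rows

# Toy stand-ins for the two intervals (training graph, then later graph).
train = [("a", "b"), ("b", "c"), ("c", "d")]
test = [("a", "c")]
rows = labeled_examples(train, test)
```

The resulting rows could be passed to any supervised classifier. Enumerating all unlinked pairs is only feasible on small graphs, so a real run would need to sample candidate pairs.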

Page 17: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Multimodal Analysis - Counting and algebraic analysis

Counting - Bipartite author-article network with timestamps

Accomplishment: number of years with at least one publication

Silence: number of consecutive years with no publications
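Both counts can be derived from the list of publication years of an author, as in this minimal sketch. The slide leaves "silence" open to interpretation; the code assumes it means the longest run of consecutive years without publications between two active years. The function names and the sample years are illustrative.

```python
def accomplishment(years):
    """Number of distinct years with at least one publication."""
    return len(set(years))

def silence(years):
    """Longest run of consecutive years without publications between two active years."""
    active = sorted(set(years))
    gaps = [later - earlier - 1 for earlier, later in zip(active, active[1:])]
    return max(gaps, default=0)

# Publication years of one hypothetical author (one entry per article).
years = [2000, 2000, 2001, 2005, 2006]
acc_years = accomplishment(years)   # 2000, 2001, 2005, 2006
silent_years = silence(years)       # gap covering 2002, 2003, 2004
```

Under a different reading (for example, years elapsed since the last publication), only the `silence` function would need to change.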

Page 18: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Multimodal Analysis - Counting and algebraic analysis

Proposed metric

$\mathit{Importance} = \dfrac{1}{\sqrt{\mathit{Silence} + 1}} \cdot \log(\mathit{Accomplishment})$
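A small worked example of the metric follows, reading it as log(Accomplishment) divided by the square root of (Silence + 1). Two points are assumptions: that the square root covers the whole term Silence + 1, and that the logarithm is natural, since the slide does not state the base.

```python
import math

def importance(silence, accomplishment):
    """Importance = log(accomplishment) / sqrt(silence + 1), natural log assumed."""
    return math.log(accomplishment) / math.sqrt(silence + 1)

# Two hypothetical authors with the same number of active years.
steady = importance(silence=0, accomplishment=10)  # no silent period
gappy = importance(silence=3, accomplishment=10)   # three silent years in a row
```

With equal accomplishment, three silent years halve the score because sqrt(3 + 1) = 2, and an author with a single active year scores zero because log(1) = 0.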

Page 20: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Conclusions

Well-defined analytical process - combination of multiple techniques

Non-trivial extraction of information from DBLP

Multi-perspective interpretations about the past and future of the academic community in computer science

Application in the decision making process of funding agencies and academic personnel

Page 21: Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses

Thanks!

Questions?
