SEMEF: A Taxonomy-Based Discovery of Experts, Expertise and Collaboration Networks
Delroy Cameron
Master's Thesis
Computer Science, University of Georgia
11/27/2007
Advisor: I. Budak Arpinar
Committee: Prashant Doshi, Robert J. Woods
OUTLINE
Background
Expertise Profiles
Ranking Experts
Collaboration Networks Expansion
Results and Evaluation
Conclusion
Demo
BACKGROUND
Semantic Web
  What?
    Extension of the Current Web
    Attach Meaning to Data
  Why?
    Underutilization of the Current Web
    HTML Limitations
  Goal
    Enhance Information Exchange
    Automatic Information Discovery
    Interoperability of Services
BACKGROUND
Semantic Web Technologies
  XML
  RDF/RDFS/OWL
  URI
  Ontology
“David Billington is a Professor of Mathematics”
XML (three possible encodings of the same statement):
<course name="Mathematics">
  <lecturer>David Billington</lecturer>
</course>
<lecturer name="David Billington">
  <teaches>Mathematics</teaches>
</lecturer>
<teachingOffering>
  <lecturer>David Billington</lecturer>
  <course>Mathematics</course>
</teachingOffering>
RDF:
<rdf:Description rdf:ID="Professor_2">
  <mynamespace:has_name>David Billington</mynamespace:has_name>
  <mynamespace:teaches rdf:resource="#Mathematics"/>
</rdf:Description>
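The point of this comparison is that XML fixes syntax but not meaning: each of the three encodings needs its own extraction rule before a program can recover the one fact they share, whereas the RDF description states that fact directly as a subject-predicate-object statement. The sketch below makes this concrete with the standard library's `xml.etree.ElementTree`; the `to_triple` function and the `teaches` predicate name are illustrative assumptions, not part of SEMEF.

```python
import xml.etree.ElementTree as ET

# The three XML encodings above, written with straight quotes so they parse.
VARIANTS = [
    '<course name="Mathematics"><lecturer>David Billington</lecturer></course>',
    '<lecturer name="David Billington"><teaches>Mathematics</teaches></lecturer>',
    '<teachingOffering><lecturer>David Billington</lecturer>'
    '<course>Mathematics</course></teachingOffering>',
]

def to_triple(xml_text):
    """Map one XML shape to a (subject, predicate, object) triple.

    Each branch is a hand-written, shape-specific rule: the XML itself does
    not say which element is the subject, so the consumer must know the schema.
    """
    root = ET.fromstring(xml_text)
    if root.tag == "course":
        return (root.findtext("lecturer"), "teaches", root.get("name"))
    if root.tag == "lecturer":
        return (root.get("name"), "teaches", root.findtext("teaches"))
    if root.tag == "teachingOffering":
        return (root.findtext("lecturer"), "teaches", root.findtext("course"))
    raise ValueError(f"unknown shape: {root.tag}")

triples = {to_triple(v) for v in VARIANTS}
print(triples)  # {('David Billington', 'teaches', 'Mathematics')}
```

All three documents collapse to a single triple, but only because three separate rules were written by hand.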
BACKGROUND
Semantic Web
  Common Challenges
    Entity Disambiguation
    Ontology Mapping/Alignment
    Trust/Provenance
    Semantic Association Discovery
  Applications
    Social Networks
    Bioinformatics
    National Security
    GPS Data Mining
BACKGROUND
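Social Networks
  What?
    Actors Connected through Social Relationships
  Characteristics
    Clustering Coefficient (how strongly a node's neighbors are connected to each other)
    Centrality (e.g., closeness, based on the average shortest path length to all other nodes)
    Geodesic (shortest path between two nodes)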
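The three characteristics can be computed on a small undirected graph as below. This is a minimal sketch: closeness centrality (the inverse of the average geodesic length to reachable nodes) stands in for "centrality", and the graph and function names are hypothetical.

```python
from collections import deque

def bfs_distances(graph, source):
    """Geodesic (unweighted shortest-path) lengths from source, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def clustering_coefficient(graph, node):
    """Fraction of pairs of the node's neighbours that are linked to each other."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in graph[nbrs[i]])
    return links / (k * (k - 1) / 2)

def closeness_centrality(graph, node):
    """Inverse of the average geodesic length to every reachable node."""
    dist = bfs_distances(graph, node)
    others = [d for n, d in dist.items() if n != node]
    return len(others) / sum(others) if others else 0.0

# Hypothetical undirected graph: A-B, A-C, B-C, C-D.
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(clustering_coefficient(graph, "C"))  # only A-B of C's three neighbour pairs is linked
print(bfs_distances(graph, "D")["A"])      # geodesic length from D to A
print(closeness_centrality(graph, "D"))
```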
BACKGROUND
Peer-Review Process
  What?
    Review Scholarly Manuscripts
  Challenges
    Slow
    Conflict of Interest
    Finding Suitable Reviewers
      Arbitrary Knowledge Approach
      Research Diversification
      Emerging Fields
CONTRIBUTIONS
Applicability of Semantics
  Finding Expertise
    Fine Levels of Granularity
  Finding Experts
    Taxonomy
  Collaboration Networks
    Discovery of Unknown Experts
SEMEF
SEMantic Expert Finder
  Finding Expertise (Expertise Profiles)
    Collecting Expertise
    Quantifying Expertise
  Finding (Ranking) Experts
    With and Without Taxonomy
  Collaboration Networks
    Geodesic
    C-Nets
EXPERTISE PROFILES
Collecting Expertise
  Collect All Publications
  Map Papers to Topics
  Quantify All Papers
Publications Dataset
  DBLP: 473,296 papers (conference/session names, Nov. 2007)
  ACM, IEEE, ScienceDirect: 29,454 papers (abstracts/index terms)
  Combined: 476,299 papers
EXPERTISE PROFILES
Collecting Expertise: Papers-to-Topics Dataset
  Papers (Combined): 476,299
  Topics: 320
  Paper-Topic Relationships: 676,569
  Expertise Profiles: 560,792
EXPERTISE PROFILES
Quantifying Expertise
  Mapping Each Paper to a Distinct Value
Publication Impact
  Hector Garcia-Molina (248 papers, 2003)
  E. F. Codd (49 papers, 2003)
  CiteSeer Impact Statistics (1,221 venues)
  DBLP URIs
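The slide names CiteSeer venue impact statistics and DBLP URIs as the ingredients of publication impact. One plausible way to combine them, assumed here rather than taken from SEMEF, is to read the venue from a paper's DBLP-style key and look up that venue's score, returning None when no score exists so that a default weight can be applied later. The venue names and scores below are invented placeholders, not CiteSeer figures.

```python
# Placeholder venue scores keyed by the venue segment of a DBLP-style key.
# These numbers are invented for illustration; they are NOT CiteSeer statistics.
VENUE_IMPACT = {"venueA": 1.54, "venueB": 1.10}

def venue_of(dblp_key):
    """Return the venue segment of a key such as 'conf/venueA/Smith05' (assumed layout)."""
    parts = dblp_key.split("/")
    return parts[1] if len(parts) >= 3 else None

def publication_impact(dblp_key):
    """Impact of a paper taken as its venue's score; None when the venue is unknown."""
    return VENUE_IMPACT.get(venue_of(dblp_key))

print(publication_impact("conf/venueA/Smith05"))  # 1.54
print(publication_impact("conf/other/Doe07"))     # None
```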
EXPERTISE PROFILES
Figure 1: Expertise Profile
[author_A with topics topic1 (4.50), topic2 (1.86) and topic3 (3.08); the topics link to papers paper1 to paper6, labeled with impact values 1.54, 1.10 and 1.86]
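The values in Figure 1 are consistent with each topic's weight being the sum of the impact values of the author's papers on that topic, which is how the expertise profile algorithm at the end of this deck accumulates weights. Writing $P_a(t)$ for the set of author $a$'s papers mapped to topic $t$, and noting that which paper belongs to which topic is inferred from the sums rather than stated on the slide:

```latex
w_a(t) = \sum_{p \in P_a(t)} \mathrm{impact}(p)

w_a(\text{topic1}) = 1.54 + 1.10 + 1.86 = 4.50, \quad
w_a(\text{topic2}) = 1.86, \quad
w_a(\text{topic3}) = 1.54 + 1.54 = 3.08
```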
RANKING EXPERTS
Taxonomy of Topics
  Sources: Session Names, Conference Names, O'CoMMA, Paper Abstracts, Index Terms
Figure 2: Taxonomy of Topics
[counts shown in the figure: 192, 128, 320, 216, 60, 50]
RANKING EXPERTS
Case 1: Single Topic without Taxonomy
  Traverse all Expertise Profiles; sum the impact of papers in the topic
Case 2: Single Topic with Taxonomy
  Traverse all Expertise Profiles; sum the impact of papers in the topic and its subtopics
Prevent Expertise Overestimation
  Map Papers to Leaf Nodes Only (so a paper's impact is not counted for both a topic and its parent)
RANKING EXPERTS
Case 3: Array of Topics without Taxonomy
  Same as Case 2, with the input topics in place of the topic and its subtopics
Case 4: Array of Topics with Taxonomy
  Filter input topics; sum the impact of papers in the topics and their subtopics
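The four cases differ mainly in which topics' papers get summed. The sketch below computes that topic list, assuming a tree-shaped taxonomy stored as a parent-to-children dictionary. The `TAXONOMY` contents are hypothetical, and the filtering rule (drop an input topic that already lies under another input topic, so its subtopics are not listed twice) is an assumed reading of "filter input topics", not SEMEF's documented rule.

```python
# Hypothetical topic tree (parent -> children); SEMEF's real taxonomy has 320 topics.
TAXONOMY = {
    "Web": ["Web Search", "Semantic Web"],
    "Web Search": ["Ranking", "Crawling"],
}

def descendants(topic):
    """All topics below `topic`, depth-first (assumes the taxonomy is a tree)."""
    out = []
    for child in TAXONOMY.get(topic, []):
        out.append(child)
        out.extend(descendants(child))
    return out

def filter_topics(topics):
    """Assumed filter: drop an input topic that already lies under another input topic."""
    return [t for t in topics
            if not any(t in descendants(other) for other in topics if other != t)]

def topics_to_sum(topics, use_taxonomy):
    """Topics whose papers' impacts are summed (Cases 1 to 4)."""
    if not use_taxonomy:
        return list(dict.fromkeys(topics))   # Cases 1 and 3: input topics, duplicates removed
    result = []
    for t in filter_topics(topics):          # Cases 2 and 4: filtered topics plus subtopics
        result.append(t)
        result.extend(descendants(t))
    return result

print(topics_to_sum(["Web Search", "Ranking"], use_taxonomy=False))
print(topics_to_sum(["Web Search", "Ranking"], use_taxonomy=True))
```

With taxonomy, "Ranking" is filtered out as an input topic and reappears exactly once as a subtopic of "Web Search". Because papers are mapped only to leaf topics, a parent topic in the list contributes no papers of its own.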
COLLABORATION NETWORKS EXPANSION
Geodesic
Figure 3: Geodesic Relationships
[Four panels, labeled STRONG, MEDIUM, WEAK and UNKNOWN, each linking author_A to author_B through opus:author and opus:isIncludedIn edges, via papers (opus:Article_in_Proceedings_179, _35, _8, _291, _3), a proceedings volume (opus:Proceedings_543) or intermediate authors (author_1, author_2)]
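A geodesic between two authors can be found with a breadth-first search over the RDF statements, treating each triple as an undirected edge. The sketch below uses hypothetical triples in the style of Figure 3 (author_C is invented for the example) and returns the shortest path of resources. It does not reproduce how SEMEF assigns the STRONG, MEDIUM, WEAK and UNKNOWN labels.

```python
from collections import deque

# Hypothetical (subject, predicate, object) triples in the style of Figure 3.
TRIPLES = [
    ("opus:Article_in_Proceedings_179", "opus:author", "author_A"),
    ("opus:Article_in_Proceedings_179", "opus:author", "author_B"),
    ("opus:Article_in_Proceedings_35", "opus:author", "author_A"),
    ("opus:Article_in_Proceedings_35", "opus:isIncludedIn", "opus:Proceedings_543"),
    ("opus:Article_in_Proceedings_8", "opus:author", "author_C"),
    ("opus:Article_in_Proceedings_8", "opus:isIncludedIn", "opus:Proceedings_543"),
]

def geodesic(triples, start, goal):
    """Shortest path between two resources, ignoring edge direction (BFS).

    Predicate labels are kept in the adjacency list but ignored for path length.
    Returns the list of resources on the path, or None if they are not connected.
    """
    adj = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((p, o))
        adj.setdefault(o, []).append((p, s))
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:        # walk back to the start
                path.append(node)
                node = prev[node]
            return path[::-1]
        for _, nbr in adj.get(node, []):
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    return None

print(geodesic(TRIPLES, "author_A", "author_B"))  # co-authors: path through one paper
print(geodesic(TRIPLES, "author_A", "author_C"))  # linked only through a shared proceedings
```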
COLLABORATION NETWORKS EXPANSION
C-Net
  Ordering Cluster of Experts
  Collaboration Strength*
* Newman, M. E. J.: Coauthorship Networks and the Patterns of Scientific Collaboration. Proceedings of the National Academy of Sciences of the United States of America, 101(Suppl. 1): 5200-5205 (2004).
[C-Net: a Super Node {14.80} linked to coauthor_1 {0.73, 0.5}, coauthor_2 {1.81, 1.0}, coauthor_3 {0.73, 0.5}, coauthor_4 {0.73, 0.5}, coauthor_5 {1.54, 1.0} and coauthor_n {1.1, 0.8}]
Figure 3: Geodesic Relationships
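A collaboration strength in the spirit of the cited Newman paper can be computed by letting every paper with n authors add 1/(n - 1) to the tie between each pair of its co-authors, then ordering an expert's co-authors by the total. The sketch below assumes that weighting and uses hypothetical author lists. It does not attempt to decode the pair of numbers shown for each co-author in the C-Net figure.

```python
from collections import defaultdict

def collaboration_strengths(papers, expert):
    """Co-authors of `expert` ordered by Newman-style collaboration strength.

    papers: iterable of author lists, one list per paper.
    Each paper with n authors adds 1 / (n - 1) to the expert's tie with
    every other author of that paper; single-author papers add nothing.
    """
    weights = defaultdict(float)
    for authors in papers:
        authors = set(authors)
        n = len(authors)
        if expert not in authors or n < 2:
            continue
        for other in authors - {expert}:
            weights[other] += 1.0 / (n - 1)
    # Strongest collaborators first; ties broken alphabetically.
    return sorted(weights.items(), key=lambda item: (-item[1], item[0]))

papers = [["A", "B"], ["A", "B", "C"], ["A", "C", "D"]]
print(collaboration_strengths(papers, "A"))
# [('B', 1.5), ('C', 1.0), ('D', 0.5)]
```

The 1/(n - 1) factor keeps a large multi-author paper from counting as a strong tie between every pair of its authors.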
RESULTS AND EVALUATION
Evaluation
  WWW Search Track (2005, 2006, 2007)
  Input Topics: Call for Papers
  SWETO-DBLP Subset (67,366 authors)
  DBLP (560,792 authors)
Validation
Collaboration Networks Expansion
RESULTS AND EVALUATION
Validation
Table 1: Past PC Lists comparison with SEMEF
Search Track (Number of PC Members in SEMEF List)

Percentage in    Search   Search   Search   Average   Cumulative Percentage
SEMEF List       2005     2006     2007               in PC List
(top) 0-10%      10       13       13       12        35%
10-20%           5        8        6        6         52%
20-30%           6        0        0        2         58%
30-40%           4        1        1        2         65%
40-50%           6        2        0        3         73%
50-60%           3        1        1        2         79%
60-70%           4        0        0        1         82%
70-80%           1        1        0        1         85%
80-90%           1        0        0        0         85%
90-100%          0        0        0        0         85%
Total            40/48    26/29    21/25    29/34
                 (83%)    (89%)    (84%)    (85%)
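Table 1 can be read as the output of a simple bookkeeping step: rank researchers with SEMEF for a track's call-for-papers topics, then record in which tenth of the ranked list each past PC member appears. The sketch below assumes equal-width bands and a hypothetical `pc_distribution` helper. The cumulative percentage is taken against the full PC list, so PC members absent from the SEMEF list keep it below 100%.

```python
def pc_distribution(semef_ranking, pc_members, bands=10):
    """Count past PC members in each equal-width band of a ranked SEMEF list.

    semef_ranking: researchers, best first.  pc_members: non-empty set of PC members.
    Returns (counts per band, cumulative percentage of the PC list found).
    """
    n = len(semef_ranking)
    counts = [0] * bands
    for rank, person in enumerate(semef_ranking):
        if person in pc_members:
            counts[rank * bands // n] += 1   # band 0 is the top of the list
    cumulative, running = [], 0
    for c in counts:
        running += c
        cumulative.append(100.0 * running / len(pc_members))
    return counts, cumulative

# Hypothetical data: 20 ranked researchers; one PC member is not ranked at all.
ranking = [f"r{i}" for i in range(20)]
pc = {"r0", "r1", "r5", "r19", "outsider"}
counts, cumulative = pc_distribution(ranking, pc)
print(counts)      # [2, 0, 1, 0, 0, 0, 0, 0, 0, 1]
print(cumulative)  # ends at 80.0 because "outsider" is never found
```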
RESULTS AND EVALUATION
Validation
Figure 4: Average Number of PC in SEMEF List
RESULTS AND EVALUATION
Validation
Figure 5: Average PC Distribution in SEMEF List
RESULTS AND EVALUATION
Collaboration Networks Expansion
Table 4: PC Chair – SEMEF List Geodesic Relationships
10141120151731WEAK
2
2
0
Chair2
1
6
3
Chair1
Search2006
0
7
3
Chair1
Search 2007
PC List (Number of Expert Relationships)
EXTREMELY WEAK
MEDIUM
STRONG
Relationships
1
10
2
Chair1
Search 2005
2
7
0
Chair2
00
48
00
Chair2
Above Average Expertise
(in PC)
58576605582608293649WEAK
26
55
3
Chair2
66
88
10
Chair1
Search2006
66
88
10
Chair1
Search 2007
SEMEF (Number of Expert Relationships)
EXTREMELY WEAK
MEDIUM
STRONG
Relationships
99
106
6
Chair1
Search 2005
26
53
2
Chair2
32
1676
343
Chair2
Above Average Expertise
(in PC)
Table 3: PC Chair – PC Member Geodesic Relationships
CONCLUSION
Expertise Profiles
  Publication Data
  Publication Impact Statistics
  Papers-to-Topics Relationships
Ranking Experts
  With and Without Taxonomy
  Single and Array of Topics
Collaboration Networks Expansion
  Semantic Association Discovery
  Geodesic
  C-Nets
DEMO
Web Application
  Apache Tomcat 6.0
  JavaServer Pages
  Ubuntu 7.10
RELATED WORK
Particle Swarm Algorithm
ExpertiseNets
Expertise Browser
  Experience Atoms
Expertise Recommender
  Change History
  Tech Support Heuristics
  Profiling, Identification, Supervisor
RELATED WORK
Web-Based Communities
  Expert Rank
Formal Probabilistic Models
  Candidate Models
  Document Models
RDF-Matcher
EXPERTISE PROFILE ALGORITHM
Algorithm findExpertiseProfile(researcherURI, list of publications)
  create empty 'expertise profile'
  foreach paper of researcher do
    get 'topics' list of paper (using papers-to-topics dataset)
    get 'publication impact' of paper
    if 'publication impact' is null then 'publication impact' ← default weight
    foreach 'topic' in 'topics' do
      if 'expertise profile' contains 'topic' then
        'weight' ← 'publication impact' + existing 'weight' of 'topic'
        update 'expertise profile' with <'topic', 'weight'>
      else
        add <'topic', 'publication impact'> pair to 'expertise profile'
      end if
    end
  end
  return 'expertise profile'
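A runnable Python rendering of findExpertiseProfile follows, as a sketch under stated assumptions: the slide does not give the default weight, so `DEFAULT_WEIGHT = 1.0` is hypothetical, and the dictionaries stand in for the papers-to-topics dataset and the publication impact statistics.

```python
DEFAULT_WEIGHT = 1.0  # hypothetical; the slide does not state the default weight

def find_expertise_profile(publications, papers_to_topics, impact):
    """Return a topic -> accumulated weight map for one researcher.

    publications: the researcher's paper IDs.
    papers_to_topics: paper ID -> list of topics (papers-to-topics dataset).
    impact: paper ID -> publication impact; absent when unknown.
    """
    profile = {}
    for paper in publications:
        weight = impact.get(paper)
        if weight is None:                 # no impact statistic: use the default
            weight = DEFAULT_WEIGHT
        for topic in papers_to_topics.get(paper, []):
            profile[topic] = profile.get(topic, 0.0) + weight
    return profile

# Toy data shaped like Figure 1 (the paper-to-topic assignment is hypothetical).
papers_to_topics = {"paper1": ["topic1"], "paper2": ["topic1"], "paper3": ["topic1"],
                    "paper4": ["topic2"], "paper5": ["topic3"], "paper6": ["topic3"]}
impact = {"paper1": 1.54, "paper2": 1.10, "paper3": 1.86,
          "paper4": 1.86, "paper5": 1.54, "paper6": 1.54}
profile = find_expertise_profile(list(papers_to_topics), papers_to_topics, impact)
print({t: round(w, 2) for t, w in profile.items()})
# {'topic1': 4.5, 'topic2': 1.86, 'topic3': 3.08}
```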
RANKING EXPERTS ALGORITHM
Algorithm rankValue(researcherURI, list of topics)
  set 'rankValue' to zero
  create temporary 'expertise profile'
  filter topics
  foreach topic in filtered topics list do
    get 'papers' for this topic (using papers-to-topics dataset)
    foreach paper in 'papers' do
      if researcher is author of paper then
        get 'publication impact' as 'weight'
        'rankValue' ← 'rankValue' + 'weight'
        add <'topic', 'weight'> pair to temporary 'expertise profile'
      end if
    end
  end
  return 'rankValue'
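A runnable Python rendering of rankValue is sketched below, assuming the topic list has already been filtered. Two choices are assumptions rather than details from the slide: `default_weight` is a hypothetical fallback for papers without an impact value, and the temporary profile accumulates weights per topic instead of storing one pair per paper. Ranking experts then amounts to sorting candidates by this value.

```python
def rank_value(researcher, topics, papers_by_topic, authors_of, impact,
               default_weight=1.0):
    """Sum a researcher's publication impact over the (already filtered) topics.

    papers_by_topic: topic -> paper IDs (papers-to-topics dataset, inverted).
    authors_of: paper ID -> set of author URIs.
    impact: paper ID -> publication impact; default_weight is a hypothetical
    fallback for papers without one.
    Returns (rank value, temporary topic -> weight profile).
    """
    rank = 0.0
    temp_profile = {}
    for topic in topics:
        for paper in papers_by_topic.get(topic, []):
            if researcher in authors_of.get(paper, set()):
                weight = impact.get(paper, default_weight)
                rank += weight
                temp_profile[topic] = temp_profile.get(topic, 0.0) + weight
    return rank, temp_profile

# Ranking experts: order candidates by rank value, highest first (toy data).
papers_by_topic = {"t1": ["p1", "p2"], "t2": ["p3"]}
authors_of = {"p1": {"alice", "bob"}, "p2": {"alice"}, "p3": {"bob"}}
impact = {"p1": 2.0, "p2": 1.5, "p3": 3.0}
ranked = sorted(["alice", "bob"],
                key=lambda r: rank_value(r, ["t1", "t2"], papers_by_topic,
                                         authors_of, impact)[0],
                reverse=True)
print(ranked)  # ['bob', 'alice']
```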