EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications

Francesco Osborne, Helene de Ribaupierre, Enrico Motta. KMi, The Open University, United Kingdom. EKAW 2016.

Post on 06-Jan-2017

TRANSCRIPT

Page 1: EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications

Francesco Osborne, Helene de Ribaupierre, Enrico Motta

KMi, The Open University, United Kingdom

EKAW2016

TechMiner: Extracting Technologies from Academic Publications

Page 2

Osborne, F., Motta, E. and Mulholland, P. Exploring scholarly data with Rexplore. International Semantic Web Conference 2013.

technologies.kmi.open.ac.uk/rexplore/

Page 3

Semantically Enhanced Scholarly Data

Most scholarly datasets capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, citations, and others.

We still lack comprehensive information about the content of research papers, which is often represented simply as a collection of keywords or categories from a taxonomy.

Hence, researchers are working on extracting other kinds of entities, including:

– Genes
– Chemical components
– Epistemological concepts (e.g., hypothesis, motivation, experiments)

Page 4

What about technologies?

• Technologies such as applications, systems, languages and formats are an essential part of the Computer Science ecosystem.

• Current knowledge bases cover only a small fraction of the technologies presented in the literature.

• Identifying semantic relationships between technologies and other research entities allows:
– Richer semantic search;
– Monitoring the emergence and impact of new technologies, both within and across scientific fields;
– Studying the scholarly dynamics associated with the emergence of new technologies;
– Supporting companies in the field of innovation brokering and initiatives for encouraging software citations across disciplines, e.g. FORCE11 Software Citation Working Group.
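
As a rough illustration of the monitoring use case, the sketch below finds the first year in which each technology appears in each research field. The records and the function name `first_year_by_field` are invented for this example and do not come from TechMiner or its data.

```python
from collections import defaultdict

# Toy records, invented for illustration only:
# (technology, research field, publication year).
MENTIONS = [
    ("TechA", "Semantic Web", 2010),
    ("TechA", "Semantic Web", 2011),
    ("TechA", "Bioinformatics", 2012),
    ("TechB", "Semantic Web", 2012),
]


def first_year_by_field(mentions):
    """Map each technology to the first year it appears in each field."""
    first_seen = defaultdict(dict)
    for tech, field, year in mentions:
        known = first_seen[tech].get(field)
        if known is None or year < known:
            first_seen[tech][field] = year
    return {tech: dict(fields) for tech, fields in first_seen.items()}


emergence = first_year_by_field(MENTIONS)
print(emergence)
```

With links between technologies, papers and fields in place, a technology spreading from one field to another shows up as a new field entry with a later first year.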

Page 5

TechMiner

TechMiner (TM) is a new approach that combines NLP, machine learning and semantic technologies to mine technologies, such as applications, systems, languages and formats, from research publications.

It generates an OWL ontology describing technologies and their relationships with other research entities.
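
To give a feel for this kind of output, the sketch below serialises one technology and the papers that mention it as RDF in Turtle syntax. The `ex:` namespace, the `ex:Technology` class and the `ex:mentionedIn` property are placeholders assumed for this example; the actual TechMiner ontology may use different names and structure.

```python
# Hypothetical namespace and vocabulary, assumed for illustration only.
PREFIXES = (
    "@prefix ex: <http://example.org/techminer#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)


def technology_to_turtle(tech_id, label, paper_ids):
    """Serialise one technology and the papers mentioning it as Turtle.

    Labels are inserted as-is, so this sketch assumes they contain
    no double quotes or backslashes.
    """
    statements = [f"ex:{tech_id} a ex:Technology", f'    rdfs:label "{label}"']
    statements += [f"    ex:mentionedIn ex:{paper_id}" for paper_id in paper_ids]
    # Predicate-object pairs of one subject are separated by ';' and closed by '.'.
    return PREFIXES + " ;\n".join(statements) + " .\n"


turtle = technology_to_turtle("tech1", "Example Technology", ["paper1", "paper2"])
print(turtle)
```

A real pipeline would more likely use an RDF library and an agreed vocabulary; plain strings are used here only to keep the example dependency-free.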

We evaluated TM on a manually annotated gold standard and found that it significantly improves both precision and recall over alternative NLP approaches.

– The proposed semantic features significantly improve both recall and precision.

Page 6

Some examples – Technologies created by E. Motta

Page 7

Some examples – Popular Knowledge Bases in the Semantic Web

Page 8

TechMiner - Architecture

Page 9

Evaluation – Gold Standard

We tested our approach on a gold standard (GS) of manually annotated publications in the field of the Semantic Web.

We selected a number of publications tagged with keywords related to this field (e.g., ‘semantic web’, ‘linked data’, ‘RDF’) and asked a group of 8 Semantic Web experts to annotate these papers with their technologies.

The resulting GS includes 548 publications, each of them annotated by at least two experts, and 539 technologies.
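
Scoring against a gold standard of this kind usually comes down to precision, recall and F1 over sets of technologies. The sketch below shows the standard set-based definitions on toy identifiers. It is not the authors' evaluation script, and the function name and example sets are assumptions made for illustration.

```python
def precision_recall_f1(extracted, gold):
    """Compare extracted technologies with gold-standard annotations."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy identifiers, not taken from the actual gold standard.
gold = {"tech_a", "tech_b", "tech_c", "tech_d"}
extracted = {"tech_a", "tech_b", "tech_x"}

p, r, f1 = precision_recall_f1(extracted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Here two of the three extracted items are correct (precision 2/3) and two of the four gold items are found (recall 1/2), which gives an F1 of 4/7.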

Page 10

Evaluation

Page 11

Evaluation

Page 12

Future work

• Extending the approach to identify other categories of scientific objects, such as datasets and algorithms.

• Trying the approach on other research fields.

• Building a pipeline that allows human experts to correct and manage the information extracted by TechMiner.

Page 13

Helene de Ribaupierre, Enrico Motta, Francesco Osborne

Osborne, F., Ribaupierre, H., and Motta, E. (2016) TechMiner: Extracting Technologies from Academic Publications. EKAW 2016, Bologna, Italy.

Email: [email protected]: FraOsborne
Site: people.kmi.open.ac.uk/francesco

http://oro.open.ac.uk/47332/1/EKAW2016_TM.pdf