
DESCRIPTION

The presentation slides at ISWC 2013

TRANSCRIPT

Page 1: Bringing Math to LOD

Bringing Math to LOD: A Semantic Publishing Platform Prototype for Scientific Collections in Mathematics

Olga Nevzorova, Nikita Zhiltsov, Danila Zaikin, Olga Zhibrik, Alexander Kirillovich, Vladimir Nevzorov, Evgeniy Birialtsev

Kazan Federal University, Russia

October 23, 2013

Page 2: Bringing Math to LOD

Outline

1 Introduction

2 Approach

3 Use Cases


Page 3: Bringing Math to LOD

Our Contribution

Our prototype is geared to build a semantic graph of mathematical knowledge objects that

- is extracted from a collection of mathematical scholarly papers, and
- is integrated into the LOD «cloud»

Page 4: Bringing Math to LOD

Research Output
IVM Data Set

- LOD representation of 1 330 scholarly publications of the «Izvestiya Vuzov. Matematika» (IVM) journal
- Covers the semantics of:
  - article metadata
  - elements of the logical structure
  - terminology
  - formulas
- Aligned with DBpedia, CORDIS
- More than 850 000 RDF triples
- SPARQL endpoint: http://cll.niimm.ksu.ru:8890/sparql-auth∗

∗ The SPARQL endpoint is secured. Please email the authors for credentials.
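A minimal sketch of how a client might prepare a query for the endpoint above, assuming it follows the standard SPARQL 1.1 Protocol (query passed as a `query` URL parameter). The request is only built, not sent: the endpoint is secured and the slides do not state the authentication scheme, so credential handling is left out. The function name and the generic counting query are illustrative, not part of the prototype.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL as given on the slide; it requires credentials from the authors.
ENDPOINT = "http://cll.niimm.ksu.ru:8890/sparql-auth"


def build_sparql_request(query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Generic illustrative query: count the triples visible to the query.
COUNT_QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"
request = build_sparql_request(COUNT_QUERY)
print(request.full_url)
```

Sending the request would additionally need an authentication handler matching whatever scheme the endpoint uses.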

Page 5: Bringing Math to LOD

Related Work

- Domain-specific languages: OMDoc, MathLang
- Domain models: Cambridge Mathematical Thesaurus, DBpedia (math-related part), ScienceWISE Ontology
- Math-related NLP: mArachna; linguistic modules of arXMLiv

Page 6: Bringing Math to LOD

Outline

1 Introduction

2 Approach

3 Use Cases


Page 7: Bringing Math to LOD

Key Research Contributions

- a thorough ontological model of the mathematical domain
- an ontology-based, language-independent method for extraction of logical structure elements in papers
- an ontology-based method for extraction of mathematical named entities from texts in Russian
- a method that connects mathematical named entities to symbolic expressions

Page 8: Bringing Math to LOD

Prototype’s Design


Page 9: Bringing Math to LOD

Domain Model


Page 10: Bringing Math to LOD

Ontology of Structural Elements (1)
http://cll.niimm.ksu.ru/ontologies/mocassin

- Covers 15 common structural elements
- Defines 9 object properties and 4 datatype properties

Page 11: Bringing Math to LOD

Ontology of Structural Elements (2)
http://cll.niimm.ksu.ru/ontologies/mocassin

- 3 cardinality axioms, e.g. Proof ∧ (= 1 proves ProvableStatement†)
- 2 transitivity axioms for the hasPart and dependsOn properties
- DL expressivity: SRIN(D)

† i.e., Claim ∨ Corollary ∨ Lemma ∨ Proposition ∨ Theorem
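To make the two kinds of axioms concrete, here is a small sketch over toy triples. It is not the ontology's reasoner: it is a closed-world check (OWL itself is open-world and would not flag a missing assertion), it does not verify the class of the proved statement, and all instance names are hypothetical. Only the property names `proves` and `hasPart` come from the slide.

```python
from collections import defaultdict

# Toy (subject, property, object) triples; all instance names are hypothetical.
TRIPLES = [
    ("proof1", "proves", "theorem1"),
    ("section1", "hasPart", "theorem1"),
    ("theorem1", "hasPart", "equation1"),
]
# proof2 proves nothing in the toy data, so the check below reports it.
PROOFS = {"proof1", "proof2"}


def cardinality_violations(proofs, triples):
    """Return the proofs that are not linked by `proves` to exactly one object."""
    proved = defaultdict(set)
    for s, p, o in triples:
        if p == "proves":
            proved[s].add(o)
    return sorted(x for x in proofs if len(proved[x]) != 1)


def transitive_closure(triples, prop):
    """Return all (subject, object) pairs implied by treating `prop` as transitive."""
    pairs = {(s, o) for s, p, o in triples if p == prop}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs


print(cardinality_violations(PROOFS, TRIPLES))
print(sorted(transitive_closure(TRIPLES, "hasPart")))
```

With transitivity, `section1 hasPart equation1` follows from the two asserted `hasPart` triples.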

Page 12: Bringing Math to LOD

Ontology of Mathematical Concepts (1)
http://cll.niimm.ksu.ru/ontologies/mathematics

- Covers 3 450 mathematical concepts
- Defines commonly used terms as well as terms from the emerging professional vocabulary (e.g. Bitsadze-Samarsky problem)
- Supports Russian/English labels

Page 13: Bringing Math to LOD

Ontology of Mathematical Concepts (2)
http://cll.niimm.ksu.ru/ontologies/mathematics

- Includes two taxonomies:
  - taxonomy of mathematical theories‡:
    - number theory, set theory, algebra, analysis, geometry, mathematical logic, discrete mathematics, theory of computation, differential equations, numerical analysis, probability theory and statistics
  - taxonomy of mathematical objects
- Covers common scientific concepts, such as Problem, Method, Statement, Formula etc.
- DL expressivity: ALCHI

‡ covers just a part of the mathematical knowledge

Page 14: Bringing Math to LOD

Ontology of Mathematical Concepts (3)
Object properties

- belongsTo/contains, e.g. Barycentric Coordinates belongsTo Metric Geometry
- defines/isDefinedBy, e.g. Christoffel Symbol isDefinedBy Connectedness
- seeAlso, e.g. Chebyshev Iterative Method seeAlso Numerical Solution of Linear Equation Systems
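The slash notation above suggests that belongsTo/contains and defines/isDefinedBy are inverse pairs. That reading is an assumption, not something the slide states outright. Under it, each asserted triple implies its mirror image, which the following sketch materializes for the two examples from the slide.

```python
# Assumed inverse pairs, read off the "a/b" notation on the slide.
INVERSE_OF = {"belongsTo": "contains", "isDefinedBy": "defines"}
# Make the mapping usable in both directions.
INVERSE_OF.update({v: k for k, v in list(INVERSE_OF.items())})

# The two example statements given on the slide.
ASSERTED = [
    ("Barycentric Coordinates", "belongsTo", "Metric Geometry"),
    ("Christoffel Symbol", "isDefinedBy", "Connectedness"),
]


def materialize_inverses(triples):
    """Return the input triples plus the triples implied by inverse properties."""
    result = set(triples)
    for s, p, o in triples:
        if p in INVERSE_OF:
            result.add((o, INVERSE_OF[p], s))
    return result


for triple in sorted(materialize_inverses(ASSERTED)):
    print(triple)
```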

Page 15: Bringing Math to LOD

Ontology of Mathematical Concepts (4)
Stats

- 3 450 classes
- 27% of classes are mapped onto DBpedia
- 3 630 subclass-of property instances
- 1 140 other object property instances
- Common facts about the development:
  - lasted for 4 months
  - 7 professional mathematicians participated as domain experts, guided by the authors
  - WebProtege was used as a collaborative tool

Page 16: Bringing Math to LOD

Semantic Annotation


Page 17: Bringing Math to LOD

NLP Annotation

- Relies on the OntoIntegrator facilities
- Solves some of the conventional linguistic tasks, such as:
  - tokenization
  - sentence splitting (∼98% F-measure§)
  - morphological analysis
  - NP extraction (88% precision)
- Special handling of math symbols, abbreviations, and math expressions as parts of NPs
- Currently supports only the Russian language

§ The metrics were evaluated on real math texts with the help of domain experts
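For readers unfamiliar with the metrics quoted on these slides, the sketch below computes precision, recall and F-measure from raw counts. The counts in the usage line are invented for illustration and are not taken from the authors' evaluation.

```python
def precision_recall_f(tp: int, fp: int, fn: int, beta: float = 1.0):
    """Return (precision, recall, F_beta) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    # F_beta is the weighted harmonic mean of precision and recall.
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Illustrative counts only: 90 correct items, 10 spurious, 30 missed.
p, r, f1 = precision_recall_f(tp=90, fp=10, fn=30)
print(round(p, 4), round(r, 4), round(f1, 4))
```

With `beta=1` this is the usual F1 score, the balanced case where precision and recall weigh equally.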

Page 18: Bringing Math to LOD

Mining the Logical Structure

- Supports our ontology of structural elements: elements in real texts are instances of the ontology classes
- Recognizing types of structural elements:
  - A string similarity based method gives 89%-100% F-measure depending on the class
- Recognizing semantic relations between them:
  - A decision tree learner gives 61%-95% F-measure depending on the relation
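The type-recognition step can be illustrated with a rough sketch. The slide does not say which string similarity measure is used, so Python's `difflib.SequenceMatcher` ratio stands in for it here. The class list is limited to the six classes named elsewhere in these slides, and the threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# Subset of the ontology classes that are named on the slides.
CLASS_LABELS = ["Claim", "Corollary", "Lemma", "Proof", "Proposition", "Theorem"]


def classify_heading(heading: str, threshold: float = 0.8):
    """Return the best-matching class label for a segment heading, or None."""
    words = heading.split()
    if not words:
        return None
    # Compare only the leading word, lowercased and stripped of punctuation.
    word = words[0].strip(".:,()").lower()
    best_label, best_score = None, 0.0
    for label in CLASS_LABELS:
        score = SequenceMatcher(None, word, label.lower()).ratio()
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


print(classify_heading("Theorem 2.1."))
print(classify_heading("Theorm 3"))  # misspelled heading
print(classify_heading("Consider the following"))
```

A fuzzy match tolerates misspelled or slightly varied headings, which exact lookup would miss.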

Page 19: Bringing Math to LOD

Mathematical Named Entity Extraction

- Supports our ontology of mathematical concepts: assigned NPs are instances of the ontology classes
- Our method employs annotations of the NP structure and Jaccard similarity
- The method gives 86% F-measure with parameters focusing on precision/recall trade-off
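As a hedged illustration of the Jaccard part only, the sketch below matches a noun phrase to the ontology label with the highest token overlap. It ignores the NP structure annotations and the Russian morphological analysis that the actual method relies on. The English labels, the threshold and the function names are assumptions made for the example.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Illustrative English labels; the real ontology holds 3 450 concepts.
LABELS = ["finite group", "metric geometry", "Christoffel symbol"]


def best_concept(noun_phrase: str, labels=LABELS, threshold: float = 0.5):
    """Return the label most similar to the noun phrase, or None below the threshold."""
    tokens = set(noun_phrase.lower().split())
    scored = [(jaccard(tokens, set(label.lower().split())), label) for label in labels]
    score, label = max(scored)
    return label if score >= threshold else None


print(best_concept("finite simple group"))
print(best_concept("linear operator"))
```

Raising the threshold favours precision and lowering it favours recall, which is the kind of trade-off the slide refers to.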

Page 20: Bringing Math to LOD

Connecting Named Entities to Formulas


Page 21: Bringing Math to LOD

Connecting Named Entities to Formulas

- Parsing mathematical expressions
- Detection of variables
- Proximity-based matching of mathematical variables with noun phrases at 68% accuracy
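The proximity-based matching step might look roughly like the following sketch, which links each variable to the nearest noun phrase by token distance. The distance definition, the window size and the data layout are assumptions. The slides give only the general idea and the 68% accuracy figure.

```python
def match_variables(variables, noun_phrases, max_distance: int = 5):
    """Map each variable to its nearest noun phrase.

    variables:    list of (name, token_index)
    noun_phrases: list of (text, start, end) token spans, end exclusive
    """
    matches = {}
    for name, pos in variables:
        best = None
        for text, start, end in noun_phrases:
            if start <= pos < end:
                distance = 0
            elif pos < start:
                distance = start - pos
            else:
                distance = pos - (end - 1)
            # Keep the closest phrase within the window; ties keep the first one found.
            if distance <= max_distance and (best is None or distance < best[0]):
                best = (distance, text)
        matches[name] = best[1] if best else None
    return matches


# Tokens: the(0) group(1) G(2) acts(3) on(4) the(5) set(6) X(7)
variables = [("G", 2), ("X", 7)]
noun_phrases = [("group", 1, 2), ("set", 6, 7)]
print(match_variables(variables, noun_phrases))
```

A purely distance-based rule fails when two phrases are equally close to a variable, which is one plausible source of errors for this kind of heuristic.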

Page 22: Bringing Math to LOD

Other supported features


Page 23: Bringing Math to LOD

Other supported features

- Article metadata extraction (title, author names, publication year etc.) according to the AKT Portal schema
- Semi-manual interlinking¶ with existing LOD data sets: DBpedia, CORDIS
- Publishing the extracted data as an LOD-compliant RDF data set

¶ by leveraging the Silk app

Page 24: Bringing Math to LOD

Outline

1 Introduction

2 Approach

3 Use Cases


Page 25: Bringing Math to LOD

Finding DBpedia Entities in Mathematical Formulas
http://cll.niimm.ksu.ru/iswc-demo

Page 26: Bringing Math to LOD

Semantic Search of Theoretical Findings
Finding articles with theorems about finite groups

PREFIX moc: <http://cll.niimm.ksu.ru/ontologies/mocassin#>
PREFIX math: <http://cll.niimm.ksu.ru/ontologies/mathematics#>
SELECT ?article WHERE {
  ?article moc:hasSegment ?theorem .
  ?theorem moc:mentions ?entity ;
           a moc:Theorem .
  ?entity a math:E2183
}
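To show what the graph pattern in the query selects, here is a small in-memory sketch that applies the same four conditions to toy triples. The vocabulary terms (`hasSegment`, `mentions`, `Theorem`, `Lemma`, `E2183`) come from the slides. The `ex:` instance identifiers are hypothetical, and unlike the SPARQL query the function returns each article once.

```python
MOC = "http://cll.niimm.ksu.ru/ontologies/mocassin#"
MATH = "http://cll.niimm.ksu.ru/ontologies/mathematics#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical instance data; only the vocabulary terms come from the slides.
TRIPLES = {
    ("ex:article1", MOC + "hasSegment", "ex:thm1"),
    ("ex:thm1", RDF_TYPE, MOC + "Theorem"),
    ("ex:thm1", MOC + "mentions", "ex:entity1"),
    ("ex:entity1", RDF_TYPE, MATH + "E2183"),
    # article2 mentions the same entity, but in a lemma rather than a theorem.
    ("ex:article2", MOC + "hasSegment", "ex:lemma1"),
    ("ex:lemma1", RDF_TYPE, MOC + "Lemma"),
    ("ex:lemma1", MOC + "mentions", "ex:entity1"),
}


def articles_with_theorems_about(concept: str, triples=TRIPLES):
    """Return articles having a Theorem segment that mentions an instance of `concept`."""
    result = set()
    for article, p1, segment in triples:
        if p1 != MOC + "hasSegment":
            continue
        if (segment, RDF_TYPE, MOC + "Theorem") not in triples:
            continue
        for s, p2, entity in triples:
            if s == segment and p2 == MOC + "mentions" and (entity, RDF_TYPE, concept) in triples:
                result.add(article)
    return sorted(result)


print(articles_with_theorems_about(MATH + "E2183"))
```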

Page 27: Bringing Math to LOD

Conclusion

- We have developed a holistic approach for mining an LOD representation of scholarly papers in mathematics
- We applied the prototype to a collection of over 1 300 real math papers
- We conducted a thorough evaluation of the proposed methods with the help of domain experts
- We provided several use cases to illustrate the utility of the published data

Page 28: Bringing Math to LOD

Future Work

- Integrating all the modules into a full-fledged toolkit
- Adding support for English to the NLP module
- Extending our approach to texts in other natural science domains

Page 29: Bringing Math to LOD

Thanks for your attention! Questions?