Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Extended Semantic Web Conference 2012, Digital Libraries track, 2012-05-30

Page 1: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Initial Situation Redesign Requirements Design Decisions Deployment Benefits & Difficulties Use Cases Conclusion & Roadmap

Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Extended Semantic Web Conference 2012

Ch. Lange (1,2,3), P. Ion (4,5,6), A. Dimou (5), Ch. Bratsas (5), W. Sperber (7), M. Kohlhase (2), I. Antoniou (5)

1 School of Computer Science, Univ. of Birmingham, UK
2 Computer Science, Jacobs Univ. Bremen, DE
3 SFB/TR 8 “Spatial cognition”, Univ. of Bremen, DE
4 Mathematical Reviews/American Mathematical Society, US
5 Web Science, Aristotle Univ. Thessaloniki, GR
6 Univ. of Michigan, Math. Dept., US
7 Zentralblatt MATH/FIZ Karlsruhe, DE

Project page: http://msc2010.org/mscwork/

2012-05-30

Lange et al. Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC) 2012-05-30 1

Page 2: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

The MSC in Paper Publications

Three-level tree structure:

52 Convex and discrete geometry
53 Differential geometry
    53A Classical differential geometry
        53A04 Curves in Euclidean space
        53A45 Vector and tensor analysis
    53B Local differential geometry
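The three-level structure can be read off the notation itself. The following is a minimal sketch, assuming the code patterns that appear elsewhere in these slides (53-XX for a top-level class, 53Axx for a second-level class, 53A45 or 53-03 for a leaf); the function names are hypothetical and not part of any MSC tooling.

```python
import re

# Notation patterns assumed from the codes shown in the slides.
TOP = re.compile(r"^\d{2}-XX$")                     # e.g. 53-XX
SECOND = re.compile(r"^\d{2}[A-Z]xx$")              # e.g. 53Axx
LEAF = re.compile(r"^\d{2}(?:[A-Z]\d{2}|-\d{2})$")  # e.g. 53A45, 53-03


def level(code: str) -> int:
    """Return 1, 2 or 3 for a top-level, second-level or leaf code."""
    if TOP.match(code):
        return 1
    if SECOND.match(code):
        return 2
    if LEAF.match(code):
        return 3
    raise ValueError(f"not an MSC-style code: {code!r}")


def parent(code: str):
    """Return the code one level up, or None for a top-level code."""
    lvl = level(code)
    if lvl == 1:
        return None
    # Second-level classes and leaves such as 53-03 hang directly
    # under the top-level class (an assumption of this sketch).
    if lvl == 2 or code[2] == "-":
        return code[:2] + "-XX"
    return code[:3] + "xx"


if __name__ == "__main__":
    for c in ("53A45", "53Axx", "53-03", "53-XX"):
        print(c, level(c), parent(c))
```

Deriving the parent from the notation only works because the codes are positional; the SKOS dataset described later states the same relation explicitly instead.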

Page 3: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Browsing PlanetMath.org by Subject

Page 4: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Searching MathSciNet by Subject

Page 5: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Uploading to arXiv.org

Page 6: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

How to Know the Right MSC Code?

Page 7: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

The MSC Master Source So Far

\MajorSub 53-++\SubText Differential geometry
\SeeFor{For differential topology, see \SbjNo 57Rxx.
For foundational questions of differentiable manifolds, see \SbjNo 58Axx}
...
\SecndLvl 53Axx\SubText Classical differential geometry
...
\ThirdLvl 53A45\SubText Vector and tensor analysis

Processing MSC-related information (in applications and for maintenance) requires specially tailored scripts!

Who knows how to write them?
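To give an idea of what such a tailored script has to do, here is a minimal sketch that extracts level, code and label from lines shaped like the excerpt above. It is not the authors' conversion script; the regular expression only covers the three macros shown on this slide and ignores \SeeFor and every other macro of the master source.

```python
import re

# Only the three macros visible on the slide are handled (assumption).
ENTRY = re.compile(r"\\(MajorSub|SecndLvl|ThirdLvl)\s+(\S+?)\\SubText\s+(.*)")
LEVELS = {"MajorSub": 1, "SecndLvl": 2, "ThirdLvl": 3}


def parse_entries(tex: str):
    """Return a list of (level, code, label) tuples found in the TeX text."""
    entries = []
    for line in tex.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            macro, code, label = match.groups()
            entries.append((LEVELS[macro], code, label.strip()))
    return entries


SAMPLE = r"""\MajorSub 53-++\SubText Differential geometry
\SecndLvl 53Axx\SubText Classical differential geometry
\ThirdLvl 53A45\SubText Vector and tensor analysis"""

if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry)
```

Every consumer of the TeX source would need its own variant of this parser, which is the maintenance problem the redesign addresses.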

Page 8: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Redesign Requirements

1 facilitate use and reuse
    for Mathematical Reviews/Zentralblatt MATH services
    but also for 3rd-party publishers and authors
2 facilitate maintenance:
    preserve all existing information, leave room for semantic refinements
    use standard tools instead of custom scripts
    integrate maintenance-related information into the scheme
3 enable knowledge workers and service developers to adapt and extend the MSC:
    connections to related subjects, e.g. in science
    add unofficial translations
    ... without impairing the editorially controlled core scheme
4 allow end users to explore connections to related subjects

Page 9: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Our Choice: a SKOS Linked Dataset

RDF linked dataset, using SKOS as vocabulary – same as these:

Dewey Decimal Classification (DDC, http://dewey.info)
Library of Congress Subject Headings (LCSH, http://id.loc.gov/authorities/subjects.html)

Page 10: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

The Basic Hierarchy (SKOS Core)

63 top-level nodes, 528 second-level nodes, 5606 leaves
Straightforward application of SKOS vocabulary terms:

[Diagram: a ConceptScheme and its top Concepts are linked by hasTopConcept / topConceptOf; Concepts are linked to each other by narrower / broader; every Concept points to the ConceptScheme via inScheme]

msc2010:53A45 a skos:Concept ;
    skos:inScheme msc2010: ;
    skos:broader msc2010:53Axx ;
    skos:prefLabel "Vector and tensor analysis"@en ;
    skos:notation "53A45"^^mscsmpl:MSCNotation .

Page 11: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Multilingual Labels (SKOS Core)

TeX source had English labels
Trusted parties contributed Chinese, Italian and Russian labels (stored externally)

msc2010:53A45 skos:prefLabel
    "Vector and tensor analysis"@en,
    "向量与张量分析"@zh .

Greek labels needed, but no official ones available?
No problem, merge a separate graph!

msc2010:53A45 skos:prefLabel
    "Διανυσματική και τανυστική ανάλυση"@el .

Page 12: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Mathematical Markup in Labels (SKOS Core)

215 out of 6198 labels (3.4%) contain mathematical markup, e.g.
    26E10 C^∞-functions, quasi-analytic functions
Unicode covers most of it, e.g. bold Greek letters, sub-/superscript digits, operators
No two-dimensional markup (fractions, matrices)
23 remaining problematic labels:
    expressions in a sub-/superscript: S^{n−1}
    non-standard sub-/superscript letters: 1k, Hp, vn
    sub-/superscript symbols: C^∞
    overlined operators: ∂̄
Solution: MathML

msc2010:26E10 skos:prefLabel """<mml:math alttext="$C^\infty$"><mml:msup><mml:mi>C</mml:mi><mml:mi>∞</mml:mi></mml:msup></mml:math>-functions, quasi-analytic functions"""^^rdf:XMLLiteral .

Page 13: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Linking Partitively Related Concepts (Extension)

Three types of non-symmetric links:
00A08 Recreational mathematics (see also 97A20)
    → straightforward SKOS extension property
20F60 Ordered groups (see mainly 06F15)
    → straightforward SKOS extension property
11Hxx Geometry of numbers (for applications in coding theory, see 94B75)
    → a bit trickier (next slide)

Page 14: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Faithfully Representing “See For” Links

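[Diagram: the Concept 11Hxx points via seeFor to an intermediate node; that node has the scope “for applications in coding theory” and points via forTarget to the Concept 94B75; a direct seeConditionally link connects 11Hxx to 94B75]

mscvocab:seeFor ∘ mscvocab:forTarget ⊑ mscvocab:seeConditionally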
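The property chain above says: whenever a concept has a seeFor link to an intermediate node and that node has a forTarget link to another concept, a direct seeConditionally link can be inferred. The following minimal sketch applies this to triples held as plain Python tuples; the blank node name `_:link1` and the `mscvocab:scope` spelling of the scope property are assumptions made for illustration.

```python
SEE_FOR = "mscvocab:seeFor"
FOR_TARGET = "mscvocab:forTarget"
SEE_CONDITIONALLY = "mscvocab:seeConditionally"


def infer_see_conditionally(triples):
    """Apply seeFor ∘ forTarget ⊑ seeConditionally; return the new triples."""
    # Index the forTarget links by their subject (the intermediate node).
    targets = {}
    for s, p, o in triples:
        if p == FOR_TARGET:
            targets.setdefault(s, set()).add(o)
    # Follow every seeFor link through the intermediate node.
    inferred = set()
    for s, p, o in triples:
        if p == SEE_FOR:
            for target in targets.get(o, ()):
                inferred.add((s, SEE_CONDITIONALLY, target))
    return inferred


# Hypothetical reified link for the 11Hxx example from the slides.
TRIPLES = {
    ("msc2010:11Hxx", SEE_FOR, "_:link1"),
    ("_:link1", "mscvocab:scope", "for applications in coding theory"),
    ("_:link1", FOR_TARGET, "msc2010:94B75"),
}

if __name__ == "__main__":
    print(infer_see_conditionally(TRIPLES))
```

The inferred direct link loses the scope text, which is why the intermediate node is kept as the faithful representation.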

Page 15: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Linking Across Concept Schemes (SKOS Core)

MSC2000 still widely in use
explicit links to MSC2010 would assist migration

Typical cases:
no change → skos:exactMatch
reclassification → skos:relatedMatch
    e.g. 05E40 “Combinatorial aspects of commutative algebra” partly replacing the MSC2000 classes 05E20 and 05E25
diversification → skos:broadMatch
    e.g. 97-XX “Mathematics education”:
        MSC2000: 49 concepts
        MSC2010: 160 concepts

Page 16: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Linking to non-SKOS Concepts (Extension)

Some relevant classification schemes not fully available in SKOS
DDC dataset only covers the top three levels (just 9 classes for mathematics)
We know more fine-grained mappings and represent them using local DDC placeholders

msc:53A45 skos:relatedMatch [
    a skos:Concept ;
    dcterms:isPartOf ddc:, msc: ;
    skos:notation "515.63"^^<http://dewey.info/schema-terms/Notation> ;
    skos:prefLabel "Vector, Tensor, Spinor Analysis"@en ] .

Page 17: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Collections Besides the Hierarchy (SKOS Core)

01 History and biography (see also the classification number –03 in the other sections) – how to link there?
52 Convex and discrete geometry → 52-03 Historical
53 Differential geometry → 53-03 Historical

msc:HistoricalTopics a skos:Collection ;
    skos:prefLabel "Historical topics"@en ;
    skos:member msc:03-03, ..., msc:97-03 .

Further candidates:
    explicitly given: general reference works (–00), instructional expositions (–01), works on computational methods (–08)
    requiring conceptual analysis: stability of different mathematical structures (scattered all over the MSC)

Page 18: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Co-Classification Policies (Extension?)

03-03 Historical (must also be assigned at least one classification number from Section 01)
    01A55 19th century
    01A70 Biographies, obituaries, personalia, bibliographies
    ...

Not represented fully explicitly for now, ...
... but kept in one central place, separate from concept labels

msc:HistoricalTopics skos:note "Any resource classified as -03 must also be assigned at least one classification number from Section 01." .
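Since the policy is only stored as a note, a consuming application would have to enforce it itself. A minimal sketch of such a check, assuming that a resource's classification is given as a plain list of MSC codes and that "Section 01" means any code starting with 01 (the function name is hypothetical):

```python
import re

# Codes of the form NN-03, e.g. 03-03 or 53-03.
HISTORICAL = re.compile(r"^\d{2}-03$")


def violates_historical_policy(codes) -> bool:
    """True if some -03 class is assigned without any class from Section 01."""
    has_historical = any(HISTORICAL.match(code) for code in codes)
    has_section_01 = any(code.startswith("01") for code in codes)
    return has_historical and not has_section_01


if __name__ == "__main__":
    print(violates_historical_policy(["03-03"]))           # policy violated
    print(violates_historical_policy(["03-03", "01A55"]))  # policy satisfied
```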

Page 19: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

URI Format

http://msc2010.org/resources/MSC/2010/53A45

expanded dataset has 92,000 triples (7 MB in RDF/XML)
typical linked data clients need few MSC classes:
    publications typically classified with two MSC classes
    superclasses may also be of interest

Therefore:
“slash URIs” ...
... plus a SPARQL endpoint
... plus all-in-one downloads for developers
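With slash URIs, a client that knows an MSC code can compute the few URIs it needs instead of downloading the whole dataset. A minimal sketch, using the base URI from the example above; that superclass URIs follow the same pattern with the codes 53Axx and 53-XX is an assumption of this sketch, as are the function names.

```python
BASE = "http://msc2010.org/resources/MSC/2010/"


def ancestors(code: str):
    """Superclass codes, nearest first (e.g. 53A45 -> 53Axx -> 53-XX)."""
    chain = []
    # Leaves like 53A45 have a second-level class 53Axx.
    if len(code) == 5 and code[2].isalpha() and code[3:].isdigit():
        chain.append(code[:3] + "xx")
    # Everything except a top-level class belongs to some NN-XX.
    if not code.endswith("-XX"):
        chain.append(code[:2] + "-XX")
    return chain


def uris_for(code: str):
    """URIs of a class and of its superclasses."""
    return [BASE + c for c in [code] + ancestors(code)]


if __name__ == "__main__":
    for uri in uris_for("53A45"):
        print(uri)
```

A publication classified with two MSC classes would thus trigger at most six small lookups.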

Page 20: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Development and Deployment of the Dataset

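[Diagram: conversion pipeline]
TeX → (custom Perl script) → core SKOS (RDF/XML)
core SKOS (RDF/XML) → (cwm/Python rdflib) → core SKOS (other serializations)
core SKOS (RDF/XML) → (N3 ruleset, cwm) → expanded SKOS (N-Triples)
expanded SKOS (N-Triples) → (cwm/Python rdflib) → expanded SKOS (other serializations)
expanded SKOS (N-Triples) → (split, Makefile) → one RDF/XML file per resource
expanded SKOS (N-Triples) → (import) → SPARQL endpoint (for “end users”)

all available from http://msc2010.org/mscwork/, licensed under CC-BY-NC-SA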

Page 21: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Expanding a Dataset with N3 Rules

As little manual maintenance as possible, ...
... with maximum convenience for stupid linked data clients.

# infer skos:broader back-links from skos:narrower
# (actually hard-coding the semantics of owl:inverseOf)
{ ?concept skos:narrower ?narrowerConcept }
=> { ?narrowerConcept skos:broader ?concept } .

similarly for
    un-reifying the “see for” links
    dumbing down MSC-specific links to rdfs:seeAlso

Makefile applies this using
cwm --rdf msc2010-core.skos --n3 expand-skos-rules.n3 --think

(expansion from 79,000 triples to 92,000 triples = +16%)
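What the inverse rule does can be imitated in a few lines. This is only an illustration of the rule's effect on triples held as Python tuples, not a replacement for cwm; the function name is hypothetical.

```python
def expand_inverse(triples, prop="skos:narrower", inverse="skos:broader"):
    """Return the input triples plus one inverse triple per `prop` triple."""
    expanded = set(triples)
    for s, p, o in triples:
        if p == prop:
            # { ?s prop ?o } => { ?o inverse ?s }
            expanded.add((o, inverse, s))
    return expanded


CORE = {
    ("msc2010:53Axx", "skos:narrower", "msc2010:53A45"),
    ("msc2010:53A45", "skos:prefLabel", "Vector and tensor analysis"),
}

if __name__ == "__main__":
    for triple in sorted(expand_inverse(CORE)):
        print(triple)
```

Materializing such triples in advance means that a client fetching a single concept sees its links in both directions without doing any reasoning of its own.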

Page 22: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Benefits

Benefits experienced immediately:
    all information preserved, in most cases more explicitly (hierarchy, cross-references)
    links to other concept schemes and translations included with the core scheme
    rigorous conceptual modeling helped to uncover conceptualization issues in the MSC
    easy maintainability (in our deployment workflow)

Benefits envisaged potentially:
    easy maintainability (in the editorial workflow)
    promoting widespread adoption thanks to existing search, query, editing, consistency checking, and annotation tools
    supporting reuse in linked data settings, and in legacy settings (by easier conversion to non-RDF formats)

Page 23: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Multilingual Labels vs. Mathematical Markup (I)

Two warm-up questions:
1 Who thinks that XML literals in RDF are obsolete?
    We do not think so!
2 Who knows why RDF literals may either have a language or a datatype?
    I would like to get your advice!

Page 26: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Multilingual Labels vs. Mathematical Markup (II)

Multilingual labels? No problem! (Plain literals)
Mathematical formulas in labels? No problem! (XML literals)
    (just violates a “convention” from the SKOS recommendation)
Both plain and datatyped literals? ☇
Potential workaround: Encode language into the XML
    <math xml:lang="en">...</math>
    removes language information from the RDF data model
    slows down SPARQL filtering by language
    multiple prefLabels with “no language”?

Note: Cutting mathematical formulas out of the label texts is not an option!
Not sure how other SKOS tools like this ...
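The conflict can be made concrete with a toy model of a literal that enforces the "language or datatype, not both" restriction described above. This is a sketch for illustration only, not the API of rdflib or any other RDF library; wrapping the whole label in a `span` element that carries `xml:lang` is an assumption, chosen because the label contains text outside the `math` element.

```python
from dataclasses import dataclass
from typing import Optional

XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


@dataclass(frozen=True)
class Literal:
    """Toy literal: lexical form plus either a language tag or a datatype."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None

    def __post_init__(self):
        if self.lang is not None and self.datatype is not None:
            raise ValueError("a literal has a language tag or a datatype, not both")


def xml_label_with_lang(xml_label: str, lang: str) -> Literal:
    """Workaround: move the language into the XML, keep the XML datatype."""
    wrapped = f'<span xml:lang="{lang}">{xml_label}</span>'
    return Literal(wrapped, datatype=XML_LITERAL)


if __name__ == "__main__":
    label = xml_label_with_lang("<math>...</math>-functions", "en")
    # The language is now invisible to the RDF data model.
    print(label.lang, label.datatype)
```

The resulting literal has no language tag at the RDF level, so a query that filters labels by language cannot select it without parsing the XML.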

Page 27: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

In Use (I): http://www.math.auth.gr

RDF Storage

Page 28: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

In Use (II): Exploring Connections within AUTH

connections between three AUTH researchers (using MSC research topics and other linked data), powered by RelFinder (http://www.visualdataweb.org/relfinder.php)

Page 29: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

In Use (III): http://alpha.planetmath.org

Page 30: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Conclusion & Roadmap

Conclusion
    LODified the central classification scheme in mathematics (actually one of the first mathematical LOD sets)
    SKOS and LOD largely satisfied our requirements, ...
    ... but still semantic web standards are not quite ready for mathematics.

Roadmap for the MSC dataset itself:
    soon official announcement by Math. Reviews/Zentralblatt
    adding precise definitions of the MSC classes
    adding index terms to classes
    introducing a faceted structure (beyond collections)

Roadmap for the Mathematical Web of Data (next slide)

Page 31: Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)

Roadmap: The Mathematical Web of Data

Connection points (besides the obvious DBpedia):
    OpenMath Content Dictionaries (defining the semantics of MathML; we have previously LODified them ✓)
    ACM Computing Classification System (soon officially in SKOS ✓)
    PlanetMath (soon exposing its metadata as LOD)
    Physics and Astronomy Classif. Scheme (on our own agenda)
    European Digital Mathematics Library (interested in LOD)

MSC and other datasets enable fine-grained classification of mathematical resources smaller than articles (e.g. blog posts)

⇒ democratization of scientific publishing, towards networked science