Maria Theodoridou: Semantic Integration Experiments
ARIADNE is funded by the European Commission's Seventh Framework Programme
Semantic Integration Experiments
Improving Interoperability and Reusability
Unlocking the Potential of Digital Archaeological Data
Florence, 15 December 2016
Maria Theodoridou, FORTH-ICS, Greece
The challenge
Build an Integrated Knowledge Repository and support innovative reasoning on archaeological datasets (relating and combining data), preserving the original meaning and the perspective of the different data providers.
Two main pillars:
• A global, extensible schema in the form of a formal ontology that allows for integration without loss of meaning
  – ARIADNE Reference Model = CIDOC CRM + Extension Suite
• Common vocabularies/terminologies
  – Use of well-established standard terminologies
  – Getty AAT
  – Nomisma.org
Case Studies
• Numismatics
  – A traditional science with experience and initiatives in standardization, so it was chosen as a very good starting point for item-level integration
  – Nomisma.org serves as an authoritative resource
• Wood/Dendrochronology
  – Integration of information from diverse datasets and (via NLP) archaeological reports in different languages
  – Getty AAT serves as an authoritative resource
• Sculptures
  – Data integration of sources from various disciplines, including sculpture information and its archaeological context
  – Focuses on the provenance of information according to bibliographic references, which leads to advanced literature research
Numismatics Case Study
Extracts of 5 diverse databases & datasets:
• OEAW: dFMRO coin archive, 72 records
• COINS Project: SAR Archive, 627 records
• COINS Project: FWM Archive
• iDAI Coins Pergamon, 517 records
• CulturaItalia: MuseiD-Italia, 25,562 records
• NLP data from Heslington East Excavation Archive, 37 records
• ACDM records
Wood/Dendrochronology Case Study
• Extracts of 5 archaeological datasets, output from NLP on 25 grey literature reports
• Multilingual: English, Dutch and Swedish data
• Data integration via CIDOC CRM and Getty AAT
• 1.09 million RDF triples
• 23,594 records
• 37,935 objects
• Demonstration query builder for easier cross-search and browse of integrated datasets
Wood/Dendrochronology Case Study
[Workflow diagram. Labelled elements: archaeological datasets (ADS, DANS, SND; DCCD; VAG cruck, NMS, VAG dendro, UNID) as tabular records; grey literature processed by NLP into XML; Cleansing + Normalisation (OpenRefine); Transformation (STELETO); Transformation (XSLT); Getty AAT (RDF), direct import; RDF triple store; SPARQL queries; Demonstration application: Query Builder.]
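The cleansing, normalisation and transformation steps named in the workflow can be illustrated with a minimal Python sketch. It does not use OpenRefine or STELETO; it only mimics the idea of normalising multilingual wood terms to one shared concept and emitting triples. The column names, the `example.org` URIs and the term table are assumptions made for illustration; `P45_consists_of` is the CIDOC CRM property for material.

```python
import csv
import io

# Hypothetical base URIs; a real pipeline would use provider and Getty AAT URIs.
BASE = "http://example.org/record/"
CONCEPT = "http://example.org/concept/"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"

# Assumed normalisation table: English, Dutch and Swedish terms for one material.
TERM_TO_CONCEPT = {"oak": "oak", "eik": "oak", "ek": "oak"}


def normalise(term):
    """Trim and lowercase a raw term, then look up its shared concept key."""
    return TERM_TO_CONCEPT.get(term.strip().lower())


def rows_to_triples(tabular_text):
    """Turn tabular records (columns: id, material) into N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(tabular_text)):
        concept = normalise(row["material"])
        if concept is None:
            continue  # unknown terms are skipped rather than guessed
        triples.append(
            f"<{BASE}{row['id'].strip()}> <{CRM}P45_consists_of> <{CONCEPT}{concept}> ."
        )
    return triples


# Invented sample data, not taken from any of the datasets in the case study.
SAMPLE = "id,material\nr1, Oak\nr2,eik\nr3,ek\nr4,unknown\n"
TRIPLES = rows_to_triples(SAMPLE)
```

Three differently spelled source terms end up pointing at the same concept URI, which is what makes a cross-search over the integrated data possible.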
Sculptures Case Study
• Extracts of 5 diverse databases & datasets:
  – Archaeological object database: Arachne
  – Field research databases: Athenian Agora, iDAI.field
  – Museum data: British Museum
  – Research data: Oxford Roman Economy Project
• Data integration via CIDOC CRM and controlled vocabularies: Getty AAT, Wikidata, Zenon, iDAI.gazetteer
• 5.44 million triples
• 58,343 records
Integration & Interoperability
[Architecture diagram. Labelled elements: Content Providers; provider dataset descriptions; provider records; ARIADNE aggregation infrastructure; Catalog; ACDM records; NLP; NLP records; X3ML Mapping Framework (mapping provider dataset records to CIDOC CRM; mapping ACDM records to CIDOC CRM); Integrated Knowledge Repository; ARIADNE portal; Browse the Catalog; Integrated Browse/Query Interface.]
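The mapping step in the diagram, from provider records to CIDOC CRM, can be pictured as a declarative table applied to each record. The sketch below is not X3ML and does not reproduce its syntax; the field names, the sample record and the `example.org` URIs are hypothetical, and only the property names `P2_has_type` and `P45_consists_of` come from CIDOC CRM.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"

# Hypothetical mapping table: provider field name -> CIDOC CRM property.
FIELD_TO_PROPERTY = {
    "object_type": CRM + "P2_has_type",
    "material": CRM + "P45_consists_of",
}


def map_record(record, subject_base="http://example.org/item/"):
    """Apply the mapping table to one provider record, returning triples."""
    subject = subject_base + record["id"]
    triples = []
    for field, prop in FIELD_TO_PROPERTY.items():
        value = record.get(field)
        if value:  # unmapped or empty fields produce no triple
            triples.append((subject, prop, value))
    return triples


# Hypothetical provider record, not taken from any ARIADNE dataset.
RECORD = {"id": "42", "object_type": "coin", "material": "bronze", "notes": "n/a"}
MAPPED = map_record(RECORD)
```

Keeping the mapping in a table rather than in code means each provider schema can be described separately while the target schema stays the same.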
Integrated Knowledge Repository
Experimental integrated knowledge repository
• Numismatics Case Study: 1.2M triples
• Wood/Dendrochronology Case Study: 1.5M triples
• Sculptures Case Study: 5.5M triples
• AAT thesaurus: 4.4M triples
Total: ~13M triples
Contains different levels of information:
• Item-specific information
• Document research data
• NLP data
• Catalog information
Technologies used:
http://www.metaphacts.com/
https://www.blazegraph.com/
Research questions
• Query mechanisms support innovative reasoning on archaeological datasets
• Query power lies in relating and combining
  – data from different providers, preserving the original meaning and their perspective
  – data from grey literature reports
  – item level with catalog info on archaeological datasets
Research questions
• Find all bronze coins (item-level info, retrieves datasets from multiple providers)
• Find the publishers of all collections that contain coins (catalog info)
• Find all datasets and grey literature reports that contain bronze antoninianus (item level, NLP data and catalog info)
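The example questions combine item-level, NLP and catalog information. A minimal sketch of that idea follows, using an in-memory list of triples and a simple pattern matcher in place of SPARQL over the triple store. All identifiers, predicate names and data below are invented for illustration and do not reflect the actual repository schema.

```python
# Invented triples mixing item-level, NLP-derived and catalog statements.
TRIPLES = [
    ("coin:1", "has_type", "coin"),
    ("coin:1", "consists_of", "bronze"),
    ("coin:1", "part_of", "collection:A"),
    ("coin:2", "has_type", "coin"),
    ("coin:2", "consists_of", "silver"),
    ("coin:2", "part_of", "collection:B"),
    ("nlp:7", "has_type", "coin"),
    ("nlp:7", "consists_of", "bronze"),
    ("nlp:7", "part_of", "report:R"),
    ("collection:A", "published_by", "Provider X"),
    ("collection:B", "published_by", "Provider Y"),
    ("report:R", "published_by", "Provider Z"),
]


def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def bronze_coins():
    """Items typed as coin whose material is bronze (item-level question)."""
    coins = {s for s, _, _ in match(p="has_type", o="coin")}
    bronze = {s for s, _, _ in match(p="consists_of", o="bronze")}
    return sorted(coins & bronze)


def publishers_of_coin_collections():
    """Publishers of every container holding at least one coin (catalog question)."""
    containers = {o for s, _, o in match(p="part_of")
                  if match(s=s, p="has_type", o="coin")}
    return sorted(o for c in containers
                  for _, _, o in match(s=c, p="published_by"))
```

Because item records, NLP output and catalog descriptions share one graph, the second question can start from item-level statements and finish on catalog statements without any separate lookup.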
[Diagram. Labelled elements: SAR records, NLP record, CulturaItalia records, DAI record, OEAW records, Catalog info.]
Contributing partners
Achille Felicetti, PIN
Carlo Meghini, CNR-ISTI
Philipp Gerth, DAI
Ceri Binding, USW
Douglas Tudhope, USW
Andreas Vlachidis, USW
Nadezhda Kecheva, NIAM-BAS
Sara di Giorgio, ICCU
Edeltraud Aspoeck, OEAW
Anja Masur, OEAW
ARIADNE is a project funded by the European Commission under the Community's Seventh Framework Programme, contract no. FP7-INFRASTRUCTURES-2012-1-313193. The views and opinions expressed in this presentation are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission.
Thank you