Accessing Existing Distributed Science Archives as RDF Models. Alasdair J G Gray (1), Norman Gray (2), Iadh Ounis (1). (1) Computing Science, University of Glasgow; (2) Physics and Astronomy, University of Leicester


Page 1 (explicator.dcs.gla.ac.uk/Publications/ahm2008Presentation.pdf)

Accessing Existing Distributed Science Archives as RDF Models

Alasdair J G Gray (1), Norman Gray (2), Iadh Ounis (1)

(1) Computing Science, University of Glasgow
(2) Physics and Astronomy, University of Leicester

Page 2

Outline

•  Motivating science problems
•  A data integration approach
•  RDF and SPARQL
•  Extracting scientific data
•  Performance results
•  Conclusions

Page 3

Searching for Brown Dwarfs

•  Datasets:
   – Near Infrared, 2MASS / UK Infrared Deep Sky Survey
   – Optical, APMCAT / Sloan Digital Sky Survey
•  Complex colour/motion selection criteria
•  Similar problems
   – Halo White Dwarfs

Page 4

Deep Field Surveys

•  Observations in multiple wavelengths
   – Radio to X-Ray
•  Searching for new objects
   – Galaxies, stars, etc
•  Requires correlations across many catalogues
   – ISO
   – Hubble
   – SCUBA
   – etc

Page 5

The Problem

Locate and combine relevant data

•  Heterogeneous publishers
   – Archive centres
   – Research labs
•  Heterogeneous data
   – Relational
   – XML
   – Files

[Figure label: Virtual Observatory]

Page 6

Generic Data Integration Approach

•  Heterogeneous sources
   – Autonomous
   – Local schemas
•  Homogeneous view
   – Mediated global schema
•  Mapping
   – LAV: local-as-view
   – GAV: global-as-view

[Figure: Query 1 ... Query n are posed against a Global Schema; Mappings link it to Wrapper 1 ... Wrapper i ... Wrapper k over DB 1 ... DB i ... DB k]
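
To make the global-as-view (GAV) option concrete: each relation of the mediated global schema is defined as a view over the sources, so a query on the global schema is answered by unfolding that view. The sketch below is a minimal illustration in plain Python. The two source layouts, their column names, and the rows are all made up for the example and do not come from any real archive.

```python
# Two autonomous sources with different local schemas.
# Column names and rows are hypothetical, for illustration only.
SOURCE_A = [{"obj": "a1", "ra_deg": 10.0, "dec_deg": -5.0}]
SOURCE_B = [{"id": "b7", "alpha": 150.2, "delta": 2.1}]


def global_objects():
    """GAV mapping: the global relation Object(name, ra, dec) as a view over the sources."""
    for row in SOURCE_A:
        yield (row["obj"], row["ra_deg"], row["dec_deg"])
    for row in SOURCE_B:
        yield (row["id"], row["alpha"], row["delta"])


def objects_with_ra_above(threshold):
    """A query posed against the global schema only; the sources stay hidden."""
    return [obj for obj in global_objects() if obj[1] > threshold]


result = objects_with_ra_above(100.0)
```

Under LAV the direction is reversed: each source is described as a view over the global schema, which makes adding a source easier but query answering harder.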

Page 7

P2P Data Integration Approach

•  Heterogeneous sources
   – Autonomous
   – Local schemas
•  Heterogeneous views
   – Multiple schemas
•  Mapping
   – Between pairs of schemas
   – Network of links
•  Require common integration data model

[Figure: Query 1 ... Query n are posed against Schema 1 ... Schema j; Mappings link the schemas to Wrapper 1 ... Wrapper i ... Wrapper k over DB 1 ... DB i ... DB k]

Page 8

Resource Description Framework (RDF)

•  W3C standard
•  Designed as a metadata data model
•  Make statements about resources
•  Contains semantic details
•  Ideal for linking distributed data

[Figure: RDF graph. #Sun has rdf:type IAU:Star and #name "The Sun"; #Sun is linked by #foundIn to #MilkyWay; #MilkyWay has rdf:type IAU:BarredSpiral and #name "Milky Way" and "The Galaxy"]
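
The statements in the example graph can be written down as subject-predicate-object triples. The following is a minimal sketch in plain Python, reading the edges off the figure labels; the `ex:` prefix is an assumed stand-in for the slide's unqualified `#` names, and the helper `objects` is hypothetical.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# "ex:" is an assumed placeholder prefix for the slide's "#..." names.
TRIPLES = {
    ("ex:Sun", "rdf:type", "IAU:Star"),
    ("ex:Sun", "ex:name", "The Sun"),
    ("ex:Sun", "ex:foundIn", "ex:MilkyWay"),
    ("ex:MilkyWay", "rdf:type", "IAU:BarredSpiral"),
    ("ex:MilkyWay", "ex:name", "Milky Way"),
    ("ex:MilkyWay", "ex:name", "The Galaxy"),
}


def objects(subject, predicate, triples=TRIPLES):
    """Return every object asserted for (subject, predicate), sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


names = objects("ex:MilkyWay", "ex:name")
```

Because a resource may carry any number of statements with the same predicate, the galaxy can have two names without any schema change.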

Page 9

Reasoning

•  Infers knowledge from RDF(S) statements
   – Subclassing
•  OWL: extends what can be expressed
   – Inverse predicates
   – Equivalence
   – etc

[Figure: the RDF graph from the previous slide with added labels: IAU:Galaxy, rdfs:subClassOf from IAU:BarredSpiral, a further rdf:type edge, and #contains]
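
The two kinds of inference named on this slide can be sketched as forward chaining over triples. This is a toy illustration in plain Python, not how any particular reasoner works. The `ex:` prefix is an assumed stand-in for the slide's `#` names, and the `owl:inverseOf` statement linking `ex:foundIn` and `ex:contains` is an assumption inferred from the figure, which only shows the `#contains` edge.

```python
BASE = {
    ("ex:Sun", "rdf:type", "IAU:Star"),
    ("ex:Sun", "ex:foundIn", "ex:MilkyWay"),
    ("ex:MilkyWay", "rdf:type", "IAU:BarredSpiral"),
    ("IAU:BarredSpiral", "rdfs:subClassOf", "IAU:Galaxy"),
    ("ex:foundIn", "owl:inverseOf", "ex:contains"),  # assumed declaration
}


def infer(triples):
    """Apply the subclass and inverse-predicate rules until nothing new appears."""
    known = set(triples)
    while True:
        new = set()
        for s, p, o in known:
            for a, q, b in known:
                # RDFS subclassing: x type C, C subClassOf D  =>  x type D
                if p == "rdf:type" and q == "rdfs:subClassOf" and a == o:
                    new.add((s, "rdf:type", b))
                # OWL inverse: P inverseOf Q, x P y  =>  y Q x (and vice versa)
                if q == "owl:inverseOf":
                    if p == a:
                        new.add((o, b, s))
                    if p == b:
                        new.add((o, a, s))
        if new <= known:
            return known
        known |= new


closure = infer(BASE)
```

After inference the graph also states that `ex:MilkyWay` is an `IAU:Galaxy` and that it `ex:contains` `ex:Sun`, neither of which was asserted directly.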

Page 10

SPARQL

•  Declarative query language
   – Select returned data
      •  Graph or tuples
      •  Attributes to return
   – Describe structure of desired results
   – Filter data
•  W3C standard
•  Syntactically similar to SQL

Page 11

Querying RDF with SPARQL

Find the name of the galaxy which contains a star with the name The Sun

    SELECT ?galName
    WHERE {
      ?gal  a IAU:Galaxy ;
            #name ?galName .
      ?star a IAU:Star ;
            #name ?starName ;
            #foundIn ?gal .
      FILTER REGEX(?starName, "The Sun")
    }

[Figure: the example RDF graph, including IAU:BarredSpiral rdfs:subClassOf IAU:Galaxy and the additional rdf:type edge]

Page 12

Querying RDF with SPARQL

Find the name of the galaxy which contains a star with the name The Sun

[Figure: the example RDF graph, as on the previous slide]

?galName
The Galaxy
Milky Way
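
How the query arrives at these two rows can be sketched as pattern matching over triples. The code below is a toy matcher in plain Python, not a SPARQL engine. It assumes the `ex:` prefix in place of the slide's `#` shorthand, and it assumes the graph already contains the inferred statement that the Milky Way is an `IAU:Galaxy`.

```python
import re

# Example graph, including the inferred rdf:type IAU:Galaxy triple (assumption).
TRIPLES = [
    ("ex:Sun", "rdf:type", "IAU:Star"),
    ("ex:Sun", "ex:name", "The Sun"),
    ("ex:Sun", "ex:foundIn", "ex:MilkyWay"),
    ("ex:MilkyWay", "rdf:type", "IAU:BarredSpiral"),
    ("ex:MilkyWay", "rdf:type", "IAU:Galaxy"),
    ("ex:MilkyWay", "ex:name", "Milky Way"),
    ("ex:MilkyWay", "ex:name", "The Galaxy"),
]

# The WHERE clause as triple patterns; terms starting with "?" are variables.
PATTERNS = [
    ("?gal", "rdf:type", "IAU:Galaxy"),
    ("?gal", "ex:name", "?galName"),
    ("?star", "rdf:type", "IAU:Star"),
    ("?star", "ex:name", "?starName"),
    ("?star", "ex:foundIn", "?gal"),
]


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break  # constant term does not match
        else:
            yield from match(rest, triples, candidate)


# The FILTER REGEX step, then projection onto ?galName.
solutions = [b for b in match(PATTERNS, TRIPLES)
             if re.search("The Sun", b["?starName"])]
gal_names = sorted({b["?galName"] for b in solutions})
```

Without the inferred `IAU:Galaxy` triple the first pattern would match nothing, which is why the reasoning step matters for this query.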

Page 13

Integrating Using RDF

•  Data resources
   – Expose schema and data as RDF
   – Need a SPARQL endpoint
•  Allows multiple
   – Access models
   – Storage models
•  Easy to relate data from multiple sources

[Figure: a SPARQL query is posed against a Common Model (RDF); Mappings connect it to an RDF/Relational Conversion over a Relational DB and an RDF/XML Conversion over an XML DB]

Page 14

Accessing Relational Sources as RDF

Data Dump

•  Data stored as RDF
   – Original relational source is replicated
   – Data can become stale
•  Native SPARQL query support
•  Existing RDF stores
   – Jena
   – Sesame

On-the-fly Translation

•  Data stored as relations
•  Native SQL support
   – Highly optimised access methods
•  SPARQL queries must be translated
•  Existing translation systems
   – D2RQ / D2R Server
   – SquirrelRDF
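
The on-the-fly option can be illustrated with a deliberately tiny translator: a mapping says which table and column hold the values of a predicate, and a triple pattern is rewritten into SQL that runs on the live relational data. This is a sketch of the general idea only, not the algorithm used by D2RQ or SquirrelRDF. The `galaxy` table, the `MAPPING` structure, the `translate` function, and the rows are all hypothetical.

```python
import sqlite3

# Hypothetical mapping: RDF predicate -> (table, key column, value column).
MAPPING = {
    "ex:name": ("galaxy", "id", "name"),
}


def translate(predicate, literal=None):
    """Translate the pattern `?s predicate ?o` (or a fixed literal object) into SQL."""
    table, key, column = MAPPING[predicate]
    sql = f"SELECT {key}, {column} FROM {table}"
    params = ()
    if literal is not None:
        sql += f" WHERE {column} = ?"
        params = (literal,)
    return sql, params


# The data stays in relations; nothing is copied into an RDF store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE galaxy (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO galaxy VALUES (?, ?)",
    [(1, "Milky Way"), (2, "Andromeda")],  # made-up rows
)

sql, params = translate("ex:name", "Milky Way")
rows = conn.execute(sql, params).fetchall()
```

Because the query is answered from the source tables at request time, results cannot go stale, at the cost of a translation step on every query.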

Page 15

System Hypothesis

It is viable to perform on-the-fly conversions from existing science archives to RDF to facilitate data access from a data model that a scientist is familiar with

[Figure: a SPARQL query is posed against a Common Model (RDF); Mappings connect it to an RDF/Relational Conversion over a Relational DB and an RDF/XML Conversion over an XML DB]

Page 16

Test Data

•  SuperCOSMOS Science Archive (SSA)
   – Data extracted from scans of Schmidt plates
   – Stored in a relational database
   – About 4TB of data, detailing 6.4 billion objects
   – Fairly typical of astronomical data archives
•  Schema designed using 20 real queries
•  Personal version contains
   – Data for a specific region of the sky
   – About 0.1% of the data, ~500MB

Page 17

Analysis of Test Data

•  About 500MB in size
•  Organised in 14 Relations
   – Number of attributes: 2–152
      •  4 relations with more than 20 attributes
   – Number of rows: 3–585,560
   – Two views
      •  Complex selection criteria in view

Page 18

Real Science Queries

Query 5

Find the positions and (B,R,I) magnitudes of all star-like objects within delta mag of 0.2 of the colours of a quasar of redshift 2.5 < z < 3.5

    SELECT TOP 30 ra, dec, sCorMagB, sCorMagR2, sCorMagI
    FROM ReliableStars
    WHERE (sCorMagB-sCorMagR2 BETWEEN 0.05 AND 0.80)
      AND (sCorMagR2-sCorMagI BETWEEN -0.17 AND 0.64)
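
To see what the two colour cuts in Query 5 select, the query can be tried against a few rows in an in-memory SQLite database. This is a sketch under stated assumptions: the rows are made up, `ReliableStars` is created here as a plain table rather than the archive's view, and SQLite's `LIMIT 30` replaces `TOP 30`, which SQLite does not accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ReliableStars "
    "(ra REAL, dec REAL, sCorMagB REAL, sCorMagR2 REAL, sCorMagI REAL)"
)
conn.executemany(
    "INSERT INTO ReliableStars VALUES (?, ?, ?, ?, ?)",
    [
        (10.0, -5.0, 20.0, 19.5, 19.3),  # B-R2 = 0.5, R2-I = 0.2: passes both cuts
        (11.0, -6.0, 21.0, 19.0, 18.9),  # B-R2 = 2.0: fails the first cut
        (12.0, -7.0, 19.4, 19.0, 18.0),  # R2-I = 1.0: fails the second cut
    ],
)

# Query 5 with LIMIT in place of TOP (SQLite dialect).
QUERY5 = """
SELECT ra, dec, sCorMagB, sCorMagR2, sCorMagI
FROM ReliableStars
WHERE ((sCorMagB - sCorMagR2) BETWEEN 0.05 AND 0.80)
  AND ((sCorMagR2 - sCorMagI) BETWEEN -0.17 AND 0.64)
LIMIT 30
"""
rows = conn.execute(QUERY5).fetchall()
```

Only the first row survives, since both colour differences must fall inside their ranges.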

Page 19

Analysis of Test Queries

Query Feature                                Query Numbers
Arithmetic in body                           1-5, 7, 9, 12, 13, 15-20
Arithmetic in head                           7-9, 12, 13
Ordering                                     1-8, 10-17, 19, 20
Joins (including self-joins)                 12-17, 19
Range functions (e.g. Between, ABS)          2, 3, 5, 8, 12, 13, 15, 17-20
Aggregate functions (including Group By)     7-9, 18
Math functions (e.g. power, log, root)       4, 9, 16
Trigonometry functions                       8, 12
Negated sub-query                            18, 20
Type casting (e.g. radians to degrees)       7, 8, 12
Server functions                             10, 11

Page 20

Expressivity of SPARQL

Features

•  Select-project-join
•  Arithmetic in body
•  Conjunction and disjunction
•  Ordering
•  String matching
•  External function calls (extension mechanism)

Limitations

•  Range shorthands
•  Arithmetic in head
•  Math functions
•  Trigonometry functions
•  Subqueries
•  Aggregate functions
•  Casting
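
Of these limitations, the missing range shorthand is the easiest to work around, because FILTER expressions do allow arithmetic and comparisons: `x BETWEEN a AND b` expands to `x >= a && x <= b`. The helper below is a hypothetical sketch of that rewrite in Python, applied to the two colour cuts of Query 5; the variable names `?b`, `?r2`, and `?i` are assumptions.

```python
def between_to_filter(expr, low, high):
    """Expand SQL's `expr BETWEEN low AND high` into a SPARQL FILTER string."""
    return f"FILTER(({expr}) >= {low} && ({expr}) <= {high})"


# The two BETWEEN conditions of Query 5, with assumed variable names.
filters = [
    between_to_filter("?b - ?r2", 0.05, 0.80),
    between_to_filter("?r2 - ?i", -0.17, 0.64),
]
```

The other limitations have no comparable rewrite inside the core language and would need the extension mechanism or post-processing outside the query.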

Page 21

Analysis of Test Queries

Query Feature                                Query Numbers
Arithmetic in body                           1-5, 7, 9, 12, 13, 15-20
Arithmetic in head                           7-9, 12, 13
Ordering                                     1-8, 10-17, 19, 20
Joins (including self-joins)                 12-17, 19
Range functions (e.g. Between, ABS)          2, 3, 5, 8, 12, 13, 15, 17-20
Aggregate functions (including Group By)     7-9, 18
Math functions (e.g. power, log, root)       4, 9, 16
Trigonometry functions                       8, 12
Negated sub-query                            18, 20
Type casting (e.g. radians to degrees)       7, 8, 12
Server functions                             10, 11

Expressible queries: 1, 2, 3, 5, 6, 14, 15, 17, 19
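
The list of expressible queries can be reproduced from the table by removing every query that uses at least one feature SPARQL lacks. The sketch below does this in Python under two assumptions that are not stated on the slide: range functions are treated as rewritable (so they do not block a query), and server functions are treated as blocking because they are absent from the list of SPARQL features.

```python
def expand(spec):
    """Turn a spec such as '7-9,12,13' into the set {7, 8, 9, 12, 13}."""
    numbers = set()
    for part in spec.split(","):
        low, _, high = part.partition("-")
        numbers.update(range(int(low), int(high or low) + 1))
    return numbers


# Features from the table that are assumed to block a SPARQL formulation.
BLOCKING = {
    "Arithmetic in head": "7-9,12,13",
    "Aggregate functions": "7-9,18",
    "Math functions": "4,9,16",
    "Trigonometry functions": "8,12",
    "Negated sub-query": "18,20",
    "Type casting": "7,8,12",
    "Server functions": "10,11",
}

blocked = set().union(*(expand(spec) for spec in BLOCKING.values()))
expressible = sorted(set(range(1, 21)) - blocked)
```

Under these assumptions 11 of the 20 queries are blocked, and the remaining 9 match the list on the slide.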

Page 22

Experimental Setup

•  Machine
   – Intel Core 2 Duo 2.4GHz
   – 2GB RAM
   – Windows XP
   – Java 1.5
•  Software
   – MySQL 5.0.51a
   – D2RQ 0.5.1
   – SquirrelRDF 0.1
•  Only 4 queries completed within 2 hours

Page 23

Performance Results

Page 24

A New Approach

•  Exploit query engines and data structure of underlying data sources
•  Aid user query generation by explaining source data model in terms of known data model
•  Data extracted in native model

[Figure: Explicator (Mappings, Models) with an Explain link; the Relational DB is queried with SQL and the XML DB with XQuery]

Page 25

Conclusions

•  SPARQL: Not expressive enough for science
•  Query Converters: Poor performance
•  Proposed new approach
   – RDF to understand data models
   – Native query engines for data extraction

RDF                              Relational
Ragged data                      Structured data
Small to medium data volumes     Large data volumes
Reasoning over the data          Extracting specific data