agINFRA work on germplasm and soil Linked Data, by Luca Matteis, Giovanni L’Abate and Maria...
Presentation delivered at the Agricultural Data Interoperability Interest Group, Research Data Alliance (RDA) 4th Plenary Meeting, Amsterdam, September 2014
agINFRA work on germplasm and soil Linked Data
Publishing agricultural databases as RDF
Luca Matteis (1), Valeria Pesce (2), Giovanni L’Abate (3), Maria Antonietta Polombi (3)
(1) Bioversity International, Via dei Tre Denari 472/a, 00057 Maccarese (Fiumicino), Rome, [email protected]
(2) GFAR - The Global Forum on Agricultural Research, c/o FAO, Viale delle Terme di Caracalla - 00153, Roma (Italy)
(3) Consiglio per la Ricerca e la sperimentazione in Agricoltura, Centro di ricerca per l’agrobiologia e la pedologia (CRA-ABP), Piazza M. D’azeglio, 30 - 50121 Florence (Italy)
20/09/2014
Motivation – Why Linked Data?
Data coming from different sources is hard to integrate because:
– Data is published using different formats (CSV, Excel, XML, JSON)
– Data is described using different standards (vocabularies, ontologies, taxonomies)
– Data is not linked together
Linked Open Data for germplasm and soil Luca Matteis 20/09/2014
Motivation – Why Linked Data?
Linked Data provides the principles that enable smarter integration of data:
– Publish your data as RDF (JSON-LD, Turtle, RDFa, etc.)
– Resources should be identified using resolvable HTTP URIs
– Link your resources to other RDF resources using HTTP URIs
5 stars Linked Data
Data source analysis
[Diagram: data sources and their database formats]
a) CRA CNCP soil data: MSAccess, PostgreSQL (SISI)
b) CRA PlantaRes germplasm data: MySQL (PlantaRes)
Legend: original source → intermediate source → published data
1st Step: RDF Conversion
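[Diagram: conversion pipeline]
a) CRA CNCP soil data: MSAccess and PostgreSQL (SISI) sources, converted with D2RQ into published linked data
Legend: original source → intermediate source → published data → published linked data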
D2RQ
D2RQ automatically converts relational databases into RDF. It also publishes the data as Linked Data, along with a SPARQL endpoint.
https://aginfra-sg.ct.infn.it/rdf/cncp/
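The idea behind this conversion step can be sketched in a few lines: every row becomes a resource, and every non-key column becomes a property of that resource. The sketch below is not D2RQ itself; it uses an in-memory SQLite table, and the base URI, table name and column names are hypothetical stand-ins, since the slides do not show the real CNCP schema.

```python
import sqlite3

# Hypothetical base URI; the real CNCP namespace is not given in the slides.
BASE = "http://example.org/cncp/"


def rows_to_triples(conn, table, key):
    """Yield one (subject, predicate, object) triple per non-null, non-key cell."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        # The primary key value identifies the resource.
        subject = f"{BASE}{table}/{record[key]}"
        for column, value in record.items():
            if column == key or value is None:
                continue
            # One generated property URI per table column.
            yield (subject, f"{BASE}vocab/{table}_{column}", value)


# Toy table standing in for a soil-site table (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site (id TEXT PRIMARY KEY, lat REAL, lon REAL)")
conn.execute("INSERT INTO site VALUES ('16.4LPhk1-1', 42.57, 12.93)")

triples = list(rows_to_triples(conn, "site", "id"))
for triple in triples:
    print(triple)
```

The generated property URIs are database-specific at this stage, which is why a later step maps them to shared RDF vocabularies.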
2nd Step: Mapping to RDF Vocabularies
ID           | type        | Latitudine_WGS84 | Longitudine_WGS84 | ...
16.4LPhk1-1  | observation | 42.57            | 12.93             | ...

becomes...

{
  "@id": "http://rdf.entecra.it/soil/16.4LPhk1-1",
  "@type": "soil:ObservedSoilSite",
  "geo:lat": "42.57",
  "geo:long": "12.93",
  ...
}
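A minimal sketch of this mapping step, assuming the row is available as a dictionary keyed by the original column names. The column-to-term table follows the slide; the `@context` is an addition for illustration, and the `soil:` namespace URI in it is hypothetical because the slides do not expand that prefix.

```python
import json

# Prefix expansions: geo is the W3C Basic Geo vocabulary; the soil
# namespace below is a hypothetical placeholder.
CONTEXT = {
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "soil": "http://example.org/soil-vocab#",
}

# Mappings taken from the slide: source value or column -> vocabulary term.
TYPE_MAP = {"observation": "soil:ObservedSoilSite"}
COLUMN_MAP = {"Latitudine_WGS84": "geo:lat", "Longitudine_WGS84": "geo:long"}


def row_to_jsonld(row):
    """Turn one table row into a JSON-LD node object."""
    doc = {
        "@context": CONTEXT,
        "@id": "http://rdf.entecra.it/soil/" + row["ID"],
        "@type": TYPE_MAP[row["type"]],
    }
    for column, term in COLUMN_MAP.items():
        doc[term] = row[column]
    return doc


row = {
    "ID": "16.4LPhk1-1",
    "type": "observation",
    "Latitudine_WGS84": "42.57",
    "Longitudine_WGS84": "12.93",
}
doc = row_to_jsonld(row)
print(json.dumps(doc, indent=2))
```

Keeping the mapping in plain dictionaries makes it easy to extend to the remaining columns, which the slide elides.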
3rd Step: Linking Data
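[Diagram: a graph linking three datasets, CNCP, GeoNames and DBpedia]
CNCP:
  16.4LPhk1-1   geo:lat        “42.57”
  16.4LPhk1-1   geo:long       “12.93”
  16.4LPhk1-1   gn:locatedIn   gn:6541462
GeoNames:
  gn:6541462    gn:name        “Rieti”
  gn:6541462    gn:population  46187
  gn:6541462    rdfs:seeAlso   dbpedia:Rieti
DBpedia:
  dbpedia:Rieti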
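The value of these links is that a client can start at a soil site and walk into other datasets. The sketch below holds the statements from the diagram as in-memory tuples (one list per dataset) and follows a chain of predicates across them. The `cncp:` prefix is a hypothetical shorthand, and the assignment of edges to nodes is one reading of the diagram.

```python
# Statements transcribed from the diagram, one list per dataset.
CNCP = [
    ("cncp:16.4LPhk1-1", "geo:lat", "42.57"),
    ("cncp:16.4LPhk1-1", "geo:long", "12.93"),
    ("cncp:16.4LPhk1-1", "gn:locatedIn", "gn:6541462"),
]
GEONAMES = [
    ("gn:6541462", "gn:name", "Rieti"),
    ("gn:6541462", "gn:population", 46187),
    ("gn:6541462", "rdfs:seeAlso", "dbpedia:Rieti"),
]


def describe(subject, *graphs):
    """Collect the predicate -> object pairs stated about one subject."""
    return {p: o for graph in graphs for s, p, o in graph if s == subject}


def follow(subject, path, *graphs):
    """Walk a chain of predicates, hopping from one resource to the next."""
    node = subject
    for predicate in path:
        node = describe(node, *graphs)[predicate]
    return node


site = "cncp:16.4LPhk1-1"
place_name = follow(site, ["gn:locatedIn", "gn:name"], CNCP, GEONAMES)
dbpedia_link = follow(site, ["gn:locatedIn", "rdfs:seeAlso"], CNCP, GEONAMES)
print(place_name, dbpedia_link)
```

In a real deployment each hop would be an HTTP request that dereferences the URI, rather than a lookup in a local list.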
Apps
We can now build applications that
– make queries via SPARQL endpoints
– crawl data by following links
– integrate various RDF dumps
– query data from multiple sources
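As an illustration of the first kind of application, the sketch below builds a SPARQL Protocol request using only the Python standard library. The endpoint URL is a hypothetical placeholder, since the slides do not give the address of the SPARQL endpoint, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; replace with the address of a real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

# List resources that carry both a latitude and a longitude.
QUERY = """
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?site ?lat ?long WHERE {
  ?site geo:lat ?lat ;
        geo:long ?long .
} LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
# To run it against a live endpoint: urllib.request.urlopen(request)
```

The same query text could be sent to several endpoints in turn, which is one simple way to query data from multiple sources.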
Future work
– More interlinks between datasets
– Interlinks with AGROVOC and integration in the AGRIS portal
– Move URIs under the entecra.it domain
Acknowledgements
Riccardo Bruno (from INFN) and Gino Barreca (from CRA) for their help with configuring the server infrastructure.
Thank you!