agINFRA work on germplasm and soil Linked Data by Luca Matteis, Giovanni L’Abate and Maria...

agINFRA work on germplasm and soil Linked Data
Publishing agricultural databases as RDF

Luca Matteis 1, Valeria Pesce 2, Giovanni L’Abate 3, Maria Antonietta Polombi 3

1 Bioversity International, Via dei Tre Denari 472/a, 00057 Maccarese (Fiumicino), Rome, Italy, [email protected]
2 GFAR - The Global Forum on Agricultural Research c/o FAO, Viale delle Terme di Caracalla - 00153, Roma (Italy)
3 Consiglio per la Ricerca e la sperimentazione in Agricoltura, Centro di ricerca per l’agrobiologia e la pedologia (CRA-ABP), Piazza M. D’azeglio, 30 - 50121 Florence (Italy)

20/09/2014

Upload: ciard-movement

Post on 12-Jun-2015


DESCRIPTION

Presentation delivered at the Agricultural Data Interoperability Interest Group -- Research Data Alliance (RDA) 4th Plenary Meeting -- Amsterdam, September 2014

TRANSCRIPT

Page 1: agINFRA work on germplasm and soil Linked Data by Luca Matteis, Giovanni L’Abate and Maria Antonietta Polombi

agINFRA work on germplasm and soil Linked Data
Publishing agricultural databases as RDF

Luca Matteis1, Valeria Pesce2, Giovanni L’Abate3, Maria Antonietta Polombi3

1 Bioversity International, Via dei Tre Denari 472/a, 00057 Maccarese (Fiumicino), Rome, Italy, [email protected]

2 GFAR - The Global Forum on Agricultural Research c/o FAO, Viale delle Terme di Caracalla - 00153, Roma (Italy)

3 Consiglio per la Ricerca e la sperimentazione in Agricoltura, Centro di ricerca per l’agrobiologia e la pedologia (CRA-ABP), Piazza M. D’azeglio, 30 - 50121 Florence (Italy)

20/09/2014

Page 2: agINFRA work on germplasm and soil Linked Data by Luca Matteis, Giovanni L’Abate and Maria Antonietta Polombi

Motivation – Why Linked Data?

Data coming from different sources is hard to integrate because:

– Data is published using different formats (CSV, Excel, XML, JSON)

– Data is described using different standards (vocabularies, ontologies, taxonomies)

– Data is not linked together

Linked Open Data for germplasm and soil Luca Matteis 20/09/2014

Page 6: agINFRA work on germplasm and soil Linked Data by Luca Matteis, Giovanni L’Abate and Maria Antonietta Polombi

Motivation – Why Linked Data?

Linked Data provides the principles that enable smarter integration of data:

– Publish your data as RDF (JSON-LD, Turtle, RDFa, etc.)

– Identify resources using resolvable HTTP URIs

– Link your resources to other RDF resources using HTTP URIs
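As a minimal sketch of the "resolvable HTTP URIs" principle, assuming a server that supports content negotiation: a client can ask a resource URI for an RDF serialization through the Accept header. The URI below follows the http://rdf.entecra.it/soil/ pattern used later in this deck; the function name build_rdf_request is made up for illustration, and whether that server actually answers with Turtle is an assumption, so the request is only built here, not sent.

```python
from urllib.request import Request


def build_rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) a GET request asking for an RDF serialization."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


# Hypothetical resource URI following the pattern shown in this deck.
req = build_rdf_request("http://rdf.entecra.it/soil/16.4LPhk1-1")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual lookup; a different media type (for example application/ld+json) can be requested the same way.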

Page 10: agINFRA work on germplasm and soil Linked Data by Luca Matteis, Giovanni L’Abate and Maria Antonietta Polombi

5-star Linked Data

Page 11: agINFRA work on germplasm and soil Linked Data by Luca Matteis, Giovanni L’Abate and Maria Antonietta Polombi

Data source analysis

[Diagram: the data sources behind the two datasets]

a) CRA CNCP soil data: PostgreSQL SISI, MS Access

b) CRA PlantaRes germplasm data: PlantaRes, MySQL

[Legend: original source, intermediate source, published data]

Page 12: agINFRA work on germplasm and soil Linked Data by Luca Matteis, Giovanni L’Abate and Maria Antonietta Polombi

1st Step: RDF Conversion

[Diagram: RDF conversion of the soil data]

a) CRA CNCP soil data: PostgreSQL SISI, MS Access, D2RQ

[Legend: original source, intermediate source, published data, published linked data]

Page 13: agINFRA work on germplasm and soil Linked Data by Luca Matteis, Giovanni L’Abate and Maria Antonietta Polombi

D2RQ

D2RQ automatically converts relational databases into RDF. It also publishes the data as Linked Data along with a SPARQL endpoint.

https://aginfra-sg.ct.infn.it/rdf/cncp/
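To illustrate what "a SPARQL endpoint" offers a client, here is a small sketch that builds a SPARQL Protocol GET URL. The base URL comes from the slide; the "sparql" path segment is an assumption about how this particular server was configured, and the server may no longer be online, so the code only constructs the URL and does not call it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://aginfra-sg.ct.infn.it/rdf/cncp/"  # taken from the slide
SPARQL_PATH = "sparql"  # assumption: endpoint path is not stated in the deck

# A generic query that lists ten arbitrary triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a query as the 'query' parameter of a SPARQL Protocol GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(BASE + SPARQL_PATH, QUERY)
print(url)
```

The same helper works for any endpoint that accepts queries over GET; only the endpoint string changes.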

Page 14: agINFRA work on germplasm and soil Linked Data by Luca Matteis, Giovanni L’Abate and Maria Antonietta Polombi

2nd Step: Mapping to RDF Vocabularies

ID            type          Latitudine_WGS84   Longitudine_WGS84   ...
16.4LPhk1-1   observation   42.57              12.93               ...

becomes...

{
  "@id": "http://rdf.entecra.it/soil/16.4LPhk1-1",
  "@type": "soil:ObservedSoilSite",
  "geo:lat": "42.57",
  "geo:long": "12.93",
  ...
}
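The mapping on this slide can be sketched as a small function that turns one table row into a JSON-LD-style dictionary. This is an illustration only, not the D2RQ mapping the authors used: the names BASE_URI, TYPE_MAP and row_to_jsonld are made up, and the "@context" that would define the soil: and geo: prefixes is left out because the slide elides it.

```python
import json

BASE_URI = "http://rdf.entecra.it/soil/"  # URI prefix shown on the slide
# Hypothetical lookup from the table's "type" column to an RDF class.
TYPE_MAP = {"observation": "soil:ObservedSoilSite"}


def row_to_jsonld(row: dict) -> dict:
    """Map one relational row to the JSON-LD shape shown on the slide."""
    return {
        "@id": BASE_URI + row["ID"],
        "@type": TYPE_MAP[row["type"]],
        "geo:lat": str(row["Latitudine_WGS84"]),
        "geo:long": str(row["Longitudine_WGS84"]),
    }


row = {
    "ID": "16.4LPhk1-1",
    "type": "observation",
    "Latitudine_WGS84": "42.57",
    "Longitudine_WGS84": "12.93",
}
doc = row_to_jsonld(row)
print(json.dumps(doc, indent=2))
```

Column names stay in the source language while the output keys come from shared vocabularies, which is the point of this step.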

Page 16: agINFRA work on germplasm and soil Linked Data by Luca Matteis, Giovanni L’Abate and Maria Antonietta Polombi

3rd Step: Linking Data

[Diagram: an RDF graph connecting resources from three datasets: CNCP, GeoNames and DBpedia]

CNCP: 16.4LPhk1-1
– geo:lat “42.57”
– geo:long “12.93”
– gn:locatedIn gn:6541462

GeoNames: gn:6541462
– gn:name “Rieti”
– gn:population 46187
– rdfs:seeAlso dbpedia:Rieti

DBpedia: dbpedia:Rieti
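A rough sketch of what these links make possible, with the graph on this slide written as plain (subject, predicate, object) tuples. The edge directions are read off the diagram, the cncp: prefix for the soil site is invented for the example, and the helper names are hypothetical; a real application would use an RDF library and dereference the URIs instead.

```python
# The slide's graph as in-memory triples (cncp: is a made-up prefix).
TRIPLES = [
    ("cncp:16.4LPhk1-1", "geo:lat", "42.57"),
    ("cncp:16.4LPhk1-1", "geo:long", "12.93"),
    ("cncp:16.4LPhk1-1", "gn:locatedIn", "gn:6541462"),
    ("gn:6541462", "gn:name", "Rieti"),
    ("gn:6541462", "gn:population", "46187"),
    ("gn:6541462", "rdfs:seeAlso", "dbpedia:Rieti"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object attached to subject through predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow(start, path, triples=TRIPLES):
    """Follow a chain of predicates from start and return the nodes reached."""
    frontier = [start]
    for predicate in path:
        frontier = [o for node in frontier for o in objects(node, predicate, triples)]
    return frontier


# From the soil site to the name of the place it is located in.
print(follow("cncp:16.4LPhk1-1", ["gn:locatedIn", "gn:name"]))
# From the soil site across GeoNames to the DBpedia resource.
print(follow("cncp:16.4LPhk1-1", ["gn:locatedIn", "rdfs:seeAlso"]))
```

Each hop in the path crosses from one dataset into the next, which is only possible because the datasets share identifiers.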

Page 20: agINFRA work on germplasm and soil Linked Data by Luca Matteis, Giovanni L’Abate and Maria Antonietta Polombi

Apps

We can now build applications that

– make queries via SPARQL endpoints

– crawl data by following links

– integrate various RDF dumps

– query data from multiple sources
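As an illustration of integrating dumps and querying across sources, the sketch below merges two tiny in-memory "dumps" and runs a join that neither one can answer alone. The data reuses the Rieti example from the linking slide; the cncp: prefix and the function names are hypothetical, and plain Python sets stand in for a real triple store.

```python
# Two hypothetical dumps, each a set of (subject, predicate, object) triples.
soil_dump = {
    ("cncp:16.4LPhk1-1", "geo:lat", "42.57"),
    ("cncp:16.4LPhk1-1", "geo:long", "12.93"),
    ("cncp:16.4LPhk1-1", "gn:locatedIn", "gn:6541462"),
}
geonames_dump = {
    ("gn:6541462", "gn:name", "Rieti"),
    ("gn:6541462", "gn:population", "46187"),
}


def merge(*dumps):
    """Integrate dumps by set union; shared identifiers line up on their own."""
    return set().union(*dumps)


def site_place_names(graph):
    """Join gn:locatedIn with gn:name to get a place name for each site."""
    located = {s: o for s, p, o in graph if p == "gn:locatedIn"}
    names = {s: o for s, p, o in graph if p == "gn:name"}
    return {site: names[place] for site, place in located.items() if place in names}


graph = merge(soil_dump, geonames_dump)
print(site_place_names(soil_dump))  # the soil dump alone has no place names
print(site_place_names(graph))      # the merged graph can answer the join
```

No schema alignment step is needed here because both dumps already use the same URI for the place, which is what the linking step provides.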

Page 24: agINFRA work on germplasm and soil Linked Data by Luca Matteis, Giovanni L’Abate and Maria Antonietta Polombi

Future work

– More interlinks between datasets

– Interlinks with AGROVOC and integration in the AGRIS portal

– Move URIs under the entecra.it domain

Page 27: agINFRA work on germplasm and soil Linked Data by Luca Matteis, Giovanni L’Abate and Maria Antonietta Polombi

Acknowledgements

Thanks to Riccardo Bruno (INFN) and Gino Barreca (CRA) for their help with configuring the server infrastructure.

Page 28: agINFRA work on germplasm and soil Linked Data by Luca Matteis, Giovanni L’Abate and Maria Antonietta Polombi

Thank you!
