AHM 2014: Crawling for EarthCube

Post on 05-Dec-2014

DESCRIPTION

Presentation by Ruth Duerr during the lunch & learn sessions on Day 2, June 25 at the EarthCube All-Hands Meeting

TRANSCRIPT

Crawling for EarthCube

Ruth Duerr, Luis Lopez, Abeve Tayachow, Erik Mingo

Outline

• Very brief NSIDC intro
• Why crawl?
• Libre crawler architecture
• Questions for the community

NSIDC: An overview

NSIDC affiliations and sponsorship

Affiliations:

• Cooperative Institute for Research in Environmental Sciences
• University of Colorado Boulder

Main sponsors:

• National Science Foundation
• NASA
• National Oceanic and Atmospheric Administration

The National Snow and Ice Data Center…

• Provides tools for data access
• Researches the cryosphere and data science
• Educates the public about the cryosphere
• Supports data users
• Manages and distributes scientific data
• Supports local and traditional knowledge

Why not let Google do it?

• What's their incentive?
• The schema.org route for data has extreme limitations

Ways to build a comprehensive catalog

• Ask folks to register their data and services
• Build your catalog by hand
• Automate discovery of data and services

What if...

Advertising your data so that everyone could find them were as simple as...

1 - Filling out a web form
2 - Saving it to your website
3 - Adding its link to your site

Well... It can be!
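
From the crawler's side, step 3 is what makes the saved description findable. The slides do not say which link convention is meant, so the sketch below assumes one common option: an HTML <link> element whose type names an OpenSearch description document or an Atom feed. The class name, function name, and example URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AdvertisedLinkFinder(HTMLParser):
    """Collects <link> elements that point at a machine-readable description."""

    # Assumption: these two media types are the ones a site would advertise.
    ADVERTISED_TYPES = {
        "application/opensearchdescription+xml",
        "application/atom+xml",
    }

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        media_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if media_type in self.ADVERTISED_TYPES and href:
            # Resolve relative links against the page that contained them.
            self.found.append((media_type, urljoin(self.base_url, href)))


def find_advertised_links(html, base_url):
    """Return (media type, absolute URL) pairs advertised by an HTML page."""
    finder = AdvertisedLinkFinder(base_url)
    finder.feed(html)
    return finder.found
```

A crawler could call find_advertised_links on every page it fetches and queue the returned URLs for a closer look.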

Crawler Big Picture

[Diagram labels: BCube Crawler, BCube Broker, CINERGI]

Crawler Architecture

Things we are going to search for

• OpenSearch
• OAI-PMH, ESIP Data and service cast feeds
• THREDDS catalogs
• Web-enabled folders
• WADL/WSDL
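
One way a crawler might recognize these targets is by looking at the root element of each fetched document. The sketch below is an illustration, not the BCube crawler's actual logic: it assumes ESIP casts arrive as Atom feeds and that a web-enabled folder shows up as an "Index of" directory listing. The function name and labels are hypothetical.

```python
import xml.etree.ElementTree as ET

# Root-element tags (namespace plus local name) for each XML-based format.
KNOWN_ROOTS = {
    "{http://a9.com/-/spec/opensearch/1.1/}OpenSearchDescription": "opensearch",
    "{http://www.openarchives.org/OAI/2.0/}OAI-PMH": "oai-pmh",
    "{http://www.w3.org/2005/Atom}feed": "atom-feed",  # assumed carrier for ESIP casts
    "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}catalog": "thredds",
    "{http://wadl.dev.java.net/2009/02}application": "wadl",
    "{http://schemas.xmlsoap.org/wsdl/}definitions": "wsdl",
}


def classify_document(body):
    """Return a short label for a fetched document, or None if unrecognized."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        root = None  # not well-formed XML; fall through to the HTML heuristic
    if root is not None and root.tag in KNOWN_ROOTS:
        return KNOWN_ROOTS[root.tag]
    # Heuristic (assumption): server-generated directory listings are
    # usually titled "Index of /path".
    if "<title>index of" in body.lower():
        return "web-folder"
    return None
```

Matching on the namespaced root tag keeps the check cheap and avoids guessing from file extensions, at the cost of missing documents that use other schema versions.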

But what else should we look for?

Questions/Comments
