How to Create the Google for Earth Data (XLDB 2015, Stanford)

XLDB Conference 2015, Stanford University, May 2015. How to Create the Google for Earth Data, using the example of the NOAA Big Data Project. Rainer Sternfeld, CEO & co-founder.

Upload: rainer-sternfeld

Post on 27-Jul-2015

TRANSCRIPT

Page 1: How to Create the Google for Earth Data (XLDB 2015, Stanford)

XLDB CONFERENCE 2015, STANFORD UNIVERSITY MAY 2015

How To Create the Google for Earth Data

Rainer Sternfeld, CEO & co-founder

Using the example of the NOAA Big Data Project

Page 2: How to Create the Google for Earth Data (XLDB 2015, Stanford)

Finding and accessing the right data is really hard.

Planet OS Data Discovery makes it easy.

Crawl the web and index the most recent data without moving the data itself until you need it
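One way to picture "index without moving the data" is a catalogue that holds only metadata (a bounding box, a time range, and a pointer to where the data lives) and answers queries with pointers rather than payloads. The sketch below is an illustration under assumptions, not the Planet OS implementation: the record fields, the `search` function, and the `example.org` URLs are all hypothetical, and bounding boxes that cross the antimeridian are ignored.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata only: the data itself stays at `url` until requested."""
    name: str
    url: str        # hypothetical remote location
    bbox: tuple     # (west, south, east, north) in degrees
    start: date
    end: date


def overlaps(record, bbox, start, end):
    """True if the record intersects the query box and time window."""
    w, s, e, n = record.bbox
    qw, qs, qe, qn = bbox
    in_space = w <= qe and qw <= e and s <= qn and qs <= n
    in_time = record.start <= end and start <= record.end
    return in_space and in_time


def search(index, bbox, start, end):
    """Return names and URLs of matching datasets, not their contents."""
    return [(r.name, r.url) for r in index if overlaps(r, bbox, start, end)]


# Two made-up catalogue entries for illustration.
index = [
    DatasetRecord("pacific_sst", "https://example.org/pacific_sst",
                  (-180.0, -60.0, -70.0, 60.0), date(2014, 1, 1), date(2015, 5, 1)),
    DatasetRecord("baltic_waves", "https://example.org/baltic_waves",
                  (9.0, 53.0, 30.0, 66.0), date(2015, 1, 1), date(2015, 5, 1)),
]

# Query: US West Coast, April 2015.
hits = search(index, bbox=(-130.0, 30.0, -115.0, 50.0),
              start=date(2015, 4, 1), end=date(2015, 4, 30))
print(hits)
```

Only the matching URL comes back; fetching the actual arrays is deferred until the user asks for them.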

Page 3: How to Create the Google for Earth Data (XLDB 2015, Stanford)

3 August 2014

Cloud Platform for Industrial Sensor Networks

Building with: Scala, Akka, RabbitMQ, Kafka, HDFS, HBase, Elasticsearch, Spark, Spark Streaming, GIS libraries (raster, geometry), compressed array formats

Sensor Data Discovery & Exchange: Search, Exploration, Visualisation, Analytics, Data Management, Marketplace, APIs

Page 4: How to Create the Google for Earth Data (XLDB 2015, Stanford)

marinexplore.org: the biggest deployment of Planet OS

43,000+ data streams, 35 organizations, 8,000 users

• Data discovery

• Raster data, heatmap overlays

• Access with third-party applications

• Raster data with quiver plot overlays

• Rich graph visualizations

• Build custom datasets

Page 5: How to Create the Google for Earth Data (XLDB 2015, Stanford)

5 April 2015

Marine data website: visualising, browsing, filtering, export

Built with: Python/Cython, RabbitMQ, Celery, Postgres, PostGIS, Vertica, GIS libraries (raster, geometry), NumPy

marinexplore.org

Page 6: How to Create the Google for Earth Data (XLDB 2015, Stanford)

How to make NOAA’s large-scale weather and climate data easily discoverable and machine-readable?

Real-time weather & climate data is a global multi-billion dollar opportunity

Page 7: How to Create the Google for Earth Data (XLDB 2015, Stanford)

NOAA’s Challenge

• Tens of thousands of devices deployed in the ocean, on land, and in space

• Critical for the government and industries

• Hundreds of scattered web services (FTP servers, flat files, THREDDS/OPeNDAP APIs)

• Data grows by 10 TB+ per day

• 26,595 NOAA datasets with ISO-19139 metadata

• 4,894 NOAA datasets with an OPeNDAP interface

Page 8: How to Create the Google for Earth Data (XLDB 2015, Stanford)

So what’s wrong with how it’s done now?

An enthusiastic young researcher starts downloading data to an external HDD connected to his laptop. Data keeps coming, external HDDs pile up, the beard gets longer and longer, and by the time the data is finally downloaded, we have a middle-aged, bored professor.

Page 9: How to Create the Google for Earth Data (XLDB 2015, Stanford)

Technical Challenges in the NOAA Big Data Project

• Storing, processing, and indexing spatio-temporal time-series & array data

• Processing data at tens (or hundreds) of TB/day

• Transporting and processing archives at volumes of tens (or hundreds) of PB

• Disseminating real-time data with a latency of minutes

• Indexing 100K+ logical datasets and 100M+ technical datasets/files

• Providing uniform API/export for various data formats/protocols/projections
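To get a feel for the volumes in this list, a little arithmetic helps. The sketch below converts a daily ingest volume into a sustained network rate and estimates how long a bulk archive transfer would take. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and a hypothetical, perfectly utilised 10 Gbit/s link; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12  # decimal terabyte, an assumption
PB = 10**15  # decimal petabyte, an assumption


def sustained_rate_gbit_s(bytes_per_day):
    """Average rate in Gbit/s needed to keep up with a daily volume."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e9


def transfer_days(total_bytes, link_gbit_s):
    """Days needed to move `total_bytes` over a fully utilised link."""
    seconds = total_bytes * 8 / (link_gbit_s * 1e9)
    return seconds / SECONDS_PER_DAY


rate = round(sustained_rate_gbit_s(10 * TB), 2)
days = round(transfer_days(10 * PB, 10), 1)
print(rate)  # 0.93 Gbit/s sustained for 10 TB/day
print(days)  # 92.6 days to move 10 PB at 10 Gbit/s
```

Even at the low end of the ranges above, a single archive copy takes roughly three months on such a link, which is one reason to bring computation to the data rather than the other way round.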

Page 10: How to Create the Google for Earth Data (XLDB 2015, Stanford)

Potential solutions under consideration

• Indexing spatio-temporal and semantic metadata

• Indexing downsampled remote datasets acquired via OPeNDAP and other protocols

• Store chunked array data (MB-sized chunks) in an object store (e.g. Amazon S3)

• Provide on-demand computational infrastructure for analyzing data (e.g. Amazon)
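Two of the ideas above, downsampling a remote grid for indexing and storing the full-resolution array as chunks under object keys, can be sketched in a few lines. This is a toy illustration under assumptions: a plain dictionary stands in for the object store (no real S3 calls are made), the key scheme `sst/r{row}_c{col}` is hypothetical, and real chunks would be megabytes rather than a handful of cells.

```python
def downsample(grid, step):
    """Keep every `step`-th cell in both dimensions (simple decimation)."""
    return [row[::step] for row in grid[::step]]


def split_into_chunks(grid, chunk_rows, chunk_cols):
    """Map hypothetical object keys to rectangular tiles of the grid."""
    chunks = {}
    for r in range(0, len(grid), chunk_rows):
        for c in range(0, len(grid[0]), chunk_cols):
            tile = [row[c:c + chunk_cols] for row in grid[r:r + chunk_rows]]
            chunks[f"sst/r{r}_c{c}"] = tile
    return chunks


# A made-up 4 x 6 grid whose cell value encodes its position (row * 10 + col).
grid = [[r * 10 + c for c in range(6)] for r in range(4)]

preview = downsample(grid, 2)           # small copy kept in the index
chunks = split_into_chunks(grid, 2, 3)  # full data, one entry per tile

print(preview)
print(sorted(chunks))
```

A reader who needs only one region then fetches the tiles that cover it instead of the whole array, while search and preview run against the downsampled copy.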

Page 11: How to Create the Google for Earth Data (XLDB 2015, Stanford)

What would make it even better?

• Incremental data compression

• BitTorrent-like data dissemination

• Sending pre-filtered data to consumers (e.g. by area)

• Computations scheduled next to the data storage

• Fast interconnect (10 Gbit/s, InfiniBand) and GPUs

• Run analytical scripts (e.g. IPython Notebook, MATLAB) to work with array data in the cloud
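Sending pre-filtered data by area amounts to applying the consumer's bounding box on the server side, so that only matching observations leave the data store. A minimal sketch, assuming observations are simple longitude/latitude records; the field names and the sample values are made up for illustration.

```python
def filter_by_area(points, west, south, east, north):
    """Keep only observations inside the bounding box (inclusive edges)."""
    return [p for p in points
            if west <= p["lon"] <= east and south <= p["lat"] <= north]


# Made-up observations: two near San Francisco, one near Tallinn.
observations = [
    {"lon": -122.4, "lat": 37.8, "value": 13.1},
    {"lon": 24.7, "lat": 59.4, "value": 4.2},
    {"lon": -123.0, "lat": 38.3, "value": 12.7},
]

subset = filter_by_area(observations,
                        west=-125.0, south=36.0, east=-120.0, north=40.0)
print(len(subset), "of", len(observations), "observations sent")
```

The same predicate could run next to the storage layer, which ties in with scheduling computations close to the data.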

Page 12: How to Create the Google for Earth Data (XLDB 2015, Stanford)

[Diagram: Planet OS platform overview. Labels: Data Exchange, APIs, Algorithms, 3rd Party Integrations, Discover, Collaborate, Visualize, Analyze, Enterprise Data, All Public Data, Models, Apps]

Page 13: How to Create the Google for Earth Data (XLDB 2015, Stanford)

We index your world