Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging


Uploaded by cloudera-inc, posted on 02-Jun-2015


DESCRIPTION

Skybox Imaging is using Hadoop as the engine of its satellite image processing system. Using CDH to store and process vast quantities of raw satellite image data lets Skybox build a system that scales as it launches larger numbers of ever more complex satellites. Skybox has developed a CDH-based framework that allows image processing specialists to develop complex processing algorithms in native code and then publish those algorithms into the highly scalable Hadoop MapReduce interface. This session provides an overview of how we use HDFS, HBase and MapReduce to process raw camera data into high-resolution satellite images.

TRANSCRIPT

Page 1: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Indexing the Earth

Hadoop World NYC 2011

Oliver Guinan, VP Ground Data Systems

[email protected]

Page 2

HadoopWorld 2011

Session Agenda


‣ Skybox

‣ The Big Data problem

‣ Indexing the planet at scale

‣ Questions

Page 3

Today’s data is old


Stadium under construction (completed 2010)

Bridge under construction (completed 2009)

Convention center under construction (completed 2010)

Image taken September 2008, more than three years old

Page 4

A problem of scale


Page 5

Satellite Imagery = Transparency...

215 automobiles

55,245 gallons of crude oil

6,254 containers

43% damage

-15% vegetation

[Chart: monthly timeline axis (J F M A M J J A S O N D) spanning just over two years]

Page 6

The problem of capacity

Page 7

Sensor network in space

Page 8

New approach: Many distributed, low-cost satellites


Page 9

Total Raw Data: Compute

• Satellites produce ~1 TB of raw data per day

[Chart: Data Captured per Year (PB) and Sensors in Network, Year 1 to Year 5; series: Sensor Network, Single Satellite, Sensors in Network]

Page 10

Total Raw Data: Storage

• Satellites produce ~1 TB of raw data per day

[Chart: Data Captured per Year (PB) and Sensors in Network, Year 1 to Year 5; series: Sensor Network, Single Satellite, Sensors in Network]
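The ~1 TB/day figure on the compute and storage slides turns into yearly volumes with simple arithmetic. The sketch below assumes that the rate applies to each satellite, that units are decimal (1 PB = 1000 TB), and that all raw data is retained; the fleet sizes in the example are hypothetical, not Skybox's launch plan.

```python
from __future__ import annotations

# Assumptions (not stated on the slides): ~1 TB/day is PER satellite,
# decimal units (1 PB = 1000 TB), and every captured byte is kept.
TB_PER_SATELLITE_PER_DAY = 1.0
DAYS_PER_YEAR = 365
TB_PER_PB = 1000


def raw_pb_per_year(satellites: int, tb_per_day: float = TB_PER_SATELLITE_PER_DAY) -> float:
    """Raw data captured in one year, in PB, by `satellites` identical satellites."""
    return satellites * tb_per_day * DAYS_PER_YEAR / TB_PER_PB


def cumulative_pb(fleet_by_year: list[int]) -> list[float]:
    """Running total of stored raw data (PB) at the end of each year."""
    totals: list[float] = []
    running = 0.0
    for satellites in fleet_by_year:
        running += raw_pb_per_year(satellites)
        totals.append(running)
    return totals


if __name__ == "__main__":
    fleet = [1, 2, 4, 8, 16]  # hypothetical fleet sizes for Years 1-5
    for year, (n, stored) in enumerate(zip(fleet, cumulative_pb(fleet)), start=1):
        print(f"Year {year}: {n:2d} satellites, "
              f"{raw_pb_per_year(n):.2f} PB captured, {stored:.2f} PB stored")
```

Under these assumptions one satellite adds about 0.365 PB per year. Processing load tracks the yearly figure, while storage tracks the running total when raw data is retained.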

Page 11

Enter the elephant


Page 12

Hadoop from space - processing bits


Hadoop is bad at:

• Calling native C code or libraries at scale

• Scientific computing is immature in Java

Page 13

Hadoop from space - processing bits


Standard Java Hadoop

• Hadoop knows where data is stored

• Jobs are scheduled efficiently, close to the data

• Throughput optimized

Page 14

Hadoop from space - processing bits


Hadoop Pipes & Streaming

• Hadoop schedules jobs without regard to the data required by the job

• Native code reads data across the network

• Drives up network costs and drives down throughput
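The difference between these two slides comes down to data locality. The sketch below is a simplified, hypothetical model, not Hadoop's scheduler code: each input split lists the hosts that hold a replica of its block, and when a node asks for work the scheduler prefers a split stored on that node, falling back to a remote split only when no local one remains. The split and host names are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """One unit of input, e.g. an HDFS block of raw camera data."""
    name: str
    replica_hosts: frozenset[str]  # hosts holding a copy of the block


def assign(node: str, pending: list[Split]) -> tuple[Split | None, bool]:
    """Choose the next split for `node`, preferring data stored on that node.

    Removes the chosen split from `pending` and returns (split, is_local).
    """
    # First pass: a node-local split lets the task read from local disk.
    for i, split in enumerate(pending):
        if node in split.replica_hosts:
            return pending.pop(i), True
    # Fallback: no local data left, so the task pulls its input over the network.
    if pending:
        return pending.pop(0), False
    return None, False


if __name__ == "__main__":
    work = [
        Split("strip-001", frozenset({"node-a", "node-b"})),
        Split("strip-002", frozenset({"node-c"})),
    ]
    print(assign("node-c", work))  # strip-002 is local to node-c
    print(assign("node-c", work))  # only strip-001 remains: remote read
```

Hadoop's own schedulers also distinguish rack-local from off-rack placement; this sketch keeps only the node-local versus remote distinction drawn on the slides.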

Page 15

Hadoop from space - processing bits


BusBoy

✓ Hadoop manages data reads & writes

✓ Hadoop schedules jobs close to the data

✓ Jobs read data and hand off to native code for processing
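The slides do not describe BusBoy's internals beyond these three points. As a hypothetical sketch of the hand-off pattern, the Hadoop side reads the input bytes itself, so the task keeps its data locality, then pipes them to a native executable over stdin and collects its stdout as the result. The function name `run_native_stage` and the stdin/stdout/exit-code contract are assumptions for illustration, and Python's `subprocess.run` stands in for whatever mechanism BusBoy actually uses.

```python
import subprocess
import sys
from typing import Sequence


def run_native_stage(cmd: Sequence[str], input_bytes: bytes, timeout_s: float = 600.0) -> bytes:
    """Hand already-read input bytes to a native program and return its output.

    Hypothetical contract: input on stdin, result on stdout, diagnostics on
    stderr, and a non-zero exit status means the stage failed.
    """
    proc = subprocess.run(
        list(cmd),
        input=input_bytes,
        capture_output=True,
        timeout=timeout_s,
    )
    if proc.returncode != 0:
        stderr_text = proc.stderr.decode(errors="replace").strip()
        raise RuntimeError(f"native stage {cmd[0]!r} exited with {proc.returncode}: {stderr_text}")
    return proc.stdout


if __name__ == "__main__":
    # Stand-in for a compiled C tool: a one-line Python filter that uppercases its input.
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"]
    print(run_native_stage(upper, b"raw camera bytes"))  # b'RAW CAMERA BYTES'
```

Keeping reads and writes on the Hadoop side is what separates this pattern from the Pipes and Streaming case on the previous slide, where the native code fetches its own data.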

Page 16

Architecture Overview


[Architecture diagram components: Hadoop Task; BusBoy (Logging, Progress, Inputs, Outputs); C code using math.lib, gdal.lib, cv.lib; Hadoop JobTracker; HDFS, HBase, Hive]
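The diagram shows BusBoy carrying Logging and Progress alongside Inputs and Outputs. One way to give native code such a side channel is borrowed here from Hadoop Streaming, which parses `reporter:status:<message>` and `reporter:counter:<group>,<counter>,<amount>` lines from a task's stderr. Whether BusBoy uses this format is an assumption, and the `TaskReport` and `parse_stderr` names, like the counter names in the example, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable

STATUS_PREFIX = "reporter:status:"
COUNTER_PREFIX = "reporter:counter:"


@dataclass
class TaskReport:
    """What the wrapper collects from a native stage's stderr."""
    status: str | None = None
    counters: dict[tuple[str, str], int] = field(default_factory=dict)
    log_lines: list[str] = field(default_factory=list)


def parse_stderr(lines: Iterable[str], report: TaskReport | None = None) -> TaskReport:
    """Split stderr into status updates, counter increments and plain log lines."""
    if report is None:
        report = TaskReport()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(STATUS_PREFIX):
            # The latest status wins and serves as the task's progress message.
            report.status = line[len(STATUS_PREFIX):]
        elif line.startswith(COUNTER_PREFIX):
            try:
                group, name, amount = line[len(COUNTER_PREFIX):].split(",", 2)
                delta = int(amount)
            except ValueError:
                # Malformed counter line: keep it as an ordinary log line.
                report.log_lines.append(line)
                continue
            key = (group, name)
            report.counters[key] = report.counters.get(key, 0) + delta
        else:
            report.log_lines.append(line)
    return report


if __name__ == "__main__":
    stderr = [
        "reporter:status:processing frame 12 of 40\n",
        "reporter:counter:Imaging,FramesDone,12\n",
        "warning: low contrast in band 3\n",
    ]
    print(parse_stderr(stderr))
```

Routing progress and counters through one text channel keeps the native code free of any Hadoop dependency, which also lets it run and be tested outside the cluster.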

Page 17

Framework Benefits - Deployment


✓ Low time to first byte

✓ Insight into job progress

✓ Diagnostics for large-scale operations

✓ Logging

Page 18

Framework Benefits - Development


✓ Prototyping outside of Hadoop

✓ Rapid turnaround

✓ Testable interfaces

Page 19

Skybox providing Big Data


✓ Produce the most complete and timely data about the world

✓ Make the raw data available for users to mine for information

✓ Turn Big Data into knowledge, at Earth scale


Page 20

Color Images

Simulated from an aerial platform using the flight sensor

Page 21

HD Video

Page 22

Questions? Sample Data?

[email protected]