hadoop world 2011: indexing the earth - large scale satellite image processing using hadoop - oliver...
DESCRIPTION
Skybox Imaging is using Hadoop as the engine of its satellite image processing system. Using CDH to store and process vast quantities of raw satellite image data enables Skybox to create a system that scales as it launches larger numbers of ever more complex satellites. Skybox has developed a CDH-based framework that allows image processing specialists to develop complex processing algorithms in native code and then publish those algorithms into the highly scalable Hadoop Map/Reduce interface. This session will provide an overview of how we use HDFS, HBase and Map/Reduce to process raw camera data into high-resolution satellite images.
TRANSCRIPT
Indexing the Earth
Hadoop World NYC 2011
Oliver Guinan, VP Ground Data Systems
HadoopWorld 2011
Session Agenda
‣ Skybox
‣ The Big Data problem
‣ Indexing the planet at scale
‣ Questions
Today’s data is old
Stadium under construction (completed 2010)
Bridge under construction (completed 2009)
Convention center under construction (completed 2010)
Image taken September 2008: more than three years old
A problem of scale
Satellite Imagery = Transparency...
215 automobiles
55,245 gallons of crude oil
6,254 containers
43% damage
-15% vegetation
[Chart: monthly time axis spanning 26 months, January to February]
The problem of capacity
Sensor network in space
New approach: Many distributed, low-cost satellites
Total Raw Data compute
• Satellites produce ~1TB of raw data/day
[Chart: data captured per year (PB, axis 0 to 15) and sensors in network (axis 0 to 20), Year 1 through Year 5; series: Sensor Network, Single Satellite, Sensors in Network]
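As a back-of-the-envelope check on the ~1 TB/day figure, the sketch below converts a daily capture rate into petabytes per year. It assumes the rate applies to each satellite and that 1 PB = 1000 TB; the function name and the fleet sizes are illustrative and are not taken from the talk.

```python
TB_PER_PB = 1000  # decimal units (assumption)


def raw_pb_per_year(satellites, tb_per_day_per_satellite=1.0, days=365):
    """Raw data captured in one year, in PB, for a fleet of the given size."""
    return satellites * tb_per_day_per_satellite * days / TB_PER_PB


# Hypothetical fleet sizes, chosen only to show how the volume scales.
for n in (1, 5, 10, 20):
    print(f"{n:>2} satellites: {raw_pb_per_year(n):.2f} PB/year")
```

Under these assumptions the annual volume grows linearly with the number of satellites, which is why a growing constellation quickly outpaces a single-satellite system.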
Total Raw Data storage
• Satellites produce ~1TB of raw data/day
[Chart: data captured per year (PB, axis 0 to 30) and sensors in network (axis 0 to 20), Year 1 through Year 5; series: Sensor Network, Single Satellite, Sensors in Network]
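Storage, unlike yearly capture, accumulates: each year's raw data is added to everything already stored. The sketch below shows that running total, optionally multiplied by a replication factor. HDFS replicates each block three times by default, but the talk does not state Skybox's setting, and the fleet schedule used here is hypothetical.

```python
from itertools import accumulate

TB_PER_PB = 1000  # decimal units (assumption)
HDFS_DEFAULT_REPLICATION = 3  # HDFS default; Skybox's actual setting is not stated


def cumulative_storage_pb(fleet_by_year, tb_per_day=1.0, replication=1):
    """Running total of stored raw data (PB) at the end of each year."""
    yearly = [n * tb_per_day * 365 / TB_PER_PB * replication for n in fleet_by_year]
    return list(accumulate(yearly))


fleet = [1, 2, 4, 8, 16]  # hypothetical satellites in orbit per year
for year, pb in enumerate(
    cumulative_storage_pb(fleet, replication=HDFS_DEFAULT_REPLICATION), start=1
):
    print(f"Year {year}: {pb:.2f} PB on disk")
```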
Enter the elephant
Hadoop from space - processing bits
Hadoop is bad at:
• Calling native C code or libraries at scale
• Scientific computing (immature in Java)
Hadoop from space - processing bits
Standard Java Hadoop
• Hadoop knows where data is stored
• Jobs are efficiently scheduled close to the data
• Throughput optimized
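The idea of scheduling close to the data can be illustrated with a toy model. This is not Hadoop's scheduler, which also weighs rack locality and other factors; it only shows the basic preference for a node that already holds a replica of the input block. The function and node names are hypothetical.

```python
def assign_task(block_replicas, free_nodes):
    """Pick a node for a task, preferring one that already stores the block.

    Returns (node, is_data_local).
    """
    if not free_nodes:
        raise ValueError("no free nodes available")
    for node in free_nodes:
        if node in block_replicas:
            return node, True  # data-local: the task reads from local disk
    return free_nodes[0], False  # fallback: the task reads over the network


# The block lives on n2 and n3; n1 and n2 have free task slots.
print(assign_task({"n2", "n3"}, ["n1", "n2"]))
```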
Hadoop from space - processing bits
Hadoop Pipes & Streaming
• Hadoop schedules jobs without regard to the data required by the job
• Native code reads data across the network
• Drives up network costs and drives down throughput
Hadoop from space - processing bits
BusBoy
✓ Hadoop manages data reads & writes
✓ Hadoop schedules jobs close to the data
✓ Jobs read data and hand off to native code for processing
Architecture Overview
Hadoop Task
C code
math.lib, gdal.lib, cv.lib
BusBoy
Logging, Progress, Inputs, Outputs
Hadoop JobTracker
HDFS, HBase, Hive
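The slides do not show BusBoy's internals. The following is only a sketch of the division of labor the diagram suggests: a framework-side wrapper supplies inputs, collects outputs, and exposes progress and logging hooks, while the algorithm sees nothing but bytes. Every name here (`TaskContext`, `run_task`, `invert_pixels`) is hypothetical, and a pure-Python function stands in for the native C routine.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TaskContext:
    """Services the framework side offers to an algorithm (hypothetical)."""
    inputs: Dict[str, bytes]
    outputs: Dict[str, bytes] = field(default_factory=dict)
    progress: float = 0.0
    log: logging.Logger = field(default_factory=lambda: logging.getLogger("task"))

    def report_progress(self, fraction: float) -> None:
        # Clamp to [0, 1] so a misbehaving algorithm cannot report nonsense.
        self.progress = max(0.0, min(1.0, fraction))


def invert_pixels(raw: bytes) -> bytes:
    """Stand-in for a native image routine: invert 8-bit pixel values."""
    return bytes(255 - b for b in raw)


def run_task(ctx: TaskContext, algorithm: Callable[[bytes], bytes]) -> TaskContext:
    """Feed each input to the algorithm; the wrapper owns all reads and writes."""
    total = len(ctx.inputs)
    for done, (name, raw) in enumerate(ctx.inputs.items(), start=1):
        ctx.log.info("processing %s (%d bytes)", name, len(raw))
        ctx.outputs[name] = algorithm(raw)
        ctx.report_progress(done / total)
    return ctx


ctx = run_task(TaskContext(inputs={"frame-0": bytes([0, 128, 255])}), invert_pixels)
print(ctx.outputs["frame-0"], ctx.progress)
```

Because the algorithm in this sketch is a plain function over bytes, it can be run and tested on a workstation with in-memory inputs, with no cluster involved.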
Framework Benefits - Deployment
✓ Low time to first byte
✓ Insight into job progress
✓ Diagnostics for large scale operations
✓ Logging
Framework Benefits - Development
✓ Prototyping outside of Hadoop
✓ Rapid turnaround
✓ Testable interfaces
Skybox providing Big Data
✓ Produce the most complete and timely data about the world
✓ Make data available to users to mine the raw data for information
✓ Turn Big Data into knowledge, at Earth scale
Skybox BusBoy
Simulated from aerial platform using flight sensor
Color Images
HD Video
Questions? Sample Data?