the earthserver approach to big data analyticsto big data analytics egu 2013 | splinter meeting...
TRANSCRIPT
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
The EarthServer Approach
to Big Data AnalyticsEGU 2013 | Splinter Meeting
Vienna, Austria, 2013-apr-10
Peter Baumann & the EarthServer Team
Jacobs University | rasdaman GmbH
Bremen, Germany
Supported by EU FP7 eInfrastructure, contract 283610 EarthServer
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
Roadmap
Motivation
Strictly Standards
Platform technology: the rasdaman analytics server
Conclusion
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
Scalable On-Demand Processing for the Earth Sciences• EU funded, 3 years, 5.85 mEUR
• Platform: rasdaman (Array Analytics server)
• Distributed query processing, integrated data/metadata search, 3D clients
Strictly open standards: OGC WMS+WCS+WCPS; W3C Xquery; X3D
6 * 100+ TB databases for all Earth sciences + planetary science
EarthServer: Big Earth Data Analytics
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
„Big Data“: The 4+ Vs
Volume• ngEO plannings 10^12 images under ESA custody
• ECHAM-T40 climate dataset object ~50 TB
Velocity• NASA EOSDIS: 5 TB / day
Variety• Regular / irregular / warped grids; point clouds; TINs; general meshes; ...
Veracity• Quality information, such as 32bit ocean color mask
...plus several more Vs in blogs:• Value, Verisimilitude, Variability, Visualization, ...
[Doug Laney / Gartner & IBM]
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
A Standards-Based Approach
Core element in OGC: geographic feature• = abstraction of some real-world phenomenon
• associated with a location relative to Earth
Special kind of feature: coverage• = space-time varying multi-dimensional phenomenon
• Typical representative: raster image
• ...but there is more!
Typically, coverages are Big Data
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
MultiSolid
Coverage
n-D
Variety of format encodings
The Unifying Coverage Concept
«FeatureType»AbstractCoverage
MultiPoint
Coverage
MultiCurve
Coverage
MultiSurface
Coverage
Grid
Coverage
Referenceable
GridCoverage
Rectified
GridCoverage
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
Facing the Coverage Tsunami
coverage
server
7
sensor feeds[OGC SWE]
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
[OGC SWE]sensor feeds
Taming the Coverage Tsunami
8
coverage
server
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
Serving Coverages
GMLCOV & WCS:
one generic schema for all coverage
types; n-D; scalable; versatile processing
→ access & processing services
SWE SOS:
high flexibility to accommodate
all sensor types
→ data capturing
SOS WCS
coverage
server
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
Web Coverage Service (WCS)
Core: Simple & efficient access to multi-dimensional coverages
subset = trim | slice
WCS Extensions for additional functionality facets• “band extraction”, scaling, reprojection, interpolation, query language, ...
Application Profiles define domain-oriented bundling• EO, MetOcean, ...
10
see next
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
Raster Query Language: ad-hoc navigation, extraction, aggregation, analytics
Time series
Image processing
Summary data
Sensor fusion
& pattern mining
OGC Web Coverage Processing Service (WCPS)
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
Array DBMS for massive n-D raster data• new database attribute type: array<celltype,extent>
• Data integration: rasters stored in standard database
Extending ISO SQL with array operators: •
“tile streaming” architecture• extensive optimization, hw/sw parallelization
• Scaling from laptop to cloud
In operational use
The rasdaman Raster Analytics Server
select img.green[x0:x1,y0:y1] > 130
from LandsatArchive as img
www.rasdaman.org
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
Ex: Spatio-Temporal Image Services
[Diedrich et al 2001]
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
Configurable Tiling
Sample tiling strategies [Furtado]:• regular directional area of interest
rasdaman storage layout language
insert into MyCollection
values ...
tiling area of interest [0:20,0:40], [45:80,80:85]
tile size 1000000
index d_index storage array compression zlib
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
In-Situ Databases
Problem: Large-scale data centers sometimes object to import• „cannot duplicate my 100 TB“
Approach: reference external files, use as tiles; cf [Ailamaki et al 2010]
Challenges: efficiency, consistency, caching, ...• Data simultaneously used by other actors in parallel !
• In future: optimized storage of hot spots in DB
insert into PanchromaticPics (GreyImage)
referencing /my/path/ext1.tif [1000:1999,3000:3999], …
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
Local: Just-in-time compilation, Multi-threaded tile processing
Global: Cloud parallelization• no single point of failure
• semantic distribution
coverageA
for $b in ( B )
return max(
($b.nir - $b.red) / ($b.nir + $b.red)
)
for $a in ( A )
return max(
($a.nir - $a.red) / ($a.nir + $a.red)
)
Scalability: Parallel Query Processing
coverageB
[Owonibi 2012]
for $a in ( A ), $b in ( B )
return
( max( ($a.nir - $a.red) / ($a.nir + $a.red) )
- max( ($b.nir - $b.red) / ($b.nir + $b.red) )
)
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
Building Service Alliances: UDFs
User-Defined Function (UDF) = external code, invoked through query
Anything can be interfaced: R, GDAL, Orfeo Toolbox, ... you name it!• Local or remote
Integrated with rasdaman optimizer
select
encode( orfeo.orthorectify( L0.red – L0.nir ), "png" )
from L0Scenes as L0
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
Visual Queries
[ESA VAROS project, 2010]
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
3D Database Visualization
Problem: coupling DB / visualization
Approach: • deliver RGBA image to X3D client,
transparency as height
• Feed directly into client GPU
select
encode(
{ red: (char) s.b7[x0:x1,x0:x1],
green: (char) s.b5[x0:x1,x0:x1],
blue: (char) s.b0[x0:x1,x0:x1],
alpha: (char) scale( d, 20 )
},
"png"
)
from SatImage as s, DEM as d [JacobsU, Fraunhofer 2012]
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
Rasdaman Impact
Has pioneered Array Database researchfirst appearance in literature
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
Rasdaman Impact
Has established the research field of Array Databases
Registered GEOSS component for Big Data Analytics
OGC WCS Core, OGC WCPS reference implementation
Experience heavily drives OGC coverage & ISO Array standardization
rasdaman hits 2013+
EarthServer :: EGU 2013, Vienna :: ©2012 Peter Baumann
Conclusion
Sensor, image, simulation, & statistics data
= a main source of Big Data in Earth Sciences• Petrol industry has „more bytes than barrels“
OGC W*S standards offer coverage interoperability• www.ogcnetwork.net/wcs
Core paradigm: high-level integrated query language as c/s interface• data + metadata search in same query
• Semantic interoperability through standardized syntax & semantics
• Visual clients can hide QL
• Manifold optimizations, parallelization on rasdaman server
EarthServer introduces Agile Analytics
www.earthserver.eu