http://pairs.watson.ibm.com:8080/pairs/
IBM PAIRS - A Big Physical Data Service to Accelerate Analytics and Discovery
Siyuan Lu et al.
IBM TJ Watson Research Center
Yorktown Heights, NY, USA
https://pairs.res.ibm.com
More about IBM’s precision agriculture work
20 Jan. 2015
This year's winner was a collaborative experiment by E. & J. Gallo and IBM, whose approach used a variable-rate irrigation system across separate quadrants of a 31-acre Cabernet Sauvignon vineyard. The result decreased vineyard spatial variability and increased water-use efficiency without compromising quality during a period of historic drought.
(1) Introduction to PAIRS (Physical Analytics Information Repository System)
(2) Analytics Leveraging PAIRS
• Crop acreage estimate
• Industrial productivity
• Long term weather forecasting
• Super resolution satellite images
Outline
© 2017 IBM Corporation
Comparison of Approaches
TRADITIONAL
• Each data set is kept in a different place, in different formats, projections, units, ...
• Petabytes of data are stored in billions of “individual scenes”.
• Analysts order scenes from each source: download, assemble, resample, reproject, align, ...
• Data is moved to the application, which does not scale.

PAIRS
• Large-scale, pre-processed big data store with pre-aligned layers
  − Common formats and projections (global reference)
  − Spatial & temporal joins
• Intelligent query system / discovery tool.
• Analyze without moving data, drastically accelerating query and discovery.
• Access to new layers of analytics.

[Figure labels: Landsat scenes/tiles from different satellite passes; pre-aligned layers; global master grid]
{
  "spatial": {"type": "square", "aoi": null, "coordinates": [38.0, -122.0, 48, -112]},
  "temporal": {"intervals": [{"start": "2017-07-27T00:00:00Z", "end": "2017-07-27T00:00:00Z"}]},
  "layers": [
    {"id": 93, "alias": null, "aggregation": "Mean", "filter": "GT 270"},
    {"id": 94, "alias": null, "aggregation": "Min"}
  ]
}
Interaction with PAIRS via RESTful APIs
[Diagram labels: IBM Cloud, Users]
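The query shown on this slide is plain JSON, so it can be assembled programmatically before it is sent to the service. The following is a minimal sketch that rebuilds the same payload in Python. The helper name `build_query` is hypothetical, and because the slide does not give the HTTP endpoint, the sketch stops at serialization rather than guessing a URL.

```python
import json

def build_query(bbox, timestamp, layers):
    """Assemble a query dict shaped like the JSON on the slide.

    bbox      -- four numbers, copied as-is into "coordinates"
    timestamp -- ISO 8601 string used for both start and end
    layers    -- list of dicts with "id", "aggregation" and optional "filter"
    """
    layer_specs = []
    for layer in layers:
        spec = {"id": layer["id"], "alias": None,
                "aggregation": layer["aggregation"]}
        if "filter" in layer:          # the filter is optional per layer
            spec["filter"] = layer["filter"]
        layer_specs.append(spec)
    return {
        "spatial": {"type": "square", "aoi": None, "coordinates": list(bbox)},
        "temporal": {"intervals": [{"start": timestamp, "end": timestamp}]},
        "layers": layer_specs,
    }

query = build_query(
    bbox=[38.0, -122.0, 48, -112],
    timestamp="2017-07-27T00:00:00Z",
    layers=[
        {"id": 93, "aggregation": "Mean", "filter": "GT 270"},
        {"id": 94, "aggregation": "Min"},
    ],
)
payload = json.dumps(query)  # string to place in the body of the REST request
```

Building the payload from a function keeps the quoting correct and makes it easy to vary the area, date, or layer list between requests.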
IBM PAIRS Architecture
Automated data curation and ingestion
Scientists create the process and the computer scales it up
Big data bus & scheduler
Data filtering & re-mapping/re-projection
Hadoop/HBase system for large scale data analytics
New data nodes are added as the data expands
[Map labels: California, Continental USA, Global]
Large-scale analytics: the key design links space and time automatically
A significant portion of geospatial analytics is done where the data resides (bring analytics to the data)
Interactive web interface and API
Visualize & download data
• Z-order maps 2D data onto a 1D index
• The key is a combination of spatial and temporal information
• Global grid cell resolution spans from 0.8 m to 260 km
• All resolution layers are nested and aligned at the lower-left corner of the cell grid
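The slide does not say how PAIRS builds its key beyond combining a Z-order index with temporal information. As a generic illustration of the Z-order idea only, the sketch below interleaves the bits of two integer cell indices (a Morton code). The function names and the `(spatial index, timestamp)` tuple layout are assumptions made for this example, not the PAIRS key format.

```python
def interleave_bits(ix, iy, bits=32):
    """Interleave the bits of two non-negative integers (Morton / Z-order code)."""
    key = 0
    for i in range(bits):
        key |= ((ix >> i) & 1) << (2 * i)        # x bits go to even positions
        key |= ((iy >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return key

def deinterleave_bits(key, bits=32):
    """Recover the two cell indices from a Morton code."""
    ix = iy = 0
    for i in range(bits):
        ix |= ((key >> (2 * i)) & 1) << i
        iy |= ((key >> (2 * i + 1)) & 1) << i
    return ix, iy

def make_key(ix, iy, timestamp):
    """Hypothetical composite key: spatial Z-order index first, then time."""
    return (interleave_bits(ix, iy), timestamp)

# Neighbouring cells in a 2x2 block receive consecutive 1D indices.
for cell in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(cell, interleave_bits(*cell))
```

Because the four cells of any aligned 2x2 block share a common key prefix, a range scan over the 1D key tends to return spatially compact regions.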
Level   Δθ, Δφ [degree]   Δy [km], Δx [km] (φ=0°)   Δx [km] (φ=40°)
26      0.000008          0.00089                   0.00067
25      0.000016          0.00178                   0.00134
24      0.000032          0.00356                   0.00268
23      0.000064          0.00712                   0.00536
22      0.000128          0.01424                   0.01072
21      0.000256          0.02848                   0.02144
…       …                 …                         …
1       268.43546         29863.444                 22481.469
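The table is consistent with the cell size doubling at each coarser level, starting from 0.000008 degrees at level 26. A small sketch of that relationship follows. The conversion factor of 111.25 km per degree is only the ratio implied by the table's Δy column, and the grid origin used by `snap_to_cell` is a hypothetical choice for illustration.

```python
import math

FINEST_LEVEL = 26
FINEST_DEG = 0.000008   # level-26 cell size in degrees, from the table
KM_PER_DEG = 111.25     # assumption: ratio implied by the table's km column

def cell_size_deg(level):
    """Cell size in degrees; each coarser level doubles the size."""
    return FINEST_DEG * 2 ** (FINEST_LEVEL - level)

def cell_size_km(level):
    """North-south cell extent in km (also east-west at the equator)."""
    return cell_size_deg(level) * KM_PER_DEG

def snap_to_cell(lat, lon, level, origin=(-90.0, -180.0)):
    """Lower-left corner of the cell containing (lat, lon).

    origin is a hypothetical grid origin, not taken from the slides.
    """
    size = cell_size_deg(level)
    row = math.floor((lat - origin[0]) / size)
    col = math.floor((lon - origin[1]) / size)
    return origin[0] + row * size, origin[1] + col * size

for level in (26, 21, 1):
    print(level, cell_size_deg(level), round(cell_size_km(level), 5))
```

Snapping every layer to lower-left corners on the same doubling grid is what lets a coarse cell line up exactly with the block of finer cells it contains.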
PAIRS Uses Global Reference System
PAIRS Cluster
[Diagram labels]
• Storage: HDFS / HBase, 1 PB; 100 TB; 16 TB
• > 24 data nodes: x86; > 260 cores; 4 TB memory
• Master, backup and other nodes: x86, 8 cores, 96 GB memory; x86, 32 cores, ~150 GB memory; x86, 24 cores, ~100 GB memory
PAIRS “Scales” to Big Data
PAIRS query time is (almost) independent of data size
Conventional systems require more time for larger data sizes
Upload Your Own Data to PAIRS
User data ingestion via FTP or REST API
• automatically aligns user data with other PAIRS data and analytics
• makes user data searchable along with other PAIRS data and analytics
Example: Drone images from Watson Research Center
Example: Curated images in PAIRS from Watson Research Center
PAIRS drone data overlaid on Google map
Localization accuracy better than 1 m
(1) Introduction to PAIRS (Physical Analytics Information Repository System)
(2) Analytics Leveraging PAIRS
• Industrial productivity
• Crop acreage estimate
• Long term weather forecasting
• Super resolution satellite images
Outline
NDVI prediction, early June 2012: percentage of area in Greene County, IA
• Corn: 51%
• Soybean: 29%
PAIRS Analytics: Early Crop Recognition Greene County, IA - NASS Results
Acreage Prediction using Satellite Remote Sensing
NDVI 1, early season: corn NDVI rises faster than soy.
NDVI 2, late season: corn NDVI < soy NDVI.
Other
Corn vs. Soy: NDVI (normalized difference vegetation index) as a function of day of year
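For reference, NDVI is computed per pixel from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). The sketch below shows that formula plus a simple rate of change between two observation dates, which is one way to express "rises faster". The helper names and the reflectance values are illustrative assumptions, not values from the slides.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns None when undefined."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total

def ndvi_rise(ndvi_start, ndvi_end, day_start, day_end):
    """Average NDVI change per day between two observation dates."""
    return (ndvi_end - ndvi_start) / (day_end - day_start)

# Illustrative reflectances only (not measurements from the slides).
early = ndvi(nir=0.30, red=0.15)   # sparse canopy early in the season
later = ndvi(nir=0.50, red=0.05)   # dense canopy later in the season
print(early, later, ndvi_rise(early, later, day_start=150, day_end=180))
```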
Corn vs. Soy classification using NDVIs: the Challenge of Inter-Year Variation
(Madison County, NE)
• Simplified version using only two NDVI values.
• Applying the same set of separating planes to the corn/soy clusters for all years leads to large errors!
• How to deal with inter-year variation of the NDVI profile?
• Mitigation: increase the dimensionality of the classification model (by including weather parameters) to help separate the corn/soy clusters.
Insight: the Impact of Weather on NDVI Changes
[Figure panels: 2012 and 2009, averaged over growth season (DoY 170-240); labels: Dry, Warm; Wet, Cool]
SVM Classifier
Predictor variables:
• NDVI, early and late season data points
• Mean precipitation
• Mean T_min, T_max
• Mean solar irradiance
Response variable (training target):
• Crop type (corn/soy)
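The slide names an SVM classifier and its predictors but not the kernel, library, or training procedure. As a rough illustration of the idea only, the sketch below trains a linear SVM by sub-gradient descent on the regularized hinge loss, which is a simplified stand-in and not necessarily the method used by the authors. It uses just the two NDVI predictors; the weather means would be appended as further components of each feature vector. All sample values are made up for the example.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Fit w, b by sub-gradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:   # inside the margin or misclassified
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:            # only the regularizer contributes
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Classify one feature vector by the sign of the decision function."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "corn" if score > 0 else "soy"

# Made-up (early NDVI, late NDVI) pairs, NOT data from the slides.
samples = [(0.70, 0.50), (0.40, 0.80), (0.75, 0.55), (0.45, 0.85),
           (0.80, 0.50), (0.35, 0.75), (0.72, 0.45), (0.50, 0.80)]
labels = [+1, -1, +1, -1, +1, -1, +1, -1]   # +1 = corn, -1 = soy

w, b = train_linear_svm(samples, labels)
print(predict(w, b, (0.78, 0.50)), predict(w, b, (0.40, 0.82)))
```

Adding weather predictors raises the dimensionality of `samples`, which is the mitigation for inter-year variation described on the preceding slide.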
Madison County, NE: corn planted
Year   Predicted (×10^5)   Actual (×10^5)
2009   1.334               1.400
2010   1.331               1.340
2011   1.448               1.465
2012   1.420               1.458
2013   1.478               1.526
2014   1.450               1.407
2015   1.408               1.430
2016   1.424

Greene County, IA: corn planted
Year   Predicted (×10^5)   Actual (×10^5)
2009   1.878               1.755
2010   1.755               1.760
2011   1.7609              1.820
2012   1.787               1.885
2013   1.839               1.855
2014   1.833               1.750
2015   1.749               1.735
2016   1.801
[Figures: predicted vs. actual corn planted by year, 2008-2017; y-axis ranges 120,000-160,000 and 160,000-200,000]
Mean Absolute Percentage Error Normalized by Corn and Soy Acreage
2009-2015
Madison County, NE Corn Planted Greene County, IA Corn Planted
Crop Acreage Prediction Results
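The slide reports a mean absolute percentage error normalized by corn and soy acreage, but the normalization formula is not given. As a simpler point of comparison, the sketch below computes the plain (unnormalized) MAPE from the first predicted/actual table on this slide for 2009-2015. The function name is an assumption, and the result should not be read as the figure the authors reported.

```python
def mape(predicted, actual):
    """Mean absolute percentage error, in percent."""
    errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return 100.0 * sum(errors) / len(errors)

# Values transcribed from the first table on this slide (units of 1e5),
# years 2009-2015; 2016 is skipped because no actual value is listed.
predicted = [1.334, 1.331, 1.448, 1.420, 1.478, 1.450, 1.408]
actual = [1.400, 1.340, 1.465, 1.458, 1.526, 1.407, 1.430]

print(f"Plain MAPE 2009-2015: {mape(predicted, actual):.2f}%")
```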
Potential of Satellite Remote Sensing of Industrial Activity/Pollution
• Analyze industrial activity (and associated pollution) using satellite remote sensing of plant heat generation.
• Results are nearly impossible to counterfeit.
• Technology tested for steel and aluminum plants with independent validation.
Method: physical modeling to extract useful features
[Diagram labels: Plant Selection; Satellite Thermal IR; Atmospheric Analysis; Radiative Transfer Model; Surface Temp. Map; Local Temp., Wind Speed, etc.; Heat Balance Model; Plant Heat Dissipation; Statistics; Productivity / Pollution]
10/15/2014 11/16/2014 12/18/2014
Heat Generation Prior to Statistical Correction
APEC (Asia-Pacific Economic Cooperation) Forum Hosted in Beijing
Chengde Steel, Hebei, China
Thermal Infrared Monitoring of Steel Plants
~Weekly Resolution
Drone data of a farm in the Netherlands, with a Landsat image for resolution comparison
Point cloud generation from drone images
• Metrology
  - identify infrastructure elements
  - estimate size
  - locate them in 3D space
• Inspection of infrastructure
  - rusting
  - structural health
• Digital blueprints
  - create digital blueprints of infrastructure
  - integrate 3D images in models
A. F. Chase et al., Geospatial revolution and remote sensing LiDAR in Mesoamerican archaeology, PNAS 109 (32), 12921 (2012).
Use of LIDAR to Locate Ancient Structures for Archeological Exploration
LIDAR Data Pre-Processing
Tree Leaves Removed
Terrain Variation Removed
Structures Extracted
Raw LIDAR Input
Identification of Ancient Structures
Structure identified
Results match an expert's manual search.
Error rate: ~5% of structures missed
Summary
1. Curated data in common formats and at given resolution layers
2. Completely scalable to larger data and queries (big data platform)
3. Rapid cross-layer query capabilities
• Find all areas with clay soil and annual precipitation of more than 40 inches
4. Show-me-where capabilities
• Show me where there will be a wetter spring than average and where the population density is larger than 10,000 people per square mile
5. Enabling unique layers of analytics
• Industrial productivity
• Crop acreage estimate
• Drone image integration
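To illustrate why pre-aligned layers make cross-layer and show-me-where queries cheap, the sketch below treats each layer as a mapping from a shared grid cell id to a value and keeps the cells that pass every condition. This is a toy in-memory model with made-up values; the function name `show_me_where` and the layer names are hypothetical and do not describe the PAIRS query engine.

```python
def show_me_where(layers, predicates):
    """Return the grid cells that satisfy every predicate.

    layers     -- {layer_name: {cell_id: value}}, all on the same grid
    predicates -- {layer_name: function(value) -> bool}
    """
    names = list(predicates)
    # Only cells present in every requested layer can match.
    cells = set(layers[names[0]])
    for name in names[1:]:
        cells &= set(layers[name])
    return sorted(c for c in cells
                  if all(predicates[n](layers[n][c]) for n in names))

layers = {  # toy values, not PAIRS data
    "soil_type": {101: "clay", 102: "sand", 103: "clay", 104: "clay"},
    "annual_precip_in": {101: 45.0, 102: 50.0, 103: 32.5, 104: 41.0},
}

# "Find all areas with clay soil and annual precipitation of more than 40 inches"
matches = show_me_where(layers, {
    "soil_type": lambda v: v == "clay",
    "annual_precip_in": lambda v: v > 40,
})
print(matches)
```

Because both layers share cell ids, the join is a set intersection on keys; no resampling or reprojection is needed at query time.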
https://pairs.res.ibm.com
The Team
Levente Klein (Physical Modeling)
Vanessa Lopez (Mathematics)
Siyuan Lu (Machine learning)
Hendrik Hamann (Physical Analytics)
Michael Schappert (Embedded System)
Fernando Marianno (Software Architect)
Xiaoyan Shao (Electrochemistry, Data scientist)
Ildar Khabibrakhmanov (Software Engineering)
Theodore van Kessel (Oil and Gas & Instrumentation)
Rong Chang (Cloud Architecture)
Conrad Albrecht (Physics and Computation)
Marcus Freitag (Precision Agriculture)
Ramachandran Muralidhar (Corrosion Science & Pollution Modeling)
Oki Gunawan (Solar and Robotics)
Jason Renwick (Drone 3D imaging)
Bruce Elmegreen (Astrophysics, Traffic)
Wang Zhou (Robotics, drones)
Johannes Schmude (Physics and Computation)
and many more
© 2016 IBM Corporation