
Page 1

http://pairs.watson.ibm.com:8080/pairs/

IBM PAIRS - A Big Physical Data Service to Accelerate Analytics and Discovery

Siyuan Lu et al.

IBM TJ Watson Research Center, Yorktown Heights, NY, USA

https://pairs.res.ibm.com

Page 2

Page 3

More about IBM’s precision agriculture work

20 Jan. 2015

This year's winner was a collaborative experiment by E. & J. Gallo and IBM, whose approach used a variable-rate irrigation system across separate quadrants of a 31-acre Cabernet Sauvignon vineyard. The result decreased vineyard spatial variability and increased water-use efficiency without compromising quality during a period of historic drought.

Page 4

Outline

(1) Introduction to PAIRS (Physical Analytics Information Repository System)

(2) Analytics Leveraging PAIRS

• Crop acreage estimate

• Industrial productivity

• Long-term weather forecasting

• Super-resolution satellite images

© 2017 IBM Corporation

Page 5

Comparison of Approaches

Traditional:

• Each data set is kept in a different place, in different formats, projections, units, ….

• Petabytes of data are stored in billions of “individual scenes”.

• Analysts order scenes from each source, then download, assemble, resample, reproject, align, …

• Data are moved to the application, which does not scale.

PAIRS:

• Large-scale, pre-processed big data store with pre-aligned layers
  − Common formats and projections: a global reference
  − Spatial & temporal joins

• Intelligent query system / discovery tool.

• Analyze without moving data, to drastically accelerate query and discovery.

• Access to new layers of analytics.

Figure labels: Landsat scenes/tiles from different satellite passes; Pre-aligned layers; Global Mastergrid

https://pairs.res.ibm.com

Page 6

Interaction with PAIRS via RESTful APIs

{
  "spatial": {"type": "square", "aoi": null, "coordinates": [38.0, -122.0, 48, -112]},
  "temporal": {"intervals": [{"start": "2017-07-27T00:00:00Z", "end": "2017-07-27T00:00:00Z"}]},
  "layers": [
    {"id": 93, "alias": null, "aggregation": "Mean", "filter": "GT 270"},
    {"id": 94, "alias": null, "aggregation": "Min"}
  ]
}
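A query like the one above can also be assembled programmatically. The Python sketch below builds a dictionary with the same structure and serializes it to JSON. The function names are hypothetical, the coordinate order (lat_min, lon_min, lat_max, lon_max) is inferred from the example values, and reading "GT 270" as "keep values greater than 270" is an assumption. No endpoint URL is given on the slide, so the request itself is not sent.

```python
import json
import operator

# Hypothetical keyword table: only "GT" appears in the original query;
# the other keywords are assumptions for illustration.
FILTER_OPS = {"GT": operator.gt, "GE": operator.ge,
              "LT": operator.lt, "LE": operator.le, "EQ": operator.eq}

def build_point_in_time_query(bbox, timestamp, layers):
    """Build a query dict shaped like the example (hypothetical helper).

    bbox: (lat_min, lon_min, lat_max, lon_max), an inferred coordinate order.
    layers: list of (layer_id, aggregation, filter_string_or_None).
    """
    layer_specs = []
    for layer_id, aggregation, flt in layers:
        spec = {"id": layer_id, "alias": None, "aggregation": aggregation}
        if flt is not None:
            spec["filter"] = flt
        layer_specs.append(spec)
    return {
        "spatial": {"type": "square", "aoi": None, "coordinates": list(bbox)},
        # start == end selects a single point in time
        "temporal": {"intervals": [{"start": timestamp, "end": timestamp}]},
        "layers": layer_specs,
    }

def parse_filter(expr):
    """Turn a filter string such as "GT 270" into a predicate (assumed semantics)."""
    keyword, threshold = expr.split()
    op, value = FILTER_OPS[keyword], float(threshold)
    return lambda x: op(x, value)

query = build_point_in_time_query(
    (38.0, -122.0, 48, -112), "2017-07-27T00:00:00Z",
    [(93, "Mean", "GT 270"), (94, "Min", None)],
)
payload = json.dumps(query)  # JSON body for the (unspecified) PAIRS REST endpoint
keep = parse_filter(query["layers"][0]["filter"])
```

Keeping the payload as a plain dictionary makes it easy to vary the bounding box, date or layer list in a loop before serializing.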

Page 7

IBM PAIRS Architecture

• Automated data curation and ingestion: scientists create the process and the computer scales it up

• Big data bus & scheduler

• Data filtering & re-mapping/re-projection

• Hadoop/HBase system for large-scale data analytics; new data nodes are added as the data expand (California → continental USA → global)

• Large-scale analytics linking space and time automatically through the key design

• A significant portion of geospatial analytics is done on the data: bring analytics to the data

• Interactive web interface and API to visualize & download data

Diagram labels: IBM Cloud; Users

Page 8

PAIRS Uses Global Reference System

• Z-order maps 2D data onto 1D.

• The key is a combination of spatial and temporal information.

• Global grid cell resolution spans from a 0.8 m grid cell to a 260 km grid cell.

• All resolution layers are nested and aligned at the lower-left corner of the cell grid.

Level   Δθ, Δφ [degree]   Δy, Δx [km] (φ = 0°)   Δx [km] (φ = 40°)
26      0.000008          0.00089                0.00067
25      0.000016          0.00178                0.00134
24      0.000032          0.00356                0.00268
23      0.000064          0.00712                0.00536
22      0.000128          0.01424                0.01072
21      0.000256          0.02848                0.02144
…       …                 …                      …
1       268.43546         29863.444              22481.469
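A minimal sketch of how a cell index and a Z-order key could be derived from the grid table. It assumes the level-26 cell is exactly 0.000008° and doubles with each coarser level, uses 111.25 km per degree (the ratio Δy / Δθ in the table), places the grid origin at the lower-left corner (-90°, -180°), and appends a Unix timestamp to the spatial code. The function names and the 16-byte key layout are hypothetical and not taken from the PAIRS implementation.

```python
import math

# Grid constants read off the table (assumption: treated as exact values).
FINEST_LEVEL = 26
FINEST_CELL_DEG = 0.000008   # delta_theta = delta_phi at level 26
KM_PER_DEG = 111.25          # ratio delta_y / delta_theta in the table

def cell_size_deg(level):
    """Angular cell size at a level; it doubles with each coarser level."""
    return FINEST_CELL_DEG * 2 ** (FINEST_LEVEL - level)

def cell_size_km(level):
    """North-south size delta_y in km (equal to delta_x at the equator)."""
    return cell_size_deg(level) * KM_PER_DEG

def cell_index(lat, lon, level):
    """(row, col) of the cell containing a point, counted from an assumed
    lower-left origin at (-90, -180) that all levels share."""
    size = cell_size_deg(level)
    return math.floor((lat + 90.0) / size), math.floor((lon + 180.0) / size)

def parent_cell(row, col):
    """Nested grids: the enclosing cell one level coarser halves both indices."""
    return row // 2, col // 2

def interleave(col, row, bits=32):
    """Z-order (Morton) code: bit b of col goes to bit 2b, bit b of row to 2b+1."""
    code = 0
    for b in range(bits):
        code |= ((col >> b) & 1) << (2 * b)
        code |= ((row >> b) & 1) << (2 * b + 1)
    return code

def row_key(lat, lon, level, timestamp):
    """Hypothetical 16-byte key: 8-byte Z-order code, then an 8-byte Unix timestamp."""
    row, col = cell_index(lat, lon, level)
    return interleave(col, row).to_bytes(8, "big") + timestamp.to_bytes(8, "big")
```

Because the bits of row and column are interleaved, cells that are close on the ground tend to get close codes, so a bounding-box query touches a small number of contiguous key ranges; placing the spatial code before the timestamp groups all time steps of one cell together.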

Page 9

PAIRS Cluster

• 100 TB; 16 TB; HDFS / HBase: 1 PB

• >24 data nodes: x86; >260 cores; 4 TB memory

• Master backup: x86; 8 cores; 96 GB memory

• x86; 32 cores; ~150 GB memory

• x86; 24 cores; ~100 GB memory

Page 10

PAIRS “Scales” to Big Data

• PAIRS query time is (almost) independent of data size.

• Conventional systems require more time for larger data sizes.

Page 11

Upload Your Own Data to PAIRS

• User data ingestion via FTP or REST API

• Automatically aligns user data with other PAIRS data and analytics

• Makes user data searchable along with other PAIRS data and analytics

Example: Drone images from Watson Research Center

Example: Curated images in PAIRS from Watson Research Center

PAIRS drone data overlaid on Google map; localization accuracy better than 1 m
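At its simplest, aligning uploaded data with the grid means resampling each sample into the cell that contains it. The sketch below averages point samples per cell at one level. The mean aggregation, the lower-left origin at (-90°, -180°) and the function names are assumptions; the cell-size rule is the one read off the PAIRS grid table (0.000008° at level 26, doubling per coarser level). PAIRS' actual ingestion may re-project and resample differently.

```python
import math
from collections import defaultdict

def cell_size_deg(level):
    # Rule read off the grid table: 0.000008 deg at level 26, doubling per coarser level.
    return 0.000008 * 2 ** (26 - level)

def align_to_grid(samples, level):
    """Average (lat, lon, value) samples into grid cells at one level (hypothetical helper).

    Cells are keyed by (row, col) counted from the assumed origin (-90, -180).
    """
    size = cell_size_deg(level)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in samples:
        key = (math.floor((lat + 90.0) / size), math.floor((lon + 180.0) / size))
        sums[key] += value
        counts[key] += 1
    # Mean per cell; other aggregations (min, max, nearest) would work the same way.
    return {key: sums[key] / counts[key] for key in sums}
```

Once user samples are keyed by the same (row, col) cells as the curated layers, they can be joined with those layers without any further reprojection.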

Page 12

Outline

(1) Introduction to PAIRS (Physical Analytics Information Repository System)

(2) Analytics Leveraging PAIRS

• Industrial productivity

• Crop acreage estimate

• Long-term weather forecasting

• Super-resolution satellite images

Page 13

PAIRS Analytics: Early Crop Recognition, Greene County, IA - NASS Results

NDVI prediction of early June 2012, percentage of area in Greene County, IA: Corn 51 %; Soybean 29 %

Page 14

Acreage Prediction using Satellite Remote Sensing

Corn vs. Soy: NDVI (normalized difference vegetation index) as a function of day of year

• NDVI 1, early season: corn NDVI rises faster than soy.

• NDVI 2, late season: corn NDVI is lower than soy (Corn < Soy).

Figure legend: corn; soy; other
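The slide uses NDVI without giving the formula. It is computed per pixel from near-infrared (NIR) and red reflectance as (NIR - Red) / (NIR + Red). The short functions below implement that definition and build a day-of-year profile; the function names and the zero-denominator behaviour are assumptions.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index of one pixel from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: no-signal pixels map to 0 instead of raising
    return (nir - red) / total

def ndvi_profile(samples):
    """Map day of year -> NDVI from {day_of_year: (nir, red)} reflectance samples."""
    return {doy: ndvi(nir, red) for doy, (nir, red) in sorted(samples.items())}
```

Values range from -1 to 1; dense green vegetation reflects strongly in the NIR and absorbs red light, which is why the corn and soy curves rise during the growing season.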

Page 15

Corn vs. Soy Classification Using NDVIs: the Challenge of Inter-Year Variation

(Madison County, NE)

• Simplified version using only two NDVI values.

• Applying the same set of planes to divide the corn/soy clusters in all years leads to large errors.

• How can the inter-year variation of the NDVI profile be handled?

• Mitigation: increase the dimensionality of the classification model (including weather parameters) to help separate the corn/soy clusters.

Page 16

Insight: the Impact of Weather on NDVI Changes

2012: dry, warm; 2009: wet, cool (averaged over the growth season, DoY 170-240)

SVM Classifier

Predictor variables:
• NDVI, early- and late-season data points
• Mean precipitation
• Mean T_min, T_max
• Mean solar irradiance

Response variable (training target):
• Crop type (corn/soy)
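To illustrate the classifier type, the following sketch trains a linear SVM (hinge loss with L2 regularization) using Pegasos-style subgradient steps, visiting samples cyclically so the result is deterministic. It is a generic stand-in, not the authors' model: the slide does not state the kernel, solver or feature scaling. The function names are hypothetical, and the feature order in `features` simply follows the predictor list on the slide.

```python
def features(ndvi_early, ndvi_late, precip, t_min, t_max, solar):
    """Predictor vector in the order listed on the slide (units unspecified)."""
    return [ndvi_early, ndvi_late, precip, t_min, t_max, solar]

def train_linear_svm(X, y, lam=0.1, steps=1000):
    """Linear SVM trained with Pegasos-style subgradient descent.

    X: list of feature lists; y: labels +1 (corn) / -1 (soy).
    A constant 1.0 is appended to each row so the last weight acts as a bias.
    """
    rows = [list(x) + [1.0] for x in X]
    w = [0.0] * len(rows[0])
    for t in range(1, steps + 1):
        i = (t - 1) % len(rows)          # cyclic sample order (deterministic)
        x, label = rows[i], y[i]
        eta = 1.0 / (lam * t)            # Pegasos step size
        margin = label * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1 - eta * lam) * wi for wi in w]   # shrink: L2 regularization
        if margin < 1:                   # hinge loss active: step towards the sample
            w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Return +1 (corn) or -1 (soy) for one feature vector."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else -1
```

With all six predictors, the weather columns should be standardized before training, since precipitation, temperature and irradiance live on very different scales from NDVI; adding them is what gives the model the extra dimensions that the previous slide proposes as mitigation.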

Page 17

Crop Acreage Prediction Results

Madison County, NE: corn planted
Year   Predicted (×10^5)   Actual (×10^5)
2009   1.334               1.400
2010   1.331               1.340
2011   1.448               1.465
2012   1.420               1.458
2013   1.478               1.526
2014   1.450               1.407
2015   1.408               1.430
2016   1.424               -

Greene County, IA: corn planted
Year   Predicted (×10^5)   Actual (×10^5)
2009   1.878               1.755
2010   1.755               1.760
2011   1.7609              1.820
2012   1.787               1.885
2013   1.839               1.855
2014   1.833               1.750
2015   1.749               1.735
2016   1.801               -

Figure: corn planted vs. year (2008-2017), predicted and actual, for Madison County, NE and Greene County, IA

Mean absolute percentage error normalized by corn and soy acreage, 2009-2015
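The error metric on the slide is normalized by corn and soy acreage, and its values are not reproduced here. As a simpler cross-check, the sketch below computes a plain (unnormalized) mean absolute percentage error from the 2009-2015 rows of the two tables. Assigning the first table to Madison County, NE and the second to Greene County, IA is an inference from the caption order and the chart axis ranges.

```python
# Predicted and actual corn planted (x10^5), 2009-2015, copied from the slide's tables.
# County assignment is inferred from caption order (assumption).
MADISON_NE = {2009: (1.334, 1.400), 2010: (1.331, 1.340), 2011: (1.448, 1.465),
              2012: (1.420, 1.458), 2013: (1.478, 1.526), 2014: (1.450, 1.407),
              2015: (1.408, 1.430)}
GREENE_IA = {2009: (1.878, 1.755), 2010: (1.755, 1.760), 2011: (1.7609, 1.820),
             2012: (1.787, 1.885), 2013: (1.839, 1.855), 2014: (1.833, 1.750),
             2015: (1.749, 1.735)}

def mape(pairs):
    """Plain mean absolute percentage error in percent, from (predicted, actual) pairs."""
    errors = [abs(predicted - actual) / actual for predicted, actual in pairs]
    return 100.0 * sum(errors) / len(errors)
```

With these rows, `mape(MADISON_NE.values())` gives roughly 2.4 % and `mape(GREENE_IA.values())` roughly 3.2 %. These are plain MAPE values, not the normalized metric shown on the slide.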

Page 18

Potential of Satellite Remote Sensing of Industrial Activity/Pollution

• Analyze industrial activity (and associated pollution) using satellite remote sensing of plant heat generation.

• Results are nearly impossible to counterfeit.

• Technology tested for steel and aluminum plants, with independent validation.

Method: physical modeling to extract useful features; statistics.

Diagram elements: plant selection; satellite thermal IR; surface temperature map; atmospheric analysis (local temperature, wind speed, etc.); radiative transfer model; heat balance model; plant heat dissipation; productivity/pollution
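As one illustrative ingredient of the physical modeling step, the Stefan-Boltzmann law relates surface temperature to emitted thermal radiation. The sketch below estimates the excess radiant power of a plant footprint relative to its surroundings. The function name and the default emissivity of 0.95 are assumptions, and it covers radiation only: the actual PAIRS heat balance and radiative transfer models are not described on the slide.

```python
STEFAN_BOLTZMANN = 5.670374419e-8  # W m^-2 K^-4

def excess_radiant_power(t_plant_k, t_background_k, area_m2, emissivity=0.95):
    """Extra thermal radiation (W) from a plant footprint relative to its surroundings.

    Stefan-Boltzmann law only (hypothetical helper); a full heat-balance model
    would also include convection (wind speed), atmospheric transmission, etc.
    """
    flux = emissivity * STEFAN_BOLTZMANN * (t_plant_k ** 4 - t_background_k ** 4)
    return flux * area_m2  # W/m^2 times m^2
```

For example, a surface at 300 K against a 290 K background with emissivity 1 radiates about 58 W/m² more than its surroundings.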

Page 19

Thermal Infrared Monitoring of Steel Plants

Chengde Steel, Hebei, China: heat generation prior to statistical correction, ~weekly resolution

Dates shown: 10/15/2014, 11/16/2014, 12/18/2014

Annotation: APEC (Asia-Pacific Economic Cooperation) forum hosted in Beijing

Page 20

Drone data of a farm in the Netherlands, with a Landsat image for resolution comparison

Page 21

Point Cloud Generation from Drone Images

• Metrology
  − identify infrastructure elements
  − estimate their size
  − locate them in 3D space

• Inspection of infrastructure
  − rusting
  − structural health

• Digital blueprints
  − create digital blueprints of infrastructure
  − integrate 3D images into models

Page 22

Use of LIDAR to Locate Ancient Structures for Archeological Exploration

A. F. Chase et al., Geospatial revolution and remote sensing LiDAR in Mesoamerican archaeology, PNAS 109 (32), 12921 (2012).

Page 23

LIDAR Data Pre-Processing

Panels: raw LIDAR input; tree leaves removed; terrain variation removed; structures extracted
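The pre-processing steps can be sketched on a gridded point cloud: keep the lowest return per cell to drop canopy returns (assuming some returns reach the ground), estimate the terrain as a moving-window minimum, and flag cells that stand clearly above it. This is a deliberately simple stand-in (a lowest-return filter plus a morphological minimum) with hypothetical function names and thresholds; the method actually used for these results is not given on the slides.

```python
import math

def lowest_return_grid(points, cell_size):
    """Keep the lowest elevation per grid cell (drops canopy returns above ground)."""
    grid = {}
    for x, y, z in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if cell not in grid or z < grid[cell]:
            grid[cell] = z
    return grid

def local_min_terrain(grid, radius):
    """Approximate bare terrain as the minimum elevation in a square window."""
    terrain = {}
    for (i, j) in grid:
        window = [grid[(i + di, j + dj)]
                  for di in range(-radius, radius + 1)
                  for dj in range(-radius, radius + 1)
                  if (i + di, j + dj) in grid]
        terrain[(i, j)] = min(window)
    return terrain

def extract_structures(points, cell_size=1.0, radius=2, min_height=1.0):
    """Cells whose lowest return stands more than min_height above the terrain.

    x, y and cell_size share one horizontal unit (e.g. metres); z and
    min_height share one vertical unit.
    """
    grid = lowest_return_grid(points, cell_size)
    terrain = local_min_terrain(grid, radius)
    return {cell for cell, z in grid.items() if z - terrain[cell] > min_height}
```

The window radius must exceed the half-width of the structures sought, otherwise a large building's own roof becomes the local "terrain"; on sloping ground a wider window also leaves a small residual height, which `min_height` has to absorb.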

Page 24

Identification of Ancient Structures

• Structures identified match the manual search by an expert.

• Error rate: ~5 % of structures missed.

Page 25

Summary

1. Curated data in common formats and given resolution layers

2. Completely scalable to larger data and queries (big data platform)

3. Rapid cross-layer query capabilities

• Find all areas with clay soil and annual precipitation of more than 40 inches

4. Show-me-where capabilities

• Show me where there will be a wetter spring than average and where the population density is larger than 10,000 people per square mile

5. Enabling unique layers of analytics

• Industrial productivity

• Crop acreage estimate

• Drone image integration
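Because all PAIRS layers are described as sharing one pre-aligned grid, a cross-layer query reduces to intersecting cells and testing each layer's value. The sketch below assumes layers are available as hypothetical in-memory dictionaries keyed by grid cell; the function name, layer names and thresholds are illustrative, mirroring the clay-soil example, and are not PAIRS identifiers.

```python
def cells_where(layers, predicates):
    """Return the grid cells where every named layer satisfies its predicate.

    layers: {layer_name: {cell: value}} on a shared, pre-aligned grid.
    predicates: {layer_name: callable(value) -> bool}.
    Cells missing from any queried layer are skipped.
    """
    names = list(predicates)
    common = set(layers[names[0]])
    for name in names[1:]:
        common &= set(layers[name])  # spatial join: same cell key in every layer
    return {cell for cell in common
            if all(predicates[name](layers[name][cell]) for name in names)}
```

A "show me where" question is the same call with different predicates, for example a spring-precipitation anomaly layer tested for values above zero together with a population-density layer tested for values above 10,000.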


Page 26

The Team

Levente Klein (Physical Modeling)

Vanessa Lopez (Mathematics)

Siyuan Lu (Machine Learning)

Hendrik Hamann (Physical Analytics)

Michael Schappert (Embedded Systems)

Fernando Marianno (Software Architect)

Xiaoyan Shao (Electrochemistry, Data Scientist)

Ildar Khabibrakhmanov (Software Engineering)

Theodore van Kessel (Oil and Gas & Instrumentation)

Rong Chang (Cloud Architecture)

Conrad Albrecht (Physics and Computation)

Marcus Freitag (Precision Agriculture)

Ramachandran Muralidhar (Corrosion Science & Pollution Modeling)

Oki Gunawan (Solar and Robotics)

Jason Renwick (Drone 3D Imaging)

Bruce Elmegreen (Astrophysics, Traffic)

Wang Zhou (Robotics, Drones)

Johannes Schmude (Physics and Computation)

and many more

© 2016 IBM Corporation