lynnes-big data task force - june 2016-2 - amazon...

36
Big Data in the Earth Observing System Data and Informa8on System Chris Lynnes (System Architect) Ka4e Baynes (System Architect) Mark McInerney (Deputy Project Manager) NASA/GSFC, ESDIS Project

Upload: others

Post on 15-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Big  Data  in  the  Earth  Observing  System  Data  and  Informa8on  System  

Chris  Lynnes  (System  Architect)  Ka4e  Baynes  (System  Architect)  Mark  McInerney  (Deputy  Project  Manager)  NASA/GSFC,  ESDIS  Project  

Page 2: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

EOSDIS

Earth  Observing  System  Data  and  Informa8on  System  (EOSDIS)  

2

Applica8ons  

capture and

clean

data downlink

Educa8on  

process

archive

subset

distribute

Research  

Users

Page 3: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

EOSDIS

Earth  Observing  System  Data  and  Informa8on  System  (EOSDIS)  

3

Applica8ons  

capture and

clean

data downlink

Educa8on  

process

archive

subset

distribute

Research  

Users

Page 4: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Take  Home  Message  Preview  

1. Cloud  prototypes  are  underway  to  tackle  the  Volume  challenge  of  Big  Data…  2. ...But  advances  in  computer  hardware  or  cloud  won’t  help  (much)  with  Variety  3.   Interoperability  standards,  conven8ons,  and  community  engagement  are  the  key  to  addressing  Variety  

4

Page 5: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

V  is  for...  

5

Archive  Volume  

Distribu8on  Volume  

Petabytes  

...Volume  

Page 6: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Big  Data  Indicators  

6

EOSDIS FY2015 Metrics

Unique Data Products 9,462

Distinct Users of EOSDIS Data and Services 2.6 M

Average Daily Archive Growth 16 TB/day

Total Archive Volume (as of Sept. 30, 2015) 14.6 PB

End User Distribution Products 1.42 B

End User Average Daily Distribution Volume 32.1 TB/day

Page 7: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

EOSDIS  Cloud  Prototypes  

Archive  Mgmt  

Analy8cs  Support  

Applica8on  Hos8ng  

Key  

Page 8: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Archive  Cloud  Prototypes  

Key  

Benefits from Archive in the Cloud ‣  Cost savings for storage of Big Data?

‣  Avoid data downloading and local data mgmt

‣  Alaska Satellite Facility Web Object Storage prototype ‣  Distribute Sentinel radar data from Amazon storage

‣  Global Imagery Browse Service in the Cloud ‣  Ingest and Archive management prototype

Page 9: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Cloud  Analy8cs  Prototypes  

Analysis support toolbox to attract users to cloud analytics

‣  Community open source tools

‣  DAAC-developed tools

‣  Cloud analytics examples and recipes

‣  Initial cross-DAAC proof of concept in progress based on Python + Jupyter Hub

Benefits from Cloud Analytics ‣  Analyze data at scale

‣  Analyze datasets together easily

‣  Avoid data downloading and local mgmt

Page 10: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

1.  Vendor  Lock-­‐in  2.  Future  storage  costs  3. Uncapped  egress  costs  4.  Security  Restric8ons    5. Network  trust  

Terra  Incognita  

Page 11: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

V  is  for...  

11

...Variety

Page 12: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Instrument  Variety  

12

Page 13: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Satellite  Instrument  “Footprints”  

13 Microwave Limb Scanner (from Algorithm Theoretical Basis Document, Livesey and Wu, 1999)

Example Imaging Footprint Example LIDAR footprint

Example Limb Scanning Footprint

Page 14: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

AircraZ  and  In  Situ  

14

Page 15: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Same  Instrument,  Different  Satellite  

15

Aqua   Terra  

Aqua  -­‐  Terra  

Page 16: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Same  Instrument+Satellite,  Different  Algorithm  

16

MODIS on Aqua Aerosol Optical Depth Dark Target Algorithm Deep Blue Algorithm

Page 17: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Processing  Levels  

17

Wavelength (microns)

Rad

ianc

e

AIRS data for 2011-08-11

Level 1B Calibrated radiance at a pixel

Level 2 Carbon monoxide for one scene

Level 3 Global carbon monoxide for one night

Page 18: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Projec8ons  

18

Page 19: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Time  Aggrega8on  

19

Aerosol Optical Depth at 555 nm from Multi-angle Imaging Spectro-Radiometer

Daily

Monthly

Page 20: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Spa8al  Aggrega8on  

20

SeaWiFS  Deep-­‐Blue  Aerosol  Op4cal  Depth  2006-­‐10-­‐06    1.0  Degree  Resolu8on              0.5  Degree  Resolu8on  

Page 21: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Data  Formats  

●  Self-­‐Describing  API-­‐Based  ■  Hierarchical  Data  Format  (HDF)  ■  network  Common  Data  Form  (netCDF)  

●  Addi8onal  conven8ons  ■  HDF-­‐EOS  ■  Climate-­‐Forecast  coordinates  

●  Other  Standards  ■  Gridded  Binary  (GRIB)  ■  ICARTT  (Airborne)  

●  Binary  ●  ASCII  

21

Page 22: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Solu8ons  to  the  Variety  Problem  

22

1. Interoperable  discipline-­‐focused  DAACs  2. Common  Metadata  Repository  

3. OPeNDAP*  data  services  4. Community  engagement  

*Open-source Project for a Network Data Access Protocol

Page 23: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Discipline-­‐Focused    Distributed  Ac8ve  Archive  Centers  (DAACs)  

23

Page 24: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Different  DAACs  have  different    “-­‐Spheres  of  Influence”  

24

DAAC Atmo Hydro Cryo Litho Bio Anthropo Sub-Specialty

Alaska Satellite Facility ✓ ✓ SAR

Atm. Sciences Data Center ✓

Crustal Dynamics Data Info Sys ✓ Space geodesy

Global Hydrology Resource Ctr ✓ Weather events

Goddard Earth Sciences DISC ✓ ✓

Land Processes DAAC ✓ ✓

L1 and Atm Archive & Dist Sys ✓ MODIS, VIIRS

Nat. Snow Ice Data Ctr DAAC ✓

Oak Ridge Nat Lab DAAC ✓ Field experiments

Ocean Biology DAAC ✓ ✓

Physical Oceanography DAAC ✓

Socioeconomic Data Arch Ctr ✓

Page 25: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

The  Common  Metadata  Repository  presents  a  consistent  catalog  for  discovery  of  data  from  mul8ple  DAACs  

25

One  Metadata  System  to  rule  them  all,  One  Metadata  System  to  find  them,  One  Metadata  System  to  bring  them  all    And  in  cyberspace  bind  them  

Unified Metadata Model Collec8ons  Granules  Variables  

Visualiza8ons  Services  Documenta8on  

DAAC DAAC DAAC DAAC DAAC DAAC DAAC DAAC DAAC DAAC DAAC

Common Metadata Repository

API OpenSearch

Earthdata Search Client

metadata

Other Clients Other Clients

Page 26: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

OPeNDAP  

●  Open-­‐source  Project  for  a  Network  Data  Access  Protocol  

●  High-­‐performance  network  access  protocol  for  complex  science  data  

● Well-­‐supported  in  Earth  science  community  tools  ○  Free:  Panoply,  IDV,  McIDAS-­‐V,  nco,…  ○  Commercial:  ArcGIS,  Matlab,  IDL,…  

26

Page 27: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

OPeNDAP*  access  to  data  smoothes  out  format  heterogeneity  and  supports  subseing  

27

Hierarchical Data Format

version 4 Files

Hierarchical Data Format

version 5 Files

network Common

Data Form Files

ASCII Files flat binary files

hdf4 handler

hdf5 handler

netcdf handler

freeform handler

*Although the Hyrax implementation is shown, other OPeNDAP servers such as GrADS Data Server and THREDDS Data Server have similar capabilities but different architectures.

Hyrax server

netCDF Library Analysis Tool

Page 28: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Data Transformation

Map Projection

Spectral Subsetting

Spatial Subsetting

Format Conversion

Resampling Parameters

Data transformation applies fundamental changes and conversions to attributes of the original data to suit the application requirements of end-users

Data  transforma8on  op8ons  of  several  kinds  can  help  with  Variety  and  Volume  

Courtesy of B. Ramachandran, MODAPS/LAADS

Page 29: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Big  Earth  Data  Ini8a8ve  (BEDI)  

●  OSTP-­‐driven  mul8-­‐agency  effort  ●  Focus  on  datasets  in  Societal  Benefit  Areas  ●  Several  interoperability  aspects…  

29

Page 30: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

BEDI  in  EOSDIS  

●  Improve  dataset  consistency  across  EOSDIS  ○  Metadata  in  Common  Metadata  Repository    ○  Data  in  OPeNDAP  

●  Improve  machine  access  to  EOSDIS  ○  Developers’  portal  

●  How  To  Access  Common  Metadata  Repository  ●  How  to  Access  OpENDAP-­‐served  Data  

○  OPeNDAP  performance  ○  OPeNDAP  use  with  Cloud  storage  

30

Page 31: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Community  Engagement  on  Big  Data  

●  Earth  Science  Informa8on  Partners  (ESIP)  ○  Variety:    Clusters  on  Discovery,  Informa8on  Quality  ○  Volume:  Clusters  on  Earth  Science  Data  Analy8cs  and  Cloud  

Compu8ng  ●  Earth  Science  Data  Systems  Working  Groups  

○  Formed  of  DAACs,  ACCESS  and  MEaSUREs  award  winners  ○  Variety:    Working  Groups  on  Dataset  Interoperability,  Search  

Relevancy  ○  Volume:    OPeNDAP  Best  Prac8ces,  Cloud  Compu8ng  

●  User  Needs  efforts  ○  DAAC  User  Working  Groups  ○  American  Customer  Sa8sfac8on  Index  survey  ○  EOSDIS  User  Needs  Analysis  group  

31

Page 32: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Big-­‐Data-­‐Community  Engagement  

●  Big  Data  Theme  for  both  ESIP  2016  Mee8ngs  

●  Co-­‐Convening  AGU  2016  session  on  Big  Data  Analy8cs  

●  Program  commimee  for  IEEE  Workshop  on  Big  Data  in  Earth  and  Planetary  Sciences    

●  ESA’s  Big  Data  from  Space  (BiDS)  workshops  ○  “Improving  Earth  Science  Data  Discoverability  And  

Use  Through  Metadata  Rela8onship  Graphs,  Virtual  Collec8ons,  And  Search  Relevancy”  

32

Page 33: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

User  Needs  from  Community  Sources  

33

Depth  &  Detail  of  Insight  Shallow   Deep  

Sample  Size  

Small  

Large  

Advisory  Groups  (ASAC,  UWG)  

Help    Tickets  

ACSI  Survey  Scores  

Webinar  Feedback  Applica8ons  

Workshops  

ACSI  Survey  Comments  

Page 34: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Take  Home  Message  

1.   Cloud  prototypes  are  underway  to  tackle  the  Volume  challenge  of  Big  Data…  2. ...But  advances  in  computer  hardware  or  cloud  won’t  help  (much)  with  Variety  3.   Standards,  conven8ons,  and  community  engagement  are  the  key  to  addressing  Variety  

34

Page 35: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

Backup  Slides  

35

Page 36: Lynnes-Big Data Task Force - June 2016-2 - Amazon …smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms...Unique Data Products 9,462 Distinct Users of EOSDIS Data and Services

OPeNDAP  Enhancements  from  the  Big  Earth  Data  Ini8a8ve  

● More  OPeNDAP  for  EOSDIS  data  ● More  aggrega8on  along  8me  for  data  in  OPeNDAP  ○ Improved  performance  for  aggrega8on  in  Hyrax  

36