high-resolution national elevation dataset: cybergis challenges and opportunities for scalable...
TRANSCRIPT
![Page 1: High-Resolution National Elevation Dataset: CyberGIS Challenges and Opportunities for Scalable Spatial Data Access and Analytics Yan Liu 1,3,5, Babak Behzad](https://reader036.vdocument.in/reader036/viewer/2022081513/55178cb055034645368b5501/html5/thumbnails/1.jpg)
High-Resolution National Elevation Dataset: CyberGIS Challenges and Opportunities for Scalable Spatial Data Access and Analytics
Yan Liu1,3,5, Babak Behzad1,2, Anand Padmanabhan1,3,5, Eric Shook1,3, Shaowen Wang1,2,3,4,5, and Yanli Zhao1,3
1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)
2 Department of Computer Science
3 Department of Geography and Geographic Information Science
4 Department of Urban and Regional Planning
5 National Center for Supercomputing Applications (NCSA)
University of Illinois at Urbana-Champaign
Michael P. Finn and E. Lynn Usery
U.S. Geological Survey
U.S. Department of the Interior
Outline
• Introduction
• NED data access
  – Interfaces and performance issues
• Computational challenges
  – Data-intensive spatial analysis
• Experience and solutions
  – CyberGIS
  – Scalable spatial data access and analytics
• Concluding discussions
National Elevation Dataset (NED)
• Digital elevation models (DEM)
• Product of the USGS National Map
• Resolutions: 3-meter, 10-meter, 30-meter
• Formats: ArcGrid, GridFloat, IMG
• Organized as 1 degree x 1 degree tiles
• Sizes (continental U.S.)
  – 10-meter: 936 tiles; 440GB raw files; 1TB with pyramid tiles
• http://nationalmap.gov/elevation.html
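As a rough illustration of the tile organization described on this slide, the following Python sketch lists the 1 degree x 1 degree tiles that intersect a bounding box and estimates the average raw tile size from the 936-tile, 440GB figures. Identifying a tile by the integer latitude/longitude of its south-west corner is an assumption made for this sketch, not the USGS file-naming scheme, and the example bounding box is hypothetical.

```python
import math


def tiles_for_bbox(min_lon, min_lat, max_lon, max_lat):
    """Return (lat, lon) south-west corners of the 1x1 degree tiles
    that intersect a bounding box given in decimal degrees."""
    lat_start, lat_stop = math.floor(min_lat), math.ceil(max_lat)
    lon_start, lon_stop = math.floor(min_lon), math.ceil(max_lon)
    return [
        (lat, lon)
        for lat in range(lat_start, lat_stop)
        for lon in range(lon_start, lon_stop)
    ]


# Hypothetical study area, for illustration only.
tiles = tiles_for_bbox(-90.5, 38.2, -88.1, 40.0)
print(len(tiles), "tiles:", tiles)

# Average raw size of a 10-meter tile, from the figures on the slide.
avg_tile_gb = 440 / 936
print(f"average raw tile size: {avg_tile_gb:.2f} GB")
```

Individual tiles will differ from this average, since the amount of land and the compressibility of the terrain vary from tile to tile.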
NED Access Challenges
• Data integration and processing
  – Data are stored on multiple file/database servers
  – Data processing is needed to extract subsets of data from the data collection
    • Downloading becomes complex, involving processing operations such as location, extraction, aggregation, archiving, and transfer among data servers
    • Computationally intensive
• User interface
  – Usability is crucial to make big data usable
  – Programmable interface for automatic downloading
CyberGIS Analytics Based on NED
• CyberGIS: high-performance and collaborative GIS based on cyberinfrastructure
  – http://cybergis.org
• Viewshed analysis
  – http://sandbox.cigi.illinois.edu
• Web Map Service (WMS) for online visualization
  – NED WMS layer built using GeoServer
  – Pre-generated pyramid tiles for 20-level zooming
[Figure: CyberGIS Gateway]
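To make the WMS bullet concrete, here is a minimal Python sketch that builds a standard WMS 1.1.1 GetMap request URL for one map image. The endpoint URL and layer name are hypothetical placeholders (the slides do not give the actual GeoServer address or layer name); only the request parameters follow the WMS standard.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_getmap_url(base_url, layer, bbox, width=512, height=512):
    """Build a WMS 1.1.1 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the layer's default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only.
url = build_getmap_url(
    "http://example.org/geoserver/wms",
    "ned:ned_10m",
    (-91.0, 38.0, -89.0, 40.0),
)
print(url)

# Parse the query string back to confirm what the server would receive.
query = parse_qs(urlparse(url).query, keep_blank_values=True)
print(query["BBOX"])
```

Pre-generated pyramid tiles let the server answer such requests from cached imagery at the matching zoom level rather than resampling the full-resolution raster each time.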
The Great Flood Project
• A 75-minute multimedia work of original music and film inspired by the 1927 Mississippi River floods
  – http://www.ncsa.illinois.edu/News/Stories/ELLNORAflood/
• Contributors include
  – Bill Frisell, Grammy Award-winning guitarist and composer
  – Bill Morrison, Obie-winning experimental filmmaker
  – Illinois Emerging Digital Research and Education in Arts Media Institute (eDream)
  – Advanced Visualization Laboratory (AVL) at the National Center for Supercomputing Applications (NCSA)
  – CyberInfrastructure and Geospatial Information Laboratory (CIGI), University of Illinois at Urbana-Champaign
• Used NED
  – Approximately 70GB of 10-meter NED tiles covering the Mississippi River valley were used to create the 3D landscape animation
Open YouTube URL: http://www.youtube.com/watch?v=Lgy7mDJ_fVI
Relevant parts:
• 0:00 – 0:24: historical maps
• 0:25 – 1:16: 3D digital map animation based on 1/3 arc-second NED
NED Data Access
NED Download: User Interface
• Download tool web interface
  – http://cumulus.cr.usgs.gov/webappcontent/neddownloadtool/NEDDownloadToolDMS.html
• New interface
  – National Map Viewer: http://viewer.nationalmap.gov/viewer/
NED Downloading Process
• File list: click each URL
1. Queue a request
2. Launch data extractor
3. Extract data
4. Archive data files
5. Notify data readiness
6. User download
Please repeat 936 times to get all 1 degree x 1 degree tiles for the continental U.S.!
NED Downloading Web Service Interface
1. Start download
2. Check status
3. Download
4. Cleanup
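The four web service steps (start download, check status, download, cleanup) amount to a small polling loop per tile. The Python sketch below shows that control flow against an in-memory stand-in service. The class `FakeDownloadService`, its method names, the status strings, and the tile identifiers are all hypothetical, chosen for this sketch; they are not the USGS web service API.

```python
import itertools
import time


class FakeDownloadService:
    """In-memory stand-in for a tile download web service (hypothetical)."""

    def __init__(self, polls_until_ready=2):
        self._polls_until_ready = polls_until_ready
        self._ids = itertools.count()
        self._jobs = {}
        self.cleaned = []

    def start_download(self, tile_id):
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = {"tile": tile_id, "polls": 0}
        return job_id

    def check_status(self, job_id):
        job = self._jobs[job_id]
        job["polls"] += 1
        ready = job["polls"] >= self._polls_until_ready
        return "ready" if ready else "processing"

    def download(self, job_id):
        return f"data for {self._jobs[job_id]['tile']}".encode()

    def cleanup(self, job_id):
        del self._jobs[job_id]
        self.cleaned.append(job_id)


def fetch_tile(service, tile_id, poll_interval=0.0, max_polls=100):
    """Drive one tile through start -> poll status -> download -> cleanup."""
    job_id = service.start_download(tile_id)
    try:
        for _ in range(max_polls):
            if service.check_status(job_id) == "ready":
                return service.download(job_id)
            time.sleep(poll_interval)
        raise TimeoutError(f"{tile_id} not ready after {max_polls} polls")
    finally:
        # Always release server-side resources, even on failure.
        service.cleanup(job_id)


def fetch_all(service, tile_ids):
    """Batch download: repeat the per-tile workflow for every tile."""
    return {tile_id: fetch_tile(service, tile_id) for tile_id in tile_ids}


service = FakeDownloadService()
results = fetch_all(service, ["tile-a", "tile-b"])
print({tile: len(data) for tile, data in results.items()})
```

Wrapping the per-tile workflow in a function is what turns "repeat 936 times" from a manual task into a loop; a real client would replace the stand-in class with HTTP calls and a non-zero `poll_interval`.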
NED Downloader
• Goal
  – Provide an easy-to-use NED downloading utility by supporting batch downloads and managing downloading status transitions automatically
• Software
  – Linux-based
  – Bash + PHP
  – Open source (MIT license)
  – Hosted on CyberGIS SVN
    • http://svn.cybergis.org/pub/ned-downloader/
• Status
  – Used by the National Science Foundation CyberGIS project team for NED data integration and the Great Flood project
Computational Challenges in Related CyberGIS Analytics
Why CyberGIS?
• Most commonly used GIS software is based on sequential computing
  – Not scalable for big data analytics
• Many runtime input/output (I/O) steps in an analysis workflow
• Transfer of big data to/from cyberinfrastructure resources
Viewshed Analysis
• Input DEM
  – HTTP downloading
  – Data processing using GDAL commands
• High-performance viewshed computation
  – Exploiting Graphics Processing Units (GPU)
• Output transfer
  – GridFTP, a parallel file transfer protocol
• Computational bottlenecks
  – The test viewshed analysis (see figure) handled 3.9GB of raster data in total
    • 1.8GB input NED; 436MB output; 1.67GB runtime output
  – Execution time: 4 minutes 55 seconds
    • Input data transfer: 21 seconds
    • Input data processing: 114 seconds
    • Computing: 65 seconds
    • Output data processing: 88 seconds
    • Output transfer: 7 seconds
  – Input/output data processing took about 68% of analysis time (202 of 295 seconds)
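The timing breakdown on this slide can be checked with a few lines of arithmetic. The Python sketch below uses only the stage timings reported above; the variable names are chosen for this illustration.

```python
# Stage timings in seconds, as reported for the test viewshed analysis.
stage_seconds = {
    "input data transfer": 21,
    "input data processing": 114,
    "computing": 65,
    "output data processing": 88,
    "output transfer": 7,
}

total = sum(stage_seconds.values())
io_processing = (
    stage_seconds["input data processing"]
    + stage_seconds["output data processing"]
)
io_share = io_processing / total

print(f"total: {total} s ({total // 60} min {total % 60} s)")
for stage, seconds in stage_seconds.items():
    print(f"{stage:>24}: {seconds:4d} s  {seconds / total:6.1%}")
print(f"I/O data processing share: {io_share:.1%}")
```

The stages sum to the reported 4 minutes 55 seconds, and the GPU computation itself accounts for well under a quarter of the run, which is why the following slides focus on I/O rather than on the viewshed kernel.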
Resolving Computational Bottlenecks
[Figure: workflow diagram with three stages (input processing, analysis, output processing), each running on CPUs and GPUs with its own input and output; input files are transferred from input data storage, and output files are transferred to output data storage.]
• Reduce the number of runtime I/O steps
• Employ high-performance I/O techniques
Experience and Solutions
CyberGIS Approach
• Tightly couple geospatial data processing libraries to eliminate unnecessary I/O operations
• Exploit parallel I/O for geospatial data processing
• Integrate high-performance data transfer capability in CyberGIS analytics
Integrated CyberGIS Architecture
[Figure: layered architecture diagram]
• CyberGIS Software Environment: Applications; Scalable Analytical Libraries; Scalable Data Libraries; Spatial Middleware; Geospatial Parallel Computing
• Dependent Libraries: GDAL, GRASS, NetCDF, HDF5, MPI, OpenMP, CUDA
• CyberGIS computational resources: Parallel File Systems, Processors, Network, Memory
Highlights
• Analytical libraries
  – pRasterBlaster (a high-performance map reprojection library under joint development by CEGIS and CIGI)
• Data libraries
  – Parallel Geospatial I/O library (pGIO) with NetCDF/HDF5 support is to be released soon
  – GDAL+MPI IO for parallel I/O of GeoTIFF format is under development
• Spatial middleware
  – GridFTP transfer between CyberGIS data source sites and XSEDE sites
    • CEGIS <-> supercomputer centers (NCSA, SDSC, TACC)
• CyberGIS computational resources
  – CEGIS high-performance computers
  – CIGI cloud infrastructure
  – Key national cyberinfrastructure environments
    • NSF XSEDE (http://xsede.org)
    • Open Science Grid (http://opensciencegrid.org)
Parallel I/O Strategies
[Figure: three ways to partition raster I/O among processes P0, P1, P2, ..., Pn that share a storage device: row-wise I/O, column-wise I/O, and block-wise I/O.]
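The three strategies differ only in which window of the raster each process reads or writes. As a minimal sketch, assuming a raster of `rows` x `cols` cells split as evenly as possible, the following Python functions compute each process's window as `(row_start, row_stop, col_start, col_stop)`. The function names and the even-split policy are assumptions for this illustration and are not taken from pGIO or any other library mentioned in the slides.

```python
def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks
    whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def row_wise(rows, cols, nprocs):
    """Each process handles a band of complete rows."""
    return [(r0, r1, 0, cols) for r0, r1 in split_range(rows, nprocs)]


def column_wise(rows, cols, nprocs):
    """Each process handles a band of complete columns."""
    return [(0, rows, c0, c1) for c0, c1 in split_range(cols, nprocs)]


def block_wise(rows, cols, proc_rows, proc_cols):
    """Processes form a proc_rows x proc_cols grid; each handles one block."""
    return [
        (r0, r1, c0, c1)
        for r0, r1 in split_range(rows, proc_rows)
        for c0, c1 in split_range(cols, proc_cols)
    ]


def area(window):
    r0, r1, c0, c1 = window
    return (r1 - r0) * (c1 - c0)


# Small example: a 10 x 8 raster shared by 4 processes.
for name, windows in [
    ("row-wise", row_wise(10, 8, 4)),
    ("column-wise", column_wise(10, 8, 4)),
    ("block-wise", block_wise(10, 8, 2, 2)),
]:
    print(name, windows)
```

For a raster stored row by row, a row-wise window maps to one contiguous byte range per process, whereas column-wise and block-wise windows map to many short strided ranges; that access pattern is one reason the choice of strategy affects I/O performance.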
High-Performance Data Transfer
Background image source: https://www.xsede.org/documents/10157/169907/xsedenet.pdf
CEGIS
Data Transfer Service between USGS and XSEDE
• Technology
  – GridFTP, a secure and high-performance data transfer protocol
• Data transfer service setup
  – USGS GridFTP server: usgs-ybother.srv.mst.edu
  – Globus Toolkit 5
  – Data transfer capability
    • Parallel data channels for large dataset transfer
    • Data transfer is initiated in the CyberGIS Gateway as a third-party transfer
    • Transfer rate: up to 100MB/second
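To put the quoted transfer rate in context, the Python sketch below estimates the minimum transfer time for two dataset sizes mentioned earlier in the slides. It assumes decimal units (1GB = 10^9 bytes) and treats 100MB/second as a sustained rate, although the slide gives it only as an upper bound, so the results are lower bounds on transfer time.

```python
GB = 10**9  # decimal units assumed for this estimate
MB = 10**6


def transfer_time_seconds(size_bytes, rate_bytes_per_second):
    """Time to move size_bytes at a constant rate, ignoring setup overhead."""
    return size_bytes / rate_bytes_per_second


# Dataset sizes taken from earlier slides.
datasets = {
    "Great Flood 10-meter tiles (approx.)": 70 * GB,
    "10-meter NED raw files": 440 * GB,
}

# "Up to 100MB/second" on the slide; sustained rates may be lower.
peak_rate = 100 * MB

for name, size in datasets.items():
    minutes = transfer_time_seconds(size, peak_rate) / 60
    print(f"{name}: at least {minutes:.0f} minutes")
```

Even at the peak rate, moving the full 10-meter collection takes over an hour, which supports initiating transfers server-to-server (third-party) rather than routing data through a user's machine.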
Concluding Discussions
• Usability of NED can be significantly improved if the data access interface is made more user-friendly
• Big data require cyberinfrastructure and significant computational power for scalable data access and analytics
• CyberGIS has emerged as a new-generation GIS for resolving these challenges and represents significant opportunities for the National Map communities
DISCLAIMER & ACKNOWLEDGEMENT
• DISCLAIMER: Any use of trade, product, or firm names in this paper is for descriptive purposes only and does not imply endorsement by the U.S. Government.
• ACKNOWLEDGEMENT: This work is supported in part by the National Science Foundation (NSF) under Grant Numbers BCS-0846655 and OCI-1047916. Computational experiments used the NSF Extreme Science and Engineering Discovery Environment (XSEDE) (Award Number SES090019), which is supported by NSF under Grant Number OCI-1053575.
U.S. Department of the Interior
U.S. Geological Survey
Comments / Questions?
Contact: [email protected] or [email protected]
University of Illinois at Urbana-Champaign
CyberInfrastructure and Geospatial Information Laboratory
Department of Computer Science
Department of Geography and Geographic Information Science
Department of Urban and Regional Planning
National Center for Supercomputing Applications