FOSS4G 2015 – Seoul, South Korea – September 14th-19th, 2015

DEVELOPMENT OF DATA ARCHIVING AND DISTRIBUTION SYSTEM FOR THE PHILIPPINES' LIDAR PROGRAM USING OBJECT STORAGE SYSTEMS

Ken Abryl Eleazar Salanio

Data Archiving and Distribution Component

PHL-LiDAR 1

[email protected]

Outline

• Introduction

• Related Work

• Working Design

• Ceph Object Storage System

• Archiving Process Flow

• Summary

Introduction

Introduction – The Philippine Hazard Setting

Figures: 2014 Typhoon Tracks; the Ring of Fire

Image sources: https://en.wikipedia.org/wiki/Timeline_of_the_2014_Pacific_typhoon_season, https://en.wikipedia.org/wiki/Ring_of_Fire

Introduction – The Philippine Hazard Setting

• The Philippines lies along the Pacific typhoon belt and the Ring of Fire

• It is prone to earthquakes, typhoons, and other hazards

• It is abundant in natural resources

• Mapping is needed to assess disaster risk and to account for natural resources

Introduction – The Philippine Hazard Setting

• The Philippines’ Department of Science and Technology (DOST) with Higher Education Institutions (HEIs) organized programs for mapping: PHL-LiDAR 1 and PHL-LiDAR 2

• These programs are an extension of the Disaster Risk and Exposure Assessment for Mitigation (DREAM) LiDAR program

Introduction – The Philippine Hazard Setting

• PHL-LiDAR 1

o Data Acquisition

o Data Validation

o Data Processing

o Training & IEC

o Data Archiving

o Flood Modeling

• PHL-LiDAR 2

o Agriculture

o Forest

o Coastal

o Energy

o Hydrology

Introduction – The Philippine Hazard Setting

• LiDAR mapping produces high-resolution geospatial data

• High-resolution data acquisition produces very large data volumes

• Storage, indexing, retrieval, and distribution all prove challenging

Related Work

Related Work - File Based Storage

• File-based storage systems are commonly used to store data

• Little setup needed

• Pervasive technology

• Complexity of directory structure increases with amount of data and processes

Related Work - GIS-enabled RDBMS

• GIS-enabled Relational Database Management Systems or Geodatabases

• Two types of design:

o Spatial indexing is on a separate layer

o Specialized spatial columns

• Indexing overhead, especially on updates

• Scalability and query time issues

• Limited support for point cloud data

Related Work - Combined Approach

• Combines LiDAR flat tiles, RDBMS, and distributed infrastructure

o RDBMS manages metadata

o LiDAR tiles are stored in dedicated and distributed storage

o Data processing is carried out by high-performance compute servers

Related Work - Combined Approach

• e.g. OpenTopography’s architecture

• Comprised of various software and hardware resources

• Actual data sets are stored in ASPRS LAS format on a dedicated storage server

• Metadata is stored on an IBM DB2 database

• Other datasets are stored on the SDSC Cloud platform

• Processing and visualization requests are handled by a dedicated large memory, multiprocessor system

Image source: http://www.opentopography.org/index.php/about/systemarch2

Related Work - Combined Approach

• While highly appealing, this approach raises concerns:

o Cost and difficulty of infrastructure upgrades

o Internet connection speed and reliability

Working Design

Working Design - LiDAR Portal for Archiving and Distribution (LiPAD)

Working Design - LiPAD

• LiPAD, the customized GeoNode platform, handles the web user interface

• GeoServer stores small shapefiles, raster, and vector data

• Large files are tiled, named and indexed by the coordinates of each tile and stored in Ceph

• Metadata of tiled files is indexed in LiPAD, represented by a tiled shapefile of the Philippines

Working Design

• Added features:

o Authentication using Active Directory

o Metadata Indexing for Ceph Objects, represented as a grid

o Tiled selection of data

o Data cart
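
As a rough illustration of how a tile selection might be turned into a list of objects for the data cart, the sketch below maps a bounding box to the 1 km tiles it overlaps. It is only a sketch under stated assumptions: the function names and the `E<km>N<km>` naming format are hypothetical and not taken from LiPAD, and coordinates are assumed to be in metres in EPSG:32651.

```python
import math

TILE_SIZE_M = 1000  # 1 km tiles, as described in the archiving process


def tile_name(easting_m, northing_m):
    """Hypothetical naming scheme: lower-left tile corner in whole kilometres."""
    return f"E{int(easting_m // TILE_SIZE_M)}N{int(northing_m // TILE_SIZE_M)}"


def tiles_in_bbox(min_e, min_n, max_e, max_n):
    """Return the names of every 1 km tile that overlaps the bounding box."""
    first_col = math.floor(min_e / TILE_SIZE_M)
    first_row = math.floor(min_n / TILE_SIZE_M)
    # A box whose upper edge sits exactly on a grid line does not reach
    # into the next tile, hence ceil(...) - 1.
    last_col = math.ceil(max_e / TILE_SIZE_M) - 1
    last_row = math.ceil(max_n / TILE_SIZE_M) - 1
    return [
        tile_name(col * TILE_SIZE_M, row * TILE_SIZE_M)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# A selection that straddles four tiles.
cart = tiles_in_bbox(500200, 1610500, 501800, 1611200)
print(cart)
```

Because the tile name is derived directly from the coordinates, a selection can be resolved to object names with arithmetic alone, without a spatial query against the stored data.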

LiPAD Web Interface - Tile Selection

LiPAD Web Interface - Data Cart

Ceph Object Storage System

Ceph Object Storage - What is Object Storage?

• Data is managed as objects, storage containers with a file-like interface

• Objects are retrieved by their unique ID

• Offers storage size scalability

• Replicated backups
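
The ideas above (a flat namespace, retrieval by unique ID, and replicated copies) can be illustrated with a toy in-memory model. This is a teaching sketch only: the class `ToyObjectStore` and its hash-based placement are hypothetical and much simpler than what Ceph actually does.

```python
import hashlib


class ToyObjectStore:
    """In-memory stand-in for an object store; not how Ceph is implemented."""

    def __init__(self, node_count=3, replicas=2):
        self.nodes = [dict() for _ in range(node_count)]
        self.replicas = replicas

    def _placement(self, object_id):
        # Hash the ID to pick a starting node, then use the following
        # nodes for the additional copies (an assumed, simplistic policy).
        digest = hashlib.sha256(object_id.encode("utf-8")).digest()
        start = int.from_bytes(digest[:4], "big") % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, object_id, data, metadata=None):
        # Write the object and its metadata to every replica node.
        for idx in self._placement(object_id):
            self.nodes[idx][object_id] = (data, dict(metadata or {}))

    def get(self, object_id):
        # Return the first surviving copy, looked up purely by ID.
        for idx in self._placement(object_id):
            if object_id in self.nodes[idx]:
                return self.nodes[idx][object_id]
        raise KeyError(object_id)

    def fail_node(self, idx):
        # Simulate losing everything stored on one node.
        self.nodes[idx].clear()


store = ToyObjectStore(node_count=3, replicas=2)
store.put("E500N1610", b"tile-bytes", {"srs": "EPSG:32651"})
store.fail_node(store._placement("E500N1610")[0])  # lose the first copy
data, metadata = store.get("E500N1610")            # second copy still answers
print(data, metadata)
```

There is no directory tree to traverse: the object ID alone determines where the copies live, which is what lets capacity grow by adding nodes.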

Ceph Object Storage – Ceph Features

• Open source

• Compatible with OpenStack and Amazon S3

• Support for broad spectrum of programming languages

• Runs on commodity hardware

• Designed to be self-healing and self-managing

• Representational State Transfer (REST) API

Ceph Object Storage - Ceph Features

• Block storage or virtualized hard disks

• Object storage accessible via

o HTTP REST

o OpenStack Swift or Amazon S3 API

o C++, Java, Python, Ruby, PHP
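
To make the HTTP REST access concrete, the sketch below builds an S3-style path-style URL (`endpoint/bucket/key`) for a stored tile. The endpoint, bucket, and key are hypothetical placeholders, and a real request would also need the authentication headers that the S3 API requires, which this sketch omits.

```python
from urllib.parse import quote


def s3_path_style_url(endpoint, bucket, key):
    """Compose endpoint/bucket/key, percent-encoding unsafe characters.

    Slashes in the key are kept so that key prefixes stay readable.
    """
    return f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key)}"


# Hypothetical gateway address, bucket name, and object key.
url = s3_path_style_url(
    "http://ceph-gateway.example:7480", "lidar-tiles", "dtm/E500N1610.tif"
)
print(url)
```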

Ceph Object Storage - Architecture

• Object gateway services handle requests from applications

• Monitor nodes ensure high availability

• Objects are stored inside Object Storage Devices (OSDs)

Archiving Process Flow

• Post-processed data is tiled into 1x1 km tiles

• Each tile is named after the northing and easting values (EPSG:32651)

• Tiles are uploaded to Ceph and metadata is extracted

• Metadata is input into LiPAD
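
The flow above can be sketched in a few lines: points are binned into 1 km tiles, each tile is named from its corner coordinates, and a small metadata record is built for indexing. This is a minimal sketch under assumptions: the name format, the metadata field names, and the sample points are all hypothetical, and the upload to Ceph itself is left out.

```python
from collections import defaultdict

TILE_SIZE_M = 1000   # 1 km x 1 km tiles
SRS = "EPSG:32651"   # projected coordinate system named on the slide


def tile_origin(easting_m, northing_m):
    """Lower-left corner (in metres) of the tile containing a point."""
    return (int(easting_m // TILE_SIZE_M) * TILE_SIZE_M,
            int(northing_m // TILE_SIZE_M) * TILE_SIZE_M)


def split_into_tiles(points):
    """Group (easting, northing, elevation) tuples by the tile they fall in."""
    tiles = defaultdict(list)
    for easting, northing, elevation in points:
        tiles[tile_origin(easting, northing)].append((easting, northing, elevation))
    return dict(tiles)


def metadata_record(origin, tile_pts):
    """Build the index entry for one tile; field names are illustrative only."""
    e0, n0 = origin
    elevations = [p[2] for p in tile_pts]
    return {
        # Hypothetical name format built from the corner in kilometres.
        "object_name": f"E{e0 // TILE_SIZE_M}N{n0 // TILE_SIZE_M}",
        "srs": SRS,
        "bbox": (e0, n0, e0 + TILE_SIZE_M, n0 + TILE_SIZE_M),
        "point_count": len(tile_pts),
        "min_z": min(elevations),
        "max_z": max(elevations),
    }


# Made-up sample points: two fall in one tile, one in its eastern neighbour.
points = [
    (500200.5, 1610500.0, 12.3),
    (500900.0, 1610999.9, 15.0),
    (501000.0, 1610000.0, 9.8),
]
records = [
    metadata_record(origin, pts)
    for origin, pts in sorted(split_into_tiles(points).items())
]
for record in records:
    print(record)
```

Each record carries the tile's bounding box, so the portal can index it against the tile grid without reading the stored object again.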

Summary

• There is a need for LiDAR mapping in the Philippines for hazard assessment and natural resource accounting

• Archiving, indexing and distributing these sizable data sets prove to be a challenge

• We use a combined approach, utilizing GeoNode with GeoServer, and Ceph Object Storage

• The setup also paves the way for migrating to a distributed computing platform

Acknowledgements

The authors would like to acknowledge the support of the Department of Science and Technology – Philippine Council for Industry, Energy and Emerging Technology Research and Development (DOST-PCIEERD) and the Phil-LiDAR 1 research and training staff.

Thank you very much!