Lessons learned with laser scanning point cloud management in Hadoop HBase Prof. Debra Laefer Center for Urban Science + Progress New York University June 2018




Laser scanning data

2015 Dublin point cloud

• Spatial coverage: > 2 km²

• Number of points: > 1.4 billion

• Size on disk: 30 GB in LAS format

• Precision: 3 cm

• Density: 300 points/m² (horizontal)

Open-access: https://geo.nyu.edu/catalog/nyu_2451_38684


HBase – a distributed database

Apache HBase

• Enables random access to data in the Hadoop Distributed File System

• Open-source implementation of Google’s Bigtable

• Is the database behind many Facebook services

• Is distributed, non-relational (aka NoSQL), key-value based, and column oriented

HBase’s underlying data structure

(a) Low-level data storage structure in HBase

SortedMap<RowKey, List<SortedMap<Column, List<Value, Timestamp>>>>

(b) A high-level view of HBase data structure

(Table, RowKey, Family, Column, Timestamp) → Value
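The nested sorted-map view of HBase's data structure can be made concrete with a small in-memory stand-in. This is only a sketch of the logical model, not HBase client code: the class `TinyTable` and its methods are hypothetical names introduced for illustration, and the row keys, family `p`, and column `z` are made-up sample values.

```python
from collections import defaultdict


class TinyTable:
    """Toy in-memory model of one HBase table (illustrative, not the HBase API)."""

    def __init__(self):
        # row key -> column family -> column qualifier -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, family, column, value, ts):
        """Store one versioned cell."""
        self._rows[row][family][column][ts] = value

    def get(self, row, family, column, ts=None):
        """Return the newest value at or before ts (newest overall if ts is None)."""
        versions = self._rows.get(row, {}).get(family, {}).get(column, {})
        candidates = [t for t in versions if ts is None or t <= ts]
        if not candidates:
            return None
        return versions[max(candidates)]

    def scan(self):
        """Yield (row, family, column, timestamp, value) in sorted key order.

        Rows, families, and columns are visited in ascending byte order;
        versions of a cell are visited newest first.
        """
        for row in sorted(self._rows):
            for family in sorted(self._rows[row]):
                for column in sorted(self._rows[row][family]):
                    versions = self._rows[row][family][column]
                    for ts in sorted(versions, reverse=True):
                        yield row, family, column, ts, versions[ts]


if __name__ == "__main__":
    table = TinyTable()
    table.put(b"row-002", b"p", b"z", 12.5, ts=1)
    table.put(b"row-001", b"p", b"z", 3.0, ts=1)
    table.put(b"row-001", b"p", b"z", 3.1, ts=2)  # newer version of the same cell
    for cell in table.scan():
        print(cell)
```

The five-part address `(Table, RowKey, Family, Column, Timestamp)` corresponds to the table object plus the four nesting levels of the map; the value sits at the innermost level.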


Data models for point cloud management in HBase

Expectations:

• Scalability (distributed)

• Flexibility (schema-less)

• Performance (due to parallelism)

4 data models:

• 2 row-key arrangements: Dual Hilbert code and Single Hilbert code

• 2 column structures: Grouped Attributes and Separate Attributes
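The slides do not spell out how the Hilbert codes are computed or how the dual and single arrangements combine them into row keys. As a rough illustration of the underlying idea only, the sketch below uses the standard iterative 2D Hilbert curve mapping and encodes the result as a fixed-width big-endian byte string, so that byte-wise key order matches curve order. The function names, the grid `order`, and the key width are assumptions made for this example.

```python
def hilbert_index(order, x, y):
    """Distance along a 2D Hilbert curve of cell (x, y) in a 2**order square grid."""
    n = 1 << order
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("cell lies outside the grid")
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def row_key(order, x, y):
    """Hypothetical row key: the Hilbert index as fixed-width big-endian bytes."""
    width = max(1, (2 * order + 7) // 8)
    return hilbert_index(order, x, y).to_bytes(width, "big")


if __name__ == "__main__":
    # Walk a 4 x 4 grid in Hilbert order; consecutive cells are spatial neighbours.
    cells = sorted((hilbert_index(2, x, y), (x, y)) for x in range(4) for y in range(4))
    print([cell for _, cell in cells])
```

Because cells that are close along the curve are also close in space, points in the same neighbourhood tend to land in nearby rows of a key-sorted store, which is the usual motivation for space-filling-curve keys.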


Data ingestion

Data ingestion workflow


Performance evaluation – Point queries

Point queries:

• Model 3 is slowest; the remaining models are comparable.

• More than 5 times faster than pgPointCloud

• All data models are scalable

• Difference between hot and cold queries is obvious

Hot point query response times

Cold point query response times

Data size: 90M, 365M, 1420M


Performance evaluation – Range queries

Range queries:

• Model 4 outperforms all other models

• Model 3 is slowest

• Difference with pgPointCloud is less obvious

• All data models are scalable

Hot range query response times

Cold range query response times

Data size: 90M, 365M, 1420M
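The two query types evaluated here can be pictured against any store that keeps rows sorted by key: a point query is an exact-key lookup, while a range query reads a contiguous interval of keys. The sketch below is a generic illustration using Python's `bisect` module, not the query code behind the reported timings; `SortedStore` and the sample keys are hypothetical. The half-open interval (start inclusive, stop exclusive) mirrors the default behaviour of an HBase scan.

```python
from bisect import bisect_left


class SortedStore:
    """Toy key-sorted store supporting exact lookups and half-open range scans."""

    def __init__(self, items):
        pairs = sorted(items, key=lambda kv: kv[0])
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def point_query(self, key):
        """Return the value stored under key, or None if the key is absent."""
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def range_query(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, stop)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


if __name__ == "__main__":
    store = SortedStore([(3, "c"), (1, "a"), (2, "b"), (7, "g")])
    print(store.point_query(2))
    print(store.range_query(2, 7))
```

With keys derived from a space-filling curve, a rectangular spatial window generally maps to several such key intervals rather than one, so a spatial range query would issue multiple scans and filter the results; that decomposition is not shown here.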


Concluding remarks

• 4 data models were investigated for storing, indexing, and querying point clouds in a distributed, non-relational database.

• All HBase data models were scalable, including the flat, one-point-per-row models, which previously hit the scalability wall in relational implementations.

• Separating point attributes to take advantage of the schemaless feature of HBase introduced some overhead in both data consumption and querying costs.

• Model 4, which resembles Oracle’s SDO_PC and PostgreSQL’s PCPATCH, appears to be the most performant data model. However, Model 4 does not fully utilize HBase’s advantageous features.
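One plausible contributor to the overhead of separating attributes is that HBase stores every cell together with its full key (row key, family, qualifier, timestamp), so splitting a point across several cells repeats that key once per attribute. The sketch below only counts bytes for a single point under two assumed layouts. The attribute set (x, y, z, intensity), the qualifier names, and the 8-byte row key are assumptions for illustration; the slides do not specify the actual column layouts.

```python
import struct

# Assumed attributes: x, y, z as float64 plus a 16-bit intensity.
POINT_FORMAT = "<dddH"


def separate_cells(row_key, point):
    """One cell per attribute: (row key, qualifier, packed value)."""
    x, y, z, intensity = point
    return [
        (row_key, b"x", struct.pack("<d", x)),
        (row_key, b"y", struct.pack("<d", y)),
        (row_key, b"z", struct.pack("<d", z)),
        (row_key, b"i", struct.pack("<H", intensity)),
    ]


def grouped_cells(row_key, point):
    """A single cell holding all attributes packed into one value."""
    return [(row_key, b"xyzi", struct.pack(POINT_FORMAT, *point))]


def stored_bytes(cells):
    """Bytes for row key + qualifier + value, summed over all cells.

    Family names and timestamps are ignored; they would add further
    per-cell cost to both layouts.
    """
    return sum(len(key) + len(qualifier) + len(value) for key, qualifier, value in cells)


if __name__ == "__main__":
    key = bytes(8)  # placeholder 8-byte row key
    point = (1.0, 2.0, 3.0, 100)
    print("separate:", stored_bytes(separate_cells(key, point)))
    print("grouped: ", stored_bytes(grouped_cells(key, point)))
```

The attribute payload is identical in both layouts; the difference comes entirely from repeating the key. The trade-off is that separate cells can be read or added individually, which is the flexibility the schemaless design offers.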