
Lessons learned with laser scanning point cloud management in Hadoop HBase
Prof. Debra Laefer
Center for Urban Science + Progress
New York University
June 2018


Laser scanning data


2015 Dublin point cloud

• Spatial coverage: > 2 km²
• Number of points: > 1.4 billion
• Size on disk: 30 GB in LAS format
• Precision: 3 cm
• Density: 300 points/m² (horizontal)

Open-access: https://geo.nyu.edu/catalog/nyu_2451_38684

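The figures above imply an average on-disk footprint per point. A quick check, assuming decimal gigabytes (1 GB = 10^9 bytes) and exactly 1.4 billion points. Because the slide gives the point count only as a lower bound, the result is an upper estimate of the average:

```python
# Back-of-the-envelope average storage per point for the 2015 Dublin point cloud.
# Assumptions: 1 GB = 10**9 bytes and exactly 1.4 billion points. The slide says
# "> 1.4 billion", so the true average is somewhat lower than computed here.
SIZE_ON_DISK_BYTES = 30 * 10**9  # 30 GB in LAS format
NUM_POINTS = 1_400_000_000       # lower bound on the point count

bytes_per_point = SIZE_ON_DISK_BYTES / NUM_POINTS
print(f"about {bytes_per_point:.1f} bytes per point")  # about 21.4 bytes per point
```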


HBase – a distributed database

Apache HBase
• Enables random access to data in the Hadoop Distributed File System (HDFS)
• Open-source implementation of Google’s Bigtable
• Used as the database behind many Facebook services
• HBase is distributed, non-relational (aka NoSQL), key-value based, and column-oriented

Figure: HBase’s underlying data structure
(a) Low-level data storage structure in HBase:
SortedMap<RowKey, List<SortedMap<Column, List<Value, Timestamp>>>>
(b) A high-level view of the HBase data structure:
(Table, RowKey, Family, Column, Timestamp) → Value
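The two views of HBase’s data structure can be mimicked with a toy in-memory table. This is a hypothetical Python sketch of the logical layout only, not the HBase client API: the class `ToyTable` and its methods are illustrative names. It stores row key → family → qualifier → {timestamp: value}, returns the newest version by default, and lists cells as (Table, RowKey, Family, Column, Timestamp) → Value in HBase’s sort order.

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory model of HBase's logical data layout (not the HBase client API).

    row key -> column family -> column qualifier -> {timestamp: value}
    """

    def __init__(self, name):
        self.name = name
        self._rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, family, qualifier, value, ts):
        """Store one version of a cell; the same (row, family, qualifier, ts) is overwritten."""
        self._rows[row][family][qualifier][ts] = value

    def get_latest(self, row, family, qualifier):
        """Return the newest version of a cell (HBase's default read behavior), or None."""
        # .get() is used so that reading a missing cell does not create empty entries
        versions = self._rows.get(row, {}).get(family, {}).get(qualifier, {})
        return versions[max(versions)] if versions else None

    def cells(self):
        """Yield ((table, row, family, qualifier, timestamp), value) in HBase sort order:
        rows, families and qualifiers ascending; versions newest first."""
        for row in sorted(self._rows):
            families = self._rows[row]
            for family in sorted(families):
                for qualifier in sorted(families[family]):
                    versions = families[family][qualifier]
                    for ts in sorted(versions, reverse=True):
                        yield (self.name, row, family, qualifier, ts), versions[ts]


# Usage: row keys are bytes, so they sort byte by byte, as HBase row keys do.
t = ToyTable("points")
t.put(b"\x00\x01", "p", "x", 3.2, ts=1)
t.put(b"\x00\x01", "p", "x", 3.5, ts=2)
t.put(b"\x00\x00", "p", "z", 9.0, ts=1)
print(t.get_latest(b"\x00\x01", "p", "x"))  # 3.5
for key, value in t.cells():
    print(key, value)
```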


Data models for point cloud management in HBase

Expectations:
• Scalability (distributed)
• Flexibility (schema-less)
• Performance (due to parallelism)

4 data models:
• 2 row-key arrangements: Dual Hilbert code and Single Hilbert code
• 2 column structures: Grouped Attributes and Separate Attributes

Figure: The 4 data models
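Both row-key arrangements rely on Hilbert codes, which map 2D grid cells to a 1D order that tends to keep nearby cells close in key space. The sketch below computes a Hilbert index with the classic iterative xy-to-d algorithm and packs it into a row key. The exact key layouts used in the study are not given on the slide, so `single_hilbert_key`, `dual_hilbert_key`, the grid parameters, and the reading of "dual" as a coarse block code followed by a fine in-block code are all hypothetical assumptions for illustration.

```python
import struct


def _rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so that the sub-curve has the standard orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n, x, y):
    """Map cell (x, y) on an n x n grid (n a power of two) to its Hilbert-curve index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def grid_cell(x, y, origin, cell_size, n):
    """Quantize a coordinate to an integer grid cell, clamped to [0, n - 1]."""
    i = min(max(int((x - origin[0]) // cell_size), 0), n - 1)
    j = min(max(int((y - origin[1]) // cell_size), 0), n - 1)
    return i, j


def single_hilbert_key(x, y, origin, cell_size, n):
    """Hypothetical single-code row key: one 64-bit Hilbert index, big-endian so that
    HBase's byte-wise row ordering matches numeric Hilbert order."""
    return struct.pack(">Q", hilbert_index(n, *grid_cell(x, y, origin, cell_size, n)))


def dual_hilbert_key(x, y, origin, block_size, block_n, cell_n):
    """Hypothetical dual-code row key: a coarse block code followed by a fine code for
    the cell inside that block (fine cell size = block_size / cell_n)."""
    bi, bj = grid_cell(x, y, origin, block_size, block_n)
    block_origin = (origin[0] + bi * block_size, origin[1] + bj * block_size)
    ci, cj = grid_cell(x, y, block_origin, block_size / cell_n, cell_n)
    return struct.pack(">QQ", hilbert_index(block_n, bi, bj), hilbert_index(cell_n, ci, cj))


origin = (0.0, 0.0)  # hypothetical local origin of the survey area
print(single_hilbert_key(2.5, 0.4, origin, 1.0, 4).hex())  # 000000000000000e
```

Big-endian, fixed-width packing is the important design choice here: HBase compares row keys as raw bytes, and only this encoding makes byte order agree with numeric Hilbert order.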


Data ingestion

Figure: Data ingestion workflow


Performance evaluation – Point queries

Point queries:
• Model 3 is the slowest; the remaining models are comparable.
• HBase is more than 5 times faster than pgPointCloud.
• All data models are scalable.
• The difference between hot (cached) and cold queries is pronounced.

Figures: Hot and cold point query response times for data sizes of 90M, 365M and 1420M points
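With spatially coded row keys, a point query can be answered as an exact-match lookup over sorted row keys, and repeated queries benefit from caching. The toy sketch below models both ideas in plain Python. The key values are hypothetical, and the LRU result cache is only an analogy for the hot/cold gap: HBase’s BlockCache holds HFile blocks rather than query results.

```python
import bisect
from functools import lru_cache

# Hypothetical stand-in for one table region: row keys stored in sorted order.
ROW_KEYS = [3, 8, 15, 21, 42]
ROW_VALUES = ["p3", "p8", "p15", "p21", "p42"]


def get_row(key):
    """Exact-match lookup by binary search over the sorted row keys (a point query)."""
    i = bisect.bisect_left(ROW_KEYS, key)
    if i < len(ROW_KEYS) and ROW_KEYS[i] == key:
        return ROW_VALUES[i]
    return None


@lru_cache(maxsize=1024)
def cached_get_row(key):
    """Point query behind a least-recently-used cache: the first ("cold") call misses,
    repeated ("hot") calls are served from memory."""
    return get_row(key)


print(cached_get_row(21))  # p21 (cold: cache miss)
print(cached_get_row(21))  # p21 (hot: cache hit)
print(cached_get_row.cache_info())
```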


Performance evaluation – Range queries

Figures: Hot and cold range query response times for data sizes of 90M, 365M and 1420M points

Range queries:
• Model 4 outperforms all other models.
• Model 3 is the slowest.
• The advantage over pgPointCloud is less pronounced than for point queries.
• All data models are scalable.
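On a table keyed by Hilbert codes, a common way to run a spatial range query is to cover the query window with a few contiguous key ranges and issue one scan per range. The sketch below does this by brute force over the grid cells in the window; it is an illustration of the general technique, not necessarily the method used in the study, and `bbox_to_key_ranges` is a hypothetical name. Each (start, end) range would map to one HBase Scan; since HBase stop rows are exclusive, the stop key would be end + 1.

```python
def _rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so that the sub-curve has the standard orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n, x, y):
    """Map cell (x, y) on an n x n grid (n a power of two) to its Hilbert-curve index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def bbox_to_key_ranges(n, x_min, y_min, x_max, y_max):
    """Cover an inclusive grid bounding box with sorted, merged (start, end) Hilbert ranges.

    Brute force over every cell in the box: fine for illustration or small windows;
    a more scalable version would recurse over quadrants instead.
    """
    codes = sorted(
        hilbert_index(n, x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    )
    ranges = []
    for c in codes:
        if ranges and c == ranges[-1][1] + 1:
            ranges[-1][1] = c  # extend the current contiguous run
        else:
            ranges.append([c, c])  # start a new run
    return [tuple(r) for r in ranges]


# On a 4 x 4 grid, a 2 x 2 quadrant is one contiguous key range,
# while two horizontally adjacent cells can be far apart in key space.
print(bbox_to_key_ranges(4, 0, 0, 1, 1))  # [(0, 3)]
print(bbox_to_key_ranges(4, 1, 0, 2, 0))  # [(1, 1), (14, 14)]
```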


Concluding remarks

• Four data models were investigated for storing, indexing, and querying point clouds in a distributed, non-relational database.
• All HBase data models were scalable, including the flat, one-point-per-row models, which previously hit the scalability wall in relational implementations.
• Separating point attributes into individual columns, to take advantage of HBase’s schema-less design, introduced overhead in both storage consumption and query cost.
• Model 4, which resembles Oracle’s SDO_PC and PostgreSQL’s PCPATCH, appears to be the most performant data model. However, Model 4 does not fully utilize HBase’s advantageous features.
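Part of the storage overhead of separate attributes follows from how HBase stores cells: every cell carries its own copy of the row key, column family, qualifier and timestamp. The sketch below estimates per-point size using the classic KeyValue layout; the point schema, key length and family name are hypothetical, and the estimate ignores tags, data block encodings such as FAST_DIFF, compression, and index overhead, which can shrink repeated keys considerably in practice.

```python
# Simplified size of one HBase cell in the classic KeyValue layout:
#   keylength(4) + valuelength(4) + rowlength(2) + row + familylength(1) + family
#   + qualifier + timestamp(8) + keytype(1) + value
# Ignores tags, data block encoding, compression and HFile index overhead.
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1  # 20 bytes per cell


def keyvalue_size(row_len, family_len, qualifier_len, value_len):
    return FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


# Hypothetical point schema: qualifier name -> value size in bytes.
ATTRIBUTES = {"x": 8, "y": 8, "z": 8, "i": 2, "c": 1}
ROW_LEN = 8     # e.g. a 64-bit Hilbert code as row key
FAMILY_LEN = 1  # one-character column family name


def grouped_size(attrs):
    """All attributes packed into a single value under one 1-byte qualifier."""
    return keyvalue_size(ROW_LEN, FAMILY_LEN, 1, sum(attrs.values()))


def separate_size(attrs):
    """One cell per attribute: row key, family, timestamp and lengths repeat in every cell."""
    return sum(
        keyvalue_size(ROW_LEN, FAMILY_LEN, len(name), size)
        for name, size in attrs.items()
    )


print(grouped_size(ATTRIBUTES), separate_size(ATTRIBUTES))  # 57 177
```

Under these assumptions, the separate-attributes layout needs roughly three times the raw bytes per point before any encoding or compression.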
