Page 1:

MANAGING MASSIVE AMOUNTS OF SPATIO-TEMPORAL DATA USING

Anita Graser

Center for Mobility Systems, AIT Austrian Institute of Technology

Page 2:

ABOUT

Anita Graser

Scientist @ AIT Austrian Institute of Technology

− QGIS user since 2008

− MSc in Geomatics 2010

− QGIS Project Steering Committee since 2013

− OSGeo Director 2015-17

− Moderator on GIS.StackExchange.com

− Author of “Learning QGIS” (1st ed. 2013), “QGIS Map Design” (2016) & “QGIS 2 Cookbook” (2016)

@underdarkGIS

Page 3:

Austria's largest non-university research institute

− Energy

− Health & Bioresources

− Digital Safety & Security

− Vision, Automation & Control

− Mobility Systems

− Low-Emission Transport

− Technology Experience

− Innovation Systems & Policy

AIT

EMPLOYEES

1,300

Page 4:

Application areas

− Road traffic → FCD, e.g. Waze, TomTom, Uber

− Air traffic → ADS-B, e.g. Flightradar

− Marine traffic → AIS, e.g. MarineTraffic

− Human movement → CDR, e.g. mobile network providers

→ Data-driven decision making

→ Technologically challenging

CONTEXT & MOTIVATION

11/07/2018

Page 5:

SPATIAL DATA

Source: Constantin Stanca “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”

Page 6:

SPATIAL RELATIONSHIPS

Source: Constantin Stanca “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”

Page 7:

SPATIAL FUNCTIONS

Source: Constantin Stanca “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”

Page 8:

Big geo data is anything which is crash ArcGIS

Small data is when is fit in RAM. Big is when is crash because is no fit in RAM

WHAT'S “MASSIVE” SPATIO-TEMPORAL DATA?

Page 9:

TRADITIONAL TOOLS

Scaling PostgreSQL and PostGIS: http://s3.cleverelephant.ca/2017-cdb-postgis.pdf

Page 10:

LOOKING FOR SCALABLE SOLUTIONS

− ESRI GIS Tools for Hadoop: https://github.com/Esri/gis-tools-for-hadoop

− LocationSpark: https://github.com/merlintang/SpatialSpark

− STARK - Spatio-Temporal Data Analytics on Spark: https://github.com/dbis-ilm/stark

− SpatialSpark: https://github.com/syoummer/SpatialSpark

− GeoSpark: https://github.com/DataSystemsLab/GeoSpark

− PySpark & Geopandas: https://github.com/sabman/PySparkGeoAnalysis

Page 11:

OPEN SOURCE & MATURE

https://projects.eclipse.org/wg/locationtech/projects

Page 12:

WHAT IS GEOMESA?

Source: Constantin Stanca “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”

Page 13:

WHAT IS GEOMESA?

Source: Constantin Stanca “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”

Page 14:

WHAT IS GEOMESA?

Source: Constantin Stanca “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”

Page 15:

WHAT IS GEOMESA?

Source: Constantin Stanca “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”

Page 16:

WHAT IS GEOMESA?

Source: Constantin Stanca “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”

Page 17:

Features

✓ Store gigabytes to petabytes of spatial data (tens of billions of points or more)

✓ Serve up tens of millions of points in seconds

✓ Ingest data faster than 10,000 records per second per node

✓ Scale horizontally easily (add more servers to add more capacity)

✓ Support Spark analytics

✓ Drive a map through GeoServer or other OGC Clients

GEOMESA

http://www.geomesa.org/documentation/user/introduction.html#what-is-geomesa

Page 18:

… making 2/3D data sortable

→ Space-filling curves

SPATIO-TEMPORAL INDICES

http://doi.ieeecomputersociety.org/10.1109/TVCG.2014.2298017
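
The idea of making 2D data sortable with a space-filling curve can be sketched in a few lines of Python. This is a minimal illustration of Z-order (Morton) bit interleaving, not GeoMesa's actual implementation; the function names, the 16-bit resolution and the way coordinates are scaled onto a grid are assumptions made for this example.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the lowest `bits` bits of x and y into one Z-order key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return z


def z2_key(lon: float, lat: float, bits: int = 16) -> int:
    """Scale lon/lat onto a 2^bits x 2^bits grid and return the Z-order key."""
    cells = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * cells)
    y = int((lat + 90.0) / 180.0 * cells)
    return interleave_bits(x, y, bits)


# Sorting by the one-dimensional key orders the points along the Z-curve.
# The sample coordinates are illustrative (lon, lat) pairs.
points = [(16.37, 48.21), (-97.0, 38.0), (16.40, 48.20), (144.96, -37.81)]
by_curve = sorted(points, key=lambda p: z2_key(*p))
```

Points that are close in space usually share the leading bits of their key, so they tend to end up close together in the sorted order. That property is what lets a sorted key-value store answer spatial range queries with a small number of key-range scans.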

Page 19:

GEOMESA Z-CURVE


http://www.geomesa.org/documentation/tutorials/geohash-substrings.html
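
The tutorial linked above deals with Geohash substrings. As a rough illustration of why a prefix of such a key describes a larger enclosing cell, here is a sketch of the standard public Geohash encoding in Python. It is not taken from GeoMesa's code base; `geohash_encode` and its parameters are names chosen for this example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Encode a lat/lon pair as a Geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, value, bit_count = [], 0, 0
    use_lon = True  # bits alternate between the axes, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1  # upper half: emit 1, raise lower bound
            rng[0] = mid
        else:
            value = value << 1        # lower half: emit 0, lower upper bound
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:            # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


full = geohash_encode(42.6, -5.6, precision=5)
coarse = geohash_encode(42.6, -5.6, precision=3)
```

Because each character only refines the cell described by the characters before it, `coarse` is a prefix of `full`. A prefix scan over sorted keys therefore acts as a coarse spatial filter.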

Page 20:

geomesa describe-schema -c geomesa.gdelt -f gdelt -u user -p password

INFO Describing attributes of feature 'gdelt'

globalEventId | String

eventCode | String

...

dtg | Date (Spatio-temporally indexed)

geom | Point (Spatially indexed)

User data:

geomesa.index.dtg | dtg

geomesa.indices | z3:4:3,z2:3:3,records:2:3

geomesa.table.sharing | false

GEOMESA COMMAND LINE

Page 21:

geomesa export -c geomesa.gdelt -f gdelt -u root -p GisPwd \
  -q "globalEventId='671867776'"

Using GEOMESA_ACCUMULO_HOME = /opt/geomesa

id,globalEventId:String,...,dtg:Date,*geom:Point:srid=4326

d9e...,671867776,...,2007-07-13T00:00:00.000Z,POINT (-97 38)

GEOMESA COMMAND LINE

Page 22:

geomesa export -c geomesa.gdelt -f gdelt -u root -p GisPwd \
  -q "CONTAINS(POLYGON ((0 0, 0 90, 90 90, 90 0, 0 0)),geom)" -m 3

Using GEOMESA_ACCUMULO_HOME = /opt/geomesa

id,globalEventId:String,...,dtg:Date,*geom:Point:srid=4326

139...,671713129,...,2017-07-10T00:00:00.000Z,POINT (5.43827 5.35886)

9e8...,671928676,...,2017-07-10T00:00:00.000Z,POINT (5.43827 5.35886)

d6c...,671817380,...,2017-07-09T00:00:00.000Z,POINT (5.43827 5.35886)

More complex queries & analyses → Spark(SQL)!

GEOMESA COMMAND LINE

Page 23:

GEOMESA

Source: Constantin Stanca “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”

Page 24:

Option #1: DataFrame API

import org.locationtech.geomesa.spark.jts._

import spark.implicits._

gdeltDf.where(st_contains(st_makeBBOX(0.0, 0.0, 90.0, 90.0), $"geom"))

Option #2: SparkSQL (with UDFs)

SELECT * FROM gdelt

WHERE st_contains(st_makeBBOX(0.0, 0.0, 90.0, 90.0), geom)

GEOMESA
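
Both options express the same filter: keep the rows whose point lies inside the box spanning (0, 0) to (90, 90). The plain-Python sketch below mimics that predicate on a few (lon, lat) tuples without Spark. The helper names and the event IDs are invented for illustration. Boundary handling follows the usual "contains" convention, under which a point lying exactly on the box edge is not contained.

```python
from typing import NamedTuple, Tuple


class BBox(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def make_bbox(min_x: float, min_y: float, max_x: float, max_y: float) -> BBox:
    """Hypothetical stand-in for a bounding-box constructor."""
    return BBox(min_x, min_y, max_x, max_y)


def contains(box: BBox, point: Tuple[float, float]) -> bool:
    """True if the point lies strictly inside the box (edges excluded)."""
    x, y = point
    return box.min_x < x < box.max_x and box.min_y < y < box.max_y


# Illustrative rows; the IDs "a", "b", "c" are made up for this sketch.
rows = [
    {"globalEventId": "a", "geom": (5.43827, 5.35886)},  # inside the box
    {"globalEventId": "b", "geom": (-97.0, 38.0)},       # west of the box
    {"globalEventId": "c", "geom": (0.0, 45.0)},         # on the left edge
]

box = make_bbox(0.0, 0.0, 90.0, 90.0)
selected = [row for row in rows if contains(box, row["geom"])]
```

In a distributed store the point of such a filter is that it does not have to test every row: the spatial index narrows the scan to key ranges that can overlap the box, and the exact predicate only runs on those candidates.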

Page 25:

Save dataframe to GeoMesa table

val df = spark.sql(sqlQuery)

val dsParams = Map(
  "accumulo.instance.id" -> "...",
  "accumulo.zookeepers" -> "...",
  "accumulo.user" -> "...",
  "accumulo.password" -> "...",
  "accumulo.catalog" -> "tablename")

df.write.format("geomesa").options(dsParams)
  .option("geomesa.feature", "featurename").save()

GEOMESA

Page 26:

Example: Trajectory from points sorted by time

import java.sql.Timestamp
// JTS package name depends on the version in use (older releases: com.vividsolutions.jts.geom)
import org.locationtech.jts.geom.{Coordinate, GeometryFactory}
import org.locationtech.geomesa.spark.jts._
import spark.implicits._

val geomFactory = new GeometryFactory()

val someDF = Seq(
  (1, Timestamp.valueOf("2018-01-01 12:00:00"), 2.5, geomFactory.createPoint(new Coordinate(0, 0))),
  (1, Timestamp.valueOf("2018-01-01 12:05:00"), 3.5, geomFactory.createPoint(new Coordinate(1, 1))),
  (2, Timestamp.valueOf("2018-01-01 12:00:00"), 5.5, geomFactory.createPoint(new Coordinate(10, 10))),
  (2, Timestamp.valueOf("2018-01-01 12:05:00"), 5.5, geomFactory.createPoint(new Coordinate(11, 11)))
).toDF("id", "t", "sog", "pt")

+--+-------------------+---+-------------+
|id|t                  |sog|pt           |
+--+-------------------+---+-------------+
|1 |2018-01-01 12:00:00|2.5|POINT (0 0)  |
|1 |2018-01-01 12:05:00|3.5|POINT (1 1)  |
|2 |2018-01-01 12:00:00|5.5|POINT (10 10)|
|2 |2018-01-01 12:05:00|5.5|POINT (11 11)|
+--+-------------------+---+-------------+

GEOMESA

Page 27:

Example: Trajectory from points sorted by time

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{collect_list, max}

someDF
  // running list of points per id, ordered by time
  .withColumn("collected", collect_list($"pt").over(Window.partitionBy("id").orderBy("t")))
  .groupBy("id")
  // the running lists are prefixes of the complete list, which max keeps
  .agg(max($"collected").as("collected"))
  .withColumn("line", st_makeLine($"collected"))
  .show(false)

+--+------------------------------+-------------------------+
|id|collected                     |line                     |
+--+------------------------------+-------------------------+
|1 |[POINT (0 0), POINT (1 1)]    |LINESTRING (0 0, 1 1)    |
|2 |[POINT (10 10), POINT (11 11)]|LINESTRING (10 10, 11 11)|
+--+------------------------------+-------------------------+

GEOMESA
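
The window-then-max pattern on this slide can be mimicked in plain Python to see why it yields a complete, time-ordered trajectory. The ordered window produces growing prefixes of each object's point list, and the maximum of those prefixes is the full list, because in Python a list compares as smaller than any longer list it is a prefix of. The sketch is an illustration only: the helper names and sample records are invented, it produces WKT strings rather than geometry objects, and it does not call GeoMesa or Spark.

```python
from collections import defaultdict
from datetime import datetime

# (id, timestamp, (x, y)) records, deliberately out of order.
records = [
    (1, datetime(2018, 1, 1, 12, 5), (1, 1)),
    (2, datetime(2018, 1, 1, 12, 0), (10, 10)),
    (1, datetime(2018, 1, 1, 12, 0), (0, 0)),
    (2, datetime(2018, 1, 1, 12, 5), (11, 11)),
]


def running_prefixes(rows):
    """Per id, emit the growing list of points in time order (the window step)."""
    seen = defaultdict(list)
    out = []
    for obj_id, _t, pt in sorted(rows, key=lambda r: (r[0], r[1])):
        seen[obj_id].append(pt)
        out.append((obj_id, list(seen[obj_id])))
    return out


def max_prefix_per_id(prefixes):
    """Per id, keep the maximum prefix (the group-by + max step)."""
    best = {}
    for obj_id, pts in prefixes:
        if obj_id not in best or pts > best[obj_id]:
            best[obj_id] = pts
    return best


def to_linestring(points):
    """Format a list of (x, y) tuples as a WKT LINESTRING."""
    return "LINESTRING (" + ", ".join(f"{x} {y}" for x, y in points) + ")"


lines = {
    obj_id: to_linestring(pts)
    for obj_id, pts in max_prefix_per_id(running_prefixes(records)).items()
}
```

Sorting inside the window is what guarantees the time order. A plain group-by with a collected list would give no such guarantee in a distributed setting.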

Page 28:

Example: Trajectory from points sorted by time

someDF.createOrReplaceTempView("temp")

spark.sql("""WITH windowed AS (
  SELECT id, collect_list(first(pt)) OVER (PARTITION BY id ORDER BY t) line
  FROM temp
  GROUP BY id, t)
SELECT id, max(line), st_makeline(max(line))
FROM windowed
GROUP BY id""").show(false)

+--+------------------------------+--------------------------+
|id|max(line)                     |UDF:st_makeLine(max(line))|
+--+------------------------------+--------------------------+
|1 |[POINT (0 0), POINT (1 1)]    |LINESTRING (0 0, 1 1)     |
|2 |[POINT (10 10), POINT (11 11)]|LINESTRING (10 10, 11 11) |
+--+------------------------------+--------------------------+

GEOMESA

Page 29:

http://www.geomesa.org/documentation/user/spark/sparksql_functions.html

Geometry Constructors

• st_geometryFromText

• st_makeBBOX

• st_makeLine

• st_makePoint

• st_makePolygon

• …

Geometry Accessors

• st_geometryN

• st_isValid

• st_pointN

• st_x

• …

Geometry Outputs

• st_asGeoJSON

• st_asText

• …

Spatial Relationships

• st_area

• st_centroid

• st_closestPoint

• st_contains

• st_covers

• st_crosses

• st_disjoint

• st_distance

• st_distanceSphere

• st_distanceSpheroid

• st_equals

• st_intersects

• st_length

• st_lengthSphere

• st_lengthSpheroid

• st_overlaps

• st_relate

• st_touches

• st_within

Geometry Processing

• st_bufferPoint

• st_convexHull

• …

GEOMESA-SPARK-SQL MODULE

Page 30:

BIG SPATIAL TECHNOLOGY STACK

Page 31:

ACCESSING GEOMESA IN GEOSERVER

Page 32:

GEOSERVER PREVIEW

Page 33:

CONSUMING WFS IN QGIS

Page 34:

EXAMPLE

TRAFFIC COUNTS

Page 35:

EXAMPLE

TRAVEL TIME

Page 36:

Based on similar trajectory search

EXAMPLE

TRAJECTORY PREDICTION

5 MIN 10 MIN 15 MIN

Graser, A., Schmidt, J., Widhalm, P. (2018) Predicting trajectories with probabilistic time geography and massive unconstrained movement data. GIScience Workshop on Analysis of Movement Data (AMD’18), 28 August 2018, Melbourne, Australia.

Page 37:

EXAMPLE

INTERACTIVE ANIMATION

http://www.geomesa.org/

Page 38:

CONTACT

Anita Graser


@underdarkGIS

anitagraser.com