big data and analytics with arcgis - esri… · hql drop table if exists logs; create external...

Big Data and Analytics with ArcGIS Canserina Kurnia Technical Manager Esri Global Asia Pacific


Posted on 14-Jul-2020



Big Data and Analytics with ArcGIS

Canserina Kurnia

Technical Manager – Esri Global Asia Pacific


Agenda

• What is Big Data?

• What is Hadoop?

• How does spatial integrate with Big Data and Hadoop?

• How do I get started?


Story Time…


U.S. Demographic Data


FOR EACH LOCATION

FOR EACH DEMOGRAPHIC

⬇50 MILE HEATMAP
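The brute-force loop on this slide can be sketched in Java. This is a minimal sketch, assuming each demographic record is a point carrying a single value and that "50 miles" means great-circle (haversine) distance on a spherical Earth; the slides do not say how the radius was measured, and `HeatmapSketch`, `Demographic`, and `sumWithin` are hypothetical names.

```java
import java.util.List;

public class HeatmapSketch {
    // Mean Earth radius in miles (assumption: spherical Earth).
    static final double EARTH_RADIUS_MILES = 3958.8;

    // Hypothetical record: a demographic sample at a lon/lat with one value.
    record Demographic(double lon, double lat, double value) {}

    // Haversine great-circle distance between two lon/lat points, in miles.
    static double haversineMiles(double lon1, double lat1, double lon2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
    }

    // For one location, sum the values of all demographic points within radiusMiles.
    static double sumWithin(double lon, double lat, List<Demographic> demographics, double radiusMiles) {
        double total = 0;
        for (Demographic d : demographics) {              // FOR EACH DEMOGRAPHIC
            if (haversineMiles(lon, lat, d.lon(), d.lat()) <= radiusMiles) {
                total += d.value();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Demographic> demographics = List.of(
                new Demographic(-117.19, 34.06, 1000),    // about a mile from the location
                new Demographic(-122.42, 37.77, 5000));   // hundreds of miles away
        double[][] locations = {{-117.20, 34.05}};
        for (double[] loc : locations) {                  // FOR EACH LOCATION
            System.out.println(sumWithin(loc[0], loc[1], demographics, 50.0)); // 1000.0
        }
    }
}
```

The work grows with (number of locations) × (number of demographic points), which is what makes this kind of job a candidate for distributed processing.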


Traditional Means…

14 Days

850 GB Raster Files


Better Way?


What is Big Data?


7 BILLION


50% LIVE IN CITIES!


~70% BY 2050!!!

http://www.who.int/gho/urban_health/situation_trends/urban_population_growth_text/en/


Academics

Volume

Velocity

Variety


But then I’ve seen…

Volume → data at rest

Velocity → data in motion

Variety → many types

Veracity → data in doubt

Validity → data that is correct

Visualization → data in patterns

Vulnerability → data at risk

Value → data that is meaningful


“When the traditional means are failing you” – Anonymous


What are the new means?


http://hadoop.apache.org


What’s in a name?

http://blog.pivotal.io/pivotal/products/demystifying-hadoop-in-5-pictures


What Is Hadoop?

• Library / Framework

• For Very, Very Large Structured and Unstructured Datasets

• Multi-Node Distributed Processing

• Resilient to Commodity Hardware Failure


Hadoop Basic Stack (top to bottom)

MapReduce | Hive | HBase

Yet Another Resource Negotiator (YARN)

Hadoop Distributed File System (HDFS)

Commodity Servers


Other Hadoop Projects

• Avro - Serialization / RPC System

• HBase - Distributed Columnar Database

• Hive - Ad Hoc “SQL” Interface

• Pig - Data Flow Parallel Execution (AML)

• ZooKeeper - Coordination Service

• More…


HDFS

• Distributed File System

• Lots and Lots of Commodity Drives

• Fault Tolerant

• Loves Big Files

• “POSIX”-Like Interface


HDFS

(diagram: an HDFS Client, the NameNode, and three DataNodes)


HDFS Resilience!

(diagram: HDFS spread across three DataNodes)
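HDFS stores each block on several DataNodes (three copies by default), so losing one node loses no data. The idea can be shown with a simplified Java sketch, assuming round-robin replica placement; real HDFS uses a rack-aware placement policy, and `ReplicationSketch`, `placeBlocks`, and `allBlocksReadable` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReplicationSketch {
    // Place each block on `replication` DataNodes, round-robin.
    // Simplification: not HDFS's rack-aware placement policy.
    static List<Set<Integer>> placeBlocks(int blocks, int dataNodes, int replication) {
        List<Set<Integer>> placement = new ArrayList<>();
        for (int b = 0; b < blocks; b++) {
            Set<Integer> replicas = new HashSet<>();
            for (int r = 0; r < replication; r++) {
                replicas.add((b + r) % dataNodes);
            }
            placement.add(replicas);
        }
        return placement;
    }

    // A block is still readable if at least one of its replicas is on a live DataNode.
    static boolean allBlocksReadable(List<Set<Integer>> placement, Set<Integer> failedNodes) {
        for (Set<Integer> replicas : placement) {
            if (failedNodes.containsAll(replicas)) {
                return false;   // every copy of this block was on a failed node
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Set<Integer>> placement = placeBlocks(10, 3, 3);
        System.out.println(allBlocksReadable(placement, Set.of(1)));        // true
        System.out.println(allBlocksReadable(placement, Set.of(0, 1, 2)));  // false
    }
}
```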


BigData

Program



MapReduce

http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html


What Is MapReduce?

• Parallel, Fault-Tolerant Framework

• Splits Large Input

• Invokes User-Defined “Map” Function

• Shuffles and Sorts

• Invokes User-Defined “Reduce” Function


(diagram, MapReduce & HDFS: a Client with a .jar, the Name Node and Job Tracker, and three Data Node / Task Tracker pairs)


Thinking In MR

(K1, V1) → Map → list(K2, V2) → Shuffle/Sort → (K2, list(V2)) → Reduce → list(K3, V3)

Map: filter & transform. Reduce: group & aggregate.
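The Map → Shuffle/Sort → Reduce flow can be simulated in a single Java process to make the type signatures concrete. This is a minimal sketch, not Hadoop's API: `MiniMapReduce`, `KV`, and `run` are hypothetical names, and the "shuffle" is just a sorted in-memory map. Map turns (K1, V1) into list(K2, V2), the shuffle groups into (K2, list(V2)), and reduce turns each group into list(K3, V3).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

public class MiniMapReduce {
    // A simple key/value pair.
    record KV<K, V>(K key, V value) {}

    // map:    (K1, V1)       -> list(K2, V2)   (filter & transform)
    // reduce: (K2, list(V2)) -> list(K3, V3)   (group & aggregate)
    static <K1, V1, K2 extends Comparable<K2>, V2, K3, V3> List<KV<K3, V3>> run(
            List<KV<K1, V1>> input,
            BiFunction<K1, V1, List<KV<K2, V2>>> map,
            BiFunction<K2, List<V2>, List<KV<K3, V3>>> reduce) {
        // Shuffle/sort: group every emitted value under its key; TreeMap keeps keys sorted.
        Map<K2, List<V2>> groups = new TreeMap<>();
        for (KV<K1, V1> pair : input) {
            for (KV<K2, V2> kv : map.apply(pair.key(), pair.value())) {
                groups.computeIfAbsent(kv.key(), k -> new ArrayList<>()).add(kv.value());
            }
        }
        // Reduce: one call per key, in sorted key order.
        List<KV<K3, V3>> output = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> group : groups.entrySet()) {
            output.addAll(reduce.apply(group.getKey(), group.getValue()));
        }
        return output;
    }

    // Word count, the classic example: K1 = line number, V1 = line text.
    static List<KV<String, Integer>> wordCount(List<String> lines) {
        List<KV<Integer, String>> input = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            input.add(new KV<>(i, lines.get(i)));
        }
        BiFunction<Integer, String, List<KV<String, Integer>>> mapper = (lineNo, text) -> {
            List<KV<String, Integer>> out = new ArrayList<>();
            for (String word : text.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(new KV<>(word, 1));   // emit(word, 1)
                }
            }
            return out;
        };
        BiFunction<String, List<Integer>, List<KV<String, Integer>>> reducer =
                (word, ones) -> List.of(new KV<>(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return run(input, mapper, reducer);
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("big data", "Big spatial data")));
    }
}
```

On a real cluster the map calls run on the nodes holding each input split and the shuffle moves data across the network; the sorted grouping between the two phases is the part this sketch preserves.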


Geo MapReduce


DensityMap

ID1,X1,Y1
ID2,X2,Y2
ID3,X3,Y3
ID4,X4,Y4


DensityMap

function map(lineno, text)
{
  tokens = text.split(',')              // ID,X,Y
  cell = toCell(tokens[1], tokens[2])
  emit(cell, 1)
}

function toCell(x, y)
{
  // some math !!
  return cell
}

function reduce(cell, iterator)
{
  sum = 0
  for (one : iterator)
    sum += one
  emit(cell, sum)
}

http://thunderheadxpler.blogspot.com/2013/03/bigdata-kernel-density-analysis-on.html
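The DensityMap pseudocode can be made concrete in plain Java. This is a minimal single-process sketch, assuming `toCell` snaps points onto a square grid of side `cellSize` (the slide leaves it as "some math"); `DensityBinning` and `density` are hypothetical names, and `Map.merge` stands in for both the shuffle and the reducer's running sum.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DensityBinning {
    // Hypothetical toCell: snap x,y onto a square grid of side cellSize.
    static String toCell(double x, double y, double cellSize) {
        long col = (long) Math.floor(x / cellSize);
        long row = (long) Math.floor(y / cellSize);
        return col + ":" + row;
    }

    // Map and reduce collapsed into one pass: emit (cell, 1) per record, sum per cell.
    static Map<String, Integer> density(List<String> lines, double cellSize) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String text : lines) {
            String[] tokens = text.split(",");               // ID,X,Y
            double x = Double.parseDouble(tokens[1].trim());
            double y = Double.parseDouble(tokens[2].trim());
            String cell = toCell(x, y, cellSize);             // map: emit(cell, 1)
            counts.merge(cell, 1, Integer::sum);              // reduce: sum += one
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("ID1,0.5,0.5", "ID2,0.9,0.1", "ID3,1.5,0.5", "ID4,-0.5,0.5");
        System.out.println(density(input, 1.0)); // {-1:0=1, 0:0=2, 1:0=1}
    }
}
```

`Math.floor` (rather than a plain cast to `long`, which truncates toward zero) keeps negative coordinates, such as western longitudes, in the correct cell.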


Writing MapReduce Is Hard…


http://www.cascading.org


Think of Data as Water in Pipes


Cascading pipeline

⬇MapReduce Job


Workflow Pipeline

(diagram: Source → Filter → To Cell → GroupBy → Count → Sink, turning X,Y records into Cell, Count pairs)


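Cascading Pipe

import cascading.operation.aggregator.Count;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.tuple.Fields;

// Pipe the tap's X,Y input fields into the spatial function
// (SpatialDensity is a custom Function, not part of Cascading)
Pipe pipe = new Each("start", new Fields("X", "Y"), new SpatialDensity());
// Group by the emitted 'cell' field (GroupBy takes a Fields, not a String)
pipe = new GroupBy(pipe, new Fields("cell"));
// Count each group and name the count field 'POPULATION'
pipe = new Every(pipe, Fields.GROUP, new Count(new Fields("POPULATION")));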

http://thunderheadxpler.blogspot.com/2014/01/cascading-workflow-for-spatial-binning.html


How About…

No Programming???


Apache Hive

“SQL”

⬇MapReduce Job


HQL

drop table if exists logs;
create external table if not exists logs(
  ip string,
  method string,
  uri string,
  status string,
  bytes int,
  time_taken int,
  referrer string,
  user_agent string
) partitioned by (year int, month int, day int, hour int)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
stored as textfile
location 'hdfs://hadoop:8020/logs/';
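To see what this table definition means for each input line, here is a Java sketch that splits one tab-delimited line into the eight declared columns and builds a partition subdirectory name from a timestamp. The `key=value` directory naming is Hive's usual partition layout and is assumed here; because the table is external, partitions still have to be registered (for example with `ALTER TABLE ... ADD PARTITION` or `MSCK REPAIR TABLE`). `LogRowSketch`, `parse`, and `partitionDir` are hypothetical names.

```java
import java.time.LocalDateTime;

public class LogRowSketch {
    // One row of the `logs` table; field order matches the HQL column list.
    record LogRow(String ip, String method, String uri, String status,
                  int bytes, int timeTaken, String referrer, String userAgent) {}

    // Split on tabs, as in "fields terminated by '\t'"; limit -1 keeps trailing empty fields.
    static LogRow parse(String line) {
        String[] f = line.split("\t", -1);
        if (f.length != 8) {
            throw new IllegalArgumentException("expected 8 fields, got " + f.length);
        }
        return new LogRow(f[0], f[1], f[2], f[3],
                Integer.parseInt(f[4]), Integer.parseInt(f[5]), f[6], f[7]);
    }

    // Partition subdirectory in key=value form (assumed layout under the table location).
    static String partitionDir(LocalDateTime t) {
        return String.format("year=%d/month=%d/day=%d/hour=%d",
                t.getYear(), t.getMonthValue(), t.getDayOfMonth(), t.getHour());
    }

    public static void main(String[] args) {
        String line = "10.0.0.1\tGET\t/index.html\t200\t512\t12\t-\tcurl/7.0";
        System.out.println(parse(line));
        System.out.println(partitionDir(LocalDateTime.of(2014, 3, 7, 9, 30))); // year=2014/month=3/day=7/hour=9
    }
}
```

Unlike `parse`, which throws on a bad number, Hive's default text SerDe generally reads a field that does not parse as `int` as NULL.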


Other Ad Hoc Engines

• Cloudera Impala

• Facebook Presto

• Spark SQL

• Bypass MapReduce Generation / Access HDFS Directly


What About Spatial?


GIS Tools for Hadoop

• Computational Geometry Library

• Hive Spatial UDFs

• Geoprocessing Extensions to ArcMap


Geometry Library

• Points / Lines / Polygons

• I/O (GeoJSON, WKT, WKB, Shape)

• Spatial Relations (inside, touches, intersects, …)

• Spatial Operations (buffer, cut, convex hull, …)

• In-Memory Spatial Index
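An in-memory spatial index narrows a query to a few candidate shapes before any exact geometry test runs. The Geometry Library's own index is not shown here; this is a minimal uniform-grid sketch in Java, assuming shapes are represented by their envelopes (bounding boxes). `GridIndex`, `insert`, and `query` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class GridIndex {
    // Grid cell coordinates used as a map key.
    record Cell(long col, long row) {}

    private final double cellSize;
    private final Map<Cell, List<Integer>> cells = new HashMap<>();
    private final List<double[]> envelopes = new ArrayList<>(); // {xmin, ymin, xmax, ymax}

    GridIndex(double cellSize) {
        this.cellSize = cellSize;
    }

    // Grid coordinate of a value; floor keeps negative values in the right cell.
    private long bin(double v) {
        return (long) Math.floor(v / cellSize);
    }

    // Register an envelope in every cell it overlaps; returns its id.
    int insert(double xmin, double ymin, double xmax, double ymax) {
        int id = envelopes.size();
        envelopes.add(new double[]{xmin, ymin, xmax, ymax});
        for (long c = bin(xmin); c <= bin(xmax); c++) {
            for (long r = bin(ymin); r <= bin(ymax); r++) {
                cells.computeIfAbsent(new Cell(c, r), k -> new ArrayList<>()).add(id);
            }
        }
        return id;
    }

    // Ids of envelopes containing (x, y): take one cell's candidates, then test each exactly.
    Set<Integer> query(double x, double y) {
        Set<Integer> hits = new TreeSet<>();
        for (int id : cells.getOrDefault(new Cell(bin(x), bin(y)), List.of())) {
            double[] e = envelopes.get(id);
            if (x >= e[0] && x <= e[2] && y >= e[1] && y <= e[3]) {
                hits.add(id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        GridIndex index = new GridIndex(10.0);
        index.insert(0, 0, 5, 5);     // id 0
        index.insert(3, 3, 25, 8);    // id 1, spans three cells
        System.out.println(index.query(4, 4));    // [0, 1]
        System.out.println(index.query(20, 5));   // [1]
    }
}
```

A fixed grid is simply the shortest structure that shows the candidate-then-exact-test pattern; tree-based indexes adapt better when data density is uneven.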


API Usage in BigData

• Map-only jobs - GeoEnrichment

- Given a set of locations

- Given demographic areas

- Augment each location with the demographic attributes of the area containing it


BigData Binning


Hive Spatial UDF

• Uses Geometry API

• Constructors

- ST_Point / ST_GeomFromGeoJSON

• Relations / Operations

- ST_Contains / ST_Buffer

• Accessors

- ST_Distance, ST_Area


Hive Spatial UDF

SELECT counties.name, count(*) total
FROM counties
JOIN earthquakes
WHERE ST_Contains(counties.boundaryshape, ST_Point(earthquakes.longitude, earthquakes.latitude))
GROUP BY counties.name
ORDER BY total desc;
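This query pairs every county with every earthquake (the JOIN has no ON clause) and keeps the pairs where ST_Contains is true. The same logic can be sketched as a single-process Java program, with an even-odd ray-casting test standing in for ST_Contains (points exactly on a boundary are not handled consistently) and coordinates treated as planar x/y. `CountByPolygon`, `County`, and `Quake` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CountByPolygon {
    // Hypothetical inputs: a named polygon ring (vertex x/y arrays) and an epicenter.
    record County(String name, double[] xs, double[] ys) {}
    record Quake(double longitude, double latitude) {}

    // Even-odd ray casting, standing in for ST_Contains (boundary points not handled).
    static boolean contains(double[] xs, double[] ys, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            if ((ys[i] > y) != (ys[j] > y)
                    && x < xs[j] + (y - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j])) {
                inside = !inside;
            }
        }
        return inside;
    }

    // JOIN + WHERE ST_Contains(...) + GROUP BY name + ORDER BY total DESC.
    static List<Map.Entry<String, Integer>> countPerCounty(List<County> counties, List<Quake> quakes) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (County c : counties) {
            for (Quake q : quakes) {   // every county/earthquake pair, as in the query
                if (contains(c.xs(), c.ys(), q.longitude(), q.latitude())) {
                    totals.merge(c.name(), 1, Integer::sum);   // count(*)
                }
            }
        }
        List<Map.Entry<String, Integer>> rows = new ArrayList<>(totals.entrySet());
        rows.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return rows;
    }

    public static void main(String[] args) {
        County a = new County("A", new double[]{0, 2, 2, 0}, new double[]{0, 0, 2, 2});
        County b = new County("B", new double[]{2, 4, 4, 2}, new double[]{0, 0, 2, 2});
        List<Quake> quakes = List.of(new Quake(1, 1), new Quake(3, 1), new Quake(3.5, 0.5), new Quake(9, 9));
        System.out.println(countPerCounty(List.of(a, b), quakes));   // [B=2, A=1]
    }
}
```

As with the inner join in the query, counties that contain no earthquakes do not appear in the result.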


GP Extensions

(diagram: ArcMap, HDFS, and a Hive/MapReduce workflow)


PROCESSING EVOLUTION

• Transaction - Batch

• Operational - Dashboard

• Analytics - Exploration

• Intelligent - Realtime / Predictive

(axis: Fixed Schema to Variable Schema)


Big Data Partners

And More…


Blog Post: http://thunderheadxpler.blogspot.com

Thank you
