Geospatial Stream Query Processing using Microsoft SQL Server StreamInsight



Post on 09-Jul-2020


Seyed Jalal Kazemitabar (1), Ugur Demiryurek (1), Mohamed Ali (2), Afsin Akdogan (1), Cyrus Shahabi (1)

(1) Integrated Media Systems Center, University of Southern California
(2) Microsoft SQL Server, Microsoft Corporation

ICampus, IWatch, CT

Introduction

• A real-world data-driven framework which enables:

– Fast query processing over stream data using Microsoft StreamInsight™

– Running spatial queries over geospatial data

– Online analysis and prediction based on historic data using our in-memory sketching technique

Streaming Engine

GeoInsight

• StreamInsight Architecture

• Stream flow in demo

[Figure: stream flow in the demo. Labels: Input Adapter; queries Q1 to Q7 (Value Filter, Spatial Filter, PCA, Refine, Predict, Average); Output Adapter]
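The stream flow above chains filters and an aggregate between an input and an output adapter. The following is a minimal sketch of that idea in plain Python generators, not StreamInsight code (StreamInsight queries are written in LINQ). The `Reading` fields, the speed bounds, the bounding box, and the count-based window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Reading:
    """Hypothetical sensor event; the field names are illustrative only."""
    sensor_id: str
    lat: float
    lon: float
    speed_mph: float


def value_filter(events: Iterable[Reading], lo: float = 0.0,
                 hi: float = 120.0) -> Iterator[Reading]:
    """Drop readings whose speed falls outside an assumed plausible range."""
    for e in events:
        if lo <= e.speed_mph <= hi:
            yield e


def spatial_filter(events: Iterable[Reading],
                   bbox: Tuple[float, float, float, float]) -> Iterator[Reading]:
    """Keep readings inside bbox = (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    for e in events:
        if min_lat <= e.lat <= max_lat and min_lon <= e.lon <= max_lon:
            yield e


def windowed_average(events: Iterable[Reading], size: int) -> Iterator[float]:
    """Emit the mean speed of each consecutive, non-overlapping group of `size` events."""
    window = []
    for e in events:
        window.append(e.speed_mph)
        if len(window) == size:
            yield sum(window) / size
            window = []


# Stand-in for the input adapter: a fixed list of made-up readings.
readings = [
    Reading("s1", 34.05, -118.25, 60.0),
    Reading("s2", 34.06, -118.24, 50.0),
    Reading("s3", 40.00, -100.00, 70.0),  # outside the bounding box
    Reading("s4", 34.04, -118.26, -5.0),  # implausible speed
    Reading("s5", 34.03, -118.27, 40.0),
    Reading("s6", 34.07, -118.23, 30.0),
]
LA_BBOX = (33.7, -118.7, 34.4, -118.0)  # rough box, assumed for the example

# Stand-in for the output adapter: collect the results in a list.
averages = list(
    windowed_average(spatial_filter(value_filter(readings), LA_BBOX), size=2)
)
print(averages)  # [55.0, 35.0]
```

Because each stage is a generator, events flow through one at a time and nothing is stored beyond the current window.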

Application

Online Analytical Refinement and Prediction (OARP)

Hybrid queries over spatio-temporal windows provide rich analysis functionality, including:

• Refinement functions

– Smoothing noisy input data according to previously observed patterns

– Detecting anomalies, characterized by sensor readings that deviate strongly from historical mean values

• Prediction functions

– Predicting near-future trends based on previously observed patterns

– Responding to anomalies and deliberately attempting to change future conditions

Approach

Using In-memory Sketches

• Instead of storing the whole dataset in the database, store the sketches in memory

• Principal Component Analysis (PCA): a mathematical approach for analyzing correlated data

• A small number of components with great influence are selected as coordinates

• Improving PCA performance for aggregate queries by calculating the query result in the transformed space
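To make the in-memory sketch idea concrete, here is a small illustration of answering an average query directly in PCA space. It is not the authors' implementation: the data layout (one row per day, one column per time slot), the synthetic data, and the function names are assumptions, and power iteration is used only to keep the example free of third-party libraries. The key point is that the k-dimensional scores are averaged first and mapped back once, rather than reconstructing every row.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def build_sketch(rows, k, iters=200, seed=0):
    """Return (mean, components, scores) keeping the top-k principal directions."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            # w = X^T (X v), computed without forming the covariance matrix
            proj = [dot(r, v) for r in centered]
            w = [sum(p * r[j] for p, r in zip(proj, centered)) for j in range(d)]
            # remove the directions already found (Gram-Schmidt deflation)
            for c in comps:
                coef = dot(w, c)
                w = [a - coef * b for a, b in zip(w, c)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                break
            v = [a / norm for a in w]
        comps.append(v)
    scores = [[dot(r, c) for c in comps] for r in centered]
    return mean, comps, scores


def average_in_transformed_space(sketch, row_ids, slots):
    """Average over the given rows and time slots using only the sketch."""
    mean, comps, scores = sketch
    k = len(comps)
    avg_score = [sum(scores[i][c] for i in row_ids) / len(row_ids) for c in range(k)]
    profile = [mean[j] + sum(avg_score[c] * comps[c][j] for c in range(k))
               for j in slots]
    return sum(profile) / len(profile)


# Synthetic, highly correlated "speed per time slot" data (made up for the demo):
# every day is a base pattern plus a scaled congestion dip plus a uniform offset.
pattern = [60.0, 55.0, 30.0, 25.0, 50.0, 62.0]
congestion = [0.0, -2.0, -8.0, -10.0, -3.0, 0.0]
rows = []
for day in range(20):
    a = (day % 5) / 4.0  # congestion strength
    b = (day % 3) - 1.0  # uniform offset
    rows.append([p + a * c + b for p, c in zip(pattern, congestion)])

sketch = build_sketch(rows, k=2)
days, slots = list(range(5, 15)), [2, 3]
from_sketch = average_in_transformed_space(sketch, days, slots)
direct = sum(rows[i][j] for i in days for j in slots) / (len(days) * len(slots))
print(abs(from_sketch - direct) < 1e-6)
```

The synthetic rows vary along only two directions, so two components reproduce them exactly here; on real traffic data the sketch would be an approximation whose error depends on how much variance the retained components capture.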

Contribution/Experiments

PCA for Traffic Data

• High data compression rate

– 98% for highway data

• Extra short response time

– 2 milliseconds (compared with 58 seconds for the database query)

• Highly accurate for traffic data

– MSE for the same query: 10^-4 Mph

[Figures: "Real Data" and "Transformed Data" panels. Axis labels: Speed, Time, % of Variance in data, Components. 98% is marked.]

Challenges

Large Datasets and Spatial Queries

• Large response time caused by disk I/O limits the availability of hybrid queries in real-time streaming applications

“What was the average speed on I-10 in LA County during summer 2009 from 4:00-5:00 pm?”

Database response time for the indexed table containing one year of data (150 GB): 58 seconds!
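For illustration, the example query above can be written as a filter followed by an aggregate. The sketch below runs it over a handful of in-memory records; the `SpeedRecord` fields, the sample values, and the choice of 1 June to 1 September as "summer 2009" are assumptions, not details from the poster.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class SpeedRecord:
    """Hypothetical row layout; the real table schema is not given."""
    highway: str
    county: str
    timestamp: datetime
    speed_mph: float


def average_speed(records: Iterable[SpeedRecord], highway: str, county: str,
                  start: datetime, end: datetime, hour: int) -> Optional[float]:
    """Mean speed of records matching the highway, county, date range
    [start, end) and hour of day; None when nothing matches."""
    speeds = [
        r.speed_mph
        for r in records
        if r.highway == highway
        and r.county == county
        and start <= r.timestamp < end
        and r.timestamp.hour == hour
    ]
    return sum(speeds) / len(speeds) if speeds else None


records = [
    SpeedRecord("I-10", "LA", datetime(2009, 7, 1, 16, 15), 30.0),
    SpeedRecord("I-10", "LA", datetime(2009, 8, 15, 16, 45), 40.0),
    SpeedRecord("I-10", "LA", datetime(2009, 7, 1, 9, 0), 65.0),     # wrong hour
    SpeedRecord("I-405", "LA", datetime(2009, 7, 1, 16, 30), 20.0),  # wrong highway
    SpeedRecord("I-10", "LA", datetime(2009, 12, 1, 16, 30), 50.0),  # not summer
]

result = average_speed(
    records, "I-10", "LA",
    start=datetime(2009, 6, 1), end=datetime(2009, 9, 1),  # assumed summer bounds
    hour=16,  # 4:00-5:00 pm
)
print(result)  # 35.0
```

The query combines a spatial predicate (highway, county) with two temporal ones (season, hour of day), which is what makes it a hybrid query over a spatio-temporal window.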

Conclusion and Future Work

• Limited support for geostreaming (continuous spatial queries) in current database technologies

• Demo application as a proof of concept for a system which runs spatial queries over real-time data

• Implementing the fundamentals of the Clever Transportation (CT) project as a platform for monitoring, querying, and analyzing real-time Los Angeles traffic data

• Devising a scalable spatial alarm continuous query suitable for location-based services
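As a rough illustration of what a spatial alarm continuous query does, the sketch below fires an alarm each time a moving user enters a circular region. It is a naive baseline under assumed names and data, not the scalable design listed as future work: every location update is checked against every alarm, which is exactly the cost a scalable version would need to avoid (for example with a spatial index).

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def spatial_alarms(updates, alarms):
    """Yield (user, alarm_name) each time a user enters an alarm region.

    updates: iterable of (user, lat, lon) location updates, in arrival order.
    alarms:  dict mapping alarm_name -> (lat, lon, radius_km).
    """
    inside = set()  # (user, alarm_name) pairs currently inside a region
    for user, lat, lon in updates:
        for name, (alat, alon, radius_km) in alarms.items():
            key = (user, name)
            within = haversine_km(lat, lon, alat, alon) <= radius_km
            if within and key not in inside:
                inside.add(key)
                yield key
            elif not within:
                inside.discard(key)


# Made-up alarm region and trajectory for the example.
alarms = {"downtown": (34.05, -118.25, 2.0)}
updates = [
    ("u1", 34.200, -118.25),  # far away
    ("u1", 34.055, -118.25),  # enters the region: alarm fires
    ("u1", 34.050, -118.25),  # still inside: no repeat
    ("u1", 34.200, -118.25),  # leaves
    ("u1", 34.050, -118.25),  # re-enters: alarm fires again
]
fired = list(spatial_alarms(updates, alarms))
print(fired)  # [('u1', 'downtown'), ('u1', 'downtown')]
```

Tracking which users are already inside a region keeps the alarm from firing on every update while the user stays there.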