analytics on time series data · 2017-11-09 · §goal:predict driver actions 1 sec before they...

Analytics on Time Series Data Jure Leskovec Stanford University Chan Zuckerberg Biohub Pinterest


Page 1

Analytics on Time Series Data

Jure Leskovec, Stanford University, Chan Zuckerberg Biohub, Pinterest

Page 2

Page 3

Sensors are Everywhere

§ Sequences of timestamped observations
§ In many applications, we generate large sequences of timestamped observations
  – “Sensors” have a broad definition

Page 4

Sensor Data: Time Series

§ Sensors generate lots of time-series data
§ Network inference from time series data: convert a sequence of timestamped sensor observations into a time-varying network

Page 5

Sensors → Time Series

§ But such time series data are:
  § High-dimensional
  § Unlabeled
  § High-velocity
  § Dynamic
  § Heterogeneous

Page 6

Sensors: Much data, no insights

§ It is hard to obtain insights:
  § Sensor readings are interdependent and correlated
  § As the environment changes, dependencies might change
  § Sensors might fail
  § Readings might be asynchronous
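Asynchronous readings usually have to be put on a common clock before dependencies between sensors can be estimated. The following is a minimal sketch, assuming that forward-fill (carrying the last reading forward) is acceptable for the signals involved. The function name, the two sensors, and their values are hypothetical and do not come from the talk.

```python
from bisect import bisect_right

def align_to_grid(readings, start, stop, step):
    """Forward-fill one sensor's (timestamp, value) readings onto a regular grid.

    `readings` must be sorted by timestamp. Grid points that fall before the
    first reading get None, since there is nothing to carry forward yet.
    """
    times = [t for t, _ in readings]
    n_steps = int(round((stop - start) / step))
    grid = []
    for k in range(n_steps + 1):
        t = start + k * step
        idx = bisect_right(times, t) - 1  # last reading at or before t
        grid.append(readings[idx][1] if idx >= 0 else None)
    return grid

# Two hypothetical sensors reporting at different, irregular times (seconds).
speed = [(0.00, 12.0), (0.13, 12.4), (0.31, 12.9)]
brake = [(0.05, 0.0), (0.22, 0.3)]

# Resample both onto the same 10 Hz grid (0.0, 0.1, 0.2, 0.3 seconds).
speed_10hz = align_to_grid(speed, start=0.0, stop=0.3, step=0.1)
brake_10hz = align_to_grid(brake, start=0.0, stop=0.3, step=0.1)
```

Forward-fill never looks into the future, which matters when the aligned data later feeds a prediction task. Interpolation would be smoother but uses readings that had not arrived yet.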

Page 7

Sensors: Much data, no insights

§ Sensor data is hard to work with
§ The algorithms must be:
  § Scalable: Large amounts of raw data over long time series
  § Robust: Must apply to lots of different applications

Page 8

Challenge: ML & Insights

§ (Supervised) Machine Learning Lifecycle: This feature, that feature. Every single time!

[Figure: pipeline Raw Data → Structured Data → Learning Algorithm → Model → Downstream prediction task. Labels: Feature Engineering; Automatically learn the features]

Page 9

How do we describe the structure of the time series so we can obtain insights and make predictions?

Page 10

Discovering Structure

§ Value in “breaking down” the time series into a sequence of states:
  § Allows us to draw interpretable conclusions from the data

Page 11

Segmentation and Clustering

§ In general, the “states” are not predefined
  § We do not know what they are, nor what they refer to…
§ Instead, we need to discover these states in an unsupervised way, while simultaneously segmenting the time series into the states!
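To make "segmenting while clustering" concrete, suppose each candidate state already assigns a cost to every time step (for example, a negative log-likelihood). Choosing the state sequence then becomes a shortest-path problem with a penalty for switching states. The sketch below solves it with dynamic programming. The function name, the cost table, and the penalty name `beta` are illustrative assumptions, and this is only the assignment half of the problem, since the states themselves would still have to be re-estimated.

```python
def assign_states(costs, beta):
    """Pick one state per time step, minimizing total cost plus switch penalties.

    costs[t][k] is the cost of putting time step t in state k (lower is better).
    beta is charged every time consecutive time steps use different states.
    Solved exactly with dynamic programming (a Viterbi-style shortest path).
    """
    n_states = len(costs[0])
    best = list(costs[0])   # best[k]: cheapest path so far that ends in state k
    back = []               # back[t-1][k]: state used at t-1 on that path
    for row in costs[1:]:
        cheapest_prev = min(range(n_states), key=lambda k: best[k])
        new_best, pointers = [], []
        for k in range(n_states):
            stay = best[k]
            switch = best[cheapest_prev] + beta
            if stay <= switch:
                new_best.append(stay + row[k])
                pointers.append(k)
            else:
                new_best.append(switch + row[k])
                pointers.append(cheapest_prev)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path backwards from the best final state.
    state = min(range(n_states), key=lambda k: best[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical costs for 6 time steps and 2 states; step 2 is a noisy blip.
costs = [[0.1, 1.0], [0.2, 1.0], [0.9, 0.8], [0.1, 1.0], [1.0, 0.1], [1.0, 0.2]]
pointwise = assign_states(costs, beta=0.0)   # follows every blip
smoothed = assign_states(costs, beta=0.5)    # pays to switch only once
```

With no penalty the assignment flips state at the noisy step; with a penalty of 0.5 the same data yields two clean segments.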

Page 12

How to Describe States

Networks are great to encode structure and dependencies in data

[Figure labels: Sensors, Dependencies]

Network Inference via the Time-Varying Graphical Lasso. D. Hallac, Y. Park, S. Boyd, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2017.
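The slide itself does not state the optimization problem. For orientation, the cited Time-Varying Graphical Lasso paper poses network inference roughly as below; the notation is recalled from that paper and should be checked against it. Here $\Theta_i$ is the inverse covariance (the dependency network) at time $i$, $S_i$ the empirical covariance of the $n_i$ observations at that time, $\lambda$ a sparsity weight, $\beta$ a temporal-consistency weight, and $\psi$ a convex penalty on how the network changes between consecutive times.

```latex
\min_{\Theta_1, \dots, \Theta_T \succ 0} \;
\sum_{i=1}^{T} \Bigl[ -l_i(\Theta_i) + \lambda \lVert \Theta_i \rVert_{\mathrm{od},1} \Bigr]
+ \beta \sum_{i=2}^{T} \psi(\Theta_i - \Theta_{i-1}),
\qquad
l_i(\Theta_i) = n_i \bigl( \log\det \Theta_i - \operatorname{Tr}(S_i \Theta_i) \bigr)
```

The off-diagonal $\ell_1$ norm $\lVert \cdot \rVert_{\mathrm{od},1}$ keeps each network sparse, so a zero entry in $\Theta_i$ reads as "no direct dependency between these two sensors at time $i$".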

Page 13

Our Approach

§ Our approach: Given a set of sensor data, we learn temporal dependency networks between different sensors
§ The network is not static but can change over time
§ Allows us to gain insights into the process and detect anomalies

Page 14

States and Dependencies

[Figure labels: State; Sensor dependency network of state A, B]

Page 15

TICC Problem Setup

§ Formal definition:

where,

Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data. D. Hallac, S. Vare, S. Boyd, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2017.
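The equation on this slide is not present in the transcript. As a reference point, the cited TICC paper formulates the problem approximately as follows; this is recalled from the paper rather than taken from the slide, so the exact notation is an assumption. $K$ is the number of states, $\Theta_i$ the block Toeplitz inverse covariance of state $i$ (the set $\mathcal{T}$), $P_i$ the set of time points assigned to state $i$, $\mu_i$ the empirical mean of state $i$, $\lambda$ a sparsity weight, and $\beta$ a penalty for switching states between consecutive time points.

```latex
\operatorname*{argmin}_{\Theta \in \mathcal{T},\, P}
\sum_{i=1}^{K} \left[
  \lVert \lambda \circ \Theta_i \rVert_1
  + \sum_{X_t \in P_i} \Bigl( -\ell\ell(X_t, \Theta_i)
  + \beta \, \mathbb{1}\{ X_{t-1} \notin P_i \} \Bigr)
\right]

\text{where}\quad
\ell\ell(X_t, \Theta_i) = -\tfrac{1}{2} (X_t - \mu_i)^{\top} \Theta_i (X_t - \mu_i)
  + \tfrac{1}{2} \log\det \Theta_i - \tfrac{n}{2} \log(2\pi)
```

The three terms correspond to sparsity of each state's network, how well each point fits its assigned state, and temporal consistency of the assignment.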

Page 16

TICC: Scalability

§ Can scale to problems with tens of millions of observations!
§ ADMM can solve for millions of variables in minutes!
  – Centralized solver (CVXPY) explodes computationally

Number of Unknowns    TVGL     Interior-Point
100                   0.9      37.9
360                   0.9      5362.7
200,000               48.3     -
5 million             706.4    -

[Figure labels: CVXPY, SnapVX]

SnapVX: A Network-Based Convex Optimization Solver. D. Hallac, C. Wong, S. Diamond, A. Sharang, R. Sosič, S. Boyd, J. Leskovec. Journal of Machine Learning Research (JMLR), 18(4):1-5, 2017.

Page 17

Case Study: Cars

§ Car driving sessions:
  § 36,000 samples @ 10Hz
§ We observed 7 sensors:
  § Brake pedal position
  § Forward (X-)acceleration
  § Lateral (Y-)acceleration
  § Steering wheel angle
  § Vehicle velocity
  § Engine RPM
  § Gas pedal position

Page 18

Cars – “Turning” State

Page 19

Cars – “Stopping” State

Page 20

TICC Results: Car Data

§ We run TICC with K=5 states (selected via BIC)
§ The betweenness centrality score of each node in each state/network:
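Betweenness centrality counts how many shortest paths between other nodes pass through a given node, so a sensor with a high score acts as a hub in that state's dependency network. The sketch below implements Brandes' algorithm for an unweighted, undirected graph in plain Python. The example network is hypothetical and is not one of the networks learned in the talk.

```python
from collections import deque

def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search, counting shortest paths from `source`.
        order = []
        parents = {v: [] for v in adjacency}
        n_paths = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    parents[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in parents[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2.0 for v, s in score.items()}

# Hypothetical dependency network for one state (edges are made up).
network = {
    "steering": ["lateral_accel", "velocity"],
    "lateral_accel": ["steering"],
    "velocity": ["steering", "gas", "brake"],
    "gas": ["velocity"],
    "brake": ["velocity"],
}
scores = betweenness_centrality(network)
```

In this toy network `velocity` lies on five of the shortest paths between other sensors and `steering` on three, while the leaf sensors score zero.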

Page 21

Resulting States

Green = straight, White = slowing down, Red = turning, Blue = speeding up

Results are very consistent across the data!

Page 22

Predicting the Future (but without feature engineering)

Page 23

Problem Setup

§ Dataset: Automobile data containing 1,400 sensors recording at 10 Hz.
§ Goal: Predict driver actions 1 sec before they occur
  § Left/Right blinker
  § Accelerate (gas pedal > threshold)
  § Hard braking (brake pedal < threshold)

Driver Identification Using Automobile Sensor Data from a Single Turn. D. Hallac, A. Sharang, R. Stahlmann, A. Lamprecht, M. Huber, M. Roehder, R. Sosič, J. Leskovec. IEEE International Conference on Intelligent Transportation Systems (ITSC), 2016.
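Predicting an action 1 second ahead at 10 Hz means the label for time t is read 10 samples later. One possible way to build such training pairs for the "accelerate" action is sketched below. The function name, the history length, the threshold value, and the toy signal are assumptions for illustration, not details given in the talk.

```python
def make_examples(gas_pedal, threshold, rate_hz=10, horizon_s=1.0, history=5):
    """Pair each history window with whether the action happens one horizon later.

    Returns (window, label) tuples: `window` holds the last `history` samples
    up to and including time t, and `label` says whether gas_pedal exceeds
    `threshold` at t + horizon_s.
    """
    horizon = int(round(horizon_s * rate_hz))  # 1 s at 10 Hz -> 10 samples
    examples = []
    for t in range(history - 1, len(gas_pedal) - horizon):
        window = gas_pedal[t - history + 1 : t + 1]
        label = gas_pedal[t + horizon] > threshold
        examples.append((window, label))
    return examples

# Hypothetical signal: pedal at rest for 2 s, then pressed from sample 20 on.
signal = [0.0] * 20 + [0.8] * 10
examples = make_examples(signal, threshold=0.5)
labels = [label for _, label in examples]
```

Note that the first positive example has a window of all zeros: the input never contains the action itself, only what preceded it, which is what makes the task a prediction rather than a detection.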

Page 24

Neural Networks

§ Multi-layer RNN architecture:
  § The output of the first LSTM is passed as input to a second LSTM
§ 3,000h of driving, 1,400 sensors, at 10Hz ≈ 150 billion datapoints
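The dataset size follows directly from the three figures on the slide, assuming every sensor records for the full duration at the full rate:

```python
hours = 3_000
sensors = 1_400
rate_hz = 10

samples_per_sensor = hours * 3_600 * rate_hz  # seconds per hour times samples per second
datapoints = samples_per_sensor * sensors

print(f"{samples_per_sensor:,} samples per sensor")
print(f"{datapoints:,} datapoints in total")
```

This gives 108,000,000 samples per sensor and 151,200,000,000 datapoints, which the slide rounds to 150 billion.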

Page 25

Time-Series Analysis

[Figure: data processing pipeline. Labels: Sparse TSV, SNAP Binary Format, Asynchronous Signal Array, Task-Specific Data, Process Results, Space-Efficient Data Conversion, Data Filtering, Analysis]

Page 26

Results – Brake Pedal

§ AUC of predicting whether the driver will press the brake pedal:

0.934 Test Set AUC (0.935 Training AUC)
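AUC can be read as the probability that a randomly chosen positive example (the driver does brake) receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. The labels and scores are made-up illustrations, and the pairwise loop is meant for clarity rather than for datasets of the size discussed here.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half. This pairwise form costs O(P * N) comparisons.
    """
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical model outputs: 1 = brake pressed one second later.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7]
auc = roc_auc(labels, scores)
```

Here 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9. A value of 0.5 would mean the scores carry no ranking information.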

Page 27

Predicting Driver Actions

Page 28

Predicting Driver Actions

Page 29

Predicting Driver Actions

Page 30

Failure Prediction

§ Dataset: Boiler data containing 100 sensors at 1Hz for 6-12 months
§ Goal: Predict component failures 7 days before they occur
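One way to turn "predict failures 7 days before they occur" into a supervised target is to label each timestamp by whether a failure falls within the following 7 days. The talk does not say how its labels were built, so this framing is an assumption, as are the function name and the example dates.

```python
from bisect import bisect_left

SECONDS_PER_DAY = 24 * 60 * 60

def failure_within(timestamps, failure_times, lead_days=7):
    """Label each timestamp 1 if a failure occurs in the next `lead_days` days."""
    lead = lead_days * SECONDS_PER_DAY
    failure_times = sorted(failure_times)
    labels = []
    for t in timestamps:
        i = bisect_left(failure_times, t)  # first failure at or after t
        hit = i < len(failure_times) and failure_times[i] - t <= lead
        labels.append(int(hit))
    return labels

# Hypothetical log: one failure on day 30, readings checked on a few days.
days = [0, 22, 23, 29, 31]
timestamps = [d * SECONDS_PER_DAY for d in days]
labels = failure_within(timestamps, failure_times=[30 * SECONDS_PER_DAY])
```

Day 22 is still eight days away and gets a 0, days 23 and 29 fall inside the window and get a 1, and day 31 has no later failure. At 1 Hz the 7-day window spans 604,800 samples per sensor.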

Page 31

Preliminary Results

§ Very promising results across various deep learning models!

Page 32

Conclusion

§ Complex engineered systems
§ High-dimensional unlabeled time series data collected in real-time
§ We need tools to understand these data as well as to make accurate predictions

Page 33

Thanks!

§ Joint work with D. Hallac, Y. Park, S. Boyd, R. Sosič, S. Vare, A. Sharang, C. Wong, S. Diamond.
§ Papers:
  § Network Inference via the Time-Varying Graphical Lasso. D. Hallac, Y. Park, S. Boyd, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2017.
  § Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data. D. Hallac, S. Vare, S. Boyd, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2017.
  § Learning the Network Structure of Heterogeneous Data via Pairwise Exponential Markov Random Fields. Y. Park, D. Hallac, S. Boyd, J. Leskovec. Artificial Intelligence and Statistics Conference (AISTATS), 2017.
  § SnapVX: A Network-Based Convex Optimization Solver. D. Hallac, C. Wong, S. Diamond, A. Sharang, R. Sosič, S. Boyd, J. Leskovec. Journal of Machine Learning Research (JMLR), 18(4):1-5, 2017.
  § Driver Identification Using Automobile Sensor Data from a Single Turn. D. Hallac, A. Sharang, R. Stahlmann, A. Lamprecht, M. Huber, M. Roehder, R. Sosič, J. Leskovec. IEEE International Conference on Intelligent Transportation Systems (ITSC), 2016.
  § Network Lasso: Clustering and Optimization in Large Graphs. D. Hallac, J. Leskovec, S. Boyd. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2015.

Page 34

http://snap.stanford.edu

@jure

THANKS!