imcsummit 2015 - day 2 developer track - implementing a highly scalable in-memory stock prediction...

A Stock Prediction System using open-source software. Fred Melo [email protected] @fredmelo_br, William Markito [email protected] @william_markito. © Copyright 2014 Pivotal. All rights reserved.

Upload: 2015-in-memory-computing-summit

Post on 17-Aug-2015


TRANSCRIPT

Page 1: IMCSummit 2015 - Day 2 Developer Track - Implementing a Highly Scalable In-Memory Stock Prediction System with Apache Geode, R and Spring XD


A Stock Prediction System using open-source software

Fred Melo [email protected] @fredmelo_br

William Markito [email protected] @william_markito

Page 2:
Page 3:

It's all about DATA

Data Sources

Look for patterns

Prediction

Page 4:

Page 5:

Machine Learning is the answer

Neural Networks

Clustering

Genetic Algorithms

Page 6:

Train with historical dataset

Apply model to the new input

Applying Machine Learning
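
The train-then-apply split on this slide can be sketched in a few lines. This is only an illustration: it swaps the neural network named in the talk for a one-variable least-squares fit, and the dataset and the function names `train` and `predict` are hypothetical.

```python
from statistics import mean


def train(history):
    """Fit y = a*x + b by ordinary least squares on (x, y) pairs."""
    xs = [x for x, _ in history]
    ys = [y for _, y in history]
    x_bar, y_bar = mean(xs), mean(ys)
    var = sum((x - x_bar) ** 2 for x in xs)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in history)
    a = cov / var
    b = y_bar - a * x_bar
    return a, b


def predict(model, x):
    """Apply the trained model to a new input."""
    a, b = model
    return a * x + b


# Hypothetical historical dataset: (price today, price on the next day).
history = [(10.0, 10.5), (11.0, 11.5), (12.0, 12.5), (13.0, 13.5)]
model = train(history)          # step 1: train with the historical dataset
print(predict(model, 14.0))     # step 2: apply the model to the new input
```

The point of the sketch is the separation of concerns: training is done once on stored history, while prediction is a cheap function call that can run on every incoming event.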

Page 7:

Hard to add new data sources

Why?

Hard to scale

Why so hard?

Hard to make it real-time

Page 8:

Traditional models are reactive and static

HDFS

Data Lake

Store

Analytics

Hard to change

Labor intensive

Inefficient

No real-time information

ETL based

Data-source specific

Page 9:

Stream-based, real-time closed-loop analytics are needed

HDFS

Data Lake

Expert System / Machine Learning

In-Memory Real-Time Data

Continuous Learning

Continuous Improvement

Continuous Adapting

Data Stream Pipeline

Multiple Data Sources

Real-Time Processing

Store Everything

Page 10:

Info

Analysis

Look at past trends (for similar input)

Evaluate current input

Score / Predict

Neural Network

How can it be addressed?

Page 11:

Info

Analysis

Filter

[ json ]

Neural Network

How can it be addressed?

Page 12:

Info

Analysis

Filter Enrich

Neural Network

How can it be addressed?

Page 13:

Info

Analysis

Filter Enrich Transform

Neural Network

How can it be addressed?

Page 14:

Info

Analysis

Filter Enrich Transform

Neural Network

How can it be addressed?

Page 15:

Info

Analysis

Filter Enrich Transform

Transform

Neural Network

How can it be addressed?
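
The Filter, Enrich and Transform stages built up over the preceding slides can be sketched as plain functions chained over a stream of JSON messages. This is a minimal illustration, not the Spring XD modules used in the talk; the field names (`symbol`, `price`, `volume`), the `SECTORS` lookup table and all function names are assumptions.

```python
import json

# Hypothetical reference data used by the enrich step.
SECTORS = {"AAPL": "Technology", "XOM": "Energy"}


def keep(tick):
    """Filter: drop ticks without a symbol or with a non-positive price."""
    return bool(tick.get("symbol")) and tick.get("price", 0) > 0


def enrich(tick):
    """Enrich: attach a sector looked up from reference data."""
    return {**tick, "sector": SECTORS.get(tick["symbol"], "Unknown")}


def transform(tick):
    """Transform: reshape the tick into the record a model would consume."""
    return {
        "symbol": tick["symbol"],
        "sector": tick["sector"],
        "features": [tick["price"], tick["volume"]],
    }


def pipeline(lines):
    """Run each JSON line through filter -> enrich -> transform."""
    for line in lines:
        tick = json.loads(line)
        if keep(tick):
            yield transform(enrich(tick))


incoming = [
    '{"symbol": "AAPL", "price": 125.5, "volume": 1000}',
    '{"symbol": "", "price": 1.0, "volume": 5}',
]
for record in pipeline(incoming):
    print(record)
```

Because each stage is a small function over one message, stages can be added, removed or reordered without touching the others, which is the property the stream pipeline on these slides relies on.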

Page 16:

Neural Network

In-Memory Data Grid

Real-time scoring

How can it be addressed?

Train

Page 17:

Neural Network

In-Memory Data Grid

Front-end

Update Push

How can it be addressed?

Page 18:

SpringXD

Ingest Transform Sink

Store / Analyze

Fast Data

Distributed Computing

Predict / Machine Learning

Other Sources and Destinations

JMS

Streaming real-time analytics architecture

Page 19:

Transform Sink

SpringXD

Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable, Cloud-Native

HTTP

Machine Learning

Fast Data

Filter

Predict Sink

HTTP

Split

Dashboard

Push

Demo Architecture

Page 20:

SpringXD

INGEST / SINK

• Little or no coding required
• Dozens of built-in connectors
• Seamless integration with Kafka, Sqoop
• Create new connectors easily using Spring

PROCESS

• Call Spark, Reactor or RxJava
• Built-in configurable filtering, splitting and transformation
• Out-of-the-box configurable jobs for batch processing

ANALYZE

• Import and invoke PMML jobs easily
• Call Python, R, MADlib and other tools
• Built-in configurable counters and gauges

Data Stream Pipelining

Page 21:

SpringXD

XD Nodes, XD Nodes, XD Nodes, XD Nodes

Ingest

SpringXD

Split Filter Transform Sink

XD admin

XD Nodes

Ingest Split Filter Transform Sink

Stream Deployment

Messaging

Scale-Out and HA Architecture

Page 22:

Transform Sink

SpringXD

Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable, Cloud-Native

HTTP

Machine Learning

Fast Data

Filter

Predict Sink

HTTP

Split

Dashboard

Push

Demo Architecture

Page 23:

Geode client-server architecture

GemFire Server

Partitioned Region

GemFire Server

Partitioned Region

GemFire Locator

GemFire Client

Local Cache

Connection pool

Send address and load information to locator.

Send, receive cache data. Receive server events.

Request server information from locator. Locator responds with least loaded server address.
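
The locator's role described on this slide can be illustrated with a toy model: something reports an address and a load figure, and the locator answers lookups with the least loaded address. This is a conceptual sketch only, not the Geode API; the `Locator` class, its method names, and the addresses and load values are hypothetical.

```python
class Locator:
    """Toy stand-in for a locator: tracks reported load per server address."""

    def __init__(self):
        self._load = {}  # server address -> last reported load

    def report(self, address, load):
        """Record the address and load information sent to the locator."""
        self._load[address] = load

    def least_loaded(self):
        """Answer a request for server information with the least loaded address."""
        if not self._load:
            raise LookupError("no servers registered")
        return min(self._load, key=self._load.get)


locator = Locator()
locator.report("server1:40404", 0.7)   # hypothetical load figures
locator.report("server2:40404", 0.2)
print(locator.least_loaded())
```

A client in this model would call `least_loaded()` once to pick a server and then exchange cache data with that server directly, so the locator stays out of the data path.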

Page 24:

Partitioned Regions

GemFire Server1 GemFire Server2

Primary to redundant replication

Primary

0 2 4 6

Redundant

1 3 5 7

Region A Region B Region A Region B

Primary

1 3 5 7

Redundant

0 2 4 6
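
One way to read the bucket layout on this slide is that each key maps to one of eight buckets, even buckets have their primary copy on one server and their redundant copy on the other, and odd buckets are the mirror image. The sketch below models that reading. It is an assumption-laden illustration: the bucket count of 8, the CRC32 hash, the server names and every function name are hypothetical, and Geode's actual hashing and bucket assignment differ.

```python
import zlib

NUM_BUCKETS = 8                      # taken from the bucket numbers 0..7 on the slide
SERVERS = ["server1", "server2"]     # hypothetical server names


def bucket_for(key):
    """Map a key to a bucket with a stable hash (an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


def placement(bucket):
    """Even buckets are primary on server1, odd buckets on server2."""
    primary = SERVERS[bucket % 2]
    redundant = SERVERS[(bucket + 1) % 2]
    return primary, redundant


def put(stores, key, value):
    """Write to the primary copy, then replicate to the redundant copy."""
    primary, redundant = placement(bucket_for(key))
    stores[primary][key] = value
    stores[redundant][key] = value
    return primary, redundant


stores = {name: {} for name in SERVERS}
print(put(stores, "AAPL", 125.5))
```

With one redundant copy per bucket, losing either server leaves every key readable from the survivor, which can then take over as primary for the buckets it held as redundant.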

Page 25:

Event handling

GemFire Server

Region A

subscription

GemFire Client 1

Region A pool-name=ServerPool

pool "ServerPool" (with or without subscriptions enabled)

GemFire Client 2

Region A pool-name=ServerPool

pool "ServerPool" (with subscriptions enabled, interest register in X, receiveValues=true)

Distributed System

Update / Create X

The pool propagates the event to the cache server, where the region is updated.

The server distributes the event to its peers and also places it into the subscription queue for Client 2.
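
The event flow on this slide can be sketched as a server that updates its region on every put and appends the event to a queue for each other client that registered interest in the key. This is a conceptual model rather than Geode client code: the `CacheServer` class and its methods are hypothetical, and distribution to peer servers is left out.

```python
from collections import deque


class CacheServer:
    """Toy cache server with per-client subscription queues."""

    def __init__(self):
        self.region = {}      # key -> value (stands in for Region A)
        self.interest = {}    # client id -> set of keys it registered interest in
        self.queues = {}      # client id -> deque of pending (key, value) events

    def register_interest(self, client_id, key):
        """A client with subscriptions enabled registers interest in a key."""
        self.interest.setdefault(client_id, set()).add(key)
        self.queues.setdefault(client_id, deque())

    def put(self, origin_client, key, value):
        """Update the region, then queue the event for other interested clients."""
        self.region[key] = value
        for client_id, keys in self.interest.items():
            if client_id != origin_client and key in keys:
                self.queues[client_id].append((key, value))

    def drain(self, client_id):
        """Deliver and clear the pending events for one client."""
        queue = self.queues.get(client_id)
        events = []
        while queue:
            events.append(queue.popleft())
        return events


server = CacheServer()
server.register_interest("client2", "X")
server.put("client1", "X", 42)        # Client 1 updates or creates X
print(server.drain("client2"))        # Client 2 receives the event
```

Queuing per client decouples the writer from slow subscribers: the put completes once the region is updated, and each subscriber consumes its queue at its own pace.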

Page 26:

Transform Sink

SpringXD

Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable, Cloud-Native

HTTP

Machine Learning

Fast Data

Filter

Predict Sink

HTTP

Split

Dashboard

Push

Demo Architecture

Page 27:

Neural Networks

Page 28:

Neural Networks

Page 29:

medium avg (x+1)

relative strength (x)

medium avg (x)

price(x)

Neural Network
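
The labels on this slide suggest a network that takes price(x), medium avg(x) and relative strength(x) as inputs and predicts medium avg(x+1). The sketch below builds training rows in that shape. Several things here are assumptions: that "medium avg" means a simple moving average over a fixed window, that relative strength is a simplified RSI using plain averages of gains and losses, and all function names and the window size.

```python
def moving_average(prices, window):
    """Simple average of the last `window` prices."""
    return sum(prices[-window:]) / window


def relative_strength(prices, window):
    """Simplified RSI over the last `window` price changes (0 to 100)."""
    changes = [b - a for a, b in zip(prices[-window - 1:-1], prices[-window:])]
    avg_gain = sum(c for c in changes if c > 0) / window
    avg_loss = sum(-c for c in changes if c < 0) / window
    if avg_loss == 0:
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def training_rows(prices, window):
    """Pair (price, avg, strength) at step x with the average at step x+1."""
    rows = []
    for i in range(window, len(prices) - 1):
        history = prices[: i + 1]
        inputs = (
            prices[i],
            moving_average(history, window),
            relative_strength(history, window),
        )
        target = moving_average(prices[: i + 2], window)
        rows.append((inputs, target))
    return rows


# Hypothetical price series.
for inputs, target in training_rows([10.0, 10.5, 10.2, 10.8, 11.0, 10.9], window=3):
    print(inputs, target)
```

Each row is one supervised example: three inputs observed at step x and the value the network should output for step x+1.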

Page 30:

Neural Network

Page 31:

Transform Sink

SpringXD

Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable, Cloud-Native

HTTP

Machine Learning

Fast Data

Filter

Predict Sink

HTTP

Split

Dashboard

Push

Demo Architecture

Page 32:
Page 33:

Demo Time

Page 34:

SpringXD

shell - R

Transformer

geode-json client

geode-json client

http-client

http-server

obj-to-json

splitter

splitter

Simulator

tap

Page 35:

SpringXD

http://projectgeode.org
http://projects.spring.io/spring-xd
https://registry.hub.docker.com/
http://www.r-project.org

Page 36:
Page 37:

A NEW PLATFORM FOR A NEW ERA