imcsummit 2015 - day 2 developer track - implementing a highly scalable in-memory stock prediction...
TRANSCRIPT
© Copyright 2014 Pivotal. All rights reserved.
A Stock Prediction System using open-source software
Fred Melo @fredmelo_br
William Markito @william_markito
It's all about DATA
Data Sources
Look for patterns
Prediction
Machine Learning is the answer
Neural Networks
Clustering
Genetic Algorithms
Train with historical dataset
Apply model to the new input
Applying Machine Learning
Hard to add new data sources
Why?
Hard to scale
Why so hard?
Hard to make it real-time
Traditional models are reactive and static
HDFS
Data Lake
Store
Analytics
Hard to change
Labor intensive
Inefficient
No real-time information
ETL based
Data-source specific
Stream-based, real-time closed-loop analytics are needed
HDFS
Data Lake
Expert System / Machine Learning
In-Memory Real-Time Data
Continuous Learning
Continuous Improvement
Continuous Adapting
Data Stream Pipeline
Multiple Data Sources
Real-Time Processing
Store Everything
Info
Analysis
Look at past trends (for similar input)
Evaluate current input
Score / Predict
Neural Network
How can it be addressed?
Info
Analysis
Filter [ json ]
Enrich
Transform
Neural Network
How can it be addressed?
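The Filter, Enrich and Transform stages named on this slide can be sketched as a chain of generators. This is a minimal Python illustration, not the Spring XD pipeline used in the talk; the JSON field names (`symbol`, `price`), the `SECTORS` lookup table and every function name are assumptions made for the example.

```python
import json

# Hypothetical reference data used by the enrich step.
SECTORS = {"EMC": "technology", "VMW": "technology"}

def parse(lines):
    """Decode each incoming JSON line into a dict."""
    for line in lines:
        yield json.loads(line)

def filter_valid(ticks):
    """Filter: drop ticks that do not carry a positive price."""
    for tick in ticks:
        if tick.get("price", 0) > 0:
            yield tick

def enrich(ticks):
    """Enrich: attach a sector looked up from the reference data."""
    for tick in ticks:
        yield {**tick, "sector": SECTORS.get(tick["symbol"], "unknown")}

def transform(ticks):
    """Transform: reshape each tick into the record a model would consume."""
    for tick in ticks:
        yield {
            "key": tick["symbol"],
            "sector": tick["sector"],
            "inputs": [float(tick["price"])],
        }

def pipeline(lines):
    """Compose the stages in the order shown on the slide."""
    return list(transform(enrich(filter_valid(parse(lines)))))

raw = [
    '{"symbol": "EMC", "price": 25.5}',
    '{"symbol": "XYZ", "price": 0}',
    '{"symbol": "VMW", "price": 80}',
]
records = pipeline(raw)
```

Because each stage is a generator, records flow through one at a time, which is roughly how a stream pipeline behaves; the second tick is dropped by the filter before it reaches the later stages.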
Neural Network
In-Memory Data Grid
Real-time scoring
How can it be addressed?
Train
Neural Network
In-Memory Data Grid
Front-end
Update
Push
How can it be addressed?
Ingest
Transform
Sink
SpringXD
Store / Analyze
Fast Data
Distributed Computing
Predict / Machine Learning
Other Sources and Destinations
JMS
Streaming real-time analytics architecture
Transform
Sink
SpringXD
Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable, Cloud-Native
HTTP
Machine Learning
Fast Data
Filter
Predict
Sink
HTTP
Split
Dashboard
Push
Demo Architecture
SpringXD
INGEST / SINK
• Little or no coding required
• Dozens of built-in connectors
• Seamless integration with Kafka, Sqoop
• Create new connectors easily using Spring
PROCESS
• Call Spark, Reactor or RxJava
• Built-in configurable filtering, splitting and transformation
• Out-of-box configurable jobs for batch processing
ANALYZE
• Import and invoke PMML jobs easily
• Call Python, R, MADlib and other tools
• Built-in configurable counters and gauges
Data Stream Pipelining
SpringXD
XD Nodes (×4)
Ingest
SpringXD
Split Filter Transform Sink
XD admin
XD Nodes
Ingest Split Filter Transform Sink
Stream Deployment
Messaging
Scale-Out and HA Architecture
Geode client-server architecture
GemFire Server
Partitioned Region
GemFire Server
Partitioned Region
GemFire Locator
GemFire Client
Local Cache
Connection pool
Send address and load information to locator.
Send, receive cache data. Receive server events.
Request server information from locator. Locator responds with least loaded server address.
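The locator exchange described on this slide (servers report load, clients ask for a server, the locator answers with the least loaded one) can be sketched in a few lines. This is a toy Python model for illustration only, not the Geode API; the `Locator` class, its method names, the addresses and the load values are all hypothetical.

```python
class Locator:
    """Toy locator: tracks reported load and hands out the least loaded server."""

    def __init__(self):
        self._load = {}  # server address -> last reported load

    def report_load(self, address, load):
        # Servers send their address and load information to the locator.
        self._load[address] = load

    def least_loaded_server(self):
        # Clients request server information; the locator responds with
        # the address whose reported load is lowest.
        if not self._load:
            raise LookupError("no servers registered")
        return min(self._load, key=self._load.get)

locator = Locator()
locator.report_load("server1:40404", 0.75)  # hypothetical address and load
locator.report_load("server2:40404", 0.20)
chosen = locator.least_loaded_server()
```

After this lookup a client would open its connection pool against `chosen` and exchange cache data with that server directly, so the locator stays out of the data path.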
Partitioned Regions
GemFire Server 1, GemFire Server 2
Primary to redundant replication
Primary: 0 2 4 6
Redundant: 1 3 5 7
Region A, Region B (on each server)
Primary: 1 3 5 7
Redundant: 0 2 4 6
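The bucket layout in this diagram can be modelled as follows. The sketch assumes the first server holds the primary copies of the even buckets and the second server the odd ones, which is one reading of the slide's labels. The CRC32-based key hashing, the server names and the `PartitionedRegion` class are illustrative assumptions, not how Geode itself assigns buckets.

```python
import zlib

NUM_BUCKETS = 8                    # buckets 0..7, as on the slide
SERVERS = ["server1", "server2"]   # hypothetical member names

def bucket_for(key):
    # Stable hash so the example behaves the same on every run.
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS

def placement(bucket):
    """Return (primary, redundant) members for a bucket."""
    primary = SERVERS[bucket % 2]
    redundant = SERVERS[(bucket + 1) % 2]
    return primary, redundant

class PartitionedRegion:
    """Toy region with one redundant copy per bucket."""

    def __init__(self):
        self.stores = {name: {} for name in SERVERS}

    def put(self, key, value):
        primary, redundant = placement(bucket_for(key))
        self.stores[primary][key] = value    # write to the primary copy
        self.stores[redundant][key] = value  # primary-to-redundant replication
        return primary

    def get(self, key, failed=()):
        # Read from the primary, or from the redundant copy if the
        # primary's member is listed as failed.
        primary, redundant = placement(bucket_for(key))
        owner = redundant if primary in failed else primary
        return self.stores[owner][key]

region = PartitionedRegion()
region.put("EMC", 25.5)
value_after_failure = region.get("EMC", failed=("server1",))
```

Since every bucket has its redundant copy on the other member, a read still succeeds when either single server is marked as failed.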
Event handling
GemFire Server
Region A
subscription
Region A pool-name=ServerPool X
GemFire Client 1
pool "ServerPool" (with or without subscriptions enabled)
Region A pool-name=ServerPool X
GemFire Client 2
pool "ServerPool" (with subscriptions enabled, interest registered in X, receiveValues=true)
X Distributed System
Update / Create
The pool propagates the event to the cache server,
where the region is updated.
The server distributes the event to its peers and also places it into
the subscription queue for Client 2.
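The flow above (a client updates X, the server applies it to the region, then queues the event for a subscribed client) can be sketched with a toy server. This is plain Python written for illustration, not the Geode client API; `CacheServer`, its methods and the client names are hypothetical, and distribution to peer servers is left out.

```python
class CacheServer:
    """Toy server holding one region and per-client subscription queues."""

    def __init__(self):
        self.region = {}
        self.interest = {}  # client name -> keys the client registered interest in
        self.queues = {}    # client name -> queued (key, value) events

    def register_interest(self, client, key):
        self.interest.setdefault(client, set()).add(key)
        self.queues.setdefault(client, [])

    def put(self, source_client, key, value):
        # The event arrives from the source client's pool and the
        # region is updated on the server.
        self.region[key] = value
        # The event is then placed into the subscription queue of every
        # other client that registered interest in this key.
        for client, keys in self.interest.items():
            if client != source_client and key in keys:
                self.queues[client].append((key, value))

server = CacheServer()
server.register_interest("client2", "X")
server.put("client1", "X", 42)  # client2 is notified
server.put("client1", "Y", 7)   # nobody registered interest in Y
```

Queuing the value together with the key mirrors the `receiveValues=true` setting on the slide: the subscriber gets the new value rather than only an invalidation.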
Neural Networks
Inputs: price (x), medium avg (x), relative strength (x)
Output: medium avg (x+1)
Neural Network
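One way to prepare training data for a network with these inputs and output is sketched below. It assumes "medium avg" means a simple moving average and "relative strength" means an RSI-style indicator; the slides do not define either, so both readings, the window size and all function names are assumptions. The network itself is not shown.

```python
def moving_average(prices, window):
    """Simple moving average over the last `window` prices.

    Element k of the result belongs to price index k + window - 1.
    """
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]

def relative_strength(prices, window):
    """RSI-style indicator in [0, 100] over the last `window` price changes.

    Element k of the result belongs to price index k + window.
    """
    out = []
    for i in range(window, len(prices)):
        changes = [prices[j] - prices[j - 1] for j in range(i - window + 1, i + 1)]
        gains = sum(c for c in changes if c > 0)
        losses = -sum(c for c in changes if c < 0)
        total = gains + losses
        out.append(50.0 if total == 0 else 100.0 * gains / total)
    return out

def training_samples(prices, window=3):
    """Build ([price(x), avg(x), strength(x)], avg(x+1)) pairs."""
    ma = moving_average(prices, window)
    rs = relative_strength(prices, window)
    samples = []
    for x in range(window, len(prices) - 1):
        inputs = [prices[x], ma[x - window + 1], rs[x - window]]
        target = ma[x - window + 2]  # moving average at index x + 1
        samples.append((inputs, target))
    return samples

samples = training_samples([1, 2, 3, 4, 5, 6], window=3)
```

Each pair holds the three input values at step x and the moving average at step x+1 as the training target, matching the inputs and output listed on the slide.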
Demo Time
SpringXD
shell - R
Transformer
geode-json client
geode-json client
http-client
http-server
obj-to-json
splitter
splitter
Simulator
tap
SpringXD
http://projectgeode.org
http://projects.spring.io/spring-xd
https://registry.hub.docker.com/
http://www.r-project.org
A NEW PLATFORM FOR A NEW ERA