reliable and scalable data streaming in multi-hop architecture

Computer Measurement Group, India

www.cmgindia.org

Reliable and Scalable Data Streaming in Multi-Hop Architecture

Sudhir Sangra, BMC Software
Lalit Shukla, BMC Software


Contents

• Introduction
• Challenges
• Approaches
  – Acknowledge based
  – Store and forward
  – Distributed consumer model (Peer to Peer Acknowledge)
• Conclusion
• Questions


Introduction

• Multi-hop architectures arise for various reasons, including but not limited to:
  – Time to market: a buy decision leaves multiple components to integrate
  – Integration of different components built on different underlying technologies to provide an end-to-end solution
  – Each independent product's performance and scalability must remain intact while new features are added to solve the customer's problem and provide value
  – Multiple hops to increase scalability, failover, and load balancing
• Even a solution developed organically, with technology parity across the end-to-end multi-hop architecture for streaming and scalability, still has challenges to handle when the middleware needs to be stateless


Challenges

• Real-time data streaming consumes resources on both ends (provider and consumer), which impacts scalability
  – Protocols differ from hop to hop
  – No TCP/IP-like protocol handles guaranteed delivery in a multi-hop environment
• Network bandwidth (a concern on a WAN), especially in the Software as a Service model
  – Detecting an inconsistency triggers a full data sync by the provider
  – Provider failover from one middleware to another initiates a full sync of data
• Priority data such as event data is treated like metric data for transmission, which delays IT problem resolution


System behavior

End-to-end system performance used to be a function of the state of the middleware. Analytics resource requirements (CPU/memory) reflected the spikes and took a long time to settle after the fault at the middleware or network was fixed.


System behavior

Database performance was also a function of middleware state and network. Abrupt failure of any component was very expensive for DB operations.


Approach

• Various approaches can provide reliability and scalability for data streaming; we focus on the performance and monitoring domain, where data can be categorized into three sets:
  – Event
  – Performance metric
  – Discovered resource instance
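A minimal sketch of this categorization, using the pairing that the approach slides give for each category. The enum and function names are hypothetical and chosen only for illustration; they do not come from the product.

```python
from enum import Enum


class Category(Enum):
    EVENT = "event"
    PERFORMANCE_METRIC = "performance metric"
    DISCOVERED_RESOURCE_INSTANCE = "discovered resource instance"


class DeliveryModel(Enum):
    END_TO_END_ACK = "end-to-end acknowledge"
    STORE_AND_FORWARD = "store and forward"
    PEER_TO_PEER_ACK = "peer-to-peer acknowledge"


# Pairing as stated on the approach slides; the identifiers are illustrative.
MODEL_FOR_CATEGORY = {
    Category.DISCOVERED_RESOURCE_INSTANCE: DeliveryModel.END_TO_END_ACK,
    Category.PERFORMANCE_METRIC: DeliveryModel.STORE_AND_FORWARD,
    Category.EVENT: DeliveryModel.PEER_TO_PEER_ACK,
}


def select_model(category: Category) -> DeliveryModel:
    """Return the delivery model paired with a data category."""
    return MODEL_FOR_CATEGORY[category]
```

The slides also note that discovered resource instance messages can piggyback on store and forward, so a real router would likely allow more than one model per category.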


End to End Acknowledge Model Approach

• For each message sent, an acknowledgment is needed for reliable delivery; if the acknowledgment is not received, the message is retransmitted
  – Provider and consumer are solely responsible; the middleware is stateless
  – Messages are not discarded from the provider until the consumer sends an ACK or the time-to-live timer expires
• Because provider and consumer are not directly connected over a socket, sequencing and ACKing are handled by an application-level protocol rather than by TCP/IP
• The round-trip time to receive an ACK determines the retransmission interval, which takes the server load into consideration
• Dynamic sliding window to handle bursts of messages
  – A message is added to the window if the window has free space, and is also written to the disk cache
  – If the window is full, the message is written to the disk cache only
• Discovered resource instance messages are best suited to this model because
  – There is no tolerance for data loss
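The provider side of this model could be sketched roughly as follows. This is an illustration under assumptions, not the presenters' implementation: `AckProvider`, its method names, the in-memory dictionary standing in for the disk cache, and the caller-supplied `send` callback are all hypothetical. The window size is a plain attribute that a caller could adjust; the slides do not say how the window is resized.

```python
from collections import OrderedDict
from typing import Callable, Dict, List, Tuple


class AckProvider:
    """Provider side of an end-to-end ACK scheme (illustrative sketch)."""

    def __init__(self, send: Callable[[int, bytes], None],
                 window_size: int, ttl: float) -> None:
        self.send = send                  # hypothetical transport toward the first hop
        self.window_size = window_size    # may be changed at run time
        self.ttl = ttl                    # seconds a message is kept without an ACK
        self.next_seq = 0
        self.window: "OrderedDict[int, float]" = OrderedDict()  # seq -> last send time
        self.disk_cache: Dict[int, Tuple[bytes, float]] = {}    # stands in for a file
        self.backlog: List[int] = []      # cached messages waiting for window space

    def publish(self, payload: bytes, now: float) -> int:
        seq = self.next_seq
        self.next_seq += 1
        self.disk_cache[seq] = (payload, now)      # every message is cached
        if len(self.window) < self.window_size:
            self._transmit(seq, now)               # window has space: send now
        else:
            self.backlog.append(seq)               # window full: cache only
        return seq

    def on_ack(self, seq: int, now: float) -> None:
        # The consumer confirmed delivery, so the message can be discarded.
        self.window.pop(seq, None)
        self.disk_cache.pop(seq, None)
        self._fill_window(now)

    def tick(self, now: float, retransmit_after: float) -> None:
        # Discard messages whose time to live has expired.
        for seq in list(self.disk_cache):
            if now - self.disk_cache[seq][1] >= self.ttl:
                del self.disk_cache[seq]
                self.window.pop(seq, None)
                if seq in self.backlog:
                    self.backlog.remove(seq)
        # Retransmit in-flight messages that have waited too long for an ACK.
        for seq, sent_at in list(self.window.items()):
            if now - sent_at >= retransmit_after:
                self._transmit(seq, now)
        self._fill_window(now)

    def _transmit(self, seq: int, now: float) -> None:
        self.window[seq] = now
        self.send(seq, self.disk_cache[seq][0])

    def _fill_window(self, now: float) -> None:
        while self.backlog and len(self.window) < self.window_size:
            self._transmit(self.backlog.pop(0), now)
```

Time is passed in explicitly (`now`) so the sketch stays deterministic; a real provider would read a clock and run `tick` from a timer.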


End to End Acknowledge Model Approach
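The slides state that the ACK round-trip time determines the retransmission interval but do not give the formula. One plausible way to do it is the smoothed estimator used by TCP's retransmission timer (RFC 6298), sketched below as a swapped-in technique. The class name, the default values, and the floor on the interval are assumptions.

```python
from typing import Optional


class RetransmitTimer:
    """Derives a retransmission interval from measured ACK round trips.

    Uses the smoothed RTT / RTT variance scheme of RFC 6298 as a stand-in;
    the source does not specify which formula is actually used.
    """

    def __init__(self, initial_interval: float = 1.0, min_interval: float = 0.2,
                 alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha = alpha                # weight of a new sample in the smoothed RTT
        self.beta = beta                  # weight of a new sample in the variance
        self.min_interval = min_interval  # assumed floor, in seconds
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None
        self.interval = initial_interval  # used until the first ACK is measured

    def observe(self, rtt: float) -> float:
        """Feed one measured round trip (seconds); return the new interval."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - rtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt
        self.interval = max(self.min_interval, self.srtt + 4 * self.rttvar)
        return self.interval
```

Because a loaded server answers more slowly, its ACKs raise the measured round trip and the interval grows with it, which is how the retransmission timing takes server load into account.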


Store and Forward Model Approach

• During a network glitch, server maintenance, or server unavailability for unknown reasons, data should not be lost
  – These network issues are transparent to the application layer
  – A virtual socket abstracts the file system and the TCP/IP socket to the peer
  – Re-establishing the connection via the same hop, or via a different hop on failover, does not resend the whole state, only the delta generated after the link was broken, which increases scalability
• Best suited to "performance metric" data, for which acknowledgment is not desired because
  – The data volume is huge
  – Some percentage of data loss is tolerable
• Messages such as discovered resource instances piggyback on this model without degrading it, because changes in system characteristics/resources are infrequent
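The delta-only resend could look roughly like the sketch below. It is an illustration under assumptions: the `VirtualSocket` class shown here, its methods, and the `Link` callable are hypothetical, and a Python list stands in for the file-backed store.

```python
from typing import Callable, List, Optional

# Hypothetical link type: a callable that delivers one message to the next hop.
Link = Callable[[bytes], None]


class VirtualSocket:
    """Hides the local store and the connection to the peer behind write()."""

    def __init__(self) -> None:
        self.journal: List[bytes] = []    # stands in for the on-disk store
        self.forwarded = 0                # journal offset already handed to a hop
        self.link: Optional[Link] = None

    def write(self, message: bytes) -> None:
        # The application always succeeds here, whether or not a link is up.
        self.journal.append(message)
        self._drain()

    def connect(self, link: Link) -> None:
        # Same hop or a different hop after failover: only the delta is sent.
        self.link = link
        self._drain()

    def disconnect(self) -> None:
        self.link = None

    def _drain(self) -> None:
        while self.link is not None and self.forwarded < len(self.journal):
            try:
                self.link(self.journal[self.forwarded])
            except ConnectionError:
                self.link = None          # keep the message for the next connect
                return
            self.forwarded += 1
```

There is no acknowledgment in this sketch, so a message handed to a hop that then fails can be lost, which matches the tolerance the slides accept for performance metrics. A real store would also trim the journal once data has been forwarded.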


Store and Forward Model Approach


Distributed Data Consumer Model Approach

• Event messages are the highest-priority messages. Losing them leads to business loss
• These messages require high-grade reliability; losing messages that are in transit is not acceptable
• The peer-to-peer acknowledgment distributed consumer model is best suited to these messages
  – The hop connected over the TCP/IP socket is responsible for the acknowledgment
  – 100% fault tolerance and reliability
  – Each hop is designated as a processing hop or a presentation hop
  – Event processing happens on each hop, which reduces the network bandwidth and/or the resources the end consumer needs to process a message, because pre-processing is already done
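A rough sketch of hop-by-hop acknowledgment follows. The `Hop` class, its fields, and the use of duplicate suppression as the pre-processing step are assumptions made for illustration; the slides do not say what processing each hop performs.

```python
from typing import Dict, List, Optional, Set, Tuple


class Hop:
    """One hop in the chain; it ACKs its upstream peer and forwards downstream."""

    def __init__(self, name: str, downstream: Optional["Hop"] = None) -> None:
        self.name = name
        self.downstream = downstream          # None marks the presentation hop
        self.up = True                        # simulates reachability of this hop
        self.pending: Dict[int, dict] = {}    # accepted, not yet ACKed downstream
        self.seen: Set[Tuple[str, str]] = set()
        self.presented: List[dict] = []       # filled only on the presentation hop

    def receive(self, event_id: int, event: dict) -> bool:
        """Accept an event from the upstream peer; the return value is the ACK."""
        if not self.up:
            return False
        self.pending[event_id] = event        # this hop now owns the event
        return True

    def process(self) -> None:
        for event_id in list(self.pending):
            event = self.pending[event_id]
            key = (event["source"], event["text"])
            if key in self.seen:
                del self.pending[event_id]    # duplicate absorbed on this hop
                continue
            if self.downstream is None:
                self.presented.append(event)
                acked = True
            else:
                acked = self.downstream.receive(event_id, event)
            if acked:                         # release only after the peer ACKs
                self.seen.add(key)
                del self.pending[event_id]
```

Each hop keeps an event until the next peer has acknowledged it, so an unreachable hop delays delivery without dropping the event, and duplicates filtered on a processing hop never consume bandwidth toward the presentation hop.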


Distributed Data Consumer Model Approach


System behavior

System resource utilization is now uniformly distributed and is no longer a function of faulty middleware states.

Process pronet_cntl: Sun 01/26/14 10:00 AM to Mon 01/27/14 10:00 AM


Conclusion

• Categorizing the data and applying the relevant model to each category provided a significant improvement
  – The architecture increased the number of attributes allowed under analytics from 1.2 M to 1.7 M, nearly 30% more than the traditional integration method of pulling data from the source nodes, with no or minimal data loss
  – The number of devices supported went from 200 to 1000, a 5x improvement, with no or minimal data loss
  – The architecture became stateless and can therefore scale linearly with a mix of distributed analytics and distributed middleware


Questions

?s
