Improving Mobile Payments with Real-time Spark
● Madhukara Phatak
● Big data consultant and trainer at datamantra.io
● Consult in Hadoop, Spark and Scala
● www.madhukaraphatak.com
Agenda
● Mobile as a driver for big data
● Our customer solution
● Existing data solution
● Improved solution
● Technical details
● Future enhancements
● Q & A
Mobile as a big data driver
● Mobile has changed the way we interact with the world
● Most buying and selling happens on mobile today
○ Myntra went fully mobile
○ Flipkart and Amazon say 50% of their purchases happen on mobile
○ Quikr and OLX are mobile-based selling platforms
○ Ola etc.
Challenges in Mobile
● Customers expect the service to be available 24/7
● Tiny screens make typical software flows very challenging
● Flaky connectivity of mobile networks makes it tougher
● Users are constantly on the move, which results in dropped interactions
● No more downtime
● Everything has to be done in real time
Mobile payments
● Almost every app mentioned earlier needs some kind of payment
● Getting payments right on mobile is very hard
● Globally, 21% of online shoppers abandon their basket due to payment failures or delays
● Some companies are building SDKs to help app developers
● Our customer is one of them
Why mobile payments are hard?
Too many inputs
Terrible interface by Banks
OTP vs Password
Our customer solution
● Mobile SDK that simplifies payments for applications
● SDK provides a better user interface, such as big buttons to generate an OTP, and other flows
● SDK also helps fill in the different kinds of forms presented by different banks through a consistent UI
● Better user experience across applications
● Applications send anonymous payment details, across apps, to our customer's servers
Some numbers
● 40+ customers
● Over 1 million transactions per month as of March
● Around 55% success rate (5% above average)
● Supports major banks, payment gateways and wallet providers
● Will soon be available outside the mobile payment space
Why data matters?
● As the number of transactions increases, things will go wrong
● There are many different combinations that can go wrong
● Examples
○ Airtel OTP failing with State Bank netbanking
○ Customers stuck on the password page
○ Not able to read OTP from some specific
● Understanding customer pain and reacting to it is paramount
● Every bit of help results in a payment
Initial BI solution
[Diagram labels: Events, Hourly Push, JSON Data, S3FS, Session Wise Aggregations]
Initial BI solution
● Phone SDK pushes events such as transaction initiation and payment completion to logging servers
● Logging servers roll the log every hour and push it to S3
● A single-node Spark machine aggregates data by session and pushes it to MySQL
● Google BigQuery is used for ad hoc querying
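
The session-wise aggregation step can be pictured with a small plain-Python sketch. This is not the customer's code: the field names `session_id`, `type` and `ts`, and the chosen metrics, are assumptions made for illustration only.

```python
import json
from collections import defaultdict


def aggregate_sessions(json_lines):
    """Group raw SDK events by session and compute simple per-session metrics."""
    sessions = defaultdict(list)
    for line in json_lines:
        event = json.loads(line)
        sessions[event["session_id"]].append(event)  # hypothetical field name

    summaries = {}
    for session_id, events in sessions.items():
        timestamps = [e["ts"] for e in events]  # hypothetical: epoch seconds
        summaries[session_id] = {
            "events": len(events),
            "duration_s": max(timestamps) - min(timestamps),
            "completed": any(e["type"] == "payment_complete" for e in events),
        }
    return summaries


# Made-up log lines in the assumed format.
log_lines = [
    '{"session_id": "s1", "type": "transaction_initiated", "ts": 100}',
    '{"session_id": "s1", "type": "payment_complete", "ts": 160}',
    '{"session_id": "s2", "type": "transaction_initiated", "ts": 120}',
]
summaries = aggregate_sessions(log_lines)
```

In the setup described above, the same grouping would run inside Spark over an hour of logs from S3, and the resulting rows would be written to MySQL.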
Challenges with BI solution
● Batch processing
● Geared towards a report-generation flow
● Very minimal use of Spark APIs, as the team was not well aware of its potential
● No integration with the mobile SDK for a feedback loop
Requirements for consulting
● Bring the same reporting calculations to real time
● Understand user behaviour and track each user's flow over a session
● Close the loop by providing automatic alerts based on the metric calculations
● Some new specific business cases such as loyalty management
● Improve the team's expertise in Spark
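
One way to picture "automatic alerts based on the metric calculations" is a threshold check on the success rate per carrier and bank combination. The sketch below is only an illustration: the keys `carrier`, `bank` and `completed`, and both thresholds, are assumptions and do not come from the talk.

```python
from collections import Counter


def find_alerts(sessions, min_sessions=3, min_success_rate=0.5):
    """Flag (carrier, bank) combinations whose success rate drops below a threshold."""
    totals, successes = Counter(), Counter()
    for s in sessions:
        key = (s["carrier"], s["bank"])  # hypothetical field names
        totals[key] += 1
        if s["completed"]:
            successes[key] += 1

    alerts = []
    for (carrier, bank), total in totals.items():
        rate = successes[(carrier, bank)] / total
        # Skip combinations with too few sessions to judge.
        if total >= min_sessions and rate < min_success_rate:
            alerts.append({"carrier": carrier, "bank": bank,
                           "success_rate": rate, "sessions": total})
    return alerts


# Made-up session summaries for illustration.
sessions = (
    [{"carrier": "airtel", "bank": "sbi", "completed": False}] * 3
    + [{"carrier": "airtel", "bank": "sbi", "completed": True}]
    + [{"carrier": "vodafone", "bank": "hdfc", "completed": True}] * 3
    + [{"carrier": "airtel", "bank": "hdfc", "completed": False}]
)
alerts = find_alerts(sessions)
```

The minimum session count keeps a single failed payment from raising an alert; a real deployment would tune both thresholds against observed traffic.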
Choosing Spark streaming
● The company was already invested in Spark, so Spark Streaming was a no-brainer
● Porting Spark batch code to streaming was mostly straightforward, as both share the same API
● The company used Python as its Spark API language, which streaming also supported
● So we did not consider Storm and went ahead with Spark Streaming
First version
[Diagram labels: Events, Five Minute Push, JSON Data, FileStream, Session Wise Aggregations]
First version
● We used the fileStream API of Spark Streaming, which allowed us to poll an S3 bucket every few minutes
● A new rolling appender was added to the log servers to push logs to S3 every 5 minutes
● The exact same batch code was used for the calculations, which made the transition very easy
● All downstream applications remained the same
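
A rough sketch of how the same calculation code can serve both batch and streaming is shown below. It assumes PySpark, where the file-polling call is `textFileStream`; the field name `session_id` and the function names are hypothetical, and the Spark entry points are shown only as wiring, not as the team's actual job.

```python
import json


def parse_events(lines):
    """Shared parsing step: JSON log lines -> (session_id, event) pairs."""
    for line in lines:
        line = line.strip()
        if line:
            event = json.loads(line)
            yield event["session_id"], event  # hypothetical field name


def events_per_session(pairs):
    """Shared calculation; works on an RDD or a DStream of (key, value) pairs."""
    return pairs.mapValues(lambda _: 1).reduceByKey(lambda a, b: a + b)


def run_batch(sc, path):
    """Batch entry point; `sc` is a SparkContext, `path` an hourly log location."""
    return events_per_session(sc.textFile(path).mapPartitions(parse_events))


def run_streaming(ssc, log_dir):
    """Streaming entry point; `ssc` is a StreamingContext polling `log_dir` for new files."""
    return events_per_session(ssc.textFileStream(log_dir).mapPartitions(parse_events))


# The parsing step can be checked locally without a cluster.
sample = list(parse_events([
    '{"session_id": "s1", "type": "transaction_initiated"}',
    "",
    '{"session_id": "s2", "type": "payment_complete"}',
]))
```

Because RDDs and DStreams expose the same `mapPartitions`, `mapValues` and `reduceByKey` calls, only the entry point changes between the hourly job and the five-minute stream.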
Second version
[Diagram labels: Events, JSON Data, Session Wise Aggregations, Hourly Push, Realtime]
Amazon Kinesis
● A Kafka-like distributed message queue by Amazon
● Used as a managed, Kafka-like source on AWS
● Highly scalable, with low latency
● Persistence with fault tolerance across multiple availability zones
● Great integration with Spark
Second version
● Amazon Kinesis is added as the real-time stream source
● Logging servers push logs to Kinesis as they arrive
● Streaming application pulls data from Kinesis every few minutes
● Support for multiple partitions added for parallel streams
Challenges with Python
● Spark Streaming API for Python was introduced in 1.2, whereas Spark Streaming for Scala/Java has been available since 0.8
● No AWS Kinesis connector was available as of March
● Team had to write its own
● No support for Python in Spark job server
Challenges from batch to streaming
● Sessions typically last 1-10 minutes. In batch this is easy, since most sessions complete within one hour of data, but it is challenging for real-time data
● Designing state for sessions
● Designing checkpointing and deciding on the interval
● Weird checkpointing issues with S3 due to eventual consistency
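
To make "designing state for sessions" concrete, here is a minimal sketch of an update function in the `(new_values, previous_state)` shape that PySpark's `updateStateByKey` expects. The state layout, the field names `type` and `ts`, and the 10-minute timeout are all assumptions chosen to match the 1-10 minute sessions mentioned above, not the team's actual design.

```python
import time

SESSION_TIMEOUT_S = 600  # assumption: sessions rarely outlive ~10 minutes


def update_session(new_events, state, now=None):
    """Merge one batch of events into a session's state; returning None drops the key."""
    now = time.time() if now is None else now
    if state is not None and state["closed"]:
        # Closed in the previous batch, so downstream had one batch to emit it.
        return None
    if state is None:
        state = {"events": 0, "first_ts": None, "last_ts": None,
                 "completed": False, "closed": False}
    else:
        state = dict(state)  # do not mutate the previous state in place

    for event in new_events:
        ts = event["ts"]  # hypothetical: epoch seconds
        state["events"] += 1
        state["first_ts"] = ts if state["first_ts"] is None else min(state["first_ts"], ts)
        state["last_ts"] = ts if state["last_ts"] is None else max(state["last_ts"], ts)
        state["completed"] = state["completed"] or event["type"] == "payment_complete"

    if state["last_ts"] is None:
        return None  # nothing was ever seen for this key
    idle = now - state["last_ts"] > SESSION_TIMEOUT_S
    state["closed"] = state["completed"] or idle
    return state


# Simulate three consecutive micro-batches for one session key.
open_state = update_session([{"type": "transaction_initiated", "ts": 100}], None, now=105)
closed_state = update_session([{"type": "payment_complete", "ts": 160}], open_state, now=165)
evicted = update_session([], closed_state, now=170)
```

In a streaming job this would be passed as `pairs.updateStateByKey(update_session)`, which requires a checkpoint directory to be set on the streaming context. Keeping a session for one extra batch after it closes is one simple way to let downstream code emit the finished summary before the state is dropped.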
Improvements to batch code
● Most of the code was written in the RDD paradigm, as it was the only one known to the team
● Team was trained on Spark SQL and Spark Streaming
● Majority of the code was ported to a Spark SQL based solution to improve readability and maintainability
● Recently moved to DataFrame-based code
Third version
[Diagram labels: Events, JSON Data, Session Wise Aggregations, Hourly Push, Realtime]
Choosing Mesos
● Mesos is a great cluster manager for Spark-only workloads
● Has a specific coarse-grained mode suited to real-time systems
● Minimal overhead compared to YARN
● Easy to set up on EC2
Fourth version
[Diagram labels: Events, JSON Data, Session Wise Aggregations, Hourly Push, Realtime]
Grafana
● Added Grafana for visualization and dashboards
● Grafana sits on top of Graphite or InfluxDB as a data source
● Moved away from MySQL to the time series database InfluxDB
● Scales much better compared to MySQL
● Data scientists or product managers can monitor customers using these dashboards
● Integrates with mobile SDK