How Credit Karma Makes Real-Time Decisions for 60 Million Users with Akka Streams and Actors
TRANSCRIPT
Proprietary & Confidential
Using Akka Streams for Real Time Decision Making
Dustin Lyons
Engineering Manager, Data Platform
Who I am
● Engineer turned Engineering Manager at Credit Karma
● Data & Analytics on the Platform team
● Build things that make decisions on where data should go
● Lover of science fiction, sushi, and electronic music
Credit Karma is a free financial assistant, helping over 60 million people make progress.
Agenda for today
1. Data Infrastructure at Credit Karma: Past and current
2. Mo’ data, mo’ problems
3. Akka Streams saves the day
4. Results and learnings
5. Q&A
Data scale (MB/min) @ Credit Karma
Credit Karma data platform: PHP days
PHP Scripts
New tools to help with scale
Credit Karma data platform: Scala in 2014
Data Warehouse Import
New tools to help with concurrency
Credit Karma data platform: Akka in 2015
Analytics Export Service + Data Warehouse Import
Analytics export service
Diagram: Analytics Export Service components: HTTP Ingest Server, Coordinator, Data Transformer Workers, Kafka Importer Workers.
Data warehouse import
Diagram: Data Warehouse Import Service components: Reader, Deduplicator, Processor, Extractors.
Marble maze
1. Reading from file
2. Waiting for external service
3. Objects sit in heap
4. Database Insert
Backpressure
What is backpressure?
Backpressure refers to the buildup of data at an I/O switch when buffers are full and not able to receive additional data.
No additional data packets are transferred until the bottleneck of data has been eliminated or the buffer has been emptied.
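The definition above can be sketched with a bounded buffer between a fast producer and a slow consumer. This is a minimal Python illustration, not Akka Streams code and not the talk's implementation; `run_pipeline`, `capacity`, and `consumer_delay` are hypothetical names chosen for the example. When the buffer is full, the producer blocks until the consumer frees a slot, so the number of waiting items never exceeds the buffer size.

```python
import queue
import threading
import time


def run_pipeline(n_items: int, capacity: int, consumer_delay: float = 0.001):
    """Fast producer and slow consumer joined by a bounded buffer (hypothetical helper)."""
    buffer = queue.Queue(maxsize=capacity)  # put() blocks while the buffer is full
    done = object()                         # sentinel that marks the end of the data
    consumed = []
    peak = 0                                # largest buffer depth the producer observed

    def producer():
        nonlocal peak
        for i in range(n_items):
            buffer.put(i)                   # blocks here until the consumer frees a slot
            peak = max(peak, buffer.qsize())
        buffer.put(done)

    def consumer():
        while True:
            item = buffer.get()
            if item is done:
                break
            time.sleep(consumer_delay)      # stand-in for a slow external service
            consumed.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed, peak


consumed, peak = run_pipeline(n_items=50, capacity=4)
print(f"consumed {len(consumed)} items, peak buffer depth {peak}")
```

With an unbounded queue the same producer would finish immediately and leave every unprocessed item sitting in memory, which is the failure mode the marble maze slides describe.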
Akka Streams: Backpressure in action
Diagram: two actors, with Data flowing from one to the other and Demand flowing back in the opposite direction.
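The Data and Demand exchange on this slide can be sketched as a source that emits only what downstream has asked for. This is a simplified, single-threaded Python illustration of demand signalling in general, not the Akka Streams API; `DemandDrivenSource` and `request` are hypothetical names.

```python
from typing import Callable, Iterable, Iterator, List


class DemandDrivenSource:
    """Emits elements only when downstream has signalled demand (hypothetical class)."""

    def __init__(self, elements: Iterable[int], on_next: Callable[[int], None]):
        self._elements: Iterator[int] = iter(elements)
        self._on_next = on_next  # downstream callback that receives each element
        self.emitted = 0         # total number of elements sent so far

    def request(self, n: int) -> int:
        """Downstream asks for up to n more elements; returns how many were sent."""
        sent = 0
        for _ in range(n):
            try:
                element = next(self._elements)
            except StopIteration:
                break            # upstream is exhausted
            self._on_next(element)
            sent += 1
        self.emitted += sent
        return sent


received: List[int] = []
source = DemandDrivenSource(range(1_000_000), received.append)

source.request(3)  # downstream has room for three elements
print(received)
```

Although the upstream holds a million elements, only the requested three are materialized downstream; the rest stay unread until more demand arrives.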
Akka Streams: Creating a stream
Source Flow Sink
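The Source, Flow, Sink shape can be approximated with Python generators, which are pull-driven: each stage produces an element only when the next stage asks for one. This is an analogy under that assumption, not Akka Streams code; `source`, `flow`, `sink`, and the `events` log are hypothetical names for illustration.

```python
from typing import Callable, Iterator, List, Tuple

events: List[Tuple[str, int]] = []  # records the order in which stages run


def source(n: int) -> Iterator[int]:
    """Source: produces 0..n-1, one element per pull."""
    for i in range(n):
        events.append(("emit", i))
        yield i


def flow(upstream: Iterator[int], transform: Callable[[int], int]) -> Iterator[int]:
    """Flow: transforms each element as it passes through."""
    for element in upstream:
        yield transform(element)


def sink(upstream: Iterator[int]) -> int:
    """Sink: pulls elements one at a time and folds them into a sum."""
    total = 0
    for element in upstream:
        events.append(("consume", element))
        total += element
    return total


result = sink(flow(source(3), lambda x: x * 2))
print(result)
print(events)
```

The event log alternates between emit and consume, showing that the source never runs ahead of the sink.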
Akka Streams: Built in stages
Built In Sources
• actorRef
• actorPublisher
• fromIterator
• fromFile
• apply (from a Seq)
Built In Processing Stages
• map
• filter
• grouped
• drop/take
• dropWhile/takeWhile
• sliding
Built In Sinks
• head
• last
• seq
• foreach
• actorRef
• actorSubscriber
• reduce
• fold
Backpressure Aware Stages
• mapAsync
• buffer (Backpressure)
• batch
• buffer (Drop)
• buffer (Fail)
Reference: http://doc.akka.io/docs/akka/current/scala/stream/stages-overview.html
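The three buffer variants listed under the backpressure aware stages differ in what happens when the buffer is full. The following Python sketch illustrates that difference in general terms; it is not the Akka Streams implementation, and `BoundedBuffer`, `Overflow`, and `offer` are hypothetical names. The drop policy shown here discards the oldest element, which is only one possible drop behavior and is an assumption of this example.

```python
from collections import deque
from enum import Enum
from typing import Deque, List


class Overflow(Enum):
    BACKPRESSURE = "backpressure"  # refuse the element; upstream must wait and retry
    DROP_OLDEST = "drop_oldest"    # discard the oldest buffered element (assumed variant)
    FAIL = "fail"                  # raise an error and stop


class BufferOverflowError(Exception):
    """Raised by the FAIL policy when the buffer is full."""


class BoundedBuffer:
    def __init__(self, size: int, strategy: Overflow):
        self._items: Deque[int] = deque()
        self._size = size
        self._strategy = strategy

    def offer(self, item: int) -> bool:
        """Return True if the item was accepted, False if upstream must wait."""
        if len(self._items) < self._size:
            self._items.append(item)
            return True
        if self._strategy is Overflow.BACKPRESSURE:
            return False
        if self._strategy is Overflow.DROP_OLDEST:
            self._items.popleft()
            self._items.append(item)
            return True
        raise BufferOverflowError(f"buffer of size {self._size} is full")

    def snapshot(self) -> List[int]:
        """Current buffer contents, oldest first."""
        return list(self._items)


backpressured = BoundedBuffer(2, Overflow.BACKPRESSURE)
accepted = [backpressured.offer(i) for i in (1, 2, 3)]
print(accepted, backpressured.snapshot())

dropping = BoundedBuffer(2, Overflow.DROP_OLDEST)
for i in (1, 2, 3):
    dropping.offer(i)
print(dropping.snapshot())
```

Only the backpressure policy keeps every element without growing the buffer, at the cost of slowing the upstream down.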
Analytics export service
Diagram: Analytics Export Service with an HTTP Ingest Server, a Coordinator, and an Akka Stream. The Data Transformer Workers and Kafka Importer Workers from the earlier diagram no longer appear.
Data warehouse import
Diagram: Data Warehouse Import Service with Extractors and an Akka Stream. The Reader, Deduplicator, and Processor from the earlier diagram no longer appear.
Analytics export service heap (before)
Chart: heap in GiB over time, with a 28 GiB mark. Red: Heap Space. Blue: Used Heap Space. Purple: Max Heap Space.
Analytics export service heap (after)
Chart: heap in GiB over time, with a 28 GiB mark.
Data warehouse import
Final results
• Akka Streams allowed us to move data with increased throughput and optimal performance
• No longer getting paged for JVM out of memory or spending time tuning our services
• Reduced the SLA for data delivery to our business stakeholders
Lessons learned
• Akka Actors: Great for low latency
• Akka Streams: Optimized for high throughput and solving back pressure
• Built on top of Akka Actors
• Don’t try to build high throughput systems with an actor system, you’ll just start building Akka Streams
Thank you!
Q&A
Dustin Lyons
Engineering Manager, Data Platform