Couchbase Meetup, Jan 2016
TRANSCRIPT
Michael Kehoe Senior Site Reliability Engineer
LinkedIn’s Big Data Pipeline with Kafka, Hadoop and
Couchbase
3
$ whoami
Michael Kehoe
• Sr Site Reliability Engineer (SRE)
• Member of CBVT
• B.E. (Electrical Engineering) from the University of Queensland, Australia
4
Kafka @ LinkedIn
• Kafka was created by LinkedIn
• Kafka is a publish-subscribe messaging system built as a distributed commit log
• Processes 500+ TB/day (~500 billion messages)
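To make the "publish-subscribe system as a distributed commit log" idea concrete, here is a minimal single-process sketch in Python. It is not Kafka's API: the `CommitLog` class, its method names, and the consumer names are all hypothetical, and it leaves out partitioning, replication, and persistence. It only illustrates the core idea that producers append to an ordered log and each consumer tracks its own read offset.

```python
from collections import defaultdict


class CommitLog:
    """Toy in-memory append-only log for one topic (illustrative only)."""

    def __init__(self):
        self._records = []                # ordered, append-only records
        self._offsets = defaultdict(int)  # next offset to read, per consumer

    def publish(self, message):
        """Append a message and return the offset it was stored at."""
        self._records.append(message)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return the next unread records for a consumer and advance its offset."""
        start = self._offsets[consumer]
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = CommitLog()
log.publish("page_view:/home")
log.publish("click:article-42")

# Each consumer reads the same log independently, at its own pace.
analytics_batch = log.poll("analytics")
monitoring_batch = log.poll("monitoring", max_records=1)
```

Because the log is never mutated in place, a slow consumer (here, "monitoring") does not affect what a fast one has already read.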
5
LinkedIn’s use of Kafka
• Monitoring
• Pub-Sub Messaging
• Analytics
• Building block for (log) distributed applications
  • Samza
  • Espresso
  • Pinot
6
Kafka to Hadoop (Analytics) Use Case
• LinkedIn tracks data to better understand how members use our products
• Information such as which page got viewed and which content got clicked on is sent into a Kafka cluster in each data center
• Some of these events are centrally collected and pushed onto our Hadoop grid for analysis and daily report generation
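As a rough illustration of the "daily report generation" step, the sketch below counts page views per day and page from a list of tracking events. The event fields (`type`, `page`, `ts`) and the function name `daily_page_views` are assumptions made for this example; the talk does not describe LinkedIn's actual event schema or report jobs, and a real job would run on the Hadoop grid rather than in a single Python process.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_page_views(events):
    """Count page-view events per (UTC date, page); other event types are skipped."""
    counts = Counter()
    for event in events:
        if event["type"] != "page_view":
            continue
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).date().isoformat()
        counts[(day, event["page"])] += 1
    return counts


# Hypothetical events; "ts" is a Unix timestamp in seconds.
events = [
    {"type": "page_view", "page": "/home", "ts": 1451606400},  # 2016-01-01 00:00 UTC
    {"type": "page_view", "page": "/home", "ts": 1451610000},  # 2016-01-01 01:00 UTC
    {"type": "click", "page": "/home", "ts": 1451610001},      # not a page view
    {"type": "page_view", "page": "/jobs", "ts": 1451692800},  # 2016-01-02 00:00 UTC
]

report = daily_page_views(events)
```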
7
Couchbase @ LinkedIn
• About 80 separate services with one or more clusters in multiple data centers
• Up to ~70 servers in a cluster
• Single & Multi-tenant clusters
8
Hadoop to Couchbase
• Our primary use case for Hadoop to Couchbase is building (warming) / restoring Couchbase buckets
• LinkedIn built its own in-house solution to work with our ETL processes etc.
9
Jobs Cluster: Clusters & Numbers
• Used for read-scaling, >150k QPS, 27-node clusters
• We use Hadoop to pre-build data by partition
• Couchbase average latency is 2-3 ms
• 99th percentile is ~8-12 ms
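The "pre-build data by partition" step can be sketched as grouping records by a hash of their key, so that each group can later be loaded as a unit. This is only an illustration under stated assumptions: the partition count of 8, the CRC32-modulo scheme, and the function names are hypothetical stand-ins, not LinkedIn's in-house tooling and not necessarily the exact key-to-vBucket mapping used by Couchbase clients.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 8  # hypothetical; chosen small for readability


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a document key to a partition with a stable hash (CRC32 modulo N)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def prebuild_by_partition(records, num_partitions=NUM_PARTITIONS):
    """Group (key, value) records by partition so each group can be loaded together."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


# Hypothetical job records keyed by document ID.
records = [(f"job:{i}", {"title": f"Job {i}"}) for i in range(100)]
partitions = prebuild_by_partition(records)
```

A stable hash matters here: the same key must always land in the same partition, otherwise a rebuilt bucket would not line up with where readers look for the data.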
10
Questions? Thank You
©2014 LinkedIn Corporation. All Rights Reserved.