October 22, 2014 #twitterflight
The Open Source… Behind the Tweets
Open source is everywhere!
On your phone, in your car… and within Twitter!
http://www4.mercedes-benz.com/manual-cars/ba/foss/content/en/assets/FOSS_licences.pdf
iOS: General->About->Legal->Legal Notices
Vine: General->About->Legal
Chris Aniszczyk
Head of Open Source
@cra
Twitter runs on Open Source
Life of a Tweet
What open source technology do we use behind the scenes when we tweet?
tweet fanout write search batch fin
Tweet!
https://dev.twitter.com/rest/reference/post/statuses/update
Your first stop as a tweet: Twitter Front End (TFE)
A fancy reverse proxy for HTTP traffic built on the JVM
Handles authentication, rate limits and more!
Powered by the open source project Netty: http://netty.io
Netty at Twitter
Netty is an open source Java NIO framework
Used heavily at Twitter
Healthy adopter community:
http://netty.io/wiki/adopters.html
Cloudhopper sends billions of SMS messages per month using Netty
https://github.com/twitter/cloudhopper-smpp
We contributed SPDY support to Netty: http://netty.io/news/2012/02/04/3-3-1-spdy.html
*https://blog.twitter.com/2013/netty-4-at-twitter-reduced-gc-overhead
Twitter backend architecture is *service-oriented (on the JVM)
Core services are built on top of Finagle (using an API framework)
Finagle is written in Scala and built on top of Netty
https://github.com/twitter/finagle
*http://www.slideshare.net/InfoQ/decomposing-twitter-adventures-in-serviceoriented-architecture
Finagle at Twitter
Why Scala?
Scala enables succinct expression (vs Java)
Less typing is less reading; brevity enhances clarity
Two open source Scala/Finagle guides from Twitter:
https://twitter.github.io/effectivescala/
https://twitter.github.io/scala_school/
Finagle is our fault-tolerant, protocol-agnostic RPC framework built on Netty
Emphasizes service modularity via async futures
Handles failover semantics, metrics, logging etc…
*https://blog.twitter.com/2014/netty-at-twitter-with-finagle
Finagle Service Example
import com.twitter.finagle.{Filter, Http, Service, Thrift}
import com.twitter.util.Future

// #1 Create a client for each service
val timelineSvc = Thrift.newIface[TimelineService](...)
val tweetSvc = Thrift.newIface[TweetService](...)
val authSvc = Thrift.newIface[AuthService](...)

// #2 Create new Filter to authenticate incoming requests
val authFilter = Filter.mk[Req, AuthReq, Res, Res] { (req, svc) =>
  authSvc.authenticate(req) flatMap svc(_)
}

// #3 Create a service to convert an authenticated timeline request to a json response
val apiService = Service.mk[AuthReq, Res] { req =>
  timelineSvc(req.userId) flatMap { tl =>
    val tweets = tl map tweetSvc.getById(_)
    Future.collect(tweets) map tweetsToJson(_)
  }
}

// #4 Start a new HTTP server on port 80 using the authenticating filter and our service
Http.serve(":80", authFilter andThen apiService)
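The filter-plus-service composition in the example can be tried without Finagle at all. What follows is a minimal sketch using only the Scala standard library: `Service`, `Filter`, `andThen`, the request types and the token table are hypothetical stand-ins written for illustration, not Finagle's actual API.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FilterSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // A service is an asynchronous function from request to response.
  type Service[Req, Rep] = Req => Future[Rep]

  // A filter sees the incoming request and the next service, and may
  // transform the request before handing it on.
  type Filter[ReqIn, ReqOut, Rep] = (ReqIn, Service[ReqOut, Rep]) => Future[Rep]

  // Composing a filter with a service yields a new service.
  def andThen[ReqIn, ReqOut, Rep](
      filter: Filter[ReqIn, ReqOut, Rep],
      service: Service[ReqOut, Rep]): Service[ReqIn, Rep] =
    req => filter(req, service)

  // Hypothetical request types, for illustration only.
  final case class Req(token: String)
  final case class AuthReq(userId: Long)

  // Hypothetical token table standing in for a real auth service.
  private val tokens = Map("secret" -> 42L)

  // Turns a raw request into an authenticated one, or fails the call.
  val authFilter: Filter[Req, AuthReq, String] = (req, svc) =>
    tokens.get(req.token) match {
      case Some(id) => svc(AuthReq(id))
      case None     => Future.failed(new SecurityException("bad token"))
    }

  // Stand-in for the timeline service.
  val timeline: Service[AuthReq, String] =
    req => Future(s"timeline for user ${req.userId}")

  val api: Service[Req, String] =
    andThen[Req, AuthReq, String](authFilter, timeline)

  // Blocks for the result; acceptable in a demo, not in a server.
  def run(token: String): String = Await.result(api(Req(token)), 5.seconds)
}
```

Calling `FilterSketch.run("secret")` goes through the filter and then the service; an unknown token fails the future before the service is ever invoked.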
Tweets need to be stored somewhere (via a Finagle-based core service)
TBird: persistent storage for tweets
Built originally on Gizzard: https://github.com/twitter/gizzard
Tweets stored in sharded and replicated MySQL
TFlock: track relations between users and tweets
Built originally on FlockDB: https://github.com/twitter/flockdb
MySQL at Twitter
Maintain a public fork of v5.5/v5.6
Goal is to “work” with upstream
https://github.com/twitter/mysql
Co-founded the WebScaleSQL.org effort
When a tweet is generated it needs to be written to all relevant timelines
Timelines are essentially a list of tweet ids (heavily cached)
Fanout is the process where tweets are delivered to timelines
For caching we rely on the open source project Redis
https://github.com/antirez/redis
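Fanout on write can be sketched in a few lines. This is an in-memory illustration only: the maps stand in for a timeline cache such as Redis, and the names and the length cap are assumptions, not Twitter's actual implementation.

```scala
import scala.collection.mutable

object FanoutSketch {
  // Hypothetical in-memory stand-ins for the follower graph and timeline cache.
  private val followers =
    mutable.Map.empty[Long, Set[Long]].withDefaultValue(Set.empty)
  private val timelines =
    mutable.Map.empty[Long, Vector[Long]].withDefaultValue(Vector.empty)

  // Arbitrary illustrative cap on how many tweet ids a cached timeline keeps.
  val maxTimelineLength = 800

  def follow(follower: Long, followee: Long): Unit =
    followers(followee) = followers(followee) + follower

  // Fanout: deliver the new tweet id to the timeline of every follower,
  // newest first, trimming each timeline to the cap.
  def fanout(authorId: Long, tweetId: Long): Unit =
    for (f <- followers(authorId))
      timelines(f) = (tweetId +: timelines(f)).take(maxTimelineLength)

  // A timeline is just a list of tweet ids.
  def timeline(userId: Long): Vector[Long] = timelines(userId)
}
```

The work happens at write time, so reading a timeline is a single lookup of a precomputed list of ids.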
Redis at Twitter
Redis is used for caching timelines and more!
Added custom logging, data structures
We are working to upstream some changes…
@thinkingfish gave a fantastic talk on this: https://www.youtube.com/watch?v=rP9EKvWt0zo
Open Source Proxy for Redis: Twemproxy
https://github.com/twitter/twemproxy
Used by Vine, Pinterest, Wikimedia, Snapchat etc…
Everyone searches for tweets: https://dev.twitter.com/rest/public/search
In fact, one of the most heavily trafficked search engines in the world
Back in the day, Twitter search was built on MySQL
Today, Twitter search is an optimized real-time search/indexing technology
Powered by Apache Lucene: http://lucene.apache.org
Lucene (Earlybird) at Twitter
Earlybird* is Twitter’s real-time search engine built on top of Apache Lucene
We optimized Lucene (cut corners) to handle tweets only since that’s all we do
e.g., less space: a position within a 140-character tweet fits in 8 bits
Read about Blender, our search front-end
https://blog.twitter.com/2011/twitter-search-now-3x-faster
*http://www.umiacs.umd.edu/~jimmylin/publications/Busch_etal_ICDE2012.pdf
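The 8-bit point can be made concrete with a small bit-packing sketch. It assumes a posting is one 32-bit integer holding a document id in the upper 24 bits and a term position in the lower 8; that split and all names here are illustrative assumptions, not something this slide states.

```scala
object PostingSketch {
  // A position inside a 140-character tweet is at most 139, so 8 bits suffice.
  val PositionBits = 8
  val PositionMask = (1 << PositionBits) - 1 // 0xFF
  // Assumed: the remaining 24 bits of the Int hold the document id.
  val MaxDocId = (1 << 24) - 1

  // Pack a document id and a term position into a single Int.
  def pack(docId: Int, position: Int): Int = {
    require(docId >= 0 && docId <= MaxDocId, "docId must fit in 24 bits")
    require(position >= 0 && position <= PositionMask, "position must fit in 8 bits")
    (docId << PositionBits) | position
  }

  // Unsigned shift so the largest document ids decode correctly.
  def docId(posting: Int): Int = posting >>> PositionBits

  def position(posting: Int): Int = posting & PositionMask
}
```

Under these assumptions a posting list becomes a plain array of integers, with no per-posting object overhead.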
Hadoop is used for many things at Twitter, like counting words :)
Scribe logs, batch processing, recommendations, trends, user modeling and more!
10,000+ Hadoop servers, 100,000+ daily Hadoop jobs, 10M+ daily Hadoop tasks
Parquet is a columnar storage format for Hadoop
https://parquet.incubator.apache.org
Scalding is our Scala DSL for writing Hadoop jobs
https://github.com/twitter/scalding
Parquet/Scalding at Twitter
Parquet* is a columnar storage format
Initially a collaboration between Twitter/Cloudera
Inspired by Google Dremel paper**
Now at Apache: http://parquet.incubator.apache.org/
Scalding built on top of Scala and Cascading
https://github.com/Cascading/cascading
Makes it easier* to write Hadoop jobs (using Scala)
*https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop
Scalding Example
import com.twitter.scalding._

// can't have a Hadoop example without word count!
class WordCountJob(args : Args) extends Job(args) {
  TextLine( args("input") )
    .flatMap('line -> 'word) { line : String => line.split("""\s+""") }
    .groupBy('word) { _.size }
    .write( Tsv( args("output") ) )
}
https://github.com/twitter/scalding/wiki/Rosetta-Code
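For comparison, the same word count can be written against ordinary Scala collections, with no Hadoop or Scalding dependency. This is only a local sketch of the pipeline's shape (`wordCount` is a name chosen here), and unlike the job above it drops the empty tokens that splitting can produce.

```scala
object WordCountSketch {
  // Split every line on whitespace, group identical words, count each group.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("""\s+""").toSeq)
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }
}
```

The flatMap and groupBy steps line up one-to-one with the Scalding job; the difference is that Scalding runs them as Hadoop jobs over files rather than over an in-memory sequence.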
Sharing is caring, contribute!
Let's all make Twitter better!
opensource.twitter.com
https://github.com/twitter
New Open Source API Samples
Hack on the samples and improve them!
https://github.com/twitterdev (t.co/code)
Also, check out the lightning talk by Andrew Noonan later today about “Twitter’s developer toolbox”
Thank You
Q&A
The Open Source Behind the Tweets
http://opensource.twitter.com
Hope you learned something new! Come see us at the @TwitterOSS Booth!
Chris Aniszczyk (@cra)
Resources
https://opensource.twitter.com
https://github.com/twitter/finagle
https://github.com/twitter/zipkin
https://github.com/twitter/scalding
https://github.com/twitter/mysql
https://github.com/twitter/twemproxy
https://twitter.github.io/scala_school
http://webscalesql.org
http://mesos.apache.org
http://parquet.incubator.apache.org
Backup Slides
Where does it all run?
Main concept: Datacenter as a computer
Aggregation and not virtualization
mesos.apache.org
aurora.incubator.apache.org
[Diagram: masters, a framework, and four resource offers, each listing hostname, 4 CPUs, 4 GB RAM]
Profiles
Search / S&R
Trends / S&R
Home timeline / TLS
PTw / Ads
Contact import / Growth
Compose
DMs / Social Discover / S&R
WtF / S&R