the open source... behind the tweets

#twitterflight

Upload: chris-aniszczyk

Post on 02-Dec-2014

TRANSCRIPT

Page 1: The Open Source... Behind the Tweets

#twitterflight

Page 2: The Open Source... Behind the Tweets

October 22, 2014 #twitterflight

The Open Source… Behind the Tweets

Page 3: The Open Source... Behind the Tweets

Open source is everywhere!

On your phone, in your car… and within Twitter!

http://www4.mercedes-benz.com/manual-cars/ba/foss/content/en/assets/FOSS_licences.pdf

iOS: General->About->Legal->Legal Notices

Vine: General->About->Legal

Page 4: The Open Source... Behind the Tweets

Chris Aniszczyk

Head of Open Source

@cra

Page 5: The Open Source... Behind the Tweets

Twitter runs on Open Source

Page 6: The Open Source... Behind the Tweets

Life of a Tweet

What open source technology do we use behind the scenes when we tweet?

tweet, write, fanout, search, batch, fin

Page 7: The Open Source... Behind the Tweets

Tweet!

Page 8: The Open Source... Behind the Tweets

Your first stop as a tweet: Twitter Front End (TFE)

https://dev.twitter.com/rest/reference/post/statuses/update

A fancy reverse proxy for HTTP traffic built on the JVM

Handles authentication, rate limits and more!

Powered by the open source project Netty: http://netty.io
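The slide says TFE handles rate limits but does not say how. As a rough illustration only, here is a minimal token-bucket limiter in plain Scala. The class name, the parameters and the choice of a token bucket are all assumptions made for this sketch; nothing here is taken from TFE itself. Time is passed in explicitly so the behaviour is deterministic.

```scala
// Hypothetical token-bucket limiter; NOT TFE's actual implementation.
// capacity: maximum burst size; refillPerSecond: sustained request rate.
final class TokenBucket(capacity: Int, refillPerSecond: Double) {
  private var tokens: Double = capacity.toDouble
  private var lastNanos: Long = 0L

  /** Returns true if a request arriving at `nowNanos` is allowed. */
  def tryAcquire(nowNanos: Long): Boolean = synchronized {
    // Refill according to the time elapsed since the previous call.
    val elapsedSeconds = math.max(0L, nowNanos - lastNanos) / 1e9
    tokens = math.min(capacity.toDouble, tokens + elapsedSeconds * refillPerSecond)
    lastNanos = nowNanos
    // Spend one token per request, or reject when the bucket is empty.
    if (tokens >= 1.0) { tokens -= 1.0; true } else false
  }
}
```

A proxy would typically keep one such bucket per client or per API key and reject a request (for example with HTTP 429) when `tryAcquire` returns false.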

Page 9: The Open Source... Behind the Tweets

Netty at Twitter

Netty is an open source Java NIO framework

Used heavily at Twitter

Healthy adopter community: http://netty.io/wiki/adopters.html

Cloudhopper sends billions of SMS messages per month using Netty

https://github.com/twitter/cloudhopper-smpp

We contributed SPDY support to Netty: http://netty.io/news/2012/02/04/3-3-1-spdy.html

*https://blog.twitter.com/2013/netty-4-at-twitter-reduced-gc-overhead

Page 10: The Open Source... Behind the Tweets

Twitter backend architecture is *service-oriented (on the JVM)

Core services are built on top of Finagle (using an API framework)

Finagle is written in Scala and built on top of Netty

https://github.com/twitter/finagle

*http://www.slideshare.net/InfoQ/decomposing-twitter-adventures-in-serviceoriented-architecture

Page 11: The Open Source... Behind the Tweets

Finagle at Twitter

Why Scala?

Scala enables succinct expression (vs Java)

Less typing is less reading; brevity enhances clarity

Two open source Scala/Finagle guides from Twitter:

https://twitter.github.io/effectivescala/

https://twitter.github.io/scala_school/

Finagle is our fault-tolerant, protocol-agnostic RPC framework built on Netty

Emphasizes service modularity via async futures

Handles failover semantics, metrics, logging etc…

*https://blog.twitter.com/2014/netty-at-twitter-with-finagle

Page 12: The Open Source... Behind the Tweets

Finagle Service Example

import com.twitter.finagle.{Filter, Http, Service, Thrift}
import com.twitter.util.Future

// #1 Create a client for each service
val timelineSvc = Thrift.newIface[TimelineService](...)
val tweetSvc = Thrift.newIface[TweetService](...)
val authSvc = Thrift.newIface[AuthService](...)

// #2 Create new Filter to authenticate incoming requests
val authFilter = Filter.mk[Req, AuthReq, Res, Res] { (req, svc) =>
  authSvc.authenticate(req) flatMap svc(_)
}

// #3 Create a service to convert an authenticated timeline request to a json response
val apiService = Service.mk[AuthReq, Res] { req =>
  timelineSvc(req.userId) flatMap { tl =>
    // one lookup per tweet id in the timeline, collected into a single Future
    val tweets = tl map tweetSvc.getById(_)
    Future.collect(tweets) map tweetsToJson(_)
  }
}

// #4 Start a new HTTP server on port 80 using the authenticating filter and our service
Http.serve(":80", authFilter andThen apiService)
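The Finagle example above relies on service definitions that the slide elides. To show the underlying idea (a service is an asynchronous function, and a filter wraps a service), here is a self-contained model using only the Scala standard library. It uses `scala.concurrent.Future` rather than Finagle's `com.twitter.util.Future`, and every name in it (`ServiceSketch`, `Request`, `AuthedRequest`, the `"secret"` token) is hypothetical. It is a sketch of the composition pattern, not Finagle's API.

```scala
import scala.concurrent.{ExecutionContext, Future}

object ServiceSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // A service is an asynchronous function from request to response.
  type Service[Req, Rep] = Req => Future[Rep]

  // A filter sees the incoming request and the next service in the chain.
  type Filter[ReqIn, ReqOut, Rep] = (ReqIn, Service[ReqOut, Rep]) => Future[Rep]

  // Composing a filter with a service yields a new service.
  def andThen[A, B, R](filter: Filter[A, B, R], service: Service[B, R]): Service[A, R] =
    req => filter(req, service)

  // Hypothetical request types, for illustration only.
  final case class Request(token: String, userId: Long)
  final case class AuthedRequest(userId: Long)

  // Rejects requests with a bad token, otherwise forwards an authenticated request.
  val authFilter: Filter[Request, AuthedRequest, String] = (req, next) =>
    if (req.token == "secret") next(AuthedRequest(req.userId))
    else Future.failed(new SecurityException("bad token"))

  // Stand-in for a timeline service.
  val timeline: Service[AuthedRequest, String] =
    req => Future(s"timeline for user ${req.userId}")

  val api: Service[Request, String] = andThen(authFilter, timeline)
}
```

Because the filter and the service share one shape, cross-cutting concerns such as authentication can be layered on without touching the service body.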

Page 14: The Open Source... Behind the Tweets

Tweets need to be stored somewhere (via a Finagle-based core service)

TBird: persistent storage for tweets

Built originally on Gizzard: https://github.com/twitter/gizzard

Tweets stored in sharded and replicated MySQL

TFlock: track relations between users and tweets

Built originally on FlockDB: https://github.com/twitter/flockdb
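"Sharded and replicated" means each tweet lives on one shard, and each shard has several copies. The sketch below shows the general idea with a simple modulo placement. That placement rule, the class names and the host names are assumptions for illustration; the slides do not describe how Gizzard or TBird actually map ids to shards.

```scala
// Hypothetical shard router. Modulo placement is an assumption for illustration,
// NOT the forwarding scheme Gizzard or TBird actually uses.
final case class Shard(id: Int, primary: String, replicas: List[String])

final class ShardRouter(shards: Vector[Shard]) {
  require(shards.nonEmpty, "need at least one shard")

  /** Picks the shard that owns a tweet id. */
  def shardFor(tweetId: Long): Shard =
    shards(java.lang.Math.floorMod(tweetId, shards.size.toLong).toInt)

  /** Writes go to the shard's primary. */
  def writeTarget(tweetId: Long): String = shardFor(tweetId).primary

  /** Reads may be served by the primary or any replica. */
  def readTargets(tweetId: Long): List[String] = {
    val shard = shardFor(tweetId)
    shard.primary :: shard.replicas
  }
}
```

With two shards, for example, even ids would land on the first shard and odd ids on the second.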

Page 15: The Open Source... Behind the Tweets

MySQL at Twitter

Maintain a public fork of v5.5/v5.6

Goal is to “work” with upstream

https://github.com/twitter/mysql

Co-founded the WebScaleSQL.org effort

Page 17: The Open Source... Behind the Tweets

When a tweet is generated it needs to be written to all relevant timelines

Timelines are essentially a list of tweet ids (heavily cached)

Fanout is the process where tweets are delivered to timelines

For caching we rely on the open source project Redis

https://github.com/antirez/redis
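The fanout step described above can be sketched in a few lines. Here an in-memory map stands in for the Redis-backed timeline cache, and all names (`TimelineCache`, `Fanout`, `maxLength`) are hypothetical. It is a minimal illustration of fanout on write, not Twitter's implementation.

```scala
import scala.collection.mutable

// In-memory stand-in for the timeline cache; the real system uses Redis.
final class TimelineCache(maxLength: Int) {
  private val timelines = mutable.Map.empty[Long, Vector[Long]]

  /** Prepends a tweet id to one user's timeline, dropping the oldest entries. */
  def push(userId: Long, tweetId: Long): Unit = {
    val updated = (tweetId +: timelines.getOrElse(userId, Vector.empty)).take(maxLength)
    timelines.update(userId, updated)
  }

  /** A timeline is just a list of tweet ids, newest first. */
  def timeline(userId: Long): Vector[Long] = timelines.getOrElse(userId, Vector.empty)
}

object Fanout {
  /** Fanout on write: deliver the new tweet id to every follower's timeline. */
  def deliver(cache: TimelineCache, followers: Seq[Long], tweetId: Long): Unit =
    followers.foreach(follower => cache.push(follower, tweetId))
}
```

Storing only ids keeps each timeline small; the tweet bodies are fetched separately from storage when a timeline is rendered.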

Page 18: The Open Source... Behind the Tweets

Redis at Twitter

Redis is used for caching timelines and more!

Added custom logging, data structures

We are working to upstream some changes…

@thinkingfish gave a fantastic talk on this: https://www.youtube.com/watch?v=rP9EKvWt0zo

Open Source Proxy for Redis: Twemproxy

https://github.com/twitter/twemproxy

Used by Vine, Pinterest, Wikimedia, Snapchat etc…

Page 20: The Open Source... Behind the Tweets

Everyone searches for tweets: https://dev.twitter.com/rest/public/search

In fact, it is one of the most heavily trafficked search engines in the world

Back in the day, Twitter search was built on MySQL

Today, Twitter search is an optimized real-time search/indexing technology

Powered by Apache Lucene: http://lucene.apache.org

Page 21: The Open Source... Behind the Tweets

Lucene (Earlybird) at Twitter

Earlybird* is Twitter’s real-time search engine built on top of Apache Lucene

We optimized Lucene (cut corners) to handle tweets only since that’s all we do

e.g., less space: a position within a 140-character tweet fits in 8 bits

Read about Blender, our search front-end

https://blog.twitter.com/2011/twitter-search-now-3x-faster

*http://www.umiacs.umd.edu/~jimmylin/publications/Busch_etal_ICDE2012.pdf
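To make the "8 bits" remark concrete: a tweet has at most 140 characters, so a position inside it is always below 256 and fits in one byte. The sketch below packs a document id and a position into a single 32-bit `Int`. The 24/8 split and all names are assumptions chosen for illustration; consult the cited paper for Earlybird's actual posting layout.

```scala
// Sketch: one posting packed into a 32-bit Int.
// Upper 24 bits hold a document id, lower 8 bits hold the term position.
// The 24/8 split is an assumption for illustration.
object Posting {
  val PositionBits = 8
  val PositionMask = (1 << PositionBits) - 1 // 0xFF
  val MaxDocId = (1 << 24) - 1

  def pack(docId: Int, position: Int): Int = {
    require(docId >= 0 && docId <= MaxDocId, "docId must fit in 24 bits")
    require(position >= 0 && position <= PositionMask, "position must fit in 8 bits")
    (docId << PositionBits) | position
  }

  // Unsigned shift so that large doc ids are not sign-extended.
  def docId(posting: Int): Int = posting >>> PositionBits

  def position(posting: Int): Int = posting & PositionMask
}
```

A general-purpose index cannot make this assumption because documents can be arbitrarily long; restricting the engine to tweets is what allows the compact encoding.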

Page 23: The Open Source... Behind the Tweets

Hadoop is used for many things at Twitter, like counting words :)

Scribe logs, batch processing, recommendations, trends, user modeling and more!

10,000+ Hadoop servers, 100,000+ daily Hadoop jobs, 10M+ daily Hadoop tasks

Parquet is a columnar storage format for Hadoop

https://parquet.incubator.apache.org

Scalding is our Scala DSL for writing Hadoop jobs

https://github.com/twitter/scalding

Page 24: The Open Source... Behind the Tweets

Parquet/Scalding at Twitter

Parquet* is a columnar storage format

Initially a collaboration between Twitter/Cloudera

Inspired by Google Dremel paper**

Now at Apache: http://parquet.incubator.apache.org/

Scalding built on top of Scala and Cascading

https://github.com/Cascading/cascading

Makes it easier* to write Hadoop jobs (using Scala)

*https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop
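"Columnar" means the values of one field are stored together, rather than storing each record as a unit. The sketch below contrasts the two layouts in memory. The record shape (`TweetRow`) and helper names are hypothetical, and this says nothing about Parquet's on-disk encoding; it only illustrates why a query over one field touches less data in a columnar layout.

```scala
// Row-oriented vs column-oriented layout of the same records.
// The record shape is hypothetical; this is NOT Parquet's file format.
final case class TweetRow(id: Long, userId: Long, lang: String)

final case class TweetColumns(ids: Vector[Long], userIds: Vector[Long], langs: Vector[String])

object Columnar {
  /** Transposes a sequence of rows into one vector per field. */
  def toColumns(rows: Seq[TweetRow]): TweetColumns =
    TweetColumns(
      rows.map(_.id).toVector,
      rows.map(_.userId).toVector,
      rows.map(_.lang).toVector
    )

  /** A query that needs one field reads only that column. */
  def countByLang(columns: TweetColumns): Map[String, Int] =
    columns.langs.groupBy(identity).map { case (lang, values) => lang -> values.size }
}
```

Values in one column also share a type and often repeat, which tends to make them compress well.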

Page 25: The Open Source... Behind the Tweets

Scalding Example

import com.twitter.scalding._

// can’t have a Hadoop example without word count!
class WordCountJob(args : Args) extends Job(args) {
  TextLine( args("input") )
    .flatMap('line -> 'word) { line : String => line.split("""\s+""") }
    .groupBy('word) { _.size }
    .write( Tsv( args("output") ) )
}

https://github.com/twitter/scalding/wiki/Rosetta-Code
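For readers without a Hadoop cluster at hand, the same computation can be written against in-memory Scala collections. This sketch uses only the standard library; `LocalWordCount` is a hypothetical name, and the filter for empty strings is an addition here (splitting a line with leading whitespace yields an empty first token).

```scala
object LocalWordCount {
  /** Splits each line on whitespace and counts occurrences of each word. */
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("""\s+"""))  // same split as the Scalding job
      .filter(_.nonEmpty)           // drop empty tokens from leading whitespace
      .groupBy(identity)            // corresponds to groupBy('word)
      .map { case (word, occurrences) => word -> occurrences.size }
}
```

The shape of the pipeline (split, group, count) matches the Scalding job; Scalding's contribution is running that same shape as a distributed Hadoop job.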

Page 27: The Open Source... Behind the Tweets

Sharing is caring, contribute!

Let’s all make Twitter better!

opensource.twitter.com

https://github.com/twitter

Page 28: The Open Source... Behind the Tweets

New Open Source API Samples

Hack on the samples and improve them!

https://github.com/twitterdev (t.co/code)

Also, later today check out the lightning talk by Andrew Noonan about “Twitter’s developer toolbox”

Page 29: The Open Source... Behind the Tweets

Thank You

Page 30: The Open Source... Behind the Tweets

Q&A

The Open Source Behind the Tweets

http://opensource.twitter.com

Hope you learned something new! Come see us at the @TwitterOSS Booth!

Chris Aniszczyk (@cra)

Page 31: The Open Source... Behind the Tweets

Resources

https://opensource.twitter.com
https://github.com/twitter/finagle
https://github.com/twitter/zipkin
https://github.com/twitter/scalding
https://github.com/twitter/mysql
https://github.com/twitter/twemproxy
https://twitter.github.io/scala_school
http://webscalesql.org
http://mesos.apache.org
http://parquet.incubator.apache.org

Page 32: The Open Source... Behind the Tweets

October 22, 2014 #twitterflight

Backup Slides

Page 33: The Open Source... Behind the Tweets

Where does it all run?

Main concept: Datacenter as a computer

Aggregation and not virtualization

mesos.apache.org

aurora.incubator.apache.org

[Diagram: masters, a framework, and four resource offers, each listing a hostname, 4 CPUs and 4 GB RAM]
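The diagram on this slide shows resource offers (a hostname with 4 CPUs and 4 GB RAM) being presented to a framework. One way to picture what a framework does with such offers is a first-fit placement, sketched below. The types, the first-fit policy and the host names are assumptions for illustration; this is not the Mesos or Aurora API.

```scala
// Hypothetical model of resource offers; NOT the Mesos API.
final case class Offer(hostname: String, cpus: Int, memGb: Int)
final case class Task(name: String, cpus: Int, memGb: Int)

object OfferScheduler {
  /** First-fit: place each task on the first offer with enough room left.
    * Returns a map from task name to hostname; tasks that fit nowhere are omitted. */
  def place(offers: List[Offer], tasks: List[Task]): Map[String, String] = {
    val (_, placements) = tasks.foldLeft((offers, Map.empty[String, String])) {
      case ((remaining, placed), task) =>
        remaining.indexWhere(o => o.cpus >= task.cpus && o.memGb >= task.memGb) match {
          case -1 => (remaining, placed) // no offer fits; the task stays pending
          case i =>
            val offer = remaining(i)
            // Shrink the offer by what the task consumes.
            val shrunk = offer.copy(cpus = offer.cpus - task.cpus, memGb = offer.memGb - task.memGb)
            (remaining.updated(i, shrunk), placed + (task.name -> offer.hostname))
        }
    }
    placements
  }
}
```

Several tasks can share one host as long as the remaining capacity allows, which is the "aggregation and not virtualization" idea: machines are pooled and carved up by resources rather than split into virtual machines.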

Page 34: The Open Source... Behind the Tweets

Profiles

Search / S&R

Trends / S&R

Home timeline / TLS

PTw / Ads

Contact import / Growth

Compose

DMs / Social

Discover / S&R

WtF / S&R