The Open Source... Behind the Tweets

#twitterflight

October 22, 2014 #twitterflight The Open Source… Behind the Tweets

Open source is everywhere! On your phone, in your car… and within Twitter!
http://www4.mercedes-benz.com/manual-cars/ba/foss/content/en/assets/FOSS_licences.pdf
iOS: General->About->Legal->Legal Notices
Vine: General->About->Legal

Chris Aniszczyk Head of Open Source @cra

Twitter runs on Open Source

Life of a Tweet
What open source technology do we use behind the scenes when we tweet?
tweet, fanout, write, search, batch, fin

Tweet!

Life of a Tweet
https://dev.twitter.com/rest/reference/post/statuses/update
Your first stop as a tweet: Twitter Front End (TFE)
A fancy reverse proxy for HTTP traffic built on the JVM
Handles authentication, rate limits and more!
Powered by the open source project Netty: http://netty.io

Netty at Twitter
Netty is an open source Java NIO framework
Used heavily at Twitter
Healthy adopter community: http://netty.io/wiki/adopters.html
Cloudhopper sends billions of SMS messages per month using Netty: https://github.com/twitter/cloudhopper-smpp
We contributed SPDY support to Netty: http://netty.io/news/2012/02/04/3-3-1-spdy.html
*https://blog.twitter.com/2013/netty-4-at-twitter-reduced-gc-overhead

Life of a Tweet
Twitter backend architecture is service-oriented* (on the JVM)
Core services are built on top of Finagle (using an API framework)
Finagle is written in Scala and built on top of Netty
https://github.com/twitter/finagle
*http://www.slideshare.net/InfoQ/decomposing-twitter-adventures-in-serviceoriented-architecture

Finagle at Twitter
Why Scala?
Scala enables succinct expression (vs Java)
Less typing is less reading; brevity enhances clarity
Two open source Scala/Finagle guides from Twitter:
https://twitter.github.io/effectivescala/
https://twitter.github.io/scala_school/
Finagle is our fault-tolerant, protocol-agnostic RPC framework built on Netty
Emphasizes service modularity via async futures
Handles failover semantics, metrics, logging etc…
*https://blog.twitter.com/2014/netty-at-twitter-with-finagle

Finagle Service Example
    import com.twitter.finagle.{Filter, Http, Service, Thrift}
    import com.twitter.util.Future

    // #1 Create a client for each service
    val timelineSvc = Thrift.newIface[TimelineService](...)
    val tweetSvc = Thrift.newIface[TweetService](...)
    val authSvc = Thrift.newIface[AuthService](...)

    // #2 Create new Filter to authenticate incoming requests
    val authFilter = Filter.mk[Req, AuthReq, Res, Res] { (req, svc) =>
      authSvc.authenticate(req) flatMap svc(_)
    }

    // #3 Create a service to convert an authenticated timeline request to a json response
    val apiService = Service.mk[AuthReq, Res] { req =>
      timelineSvc(req.userId) flatMap { tl =>
        val tweets = tl map tweetSvc.getById(_)
        Future.collect(tweets) map tweetsToJson(_)
      }
    }

    // #4 Start a new HTTP server on port 80 using the authenticating filter and our service
    Http.serve(":80", authFilter andThen apiService)
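The filter-then-service shape in the example can be tried without Finagle. The following is a minimal sketch using only the Scala standard library, not Finagle's API: the `Service` and `Filter` type aliases, the request types, the token table, and the `andThen` helper are all hypothetical stand-ins chosen for illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object FilterSketch {
  // A service is an asynchronous function from request to response.
  type Service[Req, Rep] = Req => Future[Rep]

  // A filter receives a request plus the next service and decides how to call it.
  type Filter[ReqIn, ReqOut, Rep] = (ReqIn, Service[ReqOut, Rep]) => Future[Rep]

  // Hypothetical request types, standing in for Req and AuthReq on the slide.
  final case class RawRequest(token: String)
  final case class AuthedRequest(userId: Long)

  // Hypothetical token table; a real system would call an auth service instead.
  private val tokens = Map("secret-1" -> 1L, "secret-2" -> 2L)

  // Authenticate, then pass the enriched request on; fail the future otherwise.
  val authFilter: Filter[RawRequest, AuthedRequest, String] = { (req, next) =>
    tokens.get(req.token) match {
      case Some(id) => next(AuthedRequest(id))
      case None     => Future.failed(new SecurityException("unknown token"))
    }
  }

  val timelineService: Service[AuthedRequest, String] =
    req => Future.successful(s"timeline for user ${req.userId}")

  // Similar in spirit to `authFilter andThen apiService` on the slide.
  def andThen[A, B, R](filter: Filter[A, B, R], service: Service[B, R]): Service[A, R] =
    req => filter(req, service)

  val api: Service[RawRequest, String] = andThen(authFilter, timelineService)

  // Blocking helper for demonstration only.
  def call(token: String): String = Await.result(api(RawRequest(token)), 5.seconds)
}
```

Because the composed `api` is itself a `Service`, callers do not need to know that authentication happens first; that is the modularity the slide points at.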

Life of a Tweet
Tweets need to be stored somewhere (via a Finagle-based core service)
TBird: persistent storage for tweets
Built originally on Gizzard: https://github.com/twitter/gizzard
Tweets stored in sharded and replicated MySQL
TFlock: track relations between users and tweets
Built originally on FlockDB: https://github.com/twitter/flockdb

MySQL at Twitter
Maintain a public fork of v5.5/v5.6
Goal is to “work” with upstream
https://github.com/twitter/mysql
Co-founded the WebScaleSQL.org effort

Life of a Tweet
When a tweet is generated it needs to be written to all relevant timelines
Timelines are essentially a list of tweet ids (heavily cached)
Fanout is the process where tweets are delivered to timelines
For caching we rely on the open source project Redis
https://github.com/antirez/redis
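As a rough illustration of fanout, the sketch below delivers one tweet id to the timeline of every follower of its author. It uses plain in-memory maps rather than Redis, and the `Tweet` type, the follower map, and the tiny `maxTimelineLength` cap are assumptions made for the example, not details from the talk.

```scala
object FanoutSketch {
  // Hypothetical tweet record; only the ids matter for fanout.
  final case class Tweet(id: Long, authorId: Long)

  // Assumed cap on cached timeline length, kept tiny so trimming is visible.
  val maxTimelineLength = 3

  // followers: author id -> ids of users following that author
  // timelines: user id -> cached list of tweet ids, newest first
  def fanout(
      tweet: Tweet,
      followers: Map[Long, Set[Long]],
      timelines: Map[Long, Vector[Long]]
  ): Map[Long, Vector[Long]] =
    followers.getOrElse(tweet.authorId, Set.empty[Long]).foldLeft(timelines) {
      (acc, follower) =>
        // Prepend the new tweet id and trim the timeline to the cap.
        val current = acc.getOrElse(follower, Vector.empty[Long])
        acc.updated(follower, (tweet.id +: current).take(maxTimelineLength))
    }
}
```

Storing only ids keeps each timeline entry small, which is what makes caching whole timelines practical.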

Redis at Twitter
Redis is used for caching timelines and more!
Added custom logging, data structures
We are working to upstream some changes…
@thinkingfish gave a fantastic talk on this: https://www.youtube.com/watch?v=rP9EKvWt0zo
Open Source Proxy for Redis: Twemproxy
https://github.com/twitter/twemproxy
Used by Vine, Pinterest, Wikimedia, Snapchat etc…

Life of a Tweet
Everyone searches for tweets: https://dev.twitter.com/rest/public/search
In fact, one of the most heavily trafficked search engines in the world
Back in the day, Twitter search was built on MySQL
Today, Twitter search is an optimized real-time search/indexing technology
Powered by Apache Lucene: http://lucene.apache.org

Lucene (earlybird) at Twitter
Earlybird* is Twitter’s real-time search engine built on top of Apache Lucene
We optimized Lucene (cut corners) to handle tweets only since that’s all we do
e.g., less space: a position within a 140-character tweet needs only 8 bits
Read about Blender, our search front-end
https://blog.twitter.com/2011/twitter-search-now-3x-faster
*http://www.umiacs.umd.edu/~jimmylin/publications/Busch_etal_ICDE2012.pdf
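A minimal sketch of why 8 bits are enough: 2^8 = 256 distinct values cover every offset in a 140-character tweet, so a position can share a single 32-bit integer with a document ID. The 24-bit document ID width and all names below are assumptions for illustration, not Earlybird's actual code.

```scala
object PostingSketch {
  val PositionBits = 8
  val PositionMask = (1 << PositionBits) - 1 // 0xFF, the largest 8-bit value
  val MaxDocId = (1 << 24) - 1               // assumed 24-bit document id space

  // Pack a document id and an in-tweet position into one 32-bit Int.
  def pack(docId: Int, position: Int): Int = {
    require(docId >= 0 && docId <= MaxDocId, "docId must fit in 24 bits")
    require(position >= 0 && position <= PositionMask, "position must fit in 8 bits")
    (docId << PositionBits) | position
  }

  // Unsigned shift so the high bit of the packed value is not treated as a sign.
  def docId(posting: Int): Int = posting >>> PositionBits

  def position(posting: Int): Int = posting & PositionMask
}
```

Packing both fields into one `Int` means a posting list can be a plain array of primitives, with no per-entry object overhead.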

Life of a Tweet
Hadoop is used for many things at Twitter, like counting words :)
Scribe logs, batch processing, recommendations, trends, user modeling and more!
10,000+ Hadoop servers, 100,000+ daily Hadoop jobs, 10M+ daily Hadoop tasks
Parquet is a columnar storage format for Hadoop
https://parquet.incubator.apache.org
Scalding is our Scala DSL for writing Hadoop jobs
https://github.com/twitter/scalding

Parquet/Scalding at Twitter
Parquet* is a columnar storage format
Initially a collaboration between Twitter/Cloudera
Inspired by Google Dremel paper**
Now at Apache: http://parquet.incubator.apache.org/
Scalding built on top of Scala and Cascading
https://github.com/Cascading/cascading
Makes it easier* to write Hadoop jobs (using Scala)
*https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop

Scalding Example
    import com.twitter.scalding._

    // can’t have a Hadoop example without word count!
    class WordCountJob(args : Args) extends Job(args) {
      TextLine( args("input") )
        .flatMap('line -> 'word) { line : String => line.split("""\s+""") }
        .groupBy('word) { _.size }
        .write( Tsv( args("output") ) )
    }
https://github.com/twitter/scalding/wiki/Rosetta-Code
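For comparison, the same three steps (split lines into words, group identical words, count each group) can be sketched on an ordinary in-memory collection with the Scala standard library. This is not Scalding and runs no Hadoop job; the object and function names are hypothetical, and the empty-string filter is an addition that handles lines with leading whitespace.

```scala
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("""\s+""").toSeq)  // one entry per word, like flatMap('line -> 'word)
      .filter(_.nonEmpty)                 // drop the empty token a leading space produces
      .groupBy(identity)                  // like groupBy('word)
      .map { case (word, occurrences) => word -> occurrences.size } // like _.size
}
```

The Scalding job reads almost the same way, which is the point of the DSL: the pipeline looks like collection code but is planned and run as a Hadoop job.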

Sharing is caring, contribute!
Let’s all make Twitter better!
opensource.twitter.com
https://github.com/twitter

New Open Source API Samples
Hack on the samples and improve them!
https://github.com/twitterdev (t.co/code)
Also, check out the lightning talk by Andrew Noonan later today about “Twitter’s developer toolbox”

Thank You

Q&A
The Open Source Behind the Tweets
http://opensource.twitter.com
Hope you learned something new!
Come see us at the @TwitterOSS Booth!
Chris Aniszczyk (@cra)

Resources
https://opensource.twitter.com
https://github.com/twitter/finagle
https://github.com/twitter/zipkin
https://github.com/twitter/scalding
https://github.com/twitter/mysql
https://github.com/twitter/twemproxy
https://twitter.github.io/scala_school
http://webscalesql.org
http://mesos.apache.org
http://parquet.incubator.apache.org

October 22, 2014 #twitterflight Backup Slides

Where does it all run?
Main concept: Datacenter as a computer
Aggregation and not virtualization
mesos.apache.org
aurora.incubator.apache.org
[Diagram: masters send resource offers to a framework; each offer lists a hostname, 4 CPUs, and 4 GB RAM]
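To make the offer idea concrete, here is a small sketch of a framework placing tasks onto the offers it receives. It is a toy first-fit loop and not the Mesos or Aurora API: the `Offer` and `Task` types, the `place` function, and the placement policy are all assumptions for illustration.

```scala
object OfferSketch {
  // Hypothetical shapes mirroring the slide: hostname, CPUs, GB of RAM.
  final case class Offer(hostname: String, cpus: Double, memGb: Double)
  final case class Task(name: String, cpus: Double, memGb: Double)

  // First-fit placement: each task takes the first offer with enough room,
  // and that offer shrinks by what the task consumed.
  // Returns task name -> hostname; tasks that fit nowhere are left out.
  def place(tasks: List[Task], offers: List[Offer]): Map[String, String] = {
    val (_, placed) = tasks.foldLeft((offers, Map.empty[String, String])) {
      case ((remaining, acc), task) =>
        remaining.indexWhere(o => o.cpus >= task.cpus && o.memGb >= task.memGb) match {
          case -1 => (remaining, acc) // no offer fits; skip this task
          case i =>
            val offer = remaining(i)
            val shrunk = offer.copy(
              cpus = offer.cpus - task.cpus,
              memGb = offer.memGb - task.memGb
            )
            (remaining.updated(i, shrunk), acc + (task.name -> offer.hostname))
        }
    }
    placed
  }
}
```

Several tasks can share one host's offer, which is the "aggregation and not virtualization" idea: machines are pooled and carved up per task rather than split into fixed virtual machines.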

Profiles Search / S&R Trends / S&R Home timeline / TLS PTw / Ads Contact import / Growth Compose DMs / Social Discover / S&R WtF / S&R
