Slide 1

Slide 1 text

Open Source and The Twitter Stack; Chris Aniszczyk (@cra), http://aniszczyk.org; 10th International Conference on Open Source Systems (#OSS2014)

Slide 2

Slide 2 text

Reminder: Please Tweet! @cra #OSS2014

Slide 3

Slide 3 text

Agenda: Twitter Scale, Open Source Philosophy, History and Evolution, Twitter Stack Sampling, Concluding Thoughts

Slide 4

Slide 4 text

What is Twitter? Twitter is a public, real-time information network that connects you to what you find interesting. The heart of Twitter: tweets.

Slide 5

Slide 5 text

sizeof(1 tweet) = 140 characters ≈ 200 bytes. Doesn’t sound like much?

Slide 6

Slide 6 text

(Text-art tweet by @tw1tt3rart (TW1TT3Rart) spelling out “THANK YOU STEVE”, tagged #ThankYouSteve #TwitterArt, 6 Oct.) “Creativity comes from constraint.” “Brevity is the soul of wit.”

Slide 7

Slide 7 text

What is the scale of Twitter?

Slide 8

Slide 8 text

500,000,000 Tweets / Day; 3,500,000,000 Tweets / Week

Slide 9

Slide 9 text

3.5B Tweets / Week ≈ 6000+ Tweets / Second (steady state). However, there are peaks!
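A quick back-of-the-envelope check of these numbers, assuming tweets arrive uniformly over the 86,400 seconds in a day and using the ≈200 bytes per tweet estimate from the earlier slide; the peak figures are the ones quoted on the following slides.

```python
# Back-of-the-envelope Twitter throughput, using the figures from the slides.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tweets_per_day = 500_000_000
tweets_per_week = tweets_per_day * 7        # 3.5 billion
bytes_per_tweet = 200                       # slide estimate for one 140-character tweet

avg_tps = tweets_per_day / SECONDS_PER_DAY  # uniform average, tweets per second
daily_bytes = tweets_per_day * bytes_per_tweet

peaks = {"Miyazaki 2011": 25_088, "NYE 2013": 33_338, "Miyazaki 2013": 143_199}

print(f"weekly tweets: {tweets_per_week:,}")
print(f"average TPS:   {avg_tps:,.0f}")
print(f"daily volume:  {daily_bytes / 1e9:.0f} GB")
for name, tps in peaks.items():
    print(f"{name}: {tps:,} TPS = {tps / avg_tps:.1f}x the average")
```

The uniform average lands just under 5,800 TPS, in the same range as the slide's rounded steady-state figure, while the 2013 Miyazaki peak is roughly 25 times that average; capacity has to be planned for the peaks, not the mean.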

Slide 10

Slide 10 text

Miyazaki 2011: 25,088 TPS (NYE 2013: 33,338 TPS). バルス! (“Death to Twitter”)

Slide 11

Slide 11 text

Miyazaki 2013: 143,199 TPS, up from 25,088 TPS (see https://blog.twitter.com/2013/new-tweets-per-second-record-and-how). バルス! (“Death to Twitter”)

Slide 12

Slide 12 text

Open Source Craft and Culture How we roll...

Slide 13

Slide 13 text

Open Source Craft (operating principles): Use Open, Assume Open, Define Secret Sauce, Measure Everything, Default to GitHub, Default to Permissive, Acquire and Open, Pay it Forward

Slide 14

Slide 14 text

Use Open: Use and benchmark open source software by default. When starting a new initiative, always evaluate open source options before reinventing the wheel. (E.g., if Redis doesn’t work for you, you’d better have solid evidence.)

Slide 15

Slide 15 text

Twitter Runs on Open Source

Slide 16

Slide 16 text

Define Secret Sauce: Don’t open source anything that represents core business value. Define your secret sauce so there’s a shared understanding that can guide decisions, and embed it within your culture and company via training.

Slide 17

Slide 17 text

Secret Sauce, what is it? What’s yours?

Slide 18

Slide 18 text

If you know your secret sauce...

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

Assume Open: Assume that what you are developing will be open sourced in the future. Pretend the whole world will be watching. Use third-party dependencies with reasonable, compatible licenses to prevent pain down the road (we mostly use Apache’s Third Party Guidelines as a starting point).
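One way to make “reasonable third-party dependencies” concrete is an automated license audit of a project's dependency list. The sketch below buckets SPDX license identifiers into the three categories used by Apache's third-party licensing policy (A: permissive, fine to include; B: weak copyleft, generally allowed in binary form only; X: not allowed). The category table is a small, simplified subset chosen for illustration, and the dependency names are hypothetical; the Apache policy itself is the authoritative list.

```python
# Hypothetical dependency license audit inspired by Apache's third-party categories.
# The tables below are a small illustrative subset, not the full Apache policy.
CATEGORY_A = {"Apache-2.0", "MIT", "BSD-2-Clause", "BSD-3-Clause"}  # permissive
CATEGORY_B = {"EPL-1.0", "MPL-2.0", "CDDL-1.0"}                     # binary-only use
CATEGORY_X = {"GPL-2.0", "GPL-3.0", "LGPL-2.1", "AGPL-3.0"}         # not allowed


def classify(license_id: str) -> str:
    """Return 'A', 'B', 'X', or 'unknown' for an SPDX license identifier."""
    if license_id in CATEGORY_A:
        return "A"
    if license_id in CATEGORY_B:
        return "B"
    if license_id in CATEGORY_X:
        return "X"
    return "unknown"


def audit(dependencies: dict[str, str]) -> dict[str, list[str]]:
    """Group dependency names (sorted) by the category of their declared license."""
    report: dict[str, list[str]] = {"A": [], "B": [], "X": [], "unknown": []}
    for name, license_id in sorted(dependencies.items()):
        report[classify(license_id)].append(name)
    return report


# Hypothetical dependency manifest: name -> SPDX license id.
deps = {
    "json-parser": "MIT",
    "http-client": "Apache-2.0",
    "xml-tools": "EPL-1.0",
    "fancy-widgets": "GPL-3.0",
    "mystery-lib": "Proprietary",
}
report = audit(deps)
print(report)
# Anything in "X" or "unknown" needs a human decision before the project is opened.
```

Running such a check from the start is what “assume open” buys you: when the day comes to publish, there is no GPL or unlicensed dependency to rip out.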

Slide 22

Slide 22 text

Default to GitHub: The GitHub community is the largest open source community, with over three million users. You would be stupid to ignore that fact. Embrace social coding tools to lower the barrier to contribution and participation.

Slide 23

Slide 23 text

Foundations are Good*: we just prefer not to default to them. We view them as a place for stable projects that have grown into maturity, not as a place to incubate new projects. Our goal is to gain traction as fast as possible; if a project doesn’t, we fail fast and carry on.

Slide 24

Slide 24 text

Default to Permissive

Slide 25

Slide 25 text

Be Permissive: For outbound open source software, we default to OSI-approved permissive licenses (the ALv2 in the majority of cases). We do this to maximize adoption and participation, which we favor over control.

Slide 26

Slide 26 text

Embrace the Trend (see http://blogs.the451group.com/opensource/2011/12/19/the-future-of-commercial-open-source-business-strategies/)

Slide 27

Slide 27 text

Notes from Antirez (BSD), see http://antirez.com/news/48: “First of all, open source for me is not a way to contribute to the free software movement, but to contribute to humanity. This means a lot of things, for instance I don't care about what people do with my code, nor if they'll release back their modifications. I simply want people to use my code in one way or the other. Especially I want people to have fun, learn new stuff, and make money with my code. For me other people making money out of something I wrote is not something that I lost, it is something that I gained.”

Slide 28

Slide 28 text

Acquire and Open*: Include open sourcing software in M&A discussions, especially if you’re mainly acquiring talent or shelving the product. There’s no need for software to go to waste.

Slide 29

Slide 29 text

Acquire and Open: RedPhone See https://github.com/WhisperSystems/RedPhone

Slide 30

Slide 30 text

Acquire and Open: Clutch.IO See http://engineering.twitter.com/2012/10/open-sourcing-clutchio.html See http://www.prweb.com/releases/2012/10/prweb10067693.htm

Slide 31

Slide 31 text

Measure Everything: If you can’t measure what you’re doing, you have no idea what you’re doing. We measure everything inside Twitter (in a system affectionately called birdbrain) and make it accessible to everyone.
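A minimal sketch of the in-process instrumentation that “measure everything” implies: named counters plus latency samples summarized as percentiles, which a collector could then scrape and chart. The `Stats` class and the metric names are hypothetical; this is not birdbrain's interface, which the slides do not describe.

```python
import math
import time
from collections import defaultdict
from contextlib import contextmanager


class Stats:
    """Hypothetical in-process metrics registry: counters and timing samples."""

    def __init__(self) -> None:
        self.counters: dict[str, int] = defaultdict(int)
        self.timings: dict[str, list[float]] = defaultdict(list)

    def incr(self, name: str, delta: int = 1) -> None:
        self.counters[name] += delta

    def add_timing(self, name: str, millis: float) -> None:
        self.timings[name].append(millis)

    @contextmanager
    def timed(self, name: str):
        """Record the wall-clock duration of the enclosed block in milliseconds."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.add_timing(name, (time.perf_counter() - start) * 1000)

    def percentile(self, name: str, p: float) -> float:
        """Nearest-rank percentile (0 < p <= 100) of the recorded samples."""
        samples = sorted(self.timings.get(name, []))
        if not samples:
            raise ValueError(f"no samples for {name!r}")
        rank = math.ceil(p / 100 * len(samples))
        return samples[max(rank, 1) - 1]

    def snapshot(self) -> dict[str, float]:
        """Flatten everything into name -> value, ready to export to a collector."""
        out: dict[str, float] = dict(self.counters)
        for name in self.timings:
            out[f"{name}.p50"] = self.percentile(name, 50)
            out[f"{name}.p99"] = self.percentile(name, 99)
        return out


stats = Stats()
for latency in [12.0, 15.0, 11.0, 250.0, 14.0]:
    stats.incr("requests")
    stats.add_timing("request_latency_ms", latency)

with stats.timed("startup_ms"):
    sum(range(1000))

print(stats.snapshot())
```

Reporting percentiles rather than only averages is a deliberate choice here: the single 250 ms outlier barely moves the median but dominates p99, which is what users stuck on a slow request actually feel.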

Slide 32

Slide 32 text

Pay it Forward: Support open source organizations and projects that are important to your business; it’s the right and smart thing to do. This can mean financial support or simply staffing projects that are strategic to you.

Slide 33

Slide 33 text

Open Source Craft*: Use Open, Assume Open, Define Secret Sauce, Measure Everything, Default to GitHub, Default to Permissive, Acquire and Open, Pay it Forward. Note: This fits in a tweet.

Slide 34

Slide 34 text

Twistory: Evolving the Twitter Stack

Slide 35

Slide 35 text

2006: A simple idea...

Slide 36

Slide 36 text

(Architecture diagram) Routing, Presentation, Logic: Monorail (Ruby on Rails). Storage: MySQL.

Slide 37

Slide 37 text

2008: Growing Pains

Slide 38

Slide 38 text

(Architecture diagram) Routing, Presentation, Logic: Monorail (Ruby on Rails). Storage and cache: MySQL, Tweet Store, Flock, Redis, Memcache.

Slide 39

Slide 39 text

2009+: Crazy Growth (chart of growth over 2006, 2009, 2010 and 2013, with marks at 250M and 500M)

Slide 40

Slide 40 text

2010: World Cup Woes (see https://blog.twitter.com/2010/2010-world-cup-global-conversation and http://bits.blogs.nytimes.com/2010/06/15/twitter-suffers-from-a-number-of-technical-glitches)

Slide 41

Slide 41 text

What was wrong? Fragile monolithic Rails code base: one code base did everything from managing raw database and memcache connections to rendering the site and presenting the public APIs. Throwing machines at the problem instead of engineering solutions. Trapped in an optimization corner: trading off readability and flexibility for performance.

Slide 42

Slide 42 text

Whale Hunting Expeditions: We organized archeology digs and whale hunting expeditions to understand large-scale failures.

Slide 43

Slide 43 text

Re-envision the system? We wanted big infrastructure wins in performance, reliability and efficiency (reduce the machines needed to run Twitter by 10x). Failure is inevitable in distributed systems, so we wanted to isolate failures across our infrastructure. Cleaner boundaries with related logic in one place: we wanted a loosely coupled, service-oriented model at the systems level.

Slide 44

Slide 44 text

Ruby VM Reflection: We started to evaluate our front-end server tier (CPU, RAM and network). Rails machines were being pushed to the limit: CPU and RAM were maxed out, but network was not (200-300 requests/host). Twitter’s usage was growing, and it was going to take a lot of machines to keep up with the growth curve.

Slide 45

Slide 45 text

JVM Experimentation: We started to experiment with the JVM... Search (Java via Lucene): http://engineering.twitter.com/2010/10/twitters-new-search-architecture.html FlockDB, the social graph (Scala): https://blog.twitter.com/2010/introducing-flockdb and https://github.com/twitter/flockdb ...and we liked it, enamored with JVM performance! We weren’t the only ones, either: http://www.slideshare.net/pcalcado/from-a-monolithic-ruby-on-rails-app-to-the-jvm

Slide 46

Slide 46 text

The JVM Solution: We already trusted the JVM from previous experience. The JVM is a mature, world-class platform with a huge, mature ecosystem of libraries, and it offers polyglot possibilities (Java, Scala, Clojure, etc.).

Slide 47

Slide 47 text

Decomposing the Monolith: We created services based on our core nouns: Tweet service, User service, Timeline service, DM service, Social Graph service, ...

Slide 48

Slide 48 text

(Architecture diagram) Routing: TFE (reverse proxy). Presentation: Monorail, API, Web, Search, Feature X, Feature Y. Logic: Tweet Service, User Service, Timeline Service, SocialGraph Service, DM Service. Storage and cache: MySQL, Tweet Store, User Store, Flock, Redis, Memcached. Protocols between tiers: HTTP, THRIFT, THRIFT*.
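To make the routing tier concrete, here is a toy model of a reverse proxy in front of the presentation services: it matches the request path against a table of prefixes and forwards to the backend that owns the longest matching prefix. The route table, paths, and backend names are hypothetical; the slides do not describe TFE's actual routing rules. The point is only that routing is decoupled from the services behind it, so a path can move from Monorail to a new service by editing one table entry.

```python
# Toy reverse-proxy routing table in the spirit of the TFE layer.
# Paths and backend names are hypothetical.
ROUTES = {
    "/": "web",
    "/1.1/": "api",
    "/search": "search",
    "/i/feature-x": "feature-x",
}


def route(path: str) -> str:
    """Return the backend whose route prefix is the longest match for path."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    if not matches:
        raise LookupError(f"no route for {path!r}")
    return ROUTES[max(matches, key=len)]


for p in ["/", "/1.1/statuses/home_timeline.json", "/search?q=mesos", "/cra"]:
    print(p, "->", route(p))
```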

Slide 49

Slide 49 text

Twitter Stack: a peek at some of our technology (Finagle, Zipkin, Scalding and Mesos)

Slide 50

Slide 50 text

Services: Concurrency is Hard. Decomposing the monolith: each team took a slightly different approach to concurrency. Different failure semantics across teams: there was no consistent back pressure mechanism. These failure domains taught us the importance of a unified client/server library to handle failure strategies and load balancing.
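A minimal sketch of what a consistent back pressure mechanism can look like when it lives in one shared client library instead of in each team's code: the wrapper caps the number of in-flight requests and rejects the excess immediately rather than queueing it. Python's asyncio stands in for the JVM here; `BoundedClient`, `RejectedError`, and the limit of 2 are hypothetical, and this is not Finagle's API.

```python
import asyncio


class RejectedError(Exception):
    """Raised when a client refuses work instead of queueing it without bound."""


class BoundedClient:
    """Hypothetical shared client wrapper: at most max_pending requests in flight.

    Requests beyond the limit are rejected immediately, so overload shows up as
    fast, explicit failures (back pressure) rather than as ever-growing queues.
    """

    def __init__(self, call, max_pending: int) -> None:
        self._call = call
        self._max_pending = max_pending
        self._pending = 0

    async def __call__(self, request):
        if self._pending >= self._max_pending:
            raise RejectedError(f"over {self._max_pending} pending requests")
        self._pending += 1
        try:
            return await self._call(request)
        finally:
            self._pending -= 1


async def slow_backend(request: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a remote call
    return f"ok:{request}"


async def main() -> list[str]:
    client = BoundedClient(slow_backend, max_pending=2)
    results = await asyncio.gather(
        *(client(f"req{i}") for i in range(4)), return_exceptions=True
    )
    # Show successes as values and failures by exception type.
    return [r if isinstance(r, str) else type(r).__name__ for r in results]


outcomes = asyncio.run(main())
print(outcomes)
```

Because every service uses the same wrapper, callers see one failure semantic everywhere (a fast rejection they can retry elsewhere), which is the property the slide says was missing when each team rolled its own.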

Slide 51

Slide 51 text

Hello Finagle! http://twitter.github.io/finagle Used by Twitter, Nest, Soundcloud, Foursquare and more!

Slide 52

Slide 52 text

Finagle Programming Model: Finagle takes care of service discovery, load balancing, retrying, connection pooling, stats collection and distributed tracing. Future[T]: modular, composable, async, non-blocking I/O (see http://twitter.github.io/effectivescala/#Concurrency)
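A rough asyncio analogue of this programming model: a “service” is an async function from request to response, and cross-cutting concerns such as timeouts and retries are filters that wrap a service and return a new one; independent calls are fanned out concurrently and their futures joined. This sketches the idea only and is not Finagle's API (Finagle is Scala with its own `Future`); the filter names, the flaky backend, and the timings are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

# Loosely mirrors Finagle's idea that a service is a function Req => Future[Rep];
# here a "service" is an async function from str to str.
Service = Callable[[str], Awaitable[str]]


def with_timeout(seconds: float) -> Callable[[Service], Service]:
    """Filter: fail a call if the wrapped service takes longer than `seconds`."""
    def wrap(service: Service) -> Service:
        async def call(request: str) -> str:
            return await asyncio.wait_for(service(request), timeout=seconds)
        return call
    return wrap


def with_retries(attempts: int) -> Callable[[Service], Service]:
    """Filter: retry the wrapped service on timeout, up to `attempts` tries in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def wrap(service: Service) -> Service:
        async def call(request: str) -> str:
            for attempt in range(1, attempts + 1):
                try:
                    return await service(request)
                except asyncio.TimeoutError:
                    if attempt == attempts:
                        raise  # out of retries: surface the timeout
            raise AssertionError("unreachable")
        return call
    return wrap


calls = {"n": 0}


async def flaky_tweet_service(tweet_id: str) -> str:
    """Hypothetical backend: the very first call hangs, later calls answer fast."""
    calls["n"] += 1
    await asyncio.sleep(1.0 if calls["n"] == 1 else 0.001)
    return f"tweet {tweet_id}"


# Filters compose like functions: the timeout applies per attempt, retries wrap it.
tweet_client = with_retries(3)(with_timeout(0.05)(flaky_tweet_service))


async def render_timeline(tweet_ids: list[str]) -> list[str]:
    # Fan out concurrently and join the results, similar in spirit to Future.collect.
    return await asyncio.gather(*(tweet_client(t) for t in tweet_ids))


timeline = asyncio.run(render_timeline(["1", "2", "3"]))
print(timeline)
```

The slow first call times out after 50 ms and is retried once, so the caller gets all three tweets without knowing a retry happened; that separation of business logic from failure handling is what the filter style buys.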

Slide 53

Slide 53 text

Tracing with Zipkin: Zipkin hooks into the transmission logic of Finagle and times each service operation, giving you a visual representation of where most of the time to fulfill a request went. https://github.com/twitter/zipkin
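Zipkin's data model is built around spans: each timed operation records a trace id shared by the whole request, its own span id, its parent's span id, a name, a start timestamp and a duration (Zipkin uses microseconds). The sketch below records such spans for nested in-process operations and prints a crude text waterfall; in a real deployment the ids are propagated across the wire with each RPC. `Tracer` and the span names are hypothetical, and this is not Zipkin's client API.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    timestamp_us: int  # start time, microseconds since the epoch
    duration_us: int = 0


class Tracer:
    """Hypothetical minimal tracer: records one span per timed operation."""

    def __init__(self) -> None:
        self.spans: list[Span] = []
        self._stack: list[Span] = []  # currently open spans, innermost last

    @contextmanager
    def span(self, name: str):
        parent = self._stack[-1] if self._stack else None
        s = Span(
            trace_id=parent.trace_id if parent else uuid.uuid4().hex[:16],
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
            name=name,
            timestamp_us=time.time_ns() // 1000,
        )
        start = time.perf_counter_ns()
        self._stack.append(s)
        try:
            yield s
        finally:
            self._stack.pop()
            s.duration_us = (time.perf_counter_ns() - start) // 1000
            self.spans.append(s)


tracer = Tracer()
with tracer.span("GET /home"):
    with tracer.span("timeline-service.get"):
        time.sleep(0.02)
    with tracer.span("tweet-service.hydrate"):
        time.sleep(0.005)

# Print the spans as a simple text waterfall, slowest first.
for s in sorted(tracer.spans, key=lambda s: -s.duration_us):
    indent = "  " if s.parent_id else ""
    print(f"{indent}{s.name}: {s.duration_us / 1000:.1f} ms")
```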

Slide 54

Slide 54 text

Hadoop with Scalding: Services receive a ton of traffic and generate a ton of usage log and debugging entries. Scalding (@Scalding) is an open source Scala library that makes it easy to specify MapReduce jobs with the benefits of functional programming! https://github.com/twitter/scalding

Slide 55

Slide 55 text

Counting Words with Java*

Slide 56

Slide 56 text

Counting Words with Scalding https://github.com/twitter/scalding/wiki/Rosetta-Code
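The word-count code from these two slides did not survive in the transcript. As a stand-in, here is the same job spelled out as explicit map, shuffle, and reduce phases in plain Python. In Scalding's typed API the pipeline is roughly a `flatMap` that splits lines into words, a `groupBy` on the word, and a `size` per group, while the classic Java MapReduce version needs separate Mapper and Reducer classes plus job setup. The tokenizer rule (lower-case letters, digits and apostrophes) is an assumption.

```python
import re
from collections import defaultdict
from itertools import chain


def tokenize(line: str) -> list[str]:
    """Lower-case a line and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", line.lower())


def map_phase(lines):
    # Emit (word, 1) for every word, like a MapReduce mapper.
    return ((word, 1) for word in chain.from_iterable(map(tokenize, lines)))


def shuffle_phase(pairs):
    # Group values by key, which Hadoop does between the map and reduce phases.
    groups = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    return groups


def reduce_phase(groups):
    # Sum the counts for each word, like a MapReduce reducer.
    return {word: sum(ones) for word, ones in groups.items()}


lines = ["Hello Finagle", "hello Zipkin, hello Scalding!"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(sorted(counts.items()))
```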

Slide 57

Slide 57 text

Data Center Evils: The evils of single tenancy and static partitioning. Different jobs... different utilization profiles... Can we do better? (Figure: “Static Partitioning” of a datacenter into three clusters, each with a utilization scale marked 0% and 33%.)

Slide 58

Slide 58 text

Borg and The Birth of Mesos: Google was generations ahead with Borg/Omega. “The Datacenter as a Computer” (2009, http://research.google.com/pubs/pub35290.html): engineers focus on the resources needed; mixed workloads are possible. Learn from Google and work with university research! http://wired.com/wiredenterprise/2013/03/google-borg-twitter-mesos

Slide 59

Slide 59 text

Mesos, Linux and cgroups: Apache Mesos is the kernel of the data center. It obviates the need for virtual machines*, provides isolation via Linux cgroups (CPU, RAM, network, FS), reshapes clusters dynamically based on resources, and supports multiple frameworks with scalability to 10,000s of nodes.
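A greatly simplified, single-process model of Mesos's two-level scheduling: the master offers each agent's free resources to frameworks, and each framework's own scheduler decides which of its tasks to launch in that offer. Real Mesos decides which framework receives an offer with a pluggable allocator (Dominant Resource Fairness by default) and enforces the resulting allocations with cgroups; the classes, names, and simple in-order offering below are hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Framework:
    """A hypothetical framework scheduler with a queue of (cpus, mem_mb) tasks."""
    name: str
    pending: list[tuple[float, int]]
    launched: list[str] = field(default_factory=list)  # agent name per task

    def on_offer(self, agent: Agent) -> list[tuple[float, int]]:
        """Accept as many pending tasks as fit in the offer; decline the rest."""
        accepted = []
        cpus, mem = agent.cpus, agent.mem_mb
        for task in list(self.pending):
            need_cpus, need_mem = task
            if need_cpus <= cpus and need_mem <= mem:
                cpus -= need_cpus
                mem -= need_mem
                accepted.append(task)
                self.pending.remove(task)
                self.launched.append(agent.name)
        return accepted


def offer_round(agents: list[Agent], frameworks: list[Framework]) -> None:
    """Master side: offer each agent's free resources to frameworks in turn."""
    for agent in agents:
        for fw in frameworks:
            for need_cpus, need_mem in fw.on_offer(agent):
                # Launched tasks consume the agent's resources; on a real
                # cluster each task would then run inside a cgroup of that size.
                agent.cpus -= need_cpus
                agent.mem_mb -= need_mem


agents = [Agent("a1", 4.0, 8192), Agent("a2", 4.0, 8192)]
hadoop = Framework("hadoop", [(2.0, 4096)] * 3)
web = Framework("web", [(1.0, 1024)] * 2)
offer_round(agents, [hadoop, web])
print(hadoop.launched, web.launched, agents)
```

Splitting the decision this way keeps the master simple (it only tracks and offers resources) while each framework keeps its own placement logic, which is how Hadoop, web services and other workloads can share one pool of machines.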

Slide 60

Slide 60 text

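Data Center Computing: Reduce CapEx/OpEx via efficient utilization of HW (http://mesos.apache.org). (Figure: three statically partitioned clusters on 0% to 33% utilization scales compared with one shared cluster on a 0% to 100% scale, annotated “reduces latency!” and “reduces CapEx and OpEx!”)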
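The utilization argument behind these slides is a simple inequality: statically partitioned clusters must each be sized for their own peak, so total capacity is the sum of the peaks, while a shared cluster only needs the peak of the combined demand, max over t of the sum of d_i(t), which is never larger than the sum over i of max d_i(t) and is smaller whenever workloads peak at different times. A worked example with invented demand numbers for three hypothetical workloads:

```python
# Hypothetical demand (in machines) for three workloads over six time slots.
demand = {
    "web":    [30, 60, 90, 60, 30, 20],   # busy during the day
    "hadoop": [80, 20, 10, 20, 60, 90],   # batch jobs run at night
    "search": [40, 50, 40, 50, 40, 50],
}

# Static partitioning: each workload gets its own cluster sized for its own peak.
static_capacity = sum(max(series) for series in demand.values())

# Shared cluster: size for the peak of the combined demand in any time slot.
total_per_slot = [sum(slot) for slot in zip(*demand.values())]
shared_capacity = max(total_per_slot)

average_demand = sum(total_per_slot) / len(total_per_slot)
print(f"static: {static_capacity} machines, {average_demand / static_capacity:.0%} utilized")
print(f"shared: {shared_capacity} machines, {average_demand / shared_capacity:.0%} utilized")
```

With these made-up numbers, pooling serves the same work with 160 machines instead of 230, because the web and Hadoop peaks fall in different time slots.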

Slide 61

Slide 61 text

How did it all turn out? Not bad... not bad at all... Where did the fail whale go?

Slide 62

Slide 62 text

Site Success Rate Today :) (Chart: site success rate from 2006 through 2010 to 2014, on a scale from 99._% to 100%, annotated “not a lot of traffic”, “World Cup” and “Off the monorail”.)

Slide 63

Slide 63 text

Performance Today :)

Slide 64

Slide 64 text

Growth Continues Today... 2500+ employees worldwide; 50% of employees are engineers; 255M+ active users; 500M+ Tweets per day; 35+ languages supported; 76% of active users are on mobile; 100+ open source projects

Slide 65

Slide 65 text

Concluding Thoughts: Lessons Learned

Slide 66

Slide 66 text

Lesson #1: Embrace open source. Best-of-breed solutions are open these days; learn from your peers’ code and from university research; don’t only consume, give back to enrich the ecosystem: http://opensource.twitter.com

Slide 67

Slide 67 text

Lesson #2: Incremental change always wins. Increase the chance of success by making small changes; small changes add up with minimized risk; loosely coupled microservices work.

Slide 68

Slide 68 text

Lesson #3: “Data center as a computer” is the future direction of infrastructure. Efficient use of hardware saves money, and it is a better programming model (a large cluster as a single resource). Check out Apache Mesos: http://mesos.apache.org

Slide 69

Slide 69 text

Thanks for listening! (Hope you learned something new.) Remember, feel free to tweet me: #OSS2014 @cra / @TwitterOSS [email protected]

Slide 70

Slide 70 text

Resources https://github.com/twitter/finagle https://github.com/twitter/zipkin https://github.com/twitter/scalding http://mesos.apache.org http://wired.com/wiredenterprise/2013/03/google-borg-twitter-mesos http://mesosphere.io/2013/09/26/docker-on-mesos/ http://typesafe.com/blog/play-framework-grid-deployment-with-mesos http://strata.oreilly.com/2013/09/how-twitter-monitors-millions-of-time-series.html http://research.google.com/pubs/pub35290.html http://nerds.airbnb.com/hadoop-on-mesos/ http://www.youtube.com/watch?v=0ZFMlO98Jk