Finagle, linkerd, and Apache Mesos: Twitter-style microservices at scale

Finagle (Twitter's Apache-licensed RPC stack) and Apache Mesos are two core technologies used by Twitter to scale its multi-service architecture to high-volume traffic loads. In this talk, we describe how Twitter used Finagle and Mesos together to address the challenges of scaling its application. We introduce linkerd, an Apache-licensed proxy form of Finagle, which extends Finagle's operational model to non-JVM or polyglot multi-service applications. Finally, we show how linkerd can be used to "wrap" applications running in Apache Mesos to provide higher-level, service-based semantics around scalability, reliability, and fault tolerance for multi-service or microservice applications, even in the presence of high traffic loads and unreliable hardware.

Oliver Gould

May 12, 2016

Transcript

  1. oliver gould • cto @ buoyant: open-source microservice infrastructure • previously, tech lead @ twitter: observability, traffic • core contributor: finagle • creator: linkerd • loves: dogs • @olix0r
  2. overview • 2010: A Failwhale Odyssey • Automating the Datacenter • Microservices: A Silver Bullet • Finagle: The Once and Future Layer 5 • Introducing linkerd • Demo • Q&A
  3. Twitter, 2010 • 10^7 users • 10^7 tweets/day • 10^2 engineers • 10^1 services • 10^1 deploys/week • 10^2 hosts • 0 datacenters • 10^1 user-facing outages/week • https://blog.twitter.com/2010/measuring-tweets
  4. The Monorail, 2010 • 10^3 of RPS • 10^2 of RPS/host • 10^1 of RPS/process • hardware lb • the monorail • mysql • memcache • kestrel
  5. [diagram] Aurora (or Marathon, or …) • Mesos • hosts • services shown: timelines, users, notifications • instance counts shown: x800, x300, x1000
  6. “Resilience is an imperative: our software runs on the truly dismal computers we call datacenters. Besides being heinously complex… they are unreliable and prone to operator error.” Marius Eriksen (@marius), RPC Redux
  7. resilience in microservices • software you didn’t write • hardware you can’t touch • network you can’t configure • break in new and surprising ways • and your customers shouldn’t notice
  8. business • [7] application: languages, libraries
    rpc • [6] presentation: json, protobuf, thrift, … • [5] session: http/2, mux, …
    datacenter • [4] transport, [3] network, [2] link, [1] physical: aurora, marathon, … • mesos • canal, weave, … • aws, azure, digitalocean, gce, …
  9. programming finagle

    val users = Thrift.newIface[UserSvc]("/s/users")
    val timelines = Thrift.newIface[TimelineSvc]("/s/timeline")

    Http.serve(":8080", Service.mk[Request, Response] { req =>
      for {
        user <- users.get(userReq(req))
        timeline <- timelines.get(user)
      } yield renderHTML(user, timeline)
    })
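The slide composes two remote calls with a for-comprehension over futures. The sketch below shows the same shape using only the Scala standard library, so it runs without Finagle. Note the swap: Finagle uses `com.twitter.util.Future`, not `scala.concurrent.Future`. `User`, `Timeline`, `getUser`, `getTimeline`, and this `renderHTML` are hypothetical stand-ins for the slide's Thrift services and helpers.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-ins for the slide's remote types; not Finagle types.
final case class User(id: Long, name: String)
final case class Timeline(tweets: List[String])

object ComposeSketch {
  // Each "service" is just an asynchronous function.
  def getUser(id: Long): Future[User] =
    Future(User(id, s"user-$id"))

  def getTimeline(user: User): Future[Timeline] =
    Future(Timeline(List(s"hello from ${user.name}")))

  // Hypothetical renderer standing in for the slide's renderHTML.
  def renderHTML(user: User, timeline: Timeline): String =
    s"<h1>${user.name}</h1><p>${timeline.tweets.mkString(", ")}</p>"

  // The timeline call starts only after the user lookup succeeds;
  // a failure in either step fails the whole page.
  def page(id: Long): Future[String] =
    for {
      user <- getUser(id)
      timeline <- getTimeline(user)
    } yield renderHTML(user, timeline)
}
```

Calling `Await.result(ComposeSketch.page(7), 5.seconds)` blocks for the rendered page. Blocking like this is for demonstration only; a server would return the future itself.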
  10. operating finagle • transport security • service discovery • circuit breaking • backpressure • deadlines • retries • tracing • metrics • keep-alive • multiplexing • load balancing • per-request routing • service-level objectives
    [diagram: module stack] Observe • Session timeout • Retries • Request draining • Load balancer • Monitor • Observe • Trace • Failure accrual • Request timeout • Pool • Fail fast • Expiration • Dispatcher
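Slide 10 lists circuit breaking and a "Failure accrual" module. As a rough illustration of that idea, here is a minimal sketch that marks an endpoint unavailable after a run of consecutive failures and revives it after a fixed interval. It is not Finagle's implementation. The class name, parameters, and explicit clock argument are assumptions made for the example, and the class is not thread-safe.

```scala
// Hypothetical sketch: marks an endpoint dead after `maxFailures`
// consecutive failures and revives it once `markDeadForMs` has elapsed.
final class FailureAccrual(maxFailures: Int, markDeadForMs: Long) {
  require(maxFailures > 0, "maxFailures must be positive")

  private var consecutiveFailures = 0
  private var deadUntilMs = 0L

  // Any success resets the failure streak.
  def recordSuccess(): Unit = consecutiveFailures = 0

  // The caller passes the current time so the logic stays deterministic.
  def recordFailure(nowMs: Long): Unit = {
    consecutiveFailures += 1
    if (consecutiveFailures >= maxFailures) {
      deadUntilMs = nowMs + markDeadForMs
      consecutiveFailures = 0
    }
  }

  // A load balancer would skip endpoints for which this is false.
  def isAvailable(nowMs: Long): Boolean = nowMs >= deadUntilMs
}
```

Passing the clock in explicitly keeps the sketch testable. A real client would read a monotonic clock and guard the counters against concurrent updates.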
  11. “It’s slow” is the hardest problem you’ll ever debug. Jeff Hodges (@jmhodges), Notes on Distributed Systems for Young Bloods
  12. load balancing at layer 5 • lb algorithms: round-robin • fewest connections • queue depth • exponentially-weighted moving average (ewma) • aperture
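Of the algorithms on slide 12, the EWMA idea is simple to sketch: keep a smoothed latency per endpoint and prefer the lowest. This is a deliberately simplified illustration, not Finagle's balancer, which is more involved. The names `Ewma` and `EwmaBalancer` and the smoothing factor `alpha` are assumptions made for the example.

```scala
// Hypothetical sketch: an exponentially-weighted moving average of latency.
final class Ewma(alpha: Double) {
  require(alpha > 0.0 && alpha <= 1.0, "alpha must be in (0, 1]")

  private var value: Option[Double] = None

  // The first sample seeds the average; later samples are blended in.
  def observe(latencyMs: Double): Unit =
    value = Some(value.fold(latencyMs)(v => alpha * latencyMs + (1 - alpha) * v))

  // Endpoints with no samples report 0.0, so they are tried first.
  def get: Double = value.getOrElse(0.0)
}

object EwmaBalancer {
  // Picks the endpoint name with the lowest average latency.
  def pick(endpoints: Map[String, Ewma]): String = {
    require(endpoints.nonEmpty, "need at least one endpoint")
    endpoints.reduceLeft((best, next) => if (next._2.get < best._2.get) next else best)._1
  }
}
```

A higher `alpha` reacts faster to latency changes and forgets history sooner. Because a latency-based score reflects how slow an endpoint actually is, it can steer traffic away from a degraded host in a way that round-robin cannot.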
  13. timeouts & retries • [diagram, shown twice: web, timelines, users, db] • per-hop settings shown: timeout=400ms retries=3 • timeout=400ms retries=2 • timeout=200ms retries=3
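One way to read the per-hop settings on slide 13 is as a warning about retry amplification. The arithmetic below is a sketch under two assumptions that the transcript does not confirm: the three settings apply to consecutive hops in a single call chain, and `retries=N` means N extra attempts after the first. The hop names in the usage note are placeholders.

```scala
object RetryAmplification {
  // One hop's retry setting: up to `retries` extra attempts per call.
  final case class Hop(name: String, retries: Int)

  // Worst case: every attempt at every hop fails and is retried, so the
  // number of attempts multiplies down the chain.
  def worstCaseAttempts(chain: List[Hop]): Long =
    chain.foldLeft(1L)((acc, hop) => acc * (hop.retries + 1))
}
```

With the slide's values, `worstCaseAttempts(List(Hop("a", 3), Hop("b", 2), Hop("c", 3)))` is (3+1) × (2+1) × (3+1) = 48 attempts at the last hop for a single incoming request.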
  14. layer 5 routing • applications refer to logical names • requests are bound to concrete names • delegations express routing • /s/users • /io.l5d.zk/prod/users • /s => /io.l5d.zk/prod/http
  15. github.com/buoyantio/linkerd • microservice rpc proxy • layer-5 router • aka l5d • built on finagle & netty • pluggable • http, thrift, … • etcd, consul, kubernetes, marathon, zookeeper, … …
  16. magic resiliency sprinkles • transport security • service discovery • circuit breaking • backpressure • deadlines • retries • tracing • metrics • keep-alive • multiplexing • load balancing • per-request routing • service-level objectives
    [diagram] Service A instance, linkerd • Service B instance, linkerd • Service C instance, linkerd
  17. namerd • released in March • centralized routing policy • delegates logical names to service discovery • pluggable • etcd • kubernetes • zookeeper • …
  18. [diagram] dc/os • master • marathon • zookeeper • node, node, public node, node, … • linkerd (x4) • ELB (x2) • namerd
  19. [diagram] dc/os • master • marathon • zookeeper • node, node, public node, node, … • linkerd (x4) • ELB (x2) • namerd • apps: web (x1), gen (x3), word (x3), word-growthhack (x3)
  20. linkerd roadmap • Netty4.1 • HTTP/2+gRPC linkerd#174 • TLS client certs • Richer routing policies • Announcers • More configurable everything