Finagle, linkerd, and Mesos
Magic Operability Sprinkles for Microservices

Finagle and Mesos are two core technologies used by Twitter and many other companies to scale application infrastructure to high-traffic workloads. In this talk, we describe how these two technologies work together to form applications that are both highly scalable and resilient to failure. We introduce linkerd, an open-source proxy form of Finagle, which extends Finagle's operational model to non-JVM or polyglot microservices. Finally, we show how linkerd can be used to "wrap" applications running on Mesos to provide higher-level, service-based semantics around scalability, reliability, and fault tolerance for microservices, even in the presence of unpredictable traffic volumes and unreliable hardware.

Oliver Gould

June 02, 2016

Transcript

  1. Finagle, linkerd, and Mesos
     Magic Operability Sprinkles for Microservices
     oliver gould, cto, buoyant
     MesosCon North America, June 2 2016
  2. oliver gould
     • founding cto @ buoyant: open-source microservice infrastructure
     • previously, tech lead @ twitter: observability, traffic
     • core contributor: finagle
     • creator: linkerd
     • likes: dogs
     • dislikes: being woken up by a pager
     @olix0r
  3. overview
     • 2010: A Failwhale Odyssey
     • Automating the Datacenter
     • Microservices: A Silver Bullet
     • Finagle: The Once and Future Layer 5
     • Introducing linkerd
     • Demo
     • Q&A
  4. Twitter, 2010
     10^7 users, 10^7 tweets/day, 10^2 engineers, 10^1 services, 10^1 deploys/week, 10^2 hosts, 0 datacenters, 10^1 user-facing outages/week
     https://blog.twitter.com/2010/measuring-tweets
  5. [Diagram: timelines, users, and notifications services, labeled x800, x300, x1000, scheduled by Aurora (or Marathon, or …) on Mesos across many hosts]
  6. Resilience is an imperative: our software runs on the truly dismal computers we call datacenters. Besides being heinously complex… they are unreliable and prone to operator error.
     Marius Eriksen @marius, RPC Redux
  7. resilience in microservices
     software you didn’t write, hardware you can’t touch, network you can’t configure
     break in new and surprising ways
     and your customers shouldn’t notice
  8. [Diagram: the seven network layers grouped into three tiers]
     datacenter: [1] physical, [2] link, [3] network, [4] transport (aws, azure, digitalocean, gce, …; mesos; canal, weave, …; aurora, marathon, …)
     rpc: [5] session (http/2, mux, …), [6] presentation (json, protobuf, thrift, …)
     business: [7] application (languages, libraries)
  9. programming finagle
     val users = Thrift.newIface[UserSvc]("/s/users")
     val timelines = Thrift.newIface[TimelineSvc]("/s/timeline")

     Http.serve(":8080", Service.mk[Request, Response] { req =>
       for {
         user <- users.get(userReq(req))
         timeline <- timelines.get(user)
       } yield renderHTML(user, timeline)
     })
  10. operating finagle
      transport security, service discovery, circuit breaking, backpressure, deadlines, retries, tracing, metrics, keep-alive, multiplexing, load balancing, per-request routing, service-level objectives
      [Diagram: stack of modules: Observe, Session timeout, Retries, Request draining, Load balancer, Monitor, Observe, Trace, Failure accrual, Request timeout, Pool, Fail fast, Expiration, Dispatcher]
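One of the modules named on this slide, failure accrual, can be illustrated with a minimal sketch. This is not Finagle's implementation: the `Endpoint` class, the consecutive-failure policy, and the `maxFailures` and `reviveAfterMs` parameters are all hypothetical names chosen for illustration.

```scala
// Hypothetical sketch of failure accrual; not Finagle's implementation.
object FailureAccrualSketch {
  // Marks an endpoint unavailable after `maxFailures` consecutive failures
  // and makes it available again once `reviveAfterMs` has elapsed.
  final class Endpoint(maxFailures: Int, reviveAfterMs: Long) {
    private var consecutiveFailures = 0
    private var deadUntilMs = 0L

    // An endpoint is available once the current time reaches `deadUntilMs`.
    def isAvailable(nowMs: Long): Boolean = nowMs >= deadUntilMs

    // Any success resets the failure streak.
    def recordSuccess(): Unit = {
      consecutiveFailures = 0
    }

    // A failure extends the streak; a full streak takes the endpoint
    // out of rotation for `reviveAfterMs` milliseconds.
    def recordFailure(nowMs: Long): Unit = {
      consecutiveFailures += 1
      if (consecutiveFailures >= maxFailures) {
        deadUntilMs = nowMs + reviveAfterMs
        consecutiveFailures = 0
      }
    }
  }
}
```

A load balancer built on this sketch would skip any endpoint whose `isAvailable` returns false, so callers stop sending requests to an instance that keeps failing.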
  11. layer 5 naming
      applications refer to logical names
      requests are bound to concrete names
      delegations express routing
      /s/users
      /#/io.l5d.zk/prod/users
      /s => /#/io.l5d.zk/prod/http
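The idea of binding a logical name through a delegation can be sketched as a prefix rewrite. This is a simplified illustration, not Finagle's or linkerd's delegation-table implementation, which has richer semantics. `Dentry`, `delegate`, and the first-match rule are assumptions made for the example.

```scala
// Hypothetical sketch of prefix-based name delegation.
object DelegationSketch {
  // A delegation rewrites names that start with `prefix` to start with `dest`.
  final case class Dentry(prefix: String, dest: String)

  // Split a path such as "/s/users" into its segments: List("s", "users").
  private def segments(path: String): List[String] =
    path.split('/').filter(_.nonEmpty).toList

  // Apply the first delegation whose prefix matches whole leading segments
  // of `name`, replacing that prefix with the delegation's destination.
  def delegate(dtab: List[Dentry], name: String): Option[String] = {
    val nameSegs = segments(name)
    dtab.collectFirst {
      case Dentry(prefix, dest) if nameSegs.startsWith(segments(prefix)) =>
        val rest = nameSegs.drop(segments(prefix).length)
        (segments(dest) ++ rest).mkString("/", "/", "")
    }
  }
}
```

Under these assumptions, applying the slide's delegation `/s => /#/io.l5d.zk/prod/http` to the logical name `/s/users` yields `/#/io.l5d.zk/prod/http/users`, and a name with no matching prefix is left unbound.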
  12. “It’s slow” is the hardest problem you’ll ever debug.
      Jeff Hodges @jmhodges, Notes on Distributed Systems for Young Bloods
  13. load balancing at layer 5
      lb algorithms:
      • round-robin
      • fewest connections
      • queue depth
      • exponentially-weighted moving average (ewma)
      • aperture
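As a rough illustration of the ewma entry in the list above, the sketch below keeps a moving average of observed latency per endpoint and routes to the lowest estimate. It uses the textbook EWMA update with a fixed smoothing factor, not Finagle's balancer. `update`, `pick`, and `alpha` are hypothetical names.

```scala
// Hypothetical sketch of EWMA-based endpoint selection.
object EwmaSketch {
  // Standard exponentially-weighted moving average update.
  // `alpha` lies in (0, 1]; higher values weight recent samples more heavily.
  def update(previous: Double, sample: Double, alpha: Double): Double =
    alpha * sample + (1 - alpha) * previous

  // Pick the endpoint whose latency estimate is currently lowest.
  // Assumes `estimatesMs` is non-empty.
  def pick(estimatesMs: Map[String, Double]): String =
    estimatesMs.minBy(_._2)._1
}
```

Because each new latency sample shifts the estimate, an endpoint that slows down is chosen less often within a few requests, which round-robin cannot offer since it ignores latency.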
  14. timeouts & retries
      [Diagram: web, timelines, users, db; hops labeled timeout=400ms retries=3, timeout=400ms retries=2, timeout=200ms retries=3]
  15. timeouts & retries
      [Diagram: web, timelines, users, db; hops labeled timeout=400ms retries=3, timeout=400ms retries=2, timeout=200ms retries=3; callouts: 800ms! 600ms!]
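The 800ms! and 600ms! callouts appear to be worst-case waits, where every attempt on a hop runs to its full timeout. The sketch below reproduces that arithmetic, reading retries=N as N attempts (an assumption that matches 400ms × 2 and 200ms × 3). It also shows the budget calculation a propagated deadline would use. `worstCaseMs` and `remainingMs` are hypothetical helpers, not linkerd APIs.

```scala
// Hypothetical sketch of per-hop retry arithmetic and deadline budgets.
object RetryBudgetSketch {
  // Worst-case time spent on one hop: every attempt runs to its full timeout.
  def worstCaseMs(timeoutMs: Int, attempts: Int): Int = timeoutMs * attempts

  // Time left to hand downstream when a deadline travels with the request;
  // never negative, so an expired deadline leaves a budget of zero.
  def remainingMs(deadlineMs: Long, nowMs: Long): Long =
    math.max(0L, deadlineMs - nowMs)
}
```

A hop configured with timeout=200ms and three attempts can take 600ms in the worst case, longer than a caller that gives up after 400ms. Carrying a deadline with the request instead lets each hop see how much budget remains before it retries.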
  16. github.com/buoyantio/linkerd
      microservice rpc proxy
      layer-5 router
      aka l5d
      built on finagle & netty
      pluggable
      http, thrift, …
      etcd, consul, kubernetes, marathon, zookeeper, …
      …
  17. magic resiliency sprinkles
      transport security, service discovery, circuit breaking, backpressure, deadlines, retries, tracing, metrics, keep-alive, multiplexing, load balancing, per-request routing, service-level objectives
      [Diagram: Service A, Service B, and Service C instances, each paired with a linkerd]
  18. namerd
      released in March
      centralized routing policy
      delegates logical names to service discovery
      pluggable: etcd, kubernetes, zookeeper, …
  19. [Diagram: dc/os: master, marathon, zookeeper, node, node, public node, node, …; linkerd (x4), ELB (x2), namerd]
  20. [Diagram: dc/os: master, marathon, zookeeper, node, node, public node, node, …; linkerd (x4), ELB (x2), namerd; services: web (x1), gen (x3), word (x3), word-growthhack (x3), gen-growthhack (x3)]
  21. linkerd roadmap
      • Netty 4.1
      • HTTP/2+gRPC linkerd#174
      • TLS client certs, SPIFFE
      • Deadlines
      • Announcers
      • All configurable everything