
The once and future layer 5: Resilient, Twitter-style microservices

Oliver Gould
September 22, 2016


What is required to operate microservices at scale? Beyond containers, schedulers, and frameworks, what is actually required to turn hundreds of services, tens of thousands of machines, and millions of requests per second into a unified, performant application? Oliver Gould explores the evolution of Twitter’s stack from monolith to highly distributed microservices and the surprising glue that held it all together: layer 5 in the OSI model, the oft-overlooked session layer.

Oliver offers an overview of Finagle, the high-scale RPC library developed at Twitter and adopted by Pinterest, SoundCloud, ING Bank, and other companies, tracing Finagle’s evolution from a simple library into something much more: a unified, global mechanism for operability and control over a highly disaggregated application architecture. Oliver explains how this mechanism provides Twitter with higher-level, service-based semantics around scalability, reliability, and fault tolerance and how the control over layer 5 afforded by Finagle allowed Twitter to solve some of the most surprising and difficult problems with its highly distributed architecture—when the software architecture diagram and the org chart intersected.

Oliver concludes by introducing linkerd, an open source proxy form of Finagle, which extends Finagle’s operational model to non-JVM or polyglot microservices, and demonstrates how linkerd can be used to “wrap” multiservice applications, independent of application language(s) or infrastructure, to obtain many of the benefits that Finagle provides for Twitter.


Transcript

  1. The Once and Future Layer 5: Resilient, Twitter-style microservices. oliver gould, cto, buoyant. Velocity NYC, September 22, 2016
  2. oliver gould • founding cto @ buoyant (open-source microservice infrastructure) • previously, tech lead @ twitter: observability, traffic • core contributor: finagle • creator: linkerd • likes: dogs • dislikes: being woken up for computers • @olix0r • ver@buoyant.io
  3. overview • 2010: Riding the Whale • “Microservices” • The Once and Future Layer 5 • Introducing linkerd • Demo time!
  4. 2010 A FAILWHALE ODYSSEY

  5. Twitter, 2010: 10⁷ users, 10⁷ tweets/day, 10² engineers, 10¹ services, 10¹ deploys/week, 10² hosts, 10⁻¹ datacenters, 10¹ user-facing outages/week. https://blog.twitter.com/2010/measuring-tweets
  6. None
  7. None
  8. Events https://blog.twitter.com/2013/new-tweets-per-second-record-and-how

  9. Asymmetry Photo by @troy

  10. Provisioning

  11. everything’s easier with microservices (not)

  12. scaling teams growing software

  13. flexibility

  14. performance, correctness, monitoring, debugging, efficiency, security, operability, resilience

  15. microservices are difficult.

  16. “Resilience is an imperative: our software runs on the truly dismal computers we call datacenters. Besides being heinously complex… they are unreliable and prone to operator error.” Marius Eriksen (@marius), RPC Redux
  17. resilience in microservices: software you didn’t write, hardware you can’t touch, a network you can’t configure; these break in new and surprising ways, and your customers shouldn’t notice
  18. resilient microservices means resilient communication

  19. the stack by OSI layer: [1] physical / [2] link / [3] network / [4] transport: datacenter (aws, azure, digitalocean, gce, …), kubernetes, canal, weave, … • [5] session: rpc (http/2, mux, …) • [6] presentation: json, protobuf, thrift, … • [7] application: business languages, libraries
  20. layer 5 dispatches requests onto layer 4 connections

  21. finagle THE ONCE AND FUTURE LAYER 5

  22. github.com/twitter/finagle: RPC library (JVM), asynchronous, built on Netty, scala, functional, strongly typed. first commit: Oct 2010 (@nk + @marius)
  23. used by…

  24. programming finagle

      val users = Thrift.newIface[UserSvc]("/s/users")
      val timelines = Thrift.newIface[TimelineSvc]("/s/timeline")

      Http.serve(":8080", Service.mk[Request, Response] { req =>
        for {
          user <- users.get(userReq(req))
          timeline <- timelines.get(user)
        } yield renderHTML(user, timeline)
      })

  25. your server is a function

      trait Service[Req, Rsp] {
        def apply(req: Req): Future[Rsp]
        def close(deadline: Time): Future[Unit]
      }

  26. your server is a function

      trait ServiceFactory[Req, Rsp] {
        def apply(conn: ClientConnection): Future[Service[Req, Rsp]]
        def close(deadline: Time): Future[Unit]
      }

  27. your server is a function

      trait Filter[InReq, OutRsp, OutReq, InRsp] {
        def apply(req: InReq, service: Service[OutReq, InRsp]): Future[OutRsp]
        def andThen[A, B](f: Filter[OutReq, InRsp, A, B]): Filter[InReq, OutRsp, A, B]
        def andThen(s: Service[OutReq, InRsp]): Service[InReq, OutRsp]
        def andThen(sf: ServiceFactory[OutReq, InRsp]): ServiceFactory[InReq, OutRsp]
      }

  28. your server is a function

      val service: Service[http.Request, http.Response] =
        recordHandletime andThen
        traceRequest andThen
        logRequest andThen
        timeouts andThen
        myService

      val server: ListeningServer =
        Http.serve(":8080", service)

      val client: ServiceFactory[http.Request, http.Response] =
        retries andThen Http.newClient("127.1:8080")

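Each of the filters composed above is an ordinary Filter value. A minimal sketch of what one of them might look like (the name matches the slide, but the behavior shown here is illustrative, not taken from the deck):

      import com.twitter.finagle.{Service, SimpleFilter}
      import com.twitter.util.Future

      // logs each request, then hands it to the next Service in the chain
      def logRequest[Req, Rep]: SimpleFilter[Req, Rep] =
        new SimpleFilter[Req, Rep] {
          def apply(req: Req, service: Service[Req, Rep]): Future[Rep] = {
            println(s"request: $req")
            service(req)
          }
        }
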
  29. operating finagle: transport security, service discovery, circuit breaking, backpressure, deadlines, retries, tracing, monitoring, keep-alive, multiplexing, load balancing, per-request routing, service-level objectives. (client stack diagram: observe, session timeout, retries, request draining, load balancer, monitor, trace, failure accrual, request timeout, pool, fail fast, expiration, dispatcher)
  30. layer 5 naming

  31. layer 5 naming: applications refer to logical names (/s/users); requests are bound to concrete names (/#/io.l5d.zk/prod/users/http); delegations express routing (/s => /#/io.l5d.zk/prod/http)
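As a minimal sketch, the delegation from the slide can be expressed as a Finagle Dtab; the snippet only parses it (installing it via Dtab.base or scoping it via Dtab.local is up to the operator):

      import com.twitter.finagle.Dtab

      // the delegation from the slide: rewrite /s/... logical names into the
      // ZooKeeper-backed concrete namespace
      val delegation: Dtab = Dtab.read("/s => /#/io.l5d.zk/prod/http")
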
  32. per-request routing: staging

      GET / HTTP/1.1
      Host: mysite.com
      Dtab-local: /s/B => /s/B2

  33. per-request routing: debug proxy

      GET / HTTP/1.1
      Host: mysite.com
      Dtab-local: /s/E => /s/P/s/E

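On the wire this is just a Dtab-local header; from Finagle code the same per-request override can be scoped locally. A minimal sketch, reusing the staging example above:

      import com.twitter.finagle.Dtab

      // scope an override so that, within this block, requests bound for
      // /s/B are routed to the staging instance /s/B2
      Dtab.unwind {
        Dtab.local ++= Dtab.read("/s/B => /s/B2")
        // any Finagle client call made here picks up the override,
        // and it is propagated downstream along with the request
      }
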
  34. “It’s slow” is the hardest problem you’ll ever debug. Jeff Hodges (@jmhodges), Notes on Distributed Systems for Young Bloods
  35. the more components you deploy, the more problems you have

  36. the more components you deploy, the more problems you have

  37. the more components you deploy, the more problems you have

  38. tracing

  39. tracing

  40. tracing

  41. load balancing at layer 5. lb algorithms: • round-robin • fewest connections • queue depth • exponentially-weighted moving average (ewma) • aperture
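A minimal sketch of picking one of these balancers on a Finagle client; the choice of p2cPeakEwma and the destination name are illustrative assumptions:

      import com.twitter.finagle.Http
      import com.twitter.finagle.loadbalancer.Balancers

      // power-of-two-choices over peak EWMA latency per instance
      val usersClient = Http.client
        .withLoadBalancer(Balancers.p2cPeakEwma())
        .newService("/s/users")
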
  42. timeouts & retries among the timelines, users, web, and db services: each hop sets its own timeout and retries (timeout=400ms retries=3, timeout=400ms retries=2, timeout=200ms retries=3)
  43. timeouts & retries: with retries, a single hop can take 800ms or 600ms, blowing past the timeout its caller set
  44. deadlines (timelines, users, web, db): a timeout=400ms becomes deadline=323ms after 77ms elapsed, then deadline=210ms after another 113ms elapsed
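The arithmetic behind the slide, as a sketch: each hop subtracts the time already spent from the budget it received, rather than starting a fresh timeout.

      // numbers from the slide
      val timeoutMs = 400             // set at the top of the call graph
      val deadline1 = timeoutMs - 77  // 323ms left after 77ms elapsed
      val deadline2 = deadline1 - 113 // 210ms left after another 113ms
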
  45. retries typical: retries=3

  46. retries typical: retries=3 worst-case: 300% more load!!!

  47. budgets. typical: retries=3 (worst-case: 300% more load!!!). better: retryBudget=20% (worst-case: 20% more load)
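A minimal sketch of a retry budget on a Finagle client; the 20% figure mirrors the slide, while the window, retry floor, and destination name are illustrative assumptions:

      import com.twitter.finagle.Http
      import com.twitter.finagle.service.RetryBudget
      import com.twitter.util.Duration

      // retry at most 20% of requests over a 10s window, with a small
      // floor so low-traffic clients can still retry occasionally
      val budget = RetryBudget(
        ttl = Duration.fromSeconds(10),
        minRetriesPerSec = 10,
        percentCanRetry = 0.2)

      val timelines = Http.client
        .withRetryBudget(budget)
        .newService("/s/timelines")
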
  48. cancellation (timelines, users, web, db): timeout!

  49. cancellation (timelines, users, web, db): timeout!

  50. Nacking (timelines, users, web, db)

  51. Nacking (timelines, users, web, db): nack!

  52. Nacking (timelines, users, web, db): requeue

  53. magic ops sprinkles: transport security, service discovery, circuit breaking, backpressure, deadlines, retries, tracing, metrics, keep-alive, multiplexing, load balancing, per-request routing, service-level objectives. (client stack diagram: observe, session timeout, retries, request draining, load balancer, monitor, trace, failure accrual, request timeout, pool, fail fast, expiration, dispatcher)
  54. None
  55. So just rewrite everything in Finagle?

  56. linkerd

  57. github.com/buoyantio/linkerd: microservice rpc proxy, layer-5 router (aka l5d), built on finagle & netty, pluggable protocols (http, thrift, …) and service discovery (consul, etcd, k8s, marathon, zk, …)
  58. magic operability sprinkles: transport security, service discovery, circuit breaking, backpressure, deadlines, retries, tracing, metrics, keep-alive, multiplexing, load balancing, per-request routing, service-level objectives. (diagram: each instance of Service A, B, and C runs with a linkerd alongside it)
  59. namerd: a service discovery service; delegates logical names to service discovery; centralized routing policy; pluggable (consul, etcd, k8s, zk, …)
  60. namerd

  61. demo: gob’s microservice

  62. kubernetes

  63. (diagram) a host runs a kubelet and pods; each pod has an ip and one or more containers (pod app: a, ip: 10.1.2.3; pod app: b, ip: 10.2.3.4)
  64. (diagram) app: a and app: b pods spread across hosts; service-a selects the app: a pods
  65. (diagram) demo topology: web, word, gen, each with an l5d

  66. (diagram) web, word, gen, gen-v2, each with an l5d

  67. (diagram) web, word, gen, gen-v2, each with an l5d, plus namerd

  68. github.com/buoyantio/linkerd-examples

  69. linkerd roadmap • HTTP/2 + gRPC (linkerd#174) • Deadline Enforcement (in progress) • Dark Traffic • Improved namerd API • All configurable everything
  70. more at linkerd.io • slack: slack.linkerd.io • email: ver@buoyant.io • twitter: @olix0r, @linkerd • thanks!