
Twitch's GraphQL Transformation (with notes)

Tony Ghita
October 26, 2017

Twitch has over six hundred engineers, millions of concurrent viewers, and is one of the highest trafficked sites on the internet today. Our API handles hundreds of thousands of requests per second and powers many different clients. The API aggregates hundreds of underlying services together in a singular, (hopefully) coherent package.

Adopting a GraphQL API enabled us to rewrite our aging web and mobile clients within a few months, giving users a much snappier experience. We’ve built tooling to optimize for developer velocity on the service side as well. Code generation keeps the cost of adding types and fields minimal, and automated integration tests ensure changes are safe. We’ve wrangled dataloaders into shape to provide efficient, high-performance operations across a multitude of backing services.

And perhaps most importantly, GraphQL has acted as a guiding force to a more standardized and flexible service ecosystem. We’ve been pushed to reconsider past decisions around service aspects like authorization and pagination, and have come out with much improved systems.

Presented at GraphQL Summit 2017.

Transcript

  1. Our GraphQL Transformation

  2. tonyghita / tonyghita / @tonyghita

    My name is Tony Ghita. Here are my GitHub, Twitch, and Twitter handles respectively. I'm an engineer at Twitch on the API platform team, leading GraphQL API implementation efforts. I noticed some of the other speakers shared some pictures of their pets, so before we start I'd like to share some pictures of my rebellious teenage puppy, Finn.
  3. This is my dog Finn. Sometimes he's pretty mischievous.

  4. Most times he's just a good boy.

  5. Our GraphQL Transformation Back to our regularly scheduled programming.

  6. social video

    For those of you not familiar with Twitch:

    - it's a social video platform
    - for gamers
  7. This is Twitch. And this is our newest web app.

    We're currently ramping up traffic to this app as I speak, and we hope to be serving it to 90% of users by the end of tomorrow. Our website has been completely rewritten over just the past 6 months. It's powered by our GraphQL API, written in TypeScript, uses Apollo Client to cache GraphQL requests, and renders using React.
    I'm going to talk about how we got here and what I wish I knew at the start.
  8. Let's talk about Twitch on GraphQL.

  9. paradigm shift Adding GraphQL to our API ecosystem required us

    to make a hard paradigm shift. For a lot of us, REST has been synonymous with APIs for our entire careers.
  10. GET /users/:id/friends

      query getFriends {
        currentUser {
          friends {
            name
          }
        }
      }

    Making the leap from REST to GraphQL forced us to consider our systems from a different perspective. We started thinking in terms of data and relationships between data instead of endpoints.
  11. missing unit test This new perspective changed how we thought

    about aspects of our service oriented architecture. Adding a GraphQL API to our ecosystem was like adding a missing unit test. The different perspective exposed missed opportunities for a more scalable architecture.
  12. improved services Adopting GraphQL has acted as a guiding force

    to a more standardized and flexible service ecosystem. We’ve been pushed to reconsider past decisions around service aspects like authorization and pagination, and have come out with much improved systems overall (much like this Mario who has become an apex predator).
  13. Hoping to share with you our journey adopting GraphQL and

    some things I wish I knew when I was starting out. I'm hoping this encourages you to begin your own journey or continue on if you're already started.
 

  14. 600+ nerds I'm one of 600 engineers. Something like 2x

    growth over the past year, lots of new faces.
  15. [Architecture diagram: API gateway; endpoints /users, /streams, /games, /friends; backend services]

    And our architecture looks something like this. Requests flow through the API gateway first. Endpoints aggregate data from many backend services. My team works on that part. Hi team!
  16. [Same diagram, labeled "100+"]

    We have a lot of backend services. Probably just over 100 at this point.
  17. [Same diagram, with one service labeled MONOLITH]

    We also have one monolith we've been migrating away from. Hopefully this sounds like a familiar architecture to you.
  18. [Same diagram]

    Our API gateway is written in Go. Those are Go's mascots, gophers.
  19. [Same diagram]

    Most of our backend services are also written in Go.
  20. REST, JSON Our API was a well-worn JSON REST API,

    like I imagine most companies have been implementing for the last decade or so.
  21. [Timeline: summit 2016, proof-of-concept, type composition & pagination, authN / authZ, ???, production beta, summit 2017; phases: PROTOTYPE, PRODUCTION]

    Here's a rough timeline of the steps we took to build a GraphQL API. We've gone from knowing next to nothing before GraphQL Summit 2016 to creating our own GraphQL API powering a majority of Twitch API traffic today.

    I'm going to take you through the steps we took to get there and what I learned along the way.
  22. Let's start at the beginning of our journey. This is

    a picture of last year's GraphQL Summit.
  23. [Timeline: summit 2016, proof-of-concept, type composition & pagination, authN / authZ, ???, live beta, summit 2017; phases: PROTOTYPE, PRODUCTION]

    We came to last year's GraphQL Summit with just a vague idea of what GraphQL was.

  24. As we listened to the talks, we realized that GraphQL

    solved many of the issues we were experiencing at the API gateway. Issues like versioning, over-fetching data, providing a good developer experience. We came away with all sorts of ideas on how we could use GraphQL to make our lives as API gateway engineers easier. And we left inspired to spend some time building a proof-of-concept to prove that it could work within our existing systems.
  25. [Timeline: summit 2016, proof-of-concept, type composition & pagination, authN / authZ, ???, production beta, summit 2017; phases: PROTOTYPE, PRODUCTION]

    So we took our GraphQL inspiration and set out to build a proof-of-concept to see how it would work in our system. Worth noting: at this point we thought we would use GraphQL to make it easier for API gateway contributors to write REST endpoints.
  26. “... if Henry Ford canvassed people on whether or not he should build a motor car, they'd probably tell him what they really wanted was a faster horse” – Cruise Industry News Quarterly, 1999

    In this case, we started approaching the adoption of GraphQL in search of a faster horse, instead of a motor car. Side note: this quote is famously misattributed to Henry Ford; there's no proof he ever actually said it.
  27. type Query {
        users(ids: [ID!], names: [String!]): [User]
      }

      type User {
        id: ID
        name: String
      }

    The first schema in the prototype was something like this, with a single type and a single query, fetching data from a single service.
    Nothing complex.
  28. resolve types & fields, load data efficiently, integrate services

    This simple prototype was enough to learn:

    - how to resolve types and fields in a GraphQL API
    - how to load data efficiently (dataloader!)
    - how to integrate our backend services with a GraphQL API
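
The "load data efficiently" point is the dataloader pattern: resolvers ask for individual records, and the loader groups those requests into one backend call. Below is a minimal single-goroutine sketch of that idea, not the dataloader library itself and not Twitch's code. `Loader`, `BatchFn`, and `NewLoader` are hypothetical names, and the batch function stands in for a bulk call to a user service.

```go
package main

import "fmt"

// User mirrors the prototype's User type (id and name only).
type User struct {
	ID   string
	Name string
}

// BatchFn is a hypothetical stand-in for one bulk call to a backend user service.
type BatchFn func(ids []string) (map[string]*User, error)

// Loader collects requested IDs and resolves them with a single batch call.
type Loader struct {
	batch   BatchFn
	pending []string
	seen    map[string]bool
	cache   map[string]*User
}

func NewLoader(batch BatchFn) *Loader {
	return &Loader{batch: batch, seen: map[string]bool{}, cache: map[string]*User{}}
}

// Load queues an ID and returns a thunk. Nothing is fetched until a thunk is called.
func (l *Loader) Load(id string) func() (*User, error) {
	if !l.seen[id] {
		l.seen[id] = true
		l.pending = append(l.pending, id)
	}
	return func() (*User, error) {
		if err := l.flush(); err != nil {
			return nil, err
		}
		return l.cache[id], nil // nil when the service has no such user
	}
}

// flush sends every pending ID to the backend in one request.
func (l *Loader) flush() error {
	if len(l.pending) == 0 {
		return nil
	}
	users, err := l.batch(l.pending)
	if err != nil {
		return err // pending IDs are kept so a later call can retry
	}
	for _, id := range l.pending {
		l.cache[id] = users[id] // misses are cached as nil
	}
	l.pending = nil
	return nil
}

func main() {
	calls := 0
	loader := NewLoader(func(ids []string) (map[string]*User, error) {
		calls++
		out := map[string]*User{}
		for _, id := range ids {
			out[id] = &User{ID: id, Name: "user-" + id}
		}
		return out, nil
	})

	// Two resolvers each ask for a user; only one backend call is made.
	a, b := loader.Load("1"), loader.Load("2")
	ua, _ := a()
	ub, _ := b()
	fmt.Println(ua.Name, ub.Name, "backend calls:", calls)
}
```

A production loader would also need to be safe for concurrent resolvers and scoped to a single request; both are left out here to keep the batching step visible.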
  29. [Timeline: summit 2016, proof-of-concept, type composition & pagination, authN / authZ, ???, production beta, summit 2017; phases: PROTOTYPE, PRODUCTION]

    So I took some baby steps and successfully implemented a simple type and query, and learned a ton about how GraphQL works (especially in our choice of implementation). I thought a good next step was to implement enough types to load a logged-out version of the front page.
    This would let us test more complex interactions between types (like composition and pagination) without worrying yet about authentication and authorization.
  30. streams games This is a screenshot I took of the

    logged-out front page a while back. If you squint you can just make out that we'll need to create "Stream" and "Game" types.
  31. streams games users You'll have to trust me on this,

    but we can see that streams actually are composed from games and users.
  32. type Query {
        users(ids: [ID!], names: [String!]): [User]
        streams(first: Int = 10): [Stream]
        games(first: Int = 10): [Game]
      }

      type User {
        id: ID
        name: String
      }

      type Stream {
        id: ID
        broadcaster: User
        game: Game
        name: String
        viewers: Int
      }

      type Game {
        id: ID
        name: String
        viewers: Int
      }

    This is getting to be a bit too much to fit on a slide, but here we have:

    - types that resolve other types
    - our first paginated queries

    Most details have been omitted, but hopefully this gives you a good idea of how the data needs were weaseled out (i.e. client-driven development).
  33. type composition pagination This step was enough to get familiarized

    with the intricacies of resolver type composition, and also pagination.
  34. pagination Type composition was straightforward. But once we had to

    put pagination into the schema... well, this is where we first really had to think about pagination as an all-encompassing idea that spanned the whole of an API.
  35. offset + limit, cursors, timestamp

    Most (but not all) of our services implemented pagination using offset and limit.
    Some services had begun to outgrow offset + limit and graduated to cursors. Other services used timestamp offsets or, even more nebulous, some undocumented number of integer time units.

  36. REST /v1 /v2 /v3 /v4 /v5 This manifested in our

    REST API organically as service owners wrote endpoints to expose their own data. This resulted in an interface that isn't really straightforward for users. Offset+limit in one place, cursors in another, etc etc.
  37. "a01s4==" We decided to standardize on cursor based pagination for

    efficiency and ease of use. At the GraphQL layer, we translate opaque cursors until service owners can update their APIs.
 
 At that point, we just pass the data from the API consumer to the service.
  38. type Query {
        games(first: Int, after: Cursor): GameConnection
      }

      query getFeaturedGames {
        games(first: 10, after: "a01s4==") {
          edges {
            cursor
            node { id, name, viewersCount }
          }
        }
      }
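
Translating opaque cursors for a service that still speaks offset + limit can be sketched roughly as follows. This is an illustration under stated assumptions, not Twitch's implementation: the `offset:N` payload, the base64 encoding, and the function names are all hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strconv"
	"strings"
)

// cursorPrefix is a hypothetical payload format: the cursor wraps an item's offset.
const cursorPrefix = "offset:"

// EncodeCursor hides an item's offset inside an opaque base64 string.
func EncodeCursor(offset int) string {
	return base64.StdEncoding.EncodeToString([]byte(cursorPrefix + strconv.Itoa(offset)))
}

// DecodeCursor recovers the offset. An empty cursor means "start at the beginning".
func DecodeCursor(cursor string) (int, error) {
	if cursor == "" {
		return 0, nil
	}
	raw, err := base64.StdEncoding.DecodeString(cursor)
	if err != nil {
		return 0, fmt.Errorf("invalid cursor: %w", err)
	}
	s := string(raw)
	if !strings.HasPrefix(s, cursorPrefix) {
		return 0, fmt.Errorf("invalid cursor payload")
	}
	offset, err := strconv.Atoi(strings.TrimPrefix(s, cursorPrefix))
	if err != nil || offset < 0 {
		return 0, fmt.Errorf("invalid cursor offset")
	}
	return offset, nil
}

// ToOffsetLimit translates GraphQL (first, after) arguments into the
// (offset, limit) pair an older service expects.
func ToOffsetLimit(first int, after string) (int, int, error) {
	pos, err := DecodeCursor(after)
	if err != nil {
		return 0, 0, err
	}
	if after != "" {
		pos++ // "after" is exclusive: start one past the cursor's item
	}
	return pos, first, nil
}

func main() {
	// The cursor of the 10th item (offset 9), as a client would send it back.
	cursor := EncodeCursor(9)
	offset, limit, err := ToOffsetLimit(10, cursor)
	if err != nil {
		panic(err)
	}
	fmt.Println(cursor, offset, limit)
}
```

Because clients only ever see the opaque string, the payload can later switch to a service's native cursor without changing the schema.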
  39. standardize the interface GraphQL gave us the perfect place to

    enforce consistency in pagination schemes while allowing the services to adopt cursor-based pagination on their own roadmap. And pagination is only one example where we saw this kind of opportunity to standardize our API.
 We've seen this standardization eventually work itself down to the backend.
  40. relay cursor connection spec: bit.ly/gqlPage

    We decided to standardize on the Relay cursor connection specification.

    I encourage you to look into it, because it allows for more flexible and efficient pagination than our previous offset + limit and page cursor implementations.
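
For readers new to the connection shape, here is a rough Go sketch of a forward-only page built the way the specification describes: edges that pair a cursor with a node, plus page info. Only `hasNextPage` and `endCursor` are modeled, and the cursor is simply the game's ID to keep the example short; both are simplifications, and the type and function names are hypothetical.

```go
package main

import "fmt"

type Game struct{ ID, Name string }

// GameEdge pairs a node with the cursor that points at it.
type GameEdge struct {
	Cursor string
	Node   Game
}

// PageInfo carries only the forward-pagination fields in this sketch.
type PageInfo struct {
	HasNextPage bool
	EndCursor   string
}

type GameConnection struct {
	Edges    []GameEdge
	PageInfo PageInfo
}

// Paginate returns up to first games that come after the given cursor.
// For brevity the cursor is the game's ID; a real cursor should be opaque.
func Paginate(games []Game, first int, after string) GameConnection {
	if first < 0 {
		first = 0
	}
	start := 0
	if after != "" {
		start = len(games) // unknown cursor: return an empty page
		for i, g := range games {
			if g.ID == after {
				start = i + 1
				break
			}
		}
	}
	end := start + first
	if end > len(games) {
		end = len(games)
	}

	conn := GameConnection{}
	for _, g := range games[start:end] {
		conn.Edges = append(conn.Edges, GameEdge{Cursor: g.ID, Node: g})
	}
	if n := len(conn.Edges); n > 0 {
		conn.PageInfo.EndCursor = conn.Edges[n-1].Cursor
	}
	conn.PageInfo.HasNextPage = end < len(games)
	return conn
}

func main() {
	games := []Game{{"1", "a"}, {"2", "b"}, {"3", "c"}}

	page := Paginate(games, 2, "")
	fmt.Println(len(page.Edges), page.PageInfo.HasNextPage, page.PageInfo.EndCursor)

	// The client passes the previous endCursor back as "after".
	page = Paginate(games, 2, page.PageInfo.EndCursor)
	fmt.Println(len(page.Edges), page.PageInfo.HasNextPage)
}
```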
  41. [Timeline: summit 2016, proof-of-concept, type composition & pagination, authN / authZ, ???, production beta, summit 2017; phases: PROTOTYPE, PRODUCTION]

    Until now we had punted on implementing authorization in our GraphQL API. We had some idea that it wouldn't fit our current auth system very well. We were finally at the point where we had to face the challenge head on.

  42. "auth" Historically, Twitch has treated "auth" as singular concept. But

    when we talk about auth, we really mean authentication (i.e. who is making the request) and authorization (i.e. what is the authenticated user allowed to do or see).
  43. [Architecture diagram: API gateway; endpoints /users, /streams, /games, /friends; backend services]

    Here's our rough architecture diagram again. Requests flow through the API gateway first. Endpoints aggregate data from many backend services.
  44. [Same diagram, labeled with authentication at the API gateway and four authorization steps]

    The way our old API works is that we authenticate the request so we know who is making it. Each endpoint requests authorization and then aggregates data from one or more backend services.
  45. [Same diagram]

    Authorization tokens are sent to the services if the user was authorized. The services then double check that all the expected authorization data is there.
  46. [Same diagram]

    Likewise, unauthorized requests
  47. [Diagram: authentication at the API gateway; a single /graphql endpoint; backend services]

    We could try taking two passes at each query... one to analyze the authorization requirements and one to execute the query.
  48. [Same diagram, now with an "authorization?" label]

  49. [Same diagram]

    But do we ship this massive authorization token to every service? Most services probably won't care about other service-related auth data.
  50. [Same diagram, now with four "authorization?" labels]

    We could try to make it so that every resolver calls the authorization service and we merge tokens together to send to the service. Everything we thought of to fit this style of authorization was probably doable, but seemed fairly hard.
  51. [Same diagram]

  52. Everything we thought of to fit this style of authorization was

    probably doable, but seemed fairly hard to implement right. And if we ever wanted to create a new API, we'd end up doing all this work over again. We'd already duplicated a bunch of code creating v1 through v5 of the REST API.
  53. [Diagram: authentication at the API gateway; a single /graphql endpoint; authorization labels on the three backend services]

    We ended up taking Dan Schafer and team's advice to keep authorization logic out of the API layer. This approach greatly simplifies API development. At this point, we're just passing data back and forth.
  54. separation of concerns push logic to services And I think

    that's where you want to be as an API gateway team. Passing data back and forth. It also encourages a good separation of concerns between the API and backends.
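
As a rough illustration of that separation, the sketch below keeps the resolver free of authorization logic: the gateway authenticates, the resolver forwards the caller's identity, and the service decides what the caller may see. Everything here is hypothetical (the `Request` struct, `FriendsService`, and the "only you can list your friends" rule); it shows where the check lives, not what Twitch's rules are.

```go
package main

import (
	"errors"
	"fmt"
)

// Request carries the identity established by authentication at the gateway.
type Request struct {
	UserID string
}

var ErrForbidden = errors.New("forbidden")

// FriendsService is a hypothetical backend that owns its authorization rule.
type FriendsService struct {
	friends map[string][]string
}

// Friends applies the service's own rule: only the user can list their friends.
func (s *FriendsService) Friends(requesterID, userID string) ([]string, error) {
	if requesterID != userID {
		return nil, ErrForbidden
	}
	return s.friends[userID], nil
}

// resolveFriends is the API-layer resolver. It contains no authorization
// logic; it only passes the authenticated identity through to the service.
func resolveFriends(req Request, svc *FriendsService, userID string) ([]string, error) {
	return svc.Friends(req.UserID, userID)
}

func main() {
	svc := &FriendsService{friends: map[string][]string{"u1": {"u2", "u3"}}}

	friends, err := resolveFriends(Request{UserID: "u1"}, svc, "u1")
	fmt.Println(friends, err)

	_, err = resolveFriends(Request{UserID: "u9"}, svc, "u1")
	fmt.Println(errors.Is(err, ErrForbidden))
}
```

With the rule inside the service, a second API in front of the same service gets the same behavior without duplicating the check.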
  55. easy Sounds easy right?

  56. :\ Unfortunately this meant we had to rethink our current

    approach.
 
 It also means convincing a ton of service owners to move business logic into their service.
 And we'd just spent some time convincing teams to use this centralized authorization system.
  57. ;) But it really feels like the right thing to

    do, and we're certain the end result will be better than what we would've ended up with otherwise.
  58. keep it simple Don't make it more complicated than you

    can afford. If you have the logical abstractions, make use of them. If you don't, consider planning to make them. Try to keep the API layer as thin as possible, allow services domain over their own business logic.
  59. bit.ly/gqlAuth For a more in depth look, I recommend watching

    Dan Schafer's 2016 ReactEurope talk on GraphQL.
  60. [Timeline: summit 2016, proof-of-concept, type composition & pagination, authN / authZ, ???, production beta, summit 2017; phases: PROTOTYPE, PRODUCTION]

    By now we had a really compelling prototype. I was introducing GraphQL to different teams, and they were getting hyped. We went from (what seemed to me) wondering if anyone would care to "we must have GraphQL everything" immediately.

  61. [Chart: contributors over time, from "Me" to 50+ contributors]

    Until about 5 or 6 months ago, I was the only one adding types and functionality to the GraphQL API. We got lucky with finding support. Our web and desktop client team was looking to rewrite the website as a major company initiative. GraphQL was the perfect upgrade to these clients' data fetching components. Suddenly we had a bunch of contributions to the API from teams all over the company. I was caught way off guard.
  63. what i wish i knew: scaling contributions I have some

    words of advice for my past self, and hopefully some of you who are currently prototyping GraphQL APIs and showing it around to colleagues, lest you find yourself successful.
  64. document good practices It's impossible to give advice on what

    to do specific to your backend implementation. There are so many tradeoffs to make that largely depend on the scale you operate. What works for one company may not work for another. As you figure out what works for you, write it down.
 Explain why you're taking the current approach.
 When the next developer picks up the codebase they'll have a great reference to work off of, and won't be so lost.
 
 Not everyone gets the advantage of months of proof-of-concept work to figure it out.
  65. iterate on a styleguide Create a style guide, so as

    you discover best practices for your GraphQL implementation you have a place to iterate on. We're continuously working on ours, as we find things that work well and things that break down in certain situations.
  66. write linters Take your style guidelines and codify them in

    the form of linters. This will automate away mechanical feedback in pull requests, and help get new developers up to speed without requiring your time.
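
A linter for a schema style rule can be very small. The sketch below checks one hypothetical rule, that field names are lowerCamelCase, using a line-based scan rather than a real GraphQL parser. The rule, the regular expressions, and the function name are illustrative assumptions, not Twitch's style guide.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fieldName matches the identifier at the start of a field definition,
// e.g. "name: String" or "users(ids: [ID!]): [User]".
var fieldName = regexp.MustCompile(`^([A-Za-z_][A-Za-z0-9_]*)\s*[(:]`)

// lowerCamel is the hypothetical style rule being enforced.
var lowerCamel = regexp.MustCompile(`^[a-z][A-Za-z0-9]*$`)

// LintFieldNames returns one message per field that breaks the naming rule.
// It scans line by line, so it assumes one field definition per line.
func LintFieldNames(schema string) []string {
	var problems []string
	for i, line := range strings.Split(schema, "\n") {
		m := fieldName.FindStringSubmatch(strings.TrimSpace(line))
		if m == nil {
			continue // not a field line (type header, closing brace, blank)
		}
		if !lowerCamel.MatchString(m[1]) {
			problems = append(problems,
				fmt.Sprintf("line %d: field %q should be lowerCamelCase", i+1, m[1]))
		}
	}
	return problems
}

func main() {
	schema := "type Game {\n  id: ID\n  viewer_count: Int\n}"
	for _, p := range LintFieldNames(schema) {
		fmt.Println(p)
	}
}
```

Run in CI, a check like this turns a recurring review comment into an automatic failure message.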
  67. boilerplate Fight boilerplate. In my experience writing a GraphQL server

    in Golang, there's a ton of boilerplate involved in getting some types working in the API.
  68. type User { id: ID! name: String! }

    func (r *UserResolver) ID() (graphql.ID, error) {
        user, err := r.loadFn()
        if err != nil {
            return graphql.ID(""), err
        }
        if user == nil {
            return graphql.ID(""), errors.New("not found")
        }
        return graphql.ID(user.ID), nil
    }

    func (r *UserResolver) Name() (string, error) {
        user, err := r.loadFn()
        if err != nil {
            return "", err
        }
        if user == nil {
            return "", errors.New("not found")
        }
        return user.Name, nil
    }

    type UserResolver struct {
        loadFn *dataloader.Thunk
    }
  71. generate code

    Luckily, Go is great for code generation. Aggressively generate code from the schema, according to the best practices you've documented. Keep iterating on code generation. The dream is to have full schema-to-resolver code generation. We're still only generating resolver skeletons, but hope to convince teams to switch to strongly typed APIs and well-understood standards for their services to make this a reality.
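
One way to generate resolver skeletons is Go's standard `text/template` package. The sketch below is an assumption about how such a generator could look, not Twitch's tooling: `TypeDef`, `Field`, and the template are hypothetical, the type description is written by hand rather than parsed from a schema, and the generated methods are simplified (they return the stored value directly, without conversions such as `graphql.ID(...)`).

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Field describes one resolver method to generate.
type Field struct {
	Name   string // Go method and struct field name, e.g. "Name"
	GoType string // Go return type, e.g. "string"
	Zero   string // zero-value literal returned alongside an error
}

// TypeDef is a hand-written stand-in for a parsed GraphQL type.
type TypeDef struct {
	Name   string
	Fields []Field
}

// resolverTmpl emits a resolver struct plus one method per field.
const resolverTmpl = `type {{.Name}}Resolver struct {
	loadFn func() (*{{.Name}}, error)
}
{{range .Fields}}
func (r *{{$.Name}}Resolver) {{.Name}}() ({{.GoType}}, error) {
	v, err := r.loadFn()
	if err != nil {
		return {{.Zero}}, err
	}
	if v == nil {
		return {{.Zero}}, errors.New("not found")
	}
	return v.{{.Name}}, nil
}
{{end}}`

// Generate renders the resolver source for one type.
func Generate(def TypeDef) (string, error) {
	t, err := template.New("resolver").Parse(resolverTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, def); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	src, err := Generate(TypeDef{
		Name: "User",
		Fields: []Field{
			{Name: "ID", GoType: "string", Zero: `""`},
			{Name: "Name", GoType: "string", Zero: `""`},
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(src)
}
```

A real generator would parse the schema to build `TypeDef` values and run the output through `gofmt`; the part that cannot be generated from the schema alone is the mapping to each service's data.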
  72. type User { id: ID! name: String! }

    func (r *UserResolver) ID() (graphql.ID, error) {
        user, err := r.loadFn()
        if err != nil {
            return graphql.ID(""), err
        }
        if user == nil {
            return graphql.ID(""), errors.New("not found")
        }
        return graphql.ID(user.ID), nil
    }

    func (r *UserResolver) Name() (string, error) {
        user, err := r.loadFn()
        if err != nil {
            return "", err
        }
        if user == nil {
            return "", errors.New("not found")
        }
        return user.Name, nil
    }

    type UserResolver struct {
        loadFn *dataloader.Thunk
    }

    Right now we're at the point where we can just about generate everything in simple resolvers like this, except for the part where you need to know what the service's data looks like. My hope is that we can convince teams to go beyond their REST JSON APIs and adopt some kind of IDL of their own. Then we could generate all resolver code just by knowing the mapping of GraphQL schema IDL to backend service IDL.
  73. automate knowledge share

    I'm going to make a bold statement and claim that scaling out GraphQL knowledge to colleagues is an order of magnitude harder than any of the technical challenges in building a GraphQL API. Automate knowledge share.
  74. “If you want to go fast, go alone. If you want to go far, go together.” – African Proverb
  75. go.twitch.tv

    I encourage you to check out go.twitch.tv and compare the experience with twitch.tv. It's currently in beta.
  76. what i wish i knew: nullability & errors BONUS

  77. [Timeline: summit 2016, proof-of-concept, type composition & pagination, authN / authZ, ???, production beta, summit 2017; phases: PROTOTYPE, PRODUCTION]

    Since running a beta version of the website against production traffic, we've discovered some sticky points around the API consumers' contract with nullable types, and around error usage, that I'd like to share so you can avoid them.

  78. nullability I made a pretty subtle mistake in our early

    prototype with respect to nullability. We didn't catch it until we were ramping up to speed on our new application.
  79. query getNullabilityFail {
        user(name: "does not exist") {
          followers {
            nodes { id, name }
          }
        }
      }

      type Query {
        user(name: String): User
      }

    Let's say we have a query to find a user by their name.
    Semantically speaking, if we specify a name that is not associated with any user, like "does not exist", we should expect to receive null for the user. That's exactly what our schema says should happen based on the types.
  80. query getNullabilityFail {
        user(name: "does not exist") {
          followers {
            nodes { id, name }
          }
        }
      }

      {
        "data": {
          "user": {
            "followers": { "nodes": [] }
          }
        }
      }

    But when we execute the query, we get back a different response than we expect.
  81. query getNullabilityFail {
        user(name: "does not exist") {
          followers {
            nodes { id, name }
          }
        }
      }

      {
        "data": {
          "user": {
            "followers": { "nodes": [] }
          }
        }
      }
      (marked X)

    This isn't what we want, because it's not how we defined the schema. Early on, I implemented resolvers to lazy-load data when the fields are resolved.
    This lets us skip a call to the user service if some other service also takes a user name. However, this doesn't account for invalid or missing user names.
  82. query getNullabilityFail {
        user(name: "does not exist") {
          followers {
            nodes { id, name }
          }
        }
      }

      {
        "data": {
          "user": null
        }
      }

    Instead we want to load data at the node level. This allows us to adhere to the schema's semantics.
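
Loading at the node level can be sketched as a root resolver that looks the user up before building a `UserResolver`, so a missing user becomes a nil resolver instead of an object whose child fields quietly return empty data. This is a simplified illustration: `LookupFn` is a hypothetical stand-in for the user service, and it assumes a server that serializes a nil resolver for a nullable field as `null`.

```go
package main

import "fmt"

type User struct{ ID, Name string }

// LookupFn is a hypothetical stand-in for the user service. It returns
// (nil, nil) when no user has the given name.
type LookupFn func(name string) (*User, error)

// UserResolver wraps a user that is already known to exist.
type UserResolver struct {
	user *User
}

func (r *UserResolver) Name() string { return r.user.Name }

// ResolveUser loads the user when the node itself is resolved, rather than
// lazily inside each field, so existence is decided before any child runs.
func ResolveUser(lookup LookupFn, name string) (*UserResolver, error) {
	u, err := lookup(name)
	if err != nil {
		return nil, err
	}
	if u == nil {
		return nil, nil // no such user: the field resolves to null
	}
	return &UserResolver{user: u}, nil
}

func main() {
	users := map[string]*User{"finn": {ID: "1", Name: "finn"}}
	lookup := func(name string) (*User, error) { return users[name], nil }

	found, _ := ResolveUser(lookup, "finn")
	fmt.Println(found.Name())

	missing, _ := ResolveUser(lookup, "does not exist")
	fmt.Println(missing == nil)
}
```

The trade-off is the one the notes describe: the eager lookup costs a call to the user service even when a downstream service could have accepted the name directly.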
  83. errors Do the minimum to make errors useful. Assume no

    one will inspect errors,
  84. be lazy Do the minimum to make errors useful.

  85. assume

    Assume no one will take the time to inspect errors. Many popular clients default to throwing the entire response away if an error occurs.
  86. mutations bit.ly/gqlMutate Validation errors have a nice home on mutation

    payload objects. Reserve the errors array for service-type issues where you're comfortable throwing everything away.
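
A mutation payload that carries validation errors might look roughly like this in Go. The shapes are hypothetical (`UserError`, `UpdateNamePayload`, and the blank-name rule are made up for illustration); the point is the split between problems the client can display, which travel in the payload, and service failures, which are returned as a Go error and would end up in the top-level errors array.

```go
package main

import (
	"fmt"
	"strings"
)

type User struct{ ID, Name string }

// UserError is a hypothetical shape for a validation problem the client can show.
type UserError struct {
	Field   string
	Message string
}

// UpdateNamePayload is the mutation's return type: data plus validation errors.
type UpdateNamePayload struct {
	User   *User
	Errors []UserError
}

// UpdateName validates input, then calls save (a stand-in for the backing
// service). Validation problems go in the payload; only service failures
// are returned as a Go error.
func UpdateName(u *User, name string, save func(*User) error) (*UpdateNamePayload, error) {
	name = strings.TrimSpace(name)
	if name == "" {
		return &UpdateNamePayload{
			Errors: []UserError{{Field: "name", Message: "must not be blank"}},
		}, nil
	}
	updated := &User{ID: u.ID, Name: name}
	if err := save(updated); err != nil {
		return nil, fmt.Errorf("saving user: %w", err)
	}
	return &UpdateNamePayload{User: updated}, nil
}

func main() {
	save := func(*User) error { return nil }
	u := &User{ID: "1", Name: "old"}

	bad, _ := UpdateName(u, "  ", save)
	fmt.Println(len(bad.Errors), bad.User == nil)

	ok, _ := UpdateName(u, "finn", save)
	fmt.Println(len(ok.Errors), ok.User.Name)
}
```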
  87. next?

  88. thanks!