Twitch's GraphQL Transformation (with notes)

Slide 1

Slide 1 text

Our GraphQL Transformation

Slide 2

Slide 2 text

tonyghita tonyghita @tonyghita My name is Tony Ghita. Here are my GitHub, Twitch, and Twitter handles respectively. I'm an engineer at Twitch on the API platform team, leading GraphQL API implementation efforts. I noticed some of the other speakers shared some pictures of their pets, so before we start I'd like to share some pictures of my rebellious teenage puppy, Finn.

Slide 3

Slide 3 text

This is my dog Finn. Sometimes he's pretty mischievous.

Slide 4

Slide 4 text

Most times he's just a good boy.

Slide 5

Slide 5 text

Our GraphQL Transformation Back to our regularly scheduled programming.

Slide 6

Slide 6 text

social video For those of you not familiar with Twitch: - it's a social video platform - for gamers

Slide 7

Slide 7 text

This is Twitch. And this is our newest web app. We're currently ramping up traffic to this app as I speak, and we hope to be serving it to 90% of users by the end of tomorrow. Our website has been completely rewritten over just the past 6 months. It's powered by our GraphQL API, written in TypeScript, uses Apollo Client to cache GraphQL requests, and renders using React. I'm going to talk about how we got here and what I wish I knew at the start.

Slide 8

Slide 8 text

Let's talk about Twitch on GraphQL.

Slide 9

Slide 9 text

paradigm shift Adding GraphQL to our API ecosystem required us to make a hard paradigm shift. For a lot of us, REST has been synonymous with APIs for our entire careers.

Slide 10

Slide 10 text

GET /users/:id/friends query getFriends { currentUser { friends { name } } } Making the leap from REST to GraphQL forced us to consider our systems from a different perspective. We started thinking in terms of data and relationships between data instead of endpoints.

Slide 11

Slide 11 text

missing unit test This new perspective changed how we thought about aspects of our service-oriented architecture. Adding a GraphQL API to our ecosystem was like adding a missing unit test. The different perspective exposed missed opportunities for a more scalable architecture.

Slide 12

Slide 12 text

improved services Adopting GraphQL has acted as a guiding force to a more standardized and flexible service ecosystem. We've been pushed to reconsider past decisions around service aspects like authorization and pagination, and have come out with much improved systems overall (much like this Mario who has become an apex predator).

Slide 13

Slide 13 text

I'm hoping to share with you our journey adopting GraphQL and some things I wish I had known when I was starting out. I hope this encourages you to begin your own journey, or to continue on if you've already started.

Slide 14

Slide 14 text

600+ nerds I'm one of 600 engineers. Something like 2x growth over the past year, lots of new faces.

Slide 15

Slide 15 text

api gateway /users /streams /games /friends service service service endpoints And our architecture looks something like this. Requests flow through the API gateway first. Endpoints aggregate data from many backend services. My team works on that part. Hi team!

Slide 16

Slide 16 text

api gateway /users /streams /games /friends service service service endpoints 100+ We have a lot of backend services. Probably just over 100 at this point.

Slide 17

Slide 17 text

api gateway /users /streams /games /friends service service MONOLITH endpoints 100+ We also have one monolith we've been migrating away from. Hopefully this sounds like a familiar architecture to you.

Slide 18

Slide 18 text

api gateway /users /streams /games /friends service service MONOLITH endpoints 100+ Our API gateway is written in Go. Those are Go's mascots, gophers.

Slide 19

Slide 19 text

api gateway /users /streams /games /friends service service MONOLITH endpoints 100+ Most of our backend services are also written in Go.

Slide 20

Slide 20 text

REST, JSON Our API was a well-worn JSON REST API, like I imagine most companies have been implementing for the last decade or so.

Slide 21

Slide 21 text

summit 2016 summit 2017 proof-of-concept type composition & pagination authN / authZ ??? production beta PROTOTYPE PRODUCTION Here's a rough timeline of the steps we took to build a GraphQL API. We've gone from knowing next to nothing before GraphQL Summit 2016 to creating our own GraphQL API powering a majority of Twitch API traffic today. I'm going to take you through the steps we took to get there and what I learned along the way.

Slide 22

Slide 22 text

Let's start at the beginning of our journey. This is a picture of last year's GraphQL Summit.

Slide 23

Slide 23 text

summit 2017 proof-of-concept type composition & pagination authN / authZ ??? live beta PROTOTYPE PRODUCTION summit 2016 We came to last year's GraphQL Summit with just a vague idea of what GraphQL was.

Slide 24

Slide 24 text

As we listened to the talks, we realized that GraphQL solved many of the issues we were experiencing at the API gateway. Issues like versioning, over-fetching data, providing a good developer experience. We came away with all sorts of ideas on how we could use GraphQL to make our lives as API gateway engineers easier. And we left inspired to spend some time building a proof-of-concept to prove that it could work within our existing systems.

Slide 25

Slide 25 text

summit 2016 summit 2017 type composition & pagination authN / authZ ??? production beta PROTOTYPE PRODUCTION proof-of-concept So we took our GraphQL inspiration and set out to build a proof-of-concept to see how it would work in our system. Worth noting: at this point we thought we would use GraphQL to make it easier for API gateway contributors to write REST endpoints.

Slide 26

Slide 26 text

– Cruise Industry News Quarterly, 1999 “... if Henry Ford canvassed people on whether or not he should build a motor car, they'd probably tell him what they really wanted was a faster horse” In this case, we started approaching the adoption of GraphQL in search of a faster horse instead of a motor car. Side note: this quote is famously misattributed to Henry Ford; there's no proof he ever actually said it.

Slide 27

Slide 27 text

type Query { users(ids: [ID!], names: [String!]): [User] } type User { id: ID name: String } The first schema in the prototype was something like this, with a single type and a single query, fetching data from a single service. Nothing complex.

Slide 28

Slide 28 text

resolve types & fields load data efficiently integrate services This simple prototype was enough to learn: - how to resolve types and fields in a GraphQL API. - how to load data efficiently (dataloader!) - and how to integrate our backend services with a GraphQL API.
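As a rough illustration of the batching idea behind dataloader, here is a minimal Go sketch. It is not the dataloader library's API and not Twitch's code: the `Loader`, `BatchFn`, and `User` names are assumptions made up for this example, and the loader is deliberately single-goroutine and unsynchronized to keep it short.

```go
package main

import "fmt"

// User is a hypothetical record returned by a backend user service.
type User struct {
	ID   string
	Name string
}

// BatchFn fetches many users in one backend call (hypothetical signature).
type BatchFn func(ids []string) (map[string]*User, error)

// Loader collects the IDs requested while resolving a query and fetches
// them with a single batched call the first time any result is needed.
type Loader struct {
	batch   BatchFn
	pending []string
	seen    map[string]bool
	cache   map[string]*User
	calls   int // number of backend calls made, kept for illustration
}

func NewLoader(batch BatchFn) *Loader {
	return &Loader{batch: batch, seen: map[string]bool{}, cache: map[string]*User{}}
}

// Load queues an ID and returns a thunk that yields the user when called.
func (l *Loader) Load(id string) func() (*User, error) {
	if !l.seen[id] {
		l.seen[id] = true
		l.pending = append(l.pending, id)
	}
	return func() (*User, error) {
		if err := l.flush(); err != nil {
			return nil, err
		}
		return l.cache[id], nil // nil when the service has no such user
	}
}

// flush sends every queued ID to the backend in one request.
func (l *Loader) flush() error {
	if len(l.pending) == 0 {
		return nil
	}
	l.calls++
	users, err := l.batch(l.pending)
	if err != nil {
		return err // pending IDs are kept so a later thunk can retry
	}
	l.pending = nil
	for id, u := range users {
		l.cache[id] = u
	}
	return nil
}

func main() {
	loader := NewLoader(func(ids []string) (map[string]*User, error) {
		out := map[string]*User{}
		for _, id := range ids {
			out[id] = &User{ID: id, Name: "user-" + id}
		}
		return out, nil
	})
	a, b := loader.Load("1"), loader.Load("2") // queued, nothing fetched yet
	ua, _ := a()
	ub, _ := b()
	fmt.Println(ua.Name, ub.Name, "backend calls:", loader.calls)
}
```

Two fields that each ask for a user end up sharing one backend request, which is the property that matters once types start composing other types.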

Slide 29

Slide 29 text

summit 2016 summit 2017 proof-of-concept authN / authZ ??? production beta PROTOTYPE PRODUCTION type composition & pagination So I took some baby steps and successfully implemented a simple type and query, and learned a ton about how GraphQL works (especially in our choice of implementation). I thought a good next step was to implement enough types to load a logged-out version of the front page. This would let us test more complex interactions between types (like composition and pagination) without worrying yet about authentication and authorization.

Slide 30

Slide 30 text

streams games This is a screenshot I took of the logged-out front page a while back. If you squint you can just make out that we'll need to create "Stream" and "Game" types.

Slide 31

Slide 31 text

streams games users You'll have to trust me on this, but we can see that streams actually are composed from games and users.

Slide 32

Slide 32 text

type Query {
  users(ids: [ID!], names: [String!]): [User]
  streams(first: Int = 10): [Stream]
  games(first: Int = 10): [Game]
}

type User {
  id: ID
  name: String
}

type Stream {
  id: ID
  broadcaster: User
  game: Game
  name: String
  viewers: Int
}

type Game {
  id: ID
  name: String
  viewers: Int
}

This is getting to be a bit too much to fit on a slide, but here we have:
- types that resolve other types
- our first paginated queries

Most details have been omitted, but hopefully this gives you a good idea of how the data needs were weaseled out (i.e. client-driven development).

Slide 33

Slide 33 text

type composition pagination This step was enough to get familiarized with the intricacies of resolver type composition, and also pagination.

Slide 34

Slide 34 text

pagination Type composition was straightforward. But once we had to put pagination into the schema... well, this is where we first really had to think about pagination as an all-encompassing idea that spanned the whole of an API.

Slide 35

Slide 35 text

offset + limit cursors timestamp Most (but not all) of our services implemented pagination using offset and limit. Some services had begun to outgrow offset and limit and graduated to cursors. Other services used timestamp offsets, or something even more nebulous: some undocumented number of integer time units.

Slide 36

Slide 36 text

REST /v1 /v2 /v3 /v4 /v5 This manifested in our REST API organically as service owners wrote endpoints to expose their own data. This resulted in an interface that isn't really straightforward for users. Offset + limit in one place, cursors in another, and so on.

Slide 37

Slide 37 text

"a01s4==" We decided to standardize on cursor-based pagination for efficiency and ease of use. At the GraphQL layer, we translate opaque cursors until service owners can update their APIs. At that point, we just pass the data from the API consumer to the service.

Slide 38

Slide 38 text

type Query {
  games(first: Int, after: Cursor): GameConnection
}

query getFeaturedGames {
  games(first: 10, after: "a01s4==") {
    edges {
      cursor
      node { id, name, viewersCount }
    }
  }
}
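The cursor translation described for the GraphQL layer could look roughly like the following Go sketch. It assumes a legacy service that only understands offset and limit; the `offset:` cursor format and the `EncodeCursor`, `DecodeCursor`, and `OffsetLimit` names are hypothetical and chosen only for illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strconv"
	"strings"
)

const cursorPrefix = "offset:" // hypothetical internal cursor format

// EncodeCursor hides an offset inside an opaque, base64-encoded string.
func EncodeCursor(offset int) string {
	return base64.StdEncoding.EncodeToString([]byte(cursorPrefix + strconv.Itoa(offset)))
}

// DecodeCursor recovers the offset from a cursor produced by EncodeCursor.
func DecodeCursor(cursor string) (int, error) {
	raw, err := base64.StdEncoding.DecodeString(cursor)
	if err != nil {
		return 0, fmt.Errorf("invalid cursor: %w", err)
	}
	s := string(raw)
	if !strings.HasPrefix(s, cursorPrefix) {
		return 0, fmt.Errorf("invalid cursor")
	}
	offset, err := strconv.Atoi(strings.TrimPrefix(s, cursorPrefix))
	if err != nil || offset < 0 {
		return 0, fmt.Errorf("invalid cursor")
	}
	return offset, nil
}

// OffsetLimit translates GraphQL (first, after) arguments into the
// offset and limit that a legacy service expects.
func OffsetLimit(first int, after string) (offset, limit int, err error) {
	if after == "" {
		return 0, first, nil // no cursor: start from the beginning
	}
	last, err := DecodeCursor(after)
	if err != nil {
		return 0, 0, err
	}
	return last + 1, first, nil // resume just past the item the cursor names
}

func main() {
	cursor := EncodeCursor(9) // cursor of the tenth item (offset 9)
	offset, limit, err := OffsetLimit(10, cursor)
	fmt.Println(cursor, offset, limit, err)
}
```

Because clients only ever see the opaque string, a service can later switch to native cursors and the gateway can pass them through unchanged, without any change to the schema.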

Slide 39

Slide 39 text

standardize the interface GraphQL gave us the perfect place to enforce consistency in pagination schemes while allowing the services to adopt cursor-based pagination on their own roadmap. And pagination is only one example where we saw this kind of opportunity to standardize our API.  We've seen this standardization eventually work itself down to the backend.

Slide 40

Slide 40 text

relay cursor connection spec bit.ly/gqlPage We decided to standardize on the Relay cursor connection specification. I encourage you to look into it, because it allows for more flexible and efficient pagination than our previous offset + limit and page cursor implementations.
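For readers who have not seen the connection shape, here is a small Go sketch of forward pagination in the style of the Relay cursor connection specification (edges with a cursor and a node, plus page info). The `Game` type and the use of a plain index as the cursor are simplifying assumptions for this example; the specification also defines backward pagination fields that are left out here.

```go
package main

import (
	"fmt"
	"strconv"
)

// Game is a hypothetical node type.
type Game struct{ ID, Name string }

// GameEdge pairs a node with the cursor that identifies its position.
type GameEdge struct {
	Cursor string
	Node   Game
}

// PageInfo carries the forward-pagination fields of the connection model.
type PageInfo struct {
	HasNextPage bool
	EndCursor   string
}

type GameConnection struct {
	Edges    []GameEdge
	PageInfo PageInfo
}

// Paginate returns up to first games that come after the given cursor.
// For brevity the cursor is just the item's index; a real API would
// keep it opaque.
func Paginate(all []Game, first int, after string) (GameConnection, error) {
	if first < 0 {
		return GameConnection{}, fmt.Errorf("first must not be negative")
	}
	start := 0
	if after != "" {
		i, err := strconv.Atoi(after)
		if err != nil || i < 0 {
			return GameConnection{}, fmt.Errorf("invalid cursor %q", after)
		}
		start = i + 1
	}
	if start > len(all) {
		start = len(all)
	}
	end := start + first
	if end > len(all) {
		end = len(all)
	}
	var conn GameConnection
	for i := start; i < end; i++ {
		conn.Edges = append(conn.Edges, GameEdge{Cursor: strconv.Itoa(i), Node: all[i]})
	}
	if n := len(conn.Edges); n > 0 {
		conn.PageInfo.EndCursor = conn.Edges[n-1].Cursor
	}
	conn.PageInfo.HasNextPage = end < len(all)
	return conn, nil
}

func main() {
	games := []Game{{"1", "Chess"}, {"2", "Go"}, {"3", "Shogi"}}
	page, _ := Paginate(games, 2, "")
	fmt.Println(len(page.Edges), page.PageInfo.HasNextPage, page.PageInfo.EndCursor)
}
```

A client asks for the next page by passing `PageInfo.EndCursor` back as the `after` argument.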

Slide 41

Slide 41 text

summit 2016 summit 2017 proof-of-concept type composition & pagination authN / authZ ??? production beta PROTOTYPE PRODUCTION Until now we had punted on implementing authorization in our GraphQL API. We had some idea that it wouldn't quite fit our current auth system very well. We were finally at the point where we had to face the challenge head on.

Slide 42

Slide 42 text

"auth" Historically, Twitch has treated "auth" as a singular concept. But when we talk about auth, we really mean authentication (i.e. who is making the request) and authorization (i.e. what the authenticated user is allowed to do or see).

Slide 43

Slide 43 text

api gateway endpoints /users /streams /games /friends service service service Here's our rough architecture diagram again. Requests flow through the API gateway first. Endpoints aggregate data from many backend services.

Slide 44

Slide 44 text

authentication api gateway authorization authorization authorization authorization service service service endpoints The way our old API works is that we authenticate the request so we know who is making it. Each endpoint then requests authorization and aggregates data from one or more backend services.

Slide 45

Slide 45 text

authentication api gateway authorization authorization authorization authorization service service service endpoints Authorization tokens are sent to the services if the user was authorized. The services then double check that all the expected authorization data is there.

Slide 46

Slide 46 text

authentication api gateway authorization authorization authorization authorization service service service endpoints Likewise, unauthorized requests

Slide 47

Slide 47 text

authentication api gateway service service service endpoint /graphql We could try taking two passes at each query... one to analyze the authorization requirements and one to execute the query.

Slide 48

Slide 48 text

authentication api gateway service service service endpoint /graphql authorization?

Slide 49

Slide 49 text

authentication api gateway service service service endpoint /graphql authorization? But do we ship this massive authorization token to every service? Most services probably won't care about other service-related auth data.

Slide 50

Slide 50 text

authentication api gateway service service service endpoint /graphql authorization? authorization? authorization? authorization? We could try to make it so that every resolver calls the authorization service and we merge tokens together to send to the service. Everything we thought of to fit this style of authorization was probably doable, but seemed fairly hard.

Slide 51

Slide 51 text

Slide 52

Slide 52 text

Everything we thought of to fit this style of authorization was probably doable, but seemed fairly hard to implement right. And if we ever wanted to create a new API, we'd end up doing all this work over again. We'd already duplicated a bunch of code creating v1 through v5 of the REST API.

Slide 53

Slide 53 text

authentication api gateway service service service endpoint /graphql authorization authorization authorization We ended up taking Dan Schafer and team's advice to keep authorization logic out of the API layer. This approach greatly simplifies API development. At this point, we're just passing data back and forth.
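A minimal Go sketch of that split might look like this, assuming the gateway authenticates the request and then hands the caller's identity to the service, which owns the authorization rule. `Request`, `FriendsService`, and the "only you can see your friends" rule are all hypothetical stand-ins, not Twitch's actual services or policies.

```go
package main

import (
	"errors"
	"fmt"
)

// Request carries what the gateway learned during authentication.
// UserID is empty for anonymous requests.
type Request struct{ UserID string }

var ErrForbidden = errors.New("forbidden")

// FriendsService is a hypothetical backend. It owns the authorization
// rule: in this sketch, users may only list their own friends.
type FriendsService struct {
	friends map[string][]string
}

func (s *FriendsService) Friends(requesterID, userID string) ([]string, error) {
	if requesterID == "" || requesterID != userID {
		return nil, ErrForbidden
	}
	return s.friends[userID], nil
}

// ResolveFriends is the gateway-side resolver. It contains no
// authorization logic; it only passes identity and arguments along.
func ResolveFriends(req Request, svc *FriendsService, userID string) ([]string, error) {
	return svc.Friends(req.UserID, userID)
}

func main() {
	svc := &FriendsService{friends: map[string][]string{"finn": {"tony"}}}

	own, err := ResolveFriends(Request{UserID: "finn"}, svc, "finn")
	fmt.Println(own, err)

	_, err = ResolveFriends(Request{UserID: "mallory"}, svc, "finn")
	fmt.Println(err)
}
```

With this shape, a second API (REST, GraphQL, or anything else) can reuse the same service call without re-implementing the rule.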

Slide 54

Slide 54 text

separation of concerns push logic to services And I think that's where you want to be as an API gateway team. Passing data back and forth. It also encourages a good separation of concerns between the API and backends.

Slide 55

Slide 55 text

easy Sounds easy right?

Slide 56

Slide 56 text

:\ Unfortunately this meant we had to rethink our current approach.    It also means convincing a ton of service owners to move business logic into their service.  And we'd just spent some time convincing teams to use this centralized authorization system.

Slide 57

Slide 57 text

;) But it really feels like the right thing to do, and we're certain the end result will be better than what we would've ended up with otherwise.

Slide 58

Slide 58 text

keep it simple Don't make it more complicated than you can afford. If you have the logical abstractions, make use of them. If you don't, consider planning to make them. Try to keep the API layer as thin as possible, and let services own their business logic.

Slide 59

Slide 59 text

bit.ly/gqlAuth For a more in depth look, I recommend watching Dan Schafer's 2016 ReactEurope talk on GraphQL.

Slide 60

Slide 60 text

summit 2016 summit 2017 proof-of-concept type composition & pagination authN / authZ production beta PROTOTYPE PRODUCTION ??? By now we had a really compelling prototype. I was introducing GraphQL to different teams, and they were getting hyped. We went from (what seemed to me) wondering if anyone would care to "we must have GraphQL everything" immediately.

Slide 61

Slide 61 text

Time Contributors Me 50+ contributors Until about 5 or 6 months ago, I was the only one adding types and functionality to the GraphQL API. We got lucky with finding support. Our web and desktop client team was looking to rewrite the website as a major company initiative. GraphQL was the perfect upgrade to these clients' data fetching components. Suddenly we had a bunch of contributions to the API from teams all over the company. I was caught way off guard.

Slide 62

Slide 62 text

Slide 63

Slide 63 text

what i wish i knew: scaling contributions I have some words of advice for my past self, and hopefully for some of you who are currently prototyping GraphQL APIs and showing them around to colleagues, lest you find yourself successful.

Slide 64

Slide 64 text

document good practices It's impossible to give advice on what to do specific to your backend implementation. There are so many tradeoffs to make that largely depend on the scale you operate. What works for one company may not work for another. As you figure out what works for you, write it down.  Explain why you're taking the current approach.  When the next developer picks up the codebase they'll have a great reference to work off of, and won't be so lost.    Not everyone gets the advantage of months of proof-of-concept work to figure it out.

Slide 65

Slide 65 text

iterate on a styleguide Create a style guide, so as you discover best practices for your GraphQL implementation you have a place to iterate on. We're continuously working on ours, as we find things that work well and things that break down in certain situations.

Slide 66

Slide 66 text

write linters Take your style guidelines and codify them in the form of linters. This will automate away mechanical feedback in pull requests, and help get new developers up to speed without requiring your time.
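To make the idea of codifying a style guide concrete, here is a tiny Go sketch of a schema linter. The two rules (PascalCase type names, camelCase field names) and the flat `Field` input are assumptions picked for illustration; a real linter would walk a parsed schema and enforce whatever the team's own style guide says.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Field is a simplified stand-in for one field of a parsed schema type.
type Field struct{ Type, Name string }

// firstRune returns the first rune of s, or 0 for an empty string.
func firstRune(s string) rune {
	for _, r := range s {
		return r
	}
	return 0
}

// Lint checks two hypothetical style rules and returns one message per
// violation: type names start upper-case, and field names start
// lower-case and contain no underscores.
func Lint(fields []Field) []string {
	var problems []string
	for _, f := range fields {
		if !unicode.IsUpper(firstRune(f.Type)) {
			problems = append(problems, fmt.Sprintf("type %q should be PascalCase", f.Type))
		}
		if !unicode.IsLower(firstRune(f.Name)) || strings.Contains(f.Name, "_") {
			problems = append(problems, fmt.Sprintf("field %s.%s should be camelCase", f.Type, f.Name))
		}
	}
	return problems
}

func main() {
	problems := Lint([]Field{
		{Type: "User", Name: "id"},
		{Type: "User", Name: "display_name"},
		{Type: "stream", Name: "Viewers"},
	})
	for _, p := range problems {
		fmt.Println(p)
	}
}
```

Running a check like this in continuous integration turns style feedback into a failing build instead of a review comment.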

Slide 67

Slide 67 text

boilerplate Fight boilerplate. In my experience writing a GraphQL server in Golang, there's a ton of boilerplate involved in getting some types working in the API.

Slide 68

Slide 68 text

Slide 69

Slide 69 text

Slide 70

Slide 70 text

Slide 71

Slide 71 text

generate code Luckily, Go is great for code generation. Aggressively generate code from the schema, according to the best practices you've documented. Keep iterating on code generation. The dream is to have full schema-to-resolver code generation. We're still only generating resolver skeletons, but hope to convince teams to switch to strongly typed APIs and well-understood standards for their services to make this a reality.
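As a hedged sketch of what generating resolver skeletons can look like, the following Go program renders method stubs from a type description using the standard `text/template` package. The `TypeDef` and `Field` inputs, the template, and the `exportName` helper are assumptions for this example; the talk does not show Twitch's actual generator.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Field and TypeDef are a simplified, hypothetical description of one
// GraphQL type, as a generator might derive it from the schema.
type Field struct{ Name, GoType string }

type TypeDef struct {
	Name   string
	Fields []Field
}

// exportName turns a GraphQL field name into an exported Go method name.
func exportName(s string) string {
	if s == "" {
		return s
	}
	if s == "id" {
		return "ID" // follow Go's initialism convention
	}
	return strings.ToUpper(s[:1]) + s[1:]
}

var resolverTmpl = template.Must(template.New("resolver").
	Funcs(template.FuncMap{"export": exportName}).
	Parse(`// Generated resolver skeleton; fill in the TODOs.
type {{.Name}}Resolver struct{}
{{range .Fields}}
func (r *{{$.Name}}Resolver) {{export .Name}}() ({{.GoType}}, error) {
	panic("TODO: load {{.Name}} from the backend service")
}
{{end}}`))

// Generate renders the skeleton source for one type.
func Generate(def TypeDef) (string, error) {
	var sb strings.Builder
	if err := resolverTmpl.Execute(&sb, def); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	src, err := Generate(TypeDef{
		Name: "User",
		Fields: []Field{
			{Name: "id", GoType: "graphql.ID"},
			{Name: "name", GoType: "string"},
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(src)
}
```

The stubs still need a human to fill in the data loading, which matches the "resolver skeletons" stage described above.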

Slide 72

Slide 72 text

type User {
  id: ID!
  name: String!
}

func (r *UserResolver) ID() (graphql.ID, error) {
	user, err := r.loadFn()
	if err != nil {
		return graphql.ID(""), err
	}
	if user == nil {
		return graphql.ID(""), errors.New("not found")
	}
	return graphql.ID(user.ID), nil
}

func (r *UserResolver) Name() (string, error) {
	user, err := r.loadFn()
	if err != nil {
		return "", err
	}
	if user == nil {
		return "", errors.New("not found")
	}
	return user.Name, nil
}

type UserResolver struct {
	loadFn *dataloader.Thunk
}

Right now we're at the point where we can just about generate everything in simple resolvers like this except for the part where you need to know what the service's data looks like. My hope is that we can convince teams to go beyond their REST JSON APIs and adopt some kind of IDL of their own. Then, we could generate all resolver code just by knowing the mapping of GraphQL schema IDL to backend service IDL.

Slide 73

Slide 73 text

automate knowledge share I'm going to make a bold statement and claim that scaling out GraphQL knowledge to colleagues is an order of magnitude harder than any of the technical challenges in building a GraphQL API. Automate knowledge share.

Slide 74

Slide 74 text

– African Proverb “If you want to go fast, go alone. If you want to go far, go together.”

Slide 75

Slide 75 text

go.twitch.tv I encourage you to check out go.twitch.tv and compare the experience with twitch.tv. It's currently in beta.

Slide 76

Slide 76 text

what i wish i knew: nullability & errors BONUS

Slide 77

Slide 77 text

summit 2016 summit 2017 proof-of-concept type composition & pagination authN / authZ PROTOTYPE PRODUCTION ??? production beta Since running a beta version of the website against production traffic, we've discovered some sticky points around the API consumer's contract with nullable types and around error usage, which I'd like to share so you can avoid them.

Slide 78

Slide 78 text

nullability I made a pretty subtle mistake in our early prototype with respect to nullability. We didn't catch it until we were ramping up to speed on our new application.

Slide 79

Slide 79 text

query getNullabilityFail { user(name: "does not exist") { followers { nodes { id, name } } } } type Query { user(name: String): User } Let's say we have a query to find a user by their name. Semantically speaking, if we specify a name that is not associated with any user, like "does not exist", we should expect to receive null for the user. That's exactly what our schema says should happen based on the types.

Slide 80

Slide 80 text

query getNullabilityFail { user(name: "does not exist") { followers { nodes { id, name } } } } { "data": { "user": { "followers": { "nodes": [] } } } } But when we execute the query, we get back a different response than we expect.

Slide 81

Slide 81 text

query getNullabilityFail { user(name: "does not exist") { followers { nodes { id, name } } } } { "data": { "user": { "followers": { "nodes": [] } } } } X This isn't what we want, because it's not how we defined the schema. Early on, I implemented resolvers to lazy-load data when the fields are resolved. This lets us skip a call to the user service if some other service also takes a user name. However, this doesn't account for invalid or missing user names.

Slide 82

Slide 82 text

query getNullabilityFail { user(name: "does not exist") { followers { nodes { id, name } } } } { "data": { "user": null } } Instead we want to load data at the node level. This allows us to adhere to the schema's semantics.
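Loading at the node level can be sketched in Go as follows. The example assumes a GraphQL library that renders a nil resolver pointer for a nullable field as `null`, and it uses a hypothetical in-memory `UserService`; neither detail comes from the talk.

```go
package main

import "fmt"

type User struct{ ID, Name string }

// UserService is a hypothetical backend lookup keyed by user name.
// ByName returns (nil, nil) when no user has that name.
type UserService map[string]*User

func (s UserService) ByName(name string) (*User, error) {
	return s[name], nil
}

// UserResolver always wraps a user that is known to exist.
type UserResolver struct{ user *User }

func (r *UserResolver) Name() string { return r.user.Name }

// ResolveUser loads the user before building the resolver. A missing
// user yields a nil resolver, so the field resolves to null and child
// fields such as followers are never visited.
func ResolveUser(svc UserService, name string) (*UserResolver, error) {
	u, err := svc.ByName(name)
	if err != nil {
		return nil, err
	}
	if u == nil {
		return nil, nil // "user": null, as the schema promises
	}
	return &UserResolver{user: u}, nil
}

func main() {
	svc := UserService{"finn": {ID: "1", Name: "finn"}}

	missing, err := ResolveUser(svc, "does not exist")
	fmt.Println(missing == nil, err)

	found, _ := ResolveUser(svc, "finn")
	fmt.Println(found.Name())
}
```

The trade-off is one extra call to the user service up front in exchange for responses that match the schema's nullability.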

Slide 83

Slide 83 text

errors Do the minimum to make errors useful. Assume no one will inspect errors.

Slide 84

Slide 84 text

be lazy Do the minimum to make errors useful.

Slide 85

Slide 85 text

assume Assume no one will take the time to inspect errors. Many popular clients default to throwing the entire response away if an error occurs.

Slide 86

Slide 86 text

mutations bit.ly/gqlMutate Validation errors have a nice home on mutation payload objects. Reserve the errors array for service-type issues where you're comfortable throwing everything away.
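One way to picture that split is the Go sketch below: validation problems come back as data on the mutation payload, while a failing backend surfaces as a returned error that a GraphQL server would place in the top-level errors array. `UpdateNamePayload`, `UserError`, the `save` callback, and the 25-character limit are hypothetical names and rules invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// UserError describes a validation problem the client can show to a user.
type UserError struct{ Field, Message string }

// UpdateNamePayload is a hypothetical mutation payload. Name is nil
// when validation failed and nothing was saved.
type UpdateNamePayload struct {
	Name   *string
	Errors []UserError
}

var ErrServiceUnavailable = errors.New("user service unavailable")

const maxNameLen = 25 // hypothetical limit

// UpdateName validates the input, then asks the backend to save it.
func UpdateName(save func(name string) error, name string) (*UpdateNamePayload, error) {
	if name == "" {
		return &UpdateNamePayload{Errors: []UserError{{Field: "name", Message: "name must not be empty"}}}, nil
	}
	if len(name) > maxNameLen {
		return &UpdateNamePayload{Errors: []UserError{{Field: "name", Message: "name is too long"}}}, nil
	}
	if err := save(name); err != nil {
		// Service-type failure: safe to discard the whole response.
		return nil, fmt.Errorf("update name: %w", err)
	}
	return &UpdateNamePayload{Name: &name}, nil
}

func main() {
	ok := func(string) error { return nil }
	down := func(string) error { return ErrServiceUnavailable }

	p, err := UpdateName(ok, "")
	fmt.Println(p.Errors, err)

	_, err = UpdateName(down, "finn")
	fmt.Println(err)
}
```

Clients that discard the response on any top-level error still receive validation messages, because those travel in the payload.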