Twitch's GraphQL Transformation (with notes)

Our GraphQL Transformation

tonyghita tonyghita @tonyghita My name is Tony Ghita. Here are my GitHub, Twitch, and Twitter handles respectively. I'm an engineer at Twitch on the API platform team, leading GraphQL API implementation efforts. I noticed some of the other speakers shared some pictures of their pets, so before we start I'd like to share some pictures of my rebellious teenage puppy, Finn.

This is my dog Finn. Sometimes he's pretty mischievous.

Most times he's just a good boy.

Our GraphQL Transformation Back to our regularly scheduled programming.

social video For those of you not familiar with Twitch:
- it's a social video platform - for gamers

This is Twitch. And this is our newest web app.
We're currently ramping up traffic to this app as I speak, and we hope to be serving it to 90% of users by the end of tomorrow. Our website has been completely rewritten over just the past 6 months. It's powered by our GraphQL API, written in TypeScript, uses Apollo Client to cache GraphQL requests, and renders using React. I'm going to talk about how we got here and what I wish I knew at the start.

Let's talk about Twitch on GraphQL.

paradigm shift Adding GraphQL to our API ecosystem required us
to make a hard paradigm shift. For a lot of us, REST has been synonymous with APIs for our entire careers.

GET /users/:id/friends
    query getFriends {
      currentUser {
        friends {
          name
        }
      }
    }
Making the leap from REST to GraphQL forced us to consider our systems from a different perspective. We started thinking in terms of data and relationships between data instead of endpoints.

missing unit test This new perspective changed how we thought about aspects of our service-oriented architecture. Adding a GraphQL API to our ecosystem was like adding a missing unit test. The different perspective exposed missed opportunities for a more scalable architecture.

improved services Adopting GraphQL has acted as a guiding force to a more standardized and flexible service ecosystem. We've been pushed to reconsider past decisions around service aspects like authorization and pagination, and have come out with much improved systems overall (much like this Mario who has become an apex predator).

I'm hoping to share with you our journey adopting GraphQL and some things I wish I knew when I was starting out. I hope this encourages you to begin your own journey, or to continue on if you've already started.

600+ nerds I'm one of 600 engineers. Something like 2x
growth over the past year, lots of new faces.

api gateway /users /streams /games /friends service service service endpoints
And our architecture looks something like this. Requests flow through the API gateway first. Endpoints aggregate data from many backend services. My team works on that part. Hi team!

api gateway /users /streams /games /friends service service service endpoints
100+ We have a lot of backend services. Probably just over 100 at this point.

api gateway /users /streams /games /friends service service MONOLITH endpoints
100+ We also have one monolith we've been migrating away from. Hopefully this sounds like a familiar architecture to you.

100+ Our API gateway is written in Go. Those are Go's mascots, gophers.

100+ Most of our backend services are also written in Go.

REST, JSON Our API was a well-worn JSON REST API,
like I imagine most companies have been implementing for the last decade or so.

summit 2016 summit 2017 proof-of-concept type composition & pagination authN / authZ ??? production beta PROTOTYPE PRODUCTION Here's a rough timeline of the steps we took to build a GraphQL API. We've gone from knowing next to nothing before GraphQL Summit 2016 to creating our own GraphQL API powering a majority of Twitch API traffic today. I'm going to take you through the steps we took to get there and what I learned along the way.

Let's start at the beginning of our journey. This is
a picture of last year's GraphQL Summit.

summit 2017 proof-of-concept type composition & pagination authN / authZ ??? live beta PROTOTYPE PRODUCTION summit 2016 We came to last year's GraphQL Summit with just a vague idea of what GraphQL was.

As we listened to the talks, we realized that GraphQL
solved many of the issues we were experiencing at the API gateway. Issues like versioning, over-fetching data, providing a good developer experience. We came away with all sorts of ideas on how we could use GraphQL to make our lives as API gateway engineers easier. And we left inspired to spend some time building a proof-of-concept to prove that it could work within our existing systems.

summit 2016 summit 2017 type composition & pagination authN / authZ ??? production beta PROTOTYPE PRODUCTION proof-of-concept So we took our GraphQL inspiration and set out to build a proof-of-concept to see how it would work in our system. Worth noting: at this point we thought we would use GraphQL to make it easier for API gateway contributors to write REST endpoints.

– Cruise Industry News Quarterly, 1999 “... if Henry Ford
canvassed people on whether or not he should build a motor car, they'd probably tell him what they really wanted was a faster horse” In this case, we started approaching the adoption of GraphQL in search of a faster horse, instead of a motor car. Side note: famously misattributed to Henry Ford, no proof he's actually ever said this.

    type Query {
      users(ids: [ID!], names: [String!]): [User]
    }
    type User {
      id: ID
      name: String
    }
The first type in the prototype was something like this, with a single type and single query, fetching data from a single service. Nothing complex.

resolve types & fields load data efficiently integrate services This simple prototype was enough to learn:
- how to resolve types and fields in a GraphQL API
- how to load data efficiently (dataloader!)
- how to integrate our backend services with a GraphQL API
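
To make the "load data efficiently" point concrete, here is a minimal sketch of the batching idea behind dataloader. It is not the library used in the talk: `UserLoader`, `BatchFunc`, and the in-memory fetch function are hypothetical stand-ins, and the sketch is single-goroutine only.

```go
package main

import "fmt"

// User is a hypothetical record returned by a user service.
type User struct {
	ID   string
	Name string
}

// BatchFunc fetches many users in one call. It stands in for a real service client.
type BatchFunc func(ids []string) map[string]*User

// UserLoader queues requested IDs and resolves them with one batched fetch.
type UserLoader struct {
	fetch   BatchFunc
	pending []string
	seen    map[string]bool
	cache   map[string]*User
}

func NewUserLoader(fetch BatchFunc) *UserLoader {
	return &UserLoader{fetch: fetch, seen: map[string]bool{}, cache: map[string]*User{}}
}

// Load queues an ID and returns a thunk. Nothing is fetched until a thunk runs.
func (l *UserLoader) Load(id string) func() *User {
	if !l.seen[id] {
		l.seen[id] = true
		l.pending = append(l.pending, id)
	}
	return func() *User {
		l.dispatch()
		return l.cache[id] // nil when the service did not return this ID
	}
}

// dispatch sends every queued ID to the service in a single call.
func (l *UserLoader) dispatch() {
	if len(l.pending) == 0 {
		return
	}
	for id, u := range l.fetch(l.pending) {
		l.cache[id] = u
	}
	l.pending = nil
}

func main() {
	calls := 0
	loader := NewUserLoader(func(ids []string) map[string]*User {
		calls++
		out := make(map[string]*User, len(ids))
		for _, id := range ids {
			out[id] = &User{ID: id, Name: "user-" + id}
		}
		return out
	})

	// Two fields ask for users before either result is needed.
	first, second := loader.Load("1"), loader.Load("2")
	a, b := first(), second()
	fmt.Println(a.Name, b.Name, "service calls:", calls)
}
```

Both users are served by one call to the fetch function, which is the property that keeps a nested query from turning into one backend request per field.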

summit 2016 summit 2017 proof-of-concept authN / authZ ??? production beta PROTOTYPE PRODUCTION type composition & pagination So I took some baby steps and successfully implemented a simple type and query, and learned a ton about how GraphQL works (especially in our choice of implementation). I thought a good next step was to implement enough types to load a logged-out version of the front page. This would let us test more complex interactions between types (like composition and pagination) without worrying yet about authentication and authorization.

streams games This is a screenshot I took of the
logged-out front page a while back. If you squint you can just make out that we'll need to create "Stream" and "Game" types.

streams games users You'll have to trust me on this,
but we can see that streams actually are composed from games and users.

    type Query {
      users(ids: [ID!], names: [String!]): [User]
      streams(first: Int = 10): [Stream]
      games(first: Int = 10): [Game]
    }
    type User {
      id: ID
      name: String
    }
    type Stream {
      id: ID
      broadcaster: User
      game: Game
      name: String
      viewers: Int
    }
    type Game {
      id: ID
      name: String
      viewers: Int
    }
This is getting to be a bit too much to fit on a slide, but here we have
- types that resolve other types
- our first paginated queries
Most details have been omitted, but hopefully this gives you a good idea of how the data needs were weaseled out (i.e. client-driven development).

type composition pagination This step was enough to get familiarized
with the intricacies of resolver type composition, and also pagination.
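
As an illustration of type composition, here is a small sketch of a Stream resolver that resolves its `broadcaster` and `game` fields by handing off to User and Game resolvers. The `backend` struct and its maps are hypothetical stand-ins for service clients, and the resolver shapes are simplified compared to a real GraphQL server library.

```go
package main

import "fmt"

// Plain records as hypothetical backend services might return them.
type user struct{ ID, Name string }
type game struct{ ID, Name string }
type stream struct{ ID, Name, BroadcasterID, GameID string }

// backend stands in for the user and game service clients.
type backend struct {
	users map[string]user
	games map[string]game
}

type UserResolver struct{ u user }

func (r *UserResolver) Name() string { return r.u.Name }

type GameResolver struct{ g game }

func (r *GameResolver) Name() string { return r.g.Name }

// StreamResolver resolves scalar fields from its own record and
// composite fields by handing off to other resolvers.
type StreamResolver struct {
	s  stream
	be *backend
}

func (r *StreamResolver) Name() string { return r.s.Name }

// Broadcaster resolves Stream.broadcaster as a User.
func (r *StreamResolver) Broadcaster() *UserResolver {
	u, ok := r.be.users[r.s.BroadcasterID]
	if !ok {
		return nil // nullable field: an unknown broadcaster resolves to null
	}
	return &UserResolver{u: u}
}

// Game resolves Stream.game as a Game.
func (r *StreamResolver) Game() *GameResolver {
	g, ok := r.be.games[r.s.GameID]
	if !ok {
		return nil
	}
	return &GameResolver{g: g}
}

func main() {
	be := &backend{
		users: map[string]user{"u1": {ID: "u1", Name: "finn"}},
		games: map[string]game{"g1": {ID: "g1", Name: "Chess"}},
	}
	sr := &StreamResolver{
		s:  stream{ID: "s1", Name: "Morning stream", BroadcasterID: "u1", GameID: "g1"},
		be: be,
	}
	fmt.Println(sr.Name(), "by", sr.Broadcaster().Name(), "playing", sr.Game().Name())
}
```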

pagination Type composition was straightforward. But once we had to put pagination into the schema... well, this is where we first really had to think about pagination as an all-encompassing idea that spanned the whole of an API.

offset + limit cursors timestamp Most (but not all) of our services had pagination implemented using offset and limit. Some services had begun to outgrow offset + limit and graduated to cursors. Other services used timestamp offsets, or something even more nebulous: an undocumented number of integer time units.

REST /v1 /v2 /v3 /v4 /v5 This manifested in our REST API organically as service owners wrote endpoints to expose their own data. This resulted in an interface that isn't really straightforward for users. Offset + limit in one place, cursors in another, etc.

"a01s4==" We decided to standardize on cursor based pagination for
efficiency and ease of use. At the GraphQL layer, we translate opaque cursors until service owners can update their APIs. At that point, we just pass the data from the API consumer to the service.

    type Query {
      games(first: Int, after: Cursor): GameConnection
    }
    query getFeaturedGames {
      games(first: 10, after: "a01s4==") {
        edges {
          cursor
          node { id, name, viewersCount }
        }
      }
    }

standardize the interface GraphQL gave us the perfect place to
enforce consistency in pagination schemes while allowing the services to adopt cursor-based pagination on their own roadmap. And pagination is only one example where we saw this kind of opportunity to standardize our API.  We've seen this standardization eventually work itself down to the backend.
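
The cursor translation done at the GraphQL layer could look roughly like the following sketch. The cursor format (`offset:<n>`, base64-encoded) and the function names are assumptions made for illustration, not Twitch's actual encoding. The idea is that each item's cursor encodes its position, so `after: <cursor>` maps to an offset one past that position for services that still speak offset + limit.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// cursorPrefix is a hypothetical internal format hidden behind base64.
const cursorPrefix = "offset:"

var errBadCursor = errors.New("invalid cursor")

// encodeCursor wraps an item's position in an opaque cursor.
func encodeCursor(position int) string {
	return base64.StdEncoding.EncodeToString([]byte(cursorPrefix + strconv.Itoa(position)))
}

// decodeCursor recovers the position from a cursor produced by encodeCursor.
func decodeCursor(cursor string) (int, error) {
	raw, err := base64.StdEncoding.DecodeString(cursor)
	if err != nil {
		return 0, errBadCursor
	}
	s := string(raw)
	if !strings.HasPrefix(s, cursorPrefix) {
		return 0, errBadCursor
	}
	position, err := strconv.Atoi(strings.TrimPrefix(s, cursorPrefix))
	if err != nil || position < 0 {
		return 0, errBadCursor
	}
	return position, nil
}

// toOffsetLimit translates GraphQL (first, after) arguments into the
// offset + limit pair an older backend service expects.
func toOffsetLimit(first int, after string) (offset, limit int, err error) {
	if after == "" {
		return 0, first, nil // no cursor: start from the beginning
	}
	position, err := decodeCursor(after)
	if err != nil {
		return 0, 0, err
	}
	return position + 1, first, nil
}

func main() {
	cursor := encodeCursor(9) // cursor of the item at position 9
	offset, limit, err := toOffsetLimit(10, cursor)
	fmt.Println(cursor, offset, limit, err)
}
```

Because clients only ever see the opaque string, a service can later switch to native cursors without any change to the schema or to client queries.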

relay cursor connection spec bit.ly/gqlPage We decided to standardize on the Relay Cursor Connections Specification. I encourage you to look into it, because it allows for more flexible and efficient pagination than our previous offset + limit and page cursor implementations.
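
For readers who have not seen the connection shape, here is a sketch of building a `GameConnection` with edges, per-edge cursors, and page info from an in-memory list. The types carry only a subset of what the Relay specification defines (for example, `hasPreviousPage` and `startCursor` are left out), and `cursorFor` and `paginate` are hypothetical helper names.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

type Game struct{ ID, Name string }

// GameEdge pairs a node with the cursor that identifies its position.
type GameEdge struct {
	Cursor string
	Node   Game
}

// PageInfo is a subset of the page info fields in the Relay spec.
type PageInfo struct {
	HasNextPage bool
	EndCursor   string
}

type GameConnection struct {
	Edges    []GameEdge
	PageInfo PageInfo
}

// cursorFor derives an opaque cursor from the game's ID (an assumed format).
func cursorFor(g Game) string {
	return base64.StdEncoding.EncodeToString([]byte("game:" + g.ID))
}

// paginate returns up to `first` games that come after the `after` cursor.
// An empty or unknown cursor starts from the beginning in this sketch.
func paginate(all []Game, first int, after string) GameConnection {
	if first < 0 {
		first = 0
	}
	start := 0
	if after != "" {
		for i, g := range all {
			if cursorFor(g) == after {
				start = i + 1
				break
			}
		}
	}
	end := start + first
	if end > len(all) {
		end = len(all)
	}
	conn := GameConnection{Edges: []GameEdge{}}
	for _, g := range all[start:end] {
		conn.Edges = append(conn.Edges, GameEdge{Cursor: cursorFor(g), Node: g})
	}
	conn.PageInfo.HasNextPage = end < len(all)
	if n := len(conn.Edges); n > 0 {
		conn.PageInfo.EndCursor = conn.Edges[n-1].Cursor
	}
	return conn
}

func main() {
	all := []Game{
		{ID: "1", Name: "Chess"},
		{ID: "2", Name: "Go"},
		{ID: "3", Name: "Poker"},
	}
	page := paginate(all, 2, "")
	for _, e := range page.Edges {
		fmt.Println(e.Cursor, e.Node.Name)
	}
	fmt.Println("hasNextPage:", page.PageInfo.HasNextPage)
}
```

A client fetches the next page by passing the previous page's `EndCursor` as `after`.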

composition & pagination authN / authZ ??? production beta PROTOTYPE PRODUCTION Until now we punted on implementing authorization in our GraphQL API. We had some idea that it wouldn't quite fit our current auth system very well. We were finally at the point where we had to face the challenge head on.

"auth" Historically, Twitch has treated "auth" as singular concept. But
when we talk about auth, we really mean authentication (i.e. who is making the request) and authorization (i.e. what is the authenticated user allowed to do or see).

api gateway endpoints /users /streams /games /friends service service service
Here's our rough architecture diagram again. Requests flow through the API gateway first. Endpoints aggregate data from many backend services.

authentication api gateway authorization authorization authorization authorization service service service endpoints The way our old API works is that we first authenticate the request, so we know who is making it. Each endpoint then requests authorization and aggregates data from one or more backend services.

endpoints Authorization tokens are sent to the services if the user was authorized. The services then double check that all the expected authorization data is there.

endpoints Likewise, unauthorized requests

authentication api gateway service service service endpoint /graphql We could
try taking two passes at each query... one to analyze the authorization requirements and one to execute the query.

authentication api gateway service service service endpoint /graphql authorization? But
do we ship this massive authorization token to every service? Most services probably won't care about other service-related auth data.

authentication api gateway service service service endpoint /graphql authorization? authorization? authorization? authorization? We could try to make it so that every resolver calls the authorization service and we merge tokens together to send to the service.

Everything we thought of to fit this style of authorization was probably doable, but seemed fairly hard to implement right. And if we ever wanted to create a new API, we'd end up doing all this work over again. We'd already duplicated a bunch of code creating v1-5 of the REST API.

authentication api gateway service service service endpoint /graphql authorization authorization authorization We ended up taking Dan Schafer and team's advice to keep authorization logic out of the API layer. This approach greatly simplifies API development. At this point, we're just passing data back and forth.

separation of concerns push logic to services And I think
that's where you want to be as an API gateway team. Passing data back and forth. It also encourages a good separation of concerns between the API and backends.

easy Sounds easy right?

:\ Unfortunately this meant we had to rethink our current
approach.    It also means convincing a ton of service owners to move business logic into their service.  And we'd just spent some time convincing teams to use this centralized authorization system.

;) But it really feels like the right thing to
do, and we're certain the end result will be better than what we would've ended up with otherwise.

keep it simple Don't make it more complicated than you can afford. If you have the logical abstractions, make use of them. If you don't, consider planning to make them. Try to keep the API layer as thin as possible, and let services own their business logic.

bit.ly/gqlAuth For a more in depth look, I recommend watching
Dan Schafer's 2016 ReactEurope talk on GraphQL.

composition & pagination authN / authZ production beta PROTOTYPE PRODUCTION ??? By now we had a really compelling prototype. I was introducing GraphQL to different teams, and they were getting hyped. We went from (what seemed to me) wondering if anyone would care to "we must have GraphQL everything" immediately.

Time Contributors Me 50+ contributors Until about 5 or 6 months ago, I was the only one adding types and functionality to the GraphQL API. We got lucky with finding support. Our web and desktop client team was looking to rewrite the website as a major company initiative. GraphQL was the perfect upgrade to these clients' data fetching components. Suddenly we had a bunch of contributions to the API from teams all over the company. I was caught way off guard.

what i wish i knew: scaling contributions I have some words of advice for my past self, and hopefully some of you who are currently prototyping GraphQL APIs and showing it around to colleagues, lest you find yourself successful.

document good practices It's impossible to give advice on what
to do specific to your backend implementation. There are so many tradeoffs to make that largely depend on the scale you operate. What works for one company may not work for another. As you figure out what works for you, write it down.  Explain why you're taking the current approach.  When the next developer picks up the codebase they'll have a great reference to work off of, and won't be so lost.    Not everyone gets the advantage of months of proof-of-concept work to figure it out.

iterate on a styleguide Create a style guide, so that as you discover best practices for your GraphQL implementation you have a place to record and iterate on them. We're continuously working on ours, as we find things that work well and things that break down in certain situations.

write linters Take your style guidelines and codify them in
the form of linters. This will automate away mechanical feedback in pull requests, and help get new developers up to speed without requiring your time.

boilerplate Fight boilerplate. In my experience writing a GraphQL server
in Golang, there's a ton of boilerplate involved in getting some types working in the API.

    type User {
      id: ID!
      name: String!
    }

    func (r *UserResolver) ID() (graphql.ID, error) {
        user, err := r.loadFn()
        if err != nil {
            return graphql.ID(""), err
        }
        if user == nil {
            return graphql.ID(""), errors.New("not found")
        }
        return graphql.ID(user.ID), nil
    }

    func (r *UserResolver) Name() (string, error) {
        user, err := r.loadFn()
        if err != nil {
            return "", err
        }
        if user == nil {
            return "", errors.New("not found")
        }
        return user.Name, nil
    }

    type UserResolver struct {
        loadFn *dataloader.Thunk
    }

generate code Luckily, Go is great for code generation. Aggressively generate code from the schema, according to the best practices you've documented. Keep iterating on code generation. The dream is to have full schema-to-resolver code generation. We're still only generating resolver skeletons, but hope to convince teams to switch to strongly typed APIs and well-understood standards for their services to make this a reality.

Right now we're at the point where we can just about generate everything in simple resolvers like the UserResolver shown above, except for the part where you need to know what the service's data looks like. My hope is that we can convince teams to go beyond their REST JSON APIs and adopt some kind of IDL of their own. Then, we could generate all resolver code just by knowing the mapping of GraphQL schema IDL to backend service IDL.

automate knowledge share I'm going to make a bold statement and claim that scaling out GraphQL knowledge to colleagues is an order of magnitude harder than any of the technical challenges in building a GraphQL API. Automate knowledge share.

– African Proverb “If you want to go fast, go
alone. If you want to go far, go together.”

go.twitch.tv I encourage you to check out go.twitch.tv and compare the experience with twitch.tv. It's currently in beta.

what i wish i knew: nullability & errors BONUS

composition & pagination authN / authZ PROTOTYPE PRODUCTION ??? production beta Since running a beta version of the website against production traffic, we've discovered some sticky points around the API consumer's contract with nullable types and around error usage that I'd like to share so you can avoid them.

nullability I made a pretty subtle mistake in our early
prototype with respect to nullability. We didn't catch it until we were ramping up to speed on our new application.

    query getNullabilityFail {
      user(name: "does not exist") {
        followers {
          nodes { id, name }
        }
      }
    }
    type Query {
      user(name: String): User
    }
Let's say we have a query to find a user by their name. Semantically speaking, if we specify a name that is not associated with any user, like "does not exist", we should expect to receive null for the user. That's exactly what our schema says should happen based on the types.

    {
      "data": {
        "user": {
          "followers": { "nodes": [] }
        }
      }
    }
But when we execute the query, we get back a different response than we expect.

    { "data": { "user": { "followers": { "nodes": [] } } } }   X
This isn't what we want, because it's not how we defined the schema. Early on, I implemented resolvers to lazy-load data when the fields are resolved. This lets us skip a call to the user service if some other service also takes the user name. However, this doesn't account for invalid or missing user names.

    { "data": { "user": null } }
Instead we want to load data at the node level. This allows us to adhere to the schema's semantics.
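
A small sketch of loading at the node level: the query resolver looks the user up before any child field is resolved, so an unknown name produces a nil resolver, which a GraphQL server would serialize as `"user": null`. `QueryResolver` and its in-memory map are hypothetical stand-ins for a call to the user service.

```go
package main

import "fmt"

type User struct{ ID, Name string }

// UserResolver wraps a user that has already been loaded.
type UserResolver struct{ user *User }

func (r *UserResolver) Name() string { return r.user.Name }

// QueryResolver holds a stand-in for the user service, keyed by user name.
type QueryResolver struct {
	usersByName map[string]*User
}

// User loads the node eagerly. If no user has this name, it returns a nil
// resolver so the nullable field resolves to null instead of an empty shell.
func (q *QueryResolver) User(name string) (*UserResolver, error) {
	u, ok := q.usersByName[name]
	if !ok {
		return nil, nil
	}
	return &UserResolver{user: u}, nil
}

func main() {
	q := &QueryResolver{usersByName: map[string]*User{"finn": {ID: "1", Name: "finn"}}}

	missing, err := q.User("does not exist")
	fmt.Println(missing == nil, err)

	found, err := q.User("finn")
	fmt.Println(found.Name(), err)
}
```

The tradeoff is one extra service call per node in exchange for responses that match the schema's nullability.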

errors Do the minimum to make errors useful. Assume no one will inspect errors.

be lazy Do the minimum to make errors useful.

assume Assume no one will take the time to inspect errors. Many popular clients default to throwing the entire response away if an error occurs.

mutations bit.ly/gqlMutate Validation errors have a nice home on mutation
payload objects. Reserve the errors array for service-type issues where you're comfortable throwing everything away.

thanks!
