GraphQL @ Airbnb - GraphQL Asia 2019

With the vibrant and growing GraphQL ecosystem and community, it's simpler than ever to start using GraphQL in your project. However, adopting GraphQL incrementally (and carefully!) in huge codebases powering large distributed systems is not quite as straightforward. We'll dive into how Airbnb is tackling this challenge, what we've learned so far, and how we plan to continue evolving our GraphQL infrastructure in the future.

Adam Miskiewicz

April 13, 2019

Transcript

  1. Adopting GraphQL in Large Codebases ADAM MISKIEWICZ / 2019-04-13 /

    GRAPHQL ASIA
  2. Hi! I’m Adam. Hi everybody! My name is Adam Miskiewicz

    and I’m a software engineer at Airbnb. I hope you all have had a great conference. I know I’ve been really impressed and inspired by all the great content. I’ve been using GraphQL since the day it was first open-sourced, and it’s really awesome to be here at the first GraphQL conference in Asia.
  3. I want to spend some time this afternoon telling

    you about Airbnb’s GraphQL journey. It started about a year ago, and it’s been really awesome to see it grow and evolve. My team at Airbnb is called “Client Data”, and we’re currently 100% focused on building what we call at Airbnb, “API v3” — which is the next iteration of the Airbnb API built with GraphQL.
  4. Airbnb is in the middle of a monumental technical effort.

    Airbnb is in the middle of a humongous technical effort.
  5. Airbnb, since the beginning, has had a big ole monolithic

    Rails app. Over 2 million lines of Airbnb-engineer-written Ruby are in this codebase, so it’s very large. Monorail scaled for 10 years, but at this point:
    - ~1000 engineers are writing PRs in Monorail
    - Ownership is complex
    - Rollbacks are common
    - Perf is bad
  6. Monorail Two years ago, we started breaking apart our monolith

    and moving to a service oriented architecture. And very recently, we put a feature freeze on all new “monorail” development…so we’re all in on this new world of SOA.
  7. Listing Service Monorail
  8. Monorail Listing Service User Service
  9. Listing Service User Service Reviews Service Monorail
  10. Listing Service User Service Reviews Service API Gateway
  11. PRESENTATION SERVICES DERIVED DATA SERVICES DATA SERVICES Airbnb SOA There

    are many ways to structure a micro-services architecture, but the general structure we’ve chosen is:
  12. PRESENTATION SERVICES DERIVED DATA SERVICES DATA SERVICES Airbnb SOA At

    the bottom of this SOA pyramid, we have “data” services. These services encapsulate a data model for a single entity and own the schema that defines that entity. These services are the building blocks, and for the most part are the only types of services in our architecture that talk to the database directly.
  13. PRESENTATION SERVICES DERIVED DATA SERVICES DATA SERVICES Airbnb SOA Our

    middle tier, or what we call “derived data services”, make up the bulk of our services. They’re services that query those building block data services (or other derived data services), and combine data from multiple entities for use in multiple contexts. This part of the service tier is where the bulk of the business logic lives.
  14. PRESENTATION SERVICES DERIVED DATA SERVICES DATA SERVICES Airbnb SOA At

    the top of our pyramid, presentation services are specialized services that grab data from backend services in the most efficient way possible to present it to the client. They’re exposed to the clients through an API gateway: each path in our REST API goes to an endpoint on a presentation service. Product teams/verticals own their own presentation service, with shared services supporting underneath. The thinking behind separating services like this was to reduce the “blast radius” of changes across multiple parts of the product: if a team wants to change the API that’s used by a certain page, they can do so with the confidence that they’re only affecting that page. At Airbnb, we’ve chosen to introduce GraphQL at this presentation service layer, rather than having the GraphQL layer reach all the way down to the data services.
  15. TIME SOA Adoption Tech Complexity YOU ARE HERE This SOA

    journey at Airbnb is far from over, and we’re definitely at the top of this curve of tech complexity as a result of SOA.
  16. But nevertheless, we decided last year to throw GraphQL into

    the mix of this effort. Web and mobile engineers were asking for it, and we set out to devise a plan that would let us adopt GraphQL into our architecture with minimal disruption to SOA migration. We needed to introduce this new technology quickly but carefully, and make sure that people’s productivity keeps on the up and up. We’re focusing our efforts on usability to aid organic adoption.
  17. Why ? So why did we want to introduce GraphQL,

    and why now? I won’t go into this ad nauseam (we’re at a GraphQL conference!). By this point, we know why GraphQL is great, and I feel like the benefits other folks derive from GraphQL line up with Airbnb’s wants and needs. But there are a few key standouts:
  18. Why ?
    • Strong typing across I/O boundaries
    • Unify API interaction across platforms and let the client dictate its data requirements
    • Improve developer experience and collaboration

    As we migrate this monolith, getting strong typing across the I/O boundary is a huge win: if done right, it helps us migrate faster and with more confidence. Currently, when a native client requests, say, the main listing screen, it’s requesting a specific “format” of the detail page from the API, which is different from the format that serves airbnb.com. These formats are 100% server defined, which makes iterating difficult, even though all the data is available from an API. GraphQL helps us unify how we request that data across the different clients, and lets the client specify what it needs. GraphQL is also a huge win for developer experience and backend/frontend/mobile collaboration.
  19. [Diagram labels: React, iOS, Android; User Interface; Data Models; Platform; API Definition; Presentation Logic; Backend; Engineers; Frontend; Java; GraphQL]

    As we developed our plan for introducing GraphQL, we focused on balance. Adopting a GraphQL-first API (or, as some folks call it at Airbnb, “GraphQL the religion”) wasn’t an option for the first iteration. Migration into the SOA was the first priority, and we needed to work within the constraints of our existing effort as we decided how to move forward.
  20. Airbnb ❤ Thrift At Airbnb, all of our services have

    an associated Thrift IDL that defines their endpoints and data shapes. [if you’re not familiar with Thrift] This gives us a standardized way to communicate between services.
  21. Presentation Service Framework At the presentation layer, we have a

    “presentation service framework” that provides numerous features out of the box — metrics and alerting, policy checks, content moderation, etc, and it’s tightly tied to our Thrift definitions, something we call “Service IDL” internally.
  22. It looks something like this. This endpoint exposed through the

    Thrift IDL corresponds directly to a REST endpoint, in this case /pdp_listing_details.
  23. REST Gateway Presentation Service Presentation Service Presentation Service Legacy Web/Native Clients

    These endpoints are, as you may expect, exposed through a REST gateway and accessed by clients.
  24. Presentation Service GraphQL To introduce GraphQL, we’ve built a layer

    on top of the aforementioned presentation service framework. We turn the presentation services into GraphQL services themselves, and generate GraphQL schema directly from the Thrift IDL. We then stitch the schemas from each of these services together in a GraphQL gateway. This allows us to continue to query the RESTful versions of these API endpoints, while still exposing them through GraphQL. This gives us a great way to allow for presentation services to serve API v2 and API v3 simultaneously with little extra effort on the part of the backend engineer.
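    To make the idea of generating GraphQL schema from Thrift IDL concrete, here is a minimal sketch. It is not Airbnb's actual codegen (which the talk says targets Java POJOs and graphql-java); the struct name, field names, and the tuple-based representation of a parsed Thrift struct are all assumptions made for illustration.

```python
# Hypothetical, simplified stand-in for a parsed Thrift struct: a name plus
# (field name, Thrift type, required?) tuples. Not Airbnb's real IDL model.
THRIFT_TO_GRAPHQL = {
    "string": "String",
    "bool": "Boolean",
    "i32": "Int",
    "double": "Float",
}


def graphql_type(thrift_type, required):
    """Map a Thrift type name to a GraphQL type reference."""
    if thrift_type.startswith("list<") and thrift_type.endswith(">"):
        # List elements are nullable here; Thrift cannot say otherwise.
        inner = graphql_type(thrift_type[5:-1], required=False)
        base = f"[{inner}]"
    else:
        # Unknown names are assumed to be other structs and pass through.
        base = THRIFT_TO_GRAPHQL.get(thrift_type, thrift_type)
    return base + "!" if required else base


def struct_to_sdl(name, fields):
    """Render one Thrift-like struct as a GraphQL object type in SDL."""
    lines = [f"type {name} {{"]
    for field_name, thrift_type, required in fields:
        lines.append(f"  {field_name}: {graphql_type(thrift_type, required)}")
    lines.append("}")
    return "\n".join(lines)


listing = struct_to_sdl(
    "ListingDetails",
    [
        ("id", "string", True),
        ("bedrooms", "i32", False),
        ("photos", "list<Photo>", False),
    ],
)
print(listing)
```

    The mapping table deliberately leaves out `i64`, since GraphQL's built-in `Int` is a 32-bit value and a real generator would need a custom scalar or another convention for wider integers.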
  25. REST Gateway Presentation Service Presentation Service Presentation Service Legacy Web/Native Clients
  26. Modern Web/Native Clients GraphQL Gateway REST Gateway Presentation Service Presentation Service Presentation Service Legacy Web/Native Clients

    This is our new architecture. Our legacy clients are still making multiple requests to the REST gateway using normal REST-like endpoints, and that gateway is fanning out to the presentation services. But our new, modern GraphQL clients talk directly to the GraphQL gateway. Instead of the clients making multiple requests to the REST gateway, the GraphQL gateway multiplexes those requests for them. We’ve specifically made the GraphQL gateway sit on top of the REST gateway, again to enable ease of use and onboarding. The REST gateway provides lots of middleware (session handling, risk checks, etc.), and this architecture lets the GraphQL gateway reuse all of that work wholesale rather than reimplement it.
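    The multiplexing described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the real gateway: the REST calls are faked with in-memory functions, and the `/listing_reviews` path, the field names, and the response shapes are hypothetical. Only `/pdp_listing_details` is named in the talk.

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins for REST gateway endpoints. A real gateway would issue
# HTTP requests that pass through the REST gateway's middleware.
FAKE_REST = {
    "/pdp_listing_details": lambda params: {"id": params["id"], "title": "Cozy loft"},
    "/listing_reviews": lambda params: {"count": 12},  # hypothetical endpoint
}


def rest_get(path, params):
    """Pretend to call one REST endpoint and return its JSON body."""
    return FAKE_REST[path](params)


def resolve(top_level_fields, params):
    """Fan one GraphQL-style request out to several REST endpoints in parallel."""
    with ThreadPoolExecutor() as pool:
        futures = {
            field: pool.submit(rest_get, path, params)
            for field, path in top_level_fields.items()
        }
        # Collect results into a single GraphQL-shaped response.
        return {"data": {field: future.result() for field, future in futures.items()}}


response = resolve(
    {"listing": "/pdp_listing_details", "reviews": "/listing_reviews"},
    {"id": "42"},
)
print(response)
```

    The client sends one request and receives one response; the fan-out that legacy clients perform themselves happens inside the gateway.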
  27. So what does this look like in practice? Here’s a

    partial definition for a service, called “Merlin”, which serves endpoints that power the homes listing detail page on airbnb.com. To enable this endpoint for GraphQL…
  28. a service engineer needs to simply add this Thrift annotation.

    To expose a mutation, they would just say “graphql_operation_type = mutation”.
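    A rough sketch of how a generator might act on that annotation: endpoints that carry it are sorted into GraphQL root types, and endpoints without it stay REST-only. Only the annotation name `graphql_operation_type` and the value `mutation` come from the talk; the value `query`, the endpoint records, and the endpoint names are assumptions.

```python
# Hypothetical endpoint records. Only the annotation name
# "graphql_operation_type" comes from the talk; the rest is illustrative.
endpoints = [
    {"name": "pdpListingDetails", "annotations": {"graphql_operation_type": "query"}},
    {"name": "updateListing", "annotations": {"graphql_operation_type": "mutation"}},
    {"name": "internalHealthCheck", "annotations": {}},  # not exposed via GraphQL
]


def group_by_operation(endpoints):
    """Split annotated endpoints into GraphQL root types; skip the rest."""
    roots = {"Query": [], "Mutation": []}
    for endpoint in endpoints:
        op = endpoint["annotations"].get("graphql_operation_type")
        if op is None:
            continue  # endpoint was never opted in to GraphQL
        if op not in ("query", "mutation"):
            raise ValueError(f"unsupported operation type: {op}")
        roots[op.capitalize()].append(endpoint["name"])
    return roots


roots = group_by_operation(endpoints)
print(roots)
```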
  29. From this Thrift IDL, we’re codegen’ing Java POJOs. We’re using

    the awesome graphql-java project, and so we leverage “graphql-java-annotations” to be able to easily hook GraphQL schema generation into our existing Thrift codegen step.
  30. As soon as they merge their change to the Thrift

    IDL, the GraphQL schema is updated and a product engineer is able to call this API through GraphQL. The resulting GraphQL schema that we provide through the GraphQL gateway is a stitched version of all the downstream presentation service schemas. As is illustrated here, for each service, we add a top level field — “merlin”, in this case — and then under each top level field we nest each endpoint, namespacing the types accordingly.
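    The stitching and namespacing described above might look something like this in miniature. The dictionary-based schema representation, the prefixing rule, and the type and endpoint names are assumptions for illustration; only the top-level `merlin` field comes from the talk.

```python
def stitch(service_schemas):
    """Merge per-service schemas under one top-level field per service.

    service_schemas maps a service name to {"types": {...}, "endpoints": {...}},
    where endpoints map an endpoint name to its return type. Field types inside
    each type are left untouched to keep the sketch short.
    """
    stitched_types = {}
    query_fields = {}
    for service, schema in service_schemas.items():
        prefix = service.capitalize()
        # Prefix every type so services cannot collide in GraphQL's
        # single global type namespace.
        for type_name, fields in schema["types"].items():
            stitched_types[prefix + type_name] = fields
        # One namespace object per service holds its endpoints.
        stitched_types[prefix + "Query"] = {
            endpoint: prefix + return_type
            for endpoint, return_type in schema["endpoints"].items()
        }
        query_fields[service] = prefix + "Query"
    stitched_types["Query"] = query_fields
    return stitched_types


schema = stitch(
    {
        "merlin": {
            "types": {"ListingDetails": {"id": "String"}},
            "endpoints": {"pdpListingDetails": "ListingDetails"},
        }
    }
)
print(schema)
```

    With this shape, a client would reach an endpoint by selecting the service field first and the endpoint field beneath it.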
  31. The client then queries like this.

  32. Wait… Anyone in the audience who is very familiar with

    GraphQL will notice that this is basically exposing RPC endpoints through GraphQL, which many would consider an anti-pattern. Yes. That’s exactly what we’re doing. We’re doing that because it’s the quickest way to get our backend services, already in the depths of a migration from our old monolith to SOA, to be able to quickly onboard to GraphQL. However…this hasn’t been a completely smooth experience. Let me show you two tweets, from, you guessed it, Lee Byron:
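  33. [Slide shows a tweet from Lee Byron; no transcript text]
  34. [Slide shows a tweet from Lee Byron; no transcript text]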
  35. Narrator: Lee was right. Well….Lee, of course, was right. When

    going into this project, we knew that the resulting schema wouldn’t be idiomatic GraphQL, and we were ok with that. What was harder to predict were the edge cases that we’ve run into in generating GraphQL from Thrift. For instance…
  36. Check out this union type as defined in Thrift IDL.

    Anyone who’s familiar with GraphQL in the audience may be able to pick out the problem right away… This union contains scalars!
  38. To work around this difference between GraphQL and Thrift, we

    end up doing something like this. Gross.
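    The slide's actual workaround is not captured in the transcript. One common approach, shown here purely as an assumption, is to box each scalar member in a small wrapper object type, since GraphQL unions may only contain object types. The union name, member names, and wrapper naming scheme are hypothetical.

```python
# GraphQL's built-in scalars; a union may not list these directly.
SCALARS = {"String", "Int", "Float", "Boolean", "ID"}


def box_union(name, members):
    """Return SDL for a union, boxing scalar members in wrapper object types."""
    wrappers = []
    boxed = []
    for member in members:
        if member in SCALARS:
            # Wrap the scalar in an object type with a single "value" field.
            wrapper = f"{name}{member}Value"
            wrappers.append(f"type {wrapper} {{\n  value: {member}!\n}}")
            boxed.append(wrapper)
        else:
            boxed.append(member)
    union = f"union {name} = " + " | ".join(boxed)
    return "\n\n".join(wrappers + [union])


sdl = box_union("SearchFilter", ["String", "Int", "PriceRange"])
print(sdl)
```

    The cost is visible on the client: every scalar now needs an extra level of selection (`... on SearchFilterStringValue { value }`).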
  39. Another issue — how do you model something like this

    in Thrift? GraphQL has this awesome “interface” feature, but Thrift has no polymorphism of any kind.
  40. Well, the answer is, you use crazy Thrift annotations and

    a bunch of black magic in the compilation step to emulate the behavior.
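    As a rough illustration of emulating interfaces through annotations: the annotation name `implements_interface` appears in the talk, but how its value is written and how the compilation step consumes it are not described, so everything else in this sketch (the value format, type names, and field names) is assumed.

```python
def render_object_type(name, fields, annotations, interfaces):
    """Render an object type, copying in the fields of an annotated interface."""
    interface = annotations.get("implements_interface")
    merged = {}
    header = f"type {name}"
    if interface is not None:
        # GraphQL requires implementers to redeclare every interface field.
        merged.update(interfaces[interface])
        header += f" implements {interface}"
    merged.update(fields)
    body = "\n".join(f"  {field}: {gql_type}" for field, gql_type in merged.items())
    return f"{header} {{\n{body}\n}}"


# Hypothetical interface and implementing type.
interfaces = {"Listing": {"id": "ID!", "title": "String"}}
home = render_object_type(
    "Home",
    {"bedrooms": "Int"},
    {"implements_interface": "Listing"},
    interfaces,
)
print(home)
```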
  41. This makes me sad.

  42. But wait! The implements_interface thing isn’t the only terrible thing

    in this code snippet.
  43. Look at this craziness. What is this for you ask?

  44. Well looking back at our earlier example, Thrift has no

    concept of “non nullability within lists”. So we have to use Thrift annotations to model that as well. I want to show you one more pitfall. This is very cathartic for me.
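    A minimal sketch of the list case, assuming a hypothetical annotation named `non_null_elements` (the real annotation is on the slide and is not in the transcript): without the annotation the generator can only emit a list of nullable elements, and with it the element type gains a `!`.

```python
def list_type(element_type, annotations, required=False):
    """Render a GraphQL list type, honoring a hypothetical element annotation."""
    # "non_null_elements" is an invented annotation name for this sketch.
    non_null = annotations.get("non_null_elements") == "true"
    element = element_type + ("!" if non_null else "")
    return f"[{element}]" + ("!" if required else "")


plain = list_type("String", {})
strict = list_type("String", {"non_null_elements": "true"}, required=True)
print(plain, strict)
```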
  46. Take a look at this example, again we’re looking at

    the Thrift IDL here. Notice that I have a type called “Status” in my type definitions here, but I’m importing another type called “Status” from `my_other_service`. Ideally, this would “just work”. But alas, it does not. Because Thrift has namespaces, whereas GraphQL assumes one global type namespace, there’s a conflict here, and our GraphQL schema generation doesn’t know how to properly resolve this imported type since it has the same name as a type in our service. This is a really tough problem to solve. So tough, in fact, that we don’t have a solution for it yet. Right now we just disallow importing types from outside your service’s Thrift definitions.
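    The conflict can at least be detected mechanically. The sketch below flattens several Thrift-style namespaces into one global set of names, the way a GraphQL schema would, and reports which names clash. The namespace `my_service` and the `Listing` type are assumptions; `Status` and `my_other_service` come from the example above.

```python
from collections import defaultdict


def find_collisions(namespaces):
    """Return type names defined in more than one Thrift namespace."""
    owners = defaultdict(list)
    for namespace, type_names in namespaces.items():
        for type_name in type_names:
            owners[type_name].append(namespace)
    # A name owned by more than one namespace cannot coexist in GraphQL's
    # single global type namespace without renaming.
    return {name: sorted(ns) for name, ns in owners.items() if len(ns) > 1}


collisions = find_collisions(
    {
        "my_service": ["Status", "Listing"],
        "my_other_service": ["Status"],
    }
)
print(collisions)
```

    Detection is the easy part. Deciding which `Status` wins, or how to rename one without breaking clients, is the hard problem the talk describes as unsolved.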
  47. Tradeoffs. Phew. Ok. Deep breath. So yah, there are some

    issues here. But it’s all about tradeoffs. Even though there are some strange problems in our Thrift -> GraphQL conversion, and they bother me deeply, I don’t regret what we’ve done here. On the frontend and native clients, we can still get a ton of benefit by adopting even this “presentation service GraphQL”: removing Redux boilerplate by leveraging Apollo Client, reducing over-fetching, etc. I like to think of this as the gateway drug to GraphQL @ Airbnb. It’s not perfect, certainly, but we’re able to start bringing GraphQL to the forefront of people’s minds, and over time, that makes it easier to make GraphQL more of a first-class citizen in our architecture. And this is working: GraphQL _is_ being adopted by teams across the company, and generally the sentiment is quite good.
  48. What excites me is that there’s a huge future for

    GraphQL at Airbnb, and we’re just getting started.
  49. GraphQL gives us the ability to evolve our API in

    ways that weren’t possible before:
    - Complex field selection
    - Optimized backend data fetching
    - Greater observability tooling
    - Know which fields are being used
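    The last point, knowing which fields are being used, follows from the fact that every GraphQL request names the fields it wants. A minimal sketch, assuming queries have already been parsed into nested dictionaries (a real system would walk the parsed query document instead) and using hypothetical field names:

```python
from collections import Counter


def field_paths(selection, prefix=""):
    """Yield a dotted path for every field in a nested selection dict."""
    for field, sub_selection in selection.items():
        path = f"{prefix}.{field}" if prefix else field
        yield path
        if sub_selection:
            yield from field_paths(sub_selection, path)


# Two hypothetical client queries, already parsed into nested selections.
queries = [
    {"merlin": {"pdpListingDetails": {"id": None, "title": None}}},
    {"merlin": {"pdpListingDetails": {"id": None}}},
]

usage = Counter()
for query in queries:
    usage.update(field_paths(query))
print(usage.most_common())
```

    Fields that never show up in the counts are candidates for deprecation, which is hard to establish when the server alone defines each response format.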
  50. GraphQL Native At the end of the day though —

    we really want to get to a place where GraphQL is very deeply ingrained in our presentation layer. I’m unabashedly stealing a phrase from Nick Schrock here, but we want to take Airbnb’s architecture “GraphQL Native”. Right now, much of Airbnb’s technical work is focused on migration to SOA. But as more parts of our stack are moved out of the monolith, we have an opportunity to go GraphQL-native with our presentation services and build a GraphQL-first presentation service framework. Rather than have a bunch of RPC endpoints exposed through GraphQL, we can think of each presentation service as just a GraphQL server, with the same ownership boundaries that we gained with presentation services, but while enabling us to do much more intelligent schema stitching and downstream data fetching.
  51. Schema Federation? Build-time Schema Stitching? We’ve been investigating and prototyping

    schema federation, a new, different type of schema stitching that lets the gateway remain lightweight while still properly expressing relationships between different types in the schema, even if these types are owned by different services. Another route that we’re excited to explore is doing schema stitching at build time, rather than at runtime.
  52. We’re still iterating. GraphQL as it exists today at Airbnb

    is a foothold and a solid beginning — it’s easy to get started for backend engineers and allows FE/Native engineers to get some of the benefits of working with GraphQL on the client. What I want to leave you with today is that as you all are introducing GraphQL into your organizations — pick a north star, but don’t be afraid to make compromises along the way. GraphQL is a wonderfully flexible technology, and there isn’t a one-size-fits-all use case. How you use and introduce GraphQL into your org is highly dependent on your organizational and technological structure.
  53. Thanks again for your time today! I feel really honored

    to have been part of this conference. It’s been really great to be here. For those of you from afar, have safe travels home, and for the rest of you, catch you the next time I’m in India!