GraphQL @ Airbnb - GraphQL Asia 2019

With the vibrant and growing GraphQL ecosystem and community, it's simpler than ever to start using GraphQL in your project. However, adopting GraphQL incrementally (and carefully!) in huge codebases powering large distributed systems is not quite as straightforward. We'll dive into how Airbnb is tackling this challenge, what we've learned so far, and how we plan to continue evolving our GraphQL infrastructure in the future.

Adam Miskiewicz

April 13, 2019

Transcript

  1. Adopting GraphQL in Large Codebases
    ADAM MISKIEWICZ / 2019-04-13 / GRAPHQL ASIA

  2. Hi! I’m Adam.
    Hi everybody! My name is Adam Miskiewicz and I’m a software engineer at Airbnb.
    I hope you all have had a great conference — I know I’ve been really impressed and inspired by all the great
    content. I’ve been using GraphQL since the day it was first open-sourced, and it’s really awesome to be here at
    the first GraphQL conference in Asia.

  3. Airbnb + GraphQL
    I want to spend some time this afternoon telling you about Airbnb’s GraphQL journey. It started about a year
    ago, and it’s been really awesome to see it grow and evolve.
    My team at Airbnb is called “Client Data”, and we’re currently 100% focused on building what we call at Airbnb,
    “API v3” — which is the next iteration of the Airbnb API built with GraphQL.

  4. Airbnb is in the middle of a monumental technical effort.
    Airbnb is in the middle of a humongous technical effort.

  5. Airbnb, since the beginning, has had a big ole monolithic Rails app, which we call Monorail. Over 2 million lines of
    Airbnb-engineer-written Ruby are in this codebase. It’s very large.
    Monorail scaled for 10 years, but at this point:
    - ~1,000 engineers are writing PRs in Monorail
    - Ownership is complex
    - Rollbacks are common
    - Perf is bad

  6. Monorail
    Two years ago, we started breaking apart our monolith and moving to a service-oriented architecture (SOA). And very
    recently, we put a feature freeze on all new “Monorail” development, so we’re all in on this new world of SOA.

  7. [Diagram: Monorail; Listing Service]

  8. [Diagram: Monorail; Listing Service; User Service]

  9. [Diagram: Monorail; Listing Service; User Service; Reviews Service]

  10. [Diagram: API Gateway; Listing Service; User Service; Reviews Service]

  11. Airbnb SOA
    PRESENTATION SERVICES
    DERIVED DATA SERVICES
    DATA SERVICES
    There are many ways to structure a microservices architecture, but the general structure we’ve chosen is:

  12. Airbnb SOA
    PRESENTATION SERVICES
    DERIVED DATA SERVICES
    DATA SERVICES
    At the bottom of this SOA pyramid, we have “data” services. These services encapsulate a data model for a
    single entity and own the schema that defines that entity. These services are the building blocks, and for the
    most part are the only types of services in our architecture that talk to the database directly.

  13. Airbnb SOA
    PRESENTATION SERVICES
    DERIVED DATA SERVICES
    DATA SERVICES
    Our middle tier, or what we call “derived data services”, makes up the bulk of our services. They’re services that
    query those building block data services (or other derived data services), and combine data from multiple
    entities for use in multiple contexts. This part of the service tier is where the bulk of the business logic lives.

  14. Airbnb SOA
    PRESENTATION SERVICES
    DERIVED DATA SERVICES
    DATA SERVICES
    At the top of our pyramid, presentation services are specialized services that specifically grab data from
    backend services in the most efficient way possible to present it to the client.
    They’re exposed to the clients through an API gateway -- each path in our REST API goes to an endpoint on a
    presentation service.
    Product teams/verticals own their own presentation service, with shared services supporting underneath. The
    thinking behind separating services like this was to reduce the “blast radius” of changes across multiple parts of
    the product — if a team wants to change the API that’s used by a certain page, they can do so with the
    confidence that they’re only affecting that page.
    At Airbnb, we’ve chosen to introduce GraphQL at this presentation service layer, rather than having the GraphQL
    layer reach all the way down to the data services.

  15. [Chart labels: TIME; SOA Adoption; Tech Complexity; “YOU ARE HERE”]
    This SOA journey at Airbnb is far from over, and we’re definitely at the top of this curve of tech complexity as a
    result of SOA.

  16. But nevertheless, we decided last year to throw GraphQL into the mix of this effort. Web and mobile engineers
    were asking for it, and we set out to devise a plan that would let us adopt GraphQL into our architecture with
    minimal disruption to the SOA migration.
    We needed to introduce this new technology quickly but carefully, and make sure that people’s productivity
    kept trending upward.
    We’re focusing our efforts on usability to aid organic adoption.

  17. Why GraphQL?
    So why did we want to introduce GraphQL, and why now?
    I won’t go into this ad nauseam. We’re at a GraphQL conference! By this point, we know why GraphQL is
    great, and the benefits that other folks derive from GraphQL line up with Airbnb's wants and
    needs.
    But there are a few key standouts:

  18. Why GraphQL?
    • Strong typing across I/O boundaries
    • Unify API interaction across platforms and let the client dictate its data requirements
    • Improve developer experience and collaboration
    As we migrate this monolith, getting strong typing across the I/O boundary is a huge win — if done right, it helps
    us migrate faster and with more confidence.
    Currently, when a native client requests, say, the main listing screen, it’s requesting a specific “format” of the
    detail page from the API, which is different from the format that serves airbnb.com. These formats are 100%
    server defined, which makes iterating difficult, even though all the data is available from an API. GraphQL helps
    us unify how we request that data across the different clients, and lets the client specify what it needs.
    GraphQL is also a huge win for developer experience and backend/frontend/mobile collaboration.

  19. [Diagram labels: React, iOS, Android; USER INTERFACE; DATA MODELS; PLATFORM; API DEFINITION; PRESENTATION LOGIC;
    ENGINEERS: Backend, Frontend; Java; GraphQL]
    As we developed our plan for introducing GraphQL — we focused on balance. Adopting a GraphQL-first API, or
    as some folks call it at Airbnb — GraphQL the religion — wasn’t an option for the first iteration. Migration into the
    SOA was the first priority, and we needed to work within the constraints of our existing effort as we decided how
    to move forward.

  20. Airbnb ❤ Thrift
    At Airbnb, all of our services have an associated Thrift IDL that defines their endpoints and data shapes. (If you’re
    not familiar with Thrift, it’s an interface definition language and RPC framework.) This gives us a standardized way
    to communicate between services.

  21. Presentation Service Framework
    At the presentation layer, we have a “presentation service framework” that provides numerous features out of
    the box — metrics and alerting, policy checks, content moderation, etc, and it’s tightly tied to our Thrift
    definitions, something we call “Service IDL” internally.

  22. It looks something like this.
    This endpoint exposed through the Thrift IDL directly corresponds to a REST endpoint, in this case
    /pdp_listing_details.

  23. [Diagram: Legacy Web/Native Clients -> REST Gateway -> three Presentation Services]
    These endpoints are, as you may expect, exposed through a REST gateway and accessed by clients.

  24. Presentation Service GraphQL
    To introduce GraphQL, we’ve built a layer on top of the aforementioned presentation service framework.
    We turn the presentation services into GraphQL services themselves, and generate GraphQL schema directly
    from the Thrift IDL. We then stitch the schemas from each of these services together in a GraphQL gateway.
    This allows us to continue to query the RESTful versions of these API endpoints, while still exposing them through
    GraphQL.
    This gives us a great way to allow for presentation services to serve API v2 and API v3 simultaneously with little
    extra effort on the part of the backend engineer.
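    To make the schema-generation idea concrete, here is a minimal Python sketch. It is not Airbnb's codegen (which,
    per the later slides, goes through Java and graphql-java-annotations). The `ThriftStruct` and `ThriftField`
    classes, the type mapping, and the `Listing` example are all hypothetical stand-ins for a parsed Thrift IDL.

```python
from dataclasses import dataclass
from typing import List

# Assumed mapping from a few Thrift base types to GraphQL scalars.
THRIFT_TO_GRAPHQL = {
    "string": "String",
    "bool": "Boolean",
    "i32": "Int",
    "double": "Float",
}


@dataclass
class ThriftField:
    name: str
    thrift_type: str  # e.g. "i32", "string", "list<string>", or a struct name
    required: bool = False


@dataclass
class ThriftStruct:
    name: str
    fields: List[ThriftField]


def to_graphql_type(thrift_type: str) -> str:
    """Translate a Thrift type name into a GraphQL type reference."""
    if thrift_type.startswith("list<") and thrift_type.endswith(">"):
        inner = thrift_type[len("list<"):-1]
        return f"[{to_graphql_type(inner)}]"
    # Names that are not base types are assumed to be other structs.
    return THRIFT_TO_GRAPHQL.get(thrift_type, thrift_type)


def struct_to_sdl(struct: ThriftStruct) -> str:
    """Render one struct as a GraphQL object type in SDL."""
    lines = [f"type {struct.name} {{"]
    for f in struct.fields:
        suffix = "!" if f.required else ""
        lines.append(f"  {f.name}: {to_graphql_type(f.thrift_type)}{suffix}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical struct, for illustration only.
listing = ThriftStruct("Listing", [
    ThriftField("id", "string", required=True),
    ThriftField("title", "string"),
    ThriftField("bedrooms", "i32"),
    ThriftField("amenities", "list<string>"),
])

print(struct_to_sdl(listing))
```

    Because the GraphQL type is derived mechanically from the same definition that backs the REST endpoint, a service
    can serve both APIs from one source of truth.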

  25. [Diagram: Legacy Web/Native Clients -> REST Gateway -> three Presentation Services]

  26. [Diagram: Modern Web/Native Clients -> GraphQL Gateway -> REST Gateway -> three Presentation Services;
    Legacy Web/Native Clients -> REST Gateway]
    This is our new architecture.
    Our legacy clients are still making multiple requests to the REST gateway using normal REST-like endpoints, and
    that gateway is fanning out to the presentation services.
    But our new, modern, GraphQL clients talk directly to the GraphQL gateway. Instead of the clients making
    multiple requests to the REST gateway, the GraphQL gateway can multiplex those requests instead.
    We’ve deliberately made the GraphQL gateway sit on top of the REST gateway, again for ease of use and
    onboarding. The REST gateway provides lots of middleware (session handling, risk checks, etc.), and this
    architecture allows the GraphQL gateway to reuse all of that work wholesale rather than reimplement it.
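    As a rough illustration of the multiplexing described above, the Python sketch below has a gateway accept one
    request naming several top-level fields and fan out to one downstream call per field. The endpoint functions and
    field names are hypothetical stand-ins for calls through the REST gateway. This is a sketch of the idea, not the
    real gateway.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


# Hypothetical stand-ins for REST endpoints served by presentation services.
def fetch_listing_details(listing_id: str) -> dict:
    return {"id": listing_id, "title": "Cozy loft"}


def fetch_reviews(listing_id: str) -> dict:
    return {"count": 12}


ENDPOINTS: Dict[str, Callable[[str], dict]] = {
    "listingDetails": fetch_listing_details,
    "reviews": fetch_reviews,
}


def resolve_query(fields: List[str], listing_id: str) -> dict:
    """Serve one client request by fanning out to one endpoint per field."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(ENDPOINTS[name], listing_id) for name in fields}
        return {"data": {name: future.result() for name, future in futures.items()}}


# One round trip from the client replaces two separate REST requests.
response = resolve_query(["listingDetails", "reviews"], "42")
print(response)
```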

  27. So what does this look like in practice?
    Here’s a partial definition for a service, called “Merlin”, which serves endpoints that power the homes listing
    detail page on airbnb.com.
    To enable this endpoint for GraphQL…

  28. a service engineer needs to simply add this Thrift annotation. To expose a mutation, they would just say
    “graphql_operation_type = mutation”.
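    The slide's Thrift snippet is not part of this transcript, so here is a hedged Python sketch of how such an
    annotation could drive schema generation. Only the `graphql_operation_type = mutation` form is confirmed by the
    talk. The `query` value, the `Endpoint` class, and the endpoint names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Endpoint:
    name: str
    # Annotations as they might be read from the Thrift IDL (hypothetical model).
    annotations: Dict[str, str] = field(default_factory=dict)


# Assumed annotation values mapped to GraphQL root types.
ROOT_TYPES = {"query": "Query", "mutation": "Mutation"}


def group_by_root_type(endpoints: List[Endpoint]) -> Dict[str, List[str]]:
    """Place each annotated endpoint under the Query or Mutation root type."""
    grouped: Dict[str, List[str]] = {"Query": [], "Mutation": []}
    for endpoint in endpoints:
        operation = endpoint.annotations.get("graphql_operation_type")
        if operation is None:
            continue  # No annotation: the endpoint is not exposed through GraphQL.
        if operation not in ROOT_TYPES:
            raise ValueError(f"unknown graphql_operation_type: {operation}")
        grouped[ROOT_TYPES[operation]].append(endpoint.name)
    return grouped


endpoints = [
    Endpoint("getListing", {"graphql_operation_type": "query"}),
    Endpoint("updateListing", {"graphql_operation_type": "mutation"}),
    Endpoint("internalHealthCheck"),
]

print(group_by_root_type(endpoints))
```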

  29. From this Thrift IDL, we’re codegen’ing Java POJOs. We’re using the awesome graphql-java project, and so we
    leverage “graphql-java-annotations” to be able to easily hook GraphQL schema generation into our existing
    Thrift codegen step.

  30. As soon as they merge their change to the Thrift IDL, the GraphQL schema is updated and a product engineer is
    able to call this API through GraphQL.
    The resulting GraphQL schema that we provide through the GraphQL gateway is a stitched version of all the
    downstream presentation service schemas.
    As is illustrated here, for each service, we add a top level field — “merlin”, in this case — and then under each top
    level field we nest each endpoint, namespacing the types accordingly.
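    The stitching layout described above can be sketched in a few lines of Python. The input shape, the `namespaced`
    helper, and the `listingDetails` endpoint are hypothetical. Only the “merlin” top-level field and the idea of
    namespacing types per service come from the talk.

```python
from typing import Dict


def namespaced(service: str, type_name: str) -> str:
    """Prefix a type name with its service name to avoid clashes."""
    return service[0].upper() + service[1:] + type_name


def stitch(services: Dict[str, Dict[str, str]]) -> str:
    """Build SDL with one top-level field per service.

    `services` maps a service name to {endpoint name: return type name}.
    """
    query_fields = []
    service_blocks = []
    for service, endpoints in services.items():
        service_type = namespaced(service, "Query")
        query_fields.append(f"  {service}: {service_type}")
        lines = [f"type {service_type} {{"]
        for endpoint, return_type in endpoints.items():
            lines.append(f"  {endpoint}: {namespaced(service, return_type)}")
        lines.append("}")
        service_blocks.append("\n".join(lines))
    root = "type Query {\n" + "\n".join(query_fields) + "\n}"
    return "\n\n".join([root] + service_blocks)


sdl = stitch({"merlin": {"listingDetails": "ListingDetails"}})
print(sdl)
```

    A client would then select `merlin { listingDetails { ... } }`, which is why the result reads like RPC endpoints
    nested under a service name.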

  31. The client then queries like this.

  32. Wait…
    Anyone in the audience who is very familiar with GraphQL will notice that this is basically exposing RPC
    endpoints through GraphQL, which many would consider an anti-pattern.
    Yes. That’s exactly what we’re doing.
    We’re doing that because it’s the quickest way to get our backend services, already in the depths of a migration
    from our old monolith to SOA, to be able to quickly onboard to GraphQL.
    However…this hasn’t been a completely smooth experience.
    Let me show you two tweets, from, you guessed it, Lee Byron:

  33. [Slide shows a tweet from Lee Byron; no speaker notes]

  34. [Slide shows a tweet from Lee Byron; no speaker notes]

  35. Narrator: Lee was right.
    Well... Lee, of course, was right.
    When going into this project, we knew that the resulting schema wouldn’t be idiomatic GraphQL, and we were
    ok with that. What was harder to predict were the edge cases that we’ve run into in generating GraphQL from
    Thrift.
    For instance…

  36. Check out this union type as defined in Thrift IDL.
    Anyone who’s familiar with GraphQL in the audience may be able to pick out the problem right away…
    This union contains scalars!

  38. To work around this difference between GraphQL and Thrift, we end up doing something like this. Gross.
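    The slide's actual workaround is not in this transcript. In GraphQL, union members must be object types, so one
    common approach, sketched here in Python as an assumption rather than as Airbnb's solution, is to wrap each
    scalar member in a generated object type with a single `value` field. The `Price` and `Money` names are
    hypothetical.

```python
from typing import List

# GraphQL's built-in scalar types, which cannot be members of a union.
SCALARS = {"String", "Int", "Float", "Boolean", "ID"}


def wrap_union_members(union_name: str, members: List[str]) -> str:
    """Emit SDL for a union, boxing scalar members in wrapper object types."""
    wrappers = []
    member_names = []
    for member in members:
        if member in SCALARS:
            wrapper = f"{union_name}{member}Value"
            wrappers.append(f"type {wrapper} {{\n  value: {member}\n}}")
            member_names.append(wrapper)
        else:
            member_names.append(member)
    union = f"union {union_name} = " + " | ".join(member_names)
    return "\n\n".join(wrappers + [union])


# A Thrift-style union mixing scalars with a struct.
print(wrap_union_members("Price", ["String", "Int", "Money"]))
```

    The cost is visible on the client: every scalar now needs an extra level of selection, which is presumably part
    of why the speaker calls the result gross.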

  39. Another issue — how do you model something like this in Thrift? GraphQL has this awesome “interface” feature,
    but Thrift has no polymorphism of any kind.

  40. Well, the answer is, you use crazy Thrift annotations and a bunch of black magic in the compilation step to
    emulate the behavior.

  41. This makes me sad.

  42. But wait! The implements_interface thing isn’t the only terrible thing in this code snippet.

  43. Look at this craziness. What is this for, you ask?

  44. Well, looking back at our earlier example, Thrift has no concept of “non-nullability within lists”. So we have to use
    Thrift annotations to model that as well.
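    GraphQL distinguishes `[String]`, `[String!]`, `[String]!`, and `[String!]!`, while a Thrift field can only be
    marked required or optional as a whole. The Python sketch below shows how an annotation could carry the missing
    bit. The annotation name `non_null_elements` is hypothetical, since the real one appears only on the slide.

```python
from typing import Dict


def list_type(element: str, field_required: bool, annotations: Dict[str, str]) -> str:
    """Render a GraphQL list type from Thrift-level information.

    `field_required` comes from Thrift's own required/optional marker, while
    element nullability has to come from an annotation (hypothetical name).
    """
    non_null_elements = annotations.get("non_null_elements") == "true"
    inner = element + ("!" if non_null_elements else "")
    return f"[{inner}]" + ("!" if field_required else "")


print(list_type("String", True, {"non_null_elements": "true"}))
print(list_type("String", False, {}))
```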
    I want to show you one more pitfall. This is very cathartic for me.

  46. Take a look at this example, again we’re looking at the Thrift IDL here. Notice that I have a type called “Status” in
    my type definitions here, but I’m importing another type called “Status” from `my_other_service`.
    Ideally, this would “just work”. But alas, it does not.
    Because Thrift has namespaces, whereas GraphQL assumes one global type namespace, there’s a conflict here,
    and our GraphQL schema generation doesn’t know how to properly resolve this imported type since it has the
    same name as a type in our service.
    This is a really tough problem to solve. So tough, in fact, that we don’t have a solution for it yet. Right
    now we just disallow importing other types from outside your service’s Thrift definitions.
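    The conflict is easy to see once Thrift's qualified names are flattened into GraphQL's single namespace. This
    Python sketch only detects such collisions and does not resolve them. The `my_service` namespace and the
    `Listing` type are hypothetical, while `my_other_service` and `Status` come from the example above.

```python
from collections import defaultdict
from typing import Dict, List


def find_collisions(qualified_names: List[str]) -> Dict[str, List[str]]:
    """Group Thrift-qualified type names that map to the same GraphQL name."""
    by_short_name = defaultdict(list)
    for qualified in qualified_names:
        # GraphQL has one global type namespace, so only the last segment survives.
        short = qualified.rsplit(".", 1)[-1]
        by_short_name[short].append(qualified)
    return {
        short: sorted(names)
        for short, names in by_short_name.items()
        if len(names) > 1
    }


collisions = find_collisions([
    "my_service.Status",
    "my_other_service.Status",
    "my_service.Listing",
])
print(collisions)
```

    A generator could fail the build when this returns anything, which has the same effect as disallowing the import.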

  47. Tradeoffs.
    Phew. Ok. Deep breath.
    So yeah, there are some issues here. But it’s all about tradeoffs. Even though there are some strange problems in our
    Thrift -> GraphQL conversion, and they bother me deeply, I don’t regret what we’ve done here.
    On the frontend and native clients, we can still get a ton of benefit by adopting even this “presentation service
    GraphQL” — removing Redux boilerplate by leveraging Apollo Client, reducing over-fetching, etc.
    I like to think of this as the gateway drug to GraphQL @ Airbnb. It’s not perfect, certainly, but we’re able to start
    bringing GraphQL to the forefront of people’s minds, and over time, that makes it easier to make GraphQL more of
    a first-class citizen in our architecture.
    And this is working — GraphQL _is_ being adopted by teams across the company, and generally the sentiment is
    quite good.

  48. What excites me is that there’s a huge future for GraphQL at Airbnb, and we’re just getting started.

  49. GraphQL gives us the ability to evolve our API in ways that weren’t possible before
    - Complex field selection
    - Optimized backend data fetching
    - Greater observability tooling
    - Know which fields are being used

  50. GraphQL Native
    At the end of the day though — we really want to get to a place where GraphQL is very deeply engrained in our
    presentation layer.
    I’m unabashedly stealing a phrase from Nick Schrock here, but we want to take Airbnb’s architecture “GraphQL
    Native”.
    Right now, much of Airbnb’s technical work is focused on migration to SOA. But as more parts of our stack are
    moved out of the monolith, we have an opportunity to go GraphQL-native with our presentation services and
    build a GraphQL-first presentation service framework.
    Rather than have a bunch of RPC endpoints exposed through GraphQL, we can think of each presentation
    service as just a GraphQL server, with the same ownership boundaries that we gained with presentation
    services, but while enabling us to do much more intelligent schema stitching and downstream data fetching.

  51. Schema Federation?
    Build-time Schema Stitching?
    We’ve been investigating and prototyping schema federation, a new, different type of schema stitching that
    lets the gateway remain lightweight, while still being able to properly express relationships between different
    types in the schema, even if these types are owned by different services.
    Another route that we’re excited to explore is doing schema stitching at build time, rather than at runtime.

  52. We’re still iterating.
    GraphQL as it exists today at Airbnb is a foothold and a solid beginning — it’s easy to get started for backend
    engineers and allows FE/Native engineers to get some of the benefits of working with GraphQL on the client.
    What I want to leave you with today is that as you all are introducing GraphQL into your organizations — pick a
    north star, but don’t be afraid to make compromises along the way. GraphQL is a wonderfully flexible
    technology, and there isn’t a one-size-fits-all use case. How you use and introduce GraphQL into your org is
    highly dependent on your organizational and technological structure.

  53. Thanks again for your time today! I feel really honored to have been part of this conference. It’s been really great
    to be here. For those of you from afar, have safe travels home, and for the rest of you, catch you the next time
    I’m in India!
