From 1 to 201 Lambda functions in production: Evolving a serverless startup architecture

My slides from AWS Community Day Turkey 2023 (https://aws.cloudturkey.io/)

----

Building a serverless function or an API is easy. However, things get more complicated as your application grows: what works for a few functions often doesn't work for hundreds of functions and services. Along the way, you'll need to evolve your architecture, deployment, monitoring, and tooling.

This talk is a case study of a serverless startup's architecture evolution. We started with a single Lambda function in early 2018 and evolved our application through multiple stages and architectures. Currently, the application uses CQRS with GraphQL and 200 Lambda functions serving millions of requests. We faced and solved many issues during the last four years, learned many things, and managed to keep our infrastructure costs low.

Slobodan Stojanović

May 06, 2023

Transcript

  1. A story
    about building
    a profitable bootstrapped
    startup using serverless.

  2. @slobodan_
    Our journey
    starts with a prototype,
    continues with an MVP,
    and follows the evolution to
    a real product.

  3. We faced many issues
    and learned a lot along the way.

  4. Evolving a serverless
    startup architecture
    From 1 to 201 Lambda functions in production

  6. The problem & the idea

  7. Everything started in 2016
    with a problem in our other company.

  8. Everything started in 2017
    with an idea, a prototype, and a landing page.

  9. Everything really started in 2018
    with an MVP.

  10. A simple idea
    •Track leave requests and the number of remaining PTO days
    •Use SSO to avoid another username and password to remember
    •Integrate with Slack to make and approve PTO requests
    •Show events in a Google calendar

  11. How hard can it be?

  12. Slobodan Stojanović
    CTO and co-founder of the product I am talking about
    co-author of the book Serverless Applications with Node.js
    AWS Serverless Hero
    JS Belgrade meetup organizer

  13. Architecture

  14. A prototype

  15. An MVP architecture:
    a simple serverless bot*

  16. Why serverless?
    •Faster - it was fast to build a prototype and an MVP
    •Less - we outsourced scaling, maintenance, and security
    •Focused - we focused on the business logic and minimized
    time spent on everything else
    •Cheaper - the cost scales with users, and it starts with $0

  19. Benefits
    •Quick and independent deployments
    •Easy to understand and maintain
    •Easy to onboard new people
    •Cheap

  20. The cost of the infrastructure:
    $0.00

  21. Adding new features

  22. Downsides
    •Independent deployment for each Lambda function
    •Hard to manage after we added 10 Lambda functions
    •Hard to scale, as a critical part wasn't serverless
    •An obvious bottleneck

  23. ~100 paying teams

  24. The first architecture iteration:
    a complex serverless
    spaghetti

  25. The first architecture iteration:
    a complex serverless
    application

  26. Infrastructure as code (IaC)
    &
    first microservices

  27. Microservices

  29. We had ~150 Lambda functions

  30. First migrations
    from an old service
    to a new one

  31. We realized that we'll need to
    find the right architecture for
    our problem

  32. Most significant changes
    •(Almost) everything was in AWS CloudFormation (IaC)
    •We replaced Node.js server with serverless services
    •We started using TypeScript instead of JavaScript
    •We started a migration from MongoDB to DynamoDB

  33. Benefits
    •Easier deployments
    •App was auto-scalable
    •We had almost 100% uptime out-of-the-box
    •Still very cheap (our infrastructure cost was less than $100/month)

  34. Downsides
    •The big flaw in our system design: we were storing state,
    not events, in our database
    •We were still wasting a lot of time on less important things
    •Project complexity and the number of new services
    increased, and it was harder to onboard new developers
    •Developers don't like YAML

  35. ~600 paying teams

  36. The second architecture
    iteration:
    an event-driven system
    with microservices

  37. The quest for an architecture
    that solves our problem

  38. In the end, we decided to use
    Event Sourcing
    &
    Command Query
    Responsibility Segregation
    (CQRS)

  39. Why Event Sourcing
    and CQRS?

  40. Storing state
    vs
    storing events

  41. A common flow in our app
    •Ana created a new location and moved John and Mike to that location
    •Ana assigned Mike as an approver
    •Ana made a leave policy (20 PTO days per year)
    •John requested leave, and Ana approved it
    •A brought-forward event occurred, and five unused days were transferred to the next year's balance
    •Ana changed John's working week
    •Mike added some past leaves for John
    •Alex moved John to another location with a different policy

  42. How do we calculate John's
    remaining PTO days?
    Events and CQRS to the rescue.
    As a bonus, we got everything we need for the audit logs.
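
To make the state-versus-events idea concrete, here is a minimal sketch of how a balance could be derived by replaying an append-only event log. The event names and the simplified model (no working-week or location changes) are assumptions for illustration, not the product's actual schema.

```typescript
// Hypothetical event types; the real event schema is not shown in the slides.
type LeaveEvent =
  | { type: "POLICY_ASSIGNED"; daysPerYear: number }
  | { type: "BROUGHT_FORWARD"; days: number }
  | { type: "LEAVE_APPROVED"; days: number }
  | { type: "LEAVE_CANCELLED"; days: number };

interface Balance {
  policyDays: number; // days granted by the current leave policy
  broughtForward: number; // unused days carried over from the previous year
  used: number; // approved leave days
}

// Apply a single event to the balance without mutating the previous value.
function applyEvent(balance: Balance, event: LeaveEvent): Balance {
  switch (event.type) {
    case "POLICY_ASSIGNED":
      return { ...balance, policyDays: event.daysPerYear };
    case "BROUGHT_FORWARD":
      return { ...balance, broughtForward: balance.broughtForward + event.days };
    case "LEAVE_APPROVED":
      return { ...balance, used: balance.used + event.days };
    case "LEAVE_CANCELLED":
      return { ...balance, used: balance.used - event.days };
  }
}

// Replay the whole log from an empty balance to get the current state.
function remainingDays(events: LeaveEvent[]): number {
  const initial: Balance = { policyDays: 0, broughtForward: 0, used: 0 };
  const balance = events.reduce(applyEvent, initial);
  return balance.policyDays + balance.broughtForward - balance.used;
}

const johnsEvents: LeaveEvent[] = [
  { type: "POLICY_ASSIGNED", daysPerYear: 20 },
  { type: "BROUGHT_FORWARD", days: 5 },
  { type: "LEAVE_APPROVED", days: 3 },
];

console.log(remainingDays(johnsEvents)); // 20 + 5 - 3 = 22
```

Because the log is never edited, the same list of events can also be read back as an audit trail.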

  44. 1. The client sends an API POST request or GraphQL mutation
    2. The event is stored in the Events table (append-only, no edits)
    3. DynamoDB Streams delivers the event to a Lambda function that forwards it to the EventBridge event bus
    4. EventBridge triggers the specific business logic Lambda function
    5. The business logic Lambda stores the "cached" data in one of the read-only DynamoDB tables
    6. It then sends a "fake" mutation
    7. The "fake" mutation triggers the GraphQL subscription to notify the clients
    8. The app uses an event bus to "route" the response to the user's platform
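
The forwarding step (DynamoDB stream to EventBridge) could look roughly like the sketch below. It only shows the mapping from stream records to PutEvents entries as a pure function; the attribute names (`eventType`, `payload`), the source name, and the bus name are assumptions, and the types are small local copies rather than the AWS SDK's own.

```typescript
// Minimal local copies of the shapes involved, so the sketch has no dependencies.
// Field names follow the DynamoDB Streams record and EventBridge PutEvents entry
// formats; the attribute names (eventType, payload) are hypothetical.
interface StreamRecord {
  eventName?: "INSERT" | "MODIFY" | "REMOVE";
  dynamodb?: { NewImage?: Record<string, { S?: string }> };
}

interface PutEventsEntry {
  EventBusName: string;
  Source: string;
  DetailType: string;
  Detail: string; // JSON-encoded string
}

function toEventBridgeEntries(records: StreamRecord[], busName: string): PutEventsEntry[] {
  const entries: PutEventsEntry[] = [];
  for (const record of records) {
    // The Events table is append-only, so only INSERT records are forwarded.
    if (record.eventName !== "INSERT") continue;
    const image = record.dynamodb?.NewImage;
    const eventType = image?.eventType?.S;
    const payload = image?.payload?.S;
    if (!eventType || !payload) continue; // skip items without the expected attributes
    entries.push({
      EventBusName: busName,
      Source: "app.events", // hypothetical source name
      DetailType: eventType,
      Detail: payload,
    });
  }
  return entries;
}

const sampleRecords: StreamRecord[] = [
  {
    eventName: "INSERT",
    dynamodb: {
      NewImage: {
        eventType: { S: "LEAVE_REQUESTED" },
        payload: { S: '{"userId":"john","days":3}' },
      },
    },
  },
  { eventName: "MODIFY", dynamodb: { NewImage: { eventType: { S: "IGNORED" }, payload: { S: "{}" } } } },
];

const entries = toEventBridgeEntries(sampleRecords, "app-event-bus");
console.log(entries.map((entry) => entry.DetailType)); // [ 'LEAVE_REQUESTED' ]
```

In a deployed function, the resulting entries would be passed to the EventBridge PutEvents call; keeping the mapping pure makes it easy to unit test without AWS.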

  46. Clients then query the read-only
    tables directly using GraphQL,
    or using the RESTful API
    in the case of bots

  47. We decreased the number of
    Lambda functions to 101!
    But now we have 230 functions
    in production!

  48. Benefits
    •Fully Managed GraphQL
    •Less code
    •Better control
    •All benefits from the previous architecture (without the
    spaghetti part)
    •Monorepo and shared types

  49. Downsides
    •YAML is still very important (but we also added CDK for
    some smaller services)
    •Many new AWS services to learn
    •Velocity templates

  50. Lambda functions → business
    logic
    Other services →
    transformation and
    orchestration

  51. ~2000 paying teams

  52. The cost of the infrastructure:
    ~1% of MRR
    ~$1250

  53. Architecture should evolve with
    your product.
    Don't waste your time making it
    perfect for the MVP.

  54. Development & Testing

  55. Common questions during onboarding
    •How do I run this locally?
    •How do we debug errors?
    •What is a DynamoDB single-table design, and why do we
    store all events in the same table?
    •What did we just do?

  56. It's impossible to run the whole
    serverless application locally
    But if you are learning to be a pilot, you don't start by flying
    a plane in your backyard!

  57. You can simulate parts of your
    application locally
    Think of it as an early version of a "Flight Simulator"
    It's helpful, but probably far from the complete set of
    tools to learn how to fly the plane

  58. What can we do?
    •A Lambda function is just a function - you can run it locally
    as any other function*
    •You can use SAM Local or similar tools to run (a simulation
    of) a Lambda function locally in the Docker container
    •You can test in the cloud
    •You can waste your time trying to simulate everything
    locally
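
As a small illustration of the first option, the sketch below calls a handler directly with a hand-written event. The handler, the event shape, and the response shape are all hypothetical and only loosely modeled on an HTTP API; nothing here depends on the Lambda runtime.

```typescript
// Hypothetical event and response shapes, loosely modeled on an HTTP API.
interface ApiEvent {
  body: string | null;
}
interface ApiResponse {
  statusCode: number;
  body: string;
}

// A hypothetical handler: it is an ordinary async function.
async function handler(event: ApiEvent): Promise<ApiResponse> {
  if (event.body === null) {
    return { statusCode: 400, body: JSON.stringify({ error: "Missing body" }) };
  }
  const { name } = JSON.parse(event.body) as { name?: string };
  return {
    statusCode: 200,
    body: JSON.stringify({ message: `Hello, ${name ?? "stranger"}!` }),
  };
}

// Local invocation with a hand-written event, no AWS account needed.
handler({ body: JSON.stringify({ name: "Ana" }) }).then((response) => {
  console.log(response.statusCode, response.body);
});
```

This covers the function's own logic only; permissions, timeouts, and the real event payloads still need to be checked in the cloud.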

  59. Or you can write tests

  60. How do we test a common client-server app?

  61. How do we test a serverless app?

  63. Hexagonal architecture

  64. Anatomy of a Lambda function

  67. TypeScript example (business logic)
    // T is the raw event type; the parser turns it into domain data.
    interface IParams<T> {
      event: T
      parser: (event: T) => IParsedData
      repositories: {
        users: IUserRepository
        notifications: INotificationRepository
      }
    }
    export async function businessLogic<T>(params: IParams<T>): Promise<void> {
      const {event, parser, repositories} = params
      try {
        const parsedData = parser(event)
        await repositories.users.invite(parsedData.email)
        await repositories.notifications.invitationSent()
      } catch (err) {
        // Any parsing or repository error is reported through the notifications port.
        await repositories.notifications.failure(err)
      }
    }
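
Because the business logic receives its parser and repositories as parameters, it can be exercised with in-memory fakes instead of real AWS services. The sketch below assumes minimal versions of the interfaces named on the slide (their real definitions are not shown) and uses hypothetical fake classes.

```typescript
// Hypothetical minimal versions of the interfaces named on the slide.
interface IParsedData {
  email: string;
}
interface IUserRepository {
  invite(email: string): Promise<void>;
}
interface INotificationRepository {
  invitationSent(): Promise<void>;
  failure(err: unknown): Promise<void>;
}
interface IParams<T> {
  event: T;
  parser: (event: T) => IParsedData;
  repositories: { users: IUserRepository; notifications: INotificationRepository };
}

async function businessLogic<T>(params: IParams<T>): Promise<void> {
  const { event, parser, repositories } = params;
  try {
    const parsedData = parser(event);
    await repositories.users.invite(parsedData.email);
    await repositories.notifications.invitationSent();
  } catch (err) {
    await repositories.notifications.failure(err);
  }
}

// In-memory fakes that record calls instead of talking to AWS.
class FakeUsers implements IUserRepository {
  invited: string[] = [];
  async invite(email: string): Promise<void> {
    if (!email.includes("@")) throw new Error("invalid email");
    this.invited.push(email);
  }
}
class FakeNotifications implements INotificationRepository {
  log: string[] = [];
  async invitationSent(): Promise<void> {
    this.log.push("sent");
  }
  async failure(_err: unknown): Promise<void> {
    this.log.push("failure");
  }
}

async function runExample(): Promise<string[]> {
  const users = new FakeUsers();
  const notifications = new FakeNotifications();
  const parser = (event: { body: string }): IParsedData => JSON.parse(event.body);
  const repositories = { users, notifications };

  // One valid invitation, then one that the fake repository rejects.
  await businessLogic({ event: { body: '{"email":"ana@example.com"}' }, parser, repositories });
  await businessLogic({ event: { body: '{"email":"not-an-email"}' }, parser, repositories });
  return [...users.invited, ...notifications.log];
}

runExample().then((calls) => console.log(calls)); // [ 'ana@example.com', 'sent', 'failure' ]
```

The same fakes can be reused in a test runner of your choice; only the adapters themselves need integration tests against real services.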

  75. TypeScript example (handler)
    export async function handler(event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> {
      try {
        // Wire the concrete adapters into the business logic.
        const userRepository = new UsersDynamoDBRepository(process.env.USERS_TABLE)
        const eventBridgeRepository = new EventBridgeRepository(process.env.EVENT_BUS)
        await businessLogic({
          event,
          parser: parseApiGatewayEvent,
          repositories: {
            users: userRepository,
            notifications: eventBridgeRepository
          }
        })
        // Return the response so API Gateway receives it.
        return apiGatewaySuccessResponse()
      } catch(err) {
        return apiGatewayErrorResponse()
      }
    }

  83. Also useful for migrations
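
One way a repository interface can help during a migration is a wrapper that implements the same port while writing to both the old and the new store. The slides do not describe how the migration was actually carried out, so the sketch below is only an assumption; every name in it is hypothetical.

```typescript
// Hypothetical port; the real repository interfaces are not shown in the slides.
interface User {
  id: string;
  email: string;
}
interface IUserRepository {
  save(user: User): Promise<void>;
  findById(id: string): Promise<User | undefined>;
}

// In-memory stand-in for a real adapter (for example, MongoDB or DynamoDB).
class InMemoryUserRepository implements IUserRepository {
  private items = new Map<string, User>();
  async save(user: User): Promise<void> {
    this.items.set(user.id, user);
  }
  async findById(id: string): Promise<User | undefined> {
    return this.items.get(id);
  }
}

// Writes go to both stores; reads prefer the new store and fall back to the old one.
class MigratingUserRepository implements IUserRepository {
  private oldRepo: IUserRepository;
  private newRepo: IUserRepository;
  constructor(oldRepo: IUserRepository, newRepo: IUserRepository) {
    this.oldRepo = oldRepo;
    this.newRepo = newRepo;
  }
  async save(user: User): Promise<void> {
    await this.oldRepo.save(user);
    await this.newRepo.save(user);
  }
  async findById(id: string): Promise<User | undefined> {
    return (await this.newRepo.findById(id)) ?? (await this.oldRepo.findById(id));
  }
}

async function migrationExample(): Promise<(string | undefined)[]> {
  const oldRepo = new InMemoryUserRepository();
  const newRepo = new InMemoryUserRepository();
  await oldRepo.save({ id: "1", email: "john@example.com" }); // exists only in the old store
  const users = new MigratingUserRepository(oldRepo, newRepo);
  await users.save({ id: "2", email: "mike@example.com" }); // written to both stores
  const john = await users.findById("1"); // served by the fallback to the old store
  const mike = await newRepo.findById("2"); // already present in the new store
  return [john?.email, mike?.email];
}

migrationExample().then((emails) => console.log(emails)); // [ 'john@example.com', 'mike@example.com' ]
```

The business logic does not change: it still receives something that satisfies the repository interface.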

  87. Integration tests

  88. Testing in the cloud

  89. describe('DynamoDB repository', () => {
      describe('unit', () => { ... })
      describe('integration', () => {
        beforeAll(() => {
          // Create test DB
        })
        afterAll(() => {
          // Destroy test DB
        })
        // Tests
      })
    })

  90. beforeAll(async () => {
      const params = { ... }
      await dynamoDb.createTable(params).promise()
      await dynamoDb.waitFor('tableExists', {
        TableName: tableName
      }).promise()
    })

  91. afterAll(async () => {
      await dynamoDb.deleteTable({
        TableName: tableName
      }).promise()
      await dynamoDb.waitFor('tableNotExists', {
        TableName: tableName
      }).promise()
    })

  92. Or we deploy the
    full app and run
    integration tests

  93. Deploy where?
    Serverless environments are often cheap!

  94. Our environments
    12 similar
    environments
    total cost:
    $1250/month

  95. It's hard to run or simulate a
    serverless app locally.
    Make small trade-offs to make
    your app testable, and you'll be
    able to move fast.

  96. Security

  97. Is serverless secure enough?

  98. Accounts are cheap
    Create one AWS sub-account per environment
    Create an AWS sub-account for each developer
    But remember to set budget notifications!

  99. Tools for management and
    easy repeatability
    CloudFormation
    AWS Organization Formation

  100. Use AWS SSO

  101. Use AWS IAM Identity Center
    (Successor to AWS SSO)

  102. Learn AWS Identity & Access
    Management (IAM)
    Apply the minimal permissions model for all AWS services
    i.e., a Lambda function can do one specific operation only
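
As an illustration of the minimal permissions model, the sketch below builds an IAM policy document that allows exactly one action on exactly one DynamoDB table. The helper name, the account ID, the region, and the table name are hypothetical; the document shape follows the standard IAM JSON policy format.

```typescript
// A subset of the IAM JSON policy document shape.
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
}
interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Build a policy that allows a single operation on a single resource.
function singleActionPolicy(action: string, resourceArn: string): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [{ Effect: "Allow", Action: [action], Resource: [resourceArn] }],
  };
}

// Hypothetical account ID, region, and table name, for illustration only.
const policy = singleActionPolicy(
  "dynamodb:PutItem",
  "arn:aws:dynamodb:eu-central-1:123456789012:table/Events",
);

console.log(JSON.stringify(policy, null, 2));
```

A function that only appends events would get this policy and nothing else: no reads, no deletes, and no access to other tables.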

  103. Use WAF to protect
    your application

  104. But what about distributed
    denial-of-service (DDoS)
    attacks?

  105. But what about distributed
    denial-of-wallet (DDoW)
    attacks?
    "All AWS customers benefit from the automatic protections of
    AWS Shield Standard, at no additional charge."

  106. You still need to be sure that
    your code is secure

  107. Learn about security best
    practices for your service
    provider and use them from
    day one.

  108. Philosophy

  109. Focus on the
    important things only,
    and outsource everything else

  110. But how do we know what's
    important?

  111. See "Why the Fuss about Serverless?" (https://youtu.be/SPsaqiegOP4)

  118. Does this apply only to the
    infrastructure?

  119. Do not waste your time or
    energy building components
    from scratch when they are already
    in the product or commodity phases

  120. Focus on your business logic
    and outsource everything else.

  121. “Make it work, then make it beautiful,
    then if you really, really have to, make it
    fast.
    90% of the time, if you make it beautiful,
    it will already be fast. So really, just
    make it beautiful!”
    Joe Armstrong, creator of the Erlang programming language

  123. Thank you!
    twitter: @slobodan_
