Slide 1

Slide 1 text

Cell-Based Architecture Design in AWS
2025/03/26, AWS Startup Loft
Senior Site Reliability Engineer Kazuki Higashiguchi

Slide 2

Slide 2 text

Agenda

1. Background behind choosing Cell-Based Architecture
● Goal of this AWS architecture
● What Cell-Based Architecture is
● Why we adopted this Cell-Based Architecture design

2. Cell-Based Architecture Implementation in AWS
● Multiple AWS accounts management
● Unit of AWS account
● Cross-cell infra considerations
● Cell provisioning
● IAM management

Slide 3

Slide 3 text

Kazuki Higashiguchi
Senior SRE @ Autify, Inc. (2022~)
SWE, Tech Lead, EM @ BASE, Inc. (2017~)
Cloud Infra Engineer @ S-cubism inc. (2016~)
@hgsgtk /in/hgsgtk

Slide 4

Slide 4 text

Autify

Slide 5

Slide 5 text

Autify

Slide 6

Slide 6 text

Background behind choosing Cell-Based Architecture 01.

Slide 7

Slide 7 text

Autify NoCode Web/Mobile infrastructure on AWS

Autify NoCode Web and Mobile are our two main SaaS products, with their core server components built on AWS since 2019. Both are designed as a single, global application.

Pros
● Easy maintenance: Engineers only need to consider one stack.

Cons
● Large blast radius
● Hard limitations
● Difficult bootstrapping: Bootstrapping a new stack relies on "secret sauce" knowledge and is almost impossible to reproduce later.

Slide 8

Slide 8 text

Our infrastructure architecture issues (and new products)

Autify Genesis is in Open Beta, progressing toward GA, with a long-term vision to expand our product lineup. However, the existing architecture has limitations, especially as customers demand on-premises or single-tenant solutions alongside cloud-based SaaS. To support future growth, we need a flexible, future-proof architecture for the next decade.

[Timeline figure: (2019~), (GA), …]

Slide 9

Slide 9 text

Cell-Based Architecture

A distributed system architecture that organizes a complex application into independent, self-contained units called "cells."
● Divides an application into multiple cells, each with its own independent set of resources (compute, storage, network).
● Each cell can operate independently, reducing the impact of failures and enabling horizontal scaling.

Slide 10

Slide 10 text

Benefits of Cell-Based Architecture

● Scale-out over Scale-up: Enhances growth by expanding horizontally rather than vertically.
● Easy Bootstrap: Facilitates quick setup across diverse regions.
● Small blast radius: Enables precise targeting and minimizes collateral damage.
● Isolate noisy neighbors: Prevents interference by segregating resource-intensive components.

Slide 11

Slide 11 text

Why we adopted this Cell-Based Architecture design

Cell-based architecture is not a one-size-fits-all solution. However, this strategy matches what we need for Autify Genesis and our future products, and takes into account the following future business challenges:
● Flexible scalability to support business growth
● Capability to support single/multi-tenant and multi-regional deployments
● On-premises deployment feasibility, requiring the same effort as building a new stack

Slide 12

Slide 12 text

Cell-Based Architecture Implementation in AWS 02.

Slide 13

Slide 13 text

Isolation with Multiple AWS Accounts

"Treat an AWS account as a resource container."

● Environment Isolation: Isolate pre-production and production for security, governance, and regulatory reasons.
● Billing Separation
● Service Quota Management: Because a cell-based architecture manages many AWS resources, we would hit service quotas if we used a single AWS account.
● Workload Isolation: Cell isolation using multiple AWS accounts is the simplest and most effective implementation approach.
● Permission Delegation

Slide 14

Slide 14 text

Multi AWS accounts with AWS Organizations

Slide 15

Slide 15 text

AWS Services for Multi-Account Management

● AWS Organizations: Manages a large number of AWS accounts.
● AWS IAM Identity Center (SSO): Single sign-on (login) to many AWS accounts. AWS Organizations is a prerequisite for this feature.
● AWS Control Tower: Sets up and governs an AWS multi-account environment, following prescriptive best practices.
● Account Factory (AWS Service Catalog): Self-service provisioning of new accounts.
● AWS Config: Evaluation and monitoring of AWS account settings.
● AWS CloudTrail: Evidence trail of operations in your AWS account.
● AWS Security Hub: Comprehensive view of your security state in AWS.

Slide 16

Slide 16 text

AWS Account as Isolation Unit

[Diagram: AWS accounts arranged by Env (Production, Staging, Sandbox), Region (us-west-2, us-east-1, ap-northeast-1, eu-west-1), and Cell. Accounts shown: Workload Usw2 Cell1 Prd, Workload Usw2 Cell2 Prd, Workload Usw2 Cell3 Prd, Route 53 prd, SES Workload Usw2 Prd, SES Workload Apne1 Prd. Labels: Multi-tenant cells, Single-tenant cells, (Global), (Cross Cell).]

Slide 17

Slide 17 text

Cell-Based Architecture in AWS

Slide 18

Slide 18 text

AWS Account Naming Policy

Each AWS account requires a unique email address (e.g., [email protected]), but the user part (before the "@") is limited to 64 characters. To avoid exceeding this limit, we need a structured policy for keeping each segment of the account name short.

Segment: Environment
● Our policy: 3 characters (`prd`, `stg`, `sbx`, `shd`)
● Alternatives: full word (`production`), four characters (`prod`)

Segment: Region
● Our policy: AZ ID style (`usw2`, `use1`, `apne1`, `euw1`)
● Alternatives: xx-xxx-1 style (`us-west-2`), ISO 3166-1 alpha-2 (`us`, `jp`), IATA airport code (`TYO`)
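The naming policy above could be enforced with a small helper. The following is a minimal sketch, not the speaker's tooling: the segment order follows the account names on the earlier slide (e.g., "Workload Usw2 Cell1 Prd"), while the `purpose` segment, the hyphen separator, and the `aws+` plus-addressing prefix are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Limit on the part before "@", as stated on the slide.
MAX_LOCAL_PART_LENGTH = 64

# Short codes taken from the naming policy on the slide.
ENVIRONMENTS = {"prd", "stg", "sbx", "shd"}
REGIONS = {"usw2", "use1", "apne1", "euw1"}


@dataclass(frozen=True)
class AccountName:
    purpose: str                # hypothetical segment, e.g. "workload"
    region: str                 # AZ ID style, e.g. "usw2"
    environment: str            # 3 characters, e.g. "prd"
    cell: Optional[int] = None  # None for accounts that are not a cell

    def slug(self) -> str:
        """Join the segments, e.g. 'workload-usw2-cell1-prd'."""
        if self.region not in REGIONS:
            raise ValueError(f"unknown region code: {self.region}")
        if self.environment not in ENVIRONMENTS:
            raise ValueError(f"unknown environment code: {self.environment}")
        parts = [self.purpose, self.region]
        if self.cell is not None:
            parts.append(f"cell{self.cell}")
        parts.append(self.environment)
        return "-".join(parts)

    def email_local_part(self, prefix: str = "aws") -> str:
        """Build the part before '@' and reject it if it is too long."""
        local = f"{prefix}+{self.slug()}"  # plus-addressing is an assumption
        if len(local) > MAX_LOCAL_PART_LENGTH:
            raise ValueError(f"local part is {len(local)} characters long")
        return local
```

For example, `AccountName("workload", "usw2", "prd", cell=1).email_local_part()` yields `aws+workload-usw2-cell1-prd`, which leaves plenty of headroom under the limit because every segment is short.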

Slide 19

Slide 19 text

Cross-cell infra resource considerations

Slide 20

Slide 20 text

Amazon Route 53

The cell-based architecture includes a Cell Router, a lightweight layer responsible for directing requests to the appropriate cell. This router layer is a shared component across all cells. A common approach to implementing it is Amazon Route 53, which we also use.

Env: AWS Account
● Production: Route 53 Prd
● Staging: Route 53 Stg
● Sandbox: Route 53 Sbx
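The routing decision a Cell Router makes can be sketched in a few lines. The slide does not describe how Route 53 is configured, so this example only illustrates the idea of mapping a request to a cell; the tenant IDs, the cell IDs, the `example.com` domain, and the one-hostname-per-cell scheme are all hypothetical.

```python
from typing import Dict

# Hypothetical tenant-to-cell assignments. In practice this mapping would
# live wherever the router keeps its state, not in source code.
TENANT_TO_CELL: Dict[str, str] = {
    "tenant-a": "usw2-cell1",
    "tenant-b": "usw2-cell2",
    "tenant-c": "usw2-cell3",
}

BASE_DOMAIN = "example.com"  # placeholder domain


def cell_for_tenant(tenant_id: str) -> str:
    """Return the cell that owns the tenant, or raise if it is unassigned."""
    try:
        return TENANT_TO_CELL[tenant_id]
    except KeyError:
        raise LookupError(f"tenant {tenant_id!r} is not assigned to a cell")


def cell_hostname(cell_id: str) -> str:
    """Build the per-cell hostname that a DNS record would point at."""
    return f"{cell_id}.{BASE_DOMAIN}"


def route(tenant_id: str) -> str:
    """Resolve a tenant to the hostname of its cell."""
    return cell_hostname(cell_for_tenant(tenant_id))
```

Keeping the router this thin matters because it is shared by every cell: any logic placed here is outside the blast-radius protection that the cells provide.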

Slide 21

Slide 21 text

Amazon SES

Using Amazon SES as a cross-cell resource is more practical than setting it up in every cell account. Each new account must request production access (moving out of the SES sandbox), a process that can delay cell deployments because of anti-fraud restrictions.

Slide 22

Slide 22 text

Cell Provisioning

Infrastructure as Code is essential for provisioning a cell-based architecture. The process begins with creating a new AWS account using Account Factory, followed by bootstrapping resources with Terraform. We chose Terraform for its broader SaaS support, superior provisioning compared with AWS CDK (CloudFormation), and greater maturity than CDKTF.

[Diagram: two code repositories drive provisioning. `cross-cell-infra` provisions the cross-cell resources (Route 53, SES, etc.), and `app-workload-repo` provisions the cells.]
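One way to keep cell bootstrapping reproducible is to generate the same Terraform command sequence for every cell from a small description of that cell. The sketch below only builds the command lines and does not run them. The `-chdir`, `-reconfigure`, `-backend-config`, `-var-file`, and `-out` options are standard Terraform CLI options, but the `cells/<name>.backend.hcl` and `cells/<name>.tfvars` file layout and the `Cell` type are assumptions, not the speaker's repository structure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cell:
    name: str         # hypothetical identifier, e.g. "usw2-cell1-prd"
    region: str       # e.g. "us-west-2"
    environment: str  # e.g. "prd"


def terraform_commands(cell: Cell, repo_dir: str = "app-workload-repo") -> List[List[str]]:
    """Return the Terraform invocations that would bootstrap one cell.

    Paths are relative to repo_dir because -chdir switches the working
    directory before Terraform reads any files.
    """
    chdir = f"-chdir={repo_dir}"
    backend = f"cells/{cell.name}.backend.hcl"  # assumed per-cell state config
    var_file = f"cells/{cell.name}.tfvars"      # assumed per-cell variables
    plan_file = f"{cell.name}.tfplan"
    return [
        ["terraform", chdir, "init", "-reconfigure", f"-backend-config={backend}"],
        ["terraform", chdir, "plan", f"-var-file={var_file}", f"-out={plan_file}"],
        ["terraform", chdir, "apply", plan_file],
    ]
```

A runner could pass each list to `subprocess.run(cmd, check=True)` in order. Because every cell goes through identical steps and differs only in its variable and backend files, adding a cell becomes a matter of adding two small files.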

Slide 23

Slide 23 text

IAM Management

Cell-based architecture requires managing multiple AWS accounts, making it impractical to issue IAM users in each member account. Instead, single sign-on through AWS IAM Identity Center (formerly AWS Single Sign-On) is essential for efficient access management.
● Integrate users from Google Workspace into IAM Identity Center: Google Workspace users who belong to a specific Google Group can be automatically provisioned into AWS IAM Identity Center using SCIM.
● IAM Identity Center groups, memberships, and permissions: We use these to streamline access control across the multiple AWS accounts in a cell-based architecture.

Slide 24

Slide 24 text

Workflow for Setting Up a New Developer

We pre-provision IAM Identity Center groups and access permissions. When team changes occur, we update the group memberships, all managed using Terraform.

New Developer Setup Workflow
1. Create a new user in Google Workspace.
2. Add the user to the [email protected] Google Group. Members of this group are automatically granted access to AWS IAM Identity Center.
3. Assign the user to a specific IAM Identity Center group using Terraform.
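Step 3 is declarative: the desired group memberships are written down and the tool works out what to change. The talk does this with Terraform; the Python sketch below only illustrates the underlying reconciliation idea, and the group names and user names in the usage note are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Maps a group name to the set of user names that belong to it.
Memberships = Dict[str, Set[str]]
Change = Tuple[str, str]  # (group, user)


def plan_membership_changes(
    current: Memberships, desired: Memberships
) -> Tuple[List[Change], List[Change]]:
    """Compare current and desired memberships.

    Returns (to_add, to_remove), each a sorted list of (group, user) pairs.
    """
    to_add: List[Change] = []
    to_remove: List[Change] = []
    for group in set(current) | set(desired):
        have = current.get(group, set())
        want = desired.get(group, set())
        to_add.extend((group, user) for user in want - have)
        to_remove.extend((group, user) for user in have - want)
    return sorted(to_add), sorted(to_remove)
```

For instance, if `current` is `{"sre": {"alice"}}` and `desired` is `{"sre": {"alice", "bob"}}`, the plan contains a single addition, `("sre", "bob")`, and no removals. Onboarding a developer then amounts to a one-line change to the desired state.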

Slide 25

Slide 25 text

Any Questions?
Kazuki Higashiguchi
@hgsgtk /in/hgsgtk

Slide 26

Slide 26 text

References

● AWS re:Invent 2024. "SaaS meets cell-based architecture: A natural multi-tenant fit (SAS315)". https://www.youtube.com/watch?v=wYm_PJc2U8c
● AWS re:Invent 2024. "Learn to create a robust, easy-to-scale architecture with cells (ARC335)". https://youtu.be/OkT12t-fvRE?si=WQj5YVXJp13lBz8T
● AWS re:Invent 2020. "How to scale beyond limits with cell-based architectures". https://www.youtube.com/watch?v=HUwz8uko7HY
● InfoQ. "Architecting for High Availability in the Cloud with Cellular Architecture". https://www.youtube.com/watch?v=1RfCjEg4ygY. Posted on Jun 11, 2024
● AWS Well-Architected. "What is a cell-based architecture?". https://docs.aws.amazon.com/wellarchitected/latest/reducing-scope-of-impact-with-cell-based-architecture/what-is-a-cell-based-architecture.html. Accessed on Feb 17, 2025
● AWS Well-Architected. "Why use a cell-based architecture?". https://docs.aws.amazon.com/wellarchitected/latest/reducing-scope-of-impact-with-cell-based-architecture/why-to-use-a-cell-based-architecture.html. Accessed on Feb 17, 2025
● AWS Well-Architected. "REL10-BP04 Use bulkhead architectures to limit scope of impact". https://docs.aws.amazon.com/wellarchitected/2023-04-10/framework/rel_fault_isolation_use_bulkhead.html. Accessed on Feb 17, 2025
● AWS Well-Architected. "When to use a cell-based architecture?". https://docs.aws.amazon.com/wellarchitected/latest/reducing-scope-of-impact-with-cell-based-architecture/when-to-use-a-cell-based-architecture.html. Accessed on Feb 17, 2025

Slide 27

Slide 27 text

References

● Amazon Simple Email Service. "Request production access (Moving out of the Amazon SES sandbox)". https://docs.aws.amazon.com/ses/latest/dg/request-production-access.html. Accessed on Feb 26, 2025
● Amazon Route 53. "Quotas". https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/DNSLimitations.html. Accessed on Feb 26, 2025
● AWS IAM Identity Center. "Configure SAML and SCIM with Google Workspace and IAM Identity Center". https://docs.aws.amazon.com/singlesignon/latest/userguide/gs-gwp.html. Accessed on Feb 26, 2025