
Network architecture design for microservices on Google Cloud Platform

Raphael Fraysse
September 19, 2019


Presentation for the GCPUG Tokyo Network Day 2019: https://gcpug-tokyo.connpass.com/event/144935/

A tale about thinking, planning, and designing a network architecture for large-scale microservices on GCP in a post-IPO company.

Blog version available at https://blog.usejournal.com/network-architecture-design-for-microservices-on-gcp-ce8d10d5396e

Follow me on Twitter: https://twitter.com/la1nra


Transcript

  1. Network Architecture Design for
    Microservices on GCP


  2. 2
    About me
    @lainra (GitHub)
    Twitter / @la1nra
    SRE at Mercari microservices
    platform team


  3. 3
    Target network architecture design for microservices on GCP


  4. Thank you for coming! See you next time!


  5. Thank you for coming! See you next time!
    Just kidding!


  6. 6
    More seriously
    If getting the solution right from the start isn’t thrilling enough for you,
    stay with me to understand the journey that led to it!

  7. 7
    Table of contents
    ● Infrastructure introduction
    ● Issues leading to the architecture redesign
    ● Defining the new architecture goals
    ● Challenges and solutions
    ● Final design
    ● Wrap-up


  8. Infrastructure introduction


  9. 9
    Mercari infrastructure in a few numbers
    ● 100+ microservices
    ● 100+ VPCs (1 microservice = 1 VPC)
    ● 2 main Google Kubernetes Engine (GKE) clusters (1 Production
    and 1 Development)
    ● 5+ secondary GKE clusters
    ● 2 countries (Japan and USA)
    ● 200+ developers
    ● 3k+ pods

  10. 10
    Our microservices multi-tenancy model
    Source: https://speakerdeck.com/mercari/mtc2018-microservices-platform-at-mercari


  11. Issues leading to the architecture redesign


  12. 12
    Issues leading to the architecture redesign
    Cluster-internal cross-microservices communication worked fine, but we had issues
    with outgoing traffic, especially the following:
    ● Traffic destined for internal services in other VPCs
    ● Traffic destined for GCP managed services
    ● Traffic destined for external tenants (third-party, Internet…)
    ● Traffic destined for our on-premises datacenter and AWS

  13. 13
    Traffic destined for internal services in other VPCs
    Our unmanaged network:
    ➔ All communications are public!
    ➔ Public traffic costs more than private traffic: $0.01 per GB vs. free (within the same zone)
    ➔ It is less secure than private traffic
    ➔ The default VPC subnet IP ranges overlap -> we cannot use VPC Peering to privatise traffic
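To make the cost gap concrete, here is a back-of-the-envelope sketch using the $0.01/GB rate quoted above; the monthly traffic volume is a made-up illustrative figure, not Mercari's.

```python
# Cross-VPC traffic over public IPs is billed (about $0.01/GB within the
# same region), while private same-zone traffic is free.
PUBLIC_RATE_PER_GB = 0.01  # USD, the rate quoted on the slide

# Hypothetical east-west volume, for illustration only.
monthly_traffic_gb = 500_000  # 500 TB/month

public_cost = monthly_traffic_gb * PUBLIC_RATE_PER_GB
private_cost = 0.0  # same-zone private traffic is free

print(f"public: ${public_cost:,.0f}/month, private: ${private_cost:,.0f}/month")
# → public: $5,000/month, private: $0/month
```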


  14. 14
    Traffic destined for GCP managed services
    There are 2 kinds of GCP managed services:
    - Services accessed through an API (e.g. `*.googleapis.com`)
    - Services provisioned in either another customer VPC or a GCP-managed VPC (e.g.
    Cloud Memorystore, Private Cloud SQL)
    While there is no issue with the first kind, the second requires consumers to call the service
    with an IP address from the VPC range of their instance.
    GKE pods use a different CIDR than the VPC subnet, so their traffic gets dropped
    when leaving the VPC.
    ➔ Need to make GKE pods use the same IP range as the VPC
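A quick way to see why such traffic is dropped is to check whether a pod address falls inside the VPC's ranges. The ranges below are hypothetical, chosen only to illustrate the two cases.

```python
import ipaddress

# Hypothetical ranges: a VPC subnet, and a routes-based pod CIDR that
# lives outside the VPC address space.
vpc_subnet = ipaddress.ip_network("10.128.0.0/20")
pod_ip = ipaddress.ip_address("172.20.1.5")

# A pod IP outside the VPC range is not routable once traffic leaves
# the VPC, which is the failure mode described above.
print(pod_ip in vpc_subnet)  # → False: dropped outside the VPC

# With Alias IP, pods draw addresses from a secondary range that
# belongs to the VPC subnet, so they stay routable in the VPC.
alias_range = ipaddress.ip_network("10.132.0.0/14")
alias_pod_ip = ipaddress.ip_address("10.132.0.7")
print(alias_pod_ip in alias_range)  # → True: routable
```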


  15. 15
    Traffic destined for external tenants (third-party, Internet…)
    We saw earlier that all GCE instances and GKE nodes have public IP addresses.
    There are several problems with this:
    1. Public IP leaking
    - Instances are exposed to the Internet through their public IP
    - When communicating with external tenants, the public IP is advertised
    2. Lack of security mitigation options
    - GKE uses many ports for `LoadBalancer` and `NodePort` Services
    - This makes it hard to mitigate the security risk with firewall rules
    ➔ Need to stop using public IP addresses for GCE instances

  16. 16
    Traffic destined for on-premises datacenter, AWS
    Microservices are in GCP and the monolith is on-premises as we are still migrating:
    Source: https://speakerdeck.com/mercari/mtc2018-microservices-platform-at-mercari

  17. 17
    Traffic destined for on-premises datacenter, AWS
    Microservices are in GCP and the monolith is on-premises as we are still migrating.
    We now use Google Edge Peering to get a direct BGP route between GCP and our DC, with L7
    proxies to ensure security.
    - It requires using our own public IPv4 address block
    - We cannot advertise private subnets from either location
    ➔ Need a better way to provide private connectivity
    Also, we wish to offer some AWS services to our developers.
    ➔ Need to build a reliable and high-performance link between GCP and AWS

  18. 18
    Issues summary
    ● Cross-VPC security
    ● Cross-VPC traffic cost
    ● Cross-VPC traffic reliability
    ● GCE Instances security
    ● Inability for GKE pods to perform Cross-VPC connectivity
    ● On-premises and multi-cloud connectivity
    ● Lack of network resources management


  19. 19
    Issues summary
    ● Cross-VPC security
    ● Cross-VPC traffic cost
    ● Cross-VPC traffic reliability
    ● GCE Instances security
    ● Inability for GKE pods to perform Cross-VPC connectivity
    ● On-premises and multi-cloud connectivity
    ● Lack of network resources management
    How can we solve these issues?


  20. New architecture goals definition


  21. 21
    New architecture goals definition
    We kept a 1:1 mapping between issues and goals to ensure we solve the right problems:
    ● Harden East-West security between GCP projects
    ● Reduce East-West traffic cost
    ● Make East-West traffic more reliable
    ● Disable GCE instances’ public IPs and enforce internal traffic
    ● Enable Cross-VPC connectivity for GKE pods
    ● Enable production-grade on-premises and multi-cloud connectivity
    ● Define a multi-tenancy network management design

  22. Challenges and solutions


  23. 23
    Challenges and solutions
    During our research, we had many challenges to solve to reach an architecture design
    that fulfils our goals:
    ● Challenge 1: Multi-tenancy design
    ● Challenge 2: Which network ownership model to use to
    enforce IP address management?
    ● Challenge 3: How big do we need to think?
    ● Challenge 4: Private IPv4 addresses exhaustion
    ● Challenge 5: Identifying edge cases
    ● Challenge 6: Managing multiple regions in a Shared VPC
    ● Challenge 7: Making GCE instances private only

  24. Challenge 1: Multi-tenancy design


  25. 25
    Challenge 1: Multi-tenancy design
    Giving flexibility to developers while providing adequate guardrails is a core concept
    of our microservices platform.
    ● We use Terraform and GitHub repositories to manage microservice tenants’ GCP
    and GKE resources with full automation
    ● We manage these resources in a central project and our GKE cluster, and limit the
    operations microservices developer teams need to perform to get bootstrapped
    ➔ With 100+ microservices GCP projects and VPCs, what is the best way to handle this?

  26. 26
    Challenge 1: Multi-tenancy design
    Option 1: VPC Peering - Connect VPCs with a direct internal link
    + Costs nothing (uses routes) and is secure
    + Cross-VPC traffic is treated the same as internal VPC traffic
    - Limited to 25 VPCs per peering group, but we have 100+ microservices…
    - Need to manage the link inside each project on both sides, which is harder to automate
    - Peered VPC networks cannot have overlapping IP ranges -> default VPCs cannot be peered...
    ➔ This is not the solution we’re looking for.

  27. 27
    Challenge 1: Multi-tenancy design
    Option 2: Cloud VPN - Create a VPN tunnel between VPCs
    + Can connect VPCs beyond the VPC Peering limit (25)
    + VPC IP ranges can overlap
    - Need to manage the VPN tunnel for each project
    - Impossible to automate and self-serve
    - Very costly: $0.05 per tunnel per hour, doubled for HA mode
    ➔ This is not the solution we’re looking for.

  28. 28
    Challenge 1: Multi-tenancy design
    Option 3: Shared VPC - Share a central VPC network across GCP projects
    + Simplest management: all firewall rules and projects are centralized
    + Easy to attach GCP projects to the Shared VPC
    + Fine-grained permissions model with subnets per GCP project
    + Automatable
    + Costs nothing, as all projects belong to the same VPC
    + Scales with multi-tenancy
    - Easier to reach VPC limitations, since everything uses one VPC network and one GCP project
    - VPC subnet IP ranges cannot overlap -> requires a good IP Address Management (IPAM) strategy
    ➔ This looks like the best solution for our use case!
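The no-overlap requirement can be met by carving every project's subnet out of one reserved supernet. A minimal IPAM sketch under assumed ranges (the supernet, prefix length, and project names below are hypothetical, not Mercari's):

```python
import ipaddress

# One supernet reserved for the Shared VPC; every service project gets
# the next free /22 block from it.
supernet = ipaddress.ip_network("10.64.0.0/12")
allocator = supernet.subnets(new_prefix=22)  # generator of /22 blocks

assignments = {}
for project in ["service-a", "service-b", "service-c"]:
    assignments[project] = next(allocator)

print(assignments["service-a"])  # → 10.64.0.0/22
print(assignments["service-b"])  # → 10.64.4.0/22

# Because every block comes from the same allocator, overlap is
# impossible by construction, which is what a Shared VPC requires.
```

In practice the same allocation would be expressed in Terraform state rather than an in-memory dict, but the invariant is identical: a single authority hands out disjoint blocks.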


  29. 29
    Challenge 1: Multi-tenancy design
    Solution: Use Shared VPC for our multi-tenancy model
    However, Shared VPC requires that participating GCP projects have no
    IP overlap, leading us to the next challenge.

  30. Challenge 2: Which network ownership model to use
    to enforce IP address management?


  31. 31
    Challenge 2: Which network ownership model to use to enforce IP address management?
    Enterprises usually have a dedicated network team managing network resources,
    relying on standard procedures and manual operations.
    1 development team -> OK
    10 development teams -> OK…
    100 development teams -> In the red… -> BOTTLENECK
    It might even happen before reaching 100 teams!
    ➔ How do we prevent that?

  32. 32
    Challenge 2: Which network ownership model to use to enforce IP address management?
    Solution: Automate network-related processes and operations.
    ● Make microservices teams self-sufficient in handling the network for their
    scope.
    ● They don’t need full network control, only to fulfil their requirements easily.
    ● The network team manages IP addresses centrally and provides generic IP blocks to users on a
    self-service basis.
    ➔ Consequently, the network team needs to provide automated configurations that interface
    with the other automation tools used to provision microservices infrastructure.

  33. 33
    Challenge 2: Which network ownership model to use to enforce IP address management?
    The network is usually one common layer across entities. What happens when
    multiple teams try to manage it separately?
    What about multiple entities or companies in the same group?

  34. 34
    Challenge 2: Which network ownership model to use to enforce IP address management?
    Have you heard about Conway’s law?
    ‘Organizations which design systems … are constrained
    to produce designs which are copies of the
    communication structures of these organizations’

  35. 35
    Challenge 2: Which network ownership model to use to enforce IP address management?
    Symptoms of Conway’s law in network management:
    1. Naming convention drift:
    ○ e.g. “vpc-network-tokyo-01” vs “vpc-tokyo-01” vs “vpc-asia-northeast-01”
    2. Drift in base architecture pattern definitions, e.g. having:
    ○ User-facing applications in the development network
    ○ Overly strict isolation between environments
    ○ Different IP address assignment strategies
    3. Conflicts of ownership, siloing, etc. between teams

  36. 36
    Challenge 2: Which network ownership model to use to enforce IP address management?
    Solution: One central team manages the network for all entities
    ● One way to handle and enforce IP address assignment -> ensures no IP overlap
    ● Still able to collect requirements and use cases from all entities
    ● Ensures the best architecture/solutions for most stakeholders
    ● Can automate the provisioning of almost all network components
    With our automated Shared VPC provisioning, microservices teams get:
    ● An attachment between the Shared VPC Host Project and their GCP project
    ● A subnet to use for GCE, Cloud SQL, Cloud Memorystore, and private Cloud Functions
    ● Secondary IP ranges when using GKE (though it is hard to call them microservices when doing so)

  37. Challenge 3: How big do we need to think?


  38. 38
    Challenge 3: How big do we need to think?
    A drawback of designing such an architecture as a single team is the scope’s size.
    It raises several questions, such as:
    ● How to process all this information?
    ● How to understand what we need?
    ● How to get things done?

  39. 39
    Challenge 3: How big do we need to think?
    Solution 1: Define a “rough” capacity plan
    Capacity planning should be one of the requirements for an architecture design, as the
    design should not prevent scalability.
    To create the plan, we need input from all infrastructure stakeholders.
    Some guidelines for defining a capacity plan:
    ● Understand how much infrastructure is used now, and how much will be in 3 and 5 years
    ● Extrapolate projections from business expectations and the roadmap
    ● Keep some extra margin for unexpected growth/events
    ● Get challenged by many stakeholders to improve the plan’s quality

  40. 40
    Challenge 3: How big do we need to think?
    Solution 2: Keep flexibility in the process
    It is easy to be overly conservative when designing such an architecture and capacity plan, and
    thus never get them done.
    Some advice to keep in mind during the design process:
    1. Identify as many two-way door decisions as possible while keeping the base of your
    architecture a high-quality decision.
    2. No part of the design is ever set in stone, even less in our time.
    3. Determine first which decisions are one-way doors and which are two-way doors, and
    tackle the one-way doors first.

  41. 41
    Challenge 3: How big do we need to think?
    Other good questions to ask:
    ● Does our design enable future technologies such as serverless?
    ● How much capacity do we need for Disaster Recovery?
    ● Is our design future-proof? Can it evolve?
    ● What would be the pain points in managing such a design?

  42. Challenge 4: Private IPv4 addresses exhaustion


  43. 43
    Challenge 4: Private IPv4 addresses exhaustion
    IPv4 is a scarce resource: only ~18M private IPv4 addresses are available.
    Kubernetes loves IP addresses, since it gives each pod a unique one.
    ● No issue when using overlay networks such as Flannel or Calico overlay…
    ● But in GKE, pods are now first-class citizens with Alias IP
    Alias IP gives a VPC subnet IP address to each pod in a GKE cluster. Great for a lot of reasons!
    However, it ends up bad, very bad...

  44. 44
    Challenge 4: Private IPv4 addresses exhaustion
    Breakdown of Kubernetes IP address usage (for a 1000-node GKE cluster with
    default settings):
    ● GKE Nodes CIDR: /22 (1024 IPs)
    ● Pods CIDR: /14 (262144 IPs), with a /24 (256 IPs) portion allocated to each node
    ● Services CIDR: /20 (4096 IPs)
    Total: 267k IP addresses, which is ~1.5% of the total RFC 1918 IPv4 pool!
    When scaling out to 8 clusters, with Disaster Recovery, almost 25% of it is used!
    ➔ Kubernetes is extremely “IPvore” (IP-hungry), so we had to find solutions to make it use
    fewer IP addresses.
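The slide's numbers can be reproduced with Python's `ipaddress` module; the concrete networks below are placeholders chosen only for their prefix lengths.

```python
import ipaddress

# Default ranges for a 1000-node GKE cluster, as listed above.
nodes    = ipaddress.ip_network("10.0.0.0/22").num_addresses   # 1024
pods     = ipaddress.ip_network("10.4.0.0/14").num_addresses   # 262144
services = ipaddress.ip_network("10.8.0.0/20").num_addresses   # 4096
cluster_total = nodes + pods + services

# Total RFC 1918 private space: 10/8 + 172.16/12 + 192.168/16.
rfc1918 = sum(ipaddress.ip_network(n).num_addresses
              for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"))

print(cluster_total, rfc1918)                  # → 267264 17891328
print(round(100 * cluster_total / rfc1918, 1))  # → 1.5 (% of RFC 1918)

# 8 clusters, doubled for Disaster Recovery -> "almost 25%".
print(round(100 * 16 * cluster_total / rfc1918, 1))  # → 23.9
```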


  45. 45
    Challenge 4: Private IPv4 addresses exhaustion
    Solution: Use flexible pod CIDRs for GKE clusters
    A flexible pod CIDR sacrifices pod density to save IP addresses, by limiting the number of
    pods running per node:
    - Default /24 per node -> up to 110 pods per node
    - /25 pod CIDR -> up to 64 pods per node, 128 IPs saved per node
    - /26 pod CIDR -> up to 32 pods per node, 192 IPs saved per node
    Applied to the earlier calculation, using a /26 pod CIDR means:
    - Max cluster IP usage: 267k -> 70k, a 74% decrease
    - Max pod capacity: 110k -> 32k, a 71% decrease
    ➔ Depending on your use case, you may choose different values; we chose /26 since it fit our
    capacity plan.
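The per-node figures above follow a simple rule: GKE allows at most half of a node's pod range as pods, capped at the 110-pod default maximum. A sketch of that arithmetic (the 1000-node total is a rough estimate, ignoring rounding of the pod supernet):

```python
def max_pods_per_node(prefix_len: int) -> int:
    """Half the per-node pod range, capped at GKE's default 110."""
    range_size = 2 ** (32 - prefix_len)
    return min(110, range_size // 2)

for p in (24, 25, 26):
    print(f"/{p}: up to {max_pods_per_node(p)} pods per node")
# → /24: up to 110, /25: up to 64, /26: up to 32

# Rough cluster-wide usage for 1000 nodes with a /26 per node, plus the
# /22 node range and /20 services range from the earlier slide:
ips_for_pods = 1000 * 2 ** (32 - 26)   # 64000
total = ips_for_pods + 1024 + 4096
print(total)  # → 69120, i.e. the "~70k vs 267k" on the slide
```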


  46. Challenge 5: Identifying edge cases


  47. 47
    Challenge 5: Identifying edge cases
    Technical limitations are everywhere, even within cloud providers.
    Edge cases are cases that could invalidate your design, either expectedly or unexpectedly.
    We researched the GCP documentation thoroughly to find possible limitations of the design.
    The main limitations we identified with Shared VPC in GCP (as of August 2019):
    ● Max Shared VPC Service Projects per Host Project: 100
    ● Max number of subnets per project: 275
    ● Max secondary IP ranges per subnet: 30
    ● Max number of VM instances per VPC network: 15000
    ● Max number of firewall rules per project: 500
    ● Max number of Internal Load Balancers per VPC network: 50
    ● Max nodes for GKE when using GCLB Ingress: 1000

  48. 48
    Challenge 5: Identifying edge cases
    Solution: Research limitations extensively, but with moderation
    1. Understand the limitations that would apply to the architecture design, e.g.:
    ● 15k VMs max = at most 15 GKE clusters with 1000 nodes
    ● 275 subnets max = at most 275 microservices (requiring GCE use)
    2. Match these against the capacity plan to ensure they align.
    We accepted these limitations, reasoning that:
    ● They would be lifted in the future, with as few redesigns as possible
    ● We might not reach this scale (though obviously we want to!)
    ● We made many two-way door decisions, so it is a calculated risk
    ➔ The important takeaway here is the ability to find the consensus between edge cases, your
    capacity plan and your risk assessment.

  49. Challenge 6: Shared VPC multi-region design


  50. 50
    Challenge 6: Shared VPC multi-region design
    Shared VPC is designed for multi-region, but there are several ways to use it that way.
    Taking our challenges into account, we defined 4 options for the Shared VPC multi-region design:
    ● Option 1: 1 Global Shared VPC Host Project, 1 Shared VPC network per region connected with VPC
    peering
    ● Option 2: 1 Global Shared VPC Host Project, 1 Global Shared VPC network
    ● Option 3: 1 Shared VPC Host Project per region with VPC peering
    ● Option 4: 1 Shared VPC Host Project per region without VPC peering

  51. 51
    Challenge 6: Shared VPC multi-region design
    Option 1: 1 Global Shared VPC Host Project, 1
    Shared VPC network per region peered with
    VPC peering
    Option 2: 1 Global Shared VPC Host Project, 1
    Global Shared VPC network


  52. 52
    Challenge 6: Shared VPC multi-region design
    Option 3: 1 Shared VPC Host Project per
    region with VPC peering
    Option 4: 1 Shared VPC Host Project per
    region without VPC peering


  53. 53
    Challenge 6: Shared VPC multi-region design
    After weighing each option’s pros and cons, we chose Option 2 for the following
    reasons:
    ● It has the simplest management, with a centralized Shared VPC Host Project for the entire
    group
    ● It is the easiest way to implement the infrastructure logic in GitHub and Terraform
    ● Interconnection between regions is straightforward and leverages the GCP Global VPC Network
    ● It fulfils the architecture goals and our guesses in Solution 5

  54. Challenge 7: Making GCE instances private only


  55. 55
    Challenge 7: Making GCE instances private only
    The only way to make instances private is to not give them a public IP address.
    However, when GCE instances only have private IP addresses, they have no outbound Internet
    connectivity. To enable it, we need NAT (Network Address Translation).
    ➔ In a Shared VPC architecture, how can we provide scalable NAT across multiple regions?

  56. 56
    Challenge 7: Making GCE instances private only
    Solution: Use Cloud NAT in each region
    Cloud NAT is a scalable, regional NAT service for outbound traffic from GCE instances:
    - It is embedded into the SDN (Software-Defined Network) and decoupled from standard traffic
    - It uses one public IP to serve up to 64k TCP and 64k UDP ports
    - It integrates with a VPC, so all Shared VPC projects can use a central Cloud NAT
    - It is useful for getting static IP addresses for third parties using IP whitelists
    - The default setting is 64 ports per GCE instance
    For GKE nodes, 64 ports may be a bit low given how many pods they host.
    ➔ Need to fine-tune the number of NAT IPs and the number of ports allocated per VM to find a
    good balance for GKE nodes.
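The tuning trade-off can be sketched with simple arithmetic, assuming roughly 64k usable source ports per NAT IP per protocol (the "64k" figure quoted above; the exact 64512 constant and the fleet sizes below are assumptions for illustration):

```python
PORTS_PER_NAT_IP = 64_512  # assumed: 65536 minus the low 1024 reserved ports

def vms_per_nat_ip(min_ports_per_vm: int) -> int:
    """How many VMs one NAT IP can serve at a given port allocation."""
    return PORTS_PER_NAT_IP // min_ports_per_vm

print(vms_per_nat_ip(64))    # → 1008 VMs per IP at the 64-port default
print(vms_per_nat_ip(1024))  # → 63 VMs per IP with GKE-node tuning

def nat_ips_needed(num_vms: int, min_ports_per_vm: int) -> int:
    """NAT IPs required for a fleet (ceiling division)."""
    return -(-num_vms * min_ports_per_vm // PORTS_PER_NAT_IP)

print(nat_ips_needed(1000, 1024))  # → 16 IPs for 1000 nodes at 1024 ports
```

Raising the per-VM port allocation multiplies the NAT IPs needed, which is exactly the balance the slide says must be tuned for GKE nodes.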


  57. Final design


  58. 58
    Final network architecture design


  59. Wrap-up


  60. 60
    Wrap-up
    ● Lots of issues with the default network settings -> we had to redesign the network
    ● Solving these issues became the new architecture goals
    ● Design research involves many unexpected challenges
    ● The network ownership model must align with the network architecture
    ● Be strategic when designing IP assignment for Shared VPC and GKE
    ● Identifying edge cases is crucial for evaluating the architecture design
    ● Multi-region + Shared VPC is not straightforward
    ● Ensure NAT capacity when making instances private

  61. Thank you for coming!
    (We’re hiring!!!)
