KloudNFV: Declarative and Hierarchical Software-Defined Networking Platform using Kubernetes Extension

LINE Developers

June 16, 2023

Transcript

  1. TL;DR (claims)

    (1) A declarative and hierarchical SDN control plane (C-plane) shortens the “VNF life-cycle” to “hours”.
    (2) A “short VNF life-cycle” lowers “engineering cost” in the commercial world.
    (3) By respecting the K8s design principles, we can get (1) without a big cost.
  2. Networking in Production

    [Diagram labels: Virtual Private Cloud, Service A, Service B, net-a1, net-a2, net-b1, Computing Service, Server, VM, ASBR, DCI-BB, internet, VPN, Collaborator Data Center, Other Region]
    Networking requirements:
    • Isolation & routing
    • Functions (NAT, ACL, mirror, S2S-VPN, etc.)
    Operation & software requirements:
    • Reliability & scalability
    • Many mid software upgrades
    • Fundamental internal system upgrades
    • Efficiency of development
    Related work shown on the diagram: VFP, Orion, Zeta (NSDI ’17, ’18, ’22); DSR (NSDI ’21); Orion (NSDI ’21); ONIX (OSDI ’10).
    Other labels: SDN DB, Controller, Data Model, “Un-revealed”, “This Research”.
  3. Many Mid Software Upgrades in Commercial Cases

    • Example 1: flow metering with an ACL action
    • Example 2: dedicated/shared cluster option (on-demand isolation)
    [Diagram labels: user VM, App1, App2, dplane]
  4. Data-Plane Overview

    • How the networking requirements are met:
      ◦ Isolation -> SRv6 L3VPN using a custom Neutron plugin
      ◦ Routing -> VM-based Router-VM (a normal VM from the OpenStack viewpoint)
      ◦ Functions -> Linux networking features (tc, netfilter, eBPF, vti, netns, FRR, Libreswan, etc.)
    • Each Router-VM sits in a single failure domain, so the control plane creates the Router-VMs in different failure domains.
  5. Summary: VM Based vRouter Cluster

    Endpoint1 is damaged by a failure-domain outage. Availability -> 66%
  6. Summary: VM Based vRouter Cluster

    Endpoint1 is updated to “service-out”. Availability -> 100%
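The two availability figures above can be reproduced with a small calculation. This is only a sketch under the assumption that traffic is spread evenly over every endpoint still marked in-service; the `healthy` and `in_service` fields and the `availability` helper are illustrative names, not part of KloudNFV.

```python
from fractions import Fraction


def availability(endpoints):
    """Fraction of in-service endpoints that are healthy.

    Assumption: traffic is balanced evenly across in-service endpoints,
    so a damaged endpoint that is still in service drops its share.
    """
    in_service = [e for e in endpoints if e["in_service"]]
    if not in_service:
        return Fraction(0)
    healthy = sum(1 for e in in_service if e["healthy"])
    return Fraction(healthy, len(in_service))


cluster = [
    {"name": "Endpoint1", "healthy": False, "in_service": True},
    {"name": "Endpoint2", "healthy": True, "in_service": True},
    {"name": "Endpoint3", "healthy": True, "in_service": True},
]

# Endpoint1 is damaged but still receives traffic: 2 of 3 endpoints work.
before = availability(cluster)

# Marking Endpoint1 "service-out" removes it from the traffic path.
cluster[0]["in_service"] = False
after = availability(cluster)
```

Here `before` is 2/3 (the 66% on the slide) and `after` is 1 (100%).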
  7. Summary: VM Based vRouter Cluster

    Endpoint1 is updated to “service-out”. Availability -> 100%
    Next step: how to construct and manage a vRouter cluster as a managed service.
  8. Control-Plane Overview: “KloudNFV”

    • Generic control-plane platform for private cloud networking at LINE
    • Design concept
      ◦ All the APIs are represented as K8s CRDs (CRUD only)
      ◦ All the controllers are implemented as plain K8s custom controllers
  9. Resources and Controllers

    • Gateway: HA-aware cluster of endpoints
    • Endpoint: network function in a single failure domain
    • NfvMachine: modular virtual-router base abstraction for NFV purposes
  10. No need to care about the upper side for NfvMachine

    Many upper-side kinds exist. The Endpoint controller becomes “dummy yaml translator logic”, which eliminates engineering work.
  11. No need to care about the upper side for NfvMachine

    Many upper-side kinds exist. The Endpoint controller becomes “dummy yaml translator logic”, which eliminates engineering work.
    Claim (3), extended: by respecting the K8s design principles, we can get (1) without a big cost, by inserting the “deployment of nfv-stack” into the SDN.
  12. Gateway -> Endpoint -> NfvMachine

    Watch:

        kind: RoutingGateway
        metadata:
          name: gw1
        spec:
          networks:
          - ext-network1
          - pri-network1
          endpoints:
          - ep1
          - ep2
          - ep3
          server:
            flavor: 3vCPU_4gbRAM
            image: vRouter

    Create:

        kind: RoutingEndpoint
        metadata:
          name: gw1-ep1
        spec:
          networks:
          - ext-network1
          - pri-network1
          server:
            flavor: 3vCPU_4gbRAM
            image: vRouter

    The RoutingEndpoints gw1-ep2 and gw1-ep3 are identical except for metadata.name.
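The Gateway-to-Endpoint step shown above can be sketched as a pure function over manifests. This is an illustration only, assuming the child name is `<gateway name>-<endpoint name>` (as the `gw1-ep1` names on the slide suggest) and that `networks` and `server` are copied unchanged; `expand_gateway` is a hypothetical helper, not the actual KloudNFV controller code.

```python
import copy


def expand_gateway(gateway):
    """Return one RoutingEndpoint manifest per entry in spec.endpoints."""
    gw_name = gateway["metadata"]["name"]
    spec = gateway["spec"]
    return [
        {
            "kind": "RoutingEndpoint",
            # Assumed naming scheme: "<gateway>-<endpoint>", e.g. "gw1-ep1".
            "metadata": {"name": f"{gw_name}-{ep}"},
            "spec": {
                "networks": list(spec["networks"]),
                "server": copy.deepcopy(spec["server"]),
            },
        }
        for ep in spec["endpoints"]
    ]


gateway = {
    "kind": "RoutingGateway",
    "metadata": {"name": "gw1"},
    "spec": {
        "networks": ["ext-network1", "pri-network1"],
        "endpoints": ["ep1", "ep2", "ep3"],
        "server": {"flavor": "3vCPU_4gbRAM", "image": "vRouter"},
    },
}

endpoints = expand_gateway(gateway)
for ep in endpoints:
    print(ep["metadata"]["name"], ep["spec"]["networks"])
```

Because the function has no side effects, a reconcile loop could call it on every watch event and compare the result with the existing child resources.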
  13. Gateway -> Endpoint -> NfvMachine

    RoutingGateway manifests -> RoutingEndpoint manifests (Watch) -> NfvMachine manifests (Create)

    Watch:

        kind: RoutingEndpoint
        metadata:
          name: gw1-ep1
        spec:
          networks:
          - ext-network1
          - pri-network1
          server:
            flavor: 3vCPU_4gbRAM
            image: vRouter

    Create:

        kind: NfvMachine
        metadata:
          name: gw1-ep1
        spec:
          networks:
          - ext-network1
          - pri-network1
          server: {...}
          containers: {...}

    The RoutingEndpoints and NfvMachines for gw1-ep2 and gw1-ep3 are identical except for metadata.name.
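The Endpoint-to-NfvMachine step is what the deck calls “dummy yaml translator logic”. A minimal sketch of such a translator follows, assuming the name, `networks`, and `server` are carried over as-is. The slide elides the `containers` content, so the value passed in below is a made-up placeholder, and `endpoint_to_nfvmachine` is a hypothetical name.

```python
import copy


def endpoint_to_nfvmachine(endpoint, containers):
    """Translate a RoutingEndpoint manifest into an NfvMachine manifest.

    `containers` is supplied by the caller because the source does not
    show what the RoutingEndpoint controller puts there.
    """
    return {
        "kind": "NfvMachine",
        "metadata": {"name": endpoint["metadata"]["name"]},
        "spec": {
            "networks": list(endpoint["spec"]["networks"]),
            "server": copy.deepcopy(endpoint["spec"]["server"]),
            "containers": copy.deepcopy(containers),
        },
    }


endpoint = {
    "kind": "RoutingEndpoint",
    "metadata": {"name": "gw1-ep1"},
    "spec": {
        "networks": ["ext-network1", "pri-network1"],
        "server": {"flavor": "3vCPU_4gbRAM", "image": "vRouter"},
    },
}

# Placeholder container set, purely for illustration.
example_containers = {"routing": {"image": "example/routing:latest"}}

machine = endpoint_to_nfvmachine(endpoint, example_containers)
print(machine["kind"], machine["metadata"]["name"])
```

The translator knows nothing about gateways or rolling updates, which matches the point that NfvMachine does not need to care about the upper-side kinds.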
  14. Gateway -> Endpoint -> NfvMachine

    RoutingGateway manifests -> three RoutingEndpoint manifests -> (Create) three NfvMachine manifests, which are watched in turn (Watch)

        kind: NfvMachine
        metadata:
          name: gw1-ep1
        spec:
          networks:
          - ext-network1
          - pri-network1
          ..(snip)..

    The NfvMachine manifests for gw1-ep2 and gw1-ep3 have the same shape.
  16. RollingUpdate

    • RollingUpdate controller capability
      ◦ policy: which action to take on each “Endpoint”
    • Endpoint capability
      ◦ maintenance mode: stop routing advertisement, etc.
  17. RollingUpdate

    Steps: (1) create, (2) watch

        kind: RollingUpdate
        metadata:
          name: update-20221015
        spec:
          policy: machine-recreate
          preAction: maint-mode
          target:
            kind: RoutingGateway
            name: Gateway
          params: {...}
  18. RollingUpdate

    Step: (3) status set

        kind: RollingUpdate
        metadata:
          name: update-20221015
        spec:
          policy: machine-recreate
          preAction: maint-mode
          target:
            kind: RoutingGateway
            name: Gateway
          params: {...}
        status:
          childTargets:
          - kind: RoutingEndpoint
            name: Endpoint1
            state: NOT_STARTED
          - kind: RoutingEndpoint
            name: Endpoint2
            state: NOT_STARTED
          - kind: RoutingEndpoint
            name: Endpoint3
            state: NOT_STARTED
  19. RollingUpdate

    Step: (4) update Endpoint1 to maint-mode (Maintenance Mode)
    status.childTargets: Endpoint1 NOT_STARTED, Endpoint2 NOT_STARTED, Endpoint3 NOT_STARTED
  20. RollingUpdate

    Step: (5) delete the child NfvMachine (Maintenance Mode, Delete)
    status.childTargets: Endpoint1 NOT_STARTED, Endpoint2 NOT_STARTED, Endpoint3 NOT_STARTED
  21. RollingUpdate

    Steps: (6) wait for boot-up, (7) the RoutingEndpoint controller (RE Ctrlr) reconciles to create the NfvMachine for the Endpoint (Maintenance Mode, Creating)
    status.childTargets: Endpoint1 WAIT, Endpoint2 NOT_STARTED, Endpoint3 NOT_STARTED
  22. RollingUpdate

    The NfvMachine has been created by the RE Ctrlr (Maintenance Mode, Created)
    status.childTargets: Endpoint1 WAIT, Endpoint2 NOT_STARTED, Endpoint3 NOT_STARTED
  23. RollingUpdate

    Step: (8) disable maintenance mode
    status.childTargets: Endpoint1 FINISHED, Endpoint2 NOT_STARTED, Endpoint3 NOT_STARTED
  24. RollingUpdate

    The same procedure is applied to the remaining endpoints. Done.
    status.childTargets: Endpoint1 FINISHED, Endpoint2 NOT_STARTED, Endpoint3 NOT_STARTED
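The walkthrough in steps (3) to (8) amounts to a per-endpoint state machine that handles one endpoint at a time. The following simulation is a rough sketch of that flow for the `machine-recreate` policy with the `maint-mode` preAction. The event strings, the `rolling_update` function, and the `recreate_machine` callback (standing in for the RoutingEndpoint controller's reconcile and the VM boot-up) are assumptions made for illustration.

```python
def rolling_update(endpoint_names, recreate_machine):
    """Simulate machine-recreate with maint-mode, one endpoint at a time."""
    # (3) status set: every child target starts as NOT_STARTED.
    status = {name: "NOT_STARTED" for name in endpoint_names}
    events = []
    for name in endpoint_names:
        events.append((name, "maint-mode on"))        # (4)
        events.append((name, "NfvMachine deleted"))   # (5)
        status[name] = "WAIT"                         # (6) wait for boot-up
        recreate_machine(name)                        # (7) reconcile recreates it
        events.append((name, "NfvMachine created"))
        status[name] = "FINISHED"
        events.append((name, "maint-mode off"))       # (8)
    return status, events


recreated = []
status, events = rolling_update(
    ["Endpoint1", "Endpoint2", "Endpoint3"], recreated.append
)
for name, event in events:
    print(f"{name}: {event}")
```

Processing endpoints strictly in sequence means at most one endpoint is in maintenance mode at any time, so the other endpoints keep serving traffic during the update.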
  25. Growing Journey of RollingUpdate

    • 2021.09 initial release
      ◦ machine-recreate
    • 2022.05 policy abstraction
      ◦ container-refresh
      ◦ loader-container-refresh
    • 2022.08 no-maint-mode
  27. Day1 / Service Development / Day2: the Previous Project (2020.01~)

    [Timeline, Jan to Jun: Project Start, System Design, System Implement, Test-Env Release, Real-Env Release, Operation Design]
    • Why / What / How
      ◦ Base network technology verification
      ◦ Base distributed-system technology verification
    • Operation kit (Ansible playbooks), operation manual, service level objective
    • System development
      ◦ Base components (apiserver, information-transfer, database manipulator)
      ◦ SDN algorithm
    • Daily/weekly tasks, customer support, “encourage mechanism to another member”, additional features and improvements
  28. Day1 / Service Development / Day2: the Previous Project (2020.01~)

    [Timeline, Jan to Jun: Project Start, System Design, System Implement, Test-Env Release, Real-Env Release, Operation Design]
    • Daily/weekly tasks, customer support, “encourage mechanism to another member”, additional features and improvements
    • Day1-Cost: development, operation
    • Day2-Cost: development, operation
  29. Day1 / Service Development / Day2: the Previous Project (2020.01~) and KloudNFV (2020.09~)

    [Two timelines with Day1 and Day2 marked, each listing Project Start, System Design, System Implement, Test-Env Release, Real-Env Release, Operation Design: one for the previous project (Jan to Jun) and one for KloudNFV]
  30. Production Experience (including upcoming issues)

    • Service development lead time -> ½
    • K8s storage limitation
      ◦ Can not only VPC resources but also LB, DNS, and other resources be stored in a single K8s cluster?
      ◦ etcd has an 8 GB storage limit, and a large number of resources makes the controllers slower
    • NfvMachine VMs are affected by noisy neighbors
  31. Conclusion

    (1) A declarative and hierarchical SDN control plane (C-plane) shortens the “VNF life-cycle” to “hours”.
    (2) A “short VNF life-cycle” lowers “engineering cost” in the commercial world.
    (3) By respecting the K8s design principles, we can get (1) without a big cost.
    [Diagram labels: Gateway, A-Endpoint, B-Endpoint, C-Endpoint, NfvMachine, RollingUpd]