
New Network Provisioning System Leveraging Kubernetes and Cloud Native Open Source

Hiroki Okui
December 05, 2022



Transcript

  1. New Network Provisioning System Leveraging Kubernetes and Cloud Native Open Source

    NTT Communications • Hiroki Okui • #ossummit
  2. Self Introduction

    • Hiroki Okui (@HirokiOkui)
    • Software Engineer at NTT Communications
    • Transport Network
    • Network Provisioning System
    • CI/CD DevOps Platform, etc.
  3. Scope of this presentation

    • Within scope
      ◦ Config Generation from High-level Intent (API/CLI Request)
      ◦ Declarative Configuration and Programming
      ◦ GitOps
    • Out of scope
      ◦ SDN Control-plane
      ◦ Workflow engine / Orchestration
      ◦ Monitoring
      ◦ ZTP
    [Diagram labels: Orchestrator, NorthBound API, NE Provisioner, GitOps Controller, SouthBound Driver, Scope]
  4. Today’s session

    • Problems of the legacy approaches
    • Cloud Native Technologies helpful for Network Provisioning
    • Design of New Network Provisioning System
    • Demo
  5. Case1: API Automation

    • API Automation
      ◦ Using REST API provided by devices or their EMS
      ◦ Using Ansible and Python libraries
    [Diagram labels: User, Ops, 3rd-Party, API, Orchestrator, NE Provisioner; annotations: Service Intent, Cache of actual device config, etc…]
  6-9. Case1: Problems

    • No Ansible modules for less common devices (e.g. carrier-grade transport devices) -> scripting from scratch
    • Simple scripting with text templating such as Jinja is fragile and easily broken
    • Need to invoke a get command to see what is actually deployed
    • Configuration drift caused by direct manual operation or software version upgrades
    [Diagram labels: User, Ops, 3rd-Party, API, Orchestrator, NE Provisioner]
  10. Case2: Git-based Continuous Delivery

    • Declarative config is stored in a Git repository and delivered when a PR is merged
      ◦ By Ansible or simple scripts
      ◦ According to the IPAM/DCIM (e.g. NetBox)
    [Diagram labels: Ops, Reviewer, PR, Approve, PR merge, GitHub, CI, NE Provisioner]
  11-14. Case2: Problems

    • No models and schema, just text templating. No capabilities for static analysis, except for golden file testing
    • Depends on an external IPAM/DCIM. Needs an extra operation to perform rollback in addition to git checkout
    • The config actually mapped may drift from the CI templating test results, caused by
      - State change of NetBox
      - Version difference between the CI env and prod
    • Push-based (CIOps, not GitOps). All device secrets are stored here, which increases security risk
    [Diagram labels: Ops, Reviewer, PR, Approve, PR merge, GitHub, CI, NE Provisioner]
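The "no models and schema, just text templating" problem can be illustrated with a small sketch. It uses Python's standard `string.Template` as a stand-in for Jinja, and the `Interface` model with its MTU range of 64 to 9216 is a hypothetical example, not taken from the talk.

```python
from dataclasses import dataclass
from string import Template

# Stand-in for a Jinja template; the config syntax is illustrative only.
TEMPLATE = Template("interface $name\n mtu $mtu\n")

# Text templating accepts any string, so this typo is only caught on the device.
rendered = TEMPLATE.substitute(name="eth1", mtu="90o0")


@dataclass(frozen=True)
class Interface:
    """Hypothetical typed model; the MTU bounds are assumed for the example."""
    name: str
    mtu: int

    def __post_init__(self) -> None:
        # Reject bad input before anything is rendered or delivered.
        if not isinstance(self.mtu, int) or not 64 <= self.mtu <= 9216:
            raise ValueError(f"invalid mtu: {self.mtu!r}")

    def render(self) -> str:
        return TEMPLATE.substitute(name=self.name, mtu=self.mtu)


valid_config = Interface("eth1", 9000).render()
```

With the typed model, `Interface("eth1", "90o0")` raises `ValueError` in CI, whereas the plain template renders the broken value without complaint.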
  15. Cloud Native Practices (helpful for Network Provisioning)

    • Kubernetes Custom Operator
    • GitOps
    • Secret Management
    • Data Configuration Language (CUE)
  16. Manage system declaratively using Kubernetes

    • Kubernetes Reconciliation loop
      ◦ Converges the system state to the described state by running the delivery procedure repeatedly
      ◦ All Kubernetes resources are managed by this approach
    • Kubernetes Custom Operator
      ◦ Extension that configures the user’s own resources external to Kubernetes
      ◦ Well-developed ecosystem for writing your own Custom Operator
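The reconciliation loop can be sketched in a few lines. This is a minimal in-memory illustration, not how Kubernetes controllers are implemented: `desired`, `device`, and the flat key/value config shape are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Config = Dict[str, str]


def diff(desired: Config, actual: Config) -> Tuple[Config, List[str]]:
    """Return (keys to set, keys to delete) needed to move `actual` to `desired`."""
    to_set = {k: v for k, v in desired.items() if actual.get(k) != v}
    to_delete = [k for k in actual if k not in desired]
    return to_set, to_delete


def reconcile_once(desired: Config, actual: Config) -> bool:
    """Apply one reconciliation pass in place; return True if anything changed."""
    to_set, to_delete = diff(desired, actual)
    actual.update(to_set)
    for key in to_delete:
        del actual[key]
    return bool(to_set or to_delete)


def reconcile(desired: Config, actual: Config, max_passes: int = 5) -> int:
    """Repeat passes until converged; return how many passes changed something."""
    passes = 0
    while passes < max_passes and reconcile_once(desired, actual):
        passes += 1
    return passes


desired = {"mtu": "9000", "description": "uplink"}
device = {"mtu": "1500", "legacy-flag": "on"}  # drifted, e.g. by manual operation
reconcile(desired, device)
```

Because every pass recomputes the difference from the described state, running the loop again after convergence changes nothing, which is what makes repeated delivery safe.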
  17. GitOps

    • GitOps key principles
      ◦ Entire system config is described declaratively
      ◦ System config is deployed automatically when a Git PR is merged
      ◦ Single Source of Truth & Pull-based
    • Advantages of GitOps
      ◦ The canonical desired config is versioned in Git as the SSoT, so the operator can easily roll back the entire system by git revert
      ◦ Pull-based: secrets are installed near the target system, which improves security
    • We can also reduce security risk by adopting the SecretManager of a public cloud with External Secrets Operator or Secrets Store CSI Driver
    [Diagram labels: CIOps (push, hook, deploy); GitOps (push, pull, deploy)]
  18. CUE

    • A powerful data configuration language with a new programming model
      ◦ Authored by Marcel van Lohuizen, who made GCL *1
    • Specialized in data unification
      ◦ Unifies multiple data in arbitrary layers
      ◦ Gets the same result regardless of the order of evaluation (commutative and associative)
    • Types are Values
      ◦ Doesn’t distinguish values and types
      ◦ Simply declares constraints and schema
    • Programmable
      ◦ Supports coding practices like templating and modules
      ◦ Type generation from Go API, OpenAPI, Protobuf
    *1: Configuration Language used in Google/Borg

    Types and Values:

        // Value
        Alice: age: 20
        // Type
        People: age: int
        // Constraint
        Member: age: > 18
        // Validate
        Alice & People & Member
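To make "types are values" and order-independent unification concrete, here is a toy model in Python. It is not CUE and does not reproduce CUE's evaluator: `Check`, `Value`, `unify_value`, and `unify` are hypothetical names, and the model only handles flat structs. It mirrors the Alice / People / Member example from the slide.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet

_UNSET = object()  # marks "no concrete value yet"


@dataclass(frozen=True)
class Check:
    """A named predicate, standing in for a type or a constraint."""
    name: str
    test: Callable[[Any], bool]


@dataclass(frozen=True)
class Value:
    """A concrete value, a set of checks, or both."""
    concrete: Any = _UNSET
    checks: FrozenSet[Check] = frozenset()


def unify_value(a: Value, b: Value) -> Value:
    """Combine two values; fail on conflicting data or a violated check."""
    if _UNSET not in (a.concrete, b.concrete) and a.concrete != b.concrete:
        raise ValueError(f"conflicting values: {a.concrete!r} != {b.concrete!r}")
    concrete = a.concrete if a.concrete is not _UNSET else b.concrete
    checks = a.checks | b.checks  # set union is commutative and associative
    if concrete is not _UNSET:
        for check in checks:
            if not check.test(concrete):
                raise ValueError(f"{concrete!r} violates {check.name}")
    return Value(concrete, checks)


Struct = Dict[str, Value]


def unify(a: Struct, b: Struct) -> Struct:
    """Unify two flat structs field by field."""
    out = dict(a)
    for key, value in b.items():
        out[key] = unify_value(out[key], value) if key in out else value
    return out


IS_INT = Check("int", lambda v: isinstance(v, int))
OVER_18 = Check(">18", lambda v: v > 18)

alice = {"age": Value(concrete=20)}
people = {"age": Value(checks=frozenset({IS_INT}))}
member = {"age": Value(checks=frozenset({OVER_18}))}

left = unify(unify(alice, people), member)
right = unify(member, unify(people, alice))
```

In this toy model `left == right` holds because the result is built from a set union plus an equality test on the concrete value, neither of which depends on the order of operands.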
  19. Modern CI/CD Pipeline with CUE

    • Declarative system config is written in CUE
    • Use CUE for type validation and policy tests in CI
    • Compile CUE and deliver the generated YAML
    [Diagram labels: Ops, Reviewer, PR, Approve, PR merge, Test, Source Repo, Manifest Repo, Pull, k8s Reconciliation Loop, Admission Webhook]
  20. Comparison to Network Provisioning System

    [Diagram, modern CI/CD pipeline: Ops, Reviewer, PR, Approve, PR merge, Test, Source Repo, Manifest Repo, Pull, k8s Reconciliation Loop, Admission Webhook]
    [Diagram, Git-based network provisioning: Ops, Reviewer, PR, Approve, PR merge, GitHub, CI, Push, NE Provisioner]
  21. Issues to be addressed

    • No models and schema. No capabilities for static analysis, except for golden file testing
    • Not SSoT. Needs an extra operation to perform rollback in addition to git checkout
    • Procedural scripts and text templating are included in the delivery flow, leading to bugs and configuration drift
    • Push-based (CIOps, not GitOps). All device secrets are stored here, which increases security risk
    [Diagram labels as on slide 20: modern CI/CD pipeline (Pull) compared with Git-based network provisioning (Push)]
  22-25. Requirements

    • Typed programming of network configuration, not simple text templating
    • Abstraction to an intent-based high-level model/interface with CRUD capability
      ◦ For domain-driven development of the north-bound application
      ◦ Must be able to composite multiple typed document trees
    • GitOps
      ◦ SSoT, Pull-based
      ◦ SecretManager integration and security hardening
    • Basic requirements as a network provisioning system
      ◦ Transactions across distributed network devices
      ◦ Support for multi-vendor / multi-version devices
    [Annotation on slides 23-25: commutative and associative]
  26. Design

    [Architecture diagram labels: Ops, Admin, Approve, PR merge, Test, Pull, API Server, Northbound Model, Southbound Model, Map, Reduce, Eval, Composite, SSoT, Source Controller, k8s Reconciliation Loop, Device Rollout CR, DeviceA CR, DeviceB CR, DeviceA Operator, DeviceA Subscriber, DeviceB Subscriber, gNMI, Provision, Change Notification, Multi-vendor / Multi-version Devices]
  27. Design

    [Same architecture diagram as slide 26, annotated:]
    - Easily write the data mapper from the high-level model to the device config model in CUE
    - Type validation and policy enforcement by CUE
    - Provides a CRUD API so that north-bound systems can do domain-driven development
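The Map / Reduce / Composite labels in the diagram suggest a flow along the following lines. This is a hedged sketch in Python rather than CUE; the intent shape (`name`, `endpoints`), the per-device tree layout, and the function names `map_intent`, `composite`, and `build_device_configs` are all assumptions made for illustration.

```python
from typing import Any, Dict, List

Tree = Dict[str, Any]


def composite(a: Tree, b: Tree) -> Tree:
    """Deep-merge two config trees; a conflicting leaf is an error, not an override."""
    out = dict(a)
    for key, value in b.items():
        if key not in out:
            out[key] = value
        elif isinstance(out[key], dict) and isinstance(value, dict):
            out[key] = composite(out[key], value)
        elif out[key] != value:
            raise ValueError(f"conflict at {key!r}: {out[key]!r} != {value!r}")
    return out


def map_intent(intent: Tree) -> Dict[str, Tree]:
    """Map one high-level intent to per-device config fragments."""
    fragments: Dict[str, Tree] = {}
    for end in intent["endpoints"]:
        fragment = {
            "interfaces": {
                end["port"]: {"description": intent["name"], "enabled": True}
            }
        }
        fragments[end["device"]] = composite(fragments.get(end["device"], {}), fragment)
    return fragments


def build_device_configs(intents: List[Tree]) -> Dict[str, Tree]:
    """Map every intent, then reduce the fragments into one tree per device."""
    configs: Dict[str, Tree] = {}
    for intent in intents:
        for device, fragment in map_intent(intent).items():
            configs[device] = composite(configs.get(device, {}), fragment)
    return configs


intents = [
    {"name": "svc-a", "endpoints": [{"device": "dev1", "port": "eth1"},
                                     {"device": "dev2", "port": "eth1"}]},
    {"name": "svc-b", "endpoints": [{"device": "dev1", "port": "eth2"}]},
]
configs = build_device_configs(intents)
```

Treating conflicts as errors instead of "last writer wins" is what keeps the merged result independent of the order in which intents are processed.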
  28. Design

    [Same architecture diagram as slide 26, annotated:]
    - SSoT
    - Perform rollback to any revision by git checkout
    - You can get the entire device config from the Git repository
    - CI test using actual device config
    - Static analysis using the model schema
  29. Design

    [Same architecture diagram as slide 26, annotated:]
    - Pull-based GitOps leveraging the FluxCD Custom Operator
    - DeviceRollout Operator performs transactions across distributed network devices
    - When provisioning of any device fails, all devices roll back to the previous state
    - Extensible to multi-vendor / multi-version devices by implementing a k8s Custom Operator as the device driver
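The all-or-nothing behaviour described for the DeviceRollout Operator can be sketched as follows. This is an in-memory illustration only: the `Device` class, its `apply` method, and `rollout` are hypothetical and say nothing about how the actual operator or its custom resources are implemented.

```python
from typing import Dict, List

Config = Dict[str, str]


class Device:
    """In-memory stand-in for a device driver (hypothetical interface)."""

    def __init__(self, name: str, config: Config, fail: bool = False) -> None:
        self.name = name
        self.config = dict(config)
        self.fail = fail  # simulate a device that rejects new config

    def apply(self, config: Config) -> None:
        if self.fail:
            raise RuntimeError(f"{self.name}: provisioning failed")
        self.config = dict(config)


def rollout(devices: List[Device], desired: Dict[str, Config]) -> bool:
    """Apply `desired` to every device, or restore all of them on the first failure."""
    previous = {d.name: dict(d.config) for d in devices}
    applied: List[Device] = []
    try:
        for device in devices:
            device.apply(desired[device.name])
            applied.append(device)
    except RuntimeError:
        # Undo in reverse order so devices return to their previous state.
        for device in reversed(applied):
            device.apply(previous[device.name])
        return False
    return True


dev_a = Device("a", {"mtu": "1500"})
dev_b = Device("b", {"mtu": "1500"}, fail=True)
target = {"a": {"mtu": "9000"}, "b": {"mtu": "9000"}}
ok = rollout([dev_a, dev_b], target)
```

Here `dev_b` rejects the change, so `rollout` returns `False` and `dev_a` is restored to its earlier config. A real system would also have to handle a failure during the rollback itself, which this sketch ignores.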
  30. Design

    [Same architecture diagram as slide 26, annotated:]
    - Device secrets can be managed as k8s Secrets
    - Easy to integrate with the SecretManager of a public cloud to improve security, using External Secrets Operator or Secrets Store CSI Driver
  31. Design

    [Same architecture diagram as slide 26, without annotations]
  32. How to generate CUE types and develop a driver?

    [Architecture diagram for OpenConfig devices. Labels: Ops, Admin, Approve, PR merge, Test, Pull, API Server, Northbound Model, Southbound Model, Map, Reduce, Eval, Composite, SSoT, Source Controller, k8s Reconciliation Loop, Device Rollout CR, DeviceA CR, DeviceA Operator, DeviceA Subscriber, gNMI, gNMI Subscribe, OpenConfig Devices]
  33. Use openconfig/ygot

    • OpenConfig YANG -> (ygot generator) -> Go Struct -> (cue get go) -> CUE type def
    [Same architecture diagram as slide 32]
  34. Requirements satisfied?

    • Typed programming of network configuration, not simple text templating
    • Abstraction to an intent-based high-level model/interface with CRUD capability
      ◦ For domain-driven development of the caller system of the north-bound API
      ◦ Must be able to composite multiple typed document trees
    • GitOps
      ◦ SSoT, Pull-based
      ◦ SecretManager integration and security hardening
    • Basic requirements as a network provisioning system
      ◦ Transactions across distributed network devices
      ◦ Support for multi-vendor / multi-version devices
    [Status annotations, in the order they appear on the slide: OK, OK, OK, OK, WIP, OK, Under investigation with actual device]
  35. Ongoing work

    • Field trial with transport whitebox transponders using OpenConfig/gNMI
      ◦ Integration test with actual devices
    • Under development to release this provisioning system as open source
      ◦ Only PoC quality at this time; it still needs a lot of work
  36. Takeaways

    • Developed a new network provisioning system leveraging Kubernetes Custom Operator, FluxCD, and CUE
    • Kubernetes and its operation pattern are well designed for automation and can be applied even to a network provisioning system
    • CUE is a great language with the potential to change network provisioning systems drastically