Slide 1

Slide 1 text

New Network Provisioning System Leveraging Kubernetes and Cloud Native Open Source
NTT Communications
Hiroki Okui
#ossummit

Slide 2

Slide 2 text

Self Introduction
● Hiroki Okui (@HirokiOkui)
● Software Engineer at NTT Communications
● Transport Network
● Network Provisioning System
● CI/CD DevOps Platform, etc.

Slide 3

Slide 3 text

Scope of this presentation
● Within scope
  ○ Config Generation from High-level Intent (API/CLI Request)
  ○ Declarative Configuration and Programming
  ○ GitOps
● Out of scope
  ○ SDN Control-plane
  ○ Workflow engine / Orchestration
  ○ Monitoring
  ○ ZTP
[Diagram labels: Orchestrator, NorthBound API, NE Provisioner, GitOps Controller, SouthBound Driver, Scope]

Slide 4

Slide 4 text

Today’s session
● Problems of the legacy approaches
● Cloud Native Technologies helpful for Network Provisioning
● Design of New Network Provisioning System
● Demo

Slide 5

Slide 5 text

Problems of Legacy Network Provisioning System

Slide 6

Slide 6 text

Cases of the legacy approach
● API Automation
● Git-based Continuous Delivery

Slide 7

Slide 7 text

Case1: API Automation
API Automation
● Using REST API provided by devices or their EMS
● Using Ansible and Python libraries
[Diagram labels: User, Ops, 3rd-Party, API, Orchestrator, NE Provisioner; stored data: Service Intent, Cache of actual device config, etc…]

Slide 11

Slide 11 text

Case1: Problems
● No Ansible modules for minor devices (e.g. carrier-grade transport devices) -> scripting from scratch
● Simple scripting relies on text templating such as Jinja, which is fragile and easily broken
● Need to invoke a get command to see what is actually deployed
● Configuration drift caused by manual direct operation or software version upgrades
[Diagram labels: User, Ops, 3rd-Party, API, Orchestrator, NE Provisioner]
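The last two problems above (having to query the device, and drift from manual changes) can be made concrete with a small comparison of a desired config tree against what a device reports. This is only an illustrative sketch in Python: the function name `diff_config`, the dictionary layout, and the interface values are all hypothetical and are not taken from the presented system.

```python
from typing import Any


def diff_config(desired: dict[str, Any], actual: dict[str, Any], prefix: str = "") -> list[str]:
    """Return human-readable differences between two nested config trees."""
    drift: list[str] = []
    for key in sorted(set(desired) | set(actual)):
        path = f"{prefix}/{key}"
        if key not in actual:
            drift.append(f"missing on device: {path}")
        elif key not in desired:
            drift.append(f"unexpected on device: {path}")
        elif isinstance(desired[key], dict) and isinstance(actual[key], dict):
            # Recurse into sub-trees so only the differing leaves are reported.
            drift.extend(diff_config(desired[key], actual[key], path))
        elif desired[key] != actual[key]:
            drift.append(f"changed: {path} desired={desired[key]!r} actual={actual[key]!r}")
    return drift


# Hypothetical data: what the orchestrator believes versus what a "get" returned.
desired = {"interfaces": {"eth0": {"mtu": 9000, "enabled": True}}}
actual = {"interfaces": {"eth0": {"mtu": 1500, "enabled": True, "description": "tmp"}}}

drift = diff_config(desired, actual)
for line in drift:
    print(line)
```

In the legacy setup such a check only runs when someone invokes it, which is why drift can go unnoticed between runs.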

Slide 12

Slide 12 text

Case2: Git-based Continuous Delivery
Declarative config is stored in a Git repository and is delivered when a PR is merged
● By Ansible or simple scripts
● According to the IPAM/DCIM (e.g. NetBox)
[Diagram labels: Ops, PR, Reviewer, Approve, PR merge, GitHub, CI, NE Provisioner]

Slide 16

Slide 16 text

Case2: Problems
● No models and schema, just text templating. No capabilities for static analysis, except for golden file testing
● Depends on an external IPAM/DCIM. Needs an extra operation to perform rollback in addition to git checkout
● The config actually applied might drift from the CI templating test results, caused by
  - State change of NetBox
  - Version difference between CI env and prod
● Push-based (CIOps, not GitOps). All device secrets are stored on the pushing side (CI), which increases security risk
[Diagram labels: Ops, PR, Reviewer, Approve, PR merge, GitHub, CI, NE Provisioner]
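The first problem above, "no models and schema, just text templating", can be illustrated by rendering the same bad input through a text template and through a typed model. This is a hedged sketch in Python: it uses the standard library `string.Template` in place of Jinja, and the `Interface` class, its MTU range (64 to 9216), and the config syntax are assumptions made up for the example.

```python
from dataclasses import dataclass
from string import Template

# Text templating: every value is just a string, so a bad value is rendered as-is.
TEMPLATE = Template("interface $name\n mtu $mtu\n")
rendered = TEMPLATE.substitute(name="eth0", mtu="900O")  # typo: letter O, not zero


@dataclass(frozen=True)
class Interface:
    """Hypothetical typed model of one interface."""
    name: str
    mtu: int

    def __post_init__(self) -> None:
        # Reject wrong types and out-of-range values before any config is produced.
        if not isinstance(self.mtu, int) or isinstance(self.mtu, bool):
            raise TypeError(f"mtu must be an int, got {self.mtu!r}")
        if not 64 <= self.mtu <= 9216:  # assumed range, for illustration only
            raise ValueError(f"mtu out of range: {self.mtu}")

    def render(self) -> str:
        return f"interface {self.name}\n mtu {self.mtu}\n"


def try_build(name: str, mtu: object) -> str:
    """Render a config, or report why the input was rejected."""
    try:
        return Interface(name, mtu).render()
    except (TypeError, ValueError) as exc:
        return f"rejected: {exc}"


print(rendered)                    # the typo reaches the generated config
print(try_build("eth0", "900O"))   # the typed model refuses it
print(try_build("eth0", 9000))
```

A golden file test would only catch the typo if that exact input happened to be covered, whereas the typed model rejects every input of that shape.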

Slide 17

Slide 17 text

Cloud Native Technologies helpful for Network Provisioning

Slide 18

Slide 18 text

Cloud Native Practices (helpful for Network Provisioning)
● Kubernetes Custom Operator
● GitOps
● Secret Management
● Data Configuration Language (CUE)

Slide 19

Slide 19 text

Manage system declaratively using Kubernetes
● Kubernetes Reconciliation loop
  ○ Converges the system state with the described state by running the delivery procedure repeatedly
  ○ All Kubernetes resources are managed by this approach
● Kubernetes Custom Operator
  ○ Extension that configures the user’s own resources external to Kubernetes
  ○ Well-developed ecosystem for writing your own Custom Operator
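The reconciliation idea above (observe, compare with the described state, apply, repeat) can be sketched in a few lines. This is a minimal Python illustration rather than how Kubernetes controllers are implemented: the `reconcile` function, the in-memory `device` dictionary, and the `observe`/`apply` callbacks are hypothetical stand-ins for a real resource and its driver.

```python
from typing import Callable, Optional


def reconcile(
    desired: dict[str, str],
    observe: Callable[[], dict[str, str]],
    apply: Callable[[str, Optional[str]], None],
    max_rounds: int = 5,
) -> int:
    """Repeat observe/compare/apply until actual equals desired.

    Returns the number of rounds in which changes were applied.
    """
    rounds = 0
    for _ in range(max_rounds):
        actual = observe()
        if actual == desired:
            return rounds
        for key in set(desired) | set(actual):
            want = desired.get(key)  # None means "should not exist"
            if actual.get(key) != want:
                apply(key, want)
        rounds += 1
    raise RuntimeError("state did not converge")


# Hypothetical in-memory "device" standing in for an external resource.
device: dict[str, str] = {"hostname": "old", "ntp": "10.0.0.1"}


def observe() -> dict[str, str]:
    return dict(device)


def apply(key: str, value: Optional[str]) -> None:
    if value is None:
        device.pop(key, None)
    else:
        device[key] = value


desired = {"hostname": "edge1", "snmp": "enabled"}
first = reconcile(desired, observe, apply)

device["hostname"] = "manual-change"   # simulated drift from a manual operation
second = reconcile(desired, observe, apply)
print(first, second, device)
```

Because the loop is driven by the difference between states and not by a one-shot script, the same code handles both the initial rollout and later drift.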

Slide 20

Slide 20 text

GitOps
● GitOps key principles
  ○ Entire system config is described declaratively
  ○ System config is deployed automatically when a Git PR is merged
  ○ Single Source of Truth (SSoT) & Pull-based
● Advantages of GitOps
  ○ The canonical desired config is versioned in Git as the SSoT, so an operator can easily roll back the entire system with git revert
  ○ Pull-based: secrets are installed near the target system, which improves security
[Diagram labels: CIOps, GitOps, push, hook, pull, deploy]
We can also reduce security risk by adopting the SecretManager of a public cloud with External Secrets Operator or Secrets Store CSI Driver

Slide 21

Slide 21 text

CUE
● A powerful data configuration language with a new programming model
  ○ Authored by Marcel van Lohuizen, who made GCL *1
● Specialized in data unification
  ○ Unifies multiple pieces of data at arbitrary layers
  ○ Gets the same result regardless of the order of evaluation (commutative and associative)
● Types are Values
  ○ Doesn’t distinguish values and types
  ○ Simply declares constraints and schema
● Programmable
  ○ Supports coding practices like templating and modules
  ○ Type generation from Go API, OpenAPI, Protobuf
*1: Configuration Language used in Google/Borg

Types and Values

    // Value
    Alice: age: 20
    // Type
    People: age: int
    // Constraint
    Member: age: > 18
    // Validate
    Alice & People & Member
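The claim that unification gives the same result in any order can be checked on a toy model. The sketch below is written in Python and is not CUE's implementation: it models only integer fields as closed ranges, so that a concrete value, a type, and a bound are all the same kind of object and "unify" is range intersection. The names `IntRange`, `exactly`, and `greater_than` are invented for the example.

```python
import math
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class IntRange:
    """Toy stand-in for a value, a type, or a constraint on an integer field."""
    lo: float
    hi: float

    def unify(self, other: "IntRange") -> "IntRange":
        # Intersection of two ranges; an empty intersection is a conflict.
        lo, hi = max(self.lo, other.lo), min(self.hi, other.hi)
        if lo > hi:
            raise ValueError(f"conflict: {self} & {other}")
        return IntRange(lo, hi)


ANY_INT = IntRange(-math.inf, math.inf)      # plays the role of `int`


def exactly(n: int) -> IntRange:             # plays the role of a concrete value
    return IntRange(n, n)


def greater_than(n: int) -> IntRange:        # plays the role of `> n` on integers
    return IntRange(n + 1, math.inf)


alice, people, member = exactly(20), ANY_INT, greater_than(18)

# Unify in every possible order and collect the distinct results.
results = {a.unify(b).unify(c) for a, b, c in permutations([alice, people, member])}
print(results)

try:
    exactly(10).unify(member)
except ValueError as exc:
    print(exc)
```

All six orderings collapse to a single result because intersection is commutative and associative, which is the property the slide attributes to CUE's unification.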

Slide 22

Slide 22 text

Modern CI/CD Pipeline with CUE
- Declarative system config is written in CUE
- Use CUE for type validation and policy tests in CI
- Compile CUE and deliver the generated YAML
[Diagram labels: Ops, PR, Source Repo, Test, Reviewer, Approve, PR merge, Manifest Repo, Pull, k8s Reconciliation Loop, Admission Webhook]

Slide 23

Slide 23 text

Comparison to Network Provisioning System
[Diagram: the Kubernetes pipeline (Ops, PR, Source Repo, Test, Reviewer, Approve, PR merge, Manifest Repo, Pull, k8s Reconciliation Loop, Admission Webhook) shown next to the legacy network pipeline (Ops, PR, GitHub, CI, Reviewer, Approve, PR merge, Push, NE Provisioner)]

Slide 24

Slide 24 text

Issues to be addressed
- No models and schema. No capabilities for static analysis, except for golden file testing
- Not SSoT. Needs an extra operation to perform rollback in addition to git checkout
- Procedural scripts and text templating are included in the delivery flow, leading to bugs and configuration drift
- Push-based (CIOps, not GitOps). All device secrets are stored on the pushing side (CI), which increases security risk
[Diagram: the Kubernetes pipeline (Source Repo, Test, Manifest Repo, Pull, k8s Reconciliation Loop, Admission Webhook) shown next to the legacy network pipeline (GitHub, CI, Push, NE Provisioner)]

Slide 25

Slide 25 text

New Network Provisioning System

Slide 27

Slide 27 text

Requirements
● Typed Programming of network configuration, not simple text templating
● Abstract to the intent-based high-level model/interface with CRUD capability
  ○ For domain-driven development of the north-bound application
  ○ Must be able to compose multiple typed document trees
● GitOps
  ○ SSoT, Pull-based
  ○ SecretManager integration and security hardening
● Basic requirements as the network provisioning system
  ○ Transaction of distributed network devices
  ○ Support of multi-vendor / multi-version devices
[Callout: commutative and associative]

Slide 30

Slide 30 text

Design
[Architecture diagram: Ops and Admin work through PR, Test, Approve, and PR merge on the Git repository (SSoT); an API Server (gNMI) holds the Northbound Model and Southbound Model, connected by Map, Reduce, Eval, and Composite; a Source Controller pulls from Git; a Device Rollout CR and per-device CRs (DeviceA CR, DeviceB CR) are handled by per-device Operators in the k8s Reconciliation Loop, which provision the multi-vendor, multi-version devices; per-device Subscribers (DeviceA Subscriber, DeviceB Subscriber) receive Change Notifications]

Slide 31

Slide 31 text

Design
[Architecture diagram, same as Slide 30]
- Write the data mapper from the high-level model to the device config model easily in CUE
- Type Validation and Policy Enforcement by CUE
- Provide a CRUD API so that the north-bound system can be developed in a domain-driven way
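One way to read the "Map" and "Reduce" labels together with the data mapper note above is: map each high-level service to per-device config fragments, then merge the fragments for each device, failing on conflicts. The Python sketch below follows that reading as an assumption; the talk does the mapping in CUE, and the service fields (`name`, `vlan`, `endpoints`), device names, and functions here are hypothetical.

```python
from typing import Any

Config = dict[str, Any]


def merge(a: Config, b: Config, path: str = "") -> Config:
    """Deep-merge two config trees; equal leaves are fine, different leaves conflict."""
    out = dict(a)
    for key, value in b.items():
        here = f"{path}/{key}"
        if key not in out:
            out[key] = value
        elif isinstance(out[key], dict) and isinstance(value, dict):
            out[key] = merge(out[key], value, here)
        elif out[key] != value:
            raise ValueError(f"conflict at {here}: {out[key]!r} vs {value!r}")
    return out


def map_service(service: Config) -> dict[str, Config]:
    """Map: one high-level service intent -> config fragments keyed by device."""
    fragments: dict[str, Config] = {}
    for endpoint in service["endpoints"]:
        fragment = {
            "interfaces": {
                endpoint["port"]: {"description": service["name"], "vlan": service["vlan"]}
            }
        }
        device = endpoint["device"]
        fragments[device] = merge(fragments.get(device, {}), fragment)
    return fragments


def build_device_configs(services: list[Config]) -> dict[str, Config]:
    """Reduce: merge the fragments from all services into one config per device."""
    configs: dict[str, Config] = {}
    for service in services:
        for device, fragment in map_service(service).items():
            configs[device] = merge(configs.get(device, {}), fragment)
    return configs


svc_a = {"name": "cust-a", "vlan": 100,
         "endpoints": [{"device": "oc01", "port": "eth1"}, {"device": "oc02", "port": "eth1"}]}
svc_b = {"name": "cust-b", "vlan": 200,
         "endpoints": [{"device": "oc01", "port": "eth2"}]}

configs = build_device_configs([svc_a, svc_b])
print(configs["oc01"])
```

Because the merge either agrees or fails, the result does not depend on the order in which services are processed, which mirrors the "commutative and associative" requirement.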

Slide 32

Slide 32 text

Design
[Architecture diagram, same as Slide 30]
- SSoT
- Perform rollback to any revision by git checkout
- You can get the entire device config from the Git repository
- CI test using actual device config
- Static analysis using the model schema

Slide 33

Slide 33 text

Design
[Architecture diagram, same as Slide 30]
- Pull-based GitOps leveraging FluxCD Custom Operator
- DeviceRollout Operator performs transactions across distributed network devices
- When provisioning of any device fails, all devices roll back to the previous state
- Extend to support multi-vendor/multi-version devices by implementing a k8s Custom Operator as the device driver
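The all-or-nothing behaviour described above (if any device fails, every device returns to its previous state) can be sketched as snapshot, apply, and restore. This Python example is illustrative only and does not show the DeviceRollout Operator's actual logic: `FakeDevice`, `ProvisionError`, and `rollout` are hypothetical, and restoring a snapshot is assumed to succeed.

```python
import copy
from typing import Any


class ProvisionError(Exception):
    """Raised when a device rejects a config."""


class FakeDevice:
    """Hypothetical in-memory device; set fail=True to simulate a failure."""

    def __init__(self, name: str, config: dict[str, Any], fail: bool = False) -> None:
        self.name = name
        self.config = copy.deepcopy(config)
        self.fail = fail

    def apply(self, config: dict[str, Any]) -> None:
        if self.fail:
            raise ProvisionError(f"{self.name}: provisioning failed")
        self.config = copy.deepcopy(config)


def rollout(devices: list[FakeDevice], new_configs: dict[str, dict[str, Any]]) -> bool:
    """Apply new configs to all devices; on any failure restore the ones already changed."""
    snapshots = {d.name: copy.deepcopy(d.config) for d in devices}
    applied: list[FakeDevice] = []
    try:
        for device in devices:
            device.apply(new_configs[device.name])
            applied.append(device)
    except ProvisionError:
        # Undo in reverse order so the last change is reverted first.
        for device in reversed(applied):
            device.apply(snapshots[device.name])
        return False
    return True


dev_a = FakeDevice("a", {"mtu": 1500})
dev_b = FakeDevice("b", {"mtu": 1500}, fail=True)
target = {"a": {"mtu": 9000}, "b": {"mtu": 9000}}

failed_attempt = rollout([dev_a, dev_b], target)
print(failed_attempt, dev_a.config, dev_b.config)

dev_b.fail = False
second_attempt = rollout([dev_a, dev_b], target)
print(second_attempt, dev_a.config, dev_b.config)
```

A real rollout would also have to handle a rollback that itself fails, which this sketch leaves out.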

Slide 34

Slide 34 text

Design
[Architecture diagram, same as Slide 30]
- Device secrets can be managed as k8s Secrets
- Easily integrates with the SecretManager of a public cloud to improve security, using External Secrets Operator or Secrets Store CSI Driver

Slide 35

Slide 35 text

Design
[Architecture diagram, same as Slide 30]

Slide 36

Slide 36 text

How to generate CUE types and develop driver?
[Architecture diagram, as on the Design slides but with a single device type: Git repository (SSoT), API Server (gNMI) with Northbound Model and Southbound Model (Map, Reduce, Eval, Composite), Source Controller, Device Rollout CR, DeviceA CR, DeviceA Operator, and DeviceA Subscriber in the k8s Reconciliation Loop; the Operator talks gNMI to OpenConfig devices and the Subscriber uses gNMI Subscribe]

Slide 37

Slide 37 text

Use openconfig/ygot
OpenConfig YANG -> (ygot generator) -> Go Struct -> (cue get) -> CUE type def
[Architecture diagram, same as Slide 36: DeviceA Operator and DeviceA Subscriber reach OpenConfig devices over gNMI and gNMI Subscribe]

Slide 38

Slide 38 text

Demo

Slide 39

Slide 39 text

Requirements satisfied?
● Typed Programming of network configuration, not simple text templating
● Abstract to the intent-based high-level model/interface with CRUD capability
  ○ For domain-driven development of the caller system of the north-bound API
  ○ Must be able to compose multiple typed document trees
● GitOps
  ○ SSoT, Pull-based
  ○ SecretManager integration and security hardening
● Basic requirements as the network provisioning system
  ○ Transaction of distributed network devices
  ○ Support of multi-vendor / multi-version devices
Status callouts, in slide order: => OK => OK => OK => OK => WIP => OK => Under investigation with actual device

Slide 40

Slide 40 text

Ongoing work
● Field trial with transport whitebox transponders using OpenConfig/gNMI
  ○ Integration test with actual devices
● Under development, with the aim of releasing this provisioning system as open source
  ○ It is only PoC quality at this time and needs a lot of work

Slide 41

Slide 41 text

Takeaways
● Developed a new network provisioning system leveraging Kubernetes Custom Operator, FluxCD, and CUE
● Kubernetes and its operation pattern are well designed for automation and can be applied even to a network provisioning system
● CUE is a great language that has the potential to change network provisioning systems drastically