New Network Provisioning System Leveraging Kubernetes and Cloud Native Open Source

Hiroki Okui
December 05, 2022

Transcript

  1. New Network Provisioning System Leveraging Kubernetes and Cloud Native Open Source
    NTT Communications
    Hiroki Okui
    #ossummit

  2. Self Introduction
    ● Hiroki Okui (@HirokiOkui)
    ● Software Engineer at NTT Communications
    ● Transport Network
    ● Network Provisioning System
    ● CI/CD DevOps Platform, etc.

  3. Scope of this presentation
    ● Within scope
    ○ Config Generation from High-level Intent (API/CLI Request)
    ○ Declarative Configuration and Programming
    ○ GitOps
    ● Out of scope
    ○ SDN Control-plane
    ○ Workflow engine / Orchestration
    ○ Monitoring
    ○ ZTP (Zero Touch Provisioning)
    [Diagram: system layers NorthBound API, Orchestrator, GitOps Controller, NE Provisioner, and SouthBound Driver, with the area covered by this talk marked "Scope"]

  4. Today’s session
    ● Problems of the legacy approaches
    ● Cloud Native Technologies helpful for Network Provisioning
    ● Design of New Network Provisioning System
    ● Demo

  5. Problems of the Legacy Network Provisioning System

  6. Cases of the legacy approach
    ● API Automation
    ● Git-based Continuous Delivery

  7. Case1: API Automation
    ● Using REST APIs provided by devices or their EMS (Element Management System)
    ● Using Ansible and Python libraries
    [Diagram: User, Ops, and 3rd-Party clients call the API of the Orchestrator, which drives the NE Provisioner; the Orchestrator holds the service intent, a cache of the actual device config, etc.]

  8. Case1: Problems
    [Diagram: the Case1 architecture from slide 7]
    No Ansible modules for minor devices (e.g. carrier-grade transport devices) -> scripting from scratch

  9. Case1: Problems
    [Slide 8 content, plus:]
    Simple scripting with text templating such as Jinja is fragile and easily broken


  10. Case1: Problems
    [Slide 9 content, plus:]
    A get command must be invoked to see what is actually deployed

  11. Case1: Problems
    [Slide 10 content, plus:]
    Configuration drift caused by manual direct operations or software version upgrades

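Detecting this kind of drift amounts to comparing the intended config with what the device reports. The sketch below assumes both sides have already been flattened into path-to-value maps. The OpenConfig-style paths and the `diffConfig` helper are illustrative and are not part of the system described in the talk.

```go
package main

import (
	"fmt"
	"sort"
)

// diffConfig compares the intended config with the config read back from a
// device (both flattened to path -> value) and reports every drifted path.
func diffConfig(desired, actual map[string]string) []string {
	var drift []string
	for path, want := range desired {
		got, ok := actual[path]
		switch {
		case !ok:
			drift = append(drift, "missing "+path)
		case got != want:
			drift = append(drift, fmt.Sprintf("changed %s: want %q, got %q", path, want, got))
		}
	}
	for path := range actual {
		if _, ok := desired[path]; !ok {
			drift = append(drift, "unexpected "+path)
		}
	}
	sort.Strings(drift) // map iteration order is random; sort for stable output
	return drift
}

func main() {
	desired := map[string]string{
		"/interfaces/interface[name=eth0]/config/mtu":         "9000",
		"/interfaces/interface[name=eth0]/config/description": "uplink",
	}
	// Someone changed the MTU by hand and added a description on eth1.
	actual := map[string]string{
		"/interfaces/interface[name=eth0]/config/mtu":         "1500",
		"/interfaces/interface[name=eth0]/config/description": "uplink",
		"/interfaces/interface[name=eth1]/config/description": "test",
	}
	for _, d := range diffConfig(desired, actual) {
		fmt.Println(d)
	}
}
```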

  12. Case2: Git-based Continuous Delivery
    ● Declarative config is stored in a Git repository and delivered when a PR is merged
    ○ By Ansible or simple scripts
    ○ According to the IPAM/DCIM (e.g. NetBox)
    [Diagram: Ops opens a PR on GitHub, CI runs, a Reviewer approves, and after the PR merge the NE Provisioner delivers the config]

  13. Case2: Problems
    [Diagram: the Case2 flow from slide 12]
    No models or schema, just text templating. No static analysis capabilities except golden-file testing

  14. Case2: Problems
    [Slide 13 content, plus:]
    The actually mapped config might drift from the CI templating test results, caused by
    - State changes in NetBox
    - Version differences between the CI environment and production

  15. Case2: Problems
    [Slide 14 content, plus:]
    Depends on an external IPAM/DCIM. Needs an extra operation to perform rollback in addition to git checkout

  16. Case2: Problems
    [Slide 15 content, plus:]
    Push-based (CIOps, not GitOps). All device secrets are stored here, which increases security risk

  17. Cloud Native Technologies Helpful for Network Provisioning

  18. Cloud Native Practices (helpful for Network Provisioning)
    ● Kubernetes Custom Operator
    ● GitOps
    ● Secret Management
    ● Data Configuration Language (CUE)

  19. Managing Systems Declaratively with Kubernetes
    ● Kubernetes reconciliation loop
    ○ Converges the system state to the declared state by running the delivery procedure repeatedly
    ○ All Kubernetes resources are managed this way
    ● Kubernetes Custom Operator
    ○ An extension that manages the user's own resources, including ones external to Kubernetes
    ○ A well-developed ecosystem for writing your own Custom Operator

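The reconciliation idea can be sketched in plain Go without any Kubernetes libraries. Each pass observes the actual state, compares it with the declared state, and acts only when the two differ. Repeating the pass is what corrects manual changes. The `Device` interface and `fakeDevice` are hypothetical; a real Custom Operator would run this logic inside a controller framework such as controller-runtime and talk to devices over a protocol like gNMI.

```go
package main

import "fmt"

// Device is a hypothetical driver interface: read the current config and
// apply a desired one.
type Device interface {
	Get() (map[string]string, error)
	Set(cfg map[string]string) error
}

// reconcile runs one pass of the loop: observe, compare, act.
// It returns true when the device already matched the desired state.
func reconcile(dev Device, desired map[string]string) (bool, error) {
	actual, err := dev.Get()
	if err != nil {
		return false, err
	}
	if equal(actual, desired) {
		return true, nil
	}
	return false, dev.Set(desired)
}

func equal(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}

// fakeDevice stores config in memory so the loop can be exercised.
type fakeDevice struct{ cfg map[string]string }

func (f *fakeDevice) Get() (map[string]string, error) {
	out := make(map[string]string, len(f.cfg))
	for k, v := range f.cfg {
		out[k] = v
	}
	return out, nil
}

func (f *fakeDevice) Set(cfg map[string]string) error {
	f.cfg = make(map[string]string, len(cfg))
	for k, v := range cfg {
		f.cfg[k] = v
	}
	return nil
}

func main() {
	dev := &fakeDevice{cfg: map[string]string{"mtu": "1500"}}
	desired := map[string]string{"mtu": "9000"}
	// A controller would run a pass on every event or resync period.
	for i := 0; i < 3; i++ {
		inSync, err := reconcile(dev, desired)
		fmt.Printf("pass %d: inSync=%v err=%v\n", i, inSync, err)
	}
	// A manual change on the device (drift) is corrected by the next pass.
	dev.cfg["mtu"] = "1400"
	inSync, _ := reconcile(dev, desired)
	fmt.Println("after drift: inSync =", inSync, "mtu =", dev.cfg["mtu"])
}
```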

  20. GitOps
    ● GitOps key principles
    ○ The entire system config is described declaratively
    ○ System config is deployed automatically when a Git PR is merged
    ○ Single Source of Truth (SSoT) & pull-based
    ● Advantages of GitOps
    ○ The canonical desired config is versioned in Git as the SSoT, so operators can easily roll back the entire system with git revert
    ○ Pull-based: secrets are installed near the target system, which improves security
    [Diagram: CIOps: push -> hook -> deploy; GitOps: push, then the agent near the target pulls and deploys]
    We can also reduce security risk by adopting a public cloud's SecretManager together with the External Secrets Operator or the Secrets Store CSI Driver

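A toy agent can illustrate both the pull-based model and git-revert rollback. It polls a source and applies a revision only when the revision changes. A revert is treated as just another commit. `Source`, `Agent`, and the in-memory `repo` are hypothetical stand-ins; in the talk, FluxCD fills this role by pulling from a real Git repository.

```go
package main

import "fmt"

// Source is a hypothetical stand-in for the Git repository that an in-cluster
// agent polls (in the talk, FluxCD plays this role).
type Source interface {
	Latest() (rev string, manifest map[string]string)
}

// Agent runs next to the target system and pulls from Source, so credentials
// for the target stay on its side; CI only needs to push to Git.
type Agent struct {
	applied string                                 // last successfully applied revision
	apply   func(manifest map[string]string) error // deploys a manifest to the target
}

// Sync pulls the latest revision and applies it only if it has changed.
func (a *Agent) Sync(src Source) (bool, error) {
	rev, manifest := src.Latest()
	if rev == a.applied {
		return false, nil
	}
	if err := a.apply(manifest); err != nil {
		return false, err // keep the old revision and retry on the next poll
	}
	a.applied = rev
	return true, nil
}

// repo is a toy, linear commit history.
type repo struct {
	revs      []string
	manifests []map[string]string
}

func (r *repo) Commit(m map[string]string) {
	r.revs = append(r.revs, fmt.Sprintf("rev%d", len(r.revs)+1))
	r.manifests = append(r.manifests, m)
}

// Latest returns the newest commit; it assumes at least one commit exists.
func (r *repo) Latest() (string, map[string]string) {
	n := len(r.revs) - 1
	return r.revs[n], r.manifests[n]
}

// RevertLast adds a new commit restoring the previous manifest, like
// `git revert HEAD` on a linear history; it needs at least two commits.
func (r *repo) RevertLast() {
	r.Commit(r.manifests[len(r.manifests)-2])
}

func main() {
	var live map[string]string // state of the simulated target system
	agent := &Agent{apply: func(m map[string]string) error { live = m; return nil }}
	src := &repo{}

	src.Commit(map[string]string{"mtu": "1500"}) // first PR merged
	agent.Sync(src)
	src.Commit(map[string]string{"mtu": "9000"}) // second PR merged
	agent.Sync(src)
	src.RevertLast() // roll back the second PR in Git
	agent.Sync(src)  // the agent pulls the revert like any other change

	fmt.Println(agent.applied, live["mtu"]) // rev3 1500
}
```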

  21. CUE
    ● A powerful data configuration language with a new programming model
    ○ Authored by Marcel van Lohuizen, who made GCL *1
    ● Specialized in data unification
    ○ Unifies multiple pieces of data at arbitrary layers
    ○ Gets the same result regardless of the order of evaluation (commutative and associative)
    ● Types are Values
    ○ Doesn't distinguish between values and types
    ○ Simply declares constraints and schemas
    ● Programmable
    ○ Supports coding practices like templating and modules
    ○ Type generation from Go APIs, OpenAPI, and Protobuf
    *1: Configuration language used in Google/Borg
    Types and Values:
    // Value
    Alice: age: 20
    // Type
    People: age: int
    // Constraint
    Member: age: >18
    // Validate: unification succeeds and yields {age: 20}
    Validated: Alice & People & Member

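A toy model, assumed here for illustration only, shows why unification is order-independent. Treat each integer value, type, or constraint as an interval, and treat unification as interval intersection. This captures the "types are values" idea from the slide, since 20, int, and >18 all become the same kind of thing. It is not CUE's actual implementation, which works on a far richer value lattice.

```go
package main

import "fmt"

// Bound is a toy model of CUE's "types are values" idea for integers: every
// value denotes the set of integers in [Lo, Hi]. A concrete value such as 20
// is [20, 20], the type int is the full range, and the constraint >18 is
// [19, maxInt]. This illustrates the idea; it is not how CUE is implemented.
type Bound struct{ Lo, Hi int }

const (
	maxInt = int(^uint(0) >> 1)
	minInt = -maxInt - 1
)

var (
	Int    = Bound{minInt, maxInt}
	Bottom = Bound{1, 0} // empty set: a conflict (CUE writes this as _|_)
)

func Concrete(v int) Bound { return Bound{v, v} }
func Greater(v int) Bound  { return Bound{v + 1, maxInt} } // assumes v < maxInt

func (b Bound) IsBottom() bool { return b.Lo > b.Hi }

// Unify intersects two sets. Intersection is commutative and associative,
// so the result does not depend on the order in which values are unified.
func Unify(a, b Bound) Bound {
	lo, hi := a.Lo, a.Hi
	if b.Lo > lo {
		lo = b.Lo
	}
	if b.Hi < hi {
		hi = b.Hi
	}
	if lo > hi {
		return Bottom
	}
	return Bound{lo, hi}
}

func main() {
	alice := Concrete(20) // Alice: age: 20
	people := Int         // People: age: int
	member := Greater(18) // Member: age: >18

	fmt.Println(Unify(Unify(alice, people), member))    // {20 20}
	fmt.Println(Unify(member, Unify(people, alice)))    // {20 20}, different order
	fmt.Println(Unify(Concrete(17), member).IsBottom()) // true: 17 violates >18
}
```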

  22. Modern CI/CD Pipeline with CUE
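    [Diagram: Ops opens a PR on the Source Repo; after Test and Reviewer approval the PR is merged and the generated manifests go to the Manifest Repo; the k8s reconciliation loop pulls from it, with an Admission Webhook in front of the cluster]
    ● Declarative system config is written in CUE
    ● CUE is used for type validation and policy tests in CI
    ● CUE is compiled and the generated YAML is delivered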

  23. Comparison to Network Provisioning System
    [Diagram: the pull-based CUE pipeline from slide 22 shown next to the Case2 flow from slide 12, in which the NE Provisioner pushes config after the PR is merged]

  24. Issues to be addressed
    [Diagram: the comparison from slide 23, annotated with the issues of the Case2 flow]
    ● No models or schema. No static analysis capabilities except golden-file testing
    ● Not an SSoT. Needs an extra operation to perform rollback in addition to git checkout
    ● Procedural scripts and text templating are part of the delivery flow, which introduces bugs and configuration drift
    ● Push-based (CIOps, not GitOps). All device secrets are stored here, which increases security risk

  25. New Network Provisioning System

  26. Requirements
    ● Typed programming of network configuration, not simple text templating
    ● Abstraction to an intent-based high-level model/interface with CRUD capability
    ○ For domain-driven development of the north-bound application
    ○ Must be able to compose multiple typed document trees
    ● GitOps
    ○ SSoT, pull-based
    ○ SecretManager integration and security hardening
    ● Basic requirements for a network provisioning system
    ○ Transactions across distributed network devices
    ○ Support for multi-vendor / multi-version devices

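Typed programming differs from text templating in that field names and value ranges are checked before anything is rendered. The talk uses CUE for this. The Go sketch below shows the same contrast with a hypothetical `InterfaceConfig` type: a misspelled field is a compile-time error, and `Validate` enforces the value constraints. The 68 to 9216 MTU range is an illustrative policy, not a standard.

```go
package main

import "fmt"

// InterfaceConfig is a hypothetical typed model of one interface. Unlike a
// text template parameter, a misspelled field name here does not compile.
type InterfaceConfig struct {
	Name    string
	MTU     int
	Enabled bool
}

// Validate enforces value constraints the type system alone cannot express
// (the MTU range is an illustrative policy, not a standard).
func (c InterfaceConfig) Validate() error {
	if c.Name == "" {
		return fmt.Errorf("interface name must not be empty")
	}
	if c.MTU < 68 || c.MTU > 9216 {
		return fmt.Errorf("interface %s: MTU %d out of range [68, 9216]", c.Name, c.MTU)
	}
	return nil
}

func main() {
	for _, c := range []InterfaceConfig{
		{Name: "eth0", MTU: 9000, Enabled: true},
		{Name: "eth1", MTU: 90000}, // a typo in the value, caught by Validate
	} {
		fmt.Println(c.Name, c.Validate())
	}
}
```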

  27. Requirements
    (Same list as slide 26, with the added annotation "commutative and associative")


  28. Requirements
    (Same content as slide 27)

  29. Requirements
    (Same content as slide 27)

  30. Design
    [Diagram: Ops and Admin approve, test, and merge PRs into the Git repository (SSoT). Inside the k8s reconciliation loop, the Source Controller pulls the repository; an API Server and a Map Reduce / Eval Composite step turn the Northbound Model into the Southbound Model; a DeviceRollout CR and per-device CRs (DeviceA, DeviceB) drive per-device Operators that provision the multi-vendor, multi-version devices over gNMI, while per-device Subscribers receive change notifications.]

  31. Design
    [Design diagram from slide 30, annotated:]
    - Write the data mapper from the high-level model to the device config model easily with CUE
    - Type validation and policy enforcement with CUE
    - Provide a CRUD API so that north-bound systems can follow domain-driven development

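The Map Reduce / Eval Composite step works in two stages. Each northbound intent is first mapped to device-level fragments. All fragments are then reduced into one config per device. The talk implements this mapping in CUE; the Go sketch below only illustrates the data flow. The `L2Service`, `Fragment`, `mapService`, and `reduce` names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Endpoint and L2Service are a hypothetical northbound (intent) model:
// "put these device ports into one VLAN".
type Endpoint struct{ Device, Port string }

type L2Service struct {
	Name      string
	VLAN      int
	Endpoints []Endpoint
}

// Fragment is one hypothetical piece of southbound config: a VLAN on a port.
type Fragment struct {
	Device, Port string
	VLAN         int
}

// mapService is the Map step: one intent fans out into device-level fragments.
func mapService(s L2Service) []Fragment {
	var out []Fragment
	for _, ep := range s.Endpoints {
		out = append(out, Fragment{ep.Device, ep.Port, s.VLAN})
	}
	return out
}

// reduce is the Reduce step: fragments from all intents are composed into one
// config per device (device -> port -> sorted, de-duplicated VLANs).
func reduce(frags []Fragment) map[string]map[string][]int {
	seen := map[string]map[string]map[int]bool{}
	for _, f := range frags {
		if seen[f.Device] == nil {
			seen[f.Device] = map[string]map[int]bool{}
		}
		if seen[f.Device][f.Port] == nil {
			seen[f.Device][f.Port] = map[int]bool{}
		}
		seen[f.Device][f.Port][f.VLAN] = true
	}
	out := map[string]map[string][]int{}
	for dev, ports := range seen {
		out[dev] = map[string][]int{}
		for port, vlans := range ports {
			for v := range vlans {
				out[dev][port] = append(out[dev][port], v)
			}
			sort.Ints(out[dev][port])
		}
	}
	return out
}

func main() {
	services := []L2Service{
		{"svc-a", 100, []Endpoint{{"deviceA", "eth1"}, {"deviceB", "eth1"}}},
		{"svc-b", 200, []Endpoint{{"deviceA", "eth1"}, {"deviceB", "eth2"}}},
	}
	var frags []Fragment
	for _, s := range services {
		frags = append(frags, mapService(s)...)
	}
	fmt.Println(reduce(frags))
	// map[deviceA:map[eth1:[100 200]] deviceB:map[eth1:[100] eth2:[200]]]
}
```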

  32. Design
    [Design diagram from slide 30, annotated:]
    - SSoT
    - Roll back to any revision with git checkout
    - The entire device config can be obtained from the Git repository
    - CI tests using the actual device config
    - Static analysis using the model schema

  33. Design
    [Design diagram from slide 30, annotated:]
    - Pull-based GitOps leveraging the FluxCD Custom Operator
    - The DeviceRollout Operator performs transactions across distributed network devices
    - When provisioning fails on any device, all devices roll back to their previous state
    - Support for multi-vendor/multi-version devices is extended by implementing a k8s Custom Operator as the device driver

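The all-or-nothing behavior described for the DeviceRollout Operator can be approximated with compensation. Record each device's previous config, apply the new configs in turn, and if any device fails, restore every device touched so far. This is a minimal sketch built on a hypothetical `Device` interface, not the operator's actual implementation. It is also not a true two-phase commit: a device that fails during rollback is only reported, not repaired.

```go
package main

import "fmt"

// Device is a hypothetical driver interface (a real one would speak gNMI).
type Device interface {
	Name() string
	Get() (string, error)
	Set(cfg string) error
}

// Rollout applies desired configs (keyed by device name) to all devices as one
// unit: if any device fails, every device touched so far, including the failed
// one, is set back to the config it had before the rollout.
func Rollout(devs []Device, desired map[string]string) error {
	prev := map[string]string{}
	var touched []Device
	for _, d := range devs {
		old, err := d.Get()
		if err != nil {
			return rollback(touched, prev, fmt.Errorf("%s: get: %w", d.Name(), err))
		}
		prev[d.Name()] = old
		touched = append(touched, d) // d may be partially configured if Set fails
		if err := d.Set(desired[d.Name()]); err != nil {
			return rollback(touched, prev, fmt.Errorf("%s: set: %w", d.Name(), err))
		}
	}
	return nil
}

// rollback restores devices in reverse order and wraps the original cause.
func rollback(devs []Device, prev map[string]string, cause error) error {
	failed := 0
	for i := len(devs) - 1; i >= 0; i-- {
		if err := devs[i].Set(prev[devs[i].Name()]); err != nil {
			failed++
		}
	}
	if failed > 0 {
		return fmt.Errorf("%w (rollback also failed on %d device(s))", cause, failed)
	}
	return fmt.Errorf("rolled back: %w", cause)
}

// fakeDevice keeps its config in memory and rejects one specific config.
type fakeDevice struct {
	name, cfg, reject string
}

func (f *fakeDevice) Name() string         { return f.name }
func (f *fakeDevice) Get() (string, error) { return f.cfg, nil }
func (f *fakeDevice) Set(cfg string) error {
	if f.reject != "" && cfg == f.reject {
		return fmt.Errorf("config %q rejected", cfg)
	}
	f.cfg = cfg
	return nil
}

func main() {
	a := &fakeDevice{name: "deviceA", cfg: "v1"}
	b := &fakeDevice{name: "deviceB", cfg: "v1", reject: "v2"}
	err := Rollout([]Device{a, b}, map[string]string{"deviceA": "v2", "deviceB": "v2"})
	fmt.Println(err)
	fmt.Println(a.cfg, b.cfg) // v1 v1: deviceA was rolled back
}
```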

  34. Design
    [Design diagram from slide 30, annotated:]
    - Device secrets can be managed as k8s Secrets
    - Easy integration with a public cloud's SecretManager, using the External Secrets Operator or the Secrets Store CSI Driver, to improve security

  35. Design
    [Design diagram from slide 30]

  36. How do we generate CUE types and develop a driver?
    [Diagram: the design from slide 30 reduced to one OpenConfig device: the DeviceA Operator provisions the device over gNMI, and the DeviceA Subscriber receives updates through gNMI Subscribe]

  37. Use openconfig/ygot
    [Diagram from slide 36, plus the type-generation flow:]
    OpenConfig YANG -> (ygot generator) -> Go struct -> (cue get) -> CUE type definitions

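The sketch below shows what the Go struct stage of this pipeline carries. It hand-writes a struct in the style of ygot output for `/interfaces/interface/config`. The fields are pointers so that unset leaves can be told apart from zero values, and the `path` / `module` struct tags hold the YANG metadata. The exact names and tags ygot generates depend on its flags, so treat this as an approximation. `cue get go` then imports such Go types as CUE definitions. The `setPaths` helper is hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// Interface_Config is hand-written in the style of a ygot-generated struct;
// real generated names and tags depend on the generator flags.
type Interface_Config struct {
	Name        *string `path:"name" module:"openconfig-interfaces"`
	Description *string `path:"description" module:"openconfig-interfaces"`
	Mtu         *uint16 `path:"mtu" module:"openconfig-interfaces"`
	Enabled     *bool   `path:"enabled" module:"openconfig-interfaces"`
}

// setPaths lists, in field order, the leaf paths of the fields that are set,
// read from the struct tags; a driver could use such metadata to build gNMI
// paths. cfg must be a pointer to a struct whose fields are all pointers.
func setPaths(prefix string, cfg interface{}) []string {
	v := reflect.ValueOf(cfg).Elem()
	t := v.Type()
	var paths []string
	for i := 0; i < t.NumField(); i++ {
		if !v.Field(i).IsNil() {
			paths = append(paths, prefix+"/"+t.Field(i).Tag.Get("path"))
		}
	}
	return paths
}

func main() {
	name, mtu := "eth0", uint16(9000)
	cfg := &Interface_Config{Name: &name, Mtu: &mtu}
	fmt.Println(setPaths("/interfaces/interface[name=eth0]/config", cfg))
	// [/interfaces/interface[name=eth0]/config/name /interfaces/interface[name=eth0]/config/mtu]
}
```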

  38. Demo

  39. Requirements satisfied?
    ● Typed programming of network configuration, not simple text templating => OK
    ● Abstraction to an intent-based high-level model/interface with CRUD capability
    ○ For domain-driven development of the caller system of the north-bound API => OK
    ○ Must be able to compose multiple typed document trees => OK
    ● GitOps
    ○ SSoT, pull-based => OK
    ○ SecretManager integration and security hardening => WIP
    ● Basic requirements for a network provisioning system
    ○ Transactions across distributed network devices => OK
    ○ Support for multi-vendor / multi-version devices => Under investigation with actual devices

  40. Ongoing Work
    ● Field trial with transport whitebox transponders using OpenConfig/gNMI
    ○ Integration tests with actual devices
    ● Preparing to release this provisioning system as open source
    ○ It is only PoC quality at this point and still needs a lot of work

  41. Takeaways
    ● Developed a new network provisioning system leveraging Kubernetes Custom Operators, FluxCD, and CUE
    ● Kubernetes and its operational patterns are well designed for automation and can be applied even to network provisioning systems
    ● CUE is a great language with the potential to change network provisioning systems drastically