How We Renewed LINE’s Network Orchestrator / LINE’s new Network Orchestrator

Presentation slides from the JANOG49 Meeting.
https://www.janog.gr.jp/meeting/janog49/lineorc/
Speaker: Subaru Fukuda
Network Development Team, Verda Platform Department

LINE Developers

January 27, 2022

Transcript

  1. How We Renewed LINE’s Network Orchestrator
    LINE Corporation, Subaru Fukuda

  2. @LINE
    ・Network Orchestrator development
    ・White box NOS development
    ・Telecom Infra Project
    ・IoT Gateway firmware development
    ・IoT protocol stack development
    ・enterprise NOS test and release engineering
    ・test automation system development
    Subaru Fukuda
    2016.Apr - 2018.Apr
    2018.Mar - 2019.Sep
    2019.Oct - 2020.Oct
    2020.Nov - NOW
    About Me

  3. What is Verda?
    80,000+ Virtual Machine
    40,000+ Baremetal
    6,000+ Hypervisor
    NAT
    Load Balancer
    VM / Baremetal
    MySQL
    Elasticsearch
    Image Repo
    Shared Filesystem
    DNS
    App engine (like heroku)
    Controller
    And More…

  4. Underlay Network
    How We Redesigned LINE’s Network from Scratch @ JANOG43

  5. Multi Components Architecture

  6. Problems
    1) SCALABILITY
    2) MULTIVENDOR
    3) TRIGGER
    4) BATCH CHANGE
    5) HUMAN ERROR

  7. SCALABILITY

  8. Problem
    Config Update Process
    ①Update Database
    ②Create Inventory
    ③Apply Config(Run Ansible)

  9. Problem
    • The Ansible server is heavily loaded.
    • Applying a config change takes a long time.
    • Manual operations are required:
      • to update the database
      • to generate the inventory
      • to run Ansible

  10. Agent Application
    1:N 1:1

  11. Agent Sync Config
    Config Update Process
    0) The agent watches the DB.
    1) An operator updates the DB.
    2) The agent detects the change.
    3) The agent updates the config (runs Ansible).
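
The four steps above can be sketched as a small watch-and-apply loop. This is only an illustration under assumptions: `ConfigDB` is a hypothetical in-memory stand-in for the real config DB, `Agent` is a hypothetical class name, and the `apply_config` callback stands in for the Ansible run.

```python
import json
from typing import Callable, Dict, List


class ConfigDB:
    """Hypothetical in-memory stand-in for the config DB."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: Dict[str, List[Callable[[str, str], None]]] = {}

    def watch(self, key: str, callback: Callable[[str, str], None]) -> None:
        # Step 0: an agent registers interest in a key.
        self._watchers.setdefault(key, []).append(callback)

    def put(self, key: str, value: str) -> None:
        # Step 1: an operator writes a new value.
        changed = self._data.get(key) != value
        self._data[key] = value
        if changed:
            # Step 2: every watcher of this key is notified of the change.
            for callback in self._watchers.get(key, []):
                callback(key, value)


class Agent:
    """One agent per switch (the 1:1 model)."""

    def __init__(self, switch: str, apply_config: Callable[[dict], None]) -> None:
        self.switch = switch
        self.apply_config = apply_config  # stands in for "run Ansible"

    def start(self, db: ConfigDB) -> None:
        db.watch(self.switch, self.on_change)

    def on_change(self, key: str, value: str) -> None:
        # Step 3: apply the new parameters to the local switch.
        self.apply_config(json.loads(value))


applied: List[dict] = []
db = ConfigDB()
Agent("SWITCH001", applied.append).start(db)
db.put("SWITCH001", json.dumps({"mtu": 9000}))
db.put("SWITCH001", json.dumps({"mtu": 9000}))  # unchanged value: nothing is re-applied
```

Because each agent reacts only to its own key, the work that a central Ansible server did for N switches is spread over N agents, and no operator has to generate an inventory or start a run by hand.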

  15. Agent Deployment Process
    PROVISION
    • SOME INITIAL SETUP
    • INSTALL Docker
    • DEPLOY AGENT
    ZTP SCRIPT
    • SETUP FOR SSH
    • PROVISION REQUEST

  16. MULTI VENDOR

  17. Problem
    ARISTA
    Cumulus Linux
    The way a config is applied differs between Cumulus Linux and ARISTA.

  18. Data Flow
    vendor agnostic / vendor specific
    Only the Ansible playbook should contain vendor-specific code.

  19. Ansible Tag
    Environment variable: nos={cumulus | arista}
    Cumulus Linux
      target: localhost
      tag: cumulus
    ARISTA
      target: localhost
      tag: arista
    - name: example-task1
      XXX:
        XXXARG: "example"
      tags: cumulus
    - name: example-task1
      XXX:
        XXXARG: "example"
      tags: arista
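
One way the `nos` environment variable could select the tagged tasks is sketched below. This is an assumption-laden illustration, not the agent's actual code: `build_ansible_command` and the playbook name `site.yml` are hypothetical, while `--inventory`, `--connection` and `--tags` are standard `ansible-playbook` options.

```python
import os
from typing import List, Mapping

SUPPORTED_NOS = ("cumulus", "arista")


def build_ansible_command(playbook: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Build an ansible-playbook command line for the local switch."""
    nos = env.get("nos", "")
    if nos not in SUPPORTED_NOS:
        raise ValueError(f"unsupported nos: {nos!r}")
    return [
        "ansible-playbook",
        "--inventory", "localhost,",  # the agent targets only its own switch
        "--connection", "local",
        "--tags", nos,                # run only the tasks tagged for this NOS
        playbook,
    ]


command = build_ansible_command("site.yml", {"nos": "arista"})
```

With this shape the agent code stays vendor agnostic: the only vendor-specific decision is the tag value, and everything else lives in the playbook.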

  21. Data Flow
    vendor agnostic / vendor specific
    How did we realize a vendor-agnostic config parameter DB?

  22. Config Parameter Sheet
    1. SWITCH
    • hostname, os-version, server-room, etc
    2. INTERFACE
    • mac, speed, mtu, ip, etc
    3. BGP
    • AS, neighbor, peer-group
    4. QOS
    • config for shaping
    5. ROUTEMAP
    • ingress/egress routemap
    6. PREFIXLIST
    • IPv4/IPv6 prefixlist

  23. Config Parameter Sheet
    1. SWITCH
    • hostname, os-version, server-room, etc
    2. INTERFACE
    • mac, speed, mtu, ip, etc
    3. BGP
    • AS, neighbor, peer-group
    4. QOS
    • config for shaping
    5. ROUTEMAP
    • ingress/egress routemap
    6. PREFIXLIST
    • IPv4/IPv6 prefixlist
    EX) ROUTEMAP PARAMETER SHEET
    {
      "routemap-001": {
        "entries": [
          {
            "action": "permit",
            "sequence": 10,
            "set_actions": [
              {
                "action": "as-path prepend",
                "value": "auto auto auto auto auto"
              }
            ]
          }
        ]
      }
    }

  24. Config Parameter Sheet
    KEY VALUE
    (JSON FORMAT)
    SWITCH001/SWITCH ...
    SWITCH001/INTERFACE ...
    SWITCH001/BGP ...
    SWITCH001/QOS ...
    SWITCH001/ROUTEMAP ...
    SWITCH001/PREFIXLIST ...
    ... ...
    SWITCH002/SWITCH ...
    SWITCH002/INTERFACE ...
    SWITCH002/BGP ...
    ... ...
    ・・・
    switch001
    ・・・
    24
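
The `<switch>/<sheet type>` key layout in the table above can be sketched with a few helpers. The function names are hypothetical, and a plain dict stands in for the key-value store; only the key format and the six sheet types come from the slides.

```python
import json
from typing import Dict, Tuple

SHEET_TYPES = ("SWITCH", "INTERFACE", "BGP", "QOS", "ROUTEMAP", "PREFIXLIST")


def sheet_key(switch: str, sheet_type: str) -> str:
    """Build a key such as 'SWITCH001/INTERFACE'."""
    if sheet_type not in SHEET_TYPES:
        raise ValueError(f"unknown sheet type: {sheet_type!r}")
    return f"{switch}/{sheet_type}"


def parse_key(key: str) -> Tuple[str, str]:
    """Split 'SWITCH001/INTERFACE' into ('SWITCH001', 'INTERFACE')."""
    switch, sheet_type = key.split("/", 1)
    return switch, sheet_type


def sheets_for(store: Dict[str, str], switch: str) -> Dict[str, dict]:
    """Collect every JSON sheet stored under one switch's key prefix."""
    prefix = f"{switch}/"
    return {
        parse_key(key)[1]: json.loads(value)
        for key, value in store.items()
        if key.startswith(prefix)
    }


store = {
    sheet_key("SWITCH001", "INTERFACE"): json.dumps({"mtu": 9000}),
    sheet_key("SWITCH001", "BGP"): json.dumps({"as": 65001}),
    sheet_key("SWITCH002", "INTERFACE"): json.dumps({"mtu": 1500}),
}
switch001_sheets = sheets_for(store, "SWITCH001")
```

Keeping the switch name as the key prefix means one agent only ever needs the keys under its own prefix.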

  25. SYNC-AGENT Handle Config Parameter Sheet
    KEY VALUE
    (JSON FORMAT)
    SWITCH001/SWITCH ...
    SWITCH001/INTERFACE ...
    SWITCH001/BGP ...
    SWITCH001/QOS ...
    SWITCH001/ROUTEMAP ...
    SWITCH001/PREFIXLIST ...
    ... ...
    SWITCH002/SWITCH ...
    SWITCH002/INTERFACE ...
    SWITCH002/BGP ...
    ... ...
    switch001
    ・・・
    watch

  26. SYNC-AGENT Handle Config Parameter Sheet
    KEY VALUE
    (JSON FORMAT)
    SWITCH001/SWITCH ...
    SWITCH001/INTERFACE ...
    SWITCH001/BGP ...
    SWITCH001/QOS ...
    SWITCH001/ROUTEMAP ...
    SWITCH001/PREFIXLIST ...
    ... ...
    SWITCH002/SWITCH ...
    SWITCH002/INTERFACE ...
    SWITCH002/BGP ...
    ... ...
    switch001
    watch
    26
    SWITCH
    INTERFACE
    ROUTEMAP PREFIXLIST
    BGP
    QOS

  27. SYNC-AGENT Handle Config Parameter Sheet
    KEY VALUE
    (JSON FORMAT)
    SWITCH001/SWITCH ...
    SWITCH001/INTERFACE ...
    SWITCH001/BGP ...
    SWITCH001/QOS ...
    SWITCH001/ROUTEMAP ...
    SWITCH001/PREFIXLIST ...
    ... ...
    switch001
    watch
    27
    SWITCH
    INTERFACE
    ROUTEMAP PREFIXLIST
    BGP
    QOS
    0) SWITCH001/INTERFACE is updated.
    1) The sync-agent detects the change and gets the INTERFACE config param sheet.
    2) The sync-agent updates the switch config (runs Ansible).

  30. SYNC-AGENT Handle Config Parameter Sheet
    switch001
    30
    30
    SWITCH
    INTERFACE
    ROUTEMAP PREFIXLIST
    BGP
    QOS
    playbook
    - name: include config params
      include_vars:
        dir: "CFG_PARAM_PATH"
    ・・・
    CFG_PARAM_PATH/XXX.json
    include_vars imports every config parameter sheet as Ansible vars.
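
A minimal sketch of how the sheets might be laid out on disk for `include_vars` follows. `write_param_sheets` is a hypothetical helper, and wrapping each sheet under a lower-case top-level key is an assumption made here so that variables from different sheets cannot collide; the slides only state that the files live under `CFG_PARAM_PATH` as JSON.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def write_param_sheets(sheets: Dict[str, dict], cfg_param_path: Path) -> List[Path]:
    """Write one JSON file per sheet type into the directory include_vars reads."""
    cfg_param_path.mkdir(parents=True, exist_ok=True)
    written = []
    for sheet_type, params in sorted(sheets.items()):
        target = cfg_param_path / f"{sheet_type}.json"
        # Assumption: each sheet becomes one top-level Ansible variable.
        target.write_text(json.dumps({sheet_type.lower(): params}, indent=2))
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = write_param_sheets(
        {"INTERFACE": {"mtu": 9000}, "BGP": {"as": 65001}},
        Path(tmp) / "cfg_params",
    )
    names = [path.name for path in paths]
    interface_vars = json.loads(paths[1].read_text())
```

Since `include_vars` with `dir` loads every JSON or YAML file in the directory, the playbook does not need to know which sheet types exist.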

  31. Vendor Agnostic?
    • Operations are ensured
      • on Arista and Cumulus
      • on LINE’s network
    • The schema needs to change
      • when we introduce switches from a new vendor
      • when we change our network architecture drastically
    SWITCH INTERFACE BGP
    QOS
    ROUTEMAP
    PREFIXLIST

  32. Yang Schema
    RFC7951: JSON Encoding of Data Modeled with YANG
    YANG (schema) defines JSON (parameter sheet)
    EXAMPLE SCHEMA
    module interface {
      import ietf-inet-types {
        prefix "inet";
      }
      import ietf-yang-types {
        prefix "yang";
      }
      ...
      leaf mac_address {
        type yang:mac-address;
      }
      ...
      leaf-list ipv4 {
        type inet:ipv4-prefix;
        min-elements 0;
      }
      ...

  33. Schema Driven Development
    YANG (input) → JSON Schema (output)
    Pyang generates the JSON schema!

  34. Schema Driven Development
    CONFIG-PARAMETER CHANGE PROCESS
    1. Update the schemas in YANG.
    2. Generate JSON schemas from the YANG schemas.
    3. Deploy the generated JSON schemas to the API server.
    The API server always validates the data just before updating etcd.
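
The validate-before-write rule can be illustrated as below. This is a simplified stand-in, not the API server's implementation: instead of a generated JSON schema it hand-codes checks for the two leaves of the example `interface` module (`mac_address`, `ipv4`), and a plain dict stands in for etcd. The function names are hypothetical.

```python
import ipaddress
import re
from typing import Dict, List

# Colon-separated hex octets, the lexical form of yang:mac-address.
MAC_ADDRESS = re.compile(r"[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}")


def validate_interface(sheet: dict) -> List[str]:
    """Return a list of validation errors; an empty list means the sheet is valid."""
    errors = []
    mac = sheet.get("mac_address")
    if mac is not None and not (isinstance(mac, str) and MAC_ADDRESS.fullmatch(mac)):
        errors.append(f"mac_address: {mac!r} is not a valid MAC address")
    for prefix in sheet.get("ipv4", []):
        try:
            if not isinstance(prefix, str) or "/" not in prefix:
                raise ValueError("missing prefix length")
            ipaddress.IPv4Interface(prefix)
        except ValueError:
            errors.append(f"ipv4: {prefix!r} is not a valid IPv4 prefix")
    return errors


def put_validated(store: Dict[str, dict], key: str, sheet: dict) -> None:
    """Write the sheet only if it passes validation (store stands in for etcd)."""
    errors = validate_interface(sheet)
    if errors:
        raise ValueError("; ".join(errors))
    store[key] = sheet


store: Dict[str, dict] = {}
put_validated(store, "SWITCH001/INTERFACE",
              {"mac_address": "00:11:22:33:44:55", "ipv4": ["192.0.2.1/24"]})
```

Rejecting bad data at the API server keeps it out of the store, so the agents watching the store never see a sheet that violates the schema.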

  35. {
    ...
    "hostname": "SWITCH00X",
    "network_os": "cumulus",
    ...
    }
    DHCP Option
    Cumulus Linux
    ARISTA
    ...
    "mac": "xxxx.xxxx.xxxx",
    - "ip": "X.X.X.X/X"
    - "ztp-script option code": "XX",
    ...
    {
    ...
    "name": "eth0",
    "type": "management",
    "mac_address": "xxxx.xxxx.xxxx",
    "ip": ["X.X.X.X/X"],
    ...
    }
    config parameter sheet
    dhcpd.conf
    request
    response
    update
    dhcpd.conf

  36. Operator Trigger
    Config Update Process
    0) The agent watches the DB.
    1) An operator updates the DB.
    2) The agent detects the change.
    3) The agent updates the config (runs Ansible).

  37. Problem
    ToR SWITCH
    SERVER
    BGP SESSION
    BGPD RUNNING

  38. Problem
    ToR SWITCH
    SERVER
    whitelist to filter unwanted prefixes
    BGPD RUNNING
    unwanted prefix

  39. Problem
    We need to identify the switch that a server is connected to.

  40. Connection Trigger
    CONFIG PARAM DB (server-config parameter)
    KEY VALUE
    ... ...
    SERVER001 {
    "hostname": "SERVER001",
    "ipv4_prefixes": ["192.0.2.0/24"],
    "ipv6_prefixes": []
    }
    ... ...
    1) Store the prefixes for SERVER001 in the config param DB.
    2) The server connects to a switch.
    3) The sync-agent detects the connection of SERVER001 via LLDP.
    4) The sync-agent gets the prefixes for SERVER001.
    5) The sync-agent updates the whitelist (runs Ansible).
    6) The sync-agent watches SERVER001.
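
The connection-triggered flow can be sketched as an event handler on the sync-agent. Everything here is illustrative: the `SyncAgent` class, its method names and the dict-based DB are assumptions, and the real agent would learn the neighbor's hostname from LLDP and apply the whitelist through Ansible.

```python
import json
from typing import Dict, List, Set


class SyncAgent:
    def __init__(self, db: Dict[str, str]) -> None:
        self.db = db                    # stand-in for the config param DB
        self.whitelist: List[str] = []  # prefixes accepted from connected servers
        self.watched: Set[str] = set()  # server keys this agent keeps watching
        self.applied: List[List[str]] = []

    def on_lldp_neighbor(self, hostname: str) -> None:
        """Steps 3 to 6: called when LLDP reports a newly connected server."""
        raw = self.db.get(hostname)
        if raw is None:
            return  # unknown server: leave the whitelist untouched
        entry = json.loads(raw)  # step 4: get the prefixes for this server
        for prefix in entry["ipv4_prefixes"] + entry["ipv6_prefixes"]:
            if prefix not in self.whitelist:
                self.whitelist.append(prefix)
        self.apply_whitelist()      # step 5
        self.watched.add(hostname)  # step 6

    def apply_whitelist(self) -> None:
        # Stand-in for the Ansible run that rewrites the prefix-list.
        self.applied.append(list(self.whitelist))


db = {
    "SERVER001": json.dumps(
        {"hostname": "SERVER001", "ipv4_prefixes": ["192.0.2.0/24"], "ipv6_prefixes": []}
    )
}
agent = SyncAgent(db)
agent.on_lldp_neighbor("SERVER001")
```

The operator stores the prefixes once, keyed by server, and never has to know which ToR switch the server ends up on.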

  46. BATCH CHANGE

  47. Problem
    Automation is good but …
    Batch change is dangerous

  48. Grouping
    We want to apply a config to multiple switches.
    GROUP

  49. Group Config Parameter Sheet
    KEY VALUE
    SW001/SWITCH {
    "hostname": "SW001",
    "switch_groups": ["GRP-A"],

    }
    ... ...
    /SWGRP/GRP-A {
    "ipv4_prefixes": [{"action": "deny","prefix": "192.0.2.0/24"}],

    }
    SW001
    config-param
    type=group
    Introduce a new config-param type: group.
    Multiple switches watch the entry, but batch change is dangerous.
    watch x 3
    watch x 1

  50. Sync-Group Config Parameter Sheet
    SW001
    KEY VALUE
    SW001/SWITCH ...
    ... ...
    /SWGRP/GRP-A ...
    ... ...
    /SWGRP/GRP-A/SYNC_GRP …
    ... ...
    config-param
    type=sync-group
    Introduce a new config-param type: group.
    Also introduce a new config-param type: sync-group.
    Multiple switches watch the sync-group entry.

  51. Sync-Group Config Parameter Sheet
    group-sync-state-machine config parameter (JSON) = state-machine × switches in the group
    STATE
    • DONE
    • NOT-YET
    • SYNC

  52. Group Sync
    0) Every switch’s state is DONE.
    1) An operator updates SWG-A’s config param.
    2) The API server sets every switch’s state to NOT-YET.
    3) The API server sets SW001’s state to SYNC.
    4) SW001’s sync-agent fetches SWG-A’s config param.
    5) The sync-agent updates SW001’s config.
    6) The sync-agent sets SW001’s state to DONE.
    7) The operator sets the other switches to SYNC.
    8) Their sync-agents fetch SWG-A’s config param.
    9) The sync-agents update their switches’ configs.
    10) The sync-agents set both remaining states to DONE.
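
The staged rollout above can be modelled as one small state machine per switch. The sketch below is an illustration under assumptions: `SyncGroup`, its method names and the `first` argument (which switch goes first) are hypothetical, since the slides do not say how SW001 is chosen.

```python
from enum import Enum
from typing import Callable, Dict, Iterable


class State(Enum):
    DONE = "DONE"
    NOT_YET = "NOT-YET"
    SYNC = "SYNC"


class SyncGroup:
    """Per-switch sync states for one switch group."""

    def __init__(self, switches: Iterable[str]) -> None:
        self.states: Dict[str, State] = {sw: State.DONE for sw in switches}  # step 0
        self.param: dict = {}

    def update_param(self, param: dict, first: str) -> None:
        # Steps 1 to 3: store the new param, hold everyone back, release one switch.
        self.param = param
        for switch in self.states:
            self.states[switch] = State.NOT_YET
        self.states[first] = State.SYNC

    def release(self, switches: Iterable[str]) -> None:
        # Step 7: the operator lets further switches proceed.
        for switch in switches:
            if self.states[switch] is State.NOT_YET:
                self.states[switch] = State.SYNC

    def agent_tick(self, switch: str, apply: Callable[[dict], None]) -> bool:
        # Steps 4 to 6 and 8 to 10: an agent acts only while its state is SYNC.
        if self.states[switch] is not State.SYNC:
            return False
        apply(self.param)
        self.states[switch] = State.DONE
        return True


group = SyncGroup(["SW001", "SW002", "SW003"])
group.update_param({"ipv4_prefixes": [{"action": "deny", "prefix": "192.0.2.0/24"}]}, first="SW001")
updated = [sw for sw in group.states if group.agent_tick(sw, lambda param: None)]
```

At this point only SW001 has applied the change; the other switches stay at NOT-YET until `release` is called, which is what limits the blast radius of a bad group config.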

  58. How To Join Group
    The switch_groups property in the SWITCH parameter sheet lists the groups a switch belongs to.

  59. How To Join Group
    1) An operator updates SW004’s switch_groups.
    2) The sync-agent fetches the SWG-A param.
    3) The sync-agent updates the switch config.
    4) The sync-agent adds SW004’s state machine.
    5) The sync-agent watches the sync-group parameter.

  63. HUMAN ERROR

  64. One Command Operation
    • Ex) DEVICE ISOLATION
    $ device-isolation isolate SW-001 --level 1
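
The shape of such a one-command tool can be sketched with `argparse`. Only the command line itself comes from the slide; the parser below, the argument name `device`, and the choice to parse `--level` as an integer are assumptions, and the slides do not say what a level means.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for a command shaped like: device-isolation isolate SW-001 --level 1"""
    parser = argparse.ArgumentParser(prog="device-isolation")
    actions = parser.add_subparsers(dest="action", required=True)
    isolate = actions.add_parser("isolate", help="isolate one device")
    isolate.add_argument("device", help="name of the device, e.g. SW-001")
    isolate.add_argument("--level", type=int, help="isolation level (meaning not given in the slides)")
    return parser


args = build_parser().parse_args(["isolate", "SW-001", "--level", "1"])
```

Wrapping a multi-step procedure in a single validated command leaves fewer places for an operator to mistype or skip a step.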

  65. Monitoring
    • Every application we develop includes a Prometheus exporter function.
    • Service discovery by Consul
    • Slack notification

  66. CONCLUSION

  67. LINE’s Network Orchestrator
    68
    1) SCALABILITY
    2) MULTIVENDOR
    3) TRIGGER
    4) BATCH CHANGE
    5) HUMAN ERROR

  68. Current
    2020.May - 2021.Feb: DEVELOPMENT
    2021.Feb - NOW: MAINTENANCE
    Running in the production environment since 2021.Feb
    2,000+ SWITCHES

  69. Future Work
    • Rollback feature
    • Dry-Run feature
    • Introduce k8s CR
    2020.May 2021.Feb
    DEVELOPMENT MAINTENANCE
    NOW

  70. DISCUSSION