
How We Renewed LINE’s Network Orchestrator / LINE’s new Network Orchestrator

Slides from the JANOG49 Meeting.
https://www.janog.gr.jp/meeting/janog49/lineorc/
Speaker: Subaru Fukuda (福田 守昴)
Network Development Team, Verda Platform Office

LINE Developers

January 27, 2022

Transcript

  1. How We Renewed LINE’s Network Orchestrator. LINE Corporation, Subaru Fukuda

  2. About Me: Subaru Fukuda
    • @LINE: Network Orchestrator development, white-box NOS development, Telecom Infra Project
    • IoT gateway firmware development, IoT protocol stack development
    • Enterprise NOS test and release engineering, test automation system development
    • Periods: 2016.Apr - 2018.Apr, 2018.Mar - 2019.Sep, 2019.Oct - 2020.Oct, 2020.Nov - NOW
  3. What is Verda? 80,000+ virtual machines, 40,000+ bare-metal servers, 6,000+ hypervisors.
    Services: NAT, load balancer, VM / bare metal, MySQL, Elasticsearch, image repository, shared filesystem, DNS, app engine (like Heroku), controller, and more.
  4. Underlay Network: see "How We Redesigned LINE’s Network from Scratch" @JANOG43 <https://www.janog.gr.jp/meeting/janog43/program/line/>

  5. Multi-Component Architecture

  6. Problems: 1) SCALABILITY 2) MULTIVENDOR 3) TRIGGER 4) BATCH CHANGE 5) HUMAN ERROR
  7. SCALABILITY

  8. Problem: Config Update Process ① Update the database ② Create the inventory ③ Apply the config (run Ansible)
  9. Problems
    • Heavy load on the Ansible server
    • Runs take a long time
    • Manual operations are required: • to update the database • to generate the inventory • to run Ansible
  10. Agent Application: from 1:N (one Ansible server for many switches) to 1:1 (one agent per switch)

  11. Agent Sync Config: Config Update Process
    0) The agent watches the DB. 1) An operator updates the DB. 2) The agent detects the change. 3) The agent updates the config (runs Ansible).
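A minimal Python sketch of the watch, detect, and apply loop above. Assumptions not taken from the slides: the DB is an in-memory dict (the talk later mentions etcd, whose watch API pushes events rather than being polled), revisions are a simple counter, and `apply_config` is a hypothetical callback standing in for the Ansible run.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class FakeConfigDB:
    """Hypothetical in-memory stand-in for the config parameter DB."""
    data: Dict[str, Tuple[int, dict]] = field(default_factory=dict)
    revision: int = 0

    def put(self, key: str, value: dict) -> None:
        # 1) An operator updates the DB: every write bumps the global revision.
        self.revision += 1
        self.data[key] = (self.revision, value)


@dataclass
class Agent:
    db: FakeConfigDB
    prefix: str  # keys this switch owns, e.g. "SWITCH001/"
    apply_config: Callable[[str, dict], None]  # stand-in for running Ansible
    seen: Dict[str, int] = field(default_factory=dict)

    def poll_once(self) -> int:
        """2) Detect keys whose revision changed, 3) apply them. Returns the count applied."""
        applied = 0
        for key, (rev, value) in sorted(self.db.data.items()):
            if key.startswith(self.prefix) and self.seen.get(key) != rev:
                self.apply_config(key, value)
                self.seen[key] = rev
                applied += 1
        return applied
```

Each agent only reacts to its own keys, so an update for SWITCH002 never triggers an Ansible run on SWITCH001.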
  15. Agent Deployment Process (PROVISION): a ZTP script performs
    • some initial setup • Docker installation • agent deployment • SSH setup • a provision request
  16. MULTIVENDOR

  17. Problem: Arista vs. Cumulus Linux. The way config is applied differs between Cumulus and Arista.
  18. Data Flow: vendor-agnostic parts vs. vendor-specific parts. Only the Ansible playbook should contain vendor-specific code.
  19. Ansible Tag: on both Cumulus Linux and Arista the playbook targets localhost, and vendor-specific tasks are selected by tag (cumulus or arista). The environment variable nos={cumulus | arista} identifies the switch OS.

        - name: example-task1
          XXX:
            XXXARG: "example"
          tags: cumulus
        - name: example-task1
          XXX:
            XXXARG: "example"
          tags: arista
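A sketch of how an agent could turn the `nos` environment variable into an `ansible-playbook` invocation. That the agent passes `nos` straight to `--tags` is an assumption; the slide only shows the tags and the variable. `build_playbook_command` and `site.yml` are hypothetical names.

```python
import os
from typing import List, Mapping, Optional

SUPPORTED_NOS = {"cumulus", "arista"}


def build_playbook_command(playbook: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build (but do not run) the ansible-playbook argv for the local switch."""
    env = os.environ if env is None else env
    nos = env.get("nos", "")
    if nos not in SUPPORTED_NOS:
        raise ValueError(f"unsupported nos={nos!r}; expected one of {sorted(SUPPORTED_NOS)}")
    return [
        "ansible-playbook",
        "-i", "localhost,",  # inline inventory containing only this switch
        "-c", "local",       # the agent runs on the switch itself
        "--tags", nos,       # run only the tasks tagged for this NOS
        playbook,
    ]
```

One caveat worth checking with this approach: when `--tags` is given, Ansible skips untagged tasks, so vendor-agnostic tasks would need to carry both tags or the special `always` tag.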
  21. Data Flow: vendor-agnostic vs. vendor-specific. How did we build a vendor-agnostic config parameter DB?
  22. Config Parameter Sheet
    1. SWITCH • hostname, os-version, server-room, etc.
    2. INTERFACE • mac, speed, mtu, ip, etc.
    3. BGP • AS, neighbor, peer-group
    4. QOS • config for shaping
    5. ROUTEMAP • ingress/egress route maps
    6. PREFIXLIST • IPv4/IPv6 prefix lists
  23. Config Parameter Sheet, EX) ROUTEMAP parameter sheet:

        {
          "routemap-001": {
            "entries": [
              {
                "action": "permit",
                "sequence": 10,
                "set_actions": [
                  { "action": "as-path prepend", "value": "auto auto auto auto auto" }
                ]
              }
            ]
          }
          …
        }
  24. Config Parameter Sheet: stored in a key/value store, one JSON value per switch and sheet.

        KEY                    VALUE (JSON FORMAT)
        SWITCH001/SWITCH       ...
        SWITCH001/INTERFACE    ...
        SWITCH001/BGP          ...
        SWITCH001/QOS          ...
        SWITCH001/ROUTEMAP     ...
        SWITCH001/PREFIXLIST   ...
        SWITCH002/SWITCH       ...
        SWITCH002/INTERFACE    ...
        SWITCH002/BGP          ...
        ...                    ...
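A sketch of the `<SWITCH>/<SHEET>` key layout above. The six sheet names come from the slides; `make_key`, `parse_key`, and `sheets_for` are hypothetical helpers, and a plain dict stands in for the real key/value store.

```python
import json
from typing import Dict, Tuple

SHEETS = ("SWITCH", "INTERFACE", "BGP", "QOS", "ROUTEMAP", "PREFIXLIST")


def make_key(switch: str, sheet: str) -> str:
    if sheet not in SHEETS:
        raise ValueError(f"unknown sheet {sheet!r}")
    return f"{switch}/{sheet}"


def parse_key(key: str) -> Tuple[str, str]:
    switch, sep, sheet = key.partition("/")
    if not sep or not switch or sheet not in SHEETS:
        raise ValueError(f"not a parameter-sheet key: {key!r}")
    return switch, sheet


def sheets_for(store: Dict[str, str], switch: str) -> Dict[str, dict]:
    """Collect and decode every sheet stored for one switch."""
    out = {}
    for key, raw in store.items():
        try:
            sw, sheet = parse_key(key)
        except ValueError:
            continue  # skip keys that are not <SWITCH>/<SHEET>
        if sw == switch:
            out[sheet] = json.loads(raw)
    return out
```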
  25. SYNC-AGENT: Handling the Config Parameter Sheets. The sync-agent on switch001 watches the same key/value store.
  26. SYNC-AGENT: the sync-agent on switch001 watches its six sheets: SWITCH, INTERFACE, ROUTEMAP, PREFIXLIST, BGP, QOS.
  27. SYNC-AGENT: Handling the Config Parameter Sheets (switch001 watches only the SWITCH001/* keys)
    0) SWITCH001/INTERFACE is updated. 1) The sync-agent detects the change and gets the INTERFACE config parameter sheet. 2) The sync-agent updates the switch config (runs Ansible).
  30. SYNC-AGENT: Handling the Config Parameter Sheets. In the playbook, include_vars imports every config parameter sheet (CFG_PARAM_PATH/XXX.json) as Ansible vars:

        - name: include config params
          include_vars:
            dir: "CFG_PARAM_PATH"
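A sketch of the step between fetching sheets and running the playbook: writing each sheet to `CFG_PARAM_PATH/<SHEET>.json` so that `include_vars` with `dir` picks it up (by default it loads `.json`, `.yml`, and `.yaml` files). The file naming, the atomic rename, and wrapping each sheet under a lower-case top-level key (so one sheet's variables cannot overwrite another's) are assumptions, not details from the talk; `write_param_sheets` is a hypothetical name.

```python
import json
import os
import tempfile
from typing import Dict, List


def write_param_sheets(cfg_param_path: str, sheets: Dict[str, dict]) -> List[str]:
    """Write each sheet as <SHEET>.json under cfg_param_path; return sorted paths."""
    os.makedirs(cfg_param_path, exist_ok=True)
    written = []
    for sheet, value in sheets.items():
        path = os.path.join(cfg_param_path, f"{sheet}.json")
        # Write to a temp file, then rename, so Ansible never reads a half-written file.
        # The ".tmp" suffix is also ignored by include_vars' default extensions.
        fd, tmp = tempfile.mkstemp(dir=cfg_param_path, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump({sheet.lower(): value}, f, indent=2)
        os.replace(tmp, path)
        written.append(path)
    return sorted(written)
```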
  31. Vendor Agnostic?
    • Operation is ensured only: • on Arista and Cumulus • on LINE’s network
    • The schema needs to change: • when we introduce switches from a new vendor • when we change our network architecture drastically
  32. YANG Schema (RFC 7951: JSON Encoding of Data Modeled with YANG). Each parameter sheet (JSON) is defined by a YANG schema. Example schema:

        module interface {
          import ietf-inet-types { prefix "inet"; }
          import ietf-yang-types { prefix "yang"; }
          ...
          leaf mac_address { type yang:mac-address; }
          ...
          leaf-list ipv4 { type inet:ipv4-prefix; min-elements 0; }
          ...
        }
  33. Schema-Driven Development: pyang takes the YANG schemas as input and generates JSON Schemas as output.
  34. Schema-Driven Development: CONFIG-PARAMETER CHANGE PROCESS
    1. Update the schemas in YANG. 2. Generate JSON Schemas from the YANG schemas. 3. Deploy the generated JSON Schemas to the API server.
    The API server validates the data just before updating etcd.
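The API server validates data just before writing to etcd. The real system validates against JSON Schemas generated by pyang; the sketch below instead hand-codes a stdlib-only check of the `leaf-list ipv4` (type `inet:ipv4-prefix`, `min-elements 0`) from the example YANG schema, only to show where validation sits. `validate_interface` and `put_sheet` are hypothetical, and a dict stands in for etcd.

```python
import ipaddress
from typing import Dict, List


def _is_ipv4_prefix(value) -> bool:
    # inet:ipv4-prefix needs an explicit "/len"; IPv4Interface alone would default to /32.
    if not isinstance(value, str) or "/" not in value:
        return False
    try:
        ipaddress.IPv4Interface(value)
    except ValueError:  # AddressValueError and NetmaskValueError subclass ValueError
        return False
    return True


def validate_interface(sheet: dict) -> List[str]:
    """Return a list of error messages; empty means the sheet passed."""
    ipv4 = sheet.get("ipv4", [])  # leaf-list with min-elements 0
    if not isinstance(ipv4, list):
        return ["ipv4: must be a list"]
    return [
        f"ipv4[{i}]: invalid IPv4 prefix {p!r}"
        for i, p in enumerate(ipv4)
        if not _is_ipv4_prefix(p)
    ]


def put_sheet(store: Dict[str, dict], key: str, sheet: dict) -> None:
    """Validate first; only a valid sheet reaches the store (stand-in for etcd)."""
    errors = validate_interface(sheet)
    if errors:
        raise ValueError("; ".join(errors))
    store[key] = sheet
```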
  35. DHCP Option: dhcpd.conf is updated from the config parameter sheets, and Cumulus Linux and Arista switches get their DHCP responses from it.
    • SWITCH sheet: { ... "hostname": "SWITCH00X", "network_os": "cumulus", ... }
    • Management interface entry: { ... "name": "eth0", "type": "management", "mac_address": "xxxx.xxxx.xxxx", "ip": ["X.X.X.X/X"], ... }
    • dhcpd.conf entry: "mac": "xxxx.xxxx.xxxx", "ip": "X.X.X.X/X", "ztp-script option code": "XX", ...
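A sketch of rendering an ISC dhcpd `host` declaration from a management-interface entry like the one above. The MAC is converted from `xxxx.xxxx.xxxx` to colon notation because dhcpd's `hardware ethernet` expects colon-separated octets. The ZTP option code is elided on the slide ("XX"), so `render_host` takes the option name and value as hypothetical caller-supplied arguments; such an option would also need a matching declaration elsewhere in `dhcpd.conf`.

```python
import ipaddress
from typing import Optional


def dotted_to_colon_mac(mac: str) -> str:
    """Convert 'aabb.ccdd.eeff' to 'aa:bb:cc:dd:ee:ff'."""
    digits = mac.replace(".", "").lower()
    if len(digits) != 12 or not set(digits) <= set("0123456789abcdef"):
        raise ValueError(f"bad MAC {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def render_host(hostname: str, iface: dict,
                ztp_option: Optional[str] = None,
                ztp_value: Optional[str] = None) -> str:
    # "ip" is a list of "a.b.c.d/len"; dhcpd's fixed-address wants the bare address.
    addr = ipaddress.IPv4Interface(iface["ip"][0]).ip
    lines = [
        f"host {hostname} {{",
        f"  hardware ethernet {dotted_to_colon_mac(iface['mac_address'])};",
        f"  fixed-address {addr};",
    ]
    if ztp_option and ztp_value:
        lines.append(f'  option {ztp_option} "{ztp_value}";')
    lines.append("}")
    return "\n".join(lines)
```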
  36. TRIGGER

  37. Operator Trigger: Config Update Process
    0) The agent watches the DB. 1) An operator updates the DB. 2) The agent detects the change. 3) The agent updates the config (runs Ansible).
  38. Problem: a BGP session runs between the ToR switch and a server, which runs bgpd.

  39. Problem: the ToR switch needs a whitelist to filter unwanted prefixes advertised by the server’s bgpd.
  40. Problem: we need to identify which switch a server is connected to.
  41. Connection Trigger: server-config parameter in the config param DB

        KEY         VALUE
        SERVER001   { "hostname": "SERVER001", "ipv4_prefixes": ["192.0.2.0/24"], "ipv6_prefixes": [] }

    1) Store the prefixes for SERVER001 in the config param DB. 2) SERVER001 connects to a switch. 3) The sync-agent detects the connection of SERVER001 by LLDP. 4) The sync-agent gets the prefixes for SERVER001. 5) The sync-agent updates the whitelist (runs Ansible). 6) The sync-agent watches SERVER001.
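A sketch of steps 3 to 5: when LLDP reports a neighbor's system name, look up that server's entry and produce the prefixes to add to the whitelist. The DB layout follows the slide; `whitelist_for_neighbor`, matching on the LLDP system name, and returning normalized strings are assumptions. `ipaddress.ip_network` rejects malformed prefixes (and prefixes with host bits set) before anything reaches the switch.

```python
import ipaddress
from typing import Dict, List


def whitelist_for_neighbor(db: Dict[str, dict], lldp_system_name: str) -> List[str]:
    """Return normalized IPv4/IPv6 prefixes to whitelist for an LLDP neighbor."""
    entry = db.get(lldp_system_name)
    if entry is None:
        return []  # unknown neighbor: add nothing to the whitelist
    prefixes = entry.get("ipv4_prefixes", []) + entry.get("ipv6_prefixes", [])
    return [str(ipaddress.ip_network(p)) for p in prefixes]
```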
  47. BATCH CHANGE

  48. Problem: automation is good, but batch changes are dangerous.
  49. Grouping: we want to apply a config to multiple switches as a GROUP.
  50. Group Config Parameter Sheet: we introduce a new config param, type=group.

        KEY            VALUE
        SW001/SWITCH   { "hostname": "SW001", "switch_groups": ["GRP-A"], … }
        /SWGRP/GRP-A   { "ipv4_prefixes": [{"action": "deny", "prefix": "192.0.2.0/24"}], … }

    Each switch watches its own entry (x1), and multiple switches (x3 in the example) watch the group entry, but ... batch changes are dangerous.
  51. Sync-Group Config Parameter Sheet: in addition to type=group, we introduce a new config param, type=sync-group. Multiple switches watch the sync-group entry.

        KEY                     VALUE
        SW001/SWITCH            ...
        /SWGRP/GRP-A            ...
        /SWGRP/GRP-A/SYNC_GRP   …
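A sketch of why a single write to a group entry fans out: each switch's `switch_groups` determines which `/SWGRP/<group>` keys its sync-agent watches. Only the key shape and the `switch_groups` property come from the slides; `group_keys` and `watchers` are hypothetical helpers.

```python
from typing import Dict, List


def group_keys(switch_sheet: dict) -> List[str]:
    """Group entries a switch's sync-agent watches, from its SWITCH sheet."""
    return [f"/SWGRP/{g}" for g in switch_sheet.get("switch_groups", [])]


def watchers(switch_sheets: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each group key to the (sorted) switches that watch it."""
    result: Dict[str, List[str]] = {}
    for name in sorted(switch_sheets):
        for key in group_keys(switch_sheets[name]):
            result.setdefault(key, []).append(name)
    return result
```

With three switches in GRP-A, one update to `/SWGRP/GRP-A` reaches all three at once, which is exactly the batch-change risk the sync-group entry is introduced to control.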
  52. Sync-Group Config Parameter Sheet: the value is JSON holding a group-sync state machine for each switch in the group. States: • DONE • NOT-YET • SYNC
  53. Group Sync
    0) Every switch’s state is DONE. 1) An operator updates SWG-A’s config param. 2) The API-SERVER sets every switch’s state to NOT-YET. 3) The API-SERVER sets SW001’s state to SYNC. 4) The sync-agent fetches SWG-A’s config param. 5) The sync-agent updates SW001’s config. 6) The sync-agent sets SW001’s state to DONE. 7) The operator sets the other switches to SYNC. 8) Their sync-agents fetch SWG-A’s config param. 9) Each sync-agent updates its switch’s config. 10) The sync-agents set both states to DONE.
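A sketch of the group-sync state machine from slides 52 and 53. The states and the canary-first ordering come from the slides; the class, its methods, and the rule that a new sync cannot start until every switch is DONE are assumptions.

```python
from enum import Enum
from typing import Dict, Iterable


class State(Enum):
    DONE = "DONE"
    NOT_YET = "NOT-YET"
    SYNC = "SYNC"


class GroupSync:
    def __init__(self, switches: Iterable[str]):
        # Step 0: every switch starts in DONE.
        self.states: Dict[str, State] = {sw: State.DONE for sw in switches}

    def start(self, canary: str) -> None:
        """Steps 2-3: hold every switch, then release only the canary."""
        if canary not in self.states:
            raise KeyError(canary)
        if any(s is not State.DONE for s in self.states.values()):
            raise RuntimeError("previous group sync still in progress")
        for sw in self.states:
            self.states[sw] = State.NOT_YET
        self.states[canary] = State.SYNC

    def should_sync(self, switch: str) -> bool:
        """A sync-agent applies the group config only while its state is SYNC."""
        return self.states[switch] is State.SYNC

    def mark_done(self, switch: str) -> None:
        """Steps 6 and 10: the sync-agent reports success."""
        if not self.should_sync(switch):
            raise RuntimeError(f"{switch} is not in SYNC")
        self.states[switch] = State.DONE

    def release_rest(self) -> None:
        """Step 7: after the canary is verified, set the remaining switches to SYNC."""
        for sw, s in self.states.items():
            if s is State.NOT_YET:
                self.states[sw] = State.SYNC
```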
  59. How To Join a Group: the switch_groups property in the SWITCH parameter sheet lists the groups a switch belongs to.
  60. How To Join a Group
    1) An operator updates SW004’s switch_groups. 2) The sync-agent fetches the SWG-A parameters. 3) The sync-agent updates the switch config. 4) The sync-agent adds SW004’s state machine. 5) The sync-agent watches the sync-group parameter.
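A sketch of what the sync-agent could derive when `switch_groups` changes: for each newly joined group, the join steps above in order. The action names and `membership_actions` are hypothetical, and the `/SWGRP/<group>/SYNC_GRP` key shape is taken from slide 51 (which names the group GRP-A where these slides say SWG-A).

```python
from typing import List, Tuple


def membership_actions(old_groups: List[str], new_groups: List[str]) -> List[Tuple[str, str]]:
    """Ordered (action, key) pairs for every group the switch has just joined."""
    actions: List[Tuple[str, str]] = []
    for g in sorted(set(new_groups) - set(old_groups)):
        actions.append(("fetch", f"/SWGRP/{g}"))               # step 2: fetch group params
        actions.append(("apply", f"/SWGRP/{g}"))               # step 3: update switch config
        actions.append(("add-state", f"/SWGRP/{g}/SYNC_GRP"))  # step 4: add own state machine
        actions.append(("watch", f"/SWGRP/{g}/SYNC_GRP"))      # step 5: watch sync-group param
    return actions
```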
  64. HUMAN ERROR

  65. One-Command Operation
    • e.g. device isolation: $ device-isolation isolate SW-001 --level 1
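A sketch of the command-line shape of the one-command operation above, using `argparse`. Only `device-isolation isolate SW-001 --level 1` comes from the slide; what each `--level` means and what isolation actually does are not described, so this parser only validates and returns its input.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="device-isolation")
    sub = parser.add_subparsers(dest="command", required=True)
    isolate = sub.add_parser("isolate", help="isolate a network device")
    isolate.add_argument("device", help="device hostname, e.g. SW-001")
    # The meaning of each level is not given on the slide; only an integer is accepted here.
    isolate.add_argument("--level", type=int, help="isolation level")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```

Wrapping a multi-step procedure behind one validated command is what removes the room for operator mistakes that the HUMAN ERROR section targets.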
  66. Monitoring
    • Every application we develop includes a Prometheus exporter.
    • Service discovery by Consul
    • Slack notifications
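A stdlib-only sketch of the text an application's Prometheus exporter might serve on `/metrics`, in the Prometheus text exposition format. The metric name and labels are hypothetical; in practice the official `prometheus_client` library would usually generate this output.

```python
from typing import Dict, Tuple


def _escape(value: str) -> str:
    # Label values must escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(counters: Dict[Tuple[str, str], float]) -> str:
    """counters maps (switch, result) -> number of Ansible runs."""
    name = "sync_agent_ansible_runs_total"
    lines = [
        f"# HELP {name} Number of Ansible runs triggered by the sync-agent.",
        f"# TYPE {name} counter",
    ]
    for (switch, result), value in sorted(counters.items()):
        lines.append(f'{name}{{switch="{_escape(switch)}",result="{_escape(result)}"}} {value:g}')
    return "\n".join(lines) + "\n"
```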
  67. CONCLUSION

  68. LINE’s Network Orchestrator: 1) SCALABILITY 2) MULTIVENDOR 3) TRIGGER 4) BATCH CHANGE 5) HUMAN ERROR
  69. Current: development from 2020.May to 2021.Feb, maintenance from 2021.Feb to NOW. Running on the production environment since 2021.Feb on 2,000+ switches.
  70. Future Work
    • Rollback feature
    • Dry-run feature
    • Introduce k8s CRs (custom resources)
  71. DISCUSSION