
Saltconf 2017 - Orchestration with network devices: challenges and solutions

Mircea Ulinic
November 02, 2017


One of the major challenges in networking is the diversity of data representation, which is often vendor-specific. Vendor APIs are inconsistent and incomplete, some mainstream platforms are closed, and custom software is not allowed on the device.
By combining Salt proxy minions with third-party libraries such as NAPALM, which presents the data in a vendor-agnostic shape, we are able to apply DevOps methodologies to networking. NAPALM support is integrated in the official Salt releases, beginning with Carbon and improved in Nitrogen. Beyond cross-vendor configuration management, reacting to network events becomes easy and there are no orchestration boundaries.

SaltConf17, Salt Lake City, UT
http://saltconf.com/saltconf17-agenda/

PDF version: https://eventmobi.com/api/events/20058/documents/download/70dfc2fa-2215-468c-b3b4-882b63e76679.pdf/as/Saltconf_2017_Orchestration_with_network_devices_challenges_and_solutions_Mircea_Ulinic_Cloudflare.pdf


Transcript

  1. Mircea Ulinic

    • Network software engineer at Cloudflare
    • Previously research and teaching assistant at EPFL, Switzerland
    • Member and maintainer at NAPALM Automation
    • Integrated NAPALM in Salt
    • OpenConfig representative
    • https://mirceaulinic.net/ @mirceaulinic mirceaulinic
  2. Cloudflare

    • How big?
      ◦ 7+ million zones/domains
      ◦ Authoritative for ~40% of Alexa top 1 million
      ◦ 200 million Internet users served
      ◦ 86+ billion DNS queries/day
        ▪ Largest
        ▪ Fastest
        ▪ 35% of the Internet requests
      ◦ 10 trillion requests / month
      ◦ 10% of the Internet traffic
    • 120+ anycast locations globally
      ◦ 50 countries (and growing)
      ◦ Many hundreds of network devices
  3. Agenda

    • What is the Internet
    • What is a network device
    • NetDevOps challenges
    • Cross-platform API: NAPALM
    • Salt in NetDevOps
    • Event-driven network automation
  4. What is the Internet

    • Internet != web
    • A network of (computer/data) networks

    http://bit.ly/2p8Ipbn (CC BY 2.0)
  5. What is the Internet - What is a network

    Network devices that exchange data with each other, through a cabled or wireless connection.
  6. What is a network device - Data is key

    Information to retrieve and load:
    • Configuration, e.g., a list of NTP peers, BGP neighbors, firewall, OSPF interfaces etc.
    • Operational data (state), e.g., interfaces up?, BGP neighbors connected?, NTP peers synchronised? etc.
  7. What is a network device - Network vendor definition

    Company that produces hardware and software (traditionally able to run only on their hardware) of arguable quality. Mostly interested in selling products and proprietary solutions rather than useful tools for customers. Often they acknowledge bugs. Sometimes, they even fix them.
  8. What is a network device - Incompatibility examples: Configuration

    protocols {
        bgp {
            group 4-PUBLIC-ANYCAST-PEERS {
                neighbor 192.168.0.1 {
                    description "Amazon [WW HOSTING ANYCAST]";
                    family inet {
                        unicast {
                            prefix-limit {
                                maximum 500;
                            }
                        }
                        peer-as 16509;
                    }
                }
            }

    router bgp 13335
     neighbor 192.168.0.1
      remote-as 16509
      use neighbor-group 4-PUBLIC-ANYCAST-PEERS
      description "Amazon [WW HOSTING ANYCAST]"
      address-family ipv4 unicast
       maximum-prefix 500

    bgp
        group "4-PUBLIC-ANYCAST-PEERS"
            description "Amazon [WW HOSTING ANYCAST]"
            neighbor 192.168.0.1
                peer-as 16509
                max-prefix 500
            exit
        exit
    exit
  9. What is a network device - Incompatibility examples: Operational (1)

    [email protected]> show bgp neighbor 192.168.0.1
    Peer: 192.168.0.1 AS 16509 Local: 192.168.0.2 AS 13335
      Description: Amazon [WW HOSTING ANYCAST]
      Group: 4-PUBLIC-ANYCAST-PEERS Routing-Instance: master
      Forwarding routing-instance: master
      Type: External State: Established Flags: <Sync RSync>
      Last State: Idle Last Event: RecvKeepAlive
      Export: [ POLICY-OUT ] Import: [ POLICY-IN ]
      Address families configured: inet-unicast
      Holdtime: 90 Preference: 170
      Local AS: 13335 Local System AS: 13335
      Number of flaps: 0
      Peer ID: 192.168.0.1 Local ID: 172.17.17.1 Active Holdtime: 90
      Keepalive Interval: 30 Group index: 9 Peer index: 3
      Table inet.0 Bit: 20003
        RIB State: BGP restart is complete
        Send state: in sync
        Active prefixes: 279
        Received prefixes: 279
  10. What is a network device - Incompatibility examples: Operational (2)

    RP/0/RSP0/CPU0:edge01.flw01#show bgp neighbor 192.168.0.1 detail
    BGP neighbor is 192.168.0.1
     Remote AS 16509, local AS 13335, external link
     Description: Amazon [WW HOSTING ANYCAST]
     Remote router ID 192.168.0.1
     BGP state = Established, up for 2d08h
     Hold time is 30, keepalive interval is 10 seconds
     Configured hold time: 180, keepalive: 60, min acceptable hold time: 3
     Minimum time between advertisement runs is 30 secs
     For Address Family: IPv4 Unicast
     BGP neighbor version 674653686
     Update group: 0.15 Filter-group: 0.5 No Refresh request being processed
     Route refresh request: received 0, sent 0
     Policy for incoming advertisements is POLICY-IN
     Policy for outgoing advertisements is POLICY-OUT
     332 accepted prefixes, 205 are bestpaths
     Cumulative no. of prefixes denied: 0.
     Prefix advertised 203, suppressed 0, withdrawn 0
  11. What is a network device - Incompatibility examples: Operational (3)

    edge01.oua01#show ip bgp neighbors 192.168.0.1
    BGP neighbor is 192.168.0.1, remote AS 16509, external link
     Description: "Amazon [WW HOSTING ANYCAST]"
      BGP version 4, remote router ID 172.17.17.2, VRF default
      Inherits configuration from and member of peer-group 4-PUBLIC-ANYCAST-PEERS
      Negotiated BGP version 4
      Hold time is 90, keepalive interval is 30 seconds
      Configured hold time is 180, keepalive interval is 60 seconds
      BGP state is Established, up for 18d00h
      Number of transitions to established: 1
      Last state was OpenConfirm
      Neighbor Capabilities:
        Multiprotocol IPv4 Unicast: advertised and received and negotiated
        Four Octet ASN: advertised and received
        Route Refresh: advertised and received and negotiated
      Inbound route map is POLICY-IN
      Outbound route map is POLICY-OUT
    Local AS is 13335, local router ID 162.158.232.1
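The three outputs above describe the same kind of BGP session, yet each one needs its own parser. A minimal Python sketch of what that costs, using only lines quoted on the slides. The regular expressions and the names `PARSERS`, `SAMPLES` and `bgp_state` are illustrative assumptions, not part of NAPALM or Salt, and the third platform is left unnamed because the slide does not name it.

```python
import re

# One hand-written pattern per platform (assumed patterns, written for this sketch).
PARSERS = {
    "junos": re.compile(r"State:\s+(?P<state>\w+)"),
    "iosxr": re.compile(r"BGP state = (?P<state>\w+)"),
    "third_platform": re.compile(r"BGP state is (?P<state>\w+)"),
}

# Sample lines copied from the three operational outputs on the slides.
SAMPLES = {
    "junos": "Type: External State: Established Flags: <Sync RSync>",
    "iosxr": "BGP state = Established, up for 2d08h",
    "third_platform": "BGP state is Established, up for 18d00h",
}


def bgp_state(platform, text):
    """Return the BGP session state found in vendor-specific CLI text, or None."""
    match = PARSERS[platform].search(text)
    return match.group("state") if match else None


# Three different parsers are needed to obtain one common value.
states = {platform: bgp_state(platform, text) for platform, text in SAMPLES.items()}
print(states)
```

Every additional field (uptime, prefixes, hold time) would need another pattern per platform, which is the maintenance burden a vendor-agnostic layer is meant to remove.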
  12. Challenges - Vendor-specific APIs

    Platform                 API description
    Juniper (any platform)   XML over NETCONF 1.0
    Cisco IOS-XR             XML over SSH/Telnet (proprietary solution). Latest versions: XML over NETCONF 1.1 and JSON over gRPC (if lucky)
    Cisco IOS                N/A
    Cisco NX-OS              JSON over HTTP
    Arista                   JSON over HTTP, later REST and gRPC

    0 inter-compatibility, sometimes 0 consistency
  13. Challenges - Inconsistency: same vendor, same platform

    {
      "TABLE_interface": {
        "ROW_interface": {
          "interface": "Ethernet1/1",
          "state": "up",
          "admin_state": "up",
          "share_state": "Dedicated",
          "eth_hw_desc": "100/1000/10000 Ethernet",
          "eth_duplex": "full",
          "eth_speed": "1000 Mb/s",
          "eth_link_flapped": "2d13h",
          "eth_clear_counters": "never",
        }
      }
    }

    {
      "TABLE_interface": {
        "ROW_interface": {
          "interface": "Ethernet1/1",
          "state": "up",
          "share_state": "Dedicated",
          "eth_hw_desc": "1000/10000 Ethernet",
          "eth_duplex": "full",
          "eth_speed": "10 Gb/s",
          "eth_mtu": "1500",
          "eth_link_flapped": "5week(s) 4day(s)",
          "eth_bw": [ "10000000", "10000000" ],
          "eth_clear_counters": "never"

    Cisco NX-OS 7.0(3)I4(6) *
    Cisco NX-OS 7.3(1)N1(1) *
    * = notice the human understandable version number. Pretty intuitive, isn’t it?
    Pointless duplicate
  14. Challenges - Sometimes the API simply does not work (1)

    sw03.bjm01# show lldp neighbors | json
    output conversion failed due to conv error, bytes 0xC4 0x54 0x44 0xBD
    encoder error

    This is not valid JSON
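Because a device may answer a JSON request with plain error text, automation code cannot assume the reply parses. A small Python sketch of a defensive wrapper follows; the function name `parse_device_json` and the well-formed sample reply are hypothetical, while the broken reply is the text quoted on the slide.

```python
import json


def parse_device_json(raw):
    """Parse CLI output that is expected to be JSON; return (data, error)."""
    try:
        return json.loads(raw), None
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        return None, "device returned invalid JSON: {}".format(exc)


# Error text the device printed instead of a JSON document (from the slide).
broken = "output conversion failed due to conv error, bytes 0xC4 0x54 0x44 0xBD encoder error"
data, error = parse_device_json(broken)
print(data, error)

# A well-formed reply (hypothetical content) parses normally.
data_ok, error_ok = parse_device_json('{"neighbors": []}')
print(data_ok, error_ok)
```

Returning the error alongside the data lets the caller decide whether to retry, fall back to screen scraping, or report the device as unreachable through its API.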
  15. Challenges - Sometimes the API simply does not work (2)

    XML> <?xml version="1.0" encoding="UTF-8"?>
    <Request MajorVersion="1" MinorVersion="0">
      <Get>
        .
        ~~~ snip ~~~
        .
      </Get>
    </Request>

    ERROR: 0xa367a600 'XML Service Library' detected the 'fatal' condition 'The throttle on the memory usage has been reached. Please optimize the request to query smaller data.'

    Supposed to return an XML document, not an error
  16. Challenges - Install custom software

    Traditionally not possible
    Recently: whitebox devices (e.g., Arista, Cumulus)
    The base operating system is usually very old, e.g., Arista EOS is based on Fedora 18...
  17. Multi-vendor networks challenges

    • Inconsistent and incompatible vendor-specific representation of configuration and operational data
    • Inconsistent and incomplete APIs
    • Proprietary details, specific to a vendor only (i.e., different naming and functionality for similar industry standards)
  18. Cross-platform API: NAPALM

    NAPALM (Network Automation and Programmability Abstraction Layer with Multivendor support)
    https://github.com/napalm-automation
  19. NAPALM - Cross-vendor API (1)

    from napalm import get_network_driver
    driver = get_network_driver('junos')
    instance = driver('edge01.bjm01', 'username', 'password')
    instance.open()
    instance.get_bgp_neighbors()

    from napalm import get_network_driver
    driver = get_network_driver('ios')
    instance = driver('edge01.flw01', 'username', 'password')
    instance.open()
    instance.get_bgp_neighbors()
  20. NAPALM

    {
      u'global': {
        u'peers': {
          u'192.168.0.2': {
            u'address_family': {
              u'ipv4': {
                u'accepted_prefixes': 142,
                u'received_prefixes': 142,
                u'sent_prefixes': 0
              }
            },
            'description': u'Amazon [WW HOSTING ANYCAST]',
            'is_enabled': True,
            'is_up': True,
            'local_as': 13335,
            'remote_as': 16509,
            'remote_id': u'10.10.10.1',
            'uptime': 8816095
          }
        }
      }
    }

    {
      u'global': {
        u'peers': {
          u'172.17.17.2': {
            u'address_family': {
              u'ipv4': {
                u'accepted_prefixes': 5090,
                u'received_prefixes': 5090,
                u'sent_prefixes': 0
              }
            },
            'description': u'Amazon [WW HOSTING ANYCAST]',
            'is_enabled': True,
            'is_up': True,
            'local_as': 13335,
            'remote_as': 16509,
            'remote_id': u'10.10.10.1',
            'uptime': 456323
          }
        }
      }
    }
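Since both platforms return the same structure, one piece of code can consume either result. A short Python sketch that walks the dictionary shown above; the data is copied from the first result on the slide (trimmed to the fields used), and the helper name `summarize_peers` is an assumption made for this example.

```python
# Structure copied from the slide (one peer, trimmed to the fields used here).
bgp_neighbors = {
    "global": {
        "peers": {
            "192.168.0.2": {
                "address_family": {
                    "ipv4": {
                        "accepted_prefixes": 142,
                        "received_prefixes": 142,
                        "sent_prefixes": 0,
                    }
                },
                "description": "Amazon [WW HOSTING ANYCAST]",
                "is_enabled": True,
                "is_up": True,
                "local_as": 13335,
                "remote_as": 16509,
            }
        }
    }
}


def summarize_peers(neighbors):
    """Flatten the nested result into (vrf, peer, remote_as, is_up, accepted) rows."""
    rows = []
    for vrf, vrf_data in neighbors.items():
        for peer, details in vrf_data["peers"].items():
            # Sum accepted prefixes over every address family of the peer.
            accepted = sum(
                family["accepted_prefixes"]
                for family in details["address_family"].values()
            )
            rows.append((vrf, peer, details["remote_as"], details["is_up"], accepted))
    return rows


rows = summarize_peers(bgp_neighbors)
down_peers = [peer for _, peer, _, is_up, _ in rows if not is_up]
print(rows, down_peers)
```

The same function would work unchanged on the second result, which is the point of the vendor-agnostic shape.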
  21. NetDevOps challenges

    • Provision new devices
    • React to certain events and (re-)configure
    • Human error factor
    • Replace equipment
    • Monitor
  22. Frameworks used in networking - Ansible (1)

    Inconsistent and incompatible vendor-specific modules
    Lacks many features needed for network automation
  23. Frameworks used in networking - Ansible (2)

    “The documentation of the Ansible Core networking modules (like eos_facts, eos_command, nxos_command) is very poor. This caused me to waste a lot of time (trying to working out which arguments to use). Similarly, I was frustrated by inconsistencies in the output and behavior of the different modules across platforms.”

    Kirk Byers, Network automation and Ansible instructor
    https://pynet.twb-tech.com/blog/ansible/ansible-network-backup.html
  24. Salt Architecture

    Problem: you can’t install Minions on traditional network devices!
    C. R. Oldham
  25. Salt Architecture

    NAPALM
    Solution: Proxy Minions
    They behave like minions, but can manage network devices, remotely.
    C. R. Oldham
  26. 33

  27. Salt in NetDevOps - Device pillar example

    /etc/salt/pillar/device1.sls

    proxy:
      proxytype: napalm
      driver: junos
      host: hostname_or_ip_address
      username: my_username
      passwd: my_password

    Choose between: junos, eos, ios, iosxr, nxos, etc. See the complete list.
    Complete documentation at: https://docs.saltstack.com/en/develop/ref/proxy/all/salt.proxy.napalm.html
  28. Salt in NetDevOps - Vendor-agnostic automation (1)

    $ sudo salt iosxr-router net.arp
    iosxr-router:
      ----------
      out:
        |_
          ----------
          age: 1620.0
          interface: Bundle-Ether4
          ip: 10.0.0.2
          mac: 00:25:90:20:46:B5
        |_
          ----------
          age: 8570.0

    $ sudo salt junos-router net.arp
    junos-router:
      ----------
      out:
        |_
          ----------
          age: 129.0
          interface: ae2.100
          ip: 10.0.0.1
          mac: 84:B5:9C:CD:09:73
        |_
          ----------
          age: 1101.0
  29. Salt in NetDevOps - Vendor-agnostic automation (2)

    $ sudo salt junos-router state.sls ntp
    junos-router:
    ----------
      ID: oc_ntp_netconfig
      Function: netconfig.managed
      Result: True
      Comment: Configuration changed!
      Started: 10:53:25.624396
      Duration: 3494.153 ms
      Changes:
        ----------
        diff:
          [edit system ntp]
          - peer 172.17.17.2;
          [edit system ntp]
          + server 10.10.10.1 prefer;
          + server 10.10.10.2;
          - server 172.17.17.1 version 2 prefer;

    $ sudo salt iosxr-router state.sls ntp
    iosxr-router:
    ----------
      ID: oc_ntp_netconfig
      Function: netconfig.managed
      Result: True
      Comment: Configuration changed!
      Started: 11:02:39.162423
      Duration: 3478.683 ms
      Changes:
        ----------
        diff:
          ---
          +++
          @@ -1,4 +1,10 @@
          +ntp
          + server 10.10.10.1 prefer
          + server 10.10.10.2
          !
  30. Salt in NetDevOps - CLI Examples

    $ sudo salt 'edge*' net.traceroute 8.8.8.8
    # execute traceroute on all devices whose minion ID starts with 'edge'
    $ sudo salt -N NA transit.disable cogent
    # disable Cogent in North-America
    $ sudo salt -G 'os:junos' net.cli "show version"
    # execute 'show version' on Juniper devices
    $ sudo salt -C 'edge* and G@os:iosxr and G@version:6.0.2' net.arp
    # get the ARP tables from devices whose ID starts with 'edge', running IOS-XR 6.0.2
    $ sudo salt -G 'model:MX480' probes.results
    # retrieve the results of the RPM probes from Juniper MX480 routers
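The targeting options in these commands (glob on the minion ID, `-G` for grains, `-C` for compound expressions) can be pictured as filters over an inventory. The Python sketch below is a simplified illustration only, not Salt's matcher: the minion IDs appear elsewhere in the slides, but the grain values and all function names are assumptions made for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical inventory: minion IDs come from the slides, grain values are assumed.
GRAINS = {
    "edge01.bjm01": {"os": "junos"},
    "edge01.flw01": {"os": "iosxr"},
    "edge01.oua01": {"os": "ios"},
    "sw03.bjm01": {"os": "nxos"},
}


def match_glob(pattern):
    """Minion IDs matching a shell-style glob, like: salt 'edge*' ..."""
    return sorted(minion for minion in GRAINS if fnmatchcase(minion, pattern))


def match_grain(key, value):
    """Minion IDs whose grain `key` equals `value`, like: salt -G 'os:junos' ..."""
    return sorted(minion for minion, grains in GRAINS.items() if grains.get(key) == value)


def match_glob_and_grain(pattern, key, value):
    """Intersection of both filters, like: salt -C 'edge* and G@os:iosxr' ..."""
    return sorted(set(match_glob(pattern)) & set(match_grain(key, value)))


print(match_glob("edge*"))
print(match_grain("os", "junos"))
print(match_glob_and_grain("edge*", "os", "iosxr"))
```

Because the filters operate on device properties rather than hostnames typed by hand, the same command keeps working as devices are added or replaced.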
  31. Salt in NetDevOps - Salt Architecture

    Problem: you can’t install Minions on traditional network devices!
    What about the others???
  32. Salt in NetDevOps - Platforms that can be managed like servers

    • White box devices
      ◦ Arista EOS
      ◦ Cumulus
      ◦ etc.
    • Containerised solutions
      ◦ Cisco IOS-XR (64 bit only)
      ◦ Cisco NX-OS
      ◦ etc.
  33. Salt in NetDevOps - Platforms that can’t be managed like servers (traditional, old school systems)

    • Junos
    • Cisco IOS-XR, 32 bit
    • Cisco IOS-XE, IOS
    • Many many others...
  34. Salt in NetDevOps - Arista EOS Salt minion: Installation

    Copy the SWIX extension to the flash
    edge01.bjm01#copy https://salt-eos.netops.life/salt-eos-latest.swix flash:
    edge01.bjm01#copy https://salt-eos.netops.life/startup.sh flash:

    Install the SWIX extension
    edge01.bjm01#copy flash:salt-eos-latest.swix extension:
    edge01.bjm01#extension salt-eos-latest.swix force

    Execute the Salt Minion startup script
    edge01.bjm01#bash
    #sudo /mnt/flash/startup.sh

    Complete installation notes at: https://docs.saltstack.com/en/latest/topics/installation/eos.html
  36. Salt in NetDevOps - CLI execution: server

    $ sudo salt 'some-server' disk.usage
    some-server:
      ----------
      /:
        ----------
        1K-blocks: 65869280
        available: 60808360
        capacity: 8%
        filesystem: rootfs
        used: 5060920
      /dev:
        ----------
        1K-blocks: 65902000
  37. Salt in NetDevOps - CLI execution: Arista EOS minion*

    $ sudo salt 'edge01.bjm01' disk.usage
    edge01.bjm01:
      ----------
      /:
        ----------
        1K-blocks: 4870812
        available: 4812376
        capacity: 2%
        filesystem: none
        used: 58436
      /dev:
        ----------
        1K-blocks: 8192

    * This is real output collected from a device carrying Internet traffic
  38. Salt in NetDevOps - Cumulus Linux Salt minion: Installation

    1. Download the Salt bootstrap script
       wget -O bootstrap-salt.sh https://bootstrap.saltstack.com
    2. Check the script!!!
    3. Install the Salt minion
       sudo sh bootstrap-salt.sh
  39. Salt in NetDevOps - Salt event bus

    napalm/syslog/junos/NTP_SERVER_UNREACHABLE/edge01.bjm01 {
      "error": "NTP_SERVER_UNREACHABLE",
      "facility": 12,
      "host": "edge01.bjm01",
      "ip": "10.10.0.1",
      "os": "junos",
      "timestamp": 1499986394,
      "yang_message": {
        "system": {
          "ntp": {
            "servers": {
              "server": {
                "172.17.17.1": {
                  "state": {
                    "association-type": "SERVER",
                    "stratum": 16
                  }
                }
              }
            }
          }
        }
      },
      "yang_model": "openconfig-system"
    }

    Read https://napalm-automation.net/napalm-logs-released/ for more details on how to import such events, and https://mirceaulinic.net/2017-10-19-event-driven-network-automation/ for more examples.
  40. Salt in NetDevOps - Fully automated configuration changes

    /etc/salt/master

    reactor:
      - 'napalm/syslog/*/NTP_SERVER_UNREACHABLE/*':
        - salt://reactor/exec_ntp_state.sls

    Matches the event tag napalm/syslog/junos/NTP_SERVER_UNREACHABLE/edge01.bjm01

    /etc/salt/reactor/exec_ntp_state.sls

    triggered NTP state:
      cmd.state.sls:
        - tgt: {{ data.host }}
        - arg:
          - ntp

    CLI Equivalent:
    $ sudo salt edge01.bjm01 state.sls ntp
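The reactor SLS above targets `{{ data.host }}`, a field of the event payload shown on the event bus slide. The Python sketch below walks through that lookup using the tag and payload from the slides; the helper names `split_tag` and `ntp_servers` are hypothetical and only illustrate how the pieces of the event relate to each other.

```python
import json

# Event tag and payload as shown on the event bus slide.
TAG = "napalm/syslog/junos/NTP_SERVER_UNREACHABLE/edge01.bjm01"
PAYLOAD = json.loads("""
{
  "error": "NTP_SERVER_UNREACHABLE",
  "facility": 12,
  "host": "edge01.bjm01",
  "ip": "10.10.0.1",
  "os": "junos",
  "timestamp": 1499986394,
  "yang_message": {"system": {"ntp": {"servers": {"server": {
      "172.17.17.1": {"state": {"association-type": "SERVER", "stratum": 16}}
  }}}}},
  "yang_model": "openconfig-system"
}
""")


def split_tag(tag):
    """Split napalm/syslog/<os>/<error>/<host> into named parts."""
    _, _, os_name, error, host = tag.split("/", 4)
    return {"os": os_name, "error": error, "host": host}


def ntp_servers(event):
    """Return {server_address: stratum} from the OpenConfig-shaped payload."""
    servers = event["yang_message"]["system"]["ntp"]["servers"]["server"]
    return {address: details["state"]["stratum"] for address, details in servers.items()}


tag_parts = split_tag(TAG)
target = PAYLOAD["host"]  # the value the reactor SLS reads as data.host
strata = ntp_servers(PAYLOAD)
print(tag_parts, target, strata)
```

The tag carries enough information to route the event, while the structured payload tells the reaction which device and which NTP server are involved.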
  41. Salt in NetDevOps - Fully automated configuration changes & more

    /etc/salt/master

    reactor:
      - 'napalm/syslog/*/INTERFACE_DOWN/*':
        - salt://reactor/if_down_shutdown.sls     # Shut down the interface
        - salt://reactor/if_down_send_mail.sls    # Send an email notification

    Matches the event tag napalm/syslog/junos/INTERFACE_DOWN/edge01.bjm01
    (Event pushed when an interface is operationally down)

    More details at: https://mirceaulinic.net/2017-10-19-event-driven-network-automation/
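The two reactor configurations above map glob patterns over event tags to lists of SLS files. A simplified Python sketch of that lookup, assuming shell-style glob semantics for the patterns; the `REACTOR` list mirrors the slides, while the function name `sls_for` is an assumption and the code is not Salt's reactor implementation.

```python
from fnmatch import fnmatchcase

# Pattern -> SLS files, mirroring the reactor sections shown on the slides.
REACTOR = [
    {"napalm/syslog/*/NTP_SERVER_UNREACHABLE/*": ["salt://reactor/exec_ntp_state.sls"]},
    {
        "napalm/syslog/*/INTERFACE_DOWN/*": [
            "salt://reactor/if_down_shutdown.sls",
            "salt://reactor/if_down_send_mail.sls",
        ]
    },
]


def sls_for(tag):
    """Return every SLS file whose pattern matches the given event tag."""
    matched = []
    for entry in REACTOR:
        for pattern, sls_files in entry.items():
            if fnmatchcase(tag, pattern):
                matched.extend(sls_files)
    return matched


ntp_reaction = sls_for("napalm/syslog/junos/NTP_SERVER_UNREACHABLE/edge01.bjm01")
if_reaction = sls_for("napalm/syslog/junos/INTERFACE_DOWN/edge01.bjm01")
no_reaction = sls_for("some/other/event")
print(ntp_reaction, if_reaction, no_reaction)
```

The wildcards in the third and fifth tag fields are what make one reactor entry apply to every operating system and every device.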