NetDevOps: DevOps in Networking: Challenges and Solutions

Hackference 2017, Birmingham, UK
https://2017.hackference.co.uk/

Mircea Ulinic

October 20, 2017

Transcript

  1. Mircea Ulinic
    • Network software engineer at Cloudflare
    • Previously research and teaching assistant at EPFL, Switzerland
    • Member and maintainer at NAPALM Automation
    • Integrated NAPALM in Salt
    • OpenConfig representative
    • https://mirceaulinic.net/ @mirceaulinic mirceaulinic
  2. Cloudflare
    • How big?
      ◦ 7+ million zones/domains
      ◦ Authoritative for ~40% of Alexa top 1 million
      ◦ 200 million Internet users served
      ◦ 86+ billion DNS queries/day
        ▪ Largest
        ▪ Fastest
        ▪ 35% of the Internet requests
      ◦ 10 trillion requests / month
      ◦ 10% of the Internet traffic
    • 120+ anycast locations globally
      ◦ 50 countries (and growing)
      ◦ Many hundreds of network devices
  3. Agenda
    • What is the Internet
    • What is a network device
    • NetDevOps challenges
    • Hacking with NAPALM
    • Hacking network devices with Salt
    • Real-world network orchestration example
  4. What is the Internet
    • Internet != web
    • A network of (computer/data) networks
    Image: http://bit.ly/2p8Ipbn (CC BY 2.0)
  5. What is the Internet: What is a network
    Network devices that exchange data with each other through a cabled or wireless connection.
  6. What is a network device: Data is key
    Information to retrieve and load:
    • Configuration, e.g., a list of NTP peers, BGP neighbors, firewall, OSPF interfaces, etc.
    • Operational data (state), e.g., interfaces up? BGP neighbors connected? NTP peers synchronised?
  7. What is a network device: Network vendor definition
    Company that produces hardware and software (traditionally able to run only on their hardware) of arguable quality. Mostly interested in selling products and proprietary solutions rather than useful tools for customers. Often they acknowledge bugs. Sometimes, they even fix them.
  8. Challenges: Incompatibility examples: Configuration

        protocols {
            bgp {
                group 4-PUBLIC-ANYCAST-PEERS {
                    neighbor 192.168.0.1 {
                        description "Amazon [WW HOSTING ANYCAST]";
                        family inet {
                            unicast {
                                prefix-limit {
                                    maximum 500;
                                }
                            }
                        }
                        peer-as 16509;
                    }
                }
            }
        }

        router bgp 13335
         neighbor 192.168.0.1
          remote-as 16509
          use neighbor-group 4-PUBLIC-ANYCAST-PEERS
          description "Amazon [WW HOSTING ANYCAST]"
          address-family ipv4 unicast
           maximum-prefix 500

        bgp
            group "4-PUBLIC-ANYCAST-PEERS"
                description "Amazon [WW HOSTING ANYCAST]"
                neighbor 192.168.0.1
                    peer-as 16509
                    max-prefix 500
                exit
            exit
        exit
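The three snippets above express the same BGP neighbor in three syntaxes. As a minimal sketch of how one vendor-neutral data structure can be rendered into more than one dialect, the Python below uses plain string templates. The dialect names `brace-style` and `indent-style`, the `NEIGHBOR` dictionary, and the `render` helper are hypothetical and exist only for illustration; they are not part of NAPALM or Salt, and the templates are abbreviated versions of the slide content.

```python
# Vendor-neutral description of the neighbor (values taken from the slide).
NEIGHBOR = {"address": "192.168.0.1", "remote_as": 16509, "max_prefixes": 500}

# Hypothetical, abbreviated templates for two of the syntaxes on the slide.
# Literal braces are doubled because str.format() treats single braces as fields.
TEMPLATES = {
    "brace-style": (
        "neighbor {address} {{\n"
        "    family inet {{ unicast {{ prefix-limit {{ maximum {max_prefixes}; }} }} }}\n"
        "    peer-as {remote_as};\n"
        "}}"
    ),
    "indent-style": (
        "neighbor {address}\n"
        " remote-as {remote_as}\n"
        " address-family ipv4 unicast\n"
        "  maximum-prefix {max_prefixes}"
    ),
}


def render(dialect, neighbor):
    """Render the same neighbor data in the requested configuration dialect."""
    return TEMPLATES[dialect].format(**neighbor)


if __name__ == "__main__":
    for dialect in TEMPLATES:
        print(render(dialect, NEIGHBOR))
        print()
```

The data stays the same and only the template changes, which is roughly the idea behind keeping device data separate from vendor-specific templates.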
  9. Challenges: Incompatibility examples: Operational (1)

        [email protected]> show bgp neighbor 192.168.0.1
        Peer: 192.168.0.1 AS 16509 Local: 192.168.0.2 AS 13335
          Description: Amazon [WW HOSTING ANYCAST]
          Group: 4-PUBLIC-ANYCAST-PEERS Routing-Instance: master
          Forwarding routing-instance: master
          Type: External State: Established Flags: <Sync RSync>
          Last State: Idle Last Event: RecvKeepAlive
          Export: [ POLICY-OUT ] Import: [ POLICY-IN ]
          Address families configured: inet-unicast
          Holdtime: 90 Preference: 170 Local AS: 13335 Local System AS: 13335
          Number of flaps: 0
          Peer ID: 192.168.0.1 Local ID: 172.17.17.1 Active Holdtime: 90
          Keepalive Interval: 30 Group index: 9 Peer index: 3
          Table inet.0 Bit: 20003
            RIB State: BGP restart is complete
            Send state: in sync
            Active prefixes: 279
            Received prefixes: 279
  10. Challenges: Incompatibility examples: Operational (2)

        RP/0/RSP0/CPU0:edge01.flw01#show bgp neighbor 192.168.0.1 detail
        BGP neighbor is 192.168.0.1
         Remote AS 16509, local AS 13335, external link
         Description: Amazon [WW HOSTING ANYCAST]
         Remote router ID 192.168.0.1
          BGP state = Established, up for 2d08h
          Hold time is 30, keepalive interval is 10 seconds
          Configured hold time: 180, keepalive: 60, min acceptable hold time: 3
          Minimum time between advertisement runs is 30 secs
         For Address Family: IPv4 Unicast
          BGP neighbor version 674653686
          Update group: 0.15 Filter-group: 0.5 No Refresh request being processed
          Route refresh request: received 0, sent 0
          Policy for incoming advertisements is POLICY-IN
          Policy for outgoing advertisements is POLICY-OUT
          332 accepted prefixes, 205 are bestpaths
          Cumulative no. of prefixes denied: 0.
          Prefix advertised 203, suppressed 0, withdrawn 0
  11. Challenges: Incompatibility examples: Operational (3)

        edge01.oua01#show ip bgp neighbors 192.168.0.1
        BGP neighbor is 192.168.0.1, remote AS 16509, external link
          Description: "Amazon [WW HOSTING ANYCAST]"
          BGP version 4, remote router ID 172.17.17.2, VRF default
          Inherits configuration from and member of peer-group 4-PUBLIC-ANYCAST-PEERS
          Negotiated BGP version 4
          Hold time is 90, keepalive interval is 30 seconds
          Configured hold time is 180, keepalive interval is 60 seconds
          BGP state is Established, up for 18d00h
          Number of transitions to established: 1
          Last state was OpenConfirm
          Neighbor Capabilities:
            Multiprotocol IPv4 Unicast: advertised and received and negotiated
            Four Octet ASN: advertised and received
            Route Refresh: advertised and received and negotiated
          Inbound route map is POLICY-IN
          Outbound route map is POLICY-OUT
          Local AS is 13335, local router ID 162.158.232.1
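The three operational outputs above report the same fact (the session is Established) in three different phrasings. A rough sketch of what screen-scraping this by hand might look like follows, assuming only the three phrasings visible on the slides. The `STATE_PATTERNS` list and the `bgp_session_state` helper are hypothetical names, and real CLI output has many more variants than these patterns cover.

```python
import re

# One regular expression per phrasing seen on the slides.
STATE_PATTERNS = [
    re.compile(r"\bState: (\w+)"),      # "Type: External State: Established"
    re.compile(r"BGP state = (\w+)"),   # "BGP state = Established, up for 2d08h"
    re.compile(r"BGP state is (\w+)"),  # "BGP state is Established, up for 18d00h"
]


def bgp_session_state(cli_output):
    """Return the first BGP session state found in raw CLI text, or None."""
    for pattern in STATE_PATTERNS:
        match = pattern.search(cli_output)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    samples = [
        "Type: External State: Established Flags: <Sync RSync> Last State: Idle",
        "BGP state = Established, up for 2d08h",
        "BGP state is Established, up for 18d00h",
    ]
    for sample in samples:
        print(bgp_session_state(sample))
```

Every new platform or software release means another pattern to maintain, which is the maintenance burden a vendor-agnostic API tries to remove.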
  12. Challenges: Vendor-specific APIs

        Platform                 API description
        Juniper (any platform)   XML over NETCONF 1.0
        Cisco IOS-XR             XML over SSH/Telnet (proprietary solution)
                                 Latest versions: XML over NETCONF 1.1 and JSON over gRPC
        Cisco IOS                N/A
        Cisco NX-OS              JSON over HTTP
        Arista                   JSON over HTTP, later REST and gRPC

    0 inter-compatibility, sometimes 0 consistency
  13. Challenges: Inconsistency: same vendor, same platform

        {
          "TABLE_interface": {
            "ROW_interface": {
              "interface": "Ethernet1/1",
              "state": "up",
              "admin_state": "up",
              "share_state": "Dedicated",
              "eth_hw_desc": "100/1000/10000 Ethernet",
              "eth_duplex": "full",
              "eth_speed": "1000 Mb/s",
              "eth_link_flapped": "2d13h",
              "eth_clear_counters": "never"
            }
          }
        }

        {
          "TABLE_interface": {
            "ROW_interface": {
              "interface": "Ethernet1/1",
              "state": "up",
              "share_state": "Dedicated",
              "eth_hw_desc": "1000/10000 Ethernet",
              "eth_duplex": "full",
              "eth_speed": "10 Gb/s",
              "eth_mtu": "1500",
              "eth_link_flapped": "5week(s) 4day(s)",
              "eth_bw": [
                "10000000",
                "10000000"
              ],
              "eth_clear_counters": "never"
            }
          }
        }

    Outputs from Cisco NX-OS 7.0(3)I4(6) * and Cisco NX-OS 7.3(1)N1(1) *
    * = notice the human-understandable version number. Pretty intuitive, isn’t it?
    Note on "eth_bw": pointless duplicate.
  14. Challenges: Sometimes the API simply does not work (1)

        sw03.bjm01# sh lldp neighbors | json
        output conversion failed due to conv error, bytes 0xC4 0x54 0x44 0xBD
        encoder error

    This is not valid JSON.
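Because a device can answer a JSON request with a plain-text error like the one above, client code cannot assume the reply will decode. The following is a minimal defensive sketch using only the Python standard library. The helper name `parse_device_json` is hypothetical, and returning `None` on failure is just one possible design choice.

```python
import json


def parse_device_json(raw):
    """Return the decoded object, or None when the reply is not valid JSON."""
    try:
        return json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


if __name__ == "__main__":
    broken = ("output conversion failed due to conv error, "
              "bytes 0xC4 0x54 0x44 0xBD encoder error")
    print(parse_device_json(broken))
    print(parse_device_json('{"neighbors": []}'))
```

A caller can then fall back to another retrieval method, or report the device as unreachable through that API, instead of crashing on the decode error.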
  15. Challenges: Sometimes the API simply does not work (2)

        XML> <?xml version="1.0" encoding="UTF-8"?>
        <Request MajorVersion="1" MinorVersion="0">
          <Get>
            .
            ~~~ snip ~~~
            .
          </Get>
        </Request>

        ERROR: 0xa367a600 'XML Service Library' detected the 'fatal' condition 'The throttle on the memory usage has been reached. Please optimize the request to query smaller data.'

    Supposed to return an XML document, not an error.
  16. Challenges: Install custom software
    • Traditionally not possible
    • Recently: whitebox devices (e.g. Arista, Cumulus)
    • The base operating system is usually very old, e.g. Arista’s EOS is based on Fedora 14...
  17. Multi-vendor networks challenges
    • Inconsistent and incompatible vendor-specific representation of configuration and operational data
    • Inconsistent and incomplete APIs
    • Proprietary details, specific to a vendor only (i.e., different naming and functionality for similar industry standards)
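To make "inconsistent representation" concrete, the sketch below compares the field names of the two `ROW_interface` objects shown earlier for the same interface on two NX-OS releases. The helper `field_differences` and the names `RELEASE_A` and `RELEASE_B` are hypothetical; the field names themselves are copied from the slide.

```python
# Field names of the two "ROW_interface" objects shown on the NX-OS slide.
RELEASE_A = {
    "interface", "state", "admin_state", "share_state", "eth_hw_desc",
    "eth_duplex", "eth_speed", "eth_link_flapped", "eth_clear_counters",
}
RELEASE_B = {
    "interface", "state", "share_state", "eth_hw_desc", "eth_duplex",
    "eth_speed", "eth_mtu", "eth_link_flapped", "eth_bw", "eth_clear_counters",
}


def field_differences(first, second):
    """Return (fields only in first, fields only in second), each sorted."""
    return sorted(first - second), sorted(second - first)


if __name__ == "__main__":
    only_a, only_b = field_differences(RELEASE_A, RELEASE_B)
    print("only in first output:", only_a)
    print("only in second output:", only_b)
```

Code that reads `admin_state` or `eth_mtu` unconditionally would fail on one of the two releases, even though both come from the same vendor and platform family.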
  18. Vendor-agnostic API: NAPALM
    NAPALM (Network Automation and Programmability Abstraction Layer with Multivendor support)
    https://github.com/napalm-automation
  19. Hacking with NAPALM: Cross-vendor API (1)

        # Junos device
        from napalm import get_network_driver
        driver = get_network_driver('junos')
        instance = driver('edge01.bjm01', 'username', 'password')
        instance.open()
        instance.get_bgp_neighbors()

        # IOS device: same calls, only the driver name changes
        from napalm import get_network_driver
        driver = get_network_driver('ios')
        instance = driver('edge01.flw01', 'username', 'password')
        instance.open()
        instance.get_bgp_neighbors()
  20. NAPALM

        {
          u'global': {
            u'peers': {
              u'192.168.0.2': {
                u'address_family': {
                  u'ipv4': {
                    u'accepted_prefixes': 142,
                    u'received_prefixes': 142,
                    u'sent_prefixes': 0
                  }
                },
                'description': u'Amazon [WW HOSTING ANYCAST]',
                'is_enabled': True,
                'is_up': True,
                'local_as': 13335,
                'remote_as': 16509,
                'remote_id': u'10.10.10.1',
                'uptime': 8816095
              }
            }
          }
        }

        {
          u'global': {
            u'peers': {
              u'172.17.17.2': {
                u'address_family': {
                  u'ipv4': {
                    u'accepted_prefixes': 5090,
                    u'received_prefixes': 5090,
                    u'sent_prefixes': 0
                  }
                },
                'description': u'Amazon [WW HOSTING ANYCAST]',
                'is_enabled': True,
                'is_up': True,
                'local_as': 13335,
                'remote_as': 16509,
                'remote_id': u'10.10.10.1',
                'uptime': 456323
              }
            }
          }
        }
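Since both devices return the same structure, one piece of code can consume either result. The sketch below flattens a dictionary shaped like the output above into rows. The helper `peers_summary` is a hypothetical name, and `SAMPLE` is a trimmed copy of the first result on the slide rather than live device output.

```python
# Trimmed copy of the first result shown on the slide.
SAMPLE = {
    "global": {
        "peers": {
            "192.168.0.2": {
                "address_family": {
                    "ipv4": {
                        "accepted_prefixes": 142,
                        "received_prefixes": 142,
                        "sent_prefixes": 0,
                    }
                },
                "description": "Amazon [WW HOSTING ANYCAST]",
                "is_enabled": True,
                "is_up": True,
                "local_as": 13335,
                "remote_as": 16509,
                "remote_id": "10.10.10.1",
                "uptime": 8816095,
            }
        }
    }
}


def peers_summary(bgp_neighbors):
    """Flatten the nested result into sorted (vrf, peer, remote_as, is_up) rows."""
    rows = []
    for vrf, vrf_data in bgp_neighbors.items():
        for peer, details in vrf_data["peers"].items():
            rows.append((vrf, peer, details["remote_as"], details["is_up"]))
    return sorted(rows)


if __name__ == "__main__":
    for row in peers_summary(SAMPLE):
        print(row)
```

The same function works unchanged on the second result, because the keys do not depend on the vendor.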
  21. NetDevOps challenges
    • Provision new devices
    • React to certain events and (re-)configure
    • Human error factor
    • Replace equipment
    • Monitor
  22. Hacking networks with Salt: Why Salt
    • Very scalable
    • Concurrency
    • Event-driven automation
    • Easily configurable & customizable
    • Native caching and drivers for useful tools
    • One of the friendliest communities
    • Great documentation
  23. Hacking networks with Salt: Why Salt
    “In SaltStack, speed isn’t a byproduct, it is a design goal. SaltStack was created as an extremely fast, lightweight communication bus to provide the foundation for a remote execution engine. SaltStack now provides orchestration, configuration management, event reactors, cloud provisioning, and more, all built around the SaltStack high-speed communication bus.”
    https://docs.saltstack.com/en/getstarted/speed.html
    … + cross-vendor network automation from 2016.11 (Carbon)
  24. Hacking networks with Salt: Salt Architecture
    Problem: you can’t install Minions on traditional network devices!
  25. Hacking networks with Salt: Salt Architecture
    Solution: Proxy Minions (NAPALM)
    They behave like minions, but can manage network devices remotely.
  27. Hacking networks with Salt: Vendor-agnostic automation (1)

        $ sudo salt iosxr-router net.arp
        iosxr-router:
            ----------
            out:
                |_
                  ----------
                  age: 1620.0
                  interface: Bundle-Ether4
                  ip: 10.0.0.2
                  mac: 00:25:90:20:46:B5
                |_
                  ----------
                  age: 8570.0

        $ sudo salt junos-router net.arp
        junos-router:
            ----------
            out:
                |_
                  ----------
                  age: 129.0
                  interface: ae2.100
                  ip: 10.0.0.1
                  mac: 84:B5:9C:CD:09:73
                |_
                  ----------
                  age: 1101.0
  28. Hacking networks with Salt: Vendor-agnostic automation (2)

        $ sudo salt junos-router state.sls ntp
        junos-router:
        ----------
                  ID: oc_ntp_netconfig
            Function: netconfig.managed
              Result: True
             Comment: Configuration changed!
             Started: 10:53:25.624396
            Duration: 3494.153 ms
             Changes:
                      ----------
                      diff:
                          [edit system ntp]
                          -  peer 172.17.17.2;
                          [edit system ntp]
                          +  server 10.10.10.1 prefer;
                          +  server 10.10.10.2;
                          -  server 172.17.17.1 version 2 prefer;

        $ sudo salt iosxr-router state.sls ntp
        iosxr-router:
        ----------
                  ID: oc_ntp_netconfig
            Function: netconfig.managed
              Result: True
             Comment: Configuration changed!
             Started: 11:02:39.162423
            Duration: 3478.683 ms
             Changes:
                      ----------
                      diff:
                          ---
                          +++
                          @@ -1,4 +1,10 @@
                          +ntp
                          + server 10.10.10.1 prefer
                          + server 10.10.10.2
                           !
  29. Hacking networks with Salt: Salt for event-driven automation
    Salt is a data-driven automation framework. Each action (job) performed (manually from the CLI or automatically by the system) is uniquely identified and has an identification tag:

        $ sudo salt-run state.event pretty=True
        salt/job/20170110130619367337/new {
            "_stamp": "2017-01-10T13:06:19.367929",
            "arg": [],
            "fun": "net.arp",
            "jid": "20170110130619367337",
            "minions": [
                "junos-router"
            ],
            "tgt": "junos-router",
            "tgt_type": "glob",
            "user": "mircea"
        }

    The first line of the event, salt/job/20170110130619367337/new, is the tag. The event was produced by:

        $ sudo salt junos-router net.arp
        # output omitted
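The job tag above can be taken apart with a few lines of Python. This is a sketch only: the helper `parse_job_tag` is a hypothetical name, and it assumes the job ID follows the `%Y%m%d%H%M%S%f` timestamp layout, which matches the `_stamp` value in the event shown on the slide.

```python
from datetime import datetime


def parse_job_tag(tag):
    """Split 'salt/job/<jid>/<event>' and decode the time embedded in the JID."""
    namespace, kind, jid, event = tag.split("/")
    if (namespace, kind) != ("salt", "job"):
        raise ValueError("not a job event tag: " + tag)
    # Assumption: the JID is a timestamp with microsecond resolution.
    started = datetime.strptime(jid, "%Y%m%d%H%M%S%f")
    return jid, event, started


if __name__ == "__main__":
    jid, event, started = parse_job_tag("salt/job/20170110130619367337/new")
    print(jid, event, started.isoformat())
```

Decoding the tag this way makes it easy to correlate a job with the other events published for the same job ID.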
  30. Salt event bus

        napalm/syslog/junos/NTP_SERVER_UNREACHABLE/edge01.bjm01 {
            "error": "NTP_SERVER_UNREACHABLE",
            "facility": 12,
            "host": "edge01.bjm01",
            "ip": "10.10.0.1",
            "os": "junos",
            "timestamp": 1499986394,
            "yang_message": {
                "system": {
                    "ntp": {
                        "servers": {
                            "server": {
                                "172.17.17.1": {
                                    "state": {
                                        "association-type": "SERVER",
                                        "stratum": 16
                                    }
                                }
                            }
                        }
                    }
                }
            },
            "yang_model": "openconfig-system"
        }
  31. Hacking networks with Salt: Fully automated configuration changes

    /etc/salt/master

        reactor:
          - 'napalm/syslog/*/NTP_SERVER_UNREACHABLE/*':
            - salt://reactor/exec_ntp_state.sls

    The pattern matches the event tag napalm/syslog/junos/NTP_SERVER_UNREACHABLE/edge01.bjm01

    /etc/salt/reactor/exec_ntp_state.sls

        triggered NTP state:
          cmd.state.sls:
            - tgt: {{ data.host }}
            - arg:
              - ntp

    CLI equivalent:

        $ sudo salt edge01.bjm01 state.sls ntp
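The matching step can be imitated with shell-style glob matching from the Python standard library. The sketch below is not Salt's implementation: `REACTOR`, `matching_reactions`, and `cli_equivalent` are hypothetical names that mirror the configuration on the slide, under the assumption that tag patterns behave like globs.

```python
from fnmatch import fnmatchcase

# Mirrors the "reactor" section of /etc/salt/master shown on the slide.
REACTOR = {
    "napalm/syslog/*/NTP_SERVER_UNREACHABLE/*": [
        "salt://reactor/exec_ntp_state.sls",
    ],
}


def matching_reactions(tag, reactor=REACTOR):
    """Return the SLS files whose tag pattern matches the event tag."""
    return [
        sls
        for pattern, sls_files in reactor.items()
        if fnmatchcase(tag, pattern)
        for sls in sls_files
    ]


def cli_equivalent(event_data):
    """Build the command that the reactor SLS on the slide amounts to."""
    return "salt {} state.sls ntp".format(event_data["host"])


if __name__ == "__main__":
    tag = "napalm/syslog/junos/NTP_SERVER_UNREACHABLE/edge01.bjm01"
    print(matching_reactions(tag))
    print(cli_equivalent({"host": "edge01.bjm01"}))
```

The `{{ data.host }}` expression in the reactor SLS plays the role of `event_data["host"]` here: the target of the triggered state comes from the event payload, not from a hard-coded device name.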
  32. Hacking networks with Salt: Vendor-agnostic automation: how to
    • Salt in 10 minutes
    • Salt fundamentals
    • Configuration management
    • Network Automation official Salt docs
    • Step-by-step tutorial -- up and running in 60 minutes
    • Using Salt at Scale
  33. References
    Authentication system Beacons Engines Event System Grains Jinja load_template documentation Master config file, example Master configuration options Mine NAPALM NAPALM BGP execution module functions NAPALM Grains NAPALM Installation NAPALM network execution module functions NAPALM NTP execution module functions NAPALM Proxy NAPALM route execution module functions Nested outputter NETAPI Modules Netconfig state Node Groups NTP state Orchestration Pillar Pillar modules Proxy Minion Reactor REST CherryPy Returners Runners Salt 2016.11 (Carbon) release notes Salt Get Started Salt Installation Salt Walkthrough SaltStack Package Repo SNMP state States Targeting minions The Top file Users state YAML