
rtnetlink


This slide deck explains what Netlink is, especially its rtnetlink protocol, how to work with it, and how I used it in MidoNet.

Taku Fukushima

March 26, 2015

Transcript

  1. @takufukushima
    • I recently worked on an rtnetlink library for Java/Scala and its application
    • I’m not a kernel expert :-p
  2. Agenda
    1. What are Netlink and rtnetlink?
    2. MidoNet and Netlink
    3. Enter rtnetlink
    4. Wrap-up
  3. Netlink as an IPC
    • Netlink is an intra-kernel messaging system
    • Netlink is also an IPC mechanism between the Linux kernel and userspace that has:
      • A socket interface (the AF_NETLINK family with various protocols)
      • Broadcast messages (notifications) from the kernel, triggered by other processes
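To make the socket interface concrete, here is a minimal Python sketch (the talk itself used Java/Scala) that builds an rtnetlink request asking the kernel to dump all links, roughly the first thing `ip link` has to do. The constants mirror `include/uapi/linux/netlink.h` and `include/uapi/linux/rtnetlink.h`; the helper names `build_getlink_dump` and `send_request` are hypothetical, and reading and parsing the multipart reply is left out.

```python
import socket
import struct

# Values from include/uapi/linux/netlink.h and include/uapi/linux/rtnetlink.h
NLM_F_REQUEST = 0x01
NLM_F_DUMP = 0x300          # NLM_F_ROOT | NLM_F_MATCH
RTM_GETLINK = 18
AF_UNSPEC = 0

# "=" selects native byte order with no padding, which is what Netlink expects
NLMSGHDR = struct.Struct("=IHHII")    # nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
IFINFOMSG = struct.Struct("=BxHiII")  # ifi_family, pad, ifi_type, ifi_index, ifi_flags, ifi_change


def build_getlink_dump(seq: int) -> bytes:
    """Hypothetical helper: an RTM_GETLINK dump request (nlmsghdr + zeroed ifinfomsg)."""
    payload = IFINFOMSG.pack(AF_UNSPEC, 0, 0, 0, 0)
    header = NLMSGHDR.pack(NLMSGHDR.size + len(payload), RTM_GETLINK,
                           NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
    return header + payload


def send_request(msg: bytes) -> socket.socket:
    """Hypothetical helper: open a NETLINK_ROUTE socket and send msg (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, 0))         # port id 0: the kernel assigns one; no multicast groups
    sock.sendto(msg, (0, 0))  # destination port id 0 is the kernel
    return sock               # the caller would read the reply with sock.recv()
```

Only `send_request` touches a real socket; the request bytes can be built and inspected on any platform.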
  4. History of Netlink
    • It was introduced in Linux 2.2 (1999) by Alexey Kuznetsov at INR RAS as a successor to ioctl for configuring network interfaces
    • In 1995, Linux 1.3 had /dev/netlink (Skiplink; now obsolete) by Alan Cox
    • Generic Netlink has been supported since Linux 2.6.15 (2006)
  5. Netlink use cases
    • iproute2, a.k.a. the ip command
      • by Alexey Kuznetsov and Stephen Hemminger
    • Open vSwitch (OVS)
      • e.g., the communication between the datapath in the kernel and ovs-vswitchd in userspace
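  6. Netlink protocols
    [Diagram: the Netlink multiplexer (include/uapi/linux/netlink.h) dispatches messages to protocols such as the OVS datapath and rtnetlink. iproute2 (a.k.a. ip) uses rtnetlink, whose resources include Link • Address • Route • Neighbor • Rule • QDisc • Traffic Class • Traffic Filter • …]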
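The rtnetlink resources above share one numbering scheme in `include/uapi/linux/rtnetlink.h`: starting at `RTM_BASE` (16, which is `RTM_NEWLINK`), each resource gets a block of four message types for NEW, DEL, GET and SET. The sketch below decodes a message type into its `RTM_*` name under that assumption. As far as I know, not every slot is defined (among the resources listed here only links have a SET message, `RTM_SETLINK`), so treat it as an illustration of the layout rather than a complete table; `describe_rtm_type` is a hypothetical name.

```python
RTM_BASE = 16  # RTM_NEWLINK in include/uapi/linux/rtnetlink.h

# Resource families in the order their blocks of four appear in rtnetlink.h
RESOURCES = ["LINK", "ADDR", "ROUTE", "NEIGH", "RULE", "QDISC", "TCLASS", "TFILTER"]
OPERATIONS = ["NEW", "DEL", "GET", "SET"]


def describe_rtm_type(msg_type: int) -> str:
    """Map a message type to its RTM_* name, assuming the block-of-four layout."""
    offset = msg_type - RTM_BASE
    if not 0 <= offset < 4 * len(RESOURCES):
        raise ValueError(f"message type {msg_type} is outside this sketch's table")
    return f"RTM_{OPERATIONS[offset % 4]}{RESOURCES[offset // 4]}"
```

For example, `describe_rtm_type(18)` gives `"RTM_GETLINK"` and `describe_rtm_type(22)` gives `"RTM_GETADDR"`, the two requests `ip a` relies on.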
  7. Netlink documentation
    • man 7 netlink
    • Netlink Protocol Library Suite (libnl)
      • http://www.carisma.slowglass.com/~tgr/libnl/
    • RFC 3549
      • https://tools.ietf.org/html/rfc3549
    • Linux kernel and iproute2 source code
  8. Linux source code
    • Use cscope, global or your preferred tag system
    • net/netlink/*.[ch]
    • include/linux/{genetlink, netlink, rtnetlink}.h
    • include/net/{genetlink, netlink, rtnetlink}.h
    • include/uapi/linux/*.h
  9. iproute2 source code
    • Use cscope, global or your preferred tag system
    • ip/*.[ch]
    • ip/include/linux/*.h
      • e.g., ip/include/linux/ip_link*.h for links
    • lib/libnetlink.c
  10. Notes on Netlink
    • Netlink data should be transferred in native byte order
      • Little endian on little-endian systems
      • Big endian on big-endian systems
    • You need to subscribe to multicast groups to receive notifications from the kernel
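Both notes can be shown in a few lines of Python, as a sketch rather than anything from the talk. The `struct` format prefix `"="` packs fields in the host's native byte order, and on a `NETLINK_ROUTE` socket the multicast subscription is the group mask passed to `bind`. The `RTMGRP_*` masks are the legacy group bits from `include/uapi/linux/rtnetlink.h`; the helper names are hypothetical.

```python
import socket
import struct
import sys

# Legacy multicast group bit masks from include/uapi/linux/rtnetlink.h
RTMGRP_LINK = 0x1
RTMGRP_IPV4_IFADDR = 0x10
RTMGRP_IPV6_IFADDR = 0x100


def subscription_mask() -> int:
    """Groups for link and IPv4/IPv6 address change notifications."""
    return RTMGRP_LINK | RTMGRP_IPV4_IFADDR | RTMGRP_IPV6_IFADDR


def open_monitor_socket() -> socket.socket:
    """Subscribe at bind time; the kernel then pushes notifications (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, subscription_mask()))  # (port id, multicast group mask)
    return sock


def native_u32(value: int) -> bytes:
    """Encode a 32-bit field the way Netlink expects: in the host's native byte order."""
    return struct.pack("=I", value)


if __name__ == "__main__":
    # On x86-64 (little endian) this prints "little 01000000"
    print(sys.byteorder, native_u32(1).hex())
```

Without the group mask the socket still works for request/response, but no notifications arrive.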
  11. MidoNet 101
    [Diagram: on the host, the Open vSwitch datapath in the kernel holds a FlowTable with entries such as in_port=1, action= ...; in_port=2, action= ...; ...; in_port=29, action= .... Matched packets are handled in the kernel. Unmatched packets go to the MidoNet Agent in userspace: 1. Upcall, 2. Set flow table entry, 3. Packet execute.]
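The match/miss cycle in this diagram can be modelled as a toy, with no Netlink involved. The Python sketch below is hypothetical throughout (`FlowTableSketch`, the action strings, and matching on `in_port` alone are all simplifications): a hit is handled "in the kernel", while a miss triggers the three numbered steps.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical userspace handler: given (in_port, packet) it returns the actions to install
Upcall = Callable[[int, bytes], List[str]]


class FlowTableSketch:
    """Toy kernel flow table that matches on in_port only (real flows match many fields)."""

    def __init__(self, upcall: Upcall):
        self.flows: Dict[int, List[str]] = {}
        self.upcall = upcall
        self.upcall_count = 0

    def receive(self, in_port: int, packet: bytes) -> Tuple[str, List[str]]:
        actions = self.flows.get(in_port)
        if actions is not None:
            return "matched", actions             # handled entirely in the kernel
        self.upcall_count += 1
        actions = self.upcall(in_port, packet)     # 1. upcall to the agent in userspace
        self.flows[in_port] = actions              # 2. set flow table entry
        return "executed", actions                 # 3. packet execute with those actions
```

Only the first packet on a port pays for the upcall; later packets on that port hit the installed entry.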
  12. [Diagram: MidoNet deployment. Components: three NSDB nodes; Clients / Users; Horizon; Nova API; MidoNet CLI; Neutron API with the MidoNet Plugin; MidoNet API; compute hosts, each running Nova compute, VMs, and Midolman with its cache, datapath, and flow table; two BGP gateways, each running Midolman with a datapath and flow table. Hosts are connected by a private network and GRE/VXLAN tunneling; the gateways connect to the Internet.]
  13. Midolman (MidoNet agent)
    [Diagram: on a Nova compute host with VMs, Midolman (MidoNet agent) watches/modifies the network interfaces on the host (IF), adds/removes flows in the Open vSwitch datapath flow table, and keeps a host cache + local state. Three NSDB nodes store virtual topology information.]
  14. [Diagram: the host view without the NSDB: Midolman (MidoNet agent) with its host cache + local state watches/modifies the interfaces on the host and adds/removes flows in the Open vSwitch datapath flow table; VMs; Nova compute.]
  15. [Diagram: the host view from slide 14, repeated, now labeled “Netlink”.]
  16. MidoNet speaks Netlink
    • The MidoNet agent drives the OVS datapath kernel module
    • The MidoNet agent communicates with the kernel through Netlink
      • e.g., upcalls and flow installations/invalidations
  17. Upcall lifecycle
    1. Input stage
      • Get upcalls with packets from the datapath
    2. Packet processing stage
      1. Deduplicate and queue packets
      2. Simulate packets on the virtual topology
      3. Deal with the wildcard flows
      4. Determine the egress physical port
    3. Output stage
      • Emit packets and install flows based on the simulation results
    [Diagram label: Netlink]
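Step 2.1 (deduplicate and queue packets) can be illustrated with a small, hypothetical model; it is not MidoNet's DeduplicationActor. While the first packet with a given flow match is being simulated, later packets with the same match wait in a queue instead of triggering another simulation, and they are released once the result is known.

```python
from typing import Dict, Hashable, List


class DeduplicationSketch:
    """Hypothetical model: while one packet of a flow is being simulated,
    later packets with the same match wait instead of triggering another simulation."""

    def __init__(self) -> None:
        self.waiting: Dict[Hashable, List[bytes]] = {}

    def on_upcall(self, match: Hashable, packet: bytes) -> bool:
        """Return True if the packet should be simulated, False if it was queued."""
        if match in self.waiting:
            self.waiting[match].append(packet)
            return False
        self.waiting[match] = []  # first packet of this match: simulate it
        return True

    def on_simulated(self, match: Hashable) -> List[bytes]:
        """Simulation done: release queued packets so they get the same actions."""
        return self.waiting.pop(match, [])
```

A burst of packets for a new flow then costs one simulation plus one flow installation rather than one per packet.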
  18. [Diagram, 1. Input stage. Components: Open vSwitch Datapath (Flow Table), Upcall Packet, Select Loop, NetlinkChannel (NetlinkInputChannel), UpcallDatapathConnectionManager, NetlinkCallbackDispatcher, PacketsEntryPoint, One-to-Many, One-to-One, HTB, DeduplicationActor, Suspended Packets Waiting Room, PacketWorkflow, Fast Path, State Management.]
  19. [Diagram, 2. Packet processing stage. Components: Datapath Controller, Flow Controller, PacketsEntryPoint, NetlinkCallbackDispatcher, DeduplicationActor, PacketWorkflow, UpcallDatapathConnectionManager, One-to-Many, One-to-One, HTB, Suspended Packets Waiting Room, NetlinkChannel (NetlinkInputChannel, NetlinkOutputChannel), DatapathChannel, Open vSwitch Datapath, WildcardFlow, Wildcard Flows, Flow Management, and several PacketContexts. Annotation: routing by hashing with FlowKey.]
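"Routing by hashing with FlowKey" means every packet of the same flow lands on the same worker, so per-flow state never has to be shared between workers. A hypothetical Python sketch follows; `FlowKeySketch` is far smaller than a real flow key, and Python's `hash()` of strings is randomized per process but stable within one, which is all in-process routing needs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FlowKeySketch:
    """Hypothetical, much reduced flow key; frozen so it is hashable."""
    in_port: int
    src_ip: str
    dst_ip: str


def route_to_worker(key: FlowKeySketch, workers: int) -> int:
    """Same key -> same worker, so all packets of one flow are processed together."""
    return hash(key) % workers


def partition(keys: List[FlowKeySketch], workers: int) -> Dict[int, List[FlowKeySketch]]:
    """Group keys by the worker they would be routed to."""
    buckets: Dict[int, List[FlowKeySketch]] = {}
    for key in keys:
        buckets.setdefault(route_to_worker(key, workers), []).append(key)
    return buckets
```

The trade-off is that a single very busy flow cannot be spread over several workers.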
  20. [Diagram, 2. Packet processing stage. Components: Datapath Controller, Flow Controller, PacketsEntryPoint, VirtualToPhysicalMapper, DeduplicationActor, PacketWorkflow, Disruptor Ring Buffer, Suspended Packets Waiting Room, NetlinkOutputChannel, DatapathChannel, Flow Table, Virtual Topology, PacketContext. Labels: DatapathReady, WildcardFlow, invalidation by tag, state data / messages, packet, flow, wildcard flows, datapath port operations, flow management.]
    Local datapath management
      • Create local datapath ports
      • Track UUID to port # mapping
      • Manage overlay tunnels
  21. [Diagram, 2. Packet processing stage: the same components as slide 20, with labels for flows, query statistics, and invalidate flows.]
    Flow management
      • Cache flows
      • Invalidate flows when the virtual topology changes
      • Add flows to the datapath
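One way to "invalidate flows when the virtual topology changes" is to tag each cached flow with the topology objects its simulation touched, then drop every flow carrying a tag when that object changes. The Python sketch below illustrates that idea under this assumption; `TaggedFlowCache` and the tag strings are hypothetical, not MidoNet's classes.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Hashable, Iterable, Set


class TaggedFlowCache:
    """Hypothetical flow cache: each flow is tagged with the virtual topology
    objects (bridges, ports, routers, ...) its simulation depended on."""

    def __init__(self) -> None:
        self.flows: Dict[Hashable, Set[str]] = {}
        self.by_tag: DefaultDict[str, Set[Hashable]] = defaultdict(set)

    def add(self, match: Hashable, tags: Iterable[str]) -> None:
        self.flows[match] = set(tags)
        for tag in self.flows[match]:
            self.by_tag[tag].add(match)

    def invalidate(self, tag: str) -> Set[Hashable]:
        """Drop every flow tagged with `tag`; the returned matches would then
        be deleted from the datapath flow table."""
        removed = self.by_tag.pop(tag, set())
        for match in removed:
            for other in self.flows.pop(match, set()):
                if other != tag:
                    self.by_tag[other].discard(match)
        return removed
```

The reverse index keeps invalidation proportional to the number of affected flows instead of scanning the whole cache.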
  22. [Diagram, 3. Output stage. Components: Datapath Controller, Flow Controller, PacketsEntryPoint, NetlinkCallbackDispatcher, DeduplicationActor, PacketWorkflow, One-to-Many, One-to-One, Disruptor Ring Buffer, Suspended Packets Waiting Room, NetlinkOutputChannel, DatapathChannel, Open vSwitch Datapath (Flow Table), WildcardFlow, Virtual Topology, Select Loop. Labels: state data / messages, packet, flow, wildcard flows, datapath port operations, flow management.]
  23. Netlink/rtnetlink in MidoNet
    • odp library to communicate with the OVS datapath
    • Hard-coded ip commands
    • InterfaceScanner scans interface information on the host
      • e.g., interface type, MTU, …
  24. InterfaceScanner
    • Scans interface information on the host periodically
    • Exposes a subscription interface
      • e.g., notifies other components of the current status of all interfaces
  25. New InterfaceScanner
    • Gets update notifications from the kernel through rtnetlink
    • Exposes a subscription interface
      • e.g., notifies other components of the current status of all interfaces
  26. Do you know what ip a does?
    1. Retrieve link information
    2. Retrieve address information
    3. Combine them into a single representation
    4. Display the result
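Step 3 is essentially a join on the interface index, which both the link and the address replies carry. The Python sketch below assumes the link and address dumps have already been decoded into small records; `LinkInfo`, `AddrInfo`, and `InterfaceView` are hypothetical, heavily reduced stand-ins for the real rtnetlink attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LinkInfo:       # reduced content of an RTM_NEWLINK reply
    index: int
    name: str
    mtu: int


@dataclass
class AddrInfo:       # reduced content of an RTM_NEWADDR reply
    index: int
    address: str


@dataclass
class InterfaceView:  # hypothetical "single representation" of one interface
    name: str
    mtu: int
    addresses: List[str] = field(default_factory=list)


def combine(links: List[LinkInfo], addrs: List[AddrInfo]) -> Dict[int, InterfaceView]:
    """Step 3 of `ip a`: join addresses to links by interface index."""
    view = {link.index: InterfaceView(link.name, link.mtu) for link in links}
    for addr in addrs:
        if addr.index in view:  # skip addresses whose link was not in the dump
            view[addr.index].addresses.append(addr.address)
    return view
```

Because steps 1 and 2 are separate requests, a link may appear or vanish between them; the `if` guard simply ignores such orphaned addresses.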
  27. Do you know what ip a does? (continued)
    1. Retrieve link information
    2. Retrieve address information
    3. Combine them into a single representation
    4. Display the result
    [Annotation: blocking operations from the perspective of Midolman]
  28. Goodbye, old InterfaceScanner
    [Image retrieved from “Terminator 2: Judgment Day”. © Carolco Pictures, Lightstorm Entertainment, Le Studio Canal+, and TriStar Pictures]
  29. rtnetlink library
    • Retrieves/creates rtnetlink resources
    • Observers or Subjects consume the retrieved data
    • Coordinates async operations with RxJava
      • map ByteBuffers to rtnetlink resources
      • filter some resources
      • zip a few different resources
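The library does this with RxJava operators over ByteBuffers in Java/Scala. The Python sketch below shows only the decoding idea behind "map ByteBuffers to rtnetlink resources" and "filter some resources": walking the `struct rtattr` type-length-value list that follows an `ifinfomsg`, 4-byte aligned, and keeping just the attributes of interest. `IFLA_IFNAME` (3) and `IFLA_MTU` (4) come from `include/uapi/linux/if_link.h`; the helper names are hypothetical, and no RxJava equivalent is implied.

```python
import struct
from typing import Dict, Iterator, Tuple

RTA_HDR = struct.Struct("=HH")  # struct rtattr { rta_len; rta_type; }, native byte order
IFLA_IFNAME = 3
IFLA_MTU = 4


def rta_align(n: int) -> int:
    """Round up to the 4-byte attribute alignment (RTA_ALIGNTO)."""
    return (n + 3) & ~3


def iter_rtattrs(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, payload) for each attribute in buf."""
    offset = 0
    while offset + RTA_HDR.size <= len(buf):
        length, rta_type = RTA_HDR.unpack_from(buf, offset)
        if length < RTA_HDR.size or offset + length > len(buf):
            break  # malformed or truncated attribute
        yield rta_type, buf[offset + RTA_HDR.size:offset + length]
        offset += rta_align(length)


def build_rtattr(rta_type: int, payload: bytes) -> bytes:
    """Encode one attribute, padded to the alignment boundary."""
    length = RTA_HDR.size + len(payload)
    return RTA_HDR.pack(length, rta_type) + payload + b"\0" * (rta_align(length) - length)


def link_summary(buf: bytes) -> Dict[str, object]:
    """'map' the attributes we know, 'filter' out everything else."""
    summary: Dict[str, object] = {}
    for rta_type, payload in iter_rtattrs(buf):
        if rta_type == IFLA_IFNAME:
            summary["name"] = payload.rstrip(b"\0").decode()
        elif rta_type == IFLA_MTU:
            summary["mtu"] = struct.unpack("=I", payload)[0]
    return summary
```

Usage: `link_summary(build_rtattr(IFLA_IFNAME, b"eth0\0") + build_rtattr(IFLA_MTU, struct.pack("=I", 1500)))` yields the name and MTU, and any attribute type the sketch does not know is skipped.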
  30. New InterfaceScanner
    • Gets notifications from the kernel
    • Updates the single representation
    • Lets Observers consume the data
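The three bullets describe a push-based design: notifications update one shared view, and observers receive every change instead of polling. The Python sketch below shows that observer pattern under stated simplifications; it is not the library's RxJava code, `InterfaceScannerSketch` is hypothetical, and events arrive as already-decoded strings rather than raw rtnetlink messages.

```python
from typing import Callable, Dict, List, Optional

Snapshot = Dict[str, Dict[str, object]]
Observer = Callable[[Snapshot], None]


class InterfaceScannerSketch:
    """Hypothetical push-based scanner: kernel notifications update one shared
    representation, and every change is pushed to the subscribed observers."""

    def __init__(self) -> None:
        self.interfaces: Snapshot = {}
        self.observers: List[Observer] = []

    def _snapshot(self) -> Snapshot:
        # Copy each inner dict so observers never see later mutations
        return {name: dict(attrs) for name, attrs in self.interfaces.items()}

    def subscribe(self, observer: Observer) -> None:
        self.observers.append(observer)
        observer(self._snapshot())  # new subscribers get the current state first

    def on_notification(self, event: str, name: str,
                        attrs: Optional[Dict[str, object]] = None) -> None:
        if event == "RTM_NEWLINK":
            self.interfaces.setdefault(name, {}).update(attrs or {})
        elif event == "RTM_DELLINK":
            self.interfaces.pop(name, None)
        else:
            return  # other message types are ignored in this sketch
        snapshot = self._snapshot()
        for observer in self.observers:
            observer(snapshot)
```

Compared with periodic scanning, observers see changes as soon as the kernel reports them and nothing blocks while waiting for a full dump.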
  31. Acknowledgements
    • The following people helped me a lot
      • Takayuki Usui, Guillermo Ontañón, Duarte Nunes and Ivan Kelly
    • Special thanks to Antoni Segura Puimedon and Hugo Benichi
  32. (Non-academic) References
    • The Netlink protocol: Mysteries Uncovered, Jan Engelhardt, 2010
      • http://inai.de/documents/Netlink_Protocol.pdf
    • Understanding And Programming With Netlink Sockets, Neil Horman, 2004
      • http://people.redhat.com/nhorman/papers/netlink.pdf
    • Netlink - Wikipedia, the free encyclopaedia
      • http://en.wikipedia.org/wiki/Netlink