Slide 1

Slide 1 text

rtnetlink @takufukushima

Slide 2

Slide 2 text

@takufukushima • I just worked on an rtnetlink library for Java/Scala and its application • I’m not a kernel expert :-p

Slide 3

Slide 3 text

Agenda 1. What is Netlink and rtnetlink 2. MidoNet and Netlink 3. Enter rtnetlink 4. Wrap-up

Slide 4

Slide 4 text

1. What is Netlink and rtnetlink?

Slide 5

Slide 5 text

Netlink as an IPC • Netlink is an intra-kernel messaging system • Netlink is also an IPC mechanism between the Linux kernel and userspace that has: • Socket interface (AF_NETLINK family with various protocols) • Broadcast messages (notifications) from the kernel triggered by other processes
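The socket interface mentioned above can be sketched in a few lines. This is a minimal sketch that assumes a Linux host and Python's standard `socket` module; the helper name `open_rtnetlink_socket` is made up for this example and is not part of any library discussed in the talk.

```python
import socket

# Protocol number from include/uapi/linux/netlink.h; rtnetlink uses NETLINK_ROUTE.
NETLINK_ROUTE = 0


def open_rtnetlink_socket(groups: int = 0) -> socket.socket:
    """Open and bind an AF_NETLINK socket for NETLINK_ROUTE (Linux only).

    `groups` is a bitmask of multicast groups to join; 0 means no notifications.
    """
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_ROUTE)
    # A Netlink address is (port id, group mask); port id 0 lets the kernel pick one.
    sock.bind((0, groups))
    return sock
```

On Linux, `open_rtnetlink_socket().getsockname()` returns the port id chosen by the kernel together with the group mask.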

Slide 6

Slide 6 text

History of Netlink • It was introduced in Linux 2.2, 1999, by Alexey Kuznetsov at INR RAS as a successor to ioctl for the networking interfaces • In 1995, Linux 1.3 had /dev/netlink (Skiplink; obsolete) by Alan Cox • Generic Netlink was supported in 2.6.15, 2006

Slide 7

Slide 7 text

Netlink use cases • iproute2, a.k.a. ip command • by Alexey Kuznetsov and Stephen Hemminger • Open vSwitch (OVS) • e.g., the communication between the datapath in the kernel and ovs-vswitchd in the userspace

Slide 8

Slide 8 text

Netlink protocols • [Diagram] The OVS datapath and iproute2 (a.k.a. ip) sit on top of the Netlink multiplexer (include/uapi/linux/netlink.h) • rtnetlink resources: Link, Address, Route, Neighbor, Rule, QDisc, Traffic Class, Traffic Filter, …

Slide 9

Slide 9 text

Netlink documentation

Slide 10

Slide 10 text

Netlink documentation • man 7 netlink • Netlink Protocol Library Suite (libnl) • http://www.carisma.slowglass.com/~tgr/libnl/ • RFC 3549 • https://tools.ietf.org/html/rfc3549 • Linux kernel and iproute2 source code

Slide 11

Slide 11 text

Linux source code • Use cscope, global or your preferred tag system • net/netlink/*.[ch] • include/linux/{genetlink, netlink, rtnetlink}.h • include/net/{genetlink, netlink, rtnetlink}.h • include/uapi/linux/*.h

Slide 12

Slide 12 text

iproute2 source code • Use cscope, global or your preferred tag system • ip/*.[ch] • ip/include/linux/*.h • e.g., ip/include/linux/ip_link*.h for links • lib/libnetlink.c

Slide 13

Slide 13 text

Debugging Netlink • nltrace • https://github.com/socketpair/nltrace • nlmon & (tcpdump | netsniff-ng)

Slide 14

Slide 14 text

Debugging w/ nltrace

Slide 15

Slide 15 text

Debugging w/ nlmon

Slide 16

Slide 16 text

Notes on Netlink • Netlink data should be transferred in the native byte order of the host • Little endian on a little-endian system • Big endian on a big-endian system • You need to subscribe to the relevant multicast groups to get notifications from the kernel
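Both notes can be illustrated with Python's `struct` module. This is a minimal sketch, assuming the header layouts and constants from include/uapi/linux/netlink.h, rtnetlink.h and if_link.h; the function name `build_getlink_dump` is hypothetical. The `=` prefix in the format strings selects the host's native byte order with no implicit padding.

```python
import struct

# struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid.
NLMSGHDR = struct.Struct("=IHHII")
# struct ifinfomsg: u8 family, u8 pad, u16 type, s32 index, u32 flags, u32 change.
IFINFOMSG = struct.Struct("=BxHiII")

RTM_GETLINK = 18
NLM_F_REQUEST = 0x001
NLM_F_DUMP = 0x300  # NLM_F_ROOT | NLM_F_MATCH

# Multicast group bits to pass when binding, to receive kernel notifications.
RTMGRP_LINK = 0x1
RTMGRP_IPV4_IFADDR = 0x10


def build_getlink_dump(seq: int) -> bytes:
    """Build an RTM_GETLINK dump request in native byte order."""
    body = IFINFOMSG.pack(0, 0, 0, 0, 0)  # family AF_UNSPEC, everything else zero
    header = NLMSGHDR.pack(
        NLMSGHDR.size + len(body),   # total message length, header included
        RTM_GETLINK,
        NLM_F_REQUEST | NLM_F_DUMP,
        seq,
        0,                           # port id 0: the message is addressed to the kernel
    )
    return header + body
```

A socket bound with the group mask `RTMGRP_LINK | RTMGRP_IPV4_IFADDR` would receive link and IPv4 address notifications; a socket bound with mask 0 only sees replies to its own requests.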

Slide 17

Slide 17 text

2. MidoNet and Netlink

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

MidoNet 101 • [Diagram: one host, split into userspace and kernel] • Kernel: Open vSwitch datapath with a flow table (in_port=1, action= ...; in_port=2, action= ...; …; in_port=29, action= ...) • Userspace: MidoNet Agent • Matched packets follow the matching flow; unmatched packets go to the agent: 1. Upcall 2. Set flow table entry 3. Packet execute

Slide 20

Slide 20 text

[Diagram: MidoNet architecture] • Clients / Users: Horizon, MidoNet CLI, Nova API, Neutron API with the MidoNet Plugin, MidoNet API • NSDB (three nodes) • Compute hosts: Nova compute, VMs, Midolman with cache, datapath and flow table • BGP gateways: Midolman, datapath and flow table, facing the Internet • Private network with GRE/VXLAN tunneling between hosts

Slide 21

Slide 21 text

Midolman (MidoNet agent) • [Diagram] NSDB (three nodes) stores virtual topology information and is reached over the network • On the host: Nova compute, VMs, interfaces (IF) on the host, Open vSwitch datapath with its flow table • Midolman keeps a cache and local state, watches/modifies the interfaces, and adds/removes flows

Slide 22

Slide 22 text

[Diagram, zoomed in on the host] Nova compute, VMs, interfaces (IF) on the host, Open vSwitch datapath with its flow table • Midolman (MidoNet agent) keeps a cache and local state, watches/modifies the interfaces, and adds/removes flows

Slide 23

Slide 23 text

[Same diagram as the previous slide, with the label “Netlink” added] Nova compute, VMs, interfaces (IF) on the host, Open vSwitch datapath with its flow table • Midolman (MidoNet agent) keeps a cache and local state, watches/modifies the interfaces, and adds/removes flows

Slide 24

Slide 24 text

MidoNet speaks Netlink • MidoNet agent drives OVS datapath kernel module • MidoNet agent communicates with the kernel through Netlink • e.g., upcalls and flow installations/invalidations

Slide 25

Slide 25 text

Upcall Lifecycle 1. Input stage • Get upcalls with packets from the datapath 2. Packet processing stage 1. Deduplicate and queue packets 2. Simulate packets on the virtual topology 3. Deal with the wildcard flows 4. Determine the egress physical port 3. Output stage • Emit packets and install flows based on the simulation results (diagram label: Netlink)
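The "deduplicate and queue packets" step can be pictured as follows. This is an illustrative sketch only, not Midolman's actual code: the class `Deduplicator`, its method names, and the use of a plain dictionary keyed by a flow key are all assumptions made for this example.

```python
class Deduplicator:
    """Park packets whose flow key already has a simulation in flight."""

    def __init__(self):
        # flow key -> packets that arrived while the first one was being simulated
        self._waiting = {}

    def on_upcall(self, flow_key, packet) -> bool:
        """Return True if the caller should start a simulation for this packet."""
        if flow_key in self._waiting:
            # Same flow is already being simulated: queue instead of simulating again.
            self._waiting[flow_key].append(packet)
            return False
        self._waiting[flow_key] = []
        return True

    def on_simulated(self, flow_key) -> list:
        """Simulation finished: release the queued packets so they reuse its result."""
        return self._waiting.pop(flow_key, [])
```

The point of the sketch is that only the first packet of a flow pays for a simulation; later packets with the same key wait and are emitted once the result is known.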

Slide 26

Slide 26 text

Module diagrams

Slide 27

Slide 27 text

[Diagram: 1. Input stage] Components: Open vSwitch Datapath (Flow Table), NetlinkChannel (NetlinkInputChannel) with Select Loop, UpcallDatapathConnectionManager, HTB, NetlinkCallbackDispatcher, PacketsEntryPoint, DeduplicationActor, PacketWorkflow, Suspended Packets Waiting Room • Legend: Upcall Packet, Fast Path, State Management, One-to-Many, One-to-One

Slide 28

Slide 28 text

[Diagram: 2. Packet processing stage] Components: Datapath Controller, Flow Controller, PacketsEntryPoint, NetlinkCallbackDispatcher, DeduplicationActor, PacketWorkflow, UpcallDatapathConnectionManager, HTB, Suspended Packets Waiting Room, NetlinkChannel (NetlinkInputChannel), DatapathChannel (NetlinkOutputChannel), WildcardFlow • Several PacketContext instances, routed by hashing with FlowKey • Legend: Wildcard Flows, Flow Management, One-to-Many, One-to-One

Slide 29

Slide 29 text

[Diagram: 2. Packet processing stage] Components: Datapath Controller, Flow Controller, PacketsEntryPoint, VirtualToPhysicalMapper, DeduplicationActor, PacketWorkflow, Disruptor Ring Buffer, Suspended Packets Waiting Room, DatapathChannel (NetlinkOutputChannel), Flow Table, WildcardFlow, PacketContext • Messages: DatapathReady, Datapath port operations • Legend: Virtual Topology, State data / Messages, Packet Flow, Wildcard Flows, Flow Management • Local datapath management • Create local datapath ports • Track UUID to port # mapping • Manage overlay tunnels

Slide 30

Slide 30 text

[Diagram: 2. Packet processing stage] Components: Datapath Controller, Flow Controller, PacketsEntryPoint, VirtualToPhysicalMapper, DeduplicationActor, PacketWorkflow, Disruptor Ring Buffer, Suspended Packets Waiting Room, DatapathChannel (NetlinkOutputChannel), Flow Table, WildcardFlow, PacketContext, several Flow entries • Messages: DatapathReady, Datapath port operations, Query statistics, Invalidate flows • Legend: Virtual Topology, State data / Messages, Packet Flow, Wildcard Flows, Flow Management • Flow management • Cache flows • Invalidate flows when the virtual topology changes • Add flows to the datapath

Slide 31

Slide 31 text

[Diagram: 3. Output stage] Components: Datapath Controller, Flow Controller, PacketsEntryPoint, NetlinkCallbackDispatcher, DeduplicationActor, PacketWorkflow, Disruptor Ring Buffer, Suspended Packets Waiting Room, DatapathChannel (NetlinkOutputChannel) with Select Loop, Open vSwitch Datapath (Flow Table), WildcardFlow • Messages: Datapath port operations • Legend: Virtual Topology, State data / Messages, Packet Flow, Wildcard Flows, Flow Management, One-to-Many, One-to-One

Slide 32

Slide 32 text

Netlink/rtnetlink in MidoNet • odp library to communicate with OVS datapath • Hard-coded ip command • InterfaceScanner scans interface information on the host • e.g., interface type, MTU, …

Slide 33

Slide 33 text

3. Enter rtnetlink

Slide 34

Slide 34 text

3. Enter rtnetlink • Kill InterfaceScanner • Image retrieved from “The Terminator”. © Hemdale Film Corporation and Orion Pictures

Slide 35

Slide 35 text

InterfaceScanner • Scans interface information on the host periodically • Exposes the subscription interface • e.g., Notify the current status of all interfaces to other components

Slide 36

Slide 36 text

New InterfaceScanner • Gets the notifications from the kernel through rtnetlink for the updates • Exposes the subscription interface • e.g., Notify the current status of all interfaces to other components
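Notifications from the kernel arrive as one or more Netlink messages packed back to back in a single receive buffer. The sketch below shows how such a buffer could be split, assuming the `nlmsghdr` layout and message type numbers from the kernel headers; the function `iter_messages` is a hypothetical helper, not the API of the library presented here.

```python
import struct

# struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid (native byte order).
NLMSGHDR = struct.Struct("=IHHII")

RTM_NEWLINK, RTM_DELLINK = 16, 17
RTM_NEWADDR, RTM_DELADDR = 20, 21


def nlmsg_align(length: int) -> int:
    """Round up to the 4-byte boundary that Netlink messages are aligned to."""
    return (length + 3) & ~3


def iter_messages(buf: bytes):
    """Yield (message type, payload) for each Netlink message in a receive buffer."""
    offset = 0
    while offset + NLMSGHDR.size <= len(buf):
        length, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(buf, offset)
        if length < NLMSGHDR.size or offset + length > len(buf):
            break  # truncated or malformed message: stop rather than guess
        yield msg_type, buf[offset + NLMSGHDR.size:offset + length]
        offset += nlmsg_align(length)
```

A scanner would dispatch on the message type: `RTM_NEWLINK`/`RTM_DELLINK` update the link table, `RTM_NEWADDR`/`RTM_DELADDR` update the address table.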

Slide 37

Slide 37 text

Notifications? Events? Async ops? Sounds familiar…

Slide 38

Slide 38 text

Do you know what ip a does? 1. Retrieve link information 2. Retrieve address information 3. Combine them into the single representation format 4. Display the result
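The four steps can be sketched with plain data classes. The field names and the output format below are simplifications chosen for this example; they do not reproduce the real `ip a` output or any type from the library.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    index: int
    name: str
    mtu: int


@dataclass
class Addr:
    ifindex: int
    address: str
    prefixlen: int


@dataclass
class Interface:
    index: int
    name: str
    mtu: int
    addresses: list = field(default_factory=list)


def combine(links, addrs):
    """Step 3: join addresses to their link by interface index."""
    interfaces = {l.index: Interface(l.index, l.name, l.mtu) for l in links}
    for a in addrs:
        iface = interfaces.get(a.ifindex)
        if iface is not None:  # ignore addresses for links we did not retrieve
            iface.addresses.append(f"{a.address}/{a.prefixlen}")
    return [interfaces[i] for i in sorted(interfaces)]


def render(interfaces) -> str:
    """Step 4: produce a simplified, ip-like text view."""
    lines = []
    for i in interfaces:
        lines.append(f"{i.index}: {i.name}: mtu {i.mtu}")
        lines.extend(f"    inet {a}" for a in i.addresses)
    return "\n".join(lines)
```

Steps 1 and 2 correspond to two separate dump requests (links, then addresses), which is why the combined view only exists once both replies have arrived.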

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

Link

Slide 41

Slide 41 text

Addr

Slide 42

Slide 42 text

Do you know what ip a does? 1. Retrieve link information 2. Retrieve address information 3. Combine them into the single representation format 4. Display the result • Blocking operation from the perspective of Midolman

Slide 43

Slide 43 text

Now everything is asynchronous. How can we coordinate them?

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

No content

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

InterfaceScanner and RtnetlinkConnection

Slide 48

Slide 48 text

4. Wrap-up

Slide 49

Slide 49 text

Good bye, old InterfaceScanner Retrieved from “Terminator 2: Judgment Day”. © Carolco Pictures, Lightstorm Entertainment, Le Studio Canal+, and TriStar Pictures

Slide 50

Slide 50 text

rtnetlink library • Retrieve/create rtnetlink resources • Observer or Subject consumes retrieved data • Coordinate async operations with RxJava • map ByteBuffer to rtnetlink resource • filter some resources • zip a few different resources
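The map / filter / zip coordination can be mimicked with Python's built-ins. This is a synchronous analogue for illustration, not RxJava and not the library's API: `to_link`, `is_up` and `snapshots` are hypothetical names, and the decoder only reads the fixed `ifinfomsg` header.

```python
import struct

# struct ifinfomsg: u8 family, u8 pad, u16 type, s32 index, u32 flags, u32 change.
IFINFOMSG = struct.Struct("=BxHiII")
IFF_UP = 0x1


def to_link(buf: bytes) -> dict:
    """map: decode raw bytes into a (very reduced) link resource."""
    _family, _dev_type, index, flags, _change = IFINFOMSG.unpack_from(buf)
    return {"index": index, "flags": flags}


def is_up(link: dict) -> bool:
    """filter predicate: keep only links that are administratively up."""
    return bool(link["flags"] & IFF_UP)


def snapshots(link_dumps, addr_dumps):
    """zip: pair the n-th link dump with the n-th address dump."""
    for raw_links, addrs in zip(link_dumps, addr_dumps):
        links = map(to_link, raw_links)
        yield list(filter(is_up, links)), addrs
```

With an observable library the same three operators would run as results arrive; here each pair is produced only when both inputs have a value, which is the property zip provides.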

Slide 51

Slide 51 text

New InterfaceScanner • Get the notifications from the kernel • Update the single representation format • Let Observers consume the data
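The three bullets describe an observer-style loop, sketched below. The class shown is a toy stand-in that shares only its name with the real InterfaceScanner: the notification kinds are plain strings, observers are plain callables, and the "single representation" is reduced to a dictionary from interface index to name.

```python
class InterfaceScanner:
    """Keep one view of the host's interfaces and push it to observers on change."""

    def __init__(self):
        self._interfaces = {}   # interface index -> name
        self._observers = []

    def subscribe(self, observer):
        """Register a callable; it immediately receives the current state."""
        self._observers.append(observer)
        observer(dict(self._interfaces))

    def on_notification(self, kind, index, name=None):
        """Apply one kernel notification, then publish the updated state."""
        if kind == "NEWLINK":
            self._interfaces[index] = name
        elif kind == "DELLINK":
            self._interfaces.pop(index, None)
        snapshot = dict(self._interfaces)  # copy so observers cannot mutate our state
        for observer in self._observers:
            observer(snapshot)
```

Compared with periodic scanning, nothing runs unless the kernel reports a change, and every subscriber sees the same sequence of states.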

Slide 52

Slide 52 text

Acknowledgements • The following people helped me a lot • Takayuki Usui, Guillermo Ontañón, Duarte Nunes and Ivan Kelly • Special thanks to Antoni Segura Puimedon and Hugo Benichi

Slide 53

Slide 53 text

(Non-academic) References • The Netlink protocol: Mysteries Uncovered, Jan Engelhardt, 2010 • http://inai.de/documents/Netlink_Protocol.pdf • Understanding And Programming With Netlink Sockets, Neil Horman, 2004 • http://people.redhat.com/nhorman/papers/netlink.pdf • Netlink - Wikipedia, the free encyclopaedia • http://en.wikipedia.org/wiki/Netlink

Slide 54

Slide 54 text

This is the end of slides. Any questions?