Slide 1

Slide 1 text

netmap: a novel framework for fast packet I/O
Luigi Rizzo, Università di Pisa, Italy
In Proceedings of the 2012 USENIX Annual Technical Conference, June 2012. (Best Paper Award at USENIX ATC '12)
Reading-group slides by id:y_uuki

Slide 2

Slide 2 text

Terminology
- NIC ≒ device ≒ network adapter ≒ hardware
- OS ≒ host stack ≒ network stack

Slide 3

Slide 3 text

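Introduction (1)
- General-purpose OSes do not support the high-rate raw packet I/O required by applications such as network monitors and traffic generators.
  - APIs such as raw sockets, the Berkeley Packet Filter (BPF), and the AF SOCKET family are used for this.
  - Their performance is insufficient.
- Existing techniques for improving performance tend to depend on special hardware features (of the NIC, etc.).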

Slide 4

Slide 4 text

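Introduction (2)
- netmap combines and extends existing ideas.
  - Performance improves dramatically, yet it does not depend on special hardware.
  - It integrates fully into Linux and FreeBSD with minimal changes.
- About 70 CPU clock cycles from the wire to the userspace application (more than an order of magnitude faster than conventional APIs).
- 14.88 Mpps (packets per second) on a 10 Gbps link with a 900 MHz CPU.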
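The 14.88 Mpps figure is the line rate of 10 Gbps Ethernet with minimum-size 64-byte frames, because each frame also occupies 20 bytes of preamble, start delimiter, and inter-frame gap on the wire. The short C sketch below (function names are illustrative, not part of netmap) reproduces that number and the resulting budget of roughly 60 cycles per packet at 900 MHz.

```c
/* Bytes a frame occupies on an Ethernet wire beyond the frame itself:
   7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD 20u

/* Maximum frames per second on a link of link_bps bits/s for frames of
   frame_len bytes (frame_len includes the 4-byte CRC). */
double line_rate_pps(double link_bps, unsigned frame_len)
{
    return link_bps / ((frame_len + ETH_WIRE_OVERHEAD) * 8.0);
}

/* CPU cycles available per packet when one core at clock_hz must sustain pps. */
double cycles_per_packet(double clock_hz, double pps)
{
    return clock_hz / pps;
}
```

For example, line_rate_pps(10e9, 64) gives about 14.88e6, and cycles_per_packet(900e6, 14.88e6) gives about 60.5, which is consistent with the 60-65 cycles/packet reported later in the deck.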

Slide 5

Slide 5 text

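Introduction (3)
- netmap is safe and easy to use.
  - A netmap client cannot crash the system.
  - Device registers and kernel memory are not exposed to the client.
- It uses an extremely simple data model that is well suited to zero-copy transfers.
- Multiqueue adapters are supported.
- Standard system calls (select(2), poll(2)) are used for event notification.
- Porting existing applications is easy.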

Slide 6

Slide 6 text

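Background
- Applications such as the following have long been interested in running on commodity hardware and general-purpose OSes:
  - software switches, routers, firewalls, traffic monitors, intrusion detection systems, and traffic generators.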

Slide 7

Slide 7 text

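NIC data structures and operation
- Packets are sent and received through ring queues.
- Each slot in a ring holds the size and physical address of a buffer.
- Some high-performance NICs support multiple ring queues, used to:
  - spread the load across multiple cores;
  - share resources in virtualized environments.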
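As a rough illustration of a descriptor ring, the sketch below models a transmit ring in C. The struct layouts, field names, and the ring size of 8 are hypothetical simplifications; real descriptor formats are defined by each NIC.

```c
#include <stdint.h>

/* Hypothetical, simplified TX descriptor: real layouts are device specific. */
struct nic_desc {
    uint64_t phys_addr; /* physical address of the packet buffer (used for DMA) */
    uint16_t len;       /* number of bytes to send from that buffer */
};

#define NIC_RING_SIZE 8u  /* hypothetical ring size */

/* Hypothetical circular descriptor ring shared between driver and NIC. */
struct nic_ring {
    struct nic_desc desc[NIC_RING_SIZE];
    unsigned head;  /* next slot the NIC will process */
    unsigned tail;  /* next slot the driver will fill */
};

/* Free slots the driver can still fill. One slot is always left empty so
   that head == tail unambiguously means "ring empty". */
unsigned nic_ring_free(const struct nic_ring *r)
{
    return (r->head + NIC_RING_SIZE - r->tail - 1) % NIC_RING_SIZE;
}

/* Driver side: post one buffer for transmission.
   Returns 0 on success, -1 if the ring is full. */
int nic_ring_post(struct nic_ring *r, uint64_t phys_addr, uint16_t len)
{
    if (nic_ring_free(r) == 0)
        return -1;
    r->desc[r->tail].phys_addr = phys_addr;
    r->desc[r->tail].len = len;
    r->tail = (r->tail + 1) % NIC_RING_SIZE;  /* wrap around */
    return 0;
}
```

Keeping one slot empty is a common way to tell a full ring from an empty one using only two indexes; an 8-slot ring therefore holds at most 7 pending descriptors.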

Slide 8

Slide 8 text

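Kernel and User API
- The OS keeps a copy of the NIC's data structures.
  - Buffers are linked to OS-specific, device-independent containers (mbufs, sk_buffs).
  - These carry a large amount of metadata for each packet.
- Driver/OS
  - The device driver and the OS split packets into fragments.
  - Fragmentation adds significant overhead.
- Raw packet I/O
  - The standard APIs for reading and writing raw packets require at least one memory copy to move data between kernel and user space.
  - They require one system call per packet.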

Slide 9

Slide 9 text

Related Work
Existing techniques for speeding up packet processing:
- Socket APIs: BPF, AF_PACKET, etc.
  - So-called raw sockets; packets are copied and exposed to userland.
- Packet filter hooks: Netgraph (FreeBSD), Netfilter (Linux)
  - No packet copy is needed (in kernel).
  - The application sits in the packet-processing path (e.g., a firewall).
- Direct buffer access
  - Click in kernel mode (runs the application in the kernel)
  - PF_RING, PACKET_MMAP (expose packet buffers to userland)
  - NIC DMA engine, NetChannels, PacketShader I/O Engine
- Hardware solutions (FPGA)

Slide 10

Slide 10 text

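Netmap
- Provides fast packet access to userspace applications.
- Lightweight metadata: compact and easy to use.
- Many packets are processed per system call.
- Linear, fixed-size packet buffers
  - Preallocated when the device is opened.
  - Eliminates the cost of allocating and freeing memory for each packet.
- Data-copy costs are reduced by granting applications protected access to the packet buffers.
- Support for useful hardware features (multiqueue, etc.)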
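The preallocated, fixed-size buffer scheme can be sketched as a single contiguous allocation in which a buffer is identified by its index. The names and the 2048-byte buffer size are assumptions for illustration; in netmap the region lives in kernel memory and is mapped into the process, whereas this sketch simply uses calloc().

```c
#include <stdint.h>
#include <stdlib.h>

#define POOL_BUF_SIZE 2048u  /* hypothetical fixed buffer size */

struct buf_pool {
    char    *base;   /* one contiguous region, allocated once up front */
    uint32_t nbufs;  /* number of fixed-size buffers in the region */
};

/* Allocate all buffers once (analogous to "when the device is opened"). */
int buf_pool_init(struct buf_pool *p, uint32_t nbufs)
{
    p->base = calloc(nbufs, POOL_BUF_SIZE);
    p->nbufs = p->base ? nbufs : 0;
    return p->base ? 0 : -1;
}

/* Address of buffer idx: pure arithmetic, no allocation on the data path.
   Returns NULL for an out-of-range index. */
char *buf_pool_addr(const struct buf_pool *p, uint32_t idx)
{
    return idx < p->nbufs ? p->base + (size_t)idx * POOL_BUF_SIZE : NULL;
}

/* Release the whole region at once (analogous to closing the device). */
void buf_pool_destroy(struct buf_pool *p)
{
    free(p->base);
    p->base = NULL;
    p->nbufs = 0;
}
```

Because every buffer has the same size, the per-packet path never calls an allocator; it only computes an address from an index.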

Slide 11

Slide 11 text

netmap
- In netmap mode, the NIC rings are disconnected from the host stack and exchange packets through the netmap API.
- Two additional netmap rings let the application talk to the host stack.
- The netmap rings are implemented in shared memory.
- The OS is unaware of the disconnection and keeps using and managing the interface as usual.
- select(2)/poll(2) are used for synchronization.
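netmap reuses the ordinary poll(2) interface for synchronization. The sketch below shows the generic pattern of waiting for a descriptor to become readable; a netmap client would pass its /dev/netmap file descriptor, but since netmap may not be installed, the helper (the name wait_readable is hypothetical) is written for any fd.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for fd to become readable.
   Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

With a pipe, for instance, wait_readable(read_end, 0) returns 0 until something is written to the other end, and 1 afterwards.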

Slide 12

Slide 12 text

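Data Structures (1)
- Key elements of netmap:
  - reduced per-packet overhead;
  - fast forwarding between interfaces;
  - efficient data exchange between the NIC and the OS stack;
  - support for multiqueue rings and multiple cores.
- Each interface is associated with three kinds of objects: packet buffers, netmap rings, and netmap_if.
  - All objects reside in a dedicated region of kernel memory.
  - The region is shared by all user processes.
  - Using a single region makes zero-copy transfers convenient.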

Slide 13

Slide 13 text

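Data Structures (2)
- packet buffers (pkt_buf)
  - Fixed size, shared by the NIC and user processes.
  - When an interface enters netmap mode, buffers for all of its netmap rings are preallocated (and never reallocated).
- netmap ring: a device-independent replica of a NIC ring
  - ring_size: number of slots
  - cur: current read/write position in the ring
  - avail: number of available buffers
  - buf_ofs, slots
- netmap_if: read-only information about the interface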
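The ring fields listed above can be modeled as a C struct. The field names follow the slide (ring_size, cur, avail, buf_ofs, slots), but the struct and constant names, the 4-slot ring, and the 2048-byte buffer size are hypothetical; this is not the layout from the real netmap header. In this sketch a buffer is located from the ring itself via buf_ofs plus an index, so no absolute pointers are stored in the shared structure.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_BUF_SIZE  2048u  /* hypothetical fixed packet-buffer size */
#define SKETCH_RING_SIZE 4u     /* hypothetical number of slots */

struct sketch_slot {
    uint32_t buf_idx;  /* index of the packet buffer in the shared region */
    uint16_t len;      /* packet length in bytes */
};

struct sketch_ring {
    ptrdiff_t buf_ofs;    /* byte offset from this ring to buffer 0 */
    uint32_t  ring_size;  /* number of slots */
    uint32_t  cur;        /* current read/write position */
    uint32_t  avail;      /* slots usable by the application, starting at cur */
    struct sketch_slot slot[SKETCH_RING_SIZE];
};

/* Buffer address = ring address + buf_ofs + buf_idx * buffer size.
   Offsets stay valid wherever the shared region happens to be mapped. */
char *sketch_buf(struct sketch_ring *r, uint32_t buf_idx)
{
    return (char *)r + r->buf_ofs + (size_t)buf_idx * SKETCH_BUF_SIZE;
}

/* Consume n slots starting at cur: advance cur (with wrap-around) and
   reduce avail. Returns -1 if fewer than n slots are available. */
int sketch_ring_advance(struct sketch_ring *r, uint32_t n)
{
    if (n > r->avail)
        return -1;
    r->cur = (r->cur + n) % r->ring_size;
    r->avail -= n;
    return 0;
}
```

After the application processes the slots from cur onward, it advances cur and decrements avail; the kernel refreshes avail on the next sync.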

Slide 14

Slide 14 text

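The netmap API
- A process enters netmap mode by opening /dev/netmap and issuing ioctl(fd, NIOCREG, arg).
- mmap(2) makes the shared memory accessible from the process's address space.
- ioctl(2) supports sending and receiving packets:
  - ioctl(fd, NIOCTXSYNC): tells the OS there are new packets to send.
  - ioctl(fd, NIOCRXSYNC): asks the OS how many packets are ready to read.
- Both calls are non-blocking and involve no data copying (apart from synchronizing the netmap and hardware rings), so many packets can be handled at once.
  - This reduces per-packet overhead.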
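The transmit side of the API follows a fill-then-sync pattern: the application writes packets into slots starting at cur, advances cur and decrements avail, and then issues a single NIOCTXSYNC for the whole batch. Since the real ioctl needs a netmap-enabled kernel, the sketch below replaces it with a hypothetical sim_txsync() that pretends every slot completes at once; all struct and function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

#define TX_RING_SIZE 8u     /* hypothetical ring size */
#define TX_BUF_SIZE  2048u  /* hypothetical buffer size */

struct tx_slot { uint32_t buf_idx; uint16_t len; };

struct tx_ring {
    uint32_t cur;    /* next slot to fill */
    uint32_t avail;  /* free slots starting at cur */
    struct tx_slot slot[TX_RING_SIZE];
    char buf[TX_RING_SIZE][TX_BUF_SIZE];  /* simplified buffer area */
};

/* Stand-in for ioctl(fd, NIOCTXSYNC): a real kernel would hand the filled
   slots to the NIC and report completions later; here every slot is
   assumed to complete immediately. */
void sim_txsync(struct tx_ring *r)
{
    r->avail = TX_RING_SIZE;
}

/* Queue up to n copies of pkt, then issue ONE sync for the whole batch.
   Returns the number of packets queued. */
uint32_t tx_batch(struct tx_ring *r, const char *pkt, uint16_t len, uint32_t n)
{
    uint32_t sent = 0;
    if (len > TX_BUF_SIZE)
        return 0;
    while (sent < n && r->avail > 0) {
        struct tx_slot *s = &r->slot[r->cur];
        if (s->buf_idx >= TX_RING_SIZE)
            break;  /* invalid buffer index in this simplified model */
        memcpy(r->buf[s->buf_idx], pkt, len);
        s->len = len;
        r->cur = (r->cur + 1) % TX_RING_SIZE;
        r->avail--;
        sent++;
    }
    sim_txsync(r);  /* one "system call" per batch, not per packet */
    return sent;
}
```

The loop never makes a call per packet; the only "system call" is the single sync at the end, which is where the per-packet overhead savings come from.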

Slide 15

Slide 15 text

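Talking to the host stack
- A netmap client exchanges packets with the OS stack through two netmap rings.
  - Packets it transmits are handed to the OS stack as if they had arrived from the physical interface.
  - Packets coming from the OS stack are queued on a netmap ring.
- The netmap client is responsible for correctly passing packets between the netmap rings attached to the OS stack and those attached to the NIC.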

Slide 16

Slide 16 text

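Zero-copy packet forwarding
- Zero-copy between rings only requires swapping the buffer indexes of the receive slot and the transmit slot.
  - This works because all buffers are in the same memory region.
- The swap queues the packet on the output ring and, at the same time, refills the input ring with an empty buffer.
  - No memory needs to be reallocated.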
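The index swap described above can be written in a few lines. This sketch uses a simplified, hypothetical slot type and flag name; the flag marks that a slot's buffer index changed so the kernel knows to update the corresponding NIC descriptor (netmap's own headers use a flag named NS_BUF_CHANGED for this purpose).

```c
#include <stdint.h>

#define SLOT_BUF_CHANGED 0x1u  /* hypothetical flag: buffer index was changed */

struct fw_slot {
    uint32_t buf_idx;  /* index of the packet buffer in the shared region */
    uint16_t len;      /* packet length in bytes */
    uint16_t flags;
};

/* Move the packet in *rx to *tx without copying any payload: swap the buffer
   indexes, so tx now owns the packet and rx receives tx's old (empty) buffer. */
void zc_forward(struct fw_slot *rx, struct fw_slot *tx)
{
    uint32_t tmp = tx->buf_idx;
    tx->buf_idx = rx->buf_idx;
    tx->len = rx->len;
    tx->flags = SLOT_BUF_CHANGED;
    rx->buf_idx = tmp;
    rx->flags = SLOT_BUF_CHANGED;
}
```

Only a few integers move, regardless of packet length, which is why the forwarding cost is independent of payload size.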

Slide 17

Slide 17 text

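Implementation
- The current version, including the FreeBSD implementation, is about 2000 lines, excluding the system calls (ioctl, select/poll) and driver changes.
- The patch for each device driver is about 500 lines.
- To minimize driver changes, most functionality is implemented in code shared by all drivers.
- Each driver is changed in only two places:
  - reinitialization when netmap mode is enabled;
  - moving the driver's locking into the shared code.
- During system calls, the device driver does most of the work in netmap.
  - This improves cache locality and simplifies resource management.
  - There is no risk of doing too much work in a non-interruptible context.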

Slide 18

Slide 18 text

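Performance metrics
- Packet processing involves several subsystems: CPU pipeline, caches, memory, and the I/O bus.
- The target applications are CPU-bound, so CPU cost is what is measured.
  - The focus is on moving packets between the application and the NIC.
- Per-byte costs: CPU cycles spent moving data to/from the NIC's buffers.
- Per-packet costs: the NIC ring slot must be updated for every packet.
  - Also memory allocation, system calls, ...
- Very simple test programs: a packet generator and a packet receiver.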
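The split between per-byte and per-packet costs can be captured by a simple linear model: cycles per packet = per-packet cost + per-byte cost × length. The model and any parameter values plugged into it are illustrative assumptions, not measurements from the paper.

```c
/* Linear cost model: cycles(len) = per_packet + per_byte * len. */
double cycles_for_packet(double per_packet, double per_byte, unsigned len)
{
    return per_packet + per_byte * len;
}

/* Packets per second a single core at clock_hz can process under that model. */
double cpu_bound_pps(double clock_hz, double per_packet, double per_byte, unsigned len)
{
    return clock_hz / cycles_for_packet(per_packet, per_byte, len);
}
```

When packets are prepared in advance (as in the tests described next), per_byte is close to zero and throughput depends only on the per-packet term.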

Slide 19

Slide 19 text

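Test equipment
- netmap is efficient enough to saturate a 10 Gbps link, so the CPU clock is lowered during the tests to keep the CPU, not the link, as the bottleneck.
- Transmit throughput is measured with the packet generator.
- The number of queues, cores/threads, and the packet length are set at run time.
- Packets are prepared in advance, so the per-byte cost is nearly zero.
- CPU: i7-870, 4 cores, 2.93 GHz
- Memory: 1.33 GHz
- NIC: dual-port 10 Gbps
- netmap: FreeBSD HEAD/amd64, April 2012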

Slide 20

Slide 20 text

Transmit speed vs. clock rate
- Transmit performance with 64-byte packets while varying the clock rate and the number of cores.
- With one core, throughput scales linearly with the clock up to 900 MHz (60-65 cycles/packet).
- In this test, per-packet processing consists of only two steps:
  - validating the contents of the netmap ring slot;
  - updating the corresponding NIC ring slot.
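The scaling behaviour can be approximated by a CPU-bound model capped by the link: throughput is clock / cycles-per-packet until it reaches the line rate. The sketch assumes a constant cost per packet, which is a simplification, and the function name is illustrative.

```c
/* Throughput of a CPU-bound sender capped by the link:
   pps = min(clock_hz / cycles_per_pkt, line_rate_pps). */
double tx_pps(double clock_hz, double cycles_per_pkt, double line_rate_pps)
{
    double cpu_pps = clock_hz / cycles_per_pkt;
    return cpu_pps < line_rate_pps ? cpu_pps : line_rate_pps;
}
```

Assuming 60.5 cycles/packet (within the 60-65 range above) and a 14.88 Mpps cap, tx_pps(450e6, 60.5, 14.88e6) is about 7.44 Mpps, while any clock above roughly 900 MHz returns the 14.88 Mpps cap.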

Slide 21

Slide 21 text

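Speed vs. packet size
- Throughput for both transmit and receive as the packet length varies.
- On the receive side, the curve is surprising below 150 bytes.
  - The maximum rate is reached only when the length is a multiple of 64 bytes.
  - Perhaps related to cache lines? (I do not fully understand the explanation of this part.)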
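To make the cache-line hypothesis concrete, the sketch below counts how many 64-byte cache lines a packet touches and how much of the last line is left unused. The 64-byte line size is typical for x86 CPUs such as the i7-870; this only illustrates the hypothesis and does not explain the measured curve.

```c
#define CACHE_LINE 64u  /* typical x86 cache-line size in bytes */

/* Cache lines touched by a packet of len bytes, assuming the buffer
   starts on a cache-line boundary. */
unsigned cache_lines_touched(unsigned len)
{
    return (len + CACHE_LINE - 1) / CACHE_LINE;
}

/* Bytes of the last touched line not covered by the packet
   (0 when len is a multiple of 64). */
unsigned last_line_slack(unsigned len)
{
    return cache_lines_touched(len) * CACHE_LINE - len;
}
```

For example, a 100-byte packet touches 2 lines and leaves 28 bytes of the second line unused, while a 128-byte packet fills both lines exactly.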

Slide 22

Slide 22 text

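Transmit speed vs. batch size
- Checks the performance gain from handling packets in batches.
  - Batching should reduce the cost of system calls, NIC register accesses, and so on.
- batch size = number of packets handled per system call
- batch size = 1: 2.45 Mpps (408 ns/pkt)
- batch size = 8: 14.88 Mpps
- A standard poll(2) call on FreeBSD costs about 250 ns.
  - Handling multiple packets per call is essential.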
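The benefit of batching follows from amortizing a fixed per-call cost over many packets: per-packet time ≈ syscall cost / batch + per-packet work. In the sketch below, only the 250 ns poll(2) cost and the 408 ns/pkt figure come from the slide; the 50 ns per-packet cost used in the usage note is a hypothetical value, and the model is not fitted to the measurements.

```c
/* Convert a per-packet time in nanoseconds to a rate in Mpps. */
double mpps_from_ns(double ns_per_pkt)
{
    return 1e3 / ns_per_pkt;  /* (1e9 / ns) pps = (1e3 / ns) Mpps */
}

/* Amortized per-packet time when one system call costing syscall_ns is
   shared by batch packets (batch >= 1), each also costing per_pkt_ns. */
double amortized_ns(double syscall_ns, double per_pkt_ns, unsigned batch)
{
    return syscall_ns / batch + per_pkt_ns;
}
```

mpps_from_ns(408) gives about 2.45 Mpps, matching the batch-size-1 result. With the hypothetical values, amortized_ns(250, 50, 1) is 300 ns per packet, while amortized_ns(250, 50, 8) drops to about 81 ns.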

Slide 23

Slide 23 text

Packet forwarding performance
- Forwarding performance between the wire and the application.
  - The previous slides did not include application overhead.
- APIs other than the netmap API
- Various applications

Slide 24

Slide 24 text

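Conclusions and future work
- netmap provides userspace applications with a fast channel for sending and receiving raw packets.
  - It does not depend on special hardware.
  - It is safe and easy to use.
- Experiments show that it delivers large performance gains for a wide range of applications that use low-level packet I/O.
- Future work
  - adopting netmap's features in the OS network stack;
  - supporting efficient network processing in virtualized environments.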

Slide 25

Slide 25 text

http://info.iet.unipi.it/~luigi/netmap/ netmap project page