Introducing netmap, an ultra-fast packet I/O framework

netmap: a novel framework for fast packet I/O

Yuuki Tsubouchi (yuuk1)

August 01, 2013

Transcript

  1. netmap: a novel framework for fast packet I/O
    Luigi Rizzo, Università di Pisa, Italy. In Proceedings of the 2012 USENIX Annual Technical Conference, June 2012 (Best Paper award at USENIX ATC'12).
    id:y_uuki, reading-group material
  2. Terminology
    NIC ≒ device ≒ network adapter ≒ hardware.
    OS ≒ host stack ≒ network stack.
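  3. Introduction (1)
    General-purpose OSes do not support the high-rate raw packet I/O that applications such as network monitors and traffic generators require.
    The APIs in use are raw sockets, the Berkeley Packet Filter, the AF SOCKET family, and the like, and their performance is not sufficient.
    Existing techniques for improving performance tend to depend on special hardware features (of the NIC, for example).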
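  4. Introduction (2)
    netmap combines and extends existing ideas.
    Performance improves substantially, yet there is no dependence on special hardware.
    It can be fully integrated into Linux and FreeBSD with minimal changes.
    Moving a packet between the wire and a userspace application takes about 70 CPU clock cycles (more than an order of magnitude faster than conventional APIs).
    14.88 Mpps (packets per second) on a 10 Gbps link with a CPU running at 900 MHz.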
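The 14.88 Mpps figure is the line rate of a 10 Gbps Ethernet link carrying minimum-size frames, and it can be checked with a few lines of arithmetic. The sketch below assumes 64-byte frames (the size used in the transmit test later in the slides) plus the standard 20 bytes of per-frame overhead on the wire (preamble, start-of-frame delimiter, and inter-frame gap); the function names are illustrative only.

```python
LINK_BPS = 10e9        # 10 Gbps link
FRAME_BYTES = 64       # minimum Ethernet frame, including the 4-byte CRC
OVERHEAD_BYTES = 20    # 7 preamble + 1 start-of-frame delimiter + 12 inter-frame gap


def line_rate_pps(link_bps, frame_bytes, overhead_bytes=OVERHEAD_BYTES):
    """Maximum packets per second the link can carry at this frame size."""
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_on_wire


def cycles_per_packet(cpu_hz, pps):
    """CPU cycle budget per packet when one core sustains `pps`."""
    return cpu_hz / pps


pps = line_rate_pps(LINK_BPS, FRAME_BYTES)
budget = cycles_per_packet(900e6, pps)
print(f"{pps / 1e6:.2f} Mpps")          # 14.88 Mpps
print(f"{budget:.1f} cycles/packet")    # 60.5 cycles/packet
```

Saturating the link at 900 MHz therefore leaves roughly 60 cycles per packet, which is consistent with the 60-65 cycles/packet reported on the clock-rate slide.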
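  5. Introduction (3)
    netmap is safe and easy to use.
    A netmap client cannot crash the system: device registers and kernel memory are not exposed to the client.
    It uses an extremely simple data model that is well suited to zero-copy forwarding.
    It supports multi-queue adapters.
    It uses standard system calls (select(2), poll(2)) for event notification, so porting existing applications is easy.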
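  6. Background
    There has always been interest in using general-purpose hardware and OSes for applications such as software switches, routers, firewalls, traffic monitors, intrusion detection systems, and traffic generators.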

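  7. NIC data structures and operation
    Packets are sent and received through ring queues.
    Each slot of a ring holds the size and physical address of a buffer.
    Some high-performance NICs support multiple ring queues, which helps distribute load across multiple cores and share resources in virtualized environments.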
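  8. Kernel and User API
    The OS keeps its own copy of the NIC data structures. Buffers are linked to OS-specific, device-independent containers (mbufs, sk_buffs), which carry a large amount of metadata about each packet.
    Driver/OS: the device driver and the OS split packets into fragments, and the overhead of handling fragments is large.
    Raw packet I/O: the standard APIs for reading and writing raw packets require at least one memory copy between kernel and user space, and one system call per packet.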
  9. Related Work
    Existing techniques for speeding up packet processing:
    Socket APIs: BPF, AF_PACKET, and so on. These are the so-called raw sockets; packets are copied and exposed to userland.
    Packet filter hooks: Netgraph (FreeBSD), Netfilter (Linux). No packet copy is needed (in kernel); the application is interposed in the packet processing path (firewalls, for example).
    Direct buffer access: Kernel mode Click (runs the application in the kernel); PF_RING, PACKET_MMAP (expose packet buffers to userland); NIC DMA engine, NetChannels, PacketShader I/O Engine.
    Hardware solutions (FPGA).
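  10. Netmap
    Provides userspace applications with fast access to packets.
    Lightweight metadata: compact and easy to use. Many packets are processed per system call.
    Linear, fixed-size packet buffers: preallocated when the device is opened, which removes the cost of allocating and freeing memory for every packet.
    Data-copy cost is reduced by granting applications protected access to the packet buffers.
    Support for useful hardware features (multiple queues, for example).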
  11. netmap
    In netmap mode the NIC rings are disconnected from the host stack and exchange packets through the netmap API.
    Two additional netmap rings let the application talk to the host stack.
    The netmap rings are implemented in shared memory.
    The OS is unaware of the disconnection and keeps using and managing the interface as usual.
    select(2)/poll(2) are used for synchronization.
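  12. Data Structures (1)
    Key goals of the netmap components: reduce per-packet overhead, forward quickly between interfaces, exchange data efficiently between the NIC and the OS stack, and support multi-queue adapters and multiple cores.
    Each interface is associated with three types of objects: packet buffers, netmap rings, and netmap_if.
    All objects live in a dedicated memory region inside the kernel that is shared by all user processes.
    Using a single region is convenient for zero-copy forwarding.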
  13. Data Structures (2)
    packet buffers (pkt_buf): fixed size, shared by the NIC and user processes. When an interface enters netmap mode, the buffers for all of its netmap rings are preallocated (they are never reallocated).
    netmap ring: a device-independent replica of a NIC ring. ring_size: number of slots. cur: current read/write position in the ring. avail: number of available buffers. Also buf_ofs and slots.
    netmap_if: read-only information.
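The relationship between these objects can be pictured with a small model. This is only an illustrative sketch in Python, not the real C layout: the field names `ring_size`, `cur`, `avail`, `buf_ofs`, and `slots` come from the slide, while `Slot`, `buf_index`, `length`, `make_ring`, and the 2048-byte `BUF_SIZE` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List

BUF_SIZE = 2048  # assumed fixed buffer size, for illustration only


@dataclass
class Slot:
    buf_index: int    # which packet buffer this slot refers to
    length: int = 0   # length of the packet stored in that buffer


@dataclass
class NetmapRing:
    ring_size: int        # number of slots
    cur: int = 0          # current read/write position
    avail: int = 0        # number of available buffers
    buf_ofs: int = 0      # offset from the ring to the buffer area (unused here)
    slots: List[Slot] = field(default_factory=list)


def make_ring(ring_size, first_buf, pool):
    """Preallocate one fixed-size buffer per slot; nothing is allocated later."""
    for _ in range(ring_size):
        pool.append(bytearray(BUF_SIZE))
    slots = [Slot(buf_index=first_buf + i) for i in range(ring_size)]
    return NetmapRing(ring_size=ring_size, avail=ring_size, slots=slots)


pool = []                            # one buffer pool shared by every ring
tx = make_ring(4, len(pool), pool)   # slots refer to buffers 0..3
rx = make_ring(4, len(pool), pool)   # slots refer to buffers 4..7
```

Because a slot stores only an index into the shared pool, a ring never owns its buffers, which is the property the zero-copy forwarding slide relies on.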
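  14. The netmap API
    Opening /dev/netmap and issuing ioctl(fd, NIOCREG, arg) puts the interface into netmap mode.
    mmap(2) makes the shared memory accessible from the process's address space.
    ioctl(2) handles sending and receiving packets. ioctl(fd, NIOCTXSYNC) tells the OS about new packets to send. ioctl(fd, NIOCRXSYNC) asks the OS how many packets are available to read.
    Both calls are non-blocking, involve no data copying (other than synchronizing the netmap and hardware rings), and can handle multiple packets at once, which reduces per-packet overhead.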
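To see why one sync call can cover many packets, here is a toy model of the transmit side. It is a sketch under loud assumptions: `ToyTxRing`, `put`, `txsync`, and `send_all` are hypothetical names invented for this example, the "wire" is just a Python list, and transmission is treated as instantaneous. The real interface is the ioctl on the /dev/netmap file descriptor described above.

```python
class ToyTxRing:
    """Toy transmit ring: the application fills slots, one sync flushes them."""

    def __init__(self, ring_size):
        self.ring_size = ring_size
        self.cur = 0                 # next slot the application will fill
        self.avail = ring_size       # free slots left
        self.slots = [None] * ring_size
        self.pending = 0             # filled slots not yet handed over
        self.wire = []               # stand-in for the NIC
        self.sync_calls = 0          # how many "system calls" were made

    def put(self, payload):
        """Fill the current slot; no system call is involved."""
        if self.avail == 0:
            return False
        self.slots[self.cur] = payload
        self.cur = (self.cur + 1) % self.ring_size
        self.avail -= 1
        self.pending += 1
        return True

    def txsync(self):
        """Hand every pending slot to the 'NIC' in a single call."""
        self.sync_calls += 1
        start = (self.cur - self.pending) % self.ring_size
        for k in range(self.pending):
            self.wire.append(self.slots[(start + k) % self.ring_size])
        self.avail += self.pending
        self.pending = 0


def send_all(ring, packets):
    for pkt in packets:
        if not ring.put(pkt):   # ring full: flush once, then retry
            ring.txsync()
            ring.put(pkt)
    ring.txsync()               # flush whatever is left


ring = ToyTxRing(ring_size=4)
packets = [f"pkt{i}".encode() for i in range(10)]
send_all(ring, packets)
print(ring.sync_calls, "sync calls for", len(ring.wire), "packets")
```

In this model the number of sync calls depends on the ring size rather than on the number of packets, which is the per-packet saving the slide refers to.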
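  15. Talking to the host stack
    A netmap client communicates with the OS stack through two netmap rings.
    Packets the client sends on these rings are handed to the OS stack as if they had arrived from the physical interface.
    Packets coming from the OS stack are queued on the netmap ring.
    The netmap client is responsible for making sure packets are passed correctly between the netmap rings connected to the OS stack and those connected to the NIC.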
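  16. Zero-copy packet forwarding
    Zero-copy forwarding between rings only requires swapping the buffer indices of the receive slot and the transmit slot, which works because all buffers live in the same region.
    The swap enqueues the packet on the output ring and, at the same time, refills the input ring with an empty buffer, so no memory has to be reallocated.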
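The index swap can be sketched in a few lines. This is an illustrative model, not netmap code: slots are plain dictionaries, and the keys `buf_index` and `len` as well as the function name `forward_zero_copy` are assumptions made for the example.

```python
def forward_zero_copy(rx_slot, tx_slot):
    """Move a packet from an input slot to an output slot by swapping indices."""
    rx_slot["buf_index"], tx_slot["buf_index"] = (
        tx_slot["buf_index"],
        rx_slot["buf_index"],
    )
    tx_slot["len"] = rx_slot["len"]   # the output slot now describes the packet
    rx_slot["len"] = 0                # the input slot now holds an empty buffer


# One shared buffer pool: buffer 0 holds a received packet, buffer 1 is free.
buffers = [bytearray(b"incoming packet"), bytearray(16)]
rx_slot = {"buf_index": 0, "len": len(buffers[0])}
tx_slot = {"buf_index": 1, "len": 0}

received = buffers[0]                 # keep a reference to check identity later
forward_zero_copy(rx_slot, tx_slot)

# The transmit slot refers to the very same buffer object: no bytes were copied.
print(buffers[tx_slot["buf_index"]] is received)
```

Only two small integers change hands per packet, regardless of the packet's size.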
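  17. Implementation
    In the FreeBSD implementation, the current version is about 2000 lines of code for the system calls (ioctl, select/poll) and common driver support, not counting the per-driver patches. Each device driver patch is about 500 lines.
    To keep device driver changes minimal, most of the functionality is implemented in code shared by all drivers. The changes in each driver come down to two things: re-initialization when netmap mode is enabled, and handing the driver's locking over to the common code.
    In netmap the device driver does most of its work during the execution of a system call. This improves cache locality and simplifies resource management, and there is no need to worry about running too much code in non-interruptible contexts.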
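  18. Performance metrics
    Packet processing involves several subsystems: CPU pipelines, caches, memory, and I/O buses.
    The target applications are CPU-bound, so CPU cost is what is measured. The measurements cover moving packets between the application and the NIC.
    Per-byte costs: CPU cycles spent moving data to and from the NIC's buffers.
    Per-packet costs: a NIC ring slot must be updated for every packet; also memory allocation, system calls, ...
    Very simple test programs: a packet generator, a packet receiver.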
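  19. Test equipment
    netmap is efficient enough to saturate a 10 Gbps link, so the CPU clock is lowered for the tests.
    Transmit bandwidth is measured with the packet generator. The number of queues, the number of cores/threads, and the packet size are set at run time.
    Packets are prepared in advance, so the per-byte cost is close to zero.
    CPU: i7-870, 4 cores, 2.93 GHz. Memory: 1.33 GHz. NIC: dual port 10 Gbps. netmap: FreeBSD HEAD/amd64, Apr 2012.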
  20. Transmit speed vs. clock rate
    Transmit performance with 64-byte packets while varying the clock rate and the number of cores.
    With one core, throughput scales with the clock rate up to 900 MHz (60-65 cycles/packet).
    In this test the per-packet work consists of only two steps: validating the contents of the netmap ring slot, and updating the corresponding NIC ring slot.
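  21. Speed vs. packet size
    Throughput as a function of packet size, for both transmit and receive.
    The receive side shows a surprising curve at 150 bytes and below: the maximum rate is reached only when the size is a multiple of 64 bytes. Are cache lines involved?
    (I have not understood the explanation of this part.)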
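  22. Transmit speed vs. batch size
    Checks the performance gain from handling packets in batches; the cost of system calls and of accesses to NIC registers should be amortized.
    batch size = number of packets handled per call.
    batch size = 1: 2.45 Mpps (408 ns/pkt).
    batch size = 8: 14.88 Mpps.
    A standard poll(2) on FreeBSD costs about 250 ns per call, so handling multiple packets per call is essential.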
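The numbers on this slide can be turned into per-packet time budgets to show why batching is unavoidable. The sketch below uses only the figures quoted above (2.45 Mpps, 14.88 Mpps, 250 ns per poll(2)); the helper names are illustrative, and the model deliberately ignores every cost other than the system call itself.

```python
import math


def ns_per_packet(mpps):
    """Time available per packet, in nanoseconds, at a given rate in Mpps."""
    return 1e3 / mpps


POLL_NS = 250.0                        # cost of one poll(2) call, from the slide

batch1_ns = ns_per_packet(2.45)        # measured with batch size 1
line_rate_ns = ns_per_packet(14.88)    # budget per packet at 10 Gbps line rate

# With one poll(2) per packet, the call alone caps the rate below line rate.
cap_mpps = 1e3 / POLL_NS

# Smallest batch for which the amortized poll cost fits in the line-rate budget
# (a lower bound only, since all other per-packet work is ignored).
min_batch = math.ceil(POLL_NS / line_rate_ns)

print(f"batch=1: {batch1_ns:.0f} ns/pkt, line rate: {line_rate_ns:.1f} ns/pkt")
print(f"one poll per packet caps throughput at {cap_mpps:.1f} Mpps")
print(f"poll cost alone needs a batch of at least {min_batch}")
```

A single 250 ns call is several times larger than the roughly 67 ns available per packet at line rate, so the call has to be shared among several packets; the measured result reaches line rate at a batch size of 8.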
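  23. Packet forwarding performance
    Forwarding performance between the wire and the application. The previous slides did not include application overhead.
    Compared across APIs other than the netmap API and across a variety of applications.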

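  24. Conclusions and future work
    netmap gives userspace applications a fast channel for sending and receiving raw packets. It does not depend on special hardware, and it is safe and easy to use.
    The experiments show that it can deliver large performance improvements to a wide range of applications that use low-level packet I/O.
    Future work: adopting netmap's features in the OS network stack, and supporting efficient network processing in virtualized environments.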
  25. http://info.iet.unipi.it/~luigi/netmap/ netmap project page