Accelerating Network I/O in the 10GbE Era



Takuya ASADA

June 07, 2013

Transcript

  1. Accelerating Network I/O in the 10GbE Era Takuya ASADA <syuu@freebsd.org>

  2. Slide URL •http://slidesha.re/16OV9Yx

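  3. Introduction
    • NICs that support extremely fast links such as 10GbE and 40GbE are now being used in PC servers as well
    • Processing traffic at these speeds in software (the OS) with high performance runs into various obstacles, so both the hardware and the software implementations need to be revisited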

  4. Today's topics
    1. Too many interrupts
    2. Protocol processing is heavy
    3. Processing packets on multiple CPUs
    4. Reducing latency caused by data movement
    5. Network I/O that bypasses the protocol stack
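  5. 1. Too many interrupts
    [Figure: packet receive path. Layers: Process(User), Process(Kernel), SW Intr Handler, HW Intr Handler. Labels: hardware interrupt, packet receive, input queue, software interrupt schedule, protocol processing, socket queue, process wakeup, socket receive processing, system call, copy to user space, user buffer, user program]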
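  6. Too many interrupts
    • As NIC performance has improved, the number of packets a NIC can handle in a given time has grown dramatically
    • If an interrupt arrives for every single packet, heavy traffic causes too many context switches and performance degrades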

  7. Traditional packet receive processing
    [Figure: the same receive path diagram as slide 5, annotated with the sequence: hardware interrupt ↓ enqueue into the receive queue ↓ schedule the software interrupt]
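  8. Traditional packet receive processing
    • An interrupt is taken and handled every time one packet is received
    • Maximum number of 64-byte frames that can be received:
      • GbE: about 1.5 Mpps (1.5 million per second)
      • 10GbE: about 15 Mpps (15 million per second)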
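The packet rates on this slide follow from Ethernet framing overhead. The sketch below is one way to reproduce them; the 8-byte preamble/SFD and 12-byte inter-frame gap are standard Ethernet values assumed here (the slide does not state them), and `max_packets_per_second` is a name chosen for illustration.

```python
# Per-frame overhead on the wire that is not part of the 64-byte frame itself.
PREAMBLE_SFD_BYTES = 8     # preamble + start-of-frame delimiter (assumed, standard Ethernet)
INTERFRAME_GAP_BYTES = 12  # minimum idle gap between frames (assumed, standard Ethernet)


def max_packets_per_second(link_bps: float, frame_bytes: int = 64) -> float:
    """Upper bound on frames per second at line rate for a given frame size."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


if __name__ == "__main__":
    for name, bps in (("GbE", 1e9), ("10GbE", 10e9)):
        print(f"{name}: {max_packets_per_second(bps) / 1e6:.2f} Mpps")
```

Under these assumptions the bound is roughly 1.49 Mpps for GbE and 14.88 Mpps for 10GbE, which matches the rounded figures on the slide.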
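  9. Disable interrupts?
    • Polling
      • Disable NIC interrupts and instead check the receive queue periodically using the clock interrupt
      • Drawbacks: latency goes up, and the CPU has to be woken periodically
    • Hybrid
      • Disable interrupts and run in polling mode only while traffic is heavy and packets are being processed back to back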

  10. NAPI (hybrid approach)
    [Figure: receive path diagram in which the hardware interrupt handler disables interrupts and schedules the software interrupt, which then repeats packet receive until no packets remain. Annotated sequence: hardware interrupt ↓ disable interrupts & start polling ↓ re-enable interrupts once no packets remain]
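The hybrid scheme can be illustrated with a toy model. This is only a sketch of the idea described on the slide, not the Linux NAPI API: `count_interrupts`, the tick-based arrival list, and the `budget` parameter are all assumptions made for the example.

```python
from collections import deque


def count_interrupts(arrivals, budget=64, hybrid=True):
    """Count hardware interrupts for a list of per-tick packet arrivals.

    arrivals[i] is the number of packets arriving during tick i.
    Per-packet mode: every packet raises an interrupt.
    Hybrid mode: the first packet after an idle period raises an interrupt,
    interrupts are then masked, and the queue is polled (at most `budget`
    packets per tick) until it is empty, when interrupts are unmasked again.
    """
    queue = deque()
    interrupts = 0
    polling = False
    for n in arrivals:
        for _ in range(n):
            queue.append(object())
            if not hybrid:
                interrupts += 1
            elif not polling:
                interrupts += 1   # first packet after idle: take the interrupt
                polling = True    # then mask further interrupts
        if hybrid and polling:
            for _ in range(min(budget, len(queue))):
                queue.popleft()   # stand-in for protocol processing
            if not queue:
                polling = False   # queue drained: unmask interrupts
        elif not hybrid:
            queue.clear()         # per-packet mode handles packets immediately
    return interrupts
```

With a sustained burst such as `[100] * 10`, the per-packet mode counts one interrupt per packet while the hybrid mode takes a single interrupt and stays in polling; with sparse traffic the two modes behave the same.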
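  11. Interrupt Coalescing
    • The NIC thins out interrupts to take the load on the OS into account
    • It interrupts once every few packets, or waits a fixed period before interrupting
    • Drawback: latency goes up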

  12. Effect of Interrupt Coalescing
    • Compared on Intel 82599 (ixgbe) with Interrupt Coalescing disabled and enabled (automatic interrupt-rate tuning)
    • MultiQueue, GRO, LRO, etc. disabled
    • Measured with iperf in TCP mode

              interrupts   throughput  packets       CPU%(sy+si)
    disabled  46687 int/s  7.82 Gbps   660386 pkt/s  97.6%
    enabled    7994 int/s  8.24 Gbps   711132 pkt/s  79.6%
  13. 2. Protocol processing is heavy
    [Figure: NAPI receive path diagram (disable interrupts, software interrupt schedule, packet receive repeated until no packets remain, protocol processing, socket queue, process wakeup, socket receive processing, copy to user space, user buffer)]
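  14. Protocol processing is heavy
    • A large amount of CPU time is spent on protocol processing, especially when many small packets arrive
    • The protocol stack is invoked once per packet
      Example: with 64-byte frames, the theoretical maximum is 15 million calls/s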
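  15. TOE (TCP Offload Engine)
    • Stop doing protocol processing in the OS and do it on the NIC instead
    • Drawbacks
      • Security: if a security hole appears in the TOE, the OS side cannot do anything about it
      • Complexity: replacing the OS network stack with a TOE requires fairly wide-ranging changes, and TOE implementations differ between vendors, which makes a common interface hard to define
    • Linux: no plans to support it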
  16. Checksum Offloading
    • IP, TCP, and UDP checksums are computed on the NIC

  17. Effect of Checksum Offloading
    • Compared on Intel 82599 (ixgbe)
    • Measured with iperf in TCP mode
    • MultiQueue disabled
    • ethtool -K ix0 rx off

              throughput  CPU%(sy+si)
    disabled  8.27 Gbps   86
    enabled   8.27 Gbps   85.2
  18. LRO (Large Receive Offload)
    • The NIC merges received TCP packets into one large packet before handing it to the OS
    • Reduces the number of protocol stack invocations
    • Linux also implements LRO in software (GRO)
  19. Without LRO
    • The network stack runs once per packet
    [Figure: four 1500-byte packets (seq 10000, seq 10001, seq 10002, seq 10003), each passed separately to the network stack]
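  20. With LRO
    • Packets are merged before the network stack runs, reducing the number of network stack executions
    [Figure: four 1500-byte packets (seq 10000, seq 10001, seq 10002, seq 10003) merged into one big packet, which is passed to the network stack]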
  21. Effect of GRO
    • Compared on Intel 82599 (ixgbe)
    • MultiQueue disabled
    • Measured with iperf in TCP mode
    • ethtool -K ix0 gro off

              packets       network stack called count  throughput  CPU%(sy+si)
    disabled  632139 pkt/s  632139 call/s               7.30 Gbps   97.6%
    enabled   712387 pkt/s   47957 call/s               8.25 Gbps   79.6%
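The merging step behind LRO/GRO can be sketched as follows. This is a simplified illustration and not the kernel's GRO code: it assumes byte-based TCP sequence numbers, a single flow, and a 65535-byte cap on the merged size, and the function name `coalesce` is hypothetical.

```python
def coalesce(segments, max_bytes=65535):
    """Merge in-order, back-to-back TCP segments of one flow.

    segments: list of (seq, payload_len) tuples in arrival order.
    Returns a list of (seq, payload_len) super-segments; the network stack
    would be invoked once per returned entry instead of once per input segment.
    """
    merged = []
    for seq, length in segments:
        if merged:
            last_seq, last_len = merged[-1]
            contiguous = last_seq + last_len == seq
            if contiguous and last_len + length <= max_bytes:
                merged[-1] = (last_seq, last_len + length)  # extend previous segment
                continue
        merged.append((seq, length))  # gap or size cap reached: start a new one
    return merged
```

In the measurement above, 712387 pkt/s against 47957 call/s means roughly 15 packets were handed to the stack per call on average.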
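  22. TSO (TCP Segmentation Offload)
    • The reverse of LRO
    • The OS sends the packet without fragmenting it, and the NIC splits it into MTU-sized packets
    • The OS can skip the packet segmentation work
    • Linux supports GSO in software and TSO/UFO in hardware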
  23. Effect of TSO
    • Compared on Intel 82599 (ixgbe)
    • MultiQueue disabled
    • Measured with iperf in TCP mode
    • ethtool -K ix0 gso off tso off

              packets       throughput  CPU%(sy+si)
    disabled  247794 pkt/s  2.87 Gbps   53.5%
    enabled   713127 pkt/s  8.16 Gbps   26.8%
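The work that TSO moves from the OS to the NIC is essentially the split below. It is a minimal sketch rather than how any driver does it: the MSS of 1448 bytes is an assumed value (a 1500-byte MTU minus IP and TCP headers with timestamps), header generation is omitted, and `segment` is a hypothetical name.

```python
def segment(seq, payload, mss=1448):
    """Split one large send into MSS-sized (seq, chunk) TCP segments."""
    return [
        (seq + offset, payload[offset:offset + mss])
        for offset in range(0, len(payload), mss)
    ]
```

With TSO enabled the OS hands the whole payload to the NIC once, and the NIC produces the per-segment packets on its own.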
  24. 3. Processing packets on multiple CPUs
    [Figure: the NAPI receive path diagram drawn twice, once for cpu0 and once for cpu1]
  25. Software interrupts concentrate on a single core

  26. What is a software interrupt?
    [Figure: NAPI receive path diagram with the software interrupt portion highlighted: everything from polling through protocol processing → the bulk of network I/O]
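  27. Why does it concentrate?
    The software interrupt is scheduled on the CPU that took the NIC interrupt
    ↓
    Everything from polling through protocol stack execution runs inside the software interrupt
    ↓
    Load falls only on the CPU that receives the NIC interrupts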

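  28. Software interrupts concentrate on a single core and performance suffers
    • Shows up in workloads that handle large numbers of short packets, such as memcached
    • The CPU running the software interrupts becomes the bottleneck and performance stops scaling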

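  29. Solution
    • What is needed is a mechanism that distributes packets across multiple CPUs before protocol processing
    • However, TCP guarantees ordering, so processing in parallel causes packet reordering and performance drops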


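  30. TCP Reordering
    • If packets arrive in sequence-number order, they only need to be copied into the buffer one after another, but...
    [Figure: in-order packets pass through protocol processing directly into the user buffer]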
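  31. TCP Reordering
    • If the order is disturbed, the packets have to be put back in order (reordering)
    [Figure: out-of-order packets are held in a reorder queue between protocol processing and the user buffer]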
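The extra work that reordering implies can be sketched with a small reorder queue. This is an illustrative model, not a real TCP implementation: the `ReorderBuffer` class and its fields are assumptions made for the example, and sequence-number wraparound is ignored.

```python
import heapq


class ReorderBuffer:
    """Deliver TCP payload to the user buffer strictly in sequence order."""

    def __init__(self, next_seq):
        self.next_seq = next_seq      # next byte the receiver expects
        self.pending = []             # min-heap of (seq, payload) held out of order
        self.delivered = bytearray()  # stand-in for the user buffer

    def receive(self, seq, payload):
        heapq.heappush(self.pending, (seq, payload))
        # Drain every queued segment that is now contiguous with delivered data.
        while self.pending and self.pending[0][0] <= self.next_seq:
            s, data = heapq.heappop(self.pending)
            if s + len(data) <= self.next_seq:
                continue              # pure duplicate, nothing new to deliver
            self.delivered += data[self.next_seq - s:]
            self.next_seq = s + len(data)
```

When segments arrive in order, `pending` never holds more than the segment being delivered; every out-of-order arrival adds queueing and a later drain, which is the overhead the slide refers to.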
  32. Solution (continued)
    • It works best when each flow is processed by a single CPU

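  33. RSS (Receive Side Scaling)
    • A NIC with a separate receive queue per CPU (called a MultiQueue NIC)
    • Each receive queue has its own independent interrupt
    • Packets belonging to the same flow go to the same queue, and packets belonging to different flows are spread across different queues as far as possible
      → The destination queue is decided by computing a hash of the packet header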
  34. MSI-X interrupts
    • Supported by PCI Express
    • A device can have up to 2048 IRQs
    • The target CPU can be chosen for each IRQ
      → One NIC can have as many IRQs as there are CPU cores
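  35. Packet distribution with RSS
    [Figure: packets arrive at the NIC, a hash is computed, the hash → queue table is looked up, and each packet is dispatched to one of RX Queue #0 to #3; each queue interrupts its own CPU (cpu0 to cpu3), which performs the receive processing]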
  36. Queue selection procedure
    indirection_table[64] = initial_value
    input[12] = {src_addr, dst_addr, src_port, dst_port}
    key = toeplitz_hash(input, 12)
    index = key & 0x3f
    queue = indirection_table[index]
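The procedure above can be written out as a runnable sketch. The Toeplitz hash below follows the usual definition (XOR the 32-bit window of the secret key at each set input bit); the secret key `RSS_KEY`, the four-queue round-robin `INDIRECTION_TABLE`, and the function name `select_queue` are illustrative assumptions, not values taken from the slides or from any particular NIC.

```python
import socket
import struct

# Hypothetical 40-byte secret key; real NICs are programmed with their own key.
RSS_KEY = bytes(range(1, 41))
# Hypothetical initial table: 64 entries spread round-robin over 4 RX queues.
INDIRECTION_TABLE = [i % 4 for i in range(64)]


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash; key must be at least len(data) + 4 bytes long."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                # 32-bit window of the key starting at this input bit position.
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def select_queue(src_addr, dst_addr, src_port, dst_port):
    """Map an IPv4 TCP 4-tuple to an RX queue number."""
    data = (socket.inet_aton(src_addr) + socket.inet_aton(dst_addr)
            + struct.pack("!HH", src_port, dst_port))   # the 12-byte input
    index = toeplitz_hash(RSS_KEY, data) & 0x3F          # low 6 bits: 64 entries
    return INDIRECTION_TABLE[index]


if __name__ == "__main__":
    print(select_queue("10.0.0.1", "10.0.0.2", 10000, 10001))
```

Because the queue depends only on the 4-tuple, every packet of a flow lands on the same queue, which is what keeps one flow on one CPU.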
  37. Before introducing RSS

  38. After introducing RSS

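  39. RPS
    • Goal: improve server performance by making good use of on-board NICs that do not support RSS
    • So implement RSS in software
    • Scatter packets to each CPU at the software interrupt stage
    • Use inter-CPU interrupts to put the other CPUs to work
    • A software emulation of RSS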
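  40. [Figure: RPS receive path across cpu0 to cpu3. One CPU takes the hardware interrupt, disables interrupts, receives packets in the software interrupt, computes the hash, looks up the hash → queue table, and dispatches packets to backlog #1, #2, #3 via inter-CPU interrupts; each CPU then runs protocol processing, socket queue, process wakeup, socket receive processing (system call), and the copy to the user buffer in user space]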
  41. How to use RPS
    # echo "f" > /sys/class/net/eth0/queues/rx-0/rps_cpus
    # echo 4096 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
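The value written to `rps_cpus` is a hexadecimal CPU bitmap, so "f" selects CPUs 0 to 3. A small helper, with the hypothetical name `cpus_in_mask`, makes the encoding explicit; it assumes the comma-separated grouping that sysfs uses for wide masks.

```python
def cpus_in_mask(mask_hex: str) -> list:
    """Return the CPU numbers whose bits are set in an rps_cpus-style hex bitmap."""
    value = int(mask_hex.replace(",", ""), 16)  # sysfs groups wide masks with commas
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


if __name__ == "__main__":
    print(cpus_in_mask("f"))
```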
  42. Before introducing RPS

  43. After introducing RPS

  44. RPS netperf result
    netperf benchmark result on lwn.net:

    e1000e on 8 core Intel
      Without RPS: 90K tps at 33% CPU
      With RPS:    239K tps at 60% CPU

    forcedeth on 16 core AMD
      Without RPS: 103K tps at 15% CPU
      With RPS:    285K tps at 49% CPU
  45. RFS
    • Adds process tracking to RPS

  46. RFS
    Overhead occurs when the queue assigned to a flow differs from the CPU of the destination process


  47. RFS
    The CPUs can be made to match by changing the entries in the hash table

  48. How to use RFS
    # echo "f" > /sys/class/net/eth0/queues/rx-0/rps_cpus
    # echo 4096 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
    # echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
  49. RFS netperf result
    netperf benchmark result on lwn.net:

    e1000e on 8 core Intel
      No RFS or RPS:             104K tps at 30% CPU
      No RFS (best RPS config):  290K tps at 63% CPU
      RFS:                       303K tps at 61% CPU

    RPC test        tps   CPU%  50/90/99% usec latency  StdDev
    No RFS or RPS   103K  48%   757/900/3185            4472.35
    RPS only        174K  73%   415/993/2468            491.66
    RFS             223K  73%   379/651/1382            315.61
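  50. Accelerated RFS
    • A NIC driver extension that makes RFS work on MultiQueue NICs as well
    • The Linux kernel notifies the NIC driver of the CPU on which the process is running
    • On notification, the NIC driver updates the queue assignment for the flow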
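  51. Limitations of Receive Side Scaling
    • If the 32-bit hash value were used as is, collisions would be unlikely, but because the Indirection Table is small, the index is masked down to a small number of bits
      → Hash collisions occur when there are many flows
    • Not well suited to Accelerated RFS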
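How quickly a small table fills up can be estimated with a birthday-style calculation. The sketch assumes flows hash uniformly and independently into the table, which is an idealization; the 64-entry default mirrors the table size used in the queue selection slide, and `prob_all_distinct` is a name chosen for illustration.

```python
def prob_all_distinct(flows: int, table_size: int = 64) -> float:
    """Probability that `flows` uniformly hashed flows all hit distinct table entries."""
    p = 1.0
    for i in range(flows):
        p *= max(0.0, 1.0 - i / table_size)  # i entries are already taken
    return p


if __name__ == "__main__":
    for n in (4, 8, 16, 32, 65):
        print(n, round(prob_all_distinct(n), 4))
```

Once the number of flows exceeds the table size, a collision is certain, so an individual flow can no longer be steered without also moving the flows that share its entry.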
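  52. Flow Steering
    • Stores the mapping between flows and queues, configured in a form such as "4-tuple: queue number"
    • There is no clear common specification as there is for RSS, but it is implemented in each vendor's 10GbE NICs
    • Accelerated RFS assumes Flow Steering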
  53. Configuring filters manually with Flow Steering
    # ethtool --config-nfc ix00 flow-type tcp4 src-ip 10.0.0.1 dst-ip 10.0.0.2 src-port 10000 dst-port 10001 action 6
    Added rule with ID 2045
  54. XPS
    • MultiQueue NICs also have multiple transmit queues
    • XPS is the interface for deciding how CPUs are assigned to transmit queues

  55. How to use XPS
    # echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
    # echo 2 > /sys/class/net/eth0/queues/tx-1/xps_cpus
    # echo 4 > /sys/class/net/eth0/queues/tx-2/xps_cpus
    # echo 8 > /sys/class/net/eth0/queues/tx-3/xps_cpus
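The values 1, 2, 4, 8 are one-bit CPU masks, pairing tx-N with CPU N. For a NIC with more queues the same pattern can be generated; `xps_commands` is a hypothetical helper, and the one-queue-per-CPU pairing is simply the layout used on this slide rather than a requirement of XPS.

```python
def xps_commands(dev: str, num_queues: int) -> list:
    """Build one shell command per TX queue, pairing queue tx-N with CPU N."""
    return [
        # The mask is written in hexadecimal, so CPU 4 becomes "10", not "16".
        f"echo {1 << q:x} > /sys/class/net/{dev}/queues/tx-{q}/xps_cpus"
        for q in range(num_queues)
    ]


if __name__ == "__main__":
    print("\n".join(xps_commands("eth0", 4)))
```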
  56. 4. Reducing latency caused by data movement

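  57. Reducing latency caused by data movement
    • In some cases the overhead of moving data between NIC ⇔ memory ⇔ CPU cache is heavier than the protocol processing itself
    • Memory access in particular is slow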

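  58. Intel Data Direct I/O Technology
    • Packet data that the NIC has written by DMA always causes a cache miss the first time the CPU accesses it
      ↓
    • So DMA it straight into the CPU's LLC (L3 cache)!
    • Supported on newer Xeons with Intel 10GbE
    • No OS support required (the hardware provides the feature transparently)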
  59. Copying is heavy
    [Figure: traditional receive path diagram (hardware interrupt, packet receive, input queue, software interrupt schedule, protocol processing, socket queue, process wakeup, socket receive processing, system call, copy to user space, user buffer)]
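  60. Copying is heavy, but zero-copy is hard
    • The NIC's DMA buffers can be configured per queue, but not per flow
      → It cannot work unless a single application can own a queue exclusively
    • It cannot work unless buffers are aligned and allocated in page-size units
    • Unless the packet header and payload are separated, the packet headers get written into the buffer as well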
  61. Intel QuickData Technology
    • (Also called Intel I/O AT)
    • DMA transfer from the NIC buffer → the application buffer
    • Reduces CPU load
    • Implemented in the chipset
    • CONFIG_NET_DMA=y in Linux
  62. 5. Network I/O that bypasses the protocol stack

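  63. Network I/O that bypasses the protocol stack
    • If neither protocol processing nor the Socket API is required, network I/O can be made much faster
    • For specific use cases
      • Applications that do not need protocol processing
        → snort, OpenvSwitch, etc.
      • Applications that want higher performance even if they have to do protocol processing themselves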
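  64. Basic mechanism
    • Use a dedicated NIC driver and a dedicated library to MMAP the NIC's receive buffers
    • Poll for packets
    • Run application-specific processing on the packets
    [Figure: the NIC's RX1, RX2, RX3 rings pass through the kernel driver and are MMAPed into the App as RX1, RX2, RX3; the App polls for packets and does some work]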
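  65. How does it differ from RAW sockets and BPF?
    • Zero-copy is the baseline
    • The multi-queue receive buffers are exported to userland as they are
    • As a result, multithreaded performance is high
      (RAW sockets and BPF are single-threaded)
    • The NIC drivers are modified to provide the features above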
  66. Intel DPDK
    • Drops interrupts in favor of polling to reduce overhead
    • Uses HugePages for the receive buffers to reduce TLB misses
    • L3 forwarding performance with 64-byte packets (from Intel materials)
      • Linux network stack: Xeon E5645 x 2 → 12.2Mpps
      • DPDK: Xeon E5645 x 1 → 35.2Mpps
      • DPDK: Next generation Intel Processor x 1 → 80Mpps
    • OpenvSwitch support
    • Supported NICs: Intel

  67. Similar implementations
    • PF_RING DNA
      Implemented by ntop, for Linux
      libpcap support
      Supported NICs: Intel
    • Netmap
      Implemented for FreeBSD, with a Linux port also available
      libpcap and OpenvSwitch support
      Supported NICs: Intel, Realtek...
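  68. Summary
    • Introduced the various improvements that have been made to handle high-speed network I/O
    • Implementations have to be revisited on both the hardware and the software side, and the scope extends even to areas not directly related to networking
    • Something you can do starting tomorrow:
      first, make sure the NIC you install in your server is a "multi-queue NIC" with "RSS support"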