Speeding Up Network I/O in the 10GbE Era (10GbE時代のネットワークI/O高速化)

Takuya ASADA

June 07, 2013

Transcript

  1. Speeding Up Network I/O in the 10GbE Era
    Takuya ASADA

  2. Slide URL
    •http://slidesha.re/16OV9Yx

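  3. Introduction
    • NICs that support extremely fast links such as 10GbE and 40GbE are
    now being used in PC servers as well.

    • Handling traffic at these speeds in software (the OS) with good
    performance runs into a variety of obstacles, so implementations
    have to be revisited on both the hardware and the software side.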

  4. Today's topics
    1. Too many interrupts

    2. Protocol processing is expensive

    3. Processing packets on multiple CPUs

    4. Reducing the latency of data movement

    5. Network I/O that bypasses the protocol stack

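  5. 1. Too many interrupts
    [Diagram: packet receive path across four contexts.
    HW Intr Handler: hardware interrupt, packet receive, input queue,
    schedule a software interrupt.
    SW Intr Handler: protocol processing, socket queue, process wakeup.
    Process (Kernel): system call, socket receive processing, copy to
    user space.
    Process (User): user program, user buffer.]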

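  6. Too many interrupts
    • As NIC performance has improved, the number of packets a NIC can
    handle per unit of time has grown dramatically.

    • If every packet raises an interrupt, heavy traffic causes so many
    context switches that performance degrades.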

  7. Traditional packet receive processing
    [Diagram: the same receive path as slide 5.]

    Hardware interrupt
    ↓
    Enqueue the packet on the receive queue
    ↓
    Schedule a software interrupt

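  8. Traditional packet receive processing
    • An interrupt is taken and handled for every packet received.

    • Maximum number of 64-byte frames that can be received per second:

      • GbE: about 1.5 Mpps (1.5 million)

      • 10GbE: about 15 Mpps (15 million)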
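
The frame rates above follow from simple arithmetic. The sketch below assumes standard Ethernet framing, where each 64-byte frame is accompanied on the wire by a 7-byte preamble, a 1-byte start-of-frame delimiter, and a 12-byte inter-frame gap; the function name is illustrative and not from the slides.

```python
# Bytes on the wire per frame that are not part of the 64-byte frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 7 + 1 + 12


def max_frames_per_second(link_bps: float, frame_bytes: int = 64) -> float:
    """Upper bound on frames/s for back-to-back frames of the given size."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


gbe = max_frames_per_second(1e9)
ten_gbe = max_frames_per_second(10e9)
print(f"GbE:   {gbe / 1e6:.2f} Mpps")
print(f"10GbE: {ten_gbe / 1e6:.2f} Mpps")
```

This gives roughly 1.49 Mpps and 14.88 Mpps, which the slide rounds to 1.5 and 15 Mpps.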

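  9. Disable interrupts?
    • Polling

      • Disable NIC interrupts and instead check the receive queue
      periodically from the clock interrupt.

      • Drawbacks: latency increases, and the CPU has to be woken up
      periodically.

    • Hybrid

      • Disable interrupts and run in polling mode only while traffic is
      heavy and packets are being processed continuously.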

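  10. NAPI (hybrid approach)
    [Diagram: receive path with NAPI. The hardware interrupt handler
    disables interrupts and schedules a software interrupt. The software
    interrupt handler receives packets and runs protocol processing,
    repeating until no packets remain. The rest of the path (socket
    queue, process wakeup, system call, copy to user space) is the same
    as in slide 5.]

    Hardware interrupt
    ↓
    Disable interrupts and start polling
    ↓
    Re-enable interrupts once no packets remain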
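
The hybrid scheme can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the kernel's NAPI code: `ToyNic`, `poll`, and the `budget` parameter are hypothetical names, and the "interrupt" is just a counter.

```python
from collections import deque


class ToyNic:
    """Toy model of a NIC with one receive ring and a maskable interrupt."""

    def __init__(self):
        self.rx_ring = deque()
        self.irq_enabled = True
        self.irq_count = 0
        self.poll_pending = False

    def packet_arrives(self, pkt):
        self.rx_ring.append(pkt)
        if self.irq_enabled:
            # Hardware interrupt: mask further interrupts, schedule polling.
            self.irq_count += 1
            self.irq_enabled = False
            self.poll_pending = True


def poll(nic, budget, handle):
    """Process up to `budget` packets; unmask the interrupt once the ring is empty."""
    done = 0
    while nic.rx_ring and done < budget:
        handle(nic.rx_ring.popleft())
        done += 1
    if not nic.rx_ring:
        nic.irq_enabled = True
        nic.poll_pending = False
    return done


nic = ToyNic()
received = []
for seq in range(10):        # a burst arrives before the software interrupt runs
    nic.packet_arrives(seq)
while nic.poll_pending:      # keep polling until the ring drains
    poll(nic, budget=4, handle=received.append)
print(nic.irq_count, len(received))
```

In this model a burst of ten packets costs one interrupt instead of ten, because packets that arrive while the interrupt is masked are picked up by polling.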

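  11. Interrupt Coalescing
    • The NIC thins out interrupts with the load on the OS in mind.

    • It interrupts once every several packets, or waits for a fixed
    period before interrupting.

    • Drawback: latency increases.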
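
A rough way to see the effect of the two coalescing triggers (a packet count and a wait time) is to count interrupts for a given arrival pattern. The model below is a simplified sketch; `max_frames` and `max_wait_us` are hypothetical parameter names, and real NICs differ in how they implement the timer.

```python
def count_interrupts(arrival_us, max_frames, max_wait_us):
    """Count interrupts when the NIC fires after `max_frames` pending packets,
    or once `max_wait_us` microseconds have passed since the first pending one."""
    interrupts = 0
    pending = 0
    first_pending_us = None
    for t in arrival_us:
        # The timer expired before this packet arrived: flush what was pending.
        if pending and t - first_pending_us >= max_wait_us:
            interrupts += 1
            pending = 0
        if pending == 0:
            first_pending_us = t
        pending += 1
        if pending >= max_frames:
            interrupts += 1
            pending = 0
    if pending:
        interrupts += 1  # the timer eventually fires for the tail
    return interrupts


arrivals = list(range(100))  # one packet per microsecond
per_packet = count_interrupts(arrivals, max_frames=1, max_wait_us=1000)
by_count = count_interrupts(arrivals, max_frames=8, max_wait_us=1000)
by_timer = count_interrupts(arrivals, max_frames=1000, max_wait_us=10)
print(per_packet, by_count, by_timer)
```

The price is visible in the same model: a packet may sit in the ring for up to `max_wait_us` before the OS hears about it, which is the latency drawback noted on the slide.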

  12. Effect of Interrupt Coalescing
    • Compared Interrupt Coalescing disabled vs. enabled (automatic
    interrupt-rate tuning) on an Intel 82599 (ixgbe).

    • MultiQueue, GRO, LRO, etc. were disabled.

    • Measured with iperf in TCP mode.

                interrupts    throughput   packets        CPU% (sy+si)
    Disabled    46687 int/s   7.82 Gbps    660386 pkt/s   97.6%
    Enabled     7994 int/s    8.24 Gbps    711132 pkt/s   79.6%

  13. 2. Protocol processing is expensive
    [Diagram: the NAPI receive path from slide 10.]

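  14. Protocol processing is expensive
    • Especially when large numbers of small packets arrive, protocol
    processing consumes a lot of CPU time.

    • The protocol stack is invoked once per packet.

    Example: with 64-byte frames

    → the theoretical maximum is 15 million calls/s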

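  15. TOE
    (TCP Offload Engine)
    • Stop doing protocol processing in the OS and do it on the NIC.

    • Drawbacks

      • Security: if a security hole turns up in the TOE, the OS cannot
      do anything about it.

      • Complexity: replacing the OS network stack with a TOE requires
      fairly wide-ranging changes. TOE implementations also differ
      between vendors, which makes a common interface hard to define.

    • Linux: no plans to support it.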

  16. Checksum Offloading

    • The NIC computes the IP, TCP, and UDP checksums.
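
For reference, the work being offloaded is the 16-bit one's-complement Internet checksum described in RFC 1071. The sketch below computes it in software over an IPv4 header; the sample header bytes are an illustrative example, not data from the slides.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample 20-byte IPv4 header with the checksum field (bytes 10-11) set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(header)
print(hex(checksum))

# A receiver sums the header including the checksum and expects zero.
filled = header[:10] + checksum.to_bytes(2, "big") + header[12:]
print(internet_checksum(filled))
```

The TCP and UDP checksums use the same sum but cover the whole payload, so the per-byte cost grows with packet size when the CPU has to do it.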

  17. Effect of Checksum Offloading
    • Compared on an Intel 82599 (ixgbe).

    • Measured with iperf in TCP mode.

    • MultiQueue was disabled.

    • ethtool -K ix0 rx off

                throughput   CPU% (sy+si)
    Disabled    8.27 Gbps    86
    Enabled     8.27 Gbps    85.2

  18. LRO
    (Large Receive Offload)
    • The NIC merges received TCP packets into one large packet before
    handing it to the OS.

    • Reduces the number of protocol stack invocations.

    • Linux implements LRO in software (GRO).

  19. Without LRO
    • The network stack runs once per packet.

    [Diagram: four 1500-byte packets (seq 10000, seq 10001, seq 10002,
    seq 10003), each passed to the network stack separately.]

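  20. With LRO
    • Packets are merged before the network stack runs, which reduces
    the number of network stack invocations.

    [Diagram: four 1500-byte packets (seq 10000, seq 10001, seq 10002,
    seq 10003) merged into one big packet, which is passed to the
    network stack once.]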
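
The merging step can be sketched as follows. This is a simplified illustration, not how a NIC or the Linux GRO code is written: `Segment` and `coalesce` are hypothetical names, sequence numbers count bytes (the slide's seq 10000, 10001, ... are simplified labels), and real implementations also check TCP flags, options, and timestamps before merging.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flow: tuple      # (src_addr, dst_addr, src_port, dst_port)
    seq: int         # sequence number of the first payload byte
    payload: bytes


def coalesce(segments, max_bytes=65535):
    """Merge back-to-back segments of the same flow into larger ones."""
    merged = []
    for seg in segments:
        last = merged[-1] if merged else None
        if (last is not None
                and last.flow == seg.flow
                and last.seq + len(last.payload) == seg.seq
                and len(last.payload) + len(seg.payload) <= max_bytes):
            last.payload += seg.payload       # contiguous: extend the big packet
        else:
            merged.append(Segment(seg.flow, seg.seq, seg.payload))
    return merged


flow = ("10.0.0.1", "10.0.0.2", 10000, 10001)
wire = [Segment(flow, 10000 + i * 1460, bytes(1460)) for i in range(4)]
to_stack = coalesce(wire)
print(len(wire), "segments on the wire ->", len(to_stack), "stack invocation(s)")
```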

  21. Effect of GRO
    • Compared on an Intel 82599 (ixgbe).

    • MultiQueue was disabled.

    • Measured with iperf in TCP mode.

    • ethtool -K ix0 gro off

                packets        network stack calls   throughput   CPU% (sy+si)
    Disabled    632139 pkt/s   632139 call/s         7.30 Gbps    97.6%
    Enabled     712387 pkt/s   47957 call/s          8.25 Gbps    79.6%

  22. TSO
    (TCP Segmentation Offload)
    • The reverse of LRO.

    • The OS sends the packet without splitting it; the NIC splits it
    into MTU-sized packets.

    • The OS can skip the packet segmentation work.

    • Linux supports GSO in software and TSO/UFO in hardware.
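
The segmentation that TSO moves onto the NIC amounts to cutting one large send into MSS-sized pieces with advancing sequence numbers. A minimal sketch, assuming an MSS of 1460 bytes (1500-byte MTU minus 40 bytes of IPv4 and TCP headers without options); header construction and checksums are omitted, and `segment` is a hypothetical helper name.

```python
def segment(payload: bytes, seq: int, mss: int = 1460):
    """Split one large send into (seq, chunk) pairs of at most `mss` bytes."""
    return [(seq + off, payload[off:off + mss])
            for off in range(0, len(payload), mss)]


big_send = bytes(4000)
pieces = segment(big_send, seq=1000)
for piece_seq, chunk in pieces:
    print(piece_seq, len(chunk))
```

With TSO the OS runs its transmit path once for the 4000-byte send, and the per-segment work above happens in hardware.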

  23. Effect of TSO
    • Compared on an Intel 82599 (ixgbe).

    • MultiQueue was disabled.

    • Measured with iperf in TCP mode.

    • ethtool -K ix0 gso off tso off

                packets        throughput   CPU% (sy+si)
    Disabled    247794 pkt/s   2.87 Gbps    53.5%
    Enabled     713127 pkt/s   8.16 Gbps    26.8%

  24. 3. Processing packets on multiple CPUs
    [Diagram: the NAPI receive path from slide 10, drawn once for cpu0
    and once for cpu1.]

  25. Software interrupts pile up on a single core

  26. What is a software interrupt?
    [Diagram: the NAPI receive path from slide 10.]

    Everything from polling to protocol processing
    → the bulk of network I/O

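  27. Why does it pile up?
    A software interrupt is scheduled on the CPU that took the NIC's
    interrupt.

    ↓

    Everything from polling to protocol stack execution runs inside the
    software interrupt.

    ↓

    Only the CPU that receives the NIC's interrupts is loaded.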

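  28. Software interrupts pile up on a single core and performance suffers
    • This shows up with workloads such as memcached that handle large
    numbers of short packets.

    • The CPU running the software interrupt becomes the bottleneck and
    performance stops scaling.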

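  29. Solution
    • What is needed is a mechanism that spreads packets across multiple
    CPUs before protocol processing.

    • However, TCP guarantees ordering, so if packets are processed in
    parallel they have to be put back in order (reordering), which hurts
    performance.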

  30. TCP Reordering
    • If packets arrive in sequence-number order, they only need to be
    copied into the buffer one after another, but...

    [Diagram: packets pass through protocol processing straight into the
    user buffer.]

  31. TCP Reordering
    [Diagram: packets pass through protocol processing into a reorder
    queue before reaching the user buffer.]

    • If the order is disturbed, the packets have to be put back in
    order (reordered).
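
The extra work caused by out-of-order arrival can be seen in a small model. This is an illustrative sketch only: packets are represented by consecutive integer sequence numbers, and `deliver_in_order` is a hypothetical name.

```python
def deliver_in_order(arrivals, next_seq=1):
    """Copy packets to the user buffer in sequence order, parking early
    arrivals in a reorder queue. Returns the buffer and the peak queue size."""
    reorder_queue = set()
    user_buffer = []
    max_queued = 0
    for seq in arrivals:
        reorder_queue.add(seq)
        # Drain everything that is now contiguous with what was delivered.
        while next_seq in reorder_queue:
            reorder_queue.remove(next_seq)
            user_buffer.append(next_seq)
            next_seq += 1
        max_queued = max(max_queued, len(reorder_queue))
    return user_buffer, max_queued


in_order = deliver_in_order([1, 2, 3, 4])
shuffled = deliver_in_order([2, 3, 1, 4])
print(in_order, shuffled)
```

Both runs deliver the same data, but the shuffled one has to hold packets back until the missing one arrives, which is the cost the slides want to avoid by keeping each flow on one CPU.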

  32. Solution (continued)
    • It works out better if each flow is processed by a single CPU.

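  33. RSS
    (Receive Side Scaling)
    • A NIC with a separate receive queue per CPU
    (called a MultiQueue NIC).

    • Each receive queue has its own independent interrupt.

    • Packets belonging to the same flow go to the same queue; packets
    belonging to different flows are spread across different queues as
    far as possible.

    → The destination queue is chosen by computing a hash of the packet
    header.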

  34. MSI-X interrupts
    • Supported by PCI Express.

    • A device can have up to 2048 IRQs.

    • The target CPU can be chosen for each IRQ.

    → A single NIC can have as many IRQs as there are CPU cores.

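  35. Packet distribution with RSS
    [Diagram: packets arrive at the NIC, which computes a hash, looks it
    up in a hash-to-queue table, and dispatches each packet to one of
    RX Queue #0 to #3. Each queue interrupts its own CPU (cpu0 to cpu3),
    which then runs receive processing.]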

  36. Queue selection procedure
    indirection_table[64] = initial_value

    input[12] = {src_addr, dst_addr, src_port, dst_port}

    key = toeplitz_hash(input, 12)

    index = key & 0x3f

    queue = indirection_table[index]
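
The pseudocode above can be made concrete as follows. This is a sketch under stated assumptions: the 40-byte `RSS_KEY` is a sample key often used in RSS verification examples (real drivers may program a different key), the round-robin indirection table over four queues is illustrative, and the slide's `key` variable (which holds the hash result) is called `hash_value` here to avoid confusion with the secret key.

```python
import socket
import struct

# Assumed sample key; NICs and drivers may use a different one.
RSS_KEY = bytes.fromhex(
    "6d5a56da 255b0ec2 4167253d 43a38fb0 d0ca2bcb "
    "ae7b30b4 77cb2da3 8030f20c 6a42b73b beac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: for every set bit in `data`, XOR in the 32-bit
    window of `key` that starts at that bit position.
    Requires len(key) >= len(data) + 4."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                shift = key_bits - 32 - (i * 8 + bit)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def select_queue(src_addr, dst_addr, src_port, dst_port, indirection_table):
    """Map an IPv4/TCP 4-tuple to a receive queue, following the slide's steps."""
    data = (socket.inet_aton(src_addr) + socket.inet_aton(dst_addr)
            + struct.pack("!HH", src_port, dst_port))   # the 12-byte input
    hash_value = toeplitz_hash(RSS_KEY, data)
    index = hash_value & 0x3f                            # low 6 bits: 64 entries
    return indirection_table[index]


indirection_table = [i % 4 for i in range(64)]   # spread entries over 4 queues
queue = select_queue("10.0.0.1", "10.0.0.2", 10000, 10001, indirection_table)
print("queue", queue)
```

Because the hash depends only on the 4-tuple, every packet of a flow lands in the same queue, which is what keeps a flow on one CPU.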

  37. Before RSS

  38. After RSS

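  39. RPS
    • Goal: improve server performance by making good use of onboard
    NICs that do not support RSS.

    • So implement RSS in software.

    • Spread packets across the CPUs at the software interrupt stage.

    • Use inter-processor interrupts to put the other CPUs to work.

    • In short, a software emulation of RSS.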

  40. [Diagram: RPS across cpu0 to cpu3. cpu0 takes the hardware
    interrupt, disables interrupts, and in the software interrupt
    receives packets, computes a hash, looks it up in a hash-to-queue
    table, and dispatches each packet to backlog #1, #2, or #3.
    Inter-processor interrupts wake the other CPUs, each of which runs
    protocol processing, socket receive processing, and the user program
    from its own backlog and socket queue.]

  41. How to use RPS
    # echo "f" > /sys/class/net/eth0/queues/rx-0/rps_cpus

    # echo 4096 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
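
The value written to `rps_cpus` is a hexadecimal CPU bitmask in which bit N stands for CPU N, so `f` selects CPUs 0 to 3. A small helper for building such masks might look like this; `cpu_mask` is a hypothetical name, and the sketch does not produce the comma-separated form that is, as far as I know, used for masks wider than 32 bits.

```python
def cpu_mask(cpus) -> str:
    """Hex bitmask string with bit N set for each CPU number N."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


print(cpu_mask(range(4)))    # CPUs 0-3
print(cpu_mask([0, 2]))      # CPUs 0 and 2
print(cpu_mask(range(8)))    # CPUs 0-7
```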

  42. Before RPS

  43. After RPS

  44. RPS netperf result
    netperf benchmark result on lwn.net:

    e1000e on 8 core Intel
    Without RPS: 90K tps at 33% CPU
    With RPS: 239K tps at 60% CPU

    forcedeth on 16 core AMD
    Without RPS: 103K tps at 15% CPU
    With RPS: 285K tps at 49% CPU

  45. RFS
    • Adds process tracking to RPS.

  46. RFS
    If the queue assigned to a flow differs from the CPU of the
    destination process, overhead is incurred.

  47. RFS
    By changing the values set in the hash table, the two CPUs can be
    made to match.

  48. How to use RFS
    # echo "f" > /sys/class/net/eth0/queues/rx-0/rps_cpus

    # echo 4096 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt

    # echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
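
The idea behind RFS, remembering which CPU last consumed a flow and steering later packets there, can be modeled in a few lines. This is a toy sketch, not the kernel's data structures: `ToyRfs`, `record_recv`, and `steer` are hypothetical names, and the table size simply mirrors the `rps_sock_flow_entries` value above.

```python
class ToyRfs:
    """Toy model of RFS: remember on which CPU each flow was last consumed."""

    def __init__(self, entries: int):
        self.entries = entries
        self.last_cpu = [None] * entries      # flow-table slot -> CPU

    def record_recv(self, flow_hash: int, cpu: int) -> None:
        """Called when the application reads from the socket on `cpu`."""
        self.last_cpu[flow_hash % self.entries] = cpu

    def steer(self, flow_hash: int, rps_cpu: int) -> int:
        """Pick the CPU for an incoming packet of this flow."""
        cpu = self.last_cpu[flow_hash % self.entries]
        return rps_cpu if cpu is None else cpu   # fall back to plain RPS


rfs = ToyRfs(entries=32768)
before = rfs.steer(0xABC, rps_cpu=1)     # nothing recorded yet: plain RPS choice
rfs.record_recv(0xABC, cpu=3)            # the application ran recv() on CPU 3
after = rfs.steer(0xABC, rps_cpu=1)      # later packets follow the application
print(before, after)
```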

  49. RFS netperf result
    netperf benchmark result on lwn.net:

    e1000e on 8 core Intel
    No RFS or RPS:              104K tps at 30% CPU
    No RFS (best RPS config):   290K tps at 63% CPU
    RFS:                        303K tps at 61% CPU

    RPC test        tps    CPU%   50/90/99% usec latency   StdDev
    No RFS or RPS   103K   48%    757/900/3185             4472.35
    RPS only        174K   73%    415/993/2468             491.66
    RFS             223K   73%    379/651/1382             315.61

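  50. Accelerated RFS
    • A NIC driver extension that makes RFS work with MultiQueue NICs
    as well.

    • The Linux kernel tells the NIC driver which CPU the process is
    running on.

    • On receiving the notification, the NIC driver updates the flow's
    queue assignment.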

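  51. Limits of Receive Side Scaling
    • If the 32-bit hash value were used as is, hash collisions would be
    unlikely, but the indirection table is small, so the hash is masked
    down to a few bits to form the index.

    → Hash collisions occur when there are many flows.

    • Not well suited to Accelerated RFS.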
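
How quickly a 64-entry table runs into collisions can be estimated with the usual birthday-problem product. The sketch assumes flows hash uniformly and independently onto the table entries, which is an idealization; `prob_no_collision` is a hypothetical helper name.

```python
def prob_no_collision(flows: int, table_size: int = 64) -> float:
    """Probability that `flows` uniformly hashed flows all land on
    distinct indirection-table entries."""
    p = 1.0
    for i in range(flows):
        p *= (table_size - i) / table_size
    return p


for n in (2, 8, 16, 32, 65):
    print(n, f"{prob_no_collision(n):.4f}")
```

Once there are more flows than table entries a collision is certain, so individual flows cannot be steered independently through the indirection table alone.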

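  52. Flow Steering
    • Stores the mapping between flows and queues, configured in a form
    like "4-tuple: queue number".

    • There is no clear common specification as there is for RSS, but
    it is implemented in each vendor's 10GbE NICs.

    • Accelerated RFS assumes Flow Steering.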

  53. Setting a filter manually with Flow Steering
    # ethtool --config-nfc ix00 flow-type tcp4 \
        src-ip 10.0.0.1 dst-ip 10.0.0.2 src-port 10000 \
        dst-port 10001 action 6

    Added rule with ID 2045
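
Conceptually, a flow steering table is an exact-match lookup from a 4-tuple to a queue number. The sketch below models the rule from the slide; it assumes that packets matching no rule fall back to hash-based distribution, and the fallback function here is a trivial stand-in rather than a real RSS hash. All class and function names are hypothetical.

```python
class FlowSteeringTable:
    """Exact-match table from a 4-tuple to a receive queue number."""

    def __init__(self, fallback):
        self.rules = {}
        self.fallback = fallback     # used when no rule matches (assumed behavior)

    def add_rule(self, flow, queue):
        self.rules[flow] = queue

    def select_queue(self, flow):
        if flow in self.rules:
            return self.rules[flow]
        return self.fallback(flow)


def stand_in_hash(flow, num_queues=4):
    """Placeholder for hash-based distribution; not a real RSS hash."""
    _src_addr, _dst_addr, src_port, dst_port = flow
    return (src_port + dst_port) % num_queues


table = FlowSteeringTable(fallback=stand_in_hash)
# Same 4-tuple and action as the ethtool rule on the slide.
table.add_rule(("10.0.0.1", "10.0.0.2", 10000, 10001), 6)

steered = table.select_queue(("10.0.0.1", "10.0.0.2", 10000, 10001))
other = table.select_queue(("10.0.0.3", "10.0.0.2", 10000, 10001))
print(steered, other)
```

Unlike the masked RSS index, an exact-match entry never collides with another flow, which is why Accelerated RFS builds on it.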

  54. XPS
    • MultiQueue NICs also have multiple transmit queues.

    • XPS is the interface for deciding how CPUs are assigned to
    transmit queues.

  55. How to use XPS
    # echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus

    # echo 2 > /sys/class/net/eth0/queues/tx-1/xps_cpus

    # echo 4 > /sys/class/net/eth0/queues/tx-2/xps_cpus

    # echo 8 > /sys/class/net/eth0/queues/tx-3/xps_cpus

  56. 4. Reducing the latency of data movement

  57. Reducing the latency of data movement
    • In some cases the overhead of moving data between
    NIC ⇔ memory ⇔ CPU cache weighs more than protocol processing.

    • Memory access in particular is slow.

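  58. Intel Data Direct I/O Technology
    • Packet data that the NIC has written by DMA always causes a cache
    miss the first time the CPU accesses it.

    ↓

    • So DMA straight into the CPU's LLC (L3 cache)!

    • Supported by recent Xeons and Intel 10GbE NICs.

    • No OS support is needed (the hardware provides it transparently).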

  59. Copying is expensive
    [Diagram: the receive path from slide 5, which ends with a copy to
    user space.]

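  60. Copying is expensive, but zero-copy is hard
    • A NIC's DMA buffers can be configured per queue, but not per flow.

    → It cannot work unless a single application can own a queue
    exclusively.

    • It cannot work unless buffers are page-aligned and allocated in
    page-sized units.

    • Unless the packet header and payload are split, the packet header
    gets written into the buffer as well.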

  61. Intel QuickData Technology
    • (Also known as Intel I/O AT)

    • DMA transfer from the NIC's buffer to the application's buffer.

    • Reduces CPU load.

    • Implemented in the chipset.

    • CONFIG_NET_DMA=y in Linux

  62. 5. Network I/O that bypasses the protocol stack

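  63. Network I/O that bypasses the protocol stack
    • If neither protocol processing nor the Socket API is required,
    network I/O can be made much faster.

    • Aimed at specific use cases

      • Applications that do not need protocol processing

      → snort, OpenvSwitch, etc.

      • Applications that want more performance even at the cost of
      doing protocol processing themselves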

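  64. Basic mechanism
    • Using a dedicated NIC driver and a dedicated library, MMAP the
    NIC's receive buffers.

    • Poll for packets.

    • Run the application-specific processing on each packet.

    [Diagram: the NIC's RX1, RX2, and RX3 buffers are exposed through
    the kernel driver and MMAPed into the app, which polls them for
    packets and does some work.]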

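  65. How does this differ from RAW sockets and BPF?
    • Zero-copy is the norm.

    • The multi-queue receive buffers are exported to userland as they
    are.

    • As a result, multi-threaded performance is high
    (RAW sockets and BPF are single-threaded).

    • The NIC driver is modified to provide the features above.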

  66. Intel DPDK
    • Drops interrupts in favor of polling to cut overhead.

    • Uses HugePages for the receive buffers to reduce TLB misses.

    • L3 forwarding performance with 64-byte packets (from Intel
    material):

      • Linux network stack: Xeon E5645 x 2 → 12.2 Mpps

      • DPDK: Xeon E5645 x 1 → 35.2 Mpps

      • DPDK: Next generation Intel Processor x 1 → 80 Mpps

    • Supports OpenvSwitch.

    • Supported NICs: Intel

  67. Similar implementations
    • PF_RING DNA

      Implemented by ntop, for Linux

      libpcap support

      Supported NICs: Intel

    • Netmap

      Implemented for FreeBSD; a Linux version also exists

      libpcap and OpenvSwitch support

      Supported NICs: Intel, Realtek...

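  68. Summary
    • Introduced the various improvements being made to handle
    high-speed network I/O.

    • Implementations need to be revisited on both the hardware and the
    software side, and the scope extends even to areas that seem
    unrelated to networking.

    • Something you can do starting tomorrow: make sure the NICs you put
    in your servers are multi-queue NICs with RSS support.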