Slide 1

Slide 1 text

Network I/O Acceleration in the 10GbE Era
Takuya ASADA

Slide 2

Slide 2 text

Slide URL: http://slidesha.re/16OV9Yx

Slide 3

Slide 3 text

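Introduction
• NICs that support extremely fast links such as 10GbE and 40GbE are now being used in PC servers as well
• Processing traffic at these speeds in software (the OS) while getting high performance runs into many obstacles, so implementations must be revisited on both the hardware and the software side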

Slide 4

Slide 4 text

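Today's topics
1. Too many interrupts
2. Protocol processing is heavy
3. We want to process packets on multiple CPUs
4. Reducing latency caused by data movement
5. Network I/O that bypasses the protocol stack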

Slide 5

Slide 5 text

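1. Too many interrupts
[Figure: conventional receive path. HW interrupt handler (packet receive) → input queue → SW interrupt handler (protocol processing) → socket queue → kernel process context (socket receive processing) → user program (user buffer). Annotations: hardware interrupt, software interrupt scheduled, process wakeup, system call, copy to user space]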

Slide 6

Slide 6 text

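Too many interrupts
• As NICs get faster, the number of packets a NIC can handle in a given time has grown dramatically
• If an interrupt arrives for every packet, the number of context switches explodes under heavy traffic and performance degrades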

Slide 7

Slide 7 text

Traditional packet receive processing
[Figure: receive path from the HW interrupt handler through the input queue, SW interrupt handler (protocol processing) and socket queue to the user program, with the copy to user space at the end]
Hardware interrupt
↓
Enqueue on the receive queue
↓
Schedule a software interrupt

Slide 8

Slide 8 text

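Traditional packet receive processing
• An interrupt is taken and handled every time a single packet is received
• Maximum number of 64-byte frames that can be received:
  • GbE: about 1.5 Mpps (1.5 million/s)
  • 10GbE: about 15 Mpps (15 million/s)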
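The 1.5 Mpps and 15 Mpps figures follow from Ethernet framing. A small Python sketch, assuming the standard per-frame wire overhead of an 8-byte preamble/start-of-frame delimiter plus a 12-byte inter-frame gap (the 64 bytes already include the FCS); `max_frames_per_second` is a name made up for this illustration:

```python
# Bytes that occupy the wire for every frame but are not part of the frame itself.
PREAMBLE_SFD_BYTES = 8       # 7-byte preamble + 1-byte start frame delimiter
INTER_FRAME_GAP_BYTES = 12   # minimum idle time between frames, in byte times

def max_frames_per_second(link_bps: float, frame_bytes: int) -> float:
    """Theoretical maximum frames/s for back-to-back frames of one size."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTER_FRAME_GAP_BYTES) * 8
    return link_bps / wire_bits

for name, bps in [("GbE", 1e9), ("10GbE", 10e9)]:
    pps = max_frames_per_second(bps, 64)
    budget_ns = 1e9 / pps    # time available to handle each packet
    print(f"{name}: {pps / 1e6:.2f} Mpps, {budget_ns:.1f} ns per packet")
# GbE: 1.49 Mpps, 672.0 ns per packet
# 10GbE: 14.88 Mpps, 67.2 ns per packet
```

At 10GbE that leaves roughly 67 ns per minimum-size packet, which is far too little time to take a full interrupt for each one.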

Slide 9

Slide 9 text

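Disable interrupts?
• Polling
  • Disable NIC interrupts and instead use the clock interrupt to check the receive queue periodically
  • Drawbacks: latency increases, and the CPU has to be woken up periodically
• Hybrid
  • Disable interrupts and run in polling mode only while traffic is heavy and packets are being processed back to back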


Slide 10

Slide 10 text

NAPI (hybrid approach)
[Figure: the HW interrupt handler disables interrupts and schedules a software interrupt; the SW interrupt handler polls and receives packets repeatedly until none remain, performs protocol processing into the socket queue, then the process is woken and copies the data to user space]
Hardware interrupt
↓
Disable interrupts & start polling
↓
Re-enable interrupts once no packets remain
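The interrupt/poll switching can be modelled in a few lines. This is a toy Python model, not kernel code: the class and its fields (`NapiDevice`, `budget`, `poll_scheduled`) are hypothetical, and `budget` only loosely mirrors the per-poll packet limit real NAPI drivers are given.

```python
from collections import deque

class NapiDevice:
    """Toy model of NAPI-style hybrid interrupt/polling receive (not kernel code)."""

    def __init__(self, budget: int = 64):
        self.rx_queue = deque()
        self.irq_enabled = True
        self.poll_scheduled = False
        self.budget = budget      # max packets handled per poll call
        self.interrupts = 0
        self.processed = []

    def packet_arrives(self, pkt):
        self.rx_queue.append(pkt)
        if self.irq_enabled:
            # Hardware interrupt: mask further IRQs and schedule polling.
            self.interrupts += 1
            self.irq_enabled = False
            self.poll_scheduled = True

    def poll(self):
        """Softirq poll: drain up to `budget` packets; re-enable IRQs when empty."""
        if not self.poll_scheduled:
            return
        for _ in range(min(self.budget, len(self.rx_queue))):
            self.processed.append(self.rx_queue.popleft())
        if not self.rx_queue:
            self.poll_scheduled = False
            self.irq_enabled = True

dev = NapiDevice()
for i in range(1000):              # a burst arrives before the softirq runs
    dev.packet_arrives(i)
while dev.poll_scheduled:
    dev.poll()
print(dev.interrupts, len(dev.processed))   # 1 1000
for i in range(1000, 1005):        # light traffic: each packet is polled promptly
    dev.packet_arrives(i)
    dev.poll()
print(dev.interrupts)                       # 6
```

A burst of 1000 packets costs a single interrupt, while sparse traffic still gets one interrupt per packet, so latency stays low when the link is quiet.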

Slide 11

Slide 11 text

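Interrupt Coalescing
• The NIC thins out its interrupts to reduce the load on the OS
• It interrupts once every few packets, or waits for a fixed period before interrupting
• Drawback: latency increases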
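The two triggers (every N packets, or after a fixed wait) and the latency trade-off can be seen in a toy Python model. The function name and parameters are hypothetical; they are only roughly analogous to the `rx-frames` and `rx-usecs` settings that `ethtool -C` exposes on Linux.

```python
def coalesce(arrivals_us, max_frames, timeout_us):
    """Return (interrupt_times, per-packet delays) for a toy coalescing NIC.

    An interrupt fires when `max_frames` packets are pending, or when the
    oldest pending packet has waited `timeout_us`, whichever comes first.
    """
    interrupts, delays, pending = [], [], []
    for t in sorted(arrivals_us):
        # Fire the timer interrupt if it expired before this packet arrived.
        if pending and t >= pending[0] + timeout_us:
            fire = pending[0] + timeout_us
            interrupts.append(fire)
            delays += [fire - p for p in pending]
            pending = []
        pending.append(t)
        if len(pending) >= max_frames:       # frame-count trigger
            interrupts.append(t)
            delays += [t - p for p in pending]
            pending = []
    if pending:                              # flush the tail with the timer
        fire = pending[0] + timeout_us
        interrupts.append(fire)
        delays += [fire - p for p in pending]
    return interrupts, delays

busy, _ = coalesce(range(100), max_frames=10, timeout_us=50)
print(len(busy))            # 10 interrupts instead of 100
_, sparse = coalesce([0, 1000, 2000], max_frames=10, timeout_us=50)
print(sparse)               # [50, 50, 50]: each lone packet waits for the timer
```

Under load the interrupt count drops tenfold, but a lone packet now waits the full timeout, which is the latency drawback listed above.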

Slide 12

Slide 12 text

Effect of Interrupt Coalescing
• Compared on an Intel 82599 (ixgbe) with Interrupt Coalescing disabled vs. enabled (interrupt rate tuned automatically)
• MultiQueue, GRO, LRO, etc. disabled
• Measured with iperf in TCP mode

Coalescing   interrupts    throughput   packets        CPU% (sy+si)
Disabled     46687 int/s   7.82 Gbps    660386 pkt/s   97.6%
Enabled      7994 int/s    8.24 Gbps    711132 pkt/s   79.6%

Slide 13

Slide 13 text

2. Protocol processing is heavy
[Figure: NAPI receive path: hardware interrupt, interrupts disabled, software interrupt polls until no packets remain, protocol processing, socket queue, copy to user space]

Slide 14

Slide 14 text

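Protocol processing is heavy
• Especially when huge numbers of small packets arrive, protocol processing consumes a large amount of CPU time
• The protocol stack is invoked once per packet
  Example: with 64-byte frames
  → the theoretical maximum is 15 million invocations/s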

Slide 15

Slide 15 text

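TOE (TCP Offload Engine)
• Stop doing protocol processing in the OS and do it on the NIC instead
• Drawbacks
  • Security: if a security hole appears in the TOE, the OS side cannot deal with it
  • Complexity: replacing the OS network stack with a TOE requires very wide-ranging changes
    Each vendor implements TOE differently, which makes it hard to define a common interface
• Linux: no plans to support it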

Slide 16

Slide 16 text

Checksum Offloading
• The NIC computes the IP, TCP, and UDP checksums
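For reference, this is the per-byte work that checksum offloading moves onto the NIC: the 16-bit one's-complement Internet checksum described in RFC 1071. A plain Python sketch (the function name is our own); the IPv4 header in the usage lines is a common textbook example:

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (RFC 1071) as used by IPv4/TCP/UDP."""
    if len(data) % 2:
        data += b"\x00"                # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # add 16-bit big-endian words
    while total >> 16:                 # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# 20-byte IPv4 header with its checksum field (bytes 10-11) zeroed.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(header)))   # 0xb861
```

A receiver verifies a header by running the same function over it with the checksum filled in; a valid header yields 0.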

Slide 17

Slide 17 text

Effect of Checksum Offloading
• Compared on an Intel 82599 (ixgbe)
• Measured with iperf in TCP mode
• MultiQueue disabled
• ethtool -K ix0 rx off

Offload    throughput   CPU% (sy+si)
Disabled   8.27 Gbps    86
Enabled    8.27 Gbps    85.2

Slide 18

Slide 18 text

LRO (Large Receive Offload)
• The NIC merges received TCP packets into a larger packet before handing it to the OS
• Reduces the number of protocol stack invocations
• Linux implements LRO in software (GRO)
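The merging step can be sketched as follows. This toy Python model only checks flow identity, sequence contiguity, and a size cap (`MAX_MERGED`, an assumed value); real LRO/GRO also inspects TCP flags, options, and more before merging. 1460 bytes is used as the payload of a 1500-byte packet, assuming IPv4 and TCP headers without options.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    flow: tuple     # (src_ip, dst_ip, src_port, dst_port)
    seq: int        # TCP sequence number of the first payload byte
    length: int     # payload bytes

MAX_MERGED = 65535  # assumed cap on the merged payload size

def merge_segments(segments):
    """Coalesce in-order segments of the same flow into larger ones (toy LRO/GRO)."""
    merged = []
    for seg in segments:
        last = merged[-1] if merged else None
        if (last is not None and last.flow == seg.flow
                and last.seq + last.length == seg.seq        # contiguous bytes
                and last.length + seg.length <= MAX_MERGED):
            last.length += seg.length                        # extend the held packet
        else:
            merged.append(Segment(seg.flow, seg.seq, seg.length))  # start a new one
    return merged

flow = ("10.0.0.1", "10.0.0.2", 10000, 10001)
segs = [Segment(flow, 10000 + i * 1460, 1460) for i in range(4)]
print(merge_segments(segs))   # one Segment with seq=10000, length=5840
```

Four calls into the network stack become one, which is where the drop in "network stack calls" on the GRO results slide comes from.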

Slide 19

Slide 19 text

Without LRO
• The network stack runs once per packet
[Figure: four 1500-byte packets (seq 10000, 10001, 10002, 10003), each passed to the network stack separately]

Slide 20

Slide 20 text

With LRO
• Packets are merged before the network stack runs, reducing the number of network stack invocations
[Figure: the same four 1500-byte packets (seq 10000 to 10003) merged into one big packet and passed to the network stack once]

Slide 21

Slide 21 text

Effect of GRO
• Compared on an Intel 82599 (ixgbe)
• MultiQueue disabled
• Measured with iperf in TCP mode
• ethtool -K ix0 gro off

GRO        packets        network stack calls   throughput   CPU% (sy+si)
Disabled   632139 pkt/s   632139 call/s         7.30 Gbps    97.6%
Enabled    712387 pkt/s   47957 call/s          8.25 Gbps    79.6%

Slide 22

Slide 22 text

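TSO (TCP Segmentation Offload)
• The reverse of LRO
• The OS sends large packets without splitting them; the NIC splits them into MTU-sized packets
• The OS can skip the segmentation work
• Linux supports GSO in software, and TSO/UFO in hardware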
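What the NIC (TSO) or the kernel (GSO) does with one large send can be sketched as cutting the payload into MSS-sized chunks and advancing the sequence number for each. A simplified Python model with a hypothetical function name, assuming a 1500-byte MTU with 20-byte IPv4 and TCP headers (MSS 1460); header construction and checksums are omitted:

```python
MTU = 1500
MSS = MTU - 20 - 20   # assumes IPv4 and TCP headers without options

def tso_segment(seq, payload, mss=MSS):
    """Split one large send into (sequence number, chunk) pairs of at most mss bytes."""
    return [((seq + off) & 0xFFFFFFFF, payload[off:off + mss])   # seq wraps at 2**32
            for off in range(0, len(payload), mss)]

segments = tso_segment(1000, bytes(65536))
print(len(segments), len(segments[-1][1]))   # 45 1296
```

With offload, the OS hands over the single 64 KiB buffer and this loop, repeated 45 times, happens on the NIC instead of the CPU.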

Slide 23

Slide 23 text

Effect of TSO
• Compared on an Intel 82599 (ixgbe)
• MultiQueue disabled
• Measured with iperf in TCP mode
• ethtool -K ix0 gso off tso off

GSO/TSO    packets        throughput   CPU% (sy+si)
Disabled   247794 pkt/s   2.87 Gbps    53.5%
Enabled    713127 pkt/s   8.16 Gbps    26.8%

Slide 24

Slide 24 text

3. We want to process packets on multiple CPUs
[Figure: the NAPI receive path drawn twice, once for cpu0 and once for cpu1]

Slide 25

Slide 25 text

Soft interrupts concentrate on a single core

Slide 26

Slide 26 text

What is a soft interrupt?
[Figure: NAPI receive path; the software interrupt handler covers everything from polling through protocol processing]
From polling through protocol processing
→ the bulk of network I/O work

Slide 27

Slide 27 text

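Why does it concentrate on one core?
A soft interrupt is scheduled on the CPU that received the NIC interrupt
↓
Everything from polling to running the protocol stack executes inside the soft interrupt
↓
Only the CPU that takes the NIC interrupt carries the load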

Slide 28

Slide 28 text

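Soft interrupts pile onto one core and performance suffers
• Becomes visible in workloads that handle huge numbers of short packets, such as memcached
• The CPU running the software interrupt becomes the bottleneck, and performance stops scaling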

Slide 29

Slide 29 text

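Solution
• We need a mechanism that spreads packets across multiple CPUs before protocol processing
• However, TCP guarantees in-order delivery, so processing its packets in parallel forces them to be put back in order (reordered), which hurts performance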


Slide 30

Slide 30 text

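TCP Reordering
• If packets arrive in sequence-number order, they can simply be copied into the buffer one after another, but...
[Figure: in-order packets pass through protocol processing directly into the user buffer]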

Slide 31

Slide 31 text

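TCP Reordering
[Figure: out-of-order packets pass through protocol processing into a reorder queue before reaching the user buffer]
• If the order is scrambled, the packets have to be put back in order (reordered)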
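The reorder step can be illustrated with a toy Python model (the class name is hypothetical). It ignores 32-bit sequence-number wraparound, windows, and retransmission timers, and only shows why out-of-order arrival forces segments to be held back until the gap is filled:

```python
import heapq

class ReorderQueue:
    """Toy TCP receive reassembly: deliver bytes to the user buffer strictly in order."""

    def __init__(self, next_seq: int):
        self.next_seq = next_seq     # next byte the user buffer expects
        self.pending = []            # min-heap of held (seq, data) segments
        self.user_buffer = bytearray()

    def receive(self, seq: int, data: bytes):
        heapq.heappush(self.pending, (seq, data))
        # Drain every held segment that is now contiguous with delivered data.
        while self.pending and self.pending[0][0] <= self.next_seq:
            s, d = heapq.heappop(self.pending)
            overlap = self.next_seq - s          # bytes already delivered
            if overlap < len(d):
                self.user_buffer += d[overlap:]
                self.next_seq += len(d) - overlap

rq = ReorderQueue(next_seq=0)
rq.receive(3, b"def")          # arrives early: held in the reorder queue
rq.receive(0, b"abc")          # fills the gap: both segments are delivered
print(bytes(rq.user_buffer))   # b'abcdef'
```

In the in-order case the heap never holds more than the segment just received, which is the cheap path the previous slide describes.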

Slide 32

Slide 32 text

Solution (continued)
• It is more convenient if each flow is processed by a single CPU

Slide 33

Slide 33 text

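RSS (Receive Side Scaling)
• A NIC with a separate receive queue for each CPU
  (called a MultiQueue NIC)
• Each receive queue has its own interrupt
• Packets belonging to the same flow go to the same queue, while packets from different flows are spread across different queues as much as possible
  → The destination queue is chosen by computing a hash over the packet header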

Slide 34

Slide 34 text

MSI-X interrupts
• Supported on PCI Express
• A device can have up to 2048 IRQs
• The destination CPU can be chosen for each IRQ
  → A single NIC can have as many IRQs as there are CPU cores

Slide 35

Slide 35 text

Packet distribution with RSS
[Figure: packets arriving at the NIC are hashed; the hash looks up a table that dispatches each packet to RX Queue #0 to #3, and each queue interrupts its own CPU (cpu0 to cpu3) for receive processing]

Slide 36

Slide 36 text

Queue selection procedure
indirection_table[64] = initial_value
input[12] = {src_addr, dst_addr, src_port, dst_port}
key = toeplitz_hash(input, 12)
index = key & 0x3f
queue = indirection_table[index]
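The procedure above can be made runnable. A Python sketch, assuming the widely used 40-byte default key from Microsoft's RSS verification suite and a hypothetical indirection table that spreads its 64 entries round-robin over four RX queues; the helper names are our own:

```python
import socket
import struct

# 40-byte key from Microsoft's RSS verification suite (a common default).
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0d0ca2bcb"
    "ae7b30b477cb2da38030f20c6a42b73bbeac01fa"
)

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: for each set input bit, XOR in the key window at that bit."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 32:
        raise ValueError("key too short for input")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):   # input bits, MSB first
            # Leftmost 32 bits of the key after shifting it left by i bits.
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result

def rss_input(src_ip, dst_ip, src_port, dst_port) -> bytes:
    """12-byte input in the slide's order: src/dst address, then src/dst port."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))

# Hypothetical initial table: 64 entries spread round-robin over 4 RX queues.
indirection_table = [i % 4 for i in range(64)]

def select_queue(src_ip, dst_ip, src_port, dst_port) -> int:
    h = toeplitz_hash(RSS_KEY, rss_input(src_ip, dst_ip, src_port, dst_port))
    index = h & 0x3F                   # low 6 bits index the 64-entry table
    return indirection_table[index]

print(hex(toeplitz_hash(RSS_KEY, rss_input("66.9.149.187", "161.142.100.80", 2794, 1766))))
```

With that key, this first IPv4/TCP vector from the verification suite (source 66.9.149.187:2794, destination 161.142.100.80:1766) is expected to hash to 0x51ccc178. Real NICs may use a different key, table size, and mask, so the resulting queue number is only illustrative.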

Slide 37

Slide 37 text

Before RSS

Slide 38

Slide 38 text

After RSS

Slide 39

Slide 39 text

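RPS
• We want to make good use of onboard NICs without RSS support and still improve server performance
• So let's implement RSS in software
• Packets are scattered across CPUs at the soft-interrupt stage
• Inter-processor interrupts are used to put the other CPUs to work
• A software emulation of RSS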

Slide 40

Slide 40 text

[Figure: RPS on cpu0 to cpu3. The CPU that takes the hardware interrupt disables interrupts and polls; in the software interrupt it computes a hash, looks up the table, and dispatches each packet to a per-CPU backlog (#1 to #3), waking the target CPU with an inter-processor interrupt. Each CPU then performs protocol processing and socket receive processing and copies the data to its user program's buffer]

Slide 41

Slide 41 text

How to use RPS
# echo "f" > /sys/class/net/eth0/queues/rx-0/rps_cpus
# echo 4096 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
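The value written to `rps_cpus` is a hexadecimal CPU bitmap, so "f" enables CPUs 0 to 3. A small Python helper (hypothetical name) to decode such masks, assuming the sysfs convention of comma-separated 32-bit hex groups for masks wider than 32 CPUs:

```python
def cpus_from_mask(mask: str) -> list[int]:
    """Decode an rps_cpus/xps_cpus-style hex bitmap (e.g. "f") into CPU numbers."""
    value = int(mask.replace(",", ""), 16)   # drop 32-bit group separators
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]

print(cpus_from_mask("f"))                   # [0, 1, 2, 3]
print(cpus_from_mask("00000001,00000000"))   # [32]
```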

Slide 42

Slide 42 text

Before RPS

Slide 43

Slide 43 text

After RPS

Slide 44

Slide 44 text

RPS netperf result
netperf benchmark results on lwn.net:
e1000e on 8-core Intel
  Without RPS: 90K tps at 33% CPU
  With RPS: 239K tps at 60% CPU
forcedeth on 16-core AMD
  Without RPS: 103K tps at 15% CPU
  With RPS: 285K tps at 49% CPU

Slide 45

Slide 45 text

RFS
• Adds process tracking to RPS

Slide 46

Slide 46 text

RFS
If the queue assigned to a flow is handled on a different CPU from the destination process, overhead results


Slide 47

Slide 47 text

RFS
By changing the entries in the hash table, the two CPUs can be made to match

Slide 48

Slide 48 text

How to use RFS
# echo "f" > /sys/class/net/eth0/queues/rx-0/rps_cpus
# echo 4096 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
# echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
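The steering decision RFS adds can be sketched as a lookup in a global table, keyed by flow hash, that records the CPU where the application last used the socket. This toy Python model uses hypothetical names, falls back to a simple modulo pick over the `rps_cpus` list (the kernel's own hash-to-CPU mapping differs), and omits the per-queue `rps_flow_cnt` table the kernel uses to avoid reordering when a flow migrates between CPUs:

```python
class RfsModel:
    """Toy model of RFS steering on top of RPS (not the kernel algorithm)."""

    def __init__(self, rps_cpus, sock_flow_entries=32768):
        # Table size rounded up to a power of two so a mask can index it.
        self.size = 1 << (sock_flow_entries - 1).bit_length()
        self.sock_flow_table = [None] * self.size   # flow hash -> CPU of the app
        self.rps_cpus = list(rps_cpus)

    def app_recv(self, flow_hash, cpu):
        """Record that the application used this flow's socket on `cpu`."""
        self.sock_flow_table[flow_hash & (self.size - 1)] = cpu

    def steer(self, flow_hash):
        """Pick the CPU that will run protocol processing for a packet."""
        desired = self.sock_flow_table[flow_hash & (self.size - 1)]
        if desired is not None:
            return desired                               # RFS: follow the consumer
        return self.rps_cpus[flow_hash % len(self.rps_cpus)]   # plain RPS fallback

rfs = RfsModel(rps_cpus=[0, 1, 2, 3])
h = 0x51CCC178                 # example flow hash
print(rfs.steer(h))            # RPS choice before the application has read
rfs.app_recv(h, cpu=3)
print(rfs.steer(h))            # 3: packets now follow the reading process
```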

Slide 49

Slide 49 text

RFS netperf result
netperf benchmark results on lwn.net:
e1000e on 8-core Intel
  No RFS or RPS: 104K tps at 30% CPU
  No RFS (best RPS config): 290K tps at 63% CPU
  RFS: 303K tps at 61% CPU

RPC test        tps    CPU%   50/90/99% usec latency   StdDev
No RFS or RPS   103K   48%    757/900/3185             4472.35
RPS only        174K   73%    415/993/2468             491.66
RFS             223K   73%    379/651/1382             315.61

Slide 50

Slide 50 text

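Accelerated RFS
• A NIC driver extension that brings RFS to MultiQueue NICs
• The Linux kernel notifies the NIC driver of the CPU on which the process is running
• On receiving the notification, the NIC driver updates the flow's queue assignment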

Slide 51

Slide 51 text

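Limitations of Receive Side Scaling
• Using the 32-bit hash value as-is would make collisions unlikely, but because the Indirection Table is small, the index is masked down to just a few bits
  → With many flows, hash collisions occur
• Not well suited to Accelerated RFS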
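How quickly collisions appear can be estimated with the birthday problem. A Python sketch, assuming the masked hash bits are uniformly distributed over a 64-entry indirection table (the function name is our own):

```python
def collision_probability(flows: int, table_entries: int = 64) -> float:
    """Chance that at least two flows map to the same indirection-table entry."""
    p_distinct = 1.0
    for i in range(flows):
        # The i-th flow must avoid the i entries already taken.
        p_distinct *= (table_entries - i) / table_entries
    return 1.0 - p_distinct

for n in (2, 5, 10, 20):
    print(n, round(collision_probability(n), 3))
```

For 10 flows the result is slightly above 0.5, so with a 64-entry table two flows are already more likely than not to share an entry, which conflicts with Accelerated RFS wanting to steer each flow to its own CPU.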

Slide 52

Slide 52 text

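Flow Steering
• Remembers the mapping between flows and queues
  Configured in the form "4-tuple : queue number"
• There is no clear common specification as there is for RSS, but it is implemented in various vendors' 10GbE NICs
• Accelerated RFS assumes Flow Steering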

Slide 53

Slide 53 text

Manual filter configuration with Flow Steering
# ethtool --config-nfc ix00 flow-type tcp4 src-ip 10.0.0.1 dst-ip 10.0.0.2 src-port 10000 dst-port 10001 action 6
Added rule with ID 2045
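Conceptually, the NIC checks an exact-match table keyed by the 4-tuple before falling back to RSS. A toy Python model of that lookup, with hypothetical class and method names; real NICs differ in rule capacity, matching fields, and priority, and the rule below simply mirrors the `ethtool` command above:

```python
class FlowSteeringTable:
    """Toy NIC exact-match filter: 4-tuple -> RX queue, falling back to RSS."""

    def __init__(self, rss_select):
        self.rules = {}               # (src_ip, dst_ip, src_port, dst_port) -> queue
        self.rss_select = rss_select  # called when no rule matches

    def add_rule(self, src_ip, dst_ip, src_port, dst_port, queue):
        self.rules[(src_ip, dst_ip, src_port, dst_port)] = queue

    def queue_for(self, src_ip, dst_ip, src_port, dst_port):
        key = (src_ip, dst_ip, src_port, dst_port)
        if key in self.rules:
            return self.rules[key]    # explicit steering rule wins
        return self.rss_select(*key)  # otherwise use the hash-based choice

table = FlowSteeringTable(rss_select=lambda *_: 0)   # stand-in for RSS
table.add_rule("10.0.0.1", "10.0.0.2", 10000, 10001, queue=6)
print(table.queue_for("10.0.0.1", "10.0.0.2", 10000, 10001))   # 6
print(table.queue_for("10.0.0.1", "10.0.0.2", 10000, 10002))   # 0
```

Because the table stores exact flows rather than masked hash bits, two flows never share a steering decision, which is what Accelerated RFS needs.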

Slide 54

Slide 54 text

XPS
• MultiQueue NICs also have multiple transmit queues
• XPS is the interface that decides how CPUs are mapped to transmit queues

Slide 55

Slide 55 text

How to use XPS
# echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
# echo 2 > /sys/class/net/eth0/queues/tx-1/xps_cpus
# echo 4 > /sys/class/net/eth0/queues/tx-2/xps_cpus
# echo 8 > /sys/class/net/eth0/queues/tx-3/xps_cpus
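The pattern in these commands (tx queue i → CPU i, mask 1 << i) can be generated for any queue count. A Python sketch with a hypothetical function name; it only builds the path-to-mask mapping and does not write to sysfs, and it is limited to 32 queues so every mask fits in a single hex group:

```python
def xps_one_cpu_per_queue(dev: str, num_queues: int) -> dict[str, str]:
    """Map tx queue i to CPU i, returning sysfs path -> hex mask to write."""
    if num_queues > 32:
        raise ValueError("sketch only handles masks that fit in 32 bits")
    return {
        f"/sys/class/net/{dev}/queues/tx-{q}/xps_cpus": format(1 << q, "x")
        for q in range(num_queues)
    }

for path, mask in xps_one_cpu_per_queue("eth0", 4).items():
    print(f"echo {mask} > {path}")
```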

Slide 56

Slide 56 text

4. Reducing latency caused by data movement

Slide 57

Slide 57 text

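Reducing latency caused by data movement
• In some cases, the overhead of moving data between NIC ⇔ memory ⇔ CPU cache is heavier than protocol processing itself
• Memory access in particular is slow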

Slide 58

Slide 58 text

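Intel Data Direct I/O Technology
• Packet data that the NIC has DMAed always causes a cache miss the first time the CPU accesses it
↓
• So DMA it straight into the CPU's LLC (L3 cache)!
• Supported on newer Xeons with Intel 10GbE
• No OS support required (the hardware provides it transparently)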

Slide 59

Slide 59 text

Copying is expensive
[Figure: conventional receive path from the HW interrupt handler through the input queue, protocol processing and socket queue to the user program, ending with the copy to user space]

Slide 60

Slide 60 text

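Copying is expensive, but zero-copy is hard
• A NIC's DMA buffers can be configured per queue, but not per flow
  → Impossible unless a single application can own an entire queue in the first place
• Impossible unless the buffers are allocated aligned to the page size
• Unless packet headers and payloads are separated, the packet headers end up written into the buffer too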

Slide 61

Slide 61 text

Intel QuickData Technology
• (Also called Intel I/O AT)
• DMA transfer from the NIC's buffer to the application's buffer
• Reduces CPU load
• Implemented in the chipset
• CONFIG_NET_DMA=y in Linux

Slide 62

Slide 62 text

5. Network I/O that bypasses the protocol stack

Slide 63

Slide 63 text

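Network I/O that bypasses the protocol stack
• If neither protocol processing nor the Socket API is required, network I/O can be made much faster
• For specific use cases
  • Applications that do not need protocol processing
    → snort, OpenvSwitch, etc.
  • Applications that want higher performance even if it means doing protocol processing themselves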

Slide 64

Slide 64 text

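Basic mechanism
• Use a dedicated NIC driver and a dedicated library to MMAP the NIC's receive buffers
• Poll for packets
• Run application-specific processing on the packets
[Figure: the kernel driver's receive buffers RX1, RX2, RX3 are MMAPed into the app, which polls them for packets and does some work]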
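The receive side of such a design is essentially a ring of buffers shared between the NIC and the application, which the application polls instead of waiting for an interrupt or a system call. A toy single-producer/single-consumer Python model; the class is hypothetical and stands in for the memory that real frameworks map from the driver:

```python
class RxRing:
    """Toy SPSC RX ring standing in for an mmapped NIC receive buffer."""

    def __init__(self, slots: int):
        self.slots = [None] * slots
        self.head = 0   # next slot the "NIC" writes (producer index)
        self.tail = 0   # next slot the application reads (consumer index)

    def nic_put(self, pkt) -> bool:
        if self.head - self.tail == len(self.slots):
            return False                     # ring full: packet dropped
        self.slots[self.head % len(self.slots)] = pkt
        self.head += 1
        return True

    def poll(self, max_batch: int):
        """Application side: take up to max_batch packets, no syscall involved."""
        batch = []
        while self.tail < self.head and len(batch) < max_batch:
            batch.append(self.slots[self.tail % len(self.slots)])
            self.tail += 1
        return batch

ring = RxRing(slots=4)
print([ring.nic_put(i) for i in range(5)])   # [True, True, True, True, False]
print(ring.poll(2), ring.poll(10))           # [0, 1] [2, 3]
```

Because only indices move and the packets stay in place, the application reads them without a copy, and a busy-polling loop over `poll` replaces interrupts.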

Slide 65

Slide 65 text

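How does this differ from raw sockets and BPF?
• Zero-copy is the default
• The multiqueue receive buffers are exported to userland as-is
• ↑ This gives high multithreaded performance
  (raw sockets and BPF are single-threaded)
• The NIC driver is modified to provide these features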

Slide 66

Slide 66 text

Intel DPDK
• Reduces overhead by using polling instead of interrupts
• Reduces TLB misses by using HugePages for the receive buffers
• L3 forwarding performance with 64-byte packets (from Intel materials)
  • Linux network stack: Xeon E5645 x 2 → 12.2 Mpps
  • DPDK: Xeon E5645 x 1 → 35.2 Mpps
  • DPDK: next-generation Intel processor x 1 → 80 Mpps
• Supports OpenvSwitch
• Supported NICs: Intel


Slide 67

Slide 67 text

Similar implementations
• PF_RING DNA
  Implemented by ntop, for Linux
  libpcap support
  Supported NICs: Intel
• Netmap
  Implemented for FreeBSD; a Linux version also exists
  libpcap and OpenvSwitch support
  Supported NICs: Intel, Realtek...

Slide 68

Slide 68 text

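Summary
• Introduced the many improvements being made to handle high-speed network I/O
• Implementations must be revisited on both the hardware and software sides, and the scope extends even to areas not directly related to networking
• Something you can do starting tomorrow:
  First, make sure the NICs you install in your servers are "multiqueue NICs" with "RSS support"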