Slide 1

Slide 1 text

Building Our Own CNI Plugin for Large-Scale Kubernetes Clusters
Cybozu, Inc. / Akihiro Ikezoe
Kubernetes Meetup Tokyo #16, 2019/02/19

Slide 2

Slide 2 text

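Today's agenda
• Introducing Neco, the infrastructure renewal project
• Why did we build our own CNI plugin?
• Introducing coil, our homegrown CNI plugin
• Lessons learned from developing a CNI plugin
• Summary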

Slide 3

Slide 3 text

Neco, the infrastructure renewal project

Slide 4

Slide 4 text

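What is Neco, the infrastructure renewal project?
• A project to renew the infrastructure of [logo not extracted] by adopting Kubernetes and [logo not extracted]
• What is [logo not extracted]?
  • Provides services such as kintone, Garoon, and Office as SaaS
  • Adopted by 25,000 companies, with more than 1 million users
  • Released in 2011
  • VM-based architecture
  • Runs on rented data center space, with more than 1,000 servers operated in-house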

Slide 5

Slide 5 text

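What is Neco, the infrastructure renewal project?
• Goals
  • Drastically reduce maintenance costs (aiming for NoOps)
  • Improve scalability
  • Improve server consolidation
  • Have application development teams take part in deployment and operations
• Most of the deliverables are published as OSS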

Slide 6

Slide 6 text

Neco architecture
[Architecture diagram. Labels: Boot Server (Ubuntu) with sabakan, CKE, and neco-updater; Kubernetes on CoreOS nodes; LB, Prometheus, squid, CoreDNS, Argo CD, Rook; app, MySQL, Elasticsearch. Captions: about 5 boot servers; servers on the scale of 1,000 machines; several thousand or more application containers; Kubernetes deployment and management; continuous delivery.]

Slide 7

Slide 7 text

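The software behind Neco
• sabakan
  • Automates lifecycle management and provisioning of server hardware.
  • Automatically handles BIOS configuration, OS network boot, disk encryption, and the setup of various system software.
• CKE (Cybozu Container Engine)
  • A tool that automatically builds and operates Kubernetes clusters.
  • Automatically deploys Kubernetes onto the servers that sabakan has set up.
  • Automates operations such as repairing errors and removing failed machines from the cluster.
• neco-updater
  • A continuous delivery tool for the infrastructure.
  • Checks GitHub release information and automatically deploys software such as CKE and sabakan and updates the CoreOS image.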

Slide 8

Slide 8 text

Why did we build our own CNI plugin?

Slide 9

Slide 9 text

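Why did we build our own CNI plugin?
• We needed a network plugin that matches the network design of the Neco project.
• We evaluated existing plugins, but none of them fully met our requirements.
• CNI plugins can be combined, so we decided to build only the parts we needed.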

Slide 10

Slide 10 text

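Neco network design
[Diagram: Rack1, Rack2, Rack3]
• CLOS architecture
  • Flat L3 network
  • Every node is reachable in the same number of hops
  • Scales with growing East-West traffic
• Routing with BGP
  • AS per rack
  • Route redundancy with ECMP
  • Fast route convergence with BFD
• See the blog post for details
  • https://blog.cybozu.io/entry/2018/11/01/113000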

Slide 11

Slide 11 text

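Selecting a CNI plugin
• To match the data center network, we wanted the Kubernetes network to use BGP as well, so that routing is efficient.
• Calico
  • Actively developed, feature-rich, and widely adopted.
  • Concerns: it bundles its own BGP speaker, and the number of routes grows in large clusters.
• Romana
  • Meets our requirements in terms of features.
  • Does not support etcd v3. Development is not active, and it has not kept up with the latest Kubernetes.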

Slide 12

Slide 12 text

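Combining plugins
• Container networking has many problems to solve.
  • IP address management (IPAM)
  • Network connectivity
  • Network policy
  • Bandwidth limiting
• There is no need to solve all of them with a single plugin.
• An example of combining multiple plugins: Canal
  • IPAM uses the standard functionality of the CNI plugins
  • Network connectivity uses flannel
  • Network policy uses Calico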

Slide 13

Slide 13 text

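Which parts do we need to build ourselves?
• Network policy
  • Hard to implement, and there is no need to be original here.
  -> Use Calico, Cilium, or similar
• IP address management (IPAM)
  • We need a mechanism that keeps the routing table from growing.
  • We also want to be able to assign global IP addresses to Pods.
  -> Build our own
• Network connectivity
  • Route exchange with the outside of the node is left to the routing software in the data center.
  -> Build our own intra-node routing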

Slide 14

Slide 14 text

coil, our homegrown CNI plugin

Slide 15

Slide 15 text

What is coil?
• A CNI plugin developed by Cybozu
• Published as OSS
  • https://github.com/cybozu-go/coil
• Features
  • Supports CNI Spec v0.3.1 (Kubernetes v1.13)
  • Implemented in Go, with etcd as the backend
  • Linux only, Kubernetes only, IPv4 only
  • Provides only IPAM and intra-node routing
  • Does not bundle routing software
  • Designed with large clusters in mind

Slide 16

Slide 16 text

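Design choices for large clusters
• No overlay network
  • Overlay networks such as VXLAN reduce throughput.
  -> Establish connectivity with plain routing
• No Linux bridge
  • Using a Linux bridge increases CPU usage.
  -> Add routes to the veth interfaces in the routing table
• Supports only etcd API v3
  • etcd API v2 performs poorly when many clients are connected.
• Keep the number of routes from growing
  • Advertising a route per Pod makes the number of routes explode.
  -> Advertise one route per subnet by using a mechanism called address blocks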

Slide 17

Slide 17 text

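Components of coil
• etcd
  • Stores the allocated address information
• coil-controller
  • A Deployment on k8s
  • Releases address blocks that are no longer in use
• coil:
  • The CNI plugin itself
  • Adds and removes the veth for each Pod, assigns IP addresses, and configures routing
• coil-node: a DaemonSet on k8s
  • coild:
    • Manages addresses and configures routing on each node
  • coil-installer:
    • Installs coil and its configuration file
[Diagram labels: Node, coil-node, coil, etcd, coil-controller (house keeping), coild, coil-installer, conf, setup]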

Slide 18

Slide 18 text

Address blocks (inspired by Romana)
• coil uses a mechanism called address blocks and advertises routes per subnet (for example, /28), which keeps the routing table from bloating.
• The allocatable addresses are divided into units called blocks.
  • Pod address range: 10.0.1.0/24
  • Address blocks: 10.0.1.0/28, 10.0.1.16/28, 10.0.1.32/28, 10.0.1.48/28, 10.0.1.64/28, ...
• Address blocks are assigned to nodes.
  • Node1: address block 10.0.1.0/28
  • Node2: address block 10.0.1.16/28
• Pod addresses are allocated from the node's address block.
  • Node1: Pod 10.0.1.0/32, Pod 10.0.1.1/32
  • Node2: Pod 10.0.1.16/32, Pod 10.0.1.17/32
• Routes are advertised per subnet to the BGP router.
  • 10.0.1.0/28 -> Node1
  • 10.0.1.16/28 -> Node2
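The address-block arithmetic on this slide can be reproduced with Python's standard `ipaddress` module. The sketch below is only an illustration and is not coil's code: the function names and the `assignments` mapping are hypothetical, and only the address range, the /28 block size, and the Node1/Node2 assignment come from the slide.

```python
import ipaddress

# Values taken from the slide: the Pod address range and the block size.
POD_RANGE = ipaddress.ip_network("10.0.1.0/24")
BLOCK_PREFIX = 28


def split_into_blocks(pod_range, block_prefix):
    """Split the Pod address range into fixed-size address blocks."""
    return list(pod_range.subnets(new_prefix=block_prefix))


def routes_to_advertise(assignments):
    """Return one route per assigned block as {block prefix: node name}."""
    return {
        str(block): node
        for node, node_blocks in assignments.items()
        for block in node_blocks
    }


blocks = split_into_blocks(POD_RANGE, BLOCK_PREFIX)

# Hypothetical assignment that mirrors the slide's diagram.
assignments = {"Node1": [blocks[0]], "Node2": [blocks[1]]}
routes = routes_to_advertise(assignments)

# Pod addresses come out of the node's own block.
node1_pods = list(blocks[0])[:2]

if __name__ == "__main__":
    print(len(blocks), "blocks of", blocks[0].num_addresses, "addresses each")
    for prefix, node in routes.items():
        print(prefix, "->", node)
```

With per-Pod advertisement the router would need one /32 route per Pod, whereas here it needs one /28 route per assigned block, regardless of how many Pods use addresses inside it.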

Slide 19

Slide 19 text

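How coil processes a request
1. kubelet receives an instruction and creates a Pod
2. The CNI plugin coil is executed
3. coil requests an IP address from coild
4. An address block is allocated from etcd
5. The route is written to the routing table
6. The routing table is read and the routes are advertised
7. coild returns the IP address to coil
8. A veth pair is created in the Pod's netns, and the IP address and routes are configured
[Diagram labels: Node, kubelet, coil, coild, Pod, etcd, routing table, BGP Speaker, eth0, veth, BGP Router, with the numbered steps above]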
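Steps 3 to 7 of this flow can be walked through with a small in-memory model. This is a sketch under stated assumptions, not coil's implementation: `BlockPool` stands in for the block bookkeeping that the slide places in etcd, `NodeAllocator` stands in for coild, and both class names and all method names are hypothetical.

```python
import ipaddress


class BlockPool:
    """Stand-in for the address-block bookkeeping kept in etcd."""

    def __init__(self, pod_range, block_prefix):
        network = ipaddress.ip_network(pod_range)
        self.free_blocks = list(network.subnets(new_prefix=block_prefix))
        self.owners = {}

    def acquire(self, node):
        # Raises IndexError once every block has been handed out.
        block = self.free_blocks.pop(0)
        self.owners[block] = node
        return block


class NodeAllocator:
    """Stand-in for coild: per-node address management."""

    def __init__(self, node, pool):
        self.node = node
        self.pool = pool
        self.free_addresses = []
        self.exported_routes = []  # what a BGP speaker would read (step 6)
        self.pods = {}

    def request_address(self, pod_name):
        """Handle an address request from the CNI plugin (step 3)."""
        if not self.free_addresses:
            block = self.pool.acquire(self.node)     # step 4
            self.exported_routes.append(str(block))  # step 5
            self.free_addresses = list(block)
        address = self.free_addresses.pop(0)
        self.pods[pod_name] = address
        return address                               # step 7


pool = BlockPool("10.0.1.0/24", 28)
node1 = NodeAllocator("Node1", pool)
node2 = NodeAllocator("Node2", pool)
first = node1.request_address("pod-a")
second = node1.request_address("pod-b")
third = node2.request_address("pod-c")

if __name__ == "__main__":
    print(first, second, third)
    print(node1.exported_routes, node2.exported_routes)
```

A new route appears only when a node runs out of addresses and takes another block, so the second Pod on Node1 adds no route at all.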

Slide 20

Slide 20 text

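boot-taint
• It is a problem if Pods get scheduled onto a node where the CNI plugin has not been set up yet.
• Pass --register-with-taints as a kubelet startup option so that Pods cannot be scheduled onto a node right after it boots.
• Once coil has been set up, remove the taints so that Pods can be scheduled.
Note: taints/tolerations
• A Kubernetes feature. Adding taints to a node prohibits Pods from being scheduled or run on it. Only Pods that carry the matching tolerations can be scheduled there.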
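The kubelet flag takes taints in the form `--register-with-taints=<key>=<value>:<effect>`. The sketch below models, in simplified form, why such a taint keeps ordinary Pods off a fresh node while a Pod with a matching toleration can still run there. The taint key `example.com/bootstrap` is a placeholder and not the key coil uses, and the matching logic covers only the `Equal` and `Exists` operators, not every rule of the real scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # an empty effect matches every effect


def tolerates(toleration, taint):
    """Simplified check of whether one toleration matches one taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True
    return toleration.value == taint.value


def schedulable(node_taints, pod_tolerations):
    """A Pod fits only if every NoSchedule/NoExecute taint is tolerated."""
    blocking = [t for t in node_taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in pod_tolerations) for t in blocking)


# Placeholder taint, as it would be registered with
# --register-with-taints=example.com/bootstrap=true:NoSchedule
BOOT_TAINT = Taint(key="example.com/bootstrap", effect="NoSchedule", value="true")

app_pod_tolerations = []
setup_pod_tolerations = [Toleration(key="example.com/bootstrap", operator="Exists")]

if __name__ == "__main__":
    print(schedulable([BOOT_TAINT], app_pod_tolerations))    # fresh node, app Pod
    print(schedulable([BOOT_TAINT], setup_pod_tolerations))  # fresh node, setup Pod
    print(schedulable([], app_pod_tolerations))              # taint removed
```

The Pods that install the network plugin need the toleration themselves, otherwise nothing could ever run on the node to remove the taint.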

Slide 21

Slide 21 text

Lessons learned from developing a CNI plugin

Slide 22

Slide 22 text

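Isn't development and debugging hard?
• The Neco project has an environment that virtualizes the entire data center network in software, so behavior can easily be checked on a local machine.
• The network is a simple L3 network, so it is easy to investigate. With nsenter and tcpdump, most problems can be tracked down.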
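As an illustration of the nsenter and tcpdump combination mentioned on this slide, the helper below builds a command line that runs tcpdump inside the network namespace of a given process. The function name and its parameters are hypothetical; the command is only constructed here, not executed, since running it requires root privileges and a real PID of a process inside the Pod.

```python
import shlex


def tcpdump_in_netns(pid, interface="eth0", capture_filter=""):
    """Build an nsenter command that runs tcpdump in the netns of `pid`."""
    argv = [
        "nsenter", "--target", str(pid), "--net",  # enter only the network namespace
        "tcpdump", "-n", "-i", interface,          # no name resolution, chosen interface
    ]
    if capture_filter:
        argv += shlex.split(capture_filter)        # e.g. "arp" or "icmp"
    return argv


if __name__ == "__main__":
    # 1234 is a placeholder PID.
    print(shlex.join(tcpdump_in_netns(1234, capture_filter="arp")))
```

Because only the network namespace is entered, the tcpdump binary from the node is used, so nothing has to be installed in the container image.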

Slide 23

Slide 23 text

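How does a plugin get information from Kubernetes?
• CNI plugins are not specific to Kubernetes, so there is no specification for how a plugin obtains Pod information.
  • https://github.com/containernetworking/cni/issues/606
• Currently, when Kubernetes invokes a CNI plugin, it passes information such as K8S_POD_NAME and K8S_POD_NAMESPACE in CNI_ARGS.
• This is not clearly defined as a specification, so we had to read the implementations (each container runtime, such as dockershim and containerd, implements it separately).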
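In the CNI specification, `CNI_ARGS` is an environment variable holding key-value pairs separated by semicolons. A minimal sketch of extracting the Pod identity from it follows; the function names and the sample value are illustrative, and which keys a given container runtime actually sends should be checked against that runtime.

```python
import os


def parse_cni_args(raw):
    """Parse 'KEY=VALUE;KEY=VALUE' into a dict, skipping malformed items."""
    result = {}
    for item in raw.split(";"):
        key, sep, value = item.partition("=")
        if sep and key:
            result[key] = value
    return result


def pod_identity(environ=os.environ):
    """Return (namespace, name) of the Pod, or None for a missing key."""
    args = parse_cni_args(environ.get("CNI_ARGS", ""))
    return args.get("K8S_POD_NAMESPACE"), args.get("K8S_POD_NAME")


# Illustrative value only; not captured from a real runtime.
SAMPLE_ENV = {"CNI_ARGS": "IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=nginx"}

if __name__ == "__main__":
    print(pod_identity(SAMPLE_ENV))
```

Returning `None` for a missing key lets the caller decide whether to fail or to fall back, which matters when the plugin is invoked by something other than Kubernetes.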

Slide 24

Slide 24 text

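The CNI plugin is not picked up
• Kubernetes runs plugins based on the configuration placed in /etc/cni/net.d.
• When an application Pod was created right after the configuration file was updated, the configured plugin was sometimes not used.
• kubelet updates the container runtime status every 5 seconds, and the CNI plugin is loaded as part of that update, which explains why a configuration change is not picked up immediately.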

Slide 25

Slide 25 text

Pod-to-Pod communication fails on Kubernetes 1.13
• After upgrading Kubernetes to v1.13, Pods could no longer communicate with each other.
• The IPVS mode implementation of kube-proxy had changed Linux kernel parameters (sysctl), and this was not mentioned in the release notes.
  • net.ipv4.conf.all.arp_ignore: 0 -> 1
  • net.ipv4.conf.all.arp_announce: 0 -> 2
• Several CNI plugins are affected
  • https://github.com/kubernetes/kubernetes/issues/71555
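On Linux, sysctl parameters are exposed as files under `/proc/sys`, with the dots in the name mapped to directory separators. The sketch below checks whether a node currently has the values reported on this slide. The function names are hypothetical, and the `root` parameter exists only so the code can be pointed at a test directory.

```python
from pathlib import Path

# Values the slide reports after the upgrade to Kubernetes v1.13
# with kube-proxy running in IPVS mode.
REPORTED = {
    "net.ipv4.conf.all.arp_ignore": 1,
    "net.ipv4.conf.all.arp_announce": 2,
}


def sysctl_path(name, root="/proc/sys"):
    """Map a sysctl name such as 'net.ipv4.ip_forward' to its file path."""
    return Path(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Read an integer sysctl value from the proc filesystem."""
    return int(sysctl_path(name, root).read_text().strip())


def matches_reported(root="/proc/sys"):
    """Return the parameters whose current value equals the reported one."""
    return {
        name: value
        for name, value in REPORTED.items()
        if read_sysctl(name, root) == value
    }


if __name__ == "__main__":
    for name in REPORTED:
        print(name, "=", read_sysctl(name))
```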

Slide 26

Slide 26 text

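The problem and the fix
Before the fix
• Node: ens3 192.168.1.11, veth. Pod: eth0, IP 10.0.1.1/32, GW 192.168.1.11
• The Pod sends an ARP request for 192.168.1.11.
• arp_ignore=0: the veth replies with ens3's MAC address.
• arp_ignore=1: the veth does not reply with ens3's MAC address, so the Pod cannot reach the Node!
After the fix
• Node: ens3 192.168.1.11, veth 169.254.1.1/32. Pod: eth0, IP 10.0.1.1/32, GW 169.254.1.1
• A link-local address is assigned to the veth, and the Pod's default gateway is changed to that address.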
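The "after" configuration on this slide can be written down as iproute2 commands. The sketch below only generates the command strings and is an assumption about one way to express that setup, not coil's actual code; the function names are hypothetical, and the host-side route to the Pod reflects the earlier design point of routing to the veth instead of using a bridge.

```python
import ipaddress

# Link-local gateway address from the slide.
GATEWAY = ipaddress.ip_address("169.254.1.1")


def host_side_commands(veth, pod_ip):
    """Commands to run in the node's network namespace."""
    return [
        f"ip addr add {GATEWAY}/32 dev {veth}",             # give the veth the gateway address
        f"ip route add {pod_ip}/32 dev {veth} scope link",  # route to the Pod through the veth
    ]


def pod_side_commands(pod_ip, ifname="eth0"):
    """Commands to run in the Pod's network namespace."""
    return [
        f"ip addr add {pod_ip}/32 dev {ifname}",
        f"ip route add {GATEWAY}/32 dev {ifname} scope link",  # make the gateway reachable on-link
        f"ip route add default via {GATEWAY} dev {ifname}",
    ]


if __name__ == "__main__":
    print("link-local:", GATEWAY.is_link_local)
    for command in host_side_commands("veth0", "10.0.1.1") + pod_side_commands("10.0.1.1"):
        print(command)
```

Since the gateway address now lives on the veth itself, the ARP reply no longer depends on the node answering for an address that belongs to another interface.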

Slide 27

Slide 27 text

Summary and next steps

Slide 28

Slide 28 text

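Summary and next steps
• Summary
  • We developed coil, a CNI plugin for CLOS architectures, designed for use in large Kubernetes clusters.
  • We think coil's implementation is simple and easy to read, so we also recommend it to anyone who wants to learn about CNI plugins.
• Next steps
  • Apply it to a large cluster and verify its stability
  • Prepare tutorials and similar material to make it easier to use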

Slide 29

Slide 29 text

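We are hiring!
• Job openings for the Neco project
  • https://cybozu.co.jp/company/job/recruitment/list/neco_project.html
• Skill check sheet
  • https://gist.github.com/ymmt2005/bd92296166e52d1beba9df8ac516a9db
  • Introduces the skills you can acquire in the Neco project
• Diverse working styles
  • Our members work in many different ways, including fully remote work, 20-hour work weeks, and side jobs at other companies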