coil.pdf

Akihiro Ikezoe
February 19, 2019

Transcript

  1. Building our own CNI plugin for large-scale Kubernetes clusters
    Akihiro Ikezoe, Cybozu, Inc.
    Kubernetes Meetup Tokyo #16, 2019/02/19
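  2. Today's agenda
    • Introduction to Neco, our infrastructure renewal project
    • Why did we build our own CNI plugin?
    • Introduction to coil, our in-house CNI plugin
    • Lessons learned from developing a CNI plugin
    • Summary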
  3. Neco, the infrastructure renewal project

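  4. What is Neco, the infrastructure renewal project?
    • A project to renew the infrastructure of […] by introducing Kubernetes and […]
    • What is […]?
      • Provides services such as kintone, Garoon, and Office as SaaS
      • Adopted by 25,000 companies, with more than 1 million users
      • Released in 2011
      • VM-based architecture
      • Runs on rented data center space, with more than 1,000 servers operated in-house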
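  5. What is Neco, the infrastructure renewal project?
    • Goals
      • Drastically reduce maintenance costs (aiming for NoOps)
      • Improve scalability
      • Improve server consolidation
      • Have application development teams take part in deployment and operations
      • Publish most of the deliverables as OSS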
  6. Neco architecture
    [Diagram. Labels: about 5 boot servers (Boot Server, Ubuntu) with sabakan, CKE, and neco-updater; Kubernetes deployment and management; continuous delivery; roughly 1,000 servers (CoreOS Node); several thousand or more application containers; LB, Prometheus, squid, CoreDNS, app, MySQL, Elasticsearch, Argo CD, Rook.]
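  7. The software behind Neco
    • sabakan
      • Automates lifecycle management and provisioning of server hardware.
      • Automatically handles BIOS settings, OS network boot, disk encryption, and setup of various system software.
    • CKE (Cybozu Container Engine)
      • A tool that automatically builds and operates Kubernetes clusters.
      • Automatically deploys Kubernetes onto the servers that sabakan has set up.
      • Automates operations such as repairing errors and removing failed machines from the cluster.
    • neco-updater
      • A continuous delivery tool for the infrastructure.
      • Checks GitHub release information and automatically deploys software such as CKE and sabakan, updates CoreOS images, and so on.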
  8. Why did we build our own CNI plugin?

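  9. Why did we build our own CNI plugin?
    • We needed a network plugin that matches the network design of the Neco project.
    • We evaluated existing plugins, but none of them fully met our requirements.
    • Multiple CNI plugins can be combined, so we decided to build only the parts we needed.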
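  10. Neco's network design
    [Diagram: Rack1, Rack2, Rack3]
    • CLOS architecture
      • Flat L3 network
      • Every node is reachable in the same number of hops
      • Scales with growing East-West traffic
    • Routing with BGP
      • AS per Rack
      • Route redundancy with ECMP
      • Fast route convergence with BFD
    • See the blog post for details
      • https://blog.cybozu.io/entry/2018/11/01/113000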
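  11. Selecting a CNI plugin
    • To match the data center network, we wanted the Kubernetes network to use BGP as well, for efficient routing.
    • Calico
      • Actively developed, feature-rich, and widely adopted.
      • Concerns: it bundles its own BGP speaker, and the number of routes grows in large clusters.
    • Romana
      • Functionally meets our requirements.
      • Does not support etcd v3. Development is not active, and it has not kept up with the latest Kubernetes.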
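  12. Combining plugins
    • Container networking has many problems to solve.
      • IP address management (IPAM)
      • Network connectivity
      • Network policy
      • Bandwidth limiting
    • There is no need to solve every problem with a single plugin.
    • An example of combining multiple plugins: Canal
      • IPAM uses the standard functionality of CNI plugins
      • Network connectivity uses flannel
      • Network policy uses Calico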
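  13. Which parts do we need to build ourselves?
    • Network policy
      • Hard to implement, and there is no need to differentiate here.
      → Use Calico, Cilium, or similar
    • IP address management (IPAM)
      • We need a mechanism that keeps the routing table from growing.
      • We also want to be able to assign global IP addresses to Pods.
      → Build our own
    • Network connectivity
      • Route exchange with the outside of the node is left to the routing software in the data center.
      → Build the routing part inside the node ourselves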
  14. coil, our in-house CNI plugin

  15. What is coil?
    • A CNI plugin developed by Cybozu
    • Published as OSS
      • https://github.com/cybozu-go/coil
    • Features
      • Supports CNI Spec v0.3.1 (Kubernetes v1.13)
      • Implemented in Go, with etcd as the backend
      • Linux only, Kubernetes only, IPv4 only
      • Provides only IPAM and intra-node routing
      • Does not bundle routing software
      • Designed for use in large clusters
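  16. Designing for large clusters
    • No overlay network
      • Overlay networks such as VXLAN reduce throughput.
      → Establish connectivity with plain routing
    • No Linux Bridge
      • Using Linux Bridge increases CPU usage.
      → Add routes to each veth in the routing table.
    • Supports etcd API v3 only
      • etcd API v2 loses performance when many clients connect.
    • Preventing route growth
      • Advertising a route for every Pod makes the number of routes balloon.
      → With a mechanism called address blocks, advertise one route per subnet.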
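  17. Components of coil
    • etcd
      • Stores the allocated address information
    • coil-controller
      • A Deployment on k8s
      • Releases address blocks that are no longer in use (housekeeping)
    • coil
      • The CNI plugin itself
      • Adds and removes veths for Pods, assigns IP addresses, configures routing, and so on
    • coil-node: a DaemonSet on k8s
      • coild
        • Manages addresses and configures routing on each node
      • coil-installer
        • Installs coil and its configuration files
    [Diagram: on each Node, the coil-node DaemonSet (coild, coil-installer) sets up the coil CNI plugin and its conf; the coil-controller Deployment does housekeeping against etcd.]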
  18. Address blocks (inspired by Romana)
    • coil introduces a mechanism called address blocks and advertises routes per subnet (e.g. /28), which keeps the routing table from bloating.
    • The allocatable addresses are divided into units called blocks. Pod address range: 10.0.1.0/24. Address blocks: 10.0.1.0/28, 10.0.1.16/28, 10.0.1.32/28, 10.0.1.48/28, 10.0.1.64/28, ...
    • Address blocks are assigned per node: 10.0.1.0/28 to Node1, 10.0.1.16/28 to Node2.
    • Pod addresses are allocated from the address block: 10.0.1.0/32 and 10.0.1.1/32 on Node1, 10.0.1.16/32 and 10.0.1.17/32 on Node2.
    • Routes are advertised to the BGP router per subnet: 10.0.1.0/28 -> Node1, 10.0.1.16/28 -> Node2.
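  19. coil processing flow
    1. kubelet receives an instruction and creates a Pod
    2. The CNI plugin coil is executed
    3. coil requests an IP address from coild
    4. An address block is allocated from etcd
    5. The route is written to the routing table
    6. The routing table is read and the routes are advertised
    7. coild returns the IP address to coil
    8. A veth pair is created in the Pod's netns, and the IP address and routes are configured
    [Diagram: a Node with kubelet, the coil CNI plugin, the coild DaemonSet, a Pod (eth0, veth), the routing table, and a BGP speaker, connected to etcd and a BGP router; arrows numbered 1 to 8 follow the steps above.]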
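  20. boot-taint
    • It is a problem if Pods are scheduled onto a node where the CNI plugin has not been set up yet.
    • Specify --register-with-taints in the kubelet startup options so that Pods cannot be scheduled onto a node right after it boots.
    • Once coil's setup is complete, remove the taints so that Pods can be scheduled.
    • taints/tolerations: a Kubernetes feature. Adding taints to a node prohibits Pods from being scheduled or run there; only Pods with the matching tolerations can be scheduled.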
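The taint/toleration rule described above can be modeled with a simplified Go sketch. The types here are local stand-ins rather than the Kubernetes API types, the taint key `example.com/bootstrap` is made up for illustration, and the matching logic is a simplified version of the documented semantics.

```go
package main

import "fmt"

// Simplified local stand-ins for the Kubernetes taint and toleration objects.
type taint struct{ Key, Value, Effect string }
type toleration struct{ Key, Operator, Value, Effect string }

// tolerates reports whether tol matches t. An empty Effect or Key in the
// toleration acts as a wildcard; an empty Operator is treated as "Equal".
func tolerates(tol toleration, t taint) bool {
	if tol.Effect != "" && tol.Effect != t.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != t.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == t.Value
	}
	return false
}

// schedulable reports whether a Pod with the given tolerations may be
// scheduled onto a node carrying the given taints.
func schedulable(taints []taint, tols []toleration) bool {
	for _, t := range taints {
		if t.Effect == "PreferNoSchedule" {
			continue // a soft preference, never a hard block
		}
		ok := false
		for _, tol := range tols {
			if tolerates(tol, t) {
				ok = true
				break
			}
		}
		if !ok {
			return false
		}
	}
	return true
}

func main() {
	// "example.com/bootstrap" is a made-up taint key for illustration.
	boot := []taint{{Key: "example.com/bootstrap", Value: "true", Effect: "NoSchedule"}}
	app := []toleration{}
	setup := []toleration{{Key: "example.com/bootstrap", Operator: "Exists"}}

	fmt.Println(schedulable(boot, app))   // false: ordinary Pods stay off a fresh node
	fmt.Println(schedulable(boot, setup)) // true: setup Pods tolerate the boot taint
	fmt.Println(schedulable(nil, app))    // true: taint removed after setup
}
```

In this model, the Pods that perform the CNI setup would carry the toleration, which is what lets them run on a node that ordinary application Pods cannot use yet.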
  21. Lessons learned from developing a CNI plugin

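  22. Isn't development and debugging hard?
    • The Neco project has an environment that virtualizes the entire data center network design in software, so behavior can easily be checked on a local machine.
    • Because it is a simple L3 network, it is easy to investigate. With nsenter and tcpdump, most problems can be tracked down.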

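  23. How do you get information from Kubernetes?
    • CNI plugins are not specific to Kubernetes, so there is no specification for obtaining Pod information.
      • https://github.com/containernetworking/cni/issues/606
    • Currently, when Kubernetes invokes a CNI plugin, it passes information such as K8S_POD_NAME and K8S_POD_NAMESPACE through CNI_ARGS.
    • Since this is not clearly defined as a specification, we had to read the implementations (each container runtime, such as dockershim and containerd, implements it separately).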
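As a rough illustration of reading those values, here is a Go sketch that parses a CNI_ARGS string. It assumes the semicolon-separated KEY=VALUE form that the CNI specification describes for CNI_ARGS; `parseCNIArgs` is a hypothetical helper, and the sample value is an example rather than captured output from a real runtime.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCNIArgs splits a CNI_ARGS value of the form "K1=V1;K2=V2" into a map.
// Entries without "=" or with an empty key are skipped.
// (Hypothetical helper, not part of coil or libcni.)
func parseCNIArgs(s string) map[string]string {
	args := make(map[string]string)
	for _, kv := range strings.Split(s, ";") {
		k, v, ok := strings.Cut(kv, "=")
		if ok && k != "" {
			args[k] = v
		}
	}
	return args
}

func main() {
	// Example value only; a real plugin would read os.Getenv("CNI_ARGS").
	args := parseCNIArgs("IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=nginx")
	fmt.Println(args["K8S_POD_NAMESPACE"], args["K8S_POD_NAME"])
}
```

Because the keys are a convention of the calling runtime rather than part of the CNI specification, a plugin should treat a missing key as a normal case rather than an error in the format.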
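  24. The CNI plugin is not loaded
    • Kubernetes runs plugins based on the configuration placed in /etc/cni/net.d.
    • When an application Pod was created right after the configuration file was updated, the configured plugin was sometimes not used.
    • kubelet updates the container runtime status every 5 seconds, and the CNI plugin configuration is loaded at that time as well.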

  25. Pod-to-Pod communication fails on Kubernetes 1.13
    • After upgrading Kubernetes to v1.13, Pods could no longer communicate with each other.
    • The IPVS mode implementation of kube-proxy had changed Linux kernel parameters (sysctl), with no mention in the release notes.
      • net.ipv4.conf.all.arp_ignore: 0 -> 1
      • net.ipv4.conf.all.arp_announce: 0 -> 2
    • Several CNI plugins are affected.
      • https://github.com/kubernetes/kubernetes/issues/71555
  26. The problem and the fix
    • Before the fix: the Pod (eth0, IP 10.0.1.1/32) uses the address of the node's ens3, 192.168.1.11, as its gateway and sends ARP requests for 192.168.1.11 over the veth.
      • arp_ignore=0: the veth returns the MAC address for ens3.
      • arp_ignore=1: the veth does not return the MAC address for ens3, so the Pod cannot reach the Node!
    • After the fix: a link-local address (169.254.1.1/32) is assigned to the veth, and the Pod's default gateway is changed to this address.
  27. Summary and future work

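  28. Summary and future work
    • Summary
      • We developed coil, a CNI plugin for CLOS architectures designed for use in large Kubernetes clusters.
      • We think coil's implementation is simple and easy to read, so it is also recommended for anyone who wants to learn about CNI plugins.
    • Future work
      • Apply it to a large cluster and verify its stability
      • Prepare tutorials and similar material to make it easier to use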
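  29. We are hiring!
    • Neco project job openings
      • https://cybozu.co.jp/company/job/recruitment/list/neco_project.html
    • Skill check sheet
      • https://gist.github.com/ymmt2005/bd92296166e52d1beba9df8ac516a9db
      • Introduces the skills you can acquire in the Neco project
    • Diverse working styles
      • Our members work in many different ways, including fully remote work, 20-hour work weeks, and holding a second job at another company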