Slide 1

Slide 1 text

AutoScaling Behind KARTE: SpotInstance Edition
[Toreta × PLAID] TechBlog Deep Dive Meetup #1: Infrastructure Edition
2016/07/05 @tik-son

Slide 2

Slide 2 text

Profile
竹村尚彦 a.k.a @tik-son
• Joined PLAID in 2014/11
• Mainly an infrastructure engineer
• Weekend guitarist

Slide 3

Slide 3 text

Visualize every access as an individual person, in real time, and make ordinary customer service possible on the Web

Slide 4

Slide 4 text

http://tech.plaid.co.jp/
Started 2016/04; planned to be updated about once a month

Slide 5

Slide 5 text

Agenda
• Motivation for using Autoscaling
• Autoscaling and SpotInstance
• (Problems and worries with Autoscaling)
Note: "Autoscaling" in this talk means AWS Autoscaling

Slide 6

Slide 6 text

Motivation for using Autoscaling

Slide 7

Slide 7 text

Load over one day: up to roughly a 10x difference between peak and off-peak hours
1. Load changes greatly with the time of day, so provisioning resources to match the peak wastes money

Slide 8

Slide 8 text

1. Load changes greatly with the time of day, so provisioning resources to match the peak wastes money
2. We want only healthy instances to stay alive (new)

Slide 9

Slide 9 text

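About unhealthy instances
• Unhealthy instances (performance degrades, or the application gets into a bad state) appear with a small but nonzero probability
• If unhealthy instances stay in service, KARTE's performance drops (partially)
• They have to be brought back to normal, but manual operation is tedious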

Slide 10

Slide 10 text

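A mechanism that keeps only healthy instances alive
① The ELB health-checks a port/path that the application exposes for healthchecks
② Autoscaling terminates instances whose healthcheck fails
③ Autoscaling launches new instances
Note: this works even for servers that are not web servers, as long as they expose a healthcheck port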
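The note about non-web servers can be illustrated with a minimal sketch: a worker process exposes a tiny HTTP endpoint whose only job is to answer the ELB healthcheck. The path `/healthcheck`, the `is_healthy` callback, and the function names are assumptions for illustration, not taken from the talk.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(is_healthy):
    """Build a handler class that answers GET /healthcheck using is_healthy()."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthcheck":  # hypothetical path
                self.send_response(404)
                self.end_headers()
                return
            # 200 keeps the instance in service; 503 lets the load balancer
            # mark it unhealthy so that Autoscaling can replace it.
            status = 200 if is_healthy() else 503
            body = b"ok" if status == 200 else b"unhealthy"
            self.send_response(status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep healthcheck requests out of the worker's logs

    return HealthHandler


def start_healthcheck_server(is_healthy, host="127.0.0.1", port=0):
    """Serve the healthcheck in a daemon thread; port=0 picks a free port.

    On a real instance the host would need to be reachable from the load
    balancer (for example "0.0.0.0").
    """
    server = HTTPServer((host, port), make_handler(is_healthy))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The worker decides what "healthy" means (queue lag, last successful job, and so on) inside `is_healthy`; the endpoint only reports it.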

Slide 11

Slide 11 text

There are two motivations for using Autoscaling
1. Load changes greatly with the time of day, so provisioning resources to match the peak wastes money
2. We want only healthy instances to stay alive

Slide 12

Slide 12 text

Where Autoscaling is used: all stateless servers (EC2)
Diagram labels: KARTE customers; the customers' end users; Track, Analysis, Admin console; Track DB, Analysis DB, Admin console DB

Slide 13

Slide 13 text

Autoscaling and SpotInstance

Slide 14

Slide 14 text

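Spot Instance
• Instances can be used at roughly 1/3 to 1/5 of the ondemand price
• Bid price ≥ market price → Spot instances can run
• Bid price < market price → Spot instances cannot be used, and running instances are terminated
• Just setting a bid price in the launch configuration makes them easy to use through Autoscaling
(Chart label: bid price)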

Slide 15

Slide 15 text

Thinking about adopting SpotInstance

Slide 16

Slide 16 text

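KARTE is a service that uses a lot of server resources, so being able to use SpotInstance would be great
But having instances mercilessly terminated would be painful..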

Slide 17

Slide 17 text

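Setting the bid far above the ondemand price would keep instances from being terminated, but..
About 10x the ondemand price × 3 days = 1 month of ondemand
Being uncontrollable is scary, and we run quite a few instances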
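The cost concern on this slide is simple arithmetic, spelled out below as a small sketch. The function name is made up for illustration; the figures (10x, 3 days) are the ones on the slide.

```python
def ondemand_days_equivalent(price_multiple, days_at_that_price):
    """Number of ondemand days that cost the same as running for
    `days_at_that_price` days at `price_multiple` times the ondemand price."""
    return price_multiple * days_at_that_price


# Three days at about 10x the ondemand price cost as much as
# about 30 ondemand days, i.e. roughly one month.
print(ondemand_days_equivalent(10, 3))  # 30
```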

Slide 18

Slide 18 text

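It would be ideal to switch cleanly, at the moment the spot price exceeds the ondemand price, to an ondemand AutoscalingGroup that launches the same instance type
Autoscaling ⇔ SpotFleet integration does not look like it will arrive any time soon..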

Slide 19

Slide 19 text

If no such mechanism exists, let's build one: prepare multiple AutoscalingGroups and switch between them
So now we use SpotInstance aggressively

Slide 20

Slide 20 text

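We prepare three kinds of AutoscalingGroup
spot Group (switched with the ondemand Group)
• Used as the main group
• Launches SpotInstances
• Bid price is the ondemand price
ondemand Group
• Used as the secondary group
• Launches ondemand instances
static Group
• Always runs a fixed number of instances
• Launches ondemand instances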

Slide 21

Slide 21 text

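Normally: while bid price ≥ market price
spot Group
• Used as the main group
• Launches SpotInstances
• Bid price is the ondemand price
ondemand Group
• Used as the secondary group
• Launches ondemand instances
static Group
• Always runs a fixed number of instances
• Launches ondemand instances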

Slide 22

Slide 22 text

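When the market price rises: bid price ≥ market price → bid price < market price
Switch from the spot Group to the ondemand Group
spot Group
• Used as the main group
• Launches SpotInstances
• Bid price is the ondemand price
ondemand Group
• Used as the secondary group
• Launches ondemand instances
static Group
• Always runs a fixed number of instances
• Launches ondemand instances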

Slide 23

Slide 23 text

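When the market price falls: bid price < market price → bid price ≥ market price
Switch from the ondemand Group back to the spot Group
spot Group
• Used as the main group
• Launches SpotInstances
• Bid price is the ondemand price
ondemand Group
• Used as the secondary group
• Launches ondemand instances
static Group
• Always runs a fixed number of instances
• Launches ondemand instances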

Slide 24

Slide 24 text

Next, a closer look at how the switching works

Slide 25

Slide 25 text

Basic idea
The EC2 instances that belong to each Group operate the ASG they are switching to
ondemand Group / spot Group

Slide 26

Slide 26 text

Switching procedure when the market price rises (bid price ≥ market price → bid price < market price)
ondemand Group / spot Group

Slide 27

Slide 27 text

ondemand Group spot Group

Slide 28

Slide 28 text

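- When bid price < market price, termination is held off for about 2 minutes
- An instance can tell from the inside whether it is about to be terminated because of a price rise
- http://169.254.169.254/latest/meta-data/spot/termination-time → 2016-07-05T17:00:00Z or 404 Not Found
- Detection is fairly quick
(Chart label: bid price)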
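A minimal sketch of this detection, using only the standard library. The URL and the two possible responses (a timestamp such as `2016-07-05T17:00:00Z`, or `404 Not Found`) come from the slide; the function names, the timeout, and the decision to treat any unreachable endpoint as "no notice" are assumptions.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone

TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"


def parse_termination_time(status, body):
    """Return the scheduled termination time (UTC), or None if no notice is posted."""
    if status != 200:
        return None  # 404 Not Found: this instance is not marked for termination
    try:
        parsed = datetime.strptime(body.strip(), "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None  # unexpected body; treat as "no notice" in this sketch
    return parsed.replace(tzinfo=timezone.utc)


def fetch_termination_time(url=TERMINATION_URL, timeout=1.0):
    """Query the metadata endpoint once; only meaningful on an EC2 spot instance."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("ascii", errors="replace")
            return parse_termination_time(resp.status, body)
    except urllib.error.HTTPError as err:
        return parse_termination_time(err.code, "")
    except OSError:
        return None  # endpoint unreachable, e.g. not running on EC2
```

Since the notice arrives only about 2 minutes ahead, a caller would poll `fetch_termination_time()` every few seconds and react as soon as it returns a value.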

Slide 29

Slide 29 text

ondemand Group / spot Group
① Detect using http://169.254.169.254/latest/meta-data/spot/termination-time

Slide 30

Slide 30 text

ondemand Group / spot Group
① Detect using http://169.254.169.254/latest/meta-data/spot/termination-time
② Increase the size of the ondemand Group

Slide 31

Slide 31 text

ondemand Group / spot Group
① Detect using http://169.254.169.254/latest/meta-data/spot/termination-time
② Increase the size of the ondemand Group
③ Prevent new spot instances from launching

Slide 32

Slide 32 text

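• When the SpotPrice spikes, the market price is usually unstable
• If we decide to launch SpotInstances on a short-lived price drop, the price soon spikes again and the instances are terminated right away orz
• So after a SpotPrice spike, we keep SpotInstances from launching until the market price stabilizes
(Chart label: bid price)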

Slide 33

Slide 33 text

ondemand Group / spot Group
① Detect using http://169.254.169.254/latest/meta-data/spot/termination-time
② Increase the size of the ondemand Group
③ Prevent new spot instances from launching
④ The switch from spot to ondemand is complete
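Steps ② and ③ can be sketched against an in-memory model of the two groups. Everything here is illustrative: the `Group` class, the one-instance `step`, and the `launch_suspended` flag are assumptions, and the talk does not say which API calls are used. With boto3, the corresponding operations would plausibly be the Auto Scaling client's `set_desired_capacity` and `suspend_processes`, but that mapping is a guess about the implementation.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """In-memory stand-in for an Auto Scaling group (hypothetical model)."""
    name: str
    desired: int
    max_size: int
    launch_suspended: bool = False


def on_termination_notice(spot: Group, ondemand: Group, step: int = 1) -> None:
    """Run on a spot instance that has detected its own termination notice."""
    # Step 2: grow the ondemand group to replace the capacity about to be lost.
    # Assumption: each terminating instance asks for `step` replacement instances.
    ondemand.desired = min(ondemand.desired + step, ondemand.max_size)
    # Step 3: stop the spot group from launching replacements while the
    # market price is still unstable.
    spot.launch_suspended = True


spot = Group("spot", desired=4, max_size=10)
ondemand = Group("ondemand", desired=0, max_size=10)
on_termination_notice(spot, ondemand)
print(ondemand.desired, spot.launch_suspended)  # 1 True
```

Because each terminating instance runs the handler itself, the ondemand group grows by as many instances as the spot group loses, without a central controller.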

Slide 34

Slide 34 text

Switching when the market price falls (bid price < market price → bid price ≥ market price)
ondemand Group / spot Group

Slide 35

Slide 35 text

ondemand Group spot Group

Slide 36

Slide 36 text

ondemand Group / spot Group
① Periodically check whether the bid price has stayed above the market price for a set period

Slide 37

Slide 37 text

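Once the market price spikes, the unstable state often continues for a while
Deciding on a short window means instances get terminated again right away, which is wasted effort
(Chart label: bid price)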
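One way to express "the bid price has covered the market price for a set period" is to require that every one of the most recent price samples is at or below the bid. This is a sketch under that assumption; the sampling interval, the number of samples, and the function name are not from the talk.

```python
def bid_covers_market(prices, bid, required_samples):
    """True if the last `required_samples` market prices are all <= bid.

    `prices` is a list of market prices ordered from oldest to newest.
    """
    if required_samples <= 0:
        raise ValueError("required_samples must be positive")
    if len(prices) < required_samples:
        return False  # not enough history yet; stay on ondemand
    return all(price <= bid for price in prices[-required_samples:])


# A single dip after a spike is not enough to switch back.
print(bid_covers_market([0.50, 0.10, 0.60, 0.10], bid=0.20, required_samples=3))  # False
print(bid_covers_market([0.50, 0.10, 0.10, 0.10], bid=0.20, required_samples=3))  # True
```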

Slide 38

Slide 38 text

ondemand Group / spot Group
① Periodically check whether the bid price has stayed above the market price for a set period
② Increase the size of the spot Group

Slide 39

Slide 39 text

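ondemand Group / spot Group
① Periodically check whether the bid price has stayed above the market price for a set period
② Increase the size of the spot Group
③ After confirming that the spot Group has started up normally, change the size of the ondemand Group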

Slide 40

Slide 40 text

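ondemand Group / spot Group
① Periodically check whether the bid price has stayed above the market price for a set period
② Increase the size of the spot Group
③ After confirming that the spot Group has started up normally, change the size of the ondemand Group
④ The switch from ondemand to spot is complete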
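The ordering in steps ② and ③ (grow spot first, shrink ondemand only after spot is confirmed healthy) can be sketched as follows. The callables stand in for whatever API client is used; their names, the timeout, the polling interval, and shrinking the ondemand group to 0 are all assumptions for illustration.

```python
import time


def switch_back_to_spot(set_spot_size, spot_in_service, set_ondemand_size,
                        target, ondemand_size_after=0,
                        timeout_s=600, poll_s=15, sleep=time.sleep):
    """Grow the spot group, then shrink ondemand only once spot is serving.

    Returns True when the switch completed, False when spot capacity did not
    come up in time (ondemand capacity is then left untouched).
    """
    set_spot_size(target)                    # step 2
    waited = 0
    while spot_in_service() < target:        # step 3: confirm normal startup
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
    set_ondemand_size(ondemand_size_after)   # step 3: release ondemand capacity
    return True
```

Keeping the ondemand instances until the spot instances are in service means the switch never reduces total capacity, at the price of briefly paying for both groups.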

Slide 41

Slide 41 text

So what is the static Group for?

Slide 42

Slide 42 text

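In a word: insurance
During a spot → ondemand switch, depending on how long instances take to start, the number of active instances can drop far enough that the service no longer meets its SLA
So we keep a static Group in which a fixed number of instances always runs, to strike a balance
static Group
• Always runs a fixed number of instances
• Launches ondemand instances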

Slide 43

Slide 43 text

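We use the three kinds of groups for different purposes
spot Group (switched with the ondemand Group)
• Used as the main group
• Launches SpotInstances
• Bid price is the ondemand price
ondemand Group
• Used as the secondary group
• Launches ondemand instances
static Group
• Always runs a fixed number of instances
• Launches ondemand instances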

Slide 44

Slide 44 text

Problems and worries with Autoscaling

Slide 45

Slide 45 text

①: RateExceeded
We became completely unable to operate Autoscaling from the GUI. We wanted to add instances and could not...
Why!? We asked support.

Slide 46

Slide 46 text

A. We were calling the API too often
• Monitoring for the spot ⇔ ondemand switch
• Processing before and after deploys
• Instance startup
• Sending the spot market price to Datadog
• terraform runs
• Datadog's Autoscaling Integration
... etc.

Slide 47

Slide 47 text

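Support would not tell us the concrete limit at which API requests get throttled, so:
• We reviewed and removed unnecessary integrations and processing
• Retries use exponential backoff
Incidentally, the same applies when sending custom metrics to CloudWatch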
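A minimal sketch of retrying with exponential backoff. `RateExceeded` here is a hypothetical exception standing in for whatever throttling error the API client raises; the delays, the attempt count, and the use of random jitter are assumptions, not the values used in production.

```python
import random
import time


class RateExceeded(Exception):
    """Hypothetical stand-in for a throttling error from the API client."""


def call_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call `call()`, retrying on RateExceeded with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller see the error
            # The delay ceiling doubles on each attempt (0.5s, 1s, 2s, ...),
            # capped at max_delay. Multiplying by a random factor in [0, 1)
            # spreads out retries from many instances hitting the limit at once.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(ceiling * rng())
```

Jitter matters in this setup because many instances run the same monitoring code; without it they would all retry at the same moments and hit the limit together again.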

Slide 48

Slide 48 text

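②: Poor fit with sudden load spikes
Autoscaling trigger fires → EC2 launches → pull the latest code and set up (CodeDeploy) → application starts
It inevitably takes several minutes before an instance is serving
- EC2 startup is not that fast (although SpotInstance startup has become faster recently)
- The first access to an EBS volume restored from a snapshot is slow
For now, we design capacity with a certain amount of headroom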

Slide 49

Slide 49 text

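How can we scale smoothly even under sudden load? (still exploring)
① Stop using instance-level Autoscaling
• Make good use of serverless services that handle scaling behind the scenes, such as GAE or Lambda?
• Processing that uses server resources heavily seems like a poor fit
• The risks of a fully managed service
• vs. running SpotInstances with headroom in instance count?
② Shorten instance startup time and code deployment time
- GCE?
- Containers?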

Slide 50

Slide 50 text

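Summary
• We are getting real benefits from Autoscaling
• We are working hard to use SpotInstance
• Plan your use of the Autoscaling API carefully
• There is still plenty of room to improve around Autoscaling, and we are aiming to evolve further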

Slide 51

Slide 51 text

Bonus

Slide 52

Slide 52 text

Termination Reason: Server.InternalError
One day, instances stopped launching, and this message appeared in the console's activity history:
Launching a new EC2 instance: i-xxxxxxx. Status Reason: Instance became unhealthy while waiting for instance to be in InService state. Termination Reason: Server.InternalError
Why!? We asked support.

Slide 53

Slide 53 text

A. Total EBS usage had exceeded the account limit

Slide 54

Slide 54 text

• Launches also fail when a resource used by the instances Autoscaling starts reaches its limit
• However, the reason for the failure is hard to tell from the Console (Termination Reason: Server.InternalError)