Slide 1

Slide 1 text

Scaling Telemetry Workloads in Cloud Applications: Techniques for Instrumentation, Storage, and Mining
Graduate School of Informatics, Kyoto University
December 24, 2024
Preliminary review of the doctoral dissertation
Yuuki Tsubouchi

Slide 2

Slide 2 text

0. Overview of this thesis

Slide 3

Slide 3 text

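Overview: the telemetry system
Diagram labels: users, internet, cloud application, operator, telemetry system (instrumentation, storage, mining)
・Instrumentation: instrumenting and collecting telemetry data
・Storage: storing and querying telemetry data
・Mining: manual and automated analysis of telemetry data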

Slide 4

Slide 4 text

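Overview: growth of telemetry workloads
Diagram labels: users, internet, cloud application, operator, telemetry system (instrumentation, storage, mining)
As the workload grows, the following increase:
・Instrumentation: resource consumption of measurement processing, and latency of application processing
・Storage: resource consumption of ingestion and retention
・Mining: resource consumption of analysis processing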

Slide 5

Slide 5 text

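Overview: research objective, telemetry workload scaling
Diagram labels: users, internet, cloud application, operator, telemetry system (instrumentation, storage, mining)
Objective: scale telemetry workloads efficiently.
As the workload grows, the following increase:
・Instrumentation: resource consumption of measurement processing, and latency of application processing
・Storage: resource consumption of ingestion and retention
・Mining: resource consumption of analysis processing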

Slide 6

Slide 6 text

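Overview: contributions of this thesis, three technical proposals
Diagram labels: users, internet, cloud application, operator, telemetry system (instrumentation, storage, mining)
Effects of workload growth: resource consumption (instrumentation, storage), recovery time from system failures, mining latency
・Contribution 1: reduce the overhead of measurement processing on the in-OS path of network communication.
・Contribution 2: combine heterogeneous databases to achieve both efficient data ingestion and long-term retention.
・Contribution 3: reduce the data volume at the preprocessing stage to improve the accuracy and speed of automated analysis.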

Slide 7

Slide 7 text

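Outline of the oral presentation
1. Introduction
2. Proposal of an in-kernel instrumentation method (Contribution 1)
3. Proposal of a storage architecture (Contribution 2)
4. Proposal of a preprocessing method for automated fault localization (Contribution 3)
5. Conclusion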

Slide 8

Slide 8 text

1. Introduction (Chapter 1 and Chapter 2)

Slide 9

Slide 9 text

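Background: the spread of cloud computing
Online service providers build applications in cloud environments and deliver services to users over the internet.
・Social networking
・E-commerce
・Online games
・Media streaming
・Payments
・IoT
・…
Diagram labels: Cloud, Applications, Datacenters, (Users)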

Slide 10

Slide 10 text

Background: basic architecture of cloud applications (Fig. 2.1)
・Frontend tier
・Backend tier: business logic processing
・Database (DB): business data managed by clusters

Slide 11

Slide 11 text

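Background: basic architecture of cloud applications (Fig. 2.1)
・Load balancing and redundancy
・Asynchronous processing
・Heterogeneous DB systems, one per use case
Multiple techniques for availability and scale-out are layered on top of each other, which makes the architecture complex.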

Slide 12

Slide 12 text

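Background: basic architecture of cloud applications (Fig. 2.1)
An example path of request processing (steps 1 to 4 in the figure):
・Request/response style
・Each hop relays the request while terminating the TCP connection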

Slide 13

Slide 13 text

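Background: reliability of cloud applications
Demand for high reliability: 24/7/365 availability, low-latency responses, and so on.
・Impact of failures: of 1,819 system failures, 47% took two hours or more to resolve.
・Triggers of failures: change-induced failures account for 49.5% of all failures (changes to application code, configuration files, the underlying platform, and so on).
[58] [13]
Approaches that assume failures will happen and ask how to reduce their impact have become widespread. Fault tolerance that includes the operators' response is important. [14]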

Slide 14

Slide 14 text

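Background: three layers of fault tolerance (Fig. 2.2: adapted from Figure 1-1 of [60])
・The impact of an inner fault propagates outward.
・The fault-tolerance mechanism of each layer limits that propagation.
・This thesis focuses on the outermost layer.
Operators need telemetry to learn how the system behaves.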

Slide 15

Slide 15 text

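Background: major types of telemetry data
Time-oriented (Contributions 2 and 3):
・Numeric values (metrics): quantitative measurements of system performance at a point in time, sampled at a fixed interval. Examples: CPU utilization, request response time.
・Strings (logs): records of events occurring in the system, as unstructured strings. Examples: error messages, user activity, system operations.
Path-oriented (Contribution 1):
・Traces: structured data representing the flow of a series of operations and communications passing through the system.
・Traces of network communication in particular: request granularity at the upper layer, flow granularity at the lower layer.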

Slide 16

Slide 16 text

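Background: major types of telemetry data (metrics)
Time-oriented:
・Numeric values (metrics): quantitative measurements of system performance at a point in time, sampled at a fixed interval. Examples: CPU utilization, request response time.
  A metric such as cpu_seconds{instance=host1,…} is represented as an array of (timestamp, value) pairs. Example: [(1709298600, 29851.26), …]
・Strings (logs): records of events occurring in the system, as unstructured strings. Examples: error messages, user activity, system operations.
Topology-oriented data:
・Traces: a set of structured data fragments representing the flow of a series of operations and communications passing through the system.
  - Request granularity (application layer)
  - Flow or packet granularity (infrastructure layer)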

Slide 17

Slide 17 text

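Background: major types of telemetry data (traces)
Path-oriented:
・Traces: structured data representing the flow of a series of operations and communications passing through the system.
・Traces of network communication in particular: request granularity at the upper layer (out of scope for this research), flow granularity at the lower layer.
Call graph example: nodes A, B, C, D with endpoints 10.0.10.1:80, 10.0.20.1:3306, 10.0.30.1:9200, 10.0.40.1:9092 (listen ports 80, 3306, 9200, 9092)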

Slide 18

Slide 18 text

Background: the telemetry system
This research classifies it into three layers:
・Instrumentation layer
・Storage layer
・Mining layer

Slide 19

Slide 19 text

Background: telemetry system, instrumentation
・Sensors are embedded in the application system.
・Data is sent to central storage.

Slide 20

Slide 20 text

Background: telemetry system, storage
・The data that was sent is ingested into a DB system.
・The mining layer queries the DB for the data it needs.

Slide 21

Slide 21 text

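Background: telemetry system, mining
・Manual mining: provides visualized views and alerts that signal anomalies.
・Automated mining: reduces the operators' burden through automated data analyzers based on machine learning (the target of Contribution 3).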

Slide 22

Slide 22 text

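Background: growth of telemetry workloads
Causes:
・Growth of application workloads and of the number of components
・Finer-grained telemetry data for a more precise understanding of the system
Instrumentation: more measurement and transmission work
・Higher resource consumption for transferring and aggregating measurements
・Higher application processing latency
Storage: more data to ingest
・Higher resource consumption for writes
・Larger disk footprint
・Higher resource consumption and latency for reads
Mining: more learning work
・Lower accuracy of model output
・Longer execution time and higher resource consumption for learning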

Slide 23

Slide 23 text

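Background: operational complexity introduced by the telemetry system
・Service providers must operate the telemetry system in addition to the application.
・Keeping operational complexity low is important for practical adoption.
Ease of deployment:
・Instrumentation: manual instrumentation work
・Storage: effort to build, configure, tune, and back up the DB system
・Mining: manual labeling of datasets; parameter tuning of models
Ease of maintenance:
・Instrumentation: keeping up with code changes in the instrumented software
・Storage: scale-out work, version upgrades, re-tuning
・Mining: handling accuracy loss caused by changes in the data distribution (retraining, re-tuning, and so on)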

Slide 24

Slide 24 text

Research objective

Slide 25

Slide 25 text

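Research objective
Diagram labels: telemetry system (instrumentation, storage, mining), operator, workload, resource consumption, processing latency
Propose techniques that scale each layer efficiently as telemetry workloads grow, under the condition that the increase in operational complexity is kept small.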

Slide 26

Slide 26 text

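Research objective: bird's-eye view of this research
Diagram labels: telemetry system (instrumentation, storage, mining), operator
・Instrumentation (Chapter 3, path-oriented): network connection rates grow, so the measurement load grows too. Proposal: an instrumentation method based on efficient aggregation inside the OS kernel.
・Storage (Chapter 4, time-oriented): the number of metrics grows, so the ingestion load grows. Proposal: tiering of memory-based and disk-based DBs, with migration between tiers.
・Mining (Chapter 5, time-oriented): the number of metrics grows, so execution time increases and accuracy drops. Proposal: a preprocessing method that automatically removes metrics unrelated to the failure.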

Slide 27

Slide 27 text

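Research objective: bird's-eye view of this research (highlight: growth of telemetry workloads)
Diagram labels: telemetry system (instrumentation, storage, mining), operator
・Instrumentation (Chapter 3, path-oriented): network connection rates grow, so the measurement load grows too. Proposal: an instrumentation method based on efficient aggregation inside the OS kernel.
・Storage (Chapter 4, time-oriented): the number of metrics grows, so the ingestion load grows. Proposal: tiering of memory-based and disk-based DBs, with migration between tiers.
・Mining (Chapter 5, time-oriented): the number of metrics grows, so execution time increases and accuracy drops. Proposal: a preprocessing method that automatically removes metrics unrelated to the failure.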

Slide 28

Slide 28 text

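Research objective: bird's-eye view of this research (highlight: proposed scaling techniques)
Diagram labels: telemetry system (instrumentation, storage, mining), operator
・Instrumentation (Chapter 3, path-oriented): network connection rates grow, so the measurement load grows too. Proposal: an instrumentation method based on efficient aggregation inside the OS kernel.
・Storage (Chapter 4, time-oriented): the number of metrics grows, so the ingestion load grows. Proposal: tiering of memory-based and disk-based DBs, with migration between tiers.
・Mining (Chapter 5, time-oriented): the number of metrics grows, so execution time increases and accuracy drops. Proposal: a preprocessing method that automatically removes metrics unrelated to the failure.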

Slide 29

Slide 29 text

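Research objective: bird's-eye view (highlight: instrumentation, Chapter 3)
・Instrumentation (Chapter 3, path-oriented): network connection rates grow, so the measurement load grows too. Proposal: an instrumentation method based on efficient aggregation inside the OS kernel.
・Storage (Chapter 4, time-oriented): the number of metrics grows, so the ingestion load grows. Proposal: tiering of memory-based and disk-based DBs, with migration between tiers.
・Mining (Chapter 5, time-oriented): the number of metrics grows, so execution time increases and accuracy drops. Proposal: a preprocessing method that automatically removes metrics unrelated to the failure.
Operational complexity: no modification of application code is required for instrumentation.
Publication: Y. Tsubouchi, M. Furukawa, R. Matsumoto, Low Overhead TCP/UDP Socket-based Tracing for Discovering Network Services Dependencies, Journal of Information Processing (JIP), Vol.30, pp.260-268, Mar 2022.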

Slide 30

Slide 30 text

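Research objective: bird's-eye view (highlight: storage, Chapter 4)
・Instrumentation (Chapter 3, path-oriented): network connection rates grow, so the measurement load grows too. Proposal: an instrumentation method based on efficient aggregation inside the OS kernel.
・Storage (Chapter 4, time-oriented): the number of metrics grows, so the ingestion load grows. Proposal: tiering of memory-based and disk-based DBs, with migration between tiers.
・Mining (Chapter 5, time-oriented): the number of metrics grows, so execution time increases and accuracy drops. Proposal: a preprocessing method that automatically removes metrics unrelated to the failure.
Operational complexity: the problem is solved within the scope of general-purpose DB systems, whose knowledge and implementations are highly reusable.
Publication: Y. Tsubouchi, A. Wakisaka, K. Hamada, M. Matsuki, T. Kobayashi, H. Abe, R. Matsumoto, HeteroTSDB: A high-performance time series database based on automatic tiering across heterogeneous distributed KVSs (in Japanese), IPSJ Journal, Vol.62, No.3, pp.818-828, March 2021.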

Slide 31

Slide 31 text

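Research objective: bird's-eye view (highlight: mining, Chapter 5)
・Instrumentation (Chapter 3, path-oriented): network connection rates grow, so the measurement load grows too. Proposal: a tracing instrumentation method based on efficient aggregation inside the OS kernel.
・Storage (Chapter 4, time-oriented): the number of metrics grows, so the ingestion load grows. Proposal: tiering of memory-based and disk-based DBs, with migration between tiers.
・Mining (Chapter 5, time-oriented): the number of metrics grows, so execution time increases and accuracy drops. Proposal: a preprocessing method that automatically removes metrics unrelated to the failure.
Operational complexity: the problem is solved within an unsupervised learning framework that needs neither labeling nor model training. The design is robust to parameter changes, which reduces the tuning burden.
Publication: Y. Tsubouchi and H. Tsuruta, MetricSifter: Feature Reduction of Multivariate Time Series Data for Efficient Fault Localization in Cloud Applications, IEEE Access, Vol. 12, pp. 37398-37417, March 2024.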

Slide 32

Slide 32 text

2. Proposal of an in-kernel instrumentation method (Contribution 1): instrumentation layer (Chapter 3)

Slide 33

Slide 33 text

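Bird's-eye view (highlight: instrumentation, Chapter 3)
・Instrumentation (Chapter 3, path-oriented): network connection rates grow, so the measurement load grows too. Proposal: an instrumentation method based on efficient aggregation inside the OS kernel.
・Storage (Chapter 4, time-oriented): the number of metrics grows, so the ingestion load grows. Proposal: tiering of memory-based and disk-based DBs, with migration between tiers.
・Mining (Chapter 5, time-oriented): the number of metrics grows, so execution time increases and accuracy drops. Proposal: a preprocessing method that automatically removes metrics unrelated to the failure.
Publication: Y. Tsubouchi, M. Furukawa, R. Matsumoto, Low Overhead TCP/UDP Socket-based Tracing for Discovering Network Services Dependencies, Journal of Information Processing (JIP), Vol.30, pp.260-268, Mar 2022.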

Slide 34

Slide 34 text

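Background: network call graph
Diagram labels: Cloud, Load Balancers, Web app servers, Database Clusters, Message queues
Operators want to know the call relationships among components:
- to know the blast radius of a change;
- to know per-link metrics.
  L7: number of requests, number of errors, response time, …
  L4: sent/received bytes/s, RTT, …
Drawing these graphs used to be manual work, but it is increasingly automated based on path-oriented data.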

Slide 35

Slide 35 text

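Existing methods: instrumentation approaches for path-oriented data
A measurement point is placed somewhere on the network communication path.
Diagram labels: App, Comm Library, Proxy (User); Network Stack (Kernel); NIC; Switch
・Application-intrusive: instrument the application code. Advantage: application context can be injected. Drawback: adding code takes considerable effort.
・Application-non-intrusive: instrument somewhere other than the application. Advantages and drawbacks are the reverse of the application-intrusive approach.
This research focuses on instrumentation at the upper layer of the kernel (sockets).
・Versus a proxy: no relay overhead.
・Versus a switch: the measurement load can be distributed across end hosts.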

Slide 36

Slide 36 text

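Positioning of this research: instrumentation methods at the socket layer
Diagram labels (per method): Service, Agent, User, Kernel, measurement point, Queue, Flow1 to Flow4, Bundle 1, Bundle 2
・Streaming method: ✗ the number of measurements transferred to user space grows with the number of messages.
・Flow aggregation method: ✔ only measurements aggregated per flow are stored, which reduces the amount of transferred data. ✗ When short-lived flows increase, the amount of transferred data increases too.
・Flow bundling method (proposed): flows are bundled whenever they share the same destination. ✔ The amount of transferred data stays low even when there are many short-lived flows.
✔ Measurement overhead is small.
([93,94]) ([27,95])
※ Flow = a unit of communication whose pair of endpoint addresses and ports is identical.
※ Arrows represent the flow of data.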

Slide 37

Slide 37 text

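Overview of Contribution 1
1. We propose an in-kernel flow bundling method that reduces measurement overhead even in environments with many short-lived flows.
2. We verified that the measurement overhead (CPU load) stays sufficiently small even as the number of flows grows.
An environment that is unfavorable to existing methods (diagram: Web App Servers, Connections, DB Server):
・In PHP applications, persistent connections to the DB are sometimes discouraged in order to prevent resource abuse [98].
・Flows are therefore not kept alive, and short-lived flows multiply.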

Slide 38

Slide 38 text

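Proposed method: the concept of flow bundling
Diagram: services on Host 1 connect from ephemeral ports (53421, 32346, 48901) to a service listening on port 80 on Host 2 (Flow 1, Flow 2, …, Flow N).
These flows are treated as a single bundled flow between the two services on port 80.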

Slide 39

Slide 39 text

Proposed method: bundling distinct flows inside the kernel
Diagram labels: Service, Agent (User); measurement point, Bundle 1, Bundle 2 (Kernel); NIC
Flow 1: "src_ip": "192.168.1.101", "src_port": 53421, "dst_ip": "192.168.1.200", "dst_port": 80, "recv_bytes": 2000, "sent_bytes": 500,
Flow 2: "src_ip": "192.168.1.101", "src_port": 61390, "dst_ip": "192.168.1.200", "dst_port": 80, "recv_bytes": 1000, "sent_bytes": 100,
Bundle 1: "src_ip": "192.168.1.101", "dst_ip": "192.168.1.200", "dst_port": 80, "recv_bytes": 3000, "sent_bytes": 600,
・The ephemeral port is dropped and the flows are merged.
・Numeric fields are combined statistically (summed in this example).
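The merge shown on this slide can be sketched in user-space Python. This is only an illustration of the bundling idea under stated assumptions, not the in-kernel implementation: the function name `bundle_flows`, the dictionary layout, and the use of `sent_bytes` as the counter name are all chosen here for the example.

```python
from collections import defaultdict


def bundle_flows(flows):
    """Merge flows that share (src_ip, dst_ip, dst_port), summing byte counters.

    Hypothetical helper for illustration; field names follow the slide's example.
    """
    bundles = defaultdict(lambda: {"recv_bytes": 0, "sent_bytes": 0})
    for flow in flows:
        # The ephemeral source port is deliberately left out of the key,
        # so short-lived connections to the same listener collapse together.
        key = (flow["src_ip"], flow["dst_ip"], flow["dst_port"])
        bundles[key]["recv_bytes"] += flow["recv_bytes"]
        bundles[key]["sent_bytes"] += flow["sent_bytes"]
    return [
        {"src_ip": src, "dst_ip": dst, "dst_port": port, **counters}
        for (src, dst, port), counters in bundles.items()
    ]


flows = [
    {"src_ip": "192.168.1.101", "src_port": 53421, "dst_ip": "192.168.1.200",
     "dst_port": 80, "recv_bytes": 2000, "sent_bytes": 500},
    {"src_ip": "192.168.1.101", "src_port": 61390, "dst_ip": "192.168.1.200",
     "dst_port": 80, "recv_bytes": 1000, "sent_bytes": 100},
]
bundled = bundle_flows(flows)
```

With the two flows from the slide, `bundled` holds a single entry whose counters are 3000 received and 600 sent bytes. Summation is just one possible statistic; the slide notes it only as the example used.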

Slide 40

Slide 40 text

Proposed method: implementation overview
Diagram labels: Service, Agent (User); System Call, Socket Layer, Hash map (Kernel); NIC
・Measurement programs 1 to 4 are attached to kernel functions with Linux kprobes: tcp_v4_connect(), inet_csk_accept(), tcp_sendmsg(), tcp_cleanup_rbuf() (UDP omitted).
・The kernel is extended using Linux extended Berkeley Packet Filter (eBPF).
・The programs update a map structure. Keys: {src_addr, dst_addr, listen_port, proto, pid}. Values: {counts, recv_bytes, send_bytes, …}.
・The agent periodically fetches multiple items through the batch API.

Slide 41

Slide 41 text

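Evaluation setup
Diagram: Client VM and Server VM, each running an Agent
Benchmark:
・An echo client and server generate TCP or UDP traffic.
・Each trial lasts 30 seconds; the batch fetch interval is 1 second.
Baselines: existing instrumentation methods targeting the kernel socket layer
・Streaming method
・In-kernel aggregation method
Evaluation items:
1. Comparison of CPU load as the number of short-lived flows grows
2. Comparison of CPU load in a 1-to-N communication environment
3. RTT overhead on the application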

Slide 42

Slide 42 text

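Evaluation 1. Comparison of CPU load as the number of short-lived TCP flows grows
・Proposed method: CPU utilization stayed at or below 2.2%.
・Streaming method: CPU utilization rose to as much as 21.3%.
・In-kernel aggregation method: CPU utilization rose to as much as 11.5%.
Similar results were obtained in the experiment that increased the UDP message rate.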

Slide 43

Slide 43 text

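Evaluation 2. CPU load as the number of communication destinations increases
As destinations with different listening ports increase, the bundling rate drops.
↪ Should the CPU load of the proposed method then increase?
Bundling rate: R = 1 − B/T, where B is the number of bundled flows and T is the total number of flows. B is determined by the number of destinations.
Figure labels: R=0.90, R=0.94, R=0.98; fixed; T = 10k; T = 100k
・CPU utilization stayed at or below 2% as the number of services (destinations) increased.
・If B is increased until R=0, the advantage over existing methods disappears.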
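As a small worked example of the bundling rate R = 1 − B/T, the sketch below reads B as the number of bundles that remain after merging, which is an interpretation of the slide's wording. The function name `bundling_rate` is hypothetical.

```python
def bundling_rate(num_bundles, total_flows):
    """Return R = 1 - B/T for B bundles produced from T flows."""
    if total_flows <= 0:
        raise ValueError("total_flows must be positive")
    return 1.0 - num_bundles / total_flows


# Two flows merged into one bundle, as in the two-flow merge example.
two_into_one = bundling_rate(num_bundles=1, total_flows=2)

# No two flows share a destination, so nothing is bundled.
nothing_bundled = bundling_rate(num_bundles=10, total_flows=10)
```

Here `two_into_one` is 0.5 and `nothing_bundled` is 0.0. The second case is the situation in which bundling no longer reduces the number of items transferred to user space.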

Slide 44

Slide 44 text

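Evaluation 3. Comparison of the latency overhead added by measurement processing
Figure panels: short-lived TCP connections; UDP
・Against an RTT of 300 μs, the overhead of the proposed method is at most 5.8 μs.
・Compared with no instrumentation, the added overhead is at most 2%.
・The streaming method showed the smallest RTT.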

Slide 45

Slide 45 text

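Part 2: summary of Contribution 1 (Chapter 3, path-oriented)
・Problem: network connection rates grow, so the measurement load grows too.
・Proposal: an instrumentation method based on efficient aggregation inside the OS kernel.
・Evaluation: as the number of short-lived flows increased, the proposed method kept CPU utilization at or below 2.2%. RTT overhead increased by at most 2% relative to the uninstrumented state.
・Use case: continuous, automatic construction of the network call graph.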

Slide 46

Slide 46 text

3. Proposal of a storage architecture (Contribution 2): storage layer (Chapter 4)

Slide 47

Slide 47 text

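Research objective: bird's-eye view (highlight: storage, Chapter 4)
・Instrumentation (Chapter 3, path-oriented): network connection rates grow, so the measurement load grows too. Proposal: a tracing instrumentation method based on efficient aggregation inside the OS kernel.
・Storage (Chapter 4, time-oriented): the number of metrics grows, so the ingestion load grows. Proposal: tiering of memory-based and disk-based DBs, with migration between tiers.
・Mining (Chapter 5, time-oriented): the number of metrics grows, so execution time increases and accuracy drops. Proposal: a preprocessing method that automatically removes metrics unrelated to the failure.
Publication: Y. Tsubouchi, A. Wakisaka, K. Hamada, M. Matsuki, T. Kobayashi, H. Abe, R. Matsumoto, HeteroTSDB: A high-performance time series database based on automatic tiering across heterogeneous distributed KVSs (in Japanese), IPSJ Journal, Vol.62, No.3, pp.818-828, March 2021.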

Slide 48

Slide 48 text

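Background: the workload of metrics storage
The ingestion workload for metrics is proportional to two dimensions:
1. Resolution along the time axis (typically in the range of 1 to 60 seconds)
2. The number of metrics, for example:
   cpu_seconds{instance=host1,…}
   memory_total_bytes{instance=host1,…}
   http_requests_count{instance=host1,…}
   http_requests_count{instance=host99,…}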

Slide 49

Slide 49 text

Background: scalability requirements for metrics storage
Ingestion throughput:
・Slack: 4B time series / day, 12M datapoints / sec
・Meta: 700M datapoints / min
・LYCorp: 12.5M datapoints / min (converted)
Solutions: ingestion across horizontally partitioned nodes; efficient writes to in-memory data structures.
Storage capacity:
・Slack: 12 TB / day
・ByteDance: 10 TB / day
・LYCorp: 2.7 TB / day
・Mackerel: 460 days
Solutions: data compression, and long-term retention on media with a low storage cost (SSD/HDD).
[19] [32] [19] [35] [66] [108] [108]

Slide 50

Slide 50 text

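Related work: classification of existing approaches
Time series DB management system approach (TSDBMS). Diagram: Client → DBMS (Transaction).
・A DBMS optimized for time series data processing (Prometheus, Gorilla, InfluxDB, and others).
・Compression: encodings that exploit the regular spacing of timestamps and the temporal locality of values.
・Structure: optimized for time series on the basis of the LSM tree used in disk-based KVSs; files are managed per fixed time window.
Time series data-oriented application approach (TSDA). Diagram: Client → App → DBMS (Transaction).
・Built on top of a KVS, which is a general-purpose DB system (OpenTSDB, KairosDB).
・KVS: a DBMS that stores, retrieves, and manages data as a collection of key-value pairs.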

Slide 51

Slide 51 text

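Related work: classification of existing approaches
Time series DB management system approach (TSDBMS). Diagram: Client → DBMS (Transaction).
・A DBMS optimized for time series data processing (Prometheus, Gorilla, InfluxDB, and others).
・Compression: encodings that exploit the regular spacing of timestamps and the temporal locality of values.
・Structure: optimized for time series on the basis of the LSM tree used in disk-based KVSs; files are managed per fixed time window.
Time series data-oriented application approach (TSDA). Diagram: Client → App → DBMS.
・Built on top of a KVS, which is a general-purpose DB system (OpenTSDB, KairosDB).
・KVS: a DBMS that stores, retrieves, and manages data as a collection of key-value pairs.
With operational complexity in mind, this research focuses on the TSDA approach:
・KVSs are widely used. KVS services are widely offered as "DB as a Service" to automate DB operations.
・The TSDA approach is loosely coupled, so it gives users a choice of DBMS implementations.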

Slide 52

Slide 52 text

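Related work: ingestion efficiency of KVSs
As the number of metrics grows, the number of KVS keys grows.
↳ The efficiency of index lookups when appending data becomes the issue.
Memory-based KVS:
・Memory offers efficient random access, so a hash table is used. Write cost: O(k).
・No data is kept on disk (except for the commit log).
Disk-based KVS:
・Writes go to a sorted in-memory structure such as a balanced tree or skip list. Write cost: O(log n).
・The structure is flushed to files on disk. Because the data is sorted, disk access is efficient.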

Slide 53

Slide 53 text

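Related work: ingestion efficiency of KVSs
As the number of metrics grows, the number of KVS keys grows.
↳ The efficiency of index lookups when appending data becomes the issue.
Memory-based KVS:
・Memory offers efficient random access, so a hash table is used. Write cost: O(k).
・No data is kept on disk (except for the commit log).
・✘ Memory is expensive per unit of capacity, so it is unsuitable for long-term retention.
Disk-based KVS:
・Writes go to a sorted in-memory structure such as a balanced tree or skip list. Write cost: O(log n).
・The structure is flushed to disk. Because the data is sorted, disk access is efficient.
・✘ Write efficiency degrades when the number of keys is large.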

Slide 54

Slide 54 text

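Summary of Contribution 2: achieving both ingestion efficiency and long-term retention
Comparison by operational complexity, ingestion efficiency, and storage capacity:
・TSDBMS: low reusability of knowledge and implementation; optimized for storing time series data; time series compression and similar techniques.
・TSDA, disk-based: high reusability of knowledge and implementation; a structure designed for disk access efficiency; stored on SSD/HDD.
・TSDA, memory-based: high reusability of knowledge and implementation; optimized for memory, which offers efficient random access; stored in memory.
・TSDA, proposed method: high reusability of knowledge and implementation; optimized for memory, which offers efficient random access; only old data is stored on SSD/HDD.
Contributions:
・Within the TSDA approach, which has low operational complexity, we designed an architecture that takes the best of both memory-based and disk-based characteristics.
・It achieved 3.98 times the ingestion performance of the disk-based approach.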

Slide 55

Slide 55 text

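Proposed method: HeteroTSDB
Diagram: Client → App → memory-based KVS → (Flusher, data migration) → disk-based KVS
・Memory-based KVS: a memory buffer holding data with recent timestamps. Fast ingestion based on a hash table.
・Disk-based KVS: disk storage holding data with older timestamps. Storing on SSD/HDD lowers the cost of long-term retention.
The two goals are achieved together.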

Slide 56

Slide 56 text

Proposed method: tiering the memory-based KVS and the disk-based KVS
・Memory-based KVS: hash table, O(k). Arriving data points are looked up and inserted at M (ingestions/s).
・Disk-based KVS: balanced tree or skip list, O(log n). Writing d data points per batch reduces the number of ingestions to M / d (ingestions/s).
Example series in both tiers: cpu_seconds{…}, memory_total_bytes{…}, http_requests_count{…}

Slide 57

Slide 57 text

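Proposed method: timer-based migration
Diagram: series cpu_seconds{…}, memory_total_bytes{…}, http_requests_count{…} in the memory-based KVS with remaining TTLs of 3511, 934, and 298 (for example, an initial TTL of 3600 seconds), and the same series in the disk-based KVS.
Moving data with a periodic batch job would concentrate the ingestion load on the disk-based KVS. Instead:
・A TTL (Time To Live) is set per key, and the key's data is moved when the TTL reaches 0.
・Jitter is added when the TTL is set, which spreads out the timing of the moves.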
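The tiering and TTL-with-jitter behaviour can be sketched as a toy model. Plain dictionaries stand in for the memory-based and disk-based KVSs, and the class name `TieredStore`, its method names, the `jitter_seconds` parameter, and the second sample value are assumptions made for this illustration rather than details of HeteroTSDB.

```python
import random


class TieredStore:
    """Toy two-tier store: dicts stand in for the memory and disk KVSs."""

    def __init__(self, ttl_seconds=3600, jitter_seconds=300, rng=None):
        self.ttl_seconds = ttl_seconds
        self.jitter_seconds = jitter_seconds
        self.rng = rng or random.Random()
        self.memory = {}      # series key -> list of (timestamp, value)
        self.expires_at = {}  # series key -> time at which it is migrated
        self.disk = {}        # series key -> list of (timestamp, value)

    def ingest(self, key, timestamp, value, now):
        if key not in self.memory:
            self.memory[key] = []
            # Jitter spreads expirations so migrations do not fire all at once.
            jitter = self.rng.uniform(0, self.jitter_seconds)
            self.expires_at[key] = now + self.ttl_seconds + jitter
        self.memory[key].append((timestamp, value))

    def migrate_expired(self, now):
        """Move each expired series to the disk tier as one batched write."""
        expired = [k for k, t in self.expires_at.items() if t <= now]
        for key in expired:
            points = self.memory.pop(key)
            del self.expires_at[key]
            self.disk.setdefault(key, []).extend(points)
        return expired


store = TieredStore(ttl_seconds=3600, jitter_seconds=300, rng=random.Random(0))
series = "cpu_seconds{instance=host1}"
store.ingest(series, 1709298600, 29851.26, now=0)
store.ingest(series, 1709298660, 29853.10, now=60)  # illustrative value
early = store.migrate_expired(now=3599)  # TTL has not elapsed yet
late = store.migrate_expired(now=3900)   # TTL plus the maximum jitter elapsed
```

In this run `early` is empty and `late` contains the series key, at which point both buffered points have been written to the disk tier in a single step. A real deployment would rely on the expiry mechanism of the underlying KVS rather than polling as done here.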

Slide 58

Slide 58 text

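Evaluation setup
Diagram: load generation client (VM) → DB servers (VM)
Benchmark:
・An existing load generation tool [113] is used to reproduce the load.
・Each trial lasts 30 minutes; the TTL of the proposed method is 10 minutes.
Baseline:
・KairosDB, which follows the TSDA approach, is the comparison target.
・KairosDB uses Cassandra, a disk-based KVS.
Proposed method: memory KVS: Redis; disk KVS: Cassandra
Evaluation items:
1. Comparison of ingestion efficiency
2. Comparison of ingestion efficiency as the number of metrics increases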

Slide 59

Slide 59 text

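Evaluation 1. Comparison of ingestion efficiency
Figure: ingestion throughput versus number of hosts (1 to 8). Blue: KairosDB. Orange: proposed method. The number of metrics is fixed at 1M.
・The proposed method (HeteroTSDB) reaches 3.98 times the baseline: 420k datapoints/s.
・Translated to Slack's workload of 12M datapoints/s, the proposed method would need 229 hosts and KairosDB would need 915 hosts.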

Slide 60

Slide 60 text

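Evaluation 2. Comparison of ingestion efficiency as the number of metrics increases
Figure: ingestion throughput versus number of metrics (100 to 1,000,000). Blue: KairosDB. Orange: proposed method. Ratios marked in the figure: 2.32x and 3.58x. The overall send rate of data points is fixed.
・Scalability with respect to the number of metrics is higher than the baseline.
・This experiment did not clearly establish that index lookups are the bottleneck. Additional detailed profiling is needed in future work.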

Slide 61

Slide 61 text

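Part 3: summary of Contribution 2 (Chapter 4)
・Instrumentation (Chapter 3, path-oriented): network connection rates grow, so the measurement load grows too. Proposal: a tracing instrumentation method based on efficient aggregation inside the OS kernel.
・Storage (Chapter 4, time-oriented): the number of metrics grows, so the ingestion load grows. Proposal: tiering of memory-based and disk-based DBs, with migration between tiers.
・Mining (Chapter 5, time-oriented): the number of metrics grows, so execution time increases and accuracy drops. Proposal: a preprocessing method that automatically removes metrics unrelated to the failure.
Goal: achieve both ingestion efficiency and long-term data retention of one year or more. With operational complexity in mind, the proposed method is realized on top of existing KVSs.
Evaluation 1: when ingesting 1 million metrics, 3.98 times the performance of the baseline.
Evaluation 2: better scalability than the baseline over the range of 100 to 1 million metrics.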

Slide 62

Slide 62 text

4. Proposal of a preprocessing method for automated fault localization (Contribution 3): mining layer (Chapter 5)

Slide 63

Slide 63 text

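Bird's-eye view (highlight: mining, Chapter 5)
・Instrumentation (Chapter 3, path-oriented): network connection rates grow, so the measurement load grows too. Proposal: a tracing instrumentation method based on efficient aggregation inside the OS kernel.
・Storage (Chapter 4, time-oriented): the number of metrics grows, so the ingestion load grows. Proposal: tiering of memory-based and disk-based DBs, with migration between tiers.
・Mining (Chapter 5, time-oriented): the number of metrics grows, so execution time increases and accuracy drops. Proposal: a preprocessing method that automatically removes metrics unrelated to the failure.
Publication: Y. Tsubouchi and H. Tsuruta, MetricSifter: Feature Reduction of Multivariate Time Series Data for Efficient Fault Localization in Cloud Applications, IEEE Access, Vol. 12, pp. 37398-37417, March 2024.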

Slide 64

Slide 64 text

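Background: automating fault localization with machine learning [91,93,120-132]
Diagram: failure detection → 1. trigger → automated fault localization; storage → 2. input (metrics) → automated fault localization → 3. output → operator
Output: a ranking of the metrics that indicate the cause
1. memory_total_bytes{instance=host4,…}
2. disk_write_io{instance=host4,…}
3. net_transmit_bytes{instance=host1,…}
4. …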

Slide 65

Slide 65 text

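Background: automating fault localization with machine learning [91,93,120-132]
Diagram: failure detection → 1. trigger → automated fault localization (machine learning); storage → 2. input (metrics) → automated fault localization → 3. output (ranking: 1. … 2. … 3. …) → operator
・No dataset contains a large number of (metrics, root cause) pairs.
・Unsupervised learning is therefore mainly used.
・An anomaly score is computed for each metric.
・Anomaly propagation between metrics is captured.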

Slide 66

Slide 66 text

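Background: the problem of performance degradation in fault localization [91,93,120-132]
Diagram: failure detection → 1. trigger → automated fault localization (machine learning); storage → 2. input (a growing number of metrics) → automated fault localization → 3. output (ranking: 1. … 2. … 3. …) → operator
As the number of metrics grows, accuracy and execution time degrade. [23,24]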

Slide 67

Slide 67 text

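Background: the problem of performance degradation in fault localization [91,93,120-132]
Diagram: failure detection → 1. trigger → automated fault localization (machine learning); storage → 2. input (a growing number of metrics) → feature reduction → automated fault localization → 3. output (ranking: 1. … 2. … 3. …) → operator
・As the number of metrics grows, accuracy and execution time degrade. [23,24]
・Feature reduction: remove the metrics that act as noise. [23,84]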

Slide 68

Slide 68 text

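Background: problem definition of feature reduction (ours)
Fig. 5.2: Three types of metrics on anomaly propagation for a failure.
A model of how anomalies propagate at metric granularity after a fault occurs, starting from the root cause:
・M_A: metrics directly affected
・M_B: metrics indirectly affected
・M_C: metrics not affected
Problem: once a failure is detected, identify M_A ∪ M_B as quickly as possible.
The input is a fixed time range relative to the moment right after failure detection (typically 30 to 60 minutes).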

Slide 69

Slide 69 text

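Background: existing feature reduction and its issues
Figure labels: failure period; time series that should be removed
・Anomaly-based reduction: removes time series without anomalies. It may detect anomalies outside the failure period (false positives). [23,120,123.127]
・Redundancy-based reduction: removes time series with high correlation or shape similarity. Root-cause metrics (M_A) tend to resemble each other, so they may be removed by mistake (false negatives). [84,125]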

Slide 70

Slide 70 text

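Background: existing feature reduction and its issues
Figure labels: failure period; time series that should be removed
・Anomaly-based reduction: removes time series without anomalies. It may detect anomalies outside the failure period (false positives).
・Redundancy-based reduction: removes duplicates among time series with high correlation or shape similarity. Root-cause metrics (M_A) tend to resemble each other, so they may be removed by mistake (false negatives).
Local: both handle only the anomalousness or redundancy that appears in a subset of the metrics.
Global: we want to capture relevance to the "failure" of the system as a whole.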

Slide 71

Slide 71 text

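Background: observation and assumption
Figure: excerpt from Fig. 5.1: Change points in root fault metric. Label: time of fault occurrence.
・Observation: change points caused by a fault appear close to each other in time.
・Assumption: the time range in which change points are most concentrated is the failure period.
Idea: capture the global failure from local features.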

Slide 72

Slide 72 text

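Overview of Contribution 3
・This research proposed a feature reduction method that captures the global failure.
・The proposed method achieved the best accuracy as a reducer and improved end-to-end accuracy and execution efficiency.
・This is the first study to quantitatively compare heterogeneous feature reduction methods.
Comparison table (columns: method, category, learning type, captured feature, globality)
Methods: FluxInfer-AD, BIRCH, K-S test, NSigma, PairCorr, k-Shape, HDBS+SBD, MetricSifter
Categories: anomaly-based, redundancy-based, anomaly-based
Learning types: semi-supervised (the normal period must be specified), unsupervised, unsupervised, unsupervised
Captured features: change points; Euclidean distance between normal and anomalous periods; shape similarity; distribution changes and outliers; Pearson correlation
Globality: ✘ ✘ ✘ ✔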

Slide 73

Slide 73 text

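Proposed method: failure-oriented feature reduction, MetricSifter
1. Detect "change points" per time series, as local events.
2. Identify the "failure time range" as a global event: the largest peak of the distribution of change-point times over time t.
3. If a time series has a change point within the "failure time range", keep it; otherwise remove it.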

Slide 74

Slide 74 text

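Proposed method: how does it work?
Fig. 5.5: An example of feature reduction using the MetricSifter framework.
STEP 1: For each time series, detect candidate change points that originate from the fault.
STEP 2: Split the time axis into segments based on the distribution of change-point times.
STEP 3: Select the segment with the highest density.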

Slide 75

Slide 75 text

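Proposed method, STEP 1: change-point detection on univariate time series
From an existing framework for change-point detection [148], the components suited to this domain are chosen:
1. Cost function (the kind of change to detect): L2 model (mean shift).
2. Search method (the algorithm that searches for change points): Pelt, which finds the exact solution and can be accelerated by pruning under certain conditions.
3. Penalty term (constrains the number of detected change points): determined heuristically based on BIC, with an additional ad hoc coefficient ω of our own.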
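To make the roles of the cost function and the penalty concrete, here is a deliberately simplified sketch. It searches for at most one change point by brute force, so it is not Pelt, and the penalty formula `omega * variance * log(n)` is an assumed BIC-style form chosen for illustration; the slide does not give the exact formula or how ω enters it. All function names are hypothetical.

```python
import math


def l2_cost(values):
    """Sum of squared deviations from the segment mean (mean-shift cost)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def bic_like_penalty(series, omega=1.0):
    """Assumed BIC-style penalty: omega * variance * log(n)."""
    n = len(series)
    variance = l2_cost(series) / n
    return omega * variance * math.log(n)


def best_single_change_point(series, penalty):
    """Return the split index whose penalized cost beats no split, else None."""
    best_index, best_cost = None, l2_cost(series)
    for i in range(1, len(series)):
        cost = l2_cost(series[:i]) + l2_cost(series[i:]) + penalty
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


shifted = [1.0] * 10 + [5.0] * 10  # the mean jumps at index 10
flat = [2.0] * 20                  # no change at all
shift_index = best_single_change_point(shifted, bic_like_penalty(shifted))
flat_index = best_single_change_point(flat, bic_like_penalty(flat))
```

For the step-shaped series the split at index 10 is selected, while the flat series yields `None`. A larger `omega` makes the detector more conservative, which is the kind of trade-off a penalty coefficient controls.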

Slide 76

Slide 76 text

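Proposed method, STEP 2/3: density estimation of change points and segmentation of the distribution
Fig. 5.6: An example of segmentation.
1. Density estimation: kernel density estimation (KDE) is used to produce a discretized density of the distribution.
2. Segmentation: boundaries are drawn at local minima (the figure is split into 10 segments).
3. Select the segment with the highest density.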
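The density estimation, segmentation, and selection steps can be sketched with the standard library only. This is a simplified illustration rather than the MetricSifter implementation: the Gaussian kernel, the fixed bandwidth, the integer grid, and the choice to rank segments by the number of change points they contain are all assumptions, as are the function names and the sample metrics.

```python
import math


def gaussian_kde(points, grid, bandwidth):
    """Evaluate an unnormalized Gaussian kernel density on the grid."""
    return [
        sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in points)
        for g in grid
    ]


def split_at_local_minima(grid, density):
    """Cut the grid into (start, end) segments at local density minima."""
    boundaries = [
        grid[i]
        for i in range(1, len(grid) - 1)
        if density[i] < density[i - 1] and density[i] <= density[i + 1]
    ]
    edges = [grid[0]] + boundaries + [grid[-1]]
    return list(zip(edges[:-1], edges[1:]))


def densest_segment(change_points, segments):
    """Pick the segment containing the most change points (assumed criterion)."""
    return max(
        segments,
        key=lambda seg: sum(seg[0] <= c <= seg[1] for c in change_points),
    )


def sift(change_points_by_metric, grid, bandwidth=2.0):
    """Keep metrics with a change point inside the densest segment."""
    all_points = [c for cps in change_points_by_metric.values() for c in cps]
    density = gaussian_kde(all_points, grid, bandwidth)
    segments = split_at_local_minima(grid, density)
    start, end = densest_segment(all_points, segments)
    return sorted(
        metric
        for metric, cps in change_points_by_metric.items()
        if any(start <= c <= end for c in cps)
    )


# Hypothetical change-point times per metric, on a 0..100 time axis.
change_points = {"cpu": [50], "mem": [51], "disk": [52], "net": [10], "idle": []}
kept = sift(change_points, grid=list(range(0, 101)))
```

In this example the three change points around t = 50 form the densest segment, so `kept` is `["cpu", "disk", "mem"]`. The metric with an isolated early change point and the metric with no change point are both removed.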

Slide 77

Slide 77 text

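Evaluation setup
Datasets (132 datasets in total):
・Synthetic: numerical simulation of failures
・Empirical: failures reproduced by injecting faults into two standard benchmark applications
Baselines:
・The group of anomaly-based reduction methods
・The group of redundancy-based reduction methods
Evaluation items:
1. Correctness of feature reduction on its own
2. End-to-end accuracy and execution time
3. Parameter sensitivity and ablation study
Evaluation metrics:
・Feature reduction: standard metrics for classification problems (Recall / Specificity / Balanced Accuracy)
・End-to-end: the proportion of cases in which the ranking output contains the correct answer (a standard metric)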

Slide 78

Slide 78 text

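Evaluation 1: feature reduction on its own (synthetic)
Figure: accuracy per feature reduction method.
・MetricSifter reached a mean accuracy of 0.981, the best value.
・The redundancy-based reduction group scored low overall, because time series within M_A ∪ M_B that are similar or correlated get removed.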

Slide 79

Slide 79 text

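Evaluation: combinations of feature reduction and fault localization methods
End-to-end pipeline: feature reduction (the focus of this research) → automated fault localization
Feature reduction:
・Proposed method
・The group of anomaly-based reduction methods
・The group of redundancy-based reduction methods
・None
Automated fault localization:
・Random Selection
・CallGraph + PageRank
・PC + PageRank
・PC + HT
・LiNGAM + PageRank
・LiNGAM + HT
・RCD
Experiments were run for every pairing.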

Slide 80

Slide 80 text

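Evaluation 2: end-to-end evaluation (synthetic), excerpt
Figure panels: PC+HT; random selection. A line marks the median accuracy. MetricSifter achieves accuracy close to the ideal method.
Overall result: mean top-5 accuracy over the combinations with all fault localization methods
Method / Accuracy / Note
Ideal / 0.344 / ideal value
MetricSifter / 0.299 / best
NSigma / 0.241 / runner-up
None / 0.175 / without feature reduction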

Slide 81

Slide 81 text

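Evaluation 2: end-to-end evaluation (empirical)
Dataset: SS-small, 64 metrics. Box plot: top-5 accuracy. Line plot: execution time. A line marks the median accuracy. Only a representative subset of combinations is shown.
・MetricSifter has the best top-5 accuracy, and its execution efficiency is higher than that of anomaly-based reduction.
・Redundancy-based reduction (HDBS-SBD/HDBS-R) has the best execution time but the lowest accuracy.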

Slide 82

Slide 82 text

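Evaluation 2: details on empirical data (large scale, >100 metrics)
Datasets: SS-medium (184 metrics), SS-large (1312), TT-small (383), TT-medium (1349)
・Only one fault localization method (RCD) finished within a realistic time (3600 seconds or less).
・The others did not finish within a realistic time, because their fault localization algorithms are not parallelized.
・However, with more than 1000 metrics, accuracy was very low in every case.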

Slide 83

Slide 83 text

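Evaluation 3: parameter sensitivity and ablation study
Blue: full MetricSifter. Brown: MetricSifter with STEP 1 only.
・When the STEP 1 (change-point detection) parameter ω is low, correctness drops. STEP 2/3, however, improves the accuracy.
・When the parameter is appropriate, the accuracy gap is small. We attribute this to change-point detection being too accurate on clean synthetic data.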

Slide 84

Slide 84 text

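Part 4: summary of Contribution 3 (Chapter 5)
・Instrumentation (Chapter 3, path-oriented): network connection rates grow, so the measurement load grows too. Proposal: a tracing instrumentation method based on efficient aggregation inside the OS kernel.
・Storage (Chapter 4, time-oriented): the number of metrics grows, so the ingestion load grows. Proposal: tiering of memory-based and disk-based DBs, with migration between tiers.
・Mining (Chapter 5, time-oriented): the number of metrics grows, so execution time increases and accuracy drops. Proposal: a preprocessing method that automatically removes metrics unrelated to the failure.
Summary:
・The first study to carry out a quantitative comparative evaluation of feature reduction.
・Proposed a method that captures the global failure from a set of local change points.
・Synthetic data: best accuracy as a reducer; end-to-end accuracy improved by 24%.
・Empirical data: improved end-to-end accuracy, execution efficiency, or both.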

Slide 85

Slide 85 text

5. Conclusion (Chapter 6)

Slide 86

Slide 86 text

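Conclusion: telemetry workload scaling
Diagram labels: users, internet, cloud application, operator, telemetry system (instrumentation, storage, mining), growth of workload, resource consumption
・Contribution 1: a low-overhead instrumentation method that bundles network flows inside the kernel. (CPU utilization at or below 2.2%; RTT overhead at most 6 μs)
・Contribution 2: a tiered architecture of heterogeneous KVSs that achieves both ingestion efficiency and long-term retention. (Up to 3.98 times the throughput of the conventional approach)
・Contribution 3: a feature reduction method that exploits the concentration of change points in failure-related metrics. (Accuracy improved by +4.5% on average and mean execution time improved by 45-52% over conventional methods)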

Slide 87

Slide 87 text

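Significance of this research
Academic contribution:
・We formulated our own problem, which we call "telemetry workload scaling", with "keeping operational complexity low" as a constraint.
・We classified the telemetry system into three layers, organized the challenges of each layer, and presented technical proposals to solve them.
Social significance:
・As digital transformation (DX) accelerates and online services scale out, the workload of telemetry systems will keep growing.
・With finite computing and human resources, it is necessary to improve the processing efficiency of telemetry workloads while reducing operational complexity.
・We believe this research contributes to reducing operators' effort and improving service reliability.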

Slide 88

Slide 88 text

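Real-world deployment of this research
・Contribution 1: released as a Go library ※2
・Contribution 2: adopted as the DB architecture of the server monitoring SaaS "Mackerel" ※1
・Contribution 3: released as a Python library ※3
※2 and ※3 have no production use cases yet, so we will work on promoting their adoption.
※1 https://mackerel.io/ja/blog/entry/weekly/20180126
※2 https://github.com/yuuki/go-conntracer-bpf
※3 https://github.com/ai4sre/metricsifter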

Slide 89

Slide 89 text

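Future directions
1. From Collect-First to Use-First (global optimization across the three telemetry layers): research on closed-loop systems that feed back data usage patterns and adapt automatically so that only the necessary data is collected.
2. Failure management with LLMs (workload scaling of the mining layer under new technology): for LLM-based automation of fault localization, research on methods to generate a "failure snapshot" through reduction and compression that respect the prompt length limit.
3. Telemetry for distributed deep learning infrastructure (systems other than cloud applications): research on new telemetry systems for optimizing distributed training workloads and improving fault tolerance in large-scale GPU clusters.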

Slide 90

Slide 90 text

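Research achievements: awards
・IPSJ Internet and Operation Technology Symposium 2020, Excellent Paper Award: Y. Tsubouchi, H. Tsuruta, M. Furukawa, TSifter: A dimensionality reduction method for time series data suited to rapid diagnosis of performance anomalies in microservices (in Japanese), December 2020.
・IPSJ Internet and Operation Technology Symposium 2020, Excellent Presentation Award: Y. Tsubouchi, TSifter: A dimensionality reduction method for time series data suited to rapid diagnosis of performance anomalies in microservices (in Japanese), December 2020.
・FY2020 IPSJ Yamashita SIG Research Award: Y. Tsubouchi, Transtracer: Automatic tracing of inter-process dependencies by monitoring the endpoints of TCP/UDP communication in distributed systems (in Japanese), 2020.
・IPSJ Internet and Operation Technology Symposium 2019 (IOTS2019), Excellent Paper Award: Y. Tsubouchi, M. Furukawa, R. Matsumoto, Transtracer: Automatic tracing of inter-process dependencies by monitoring the endpoints of TCP/UDP communication in distributed systems (in Japanese), December 2019.
・IPSJ Internet and Operation Technology Symposium 2019 (IOTS2019), sponsor award (CO-Conv Award): Y. Tsubouchi, M. Furukawa, R. Matsumoto, Transtracer: Automatic tracing of inter-process dependencies by monitoring the endpoints of TCP/UDP communication in distributed systems (in Japanese), December 2019.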

Slide 91

Slide 91 text

Research achievements: journals and international conferences
Journals:
・(Contribution 1) Y. Tsubouchi, M. Furukawa, R. Matsumoto, Low Overhead TCP/UDP Socket-based Tracing for Discovering Network Services Dependencies, Journal of Information Processing (JIP), Vol.30, pp.260-268, March 2022.
・(Contribution 2) Y. Tsubouchi, A. Wakisaka, K. Hamada, M. Matsuki, T. Kobayashi, H. Abe, R. Matsumoto, HeteroTSDB: A high-performance time series database based on automatic tiering across heterogeneous distributed KVSs (in Japanese), IPSJ Journal, Vol.62, No.3, pp.818-828, March 2021.
・Y. Tsubouchi, F. Ino, M. Okita, S. Yamakawa, T. Kashiwagi, K. Hagihara, Increasing the throughput of a SHA-1 computation system for deduplication storage with SSE instructions (in Japanese), IEICE Transactions D, 96(10), pp.2101-2109, October 2013.
・(Contribution 3) Y. Tsubouchi and H. Tsuruta, MetricSifter: Feature Reduction of Multivariate Time Series Data for Efficient Fault Localization in Cloud Applications, IEEE Access, Vol. 12, pp. 37398-37417, March 2024.
International conferences:
・(Contribution 1) Y. Tsubouchi, M. Furukawa, R. Matsumoto, Transtracer: Socket-Based Tracing of Network Dependencies among Processes in Distributed Applications, The 1st IEEE International COMPSAC Workshop on Advanced IoT Computing (AIOT 2020), July 2020.
・(Contribution 2) Y. Tsubouchi, A. Wakisaka, K. Hamada, M. Matsuki, H. Abe, R. Matsumoto, HeteroTSDB: An Extensible Time Series Database for Automatically Tiering on Heterogeneous Key-Value Stores, The 43rd Annual IEEE International Computers, Software & Applications Conference (COMPSAC), pp. 264-269, July 2019.

Slide 92

Slide 92 text

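Research achievements: domestic symposia (refereed)
・(Contribution 3) Y. Tsubouchi, H. Tsuruta, M. Furukawa, TSifter: A dimensionality reduction method for time series data suited to rapid diagnosis of performance anomalies in microservices (in Japanese), Proceedings of the IPSJ Internet and Operation Technology Symposium, 2020, 9-16 (2020-11-26), December 2020.
・Y. Tsubouchi, M. Aoyama, Meltria: A system for dynamically generating datasets for anomaly detection and root cause analysis in microservices (in Japanese), Proceedings of the IPSJ Internet and Operation Technology Symposium, 2021, 63-70 (2021-11-18), November 2021.
・Y. Hayashi, K. Matsubara, K. Washikita, Y. Tsubouchi, Design of a monitoring dashboard for microservice-based systems grounded in Situation Awareness and cognitive psychology (in Japanese), Proceedings of the IPSJ Internet and Operation Technology Symposium, 2021, 97-98 (2021-11-18), December 2021.
・H. Tsuruta, Y. Tsubouchi, A root cause diagnosis method for performance anomalies in distributed systems based on the interpretability of machine learning (in Japanese), Proceedings of the IPSJ Internet and Operation Technology Symposium, 2021, 24-31 (2021-11-18), November 2021.
・(Contribution 1) Y. Tsubouchi, M. Furukawa, R. Matsumoto, Transtracer: Automatic tracing of inter-process dependencies by monitoring the endpoints of TCP/UDP communication in distributed systems (in Japanese), Proceedings of the Internet and Operation Technology Symposium, 2019, 64-71 (2019-11-28), December 2019.
・(Contribution 2) Y. Tsubouchi, A. Wakisaka, K. Hamada, M. Matsuki, H. Abe, R. Matsumoto, HeteroTSDB: A time series database architecture for automatic tiering using heterogeneous key-value stores (in Japanese), Proceedings of the IPSJ Internet and Operation Technology Symposium, 2018, 7-15 (2018-11-29), December 2018.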

Slide 93

Slide 93 text

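Research achievements: domestic proceedings (non-refereed)
・Y. Hayashi, K. Matsubara, K. Washikita, Y. Tsubouchi, Effects on situation awareness caused by dashboard UI design in the monitoring of microservice-based systems (in Japanese), No.2022-IOT-56, Vol.38, pp.1-8, March 2022.
・R. Matsumoto, Y. Tsubouchi, Design of a transparent privilege separation scheme over TCP based on the privilege information of client processes (in Japanese), IPSJ SIG Technical Reports, Internet and Operation Technology (IOT), No.2020-IOT-49, Vol.11, pp.1-6, May 2020.
・Y. Hayashi, R. Iseda, K. Matsubara, K. Washikita, Y. Tsubouchi, R. Matsumoto, A study of system state visualization methods for distributed systems with dynamic adaptability (in Japanese), IPSJ SIG Technical Reports, Internet and Operation Technology (IOT), No.2020-IOT-48, Vol.22, pp.1-8, March 2020.
・Y. Tsubouchi, M. Furukawa, R. Matsumoto, A concept for automatic tracing of dependencies among network services aimed at superorganism-style data centers (in Japanese), Multimedia, Distributed, Cooperative, and Mobile (DICOMO2019) Symposium, 6A-2, pp. 1169-1174, July 2019.
・Y. Tsubouchi, R. Matsumoto, A concept of a distributed cooperative query cache in superorganism-style data centers (in Japanese), IPSJ SIG Technical Reports, Internet and Operation Technology (IOT), No.2019-IOT-45, Vol.14, pp.1-7, May 2019.
・R. Matsumoto, Y. Tsubouchi, G. Miyashita, A reactive container runtime platform technology aimed at a distributed data center OS (in Japanese), IPSJ SIG Technical Reports, Internet and Operation Technology (IOT), No.2019-IOT-45, Vol.12, pp.1-8, March 2019.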

Slide 94

Slide 94 text

Appendix

Slide 95

Slide 95 text

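Research summary: Scaling Telemetry Workloads in Cloud Applications
Toward a telemetry system that scales efficiently
Background and objective:
1. Telemetry for cloud applications: applications are becoming more complex, and operations management based on telemetry is essential.
2. Growth of telemetry workloads: in the telemetry system, workloads are growing in each of the instrumentation, storage, and mining layers.
3. Telemetry workload scaling: the objective is to scale efficiently in the face of problems such as growing consumption of computing resources, while taking operational complexity into account.
Challenges of growing telemetry workloads:
1. Instrumentation (growing measurement overhead): when short-lived network communication increases, conventional measurement incurs a high cost for transferring data out of the OS kernel being measured.
2. Storage (growing ingested data volume and long-term retention): as the number of metrics grows, it is hard to achieve both better ingestion efficiency and long-term retention of one year or more.
3. Mining (declining accuracy and execution efficiency of fault localization): as the number of metrics grows, existing feature reduction fails to capture the failure of the system as a whole, and false positives and false negatives increase.
Contributions, addressing each layer's challenge under workload growth:
1. More efficient measurement [1]: bundling TCP/UDP communication events inside the OS kernel improves transfer efficiency.
2. Better efficiency of ingestion and long-term retention [2]: tiering heterogeneous KVSs provides both efficient index lookups and storage on inexpensive media.
3. Removing variables unrelated to the failure as preprocessing for fault localization [3]: feature reduction that takes into account the temporal concentration of change points across time series during a failure improves fault localization accuracy and execution time.
References:
[1] Y. Tsubouchi, M. Furukawa, R. Matsumoto, Low Overhead TCP/UDP Socket-based Tracing for Discovering Network Services Dependencies, Journal of Information Processing (JIP), Vol.30, pp.260-268, March 2022.
[2] Y. Tsubouchi, A. Wakisaka, K. Hamada, M. Matsuki, T. Kobayashi, H. Abe, R. Matsumoto, HeteroTSDB: A high-performance time series database based on automatic tiering across heterogeneous distributed KVSs (in Japanese), IPSJ Journal, Vol.62, No.3, pp.818-828, March 2021.
[3] Y. Tsubouchi and H. Tsuruta, MetricSifter: Feature Reduction of Multivariate Time Series Data for Efficient Fault Localization in Cloud Applications, IEEE Access, Vol. 12, pp. 37398-37417, March 2024.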