An Autonomous Resource Control Architecture for Highly Integrated Multi-Tenant Web Servers Based on Feature Extraction and Change-Point Detection / 2017 iot36

In a highly integrated multi-tenant Web server environment, such as a Web hosting service where the server administrator has no control over the content of each tenant, reducing resource contention between hosts is essential for stable operation. As the number of hosts grows, however, operation becomes difficult because the cost of monitoring and controlling the hosts responsible for the contention also grows. We propose an autonomous architecture that identifies and isolates the requests consuming large amounts of resources when system resources are exhausted, by detecting change points in the time series of each resource metric and by weighting the metrics of the hosts and programs on the server.

MATSUMOTO Ryosuke

March 22, 2017

Transcript

  1. 8.

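    Basic model of a Web server: a parent httpd process (owner: root) and several child httpd processes (owner: httpd). Each child process receives requests from a client and returns responses.

    On UNIX-like operating systems, multiple processes that accept requests are forked in advance and kept in a pool. (All of these together are defined as a "single server process".)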
  2. 9.

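    Architecture of highly integrated Web hosting

    Multi-tenancy where high integration is required (this study): a single server process. One httpd serves users A, B, and C, so the resources needed for startup are consumed only once and the remaining resources can be used efficiently.

    An alternative multi-tenant approach: multiple server processes. A separate httpd runs for each of users A, B, and C, and each one consumes the resources needed for startup.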
  3. 28.
  4. 29.

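    Autonomous control flow of the proposed method (client, Web server process, weighted list, resource-limited environment)

    The client sends a request to the Web server process and receives a response. A change-point score is computed from the resource values consumed in generating the response. The coefficients of the statistical model, computed sequentially from the time series of resource usage for each host and script, are saved.

    The change-point score is added to the weighted list. Under high load, a request that ranks near the top of the weighted list has its response generated under restriction, in a resource-limited environment (CPU %, IOPS, number of simultaneous connections to a file).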
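The last step of this flow, deciding whether a request is served under restriction, might look like the following sketch. It assumes the weighted list has been reduced to one accumulated score per host. The names `restricted?` and `TOP_N`, the cutoff of 2, and the boolean high-load flag are hypothetical and do not appear in the slides.

```ruby
# Hypothetical sketch: decide whether a request is served under restriction.
TOP_N = 2 # assumed number of top-ranked hosts that are restricted under high load

# weighted_list: { host name => accumulated change-point score }
def restricted?(weighted_list, hostname, high_load, top_n = TOP_N)
  return false unless high_load # restriction applies only under high load

  # Rank hosts by accumulated score, highest first, and keep the top entries.
  top_hosts = weighted_list.sort_by { |_, score| -score }.first(top_n).map(&:first)
  top_hosts.include?(hostname)
end

weighted_list = { "host1" => 115.0, "host2" => 12.5, "host3" => 48.0 }
puts restricted?(weighted_list, "host1", true)  # high load, top-ranked host
puts restricted?(weighted_list, "host2", true)  # high load, low-ranked host
puts restricted?(weighted_list, "host1", false) # normal load
```

Ranking by accumulated score means the restriction follows whichever hosts have recently shown the largest changes, rather than a fixed list of hosts.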
  5. 32.

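    Example data structure of the weighted list

        {
          host1: {                    # host name
            st_score: 83,             # simultaneous-connection score of the host
            rc_score: 32,             # response-time score of the host
            files: {
              path_to_program1: {     # program file path
                st_score: 30,         # simultaneous-connection score of the file
                rc_score: 20,         # response-time score of the file
              },
              path_to_program2: {
                st_score: 53,
                rc_score: 12,
              },
            },
          },
        }

    The score computed for a request to a file is added to the scores of both the corresponding host and the corresponding file.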
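The accumulation rule can be sketched in plain Ruby as follows. Only the `st_score`, `rc_score`, and `files` fields come from the slide. The helper name `add_score`, its keyword arguments, and the use of string keys are assumptions made for illustration.

```ruby
# Hypothetical helper: add one request's scores to both the host entry
# and the file entry of the weighted list.
def add_score(list, host, path, st: 0, rc: 0)
  host_entry = (list[host] ||= { st_score: 0, rc_score: 0, files: {} })
  file_entry = (host_entry[:files][path] ||= { st_score: 0, rc_score: 0 })

  [host_entry, file_entry].each do |entry|
    entry[:st_score] += st
    entry[:rc_score] += rc
  end
  list
end

list = {}
add_score(list, "host1", "path_to_program1", st: 30, rc: 20)
add_score(list, "host1", "path_to_program2", st: 53, rc: 12)
p list["host1"][:st_score] # sum of the two files' st_score values
p list["host1"][:rc_score] # sum of the two files' rc_score values
```

With the per-file values from the slide, the host totals come out as 83 and 32, the same host-level values shown in the example structure.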
  6. 33.

    Example of computing change-point scores

        > cf = ChangeFinder.new 5, 0.01, 10, 0.01, 7
        => #<ChangeFinder:0x7fad5c80be50 @ts_data_buffer=[], @change_point_analyze=#<ChangeFinder::SDAR:0x7fad5c80bb80>, @smooth_term=5, @outlier_analyze=#<ChangeFinder::SDAR:0x7fad5c80be20>>
        > cf.learn [1,2,1,2,3,2,1,2,1]
        => [6.2017912433901, 1.3973555597559, 2.4211198000217, 2.3979400886673, 1.7835503570548, 1.4166612339939, 1.4837836144657, 1.2835583707215, 1.1556254255408]
        > cf.score 1
        => 1.1044914205061
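To give a feel for how such a score behaves, here is a deliberately simplified sketch of two-stage scoring in plain Ruby. It is not the ChangeFinder implementation used in the slides: it swaps the SDAR model for a discounted Gaussian (mean and variance only, no autoregressive terms). The class names, the discount rate 0.05, and the smoothing window of 5 are assumptions chosen for illustration.

```ruby
# Simplified stand-in for SDAR (assumption): a Gaussian whose mean and variance
# are updated with a discounting rate r, so that old samples fade out.
class DiscountedGaussian
  EPS = 1e-6 # floor that keeps the variance strictly positive

  def initialize(r)
    @r = r
    @mu = 0.0
    @sigma2 = 1.0
  end

  # Return the log loss of x under the current model, then update the model.
  def score_and_update(x)
    loss = 0.5 * Math.log(2 * Math::PI * @sigma2) + (x - @mu)**2 / (2 * @sigma2)
    @mu = (1 - @r) * @mu + @r * x
    @sigma2 = [(1 - @r) * @sigma2 + @r * (x - @mu)**2, EPS].max
    loss
  end
end

# Two-stage scoring: outlier score -> smoothing -> score of the smoothed series -> smoothing.
class SimpleChangeScorer
  def initialize(r, smooth)
    @outlier = DiscountedGaussian.new(r)
    @change = DiscountedGaussian.new(r)
    @smooth = smooth
    @first = []
    @second = []
  end

  def score(x)
    y = moving_average(@first, @outlier.score_and_update(x.to_f))
    moving_average(@second, @change.score_and_update(y))
  end

  private

  # Average of the most recent @smooth values, including the new one.
  def moving_average(buf, v)
    buf << v
    buf.shift if buf.size > @smooth
    buf.sum / buf.size
  end
end

scorer = SimpleChangeScorer.new(0.05, 5)
steady = Array.new(200) { |i| scorer.score(1.0 + 0.1 * (i % 2)) } # stable response times
jump = Array.new(5) { scorer.score(10.0) }                        # sudden slowdown
puts jump.max > steady.last(5).max
```

The second stage scores the smoothed first-stage scores, which is what separates a sustained change from a single outlier. The absolute values depend entirely on the assumed parameters and are not comparable with the ChangeFinder output above.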
  7. 35.

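    Example implementation of initializing the change-point detection engine

        # Create a ChangeFinder instance
        cf = ChangeFinder.new(5, 0.1, 10, 0.1, 3)

        # Pre-train with provisional training data
        cf.learn [1,1,1,1,1,1,1,1,1,1]

        # Save to user data so that the data can be retrieved in each phase
        Userdata.new.cf_list = {}
        Userdata.new.cf = cf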
  8. 36.

    Example of computing the change-point score per host

        r = Apache::Request.new
        cf = Userdata.new.cf
        cf_list = Userdata.new.cf_list

        hostname = r.hostname
        res_time = r.response_time

        # Clone the ChangeFinder instance if none exists yet for this vhost
        unless cf_list.has_key?(hostname)
          cf_list[hostname] = cf.clone
        end

        # Compute the change-point score from the request time and write it to the log
        Apache.log Apache::APLOG_ERR, "requesttime: #{res_time} score: #{cf_list[hostname].score(res_time)} hostname: #{hostname}"
  9. 38.

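    The components of the restriction method are already implemented

    - Changing the maximum CPU usage per request (*): cgroup (mruby-cgroup) and rlimit (mruby-resource)
    - Changing the number of simultaneous connections per host or per file: mod_mruby and http-access-limiter

    (*) MATSUMOTO Ryosuke and OKABE Yasuo, a resource control architecture for Web servers that virtually isolates computer resources per request, IPSJ SIG Technical Report, Vol. …-IOT-…, No. …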
  10. 42.

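    Experimental environment

    Item                          Specification
    CPU                           Intel Xeon E… v… …GHz
    Memory                        … GBytes
    Server                        NEC Express…R…f…E
    OS                            CentOS … Linux Kernel …
    Number of accounts            approx. … accounts
    Number of hosts               approx. … hosts
    Average accesses per … day    approx. … x 10,000 accesses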
  11. 43.

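    alba: automating the conventional handling of load, investigation, and restriction

    Components: alba, the Web server process, the configuration of the Web server process, and the logs of the Web server process.

    Every … minutes, alba checks whether the response of the monitored content takes … seconds or more, or whether the load average is … or higher.

    From the logs of the past … minutes, it extracts (1) the account or host with the largest total processing time and (2) the account or host with the largest total number of accesses.

    If the accesses or processing time of the extracted account or host account for …% or more of the total, the maximum number of simultaneous connections is set to …; if they account for …% or more, it is set to … (the limit is released after … minutes once the load has eased).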
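The rule-based procedure above can be sketched roughly as follows. The thresholds and connection limits in `LIMITS` are placeholders, because the actual values are not legible in this transcript. The function name `pick_limit` and the log entry format are hypothetical as well, and the sketch covers only the processing-time criterion, not the access-count criterion.

```ruby
# Hypothetical sketch of an alba-like rule: find the host with the largest
# total processing time and pick a connection limit from its share of the total.
# [minimum share, max simultaneous connections]; placeholder values, strictest first.
LIMITS = [[0.5, 5], [0.3, 10]]

def pick_limit(log_entries, limits = LIMITS)
  totals = Hash.new(0.0)
  log_entries.each { |e| totals[e[:host]] += e[:time] }
  return nil if totals.empty?

  host, time = totals.max_by { |_, t| t }
  share = time / totals.values.sum
  rule = limits.find { |min_share, _| share >= min_share }
  rule ? { host: host, max_conn: rule[1] } : nil
end

log = [
  { host: "a.example", time: 6.0 },
  { host: "b.example", time: 1.0 },
  { host: "a.example", time: 1.0 },
  { host: "c.example", time: 2.0 },
]
p pick_limit(log)
```

In this sample, `a.example` holds 7 of 10 seconds of processing time, so the first placeholder rule applies. A fixed rule of this kind needs hand-chosen thresholds, which is the kind of manual tuning the change-point approach is meant to reduce.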
  12. 47.

    [Figure: trend in the number of alerts per day over … year. Horizontal axis: date, 2016/3/25 to 2017/1/25. Vertical axis: the number of alerts per day (0 to 70). Annotations: alba introduced; host accommodation stopped.]
  13. 48.

    [Figure: trend in the total number of alerts over … year. Horizontal axis: date, 2016/3/25 to 2017/1/25. Vertical axis: total number of alerts (0 to 3000). Annotations: alba introduced; host accommodation stopped.]
  14. 51.