Slide 1

Paper introduction: Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
Presenter: Taichi Murayama

Slide 2

Bibliographic Information
Title: Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
Authors: Haoyi Zhou (1), Shanghang Zhang (2), Jieqi Peng (1), Shuai Zhang (1), Jianxin Li (1), Hui Xiong (3), Wancai Zhang (4)
(1: Beihang University, 2: UC Berkeley, 3: Rutgers University, 4: SEDD Company)
Conference: AAAI 2021 (Thirty-Fifth AAAI Conference on Artificial Intelligence)
AAAI-21 Outstanding Paper!

Slide 3

Purpose: why I chose this paper
• An Outstanding Paper at AAAI, one of the top international conferences in artificial intelligence (an outstanding paper at an outstanding venue)
• A paper that tackles long-term sequence forecasting
• To understand what the Transformer is
• To understand where the Transformer (and other deep learning models) struggles in sequence forecasting tasks, and what remedies exist
In one sentence: the paper proposes a Transformer variant aimed at long sequence time-series forecasting that reduces computation and memory costs while improving forecasting accuracy.

Slide 4

Motivation
Target problem: long sequence time-series forecasting (Long Sequence Time-series Forecasting, LSTF)
• Predict a long-horizon future sequence (for example, many points or weeks ahead) from the past sequence
• Abbreviated LSTF from its English initials
• Difficult for existing models (example: sequence forecasting with an LSTM)
(Figure: the MSE score degrades and the inference speed drops as the prediction horizon grows.)

Slide 5

Motivation
Target problem: long sequence time-series forecasting (Long Sequence Time-series Forecasting, LSTF)
• Task: a sequence forecasting problem that predicts a long-horizon future sequence from the past sequence
• Input: $X^t = \{x_1^t, x_2^t, \dots, x_{L_x}^t \mid x_i^t \in \mathbb{R}^{d_x}\}$
• Output: $Y^t = \{y_1^t, y_2^t, \dots, y_{L_y}^t \mid y_i^t \in \mathbb{R}^{d_y}\}$

Slide 6

Motivation
Two challenges in tackling LSTF
• Long Sequence Input Learning Problem: how should a model capture the internal dependencies of a long input sequence, and how can the memory problem be solved?
• Long Sequence Forecasting Problem: how should the relationship between the input and the output be captured?

Slide 7

Motivation
Challenges in applying the Transformer to LSTF
• Strengths of the Transformer:
  • Achieves high accuracy not only in natural language processing and image processing but also in sequence forecasting [Wu, 2020], [Yu, 2020]
  • Can capture relationships and alignments within a long input sequence
• Weaknesses of the Transformer:
  • Computation and memory costs for long input and output sequences are high, so it does not work well
  • Inference over long sequences is slow

Slide 8

Motivation
Challenges in applying the Transformer to LSTF
• Strengths of the Transformer:
  • Achieves high accuracy not only in natural language processing and image processing but also in sequence forecasting [Wu, 2020], [Yu, 2020]
  • Can capture relationships and alignments within a long input sequence
• Weaknesses of the Transformer:
  • Computation and memory costs for long input and output sequences are high, so it does not work well
  • Inference over long sequences is slow
So, what is a Transformer in the first place?

Slide 9

What's Transformer?
What is the Transformer?
• Proposed by Google in "Attention is all you need" [Vaswani, 2017]
• As the name suggests, attention is the main building block
• Used not only in natural language processing (NLP) but, more recently, also in computer vision (CV) and other fields (e.g., BERT, GPT, ViT)
• Before the Transformer, CNNs and RNNs were the dominant models

Slide 10

What's Transformer?
Before the Transformer: Recurrent Neural Network (RNN)
(Figure: a value-vs-time series processed sequentially by an RNN.)

Slide 11

What's Transformer?
Before the Transformer: Recurrent Neural Network (RNN)
(Figure: a value-vs-time series processed sequentially by an RNN, continued.)

Slide 12

What's Transformer?
Before the Transformer: Convolutional Neural Network (CNN)
(Figure: a value-vs-time series processed by a CNN.)

Slide 13

What's Transformer?
The Transformer
(Figure: a value-vs-time series processed by a Transformer.)

Slide 14

What's Transformer?
Inside the Transformer
• The basic structure is an encoder-decoder model
(Figure: encoder and decoder stacks.)

Slide 15

What's Transformer?
Inside the Transformer
• The basic structure is an encoder-decoder model
• Positional encoding represents the position of each input as a vector

Slide 16

What's Transformer?
Inside the Transformer
• The basic structure is an encoder-decoder model
• Positional encoding represents the position of each input as a vector
• Both the encoder and the decoder are stacks of Transformer blocks built from multi-head attention and feed-forward layers

Slide 17

What's Transformer?
Positional Encoding
• Represents the position of each input token as a vector
• In the original formulation it is provided explicitly with sin and cos functions (a small sketch follows)
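To make the sin/cos formulation concrete, here is a minimal NumPy sketch (not taken from the slides; the shapes and the 10000 base follow the formulation in [Vaswani, 2017]):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return the (seq_len, d_model) sin/cos positional encoding of [Vaswani, 2017]."""
    positions = np.arange(seq_len)[:, None]          # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]         # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                     # even dimensions use sin
    pe[:, 1::2] = np.cos(angles)                     # odd dimensions use cos
    return pe

# Example: encoding for a sequence of 96 time steps with 512-dimensional embeddings.
pe = sinusoidal_positional_encoding(96, 512)
print(pe.shape)  # (96, 512)
```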

Slide 18

What's Transformer?
Positional Encoding
from: https://kazemnejad.com/blog/transformer_architecture_positional_encodin

Slide 19

What's Transformer?
Transformer Block
• Both the encoder and the decoder are built by stacking Transformer blocks
• A Transformer block consists of: multi-head self-attention, residual connections, layer normalization, position-wise feed-forward, and dropout

Slide 20

What's Transformer?
Self-attention
(Diagram: the latent representation of the input sequence, of shape sequence length × dimension, is projected by $W_Q$, $W_K$, $W_V$ into Query Q, Key K, and Value V; the attention map M = $\mathrm{softmax}(QK^\top / \sqrt{d})$ has shape sequence length × sequence length; the output is obtained by applying M to V followed by the output projection $W_{out}$. A sketch of the computation follows.)
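A minimal NumPy sketch of this computation, assuming a single head and illustrative projection matrices W_Q, W_K, W_V, W_out:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v, w_out):
    """Scaled dot-product self-attention: softmax(QK^T / sqrt(d)) V, then output projection.

    x: (L, d_model) latent representation of the input sequence.
    w_q, w_k, w_v: (d_model, d) projection matrices; w_out: (d, d_model).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v                   # (L, d) each
    scores = q @ k.T / np.sqrt(q.shape[-1])               # (L, L) attention scores
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # attention map M
    return (attn @ v) @ w_out                             # (L, d_model)

L, d_model, d = 96, 512, 64
rng = np.random.default_rng(0)
x = rng.standard_normal((L, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d)) for _ in range(3))
w_out = rng.standard_normal((d, d_model))
print(self_attention(x, w_q, w_k, w_v, w_out).shape)      # (96, 512)
```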

Slide 21

What's Transformer?
Self-attention (continued)
(Diagram: the attention map M = $\mathrm{softmax}(QK^\top / \sqrt{d})$, of shape sequence length × sequence length, is applied to the Value V and the output projection $W_{out}$ to produce the output.)

Slide 22

What's Transformer?
Multi-Head Self-attention
• Runs several of the self-attention computations from the previous slide in parallel
• Splitting the representation into multiple vectors lets the model discover diverse similarities between inputs and acquire richer representations (see the sketch below)
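A minimal sketch of the head-splitting idea (the shapes and the helper name `split_heads` are my own illustrative choices):

```python
import numpy as np

def split_heads(x: np.ndarray, n_heads: int) -> np.ndarray:
    """Reshape (L, d_model) into (n_heads, L, d_model // n_heads):
    each head attends over the same sequence in its own lower-dimensional subspace."""
    L, d_model = x.shape
    d_head = d_model // n_heads
    return x.reshape(L, n_heads, d_head).transpose(1, 0, 2)

# Example: a 512-dimensional representation split into 8 heads of 64 dimensions each.
x = np.zeros((96, 512))
print(split_heads(x, 8).shape)  # (8, 96, 64)
```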

Slide 23

Motivation
Challenges in applying the Transformer to LSTF (restated)
• Strengths of the Transformer:
  • Achieves high accuracy not only in natural language processing and image processing but also in sequence forecasting [Wu, 2020], [Yu, 2020]
  • Can capture relationships and alignments within a long input sequence
• Weaknesses of the Transformer:
  • Computation and memory costs for long input and output sequences are high, so it does not work well
  • Inference over long sequences is slow
These weaknesses are addressed by the proposed method, Informer.

Slide 24

Method: Computational Complexity of the Transformer
Complexity per layer:
• Convolutional: $O(K \cdot D^2 \cdot L)$
• Recurrent: $O(L \cdot D^2)$
• Self-attention (Transformer): $O(L^2 \cdot D)$
(K: length of the filter, D: dimensionality of the space, L: input length, N: number of layers)
• When the sequence is not long (D > L), the computation time is not a big problem, but with long input sequences (L > D) computation time and memory become an issue
• Moreover, stacking layers multiplies the computation time by N, so the total computational complexity is $O(N \cdot L^2 \cdot D)$ (a worked example follows)
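As an illustrative worked example (numbers chosen here for illustration, not taken from the slides): doubling the input length quadruples the per-layer self-attention cost, so long inputs dominate quickly.

```latex
\frac{(2L)^2 D}{L^2 D} = 4,
\qquad \text{e.g. } D = 512:\;
L = 512  \;\Rightarrow\; L^2 D \approx 1.3 \times 10^{8},
\qquad
L = 4096 \;\Rightarrow\; L^2 D \approx 8.6 \times 10^{9}
\;(\text{about } 64\times \text{ larger}).
```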

Slide 25

Method: Complexity reduction in Informer
Two complexity-reduction techniques are proposed:
• ProbSparse: use only the attention-map entries with high importance, reducing $O(L^2 \cdot D)$ to $O(L \log L \cdot D)$
• Self-attention Distilling: distill the sequence so that its length is halved each time it passes through a self-attention layer, reducing $O(N \cdot \cdots)$ to $O((2-\epsilon) \cdot \cdots)$

Slide 26

Method: ProbSparse
Self-attention (restated)
(Diagram: the latent representation of the input sequence, of shape sequence length (L) × dimension (D), is projected by $W_Q$, $W_K$, $W_V$ into Query Q, Key K, and Value V; the attention map M = $\mathrm{softmax}(QK^\top / \sqrt{D})$ has shape L × L; applying M to V and the output projection $W_{out}$ gives the output.)

Slide 27

Method: ProbSparse
ProbSparse
(Diagram: same as plain self-attention, except that the Query is replaced by the reduced query $\bar{Q}$ containing only the top-u queries, and the attention map is $\mathrm{softmax}(\bar{Q}K^\top / \sqrt{D})$.)

Slide 28

Method: ProbSparse
ProbSparse
(Diagram: as on the previous slide, only the top-u queries are used, and the attention map is $\mathrm{softmax}(\bar{Q}K^\top / \sqrt{D})$.)
Two questions:
1. How are those top queries selected?
2. How much does the computational efficiency improve?

Slide 29

Method: ProbSparse
ProbSparse: selecting the queries with high importance
For a row vector $q_i$ of the query matrix Q, the attention-map weight on key $k_j$ is
$p(k_j \mid q_i) = \dfrac{\exp(q_i k_j^\top / \sqrt{D})}{\sum_l \exp(q_i k_l^\top / \sqrt{D})}$
(the denominator is the normalization term). The further this weight distribution deviates from the uniform distribution, the more important the query; this is quantified with the sparsity measurement.

Slide 30

Method: ProbSparse
ProbSparse: selecting the queries with high importance
The attention map itself has a long-tail distribution.

Slide 31

Method: ProbSparse
ProbSparse: selecting the queries with high importance
The importance of a query can be estimated from how far its attention distribution deviates from the uniform distribution.

Slide 32

Method: ProbSparse
ProbSparse: selecting the queries with high importance
Sparsity measurement: a measure based on the Kullback-Leibler divergence between the uniform distribution and the attention map; only the top-u queries under this measure are used.
$M(q_i, K) = \ln \sum_{j=1}^{L_K} e^{q_i k_j^\top / \sqrt{D}} - \dfrac{1}{L_K} \sum_{j=1}^{L_K} \dfrac{q_i k_j^\top}{\sqrt{D}}$

Slide 33

Method: ProbSparse
ProbSparse: selecting the queries with high importance
KL divergence between the uniform distribution and the attention map. With the uniform distribution $q(k_j \mid q_i) = 1/L_K$ and the attention distribution for query $q_i$, $p(k_j \mid q_i) = \exp(q_i k_j^\top / \sqrt{D}) / Z_i$, where $Z_i = \sum_l \exp(q_i k_l^\top / \sqrt{D})$:
$KL(q \,\|\, p) = \dfrac{1}{L_K} \sum_{j=1}^{L_K} \left[ \ln \dfrac{1}{L_K} - \ln p(k_j \mid q_i) \right]$
$= -\ln L_K + \ln Z_i - \dfrac{1}{L_K} \sum_{j=1}^{L_K} \dfrac{q_i k_j^\top}{\sqrt{D}}$
$= M(q_i, K) - \ln L_K$
So, up to the constant term $-\ln L_K$, the KL divergence from the uniform distribution equals the sparsity measurement
$M(q_i, K) = \ln \sum_{j=1}^{L_K} e^{q_i k_j^\top / \sqrt{D}} - \dfrac{1}{L_K} \sum_{j=1}^{L_K} \dfrac{q_i k_j^\top}{\sqrt{D}}$.
ref: https://cookie-box.hatenablog.com/entry/2021/02/11/195

Slide 34

Method: ProbSparse
ProbSparse: the computational-efficiency problem
Computing the sparsity measurement itself costs $O(L^2)$:
$M(q_i, K) = \ln \sum_{j=1}^{L_K} e^{q_i k_j^\top / \sqrt{D}} - \dfrac{1}{L_K} \sum_{j=1}^{L_K} \dfrac{q_i k_j^\top}{\sqrt{D}}$
By approximating it with keys sampled from the key vectors, in the max-mean form below, $O(L \log L)$ is achieved:
$\bar{M}(q_i, K) = \max_j \dfrac{q_i k_j^\top}{\sqrt{D}} - \dfrac{1}{L_K} \sum_{j=1}^{L_K} \dfrac{q_i k_j^\top}{\sqrt{D}}$

Slide 35

Method: ProbSparse
ProbSparse: the computational-efficiency problem
Since the KL divergence is $\geq 0$ and $KL(q \,\|\, p) = M(q_i, K) - \ln L_K$, we have $\ln L_K \leq M(q_i, K)$. Moreover,
$M(q_i, K) = \ln \sum_{j=1}^{L_K} e^{q_i k_j^\top / \sqrt{D}} - \dfrac{1}{L_K} \sum_{j=1}^{L_K} \dfrac{q_i k_j^\top}{\sqrt{D}}$
$\leq \ln \left( L_K \cdot \max_j e^{q_i k_j^\top / \sqrt{D}} \right) - \dfrac{1}{L_K} \sum_{j=1}^{L_K} \dfrac{q_i k_j^\top}{\sqrt{D}}$
$= \ln L_K + \max_j \dfrac{q_i k_j^\top}{\sqrt{D}} - \dfrac{1}{L_K} \sum_{j=1}^{L_K} \dfrac{q_i k_j^\top}{\sqrt{D}}$
Hence $\ln L_K \leq M(q_i, K) \leq \ln L_K + \bar{M}(q_i, K)$, where $\bar{M}(q_i, K)$ is the approximate sparsity measurement (max-mean form).

Slide 36

Method: ProbSparse
ProbSparse algorithm
1. Sample keys
2. Compute the sparsity measurement
3. Select the top-u queries from Q
4. Calculate the attention map (for the selected queries only)
A sketch of these steps follows.
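A simplified NumPy sketch of these four steps (the sampling sizes, the shared sampled key subset, and the mean-of-V fallback for non-selected queries are my own simplifications of the paper's procedure; the official implementation differs in such details):

```python
import numpy as np

def probsparse_attention(q, k, v, factor: int = 5):
    """Sketch of ProbSparse self-attention:
    1) sample keys, 2) compute the max-mean sparsity measurement for every query,
    3) keep only the top-u queries, 4) compute full attention only for those queries;
    the remaining rows fall back to the mean of V ("lazy" queries)."""
    L_q, d = q.shape
    L_k = k.shape[0]
    u = min(L_q, int(factor * np.ceil(np.log(L_q))))          # number of "active" queries
    n_sample = min(L_k, int(factor * np.ceil(np.log(L_k))))   # number of sampled keys

    rng = np.random.default_rng(0)
    k_sample = k[rng.choice(L_k, n_sample, replace=False)]    # (n_sample, d)
    scores_sample = q @ k_sample.T / np.sqrt(d)               # (L_q, n_sample)
    m_bar = scores_sample.max(axis=1) - scores_sample.mean(axis=1)  # approx. measurement
    top_u = np.argsort(m_bar)[-u:]                            # indices of the top-u queries

    out = np.repeat(v.mean(axis=0, keepdims=True), L_q, axis=0)  # default output: mean of V
    scores = q[top_u] @ k.T / np.sqrt(d)                         # full scores for top-u only
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    out[top_u] = attn @ v                                        # exact attention for top-u
    return out

L, d = 96, 64
rng = np.random.default_rng(1)
q, k, v = (rng.standard_normal((L, d)) for _ in range(3))
print(probsparse_attention(q, k, v).shape)  # (96, 64)
```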

Slide 37

Method: Complexity reduction in Informer (restated)
Two complexity-reduction techniques are proposed:
• ProbSparse: use only the attention-map entries with high importance, reducing $O(L^2 \cdot D)$ to $O(L \log L \cdot D)$
• Self-attention Distilling: distill the sequence so that its length is halved each time it passes through a self-attention layer, reducing $O(N \cdot \cdots)$ to $O((2-\epsilon) \cdot \cdots)$

Slide 38

Method: Self-attention Distilling
Self-attention Distilling
• Each time the input passes through a self-attention layer, it is distilled so that the sequence length is halved
• Distillation itself is a kind of model-compression technique also used, for example, in object detection [Yu, 2017]
(Figure: distilling is applied each time the representation leaves an attention block.)

Slide 39

Method: Self-attention Distilling
Self-attention Distilling
• When the output of the j-th layer is passed to the (j+1)-th layer, the sequence length is compressed with a Conv1d (kernel width 3) followed by max pooling:
$X_{j+1}^t = \mathrm{MaxPool}\left( \mathrm{ELU}\left( \mathrm{Conv1d}\left( X_j^t \right) \right) \right)$
• Compared with simply stacking N layers, the complexity is compressed from $O(N \cdot \cdots)$ to at most $O((2-\epsilon) \cdot \cdots)$
A sketch of one distilling step follows.
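A minimal PyTorch-style sketch of one distilling step implementing the formula above (the padding scheme and the absence of normalization are my own simplifications; the official layer may differ in these details):

```python
import torch
import torch.nn as nn

class DistillingLayer(nn.Module):
    """One self-attention distilling step: Conv1d + ELU + MaxPool halves the sequence length,
    i.e. X_{j+1} = MaxPool(ELU(Conv1d(X_j)))."""
    def __init__(self, d_model: int):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.act = nn.ELU()
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); Conv1d expects (batch, channels, seq_len)
        y = self.conv(x.transpose(1, 2))
        y = self.pool(self.act(y))
        return y.transpose(1, 2)          # (batch, seq_len // 2, d_model)

x = torch.randn(32, 96, 512)
print(DistillingLayer(512)(x).shape)      # torch.Size([32, 48, 512])
```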

Slide 40

Method: Decoder
Outputs through one forward pass
• Autoregressive inference [Chen, 2019]
• Non-autoregressive inference (inference that does not feed predictions back step by step)

Slide 41

Method: Decoder
Outputs through one forward pass
• Autoregressive inference [Chen, 2019]
• Non-autoregressive inference (inference that does not feed predictions back step by step)
Informer adopts the non-autoregressive style, which has the advantage of faster inference. A sketch of the decoder input follows.
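A minimal sketch of how such a generative-style decoder input can be assembled: a known segment of the history (the "start token") is concatenated with zero placeholders for the future positions, and the model fills them all in one forward pass (the helper name and lengths here are illustrative):

```python
import numpy as np

def build_decoder_input(history: np.ndarray, label_len: int, pred_len: int) -> np.ndarray:
    """Concatenate the last `label_len` observed steps (start token) with
    `pred_len` zero placeholders; the model predicts the placeholders in one forward pass."""
    d = history.shape[-1]
    start_token = history[-label_len:]            # (label_len, d) known context
    placeholder = np.zeros((pred_len, d))         # (pred_len, d) positions to be predicted
    return np.concatenate([start_token, placeholder], axis=0)

history = np.random.randn(96, 7)                  # 96 past steps, 7 variables
dec_in = build_decoder_input(history, label_len=48, pred_len=24)
print(dec_in.shape)                               # (72, 7)
```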

Slide 42

Experiment: Dataset
• ETT: electricity supply data from two regions in China (collected by the authors)
  • Records at 1-hour and 15-minute granularity
  • Train/val/test: 12/4/4 months
• ECL: electricity consumption data of 321 clients [Li, 2019]
  • Records at 1-hour granularity
  • Train/val/test: 15/3/4 months
• Weather: climate data from about 1,600 sites in the United States
  • Records at 1-hour granularity
  • Train/val/test: 28/10/10 months

Slide 43

Experiment: Baseline and Evaluation Metric
• Baselines:
  • ARIMA
  • Prophet [Taylor, 2018]
  • LSTMa [Bahdanau, 2014]
  • LSTnet [Lai, 2018]
  • DeepAR [Salinas, 2020]
  • LogTrans [Li, 2019]
  • Reformer [Kitaev, 2019]
• Evaluation metrics (a small sketch follows):
  • Mean Squared Error (MSE): $\frac{1}{n}\sum_{i=1}^{n} (y - \hat{y})^2$
  • Mean Absolute Error (MAE): $\frac{1}{n}\sum_{i=1}^{n} |y - \hat{y}|$
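For completeness, the two metrics as a small NumPy sketch:

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error: (1/n) * sum((y - y_hat)^2)."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error: (1/n) * sum(|y - y_hat|)."""
    return float(np.mean(np.abs(y_true - y_pred)))

y, y_hat = np.array([1.0, 2.0, 3.0]), np.array([1.5, 1.5, 3.0])
print(mse(y, y_hat), mae(y, y_hat))   # 0.1666...  0.3333...
```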

Slide 44

Result: Univariate Time-series Forecasting
• Informer achieves the best accuracy on most datasets and prediction horizons (especially for long horizons)
• Informer also outperforms the Informer variant without query sparsity, which shows the benefit of restricting where attention is paid

Slide 45

Result: Multivariate Time-series Forecasting
• In the multivariate setting as well, Informer achieves high accuracy on most datasets and prediction horizons

Slide 46

Result: Multivariate Time-series Forecasting (continued)

Slide 47

Result: Ablation Study
Ablation of ProbSparse
• With ProbSparse, long-horizon prediction no longer runs into the memory problem and high accuracy is achieved

Slide 48

Result: Ablation Study
Ablation of Self-attention Distilling
• Self-attention Distilling likewise avoids the memory problem and makes long-horizon prediction feasible

Slide 49

Result: Ablation Study
Ablation of Self-attention Distilling (continued)
• Self-attention Distilling likewise avoids the memory problem and makes long-horizon prediction feasible

Slide 50

Result: Ablation Study
Ablation of the generative-style decoder
• The non-autoregressive output avoids accumulating prediction errors
• Inference speed also shows a clear advantage for long-horizon prediction

Slide 51

Result: Summary
• Proposes Informer, a Transformer variant for long sequence time-series forecasting that reduces computation and memory costs and improves forecasting accuracy
• Reduces memory cost with ProbSparse, which attends only to the parts with high importance, and with Self-attention Distilling, which applies a distillation technique
• Achieves fast inference while maintaining accuracy for long horizons by producing non-autoregressive outputs
• Demonstrates the effectiveness of Informer on electricity, weather, and other datasets

Slide 52

Reference
[Wu, 2020] Wu, Neo, et al. "Deep transformer models for time series forecasting: The influenza prevalence case." arXiv preprint arXiv:2001.08317 (2020).
[Yu, 2020] Yu, Cunjun, Xiao Ma, Jiawei Ren, Haiyu Zhao, and Shuai Yi. "Spatio-temporal graph transformer networks for pedestrian trajectory prediction." European Conference on Computer Vision, pp. 507-523. Springer, Cham, 2020.
[Vaswani, 2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. "Attention is all you need." Advances in Neural Information Processing Systems, pp. 5998-6008, 2017.
[Yu, 2017] Yu, Fisher, Vladlen Koltun, and Thomas Funkhouser. "Dilated residual networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[Chen, 2019] Chen, Nanxin, et al. "Listen and fill in the missing letters: Non-autoregressive transformer for speech recognition." arXiv preprint.
[Li, 2019] Li, Shiyang, et al. "Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting." Advances in Neural Information Processing Systems 32 (2019): 5243-5253.
[Taylor, 2018] Taylor, Sean J., and Benjamin Letham. "Forecasting at scale." The American Statistician 72.1 (2018): 37-45.
[Bahdanau, 2014] Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).
[Lai, 2018] Lai, Guokun, et al. "Modeling long- and short-term temporal patterns with deep neural networks." The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018.
[Salinas, 2020] Salinas, David, et al. "DeepAR: Probabilistic forecasting with autoregressive recurrent networks." International Journal of Forecasting 36.3 (2020): 1181-1191.
[Kitaev, 2019] Kitaev, N., Kaiser, L., and Levskaya, A. 2019. "Reformer: The Efficient Transformer." In ICLR.