Slide 1

TabNet: Attentive Interpretable Tabular Learning
Paper reading session @ 2021/01/05
Presenter: ༶໌఩

Slide 2

Paper information
• Authors: Sercan O. Arik, Tomas Pfister
  • Google Cloud AI
• Source: arXiv preprint
  • The paper was rejected from ICLR 2020

Slide 3

Overview: what kind of paper is this?
• A DNN model for tabular data
• A method that aims to combine the strengths of decision trees and NN models
• Achieves both interpretability and improved accuracy

Slide 4

Introduction: research background
• DNN models are state of the art especially in images, language, and speech
• In data-analysis competitions such as Kaggle, decision-tree-based methods are still mainstream
  • Because their interpretability is high

Slide 5

Introduction: research background
• Why bring deep learning to tabular data?
• Because on large datasets, deep learning can be expected to improve performance
  • Deep Learning Scaling is Predictable, Empirically (Hestness et al., 2017)

Slide 6

Introduction: research background
• Three merits of using NN models on tabular data:
  1. Multiple data types can be encoded efficiently
  2. The effort spent on feature engineering can be reduced
  3. The problem can be handled end to end

Slide 7

Introduction: contributions of the proposed method
• Trains end to end without any data preprocessing
• Sequential attention makes the model highly interpretable
  • Local interpretability: the importance of the input features
  • Global interpretability: how much each feature influenced the model

Slide 8

Related work
• DNN + DT
  • Uses sequential attention to perform feature selection and embed features
• Tree-based learning
  • Uses DNNs for feature selection
• Feature selection
  • Yields compact representations

Slide 9

Proposed method: key components
• Attentive transformer
  • Learns the mask applied to the features
• Feature transformer
  • Transforms the features and decides what is used in the next step

Slide 10

Proposed method: overall architecture
• The index i appearing from here on corresponds to steps 1, 2, ...

Slide 11

Proposed method: Attentive transformer (learns the mask)
• M[i] = sparsemax(P[i−1] · h_i(a[i−1]))
• P[i−1]: a prior weight that changes depending on whether a feature was used by past masks M (in the implementation, something like a usage restriction)
• Sparsemax: an activation function similar to softmax
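The "usage restriction" prior can be sketched in a few lines of NumPy. This follows the update P[i] = Π_{j≤i}(γ − M[j]) from the TabNet paper; the feature count, γ value, and mask values below are illustrative, not from the slides.

```python
import numpy as np

# Prior scale update: P[i] = prod_{j<=i} (gamma - M[j]).
# gamma = 1 means a feature used once is fully blocked at later steps;
# gamma > 1 relaxes this so features can be reused.
gamma = 1.5
n_features = 4

P = np.ones(n_features)               # P[0]: all features equally available
M1 = np.array([0.7, 0.3, 0.0, 0.0])   # mask chosen at step 1 (rows sum to 1)

P = P * (gamma - M1)                  # features used at step 1 get smaller priors
print(P)                              # -> [0.8 1.2 1.5 1.5]
```

Features the mask leaned on at step 1 (indices 0 and 1) now carry smaller priors, so later steps are nudged toward unused features.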

Slide 12

Column: Sparsemax (Martins and Astudillo, 2016)
• Its outputs are sparser than softmax, so it is easier to pick out the important features
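To make the "sparser than softmax" point concrete, here is a minimal NumPy implementation of sparsemax (the Euclidean projection onto the simplex from Martins and Astudillo, 2016); the input vector is an arbitrary illustration.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: like softmax it returns a probability vector,
    but entries far from the maximum become exactly zero."""
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum       # entries that stay nonzero
    k_max = k[support][-1]
    tau = (cumsum[support][-1] - 1) / k_max   # threshold
    return np.maximum(z - tau, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
print(softmax(z))    # every entry strictly positive
print(sparsemax(z))  # -> [1. 0. 0.]  (exact zeros)
```

Both outputs sum to 1, but sparsemax zeroes out the weak features entirely, which is what lets TabNet's masks act as hard-ish feature selection.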

Slide 13

Proposed method: Feature transformer (transforms the input and decides what is used next)
• [d[i], a[i]] = f_i(M[i] · f)
• a is passed on to the next step
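A single feature-transformer step can be sketched as below. In the paper f_i is a stack of FC-BN-GLU blocks; this sketch keeps one FC + GLU unit (batch norm omitted), and all sizes, weights, and mask values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu_block(x, W, b):
    """One FC -> GLU unit of a feature transformer (BN omitted for brevity)."""
    h = x @ W + b                        # fully connected, output dim = 2 * units
    lin, gate = np.split(h, 2, axis=-1)
    return lin * sigmoid(gate)           # gated linear unit

D, n_d, n_a = 4, 3, 3                    # assumed sizes: features, decision, attention dims
f = rng.normal(size=D)                   # input features
M = np.array([0.7, 0.3, 0.0, 0.0])       # mask from the attentive transformer

W = rng.normal(size=(D, 2 * (n_d + n_a)))
b = np.zeros(2 * (n_d + n_a))

out = glu_block(M * f, W, b)             # f_i(M[i] . f)
d, a = out[:n_d], out[n_d:]              # d feeds the prediction, a the next step's mask
```

The split at the end is the `[d[i], a[i]]` on the slide: `d` goes toward the output, `a` is what the next attentive transformer sees.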

Slide 14

Proposed method: final prediction
• The d[i] from each step are aggregated and used for the final prediction
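The aggregation is a sum of ReLU-ed decision outputs over steps (followed by a final linear layer in the paper). A toy sketch with made-up d[i] values:

```python
import numpy as np

# Toy decision outputs d[i] from 3 steps (n_d = 4); values are illustrative.
d = [np.array([ 0.5, -0.2,  1.0, 0.0]),
     np.array([-0.3,  0.8,  0.2, 0.1]),
     np.array([ 0.4,  0.0, -0.5, 0.6])]

# TabNet sums ReLU(d[i]) over all steps; a final linear layer then maps
# d_out to the prediction (omitted here).
d_out = sum(np.maximum(di, 0.0) for di in d)
print(d_out)  # -> [0.9 0.8 1.2 0.7]
```

Negative entries contribute nothing, so each step only adds evidence, never subtracts it.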

Slide 15

Proposed method: on interpretability
• Feature importance is computed using the masks
• To keep the computation simple, the decision outputs are used instead of the masks:
  η_b[i] = Σ_{c=1}^{N_d} ReLU(d_{b,c}[i])   → how important is sample b at step i?
• → feature importance:
  M_{agg-b,j} = ( Σ_{i=1}^{N_steps} η_b[i] M_{b,j}[i] ) / ( Σ_{j=1}^{D} Σ_{i=1}^{N_steps} η_b[i] M_{b,j}[i] )
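The two formulas can be checked numerically. The sketch below uses made-up masks and decision outputs (2 samples, 3 features, 2 steps); only the formulas themselves come from the paper.

```python
import numpy as np

# Toy values: B=2 samples, D=3 features, 2 steps, n_d = 2.
masks = [np.array([[0.6, 0.4, 0.0],      # M[1]: rows = samples, cols = features
                   [0.1, 0.0, 0.9]]),
         np.array([[0.0, 1.0, 0.0],      # M[2]
                   [0.5, 0.5, 0.0]])]
d = [np.array([[1.0, 1.0],               # decision outputs d[i]: rows = samples
               [2.0, 0.0]]),
     np.array([[0.5, 0.5],
               [0.0, 0.0]])]

# eta_b[i] = sum_c ReLU(d_{b,c}[i]): how much step i contributed for sample b
eta = [np.maximum(di, 0.0).sum(axis=1) for di in d]   # [2, 2] and [1, 0]

# M_agg: eta-weighted sum of the masks, normalized per sample over features
num = sum(e[:, None] * m for e, m in zip(eta, masks))
M_agg = num / num.sum(axis=1, keepdims=True)
print(M_agg)                  # per-sample feature importances
print(M_agg.sum(axis=1))      # each row sums to 1
```

For sample 2, step 2 has η = 0, so its mask is ignored and the importance comes entirely from step 1.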

Slide 16

Proposed method: how feature selection works
• Feature selection corresponds to each step

Slide 17

Proposed method: where is it decision-tree-like?
• The features produced by each mask correspond to the branches of a tree

Slide 18

Experiments: setup
• Baselines:
  • Gradient boosting methods: LightGBM, XGBoost, CatBoost
  • NN models
• Comparison criteria:
  • Accuracy on the test data
  • Model size

Slide 19

Experimental results: accuracy
• On real data (ForestCoverType), TabNet was more accurate than the baselines

Slide 20

Experimental results: model size
• Good accuracy even with a lightweight model

Slide 21

Experimental results: on interpretability
• Visualizes the η_b[i]-weighted mask results
• Rows are samples, columns are features
• White cells are where the model judged a feature to be important

Slide 22

Summary
• Sequential attention selects the important features
• Using masks makes the model highly interpretable
• Showed that it performs well on tabular data from a variety of domains

Slide 23

Extra: playing with TabNet on the Titanic dataset
• Accuracy: 0.81, ROC-AUC: 0.78

Slide 24

Extra: LightGBM vs NN model vs TabNet
(figures: results for LightGBM and the NN model)
• TabNet: Accuracy: 0.81, ROC-AUC: 0.78
• Hyperparameters were left at their defaults; no tuning was done
https://github.com/mei28/playground_python/blob/main/notebooks/titanic.ipynb