Shota Kato
September 16, 2024

[Explainer] Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl

These slides summarize the paper on PySR, a symbolic regression library based on evolutionary algorithms.
Original paper: https://arxiv.org/abs/2305.01582

Transcript

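  1. Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl

    Miles Cranmer. Paper link: https://arxiv.org/abs/2305.01582
    Presenter: Shota Kato, Kyoto University.
    Unless otherwise noted, figures, tables, and examples are quoted from the paper. Items marked ※ are the presenter's comments.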
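  2. Why this paper was chosen

    • To get an overview of symbolic regression (SR).
    • Among the SR libraries usable from Python, PySR is probably the best known, so it was judged worth understanding.
    • Citation count (Google Scholar) and GitHub star count† as of the slide date: the figures are not preserved in this transcript.
    Goal of this presentation: understand the PySR algorithm.
    ※ PySR calls SymbolicRegression.jl.
    Note that the values from the paper shown in these slides may differ from the defaults of the latest PySR.
    †: https://github.com/MilesCranmer/PySR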
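  3. The PySR algorithm (Algorithm 1)

    An improved version of the evolutionary algorithm [Koza, 1994], written as $H = \mathrm{pysr}(X)$.
    Input: the dataset $X$ for which expressions are sought.
    Output: $H$, the most accurate expression at each complexity.
    • Create the initial populations: for $i = 1, \dots, n_p$, create a set $P_i$ of $L$ expressions, a set $M_i$ that stores the most accurate expressions, and $H$.
    • Evolve, simplify, optimize: (1) apply evolution to $P_i$; (2) simplify each expression in $P_i$ and fit its constants by optimization.
    • Save the most accurate expressions: (1) for each complexity, store the most accurate expression in $P_i$ as $M_i$; (2) for each complexity, keep the most accurate expression in $M_i \cup H$ as $H$.
    • Migrate: replace each expression in $P_i$, with probability $\alpha$, by an element of $M_j$ ($i \neq j$) or of $H$.
    Apply the steps from evolution through migration for $i = 1, \dots, n_p$ to update $P_i$, $M_i$, and $H$, and stop after $n_{\mathrm{iter}}$ repetitions.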
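The migration step of Algorithm 1 can be illustrated with a minimal Python sketch. It assumes that expressions are opaque objects, that the migration pool is already assembled from the saved sets of the other populations (or from H), and that `alpha` is the replacement probability. The function name `migrate` and the string placeholders are hypothetical and are not part of PySR's API.

```python
import random

def migrate(population, pool, alpha, rng):
    """Replace each expression, with probability alpha, by a random pool member.

    population: list of expressions (P_i in the slide's notation)
    pool:       list of candidate replacements (elements of M_j, j != i, or of H)
    """
    return [
        rng.choice(pool) if rng.random() < alpha else expr
        for expr in population
    ]

rng = random.Random(0)
P_i = ["expr_a", "expr_b", "expr_c", "expr_d"]   # placeholder expressions
pool = ["best_1", "best_2"]                      # placeholder saved expressions
print(migrate(P_i, pool, alpha=0.5, rng=rng))
```

With `alpha=0.0` the population is returned unchanged, and with `alpha=1.0` every member is drawn from the pool.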
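  4. Creating the initial populations

    For $i = 1, \dots, n_p$, create a set $P_i$ of $L$ expressions, a set $M_i$ that stores the most accurate expressions, and $H$.
    • Default parameters
      ▪ Number of expressions in each set: $L = 1000$
      ▪ Number of sets: $n_p = 40$
    • Expressions in the initial populations are created with a fixed complexity.
    Complexity: the number of nodes when the expression is written as a tree (= the total number of variables, operators, and constants).
    ※ Users can also supply their own definition of complexity.
    Figure: two example trees. The tree for $x_1 + 3.2$ has complexity 3; a tree with the nodes 0.86, $y$, 1.15, ×, + has complexity 5.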
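The node-count definition of complexity can be checked with a short Python sketch. The nested-tuple encoding of expression trees is an assumption made for illustration and is not PySR's internal representation; the arrangement of the five-node tree is likewise assumed from the node labels in the figure.

```python
# Expression trees as nested tuples: (operator, child, child, ...).
# Leaves are variable names (str) or constants (float).
# This encoding is an illustrative assumption, not PySR's internal format.

def complexity(node):
    """Count every node: each operator, variable, and constant adds 1."""
    if isinstance(node, tuple):
        _op, *children = node
        return 1 + sum(complexity(child) for child in children)
    return 1

small = ("+", "x1", 3.2)                  # x1 + 3.2
larger = ("+", 0.86, ("*", "y", 1.15))    # 0.86 + y * 1.15 (assumed layout)

print(complexity(small))
print(complexity(larger))
```

The two trees come out at 3 and 5 nodes, matching the complexities quoted for the figure.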
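  5. Evolution (Algorithm 2)

    Expressions chosen by tournament selection (described on the "Tournament selection (Algorithm 4)" slide) are updated by applying mutation or crossover, repeated $n_c$ times.
    • Mutation (when the random number > $p_{\mathrm{cross}}$): change the expression using one of a set of predefined rules (described on the "Mutation (Algorithm 3)" slide).
    • Crossover (when the random number ≤ $p_{\mathrm{cross}}$):
      1. Swap parts of two expressions $E_1$ and $E_2$ to generate new expressions $E_1^*$ and $E_2^*$.
      2. Replace the two oldest expressions with $E_1^*$ and $E_2^*$.
      3. Repeat until the generated $E_1^*$ and $E_2^*$ satisfy the constraints.
    Figure: crossover example. A tree with the nodes 0.86, $y$, 1.15, ×, + and a tree with the nodes $y$, $x$, ^ exchange subtrees, giving a tree with the nodes $x$, 0.86, + and a tree with the nodes $y$, ^, $y$, 1.15, ×.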
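Subtree crossover can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: trees are nested tuples `(operator, child, ...)`, the swap points are chosen uniformly at random, and constraint checking and the replacement of the oldest expressions are left out. All function names are hypothetical.

```python
import random

def subtree_paths(node, path=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield path
    if isinstance(node, tuple):
        for i, child in enumerate(node[1:], start=1):
            yield from subtree_paths(child, path + (i,))

def get_at(node, path):
    """Return the subtree found by following `path` from the root."""
    for i in path:
        node = node[i]
    return node

def replace_at(node, path, new):
    """Return a copy of `node` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return node[:i] + (replace_at(node[i], path[1:], new),) + node[i + 1:]

def crossover(e1, e2, rng):
    """Swap one randomly chosen subtree of e1 with one of e2."""
    p1 = rng.choice(list(subtree_paths(e1)))
    p2 = rng.choice(list(subtree_paths(e2)))
    s1, s2 = get_at(e1, p1), get_at(e2, p2)
    return replace_at(e1, p1, s2), replace_at(e2, p2, s1)

def size(node):
    """Number of nodes in the tree."""
    return 1 + sum(size(c) for c in node[1:]) if isinstance(node, tuple) else 1

rng = random.Random(0)
e1 = ("+", 0.86, ("*", "y", 1.15))   # 0.86 + y * 1.15
e2 = ("^", "y", "x")                 # y ^ x
c1, c2 = crossover(e1, e2, rng)
print(c1)
print(c2)
```

Because the two subtrees are only exchanged, the total number of nodes across both expressions is unchanged by the operation.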
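  6. Mutation (Algorithm 3)

    • The expression $E$ is mutated with a probability that depends on the temperature.
    • There are eight mutation patterns (the probability of using pattern $i$ is set by the parameter $w_i$):
      1. Change a constant.
      2. Change an operator.
      3. Add a random node at the root or at a child node.
      4. Insert a random node.
      5. Replace a node and all of its descendants with a constant or a variable.
      6. Simplify the expression.
      7. Replace the expression with an entirely new one.
      8. Do nothing.
    Figure: example of pattern 2 (operator change). In a tree with the nodes 0.86, $y$, 1.15, ×, −, the operator − is changed to +.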
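The weighted choice among mutation patterns, and the operator-change pattern from the figure, can be sketched as follows. The pattern names, the uniform weights, and the tuple encoding of trees are all assumptions made for illustration; only the operator-change pattern is implemented here, and the temperature-dependent acceptance is omitted.

```python
import random

# Hypothetical labels for the eight mutation patterns listed on the slide.
PATTERNS = [
    "mutate_constant", "mutate_operator", "append_node", "insert_node",
    "delete_subtree", "simplify", "randomize", "do_nothing",
]

def pick_pattern(weights, rng):
    """Draw one pattern with probability proportional to its weight w_i."""
    return rng.choices(PATTERNS, weights=weights, k=1)[0]

def change_operator(node, path, new_op):
    """Return a copy of the tree with the operator at `path` replaced."""
    if not path:
        return (new_op,) + node[1:]
    i = path[0]
    return node[:i] + (change_operator(node[i], path[1:], new_op),) + node[i + 1:]

before = ("-", 0.86, ("*", "y", 1.15))    # 0.86 - y * 1.15
after = change_operator(before, (), "+")  # change the root operator to +
print(after)

weights = [1, 1, 1, 1, 1, 1, 1, 1]        # made-up uniform weights
print(pick_pattern(weights, random.Random(0)))
```

Setting a weight to zero disables the corresponding pattern, which is one way to read the role of the $w_i$ parameters.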
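  7. Tournament selection (Algorithm 4)

    Draw a subset $Q$ of $n_s$ expressions from the expression set $P$ ($L$ expressions) and select one element from $Q$.
    • "Best" means lowest loss. The loss is designed to avoid expressions with large prediction error and expressions that share the same complexity.
    • Default parameters: $n_s = 12$, $p_{\mathrm{tournament}} = 0.90$.
    Flowchart:
      1. Select the best element $E$ from $Q$.
      2. If rand() < $p_{\mathrm{tournament}}$ or length($Q$) == 1, output $E$.
      3. Otherwise remove $E$ from $Q$, take the best of the remaining expressions as $E$, and return to step 2.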
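Tournament selection can be written as a short Python function. This is a sketch that assumes the best candidate is accepted with probability `p_tournament` at each step; the `loss` argument is a caller-supplied function, so the complexity-aware adjustment of the loss mentioned on the slide is not modeled. The function name and the toy population are hypothetical.

```python
import random

def tournament_select(population, loss, n_s, p_tournament, rng):
    """Pick one expression from a random subset of size n_s.

    The subset is ranked by loss (lowest first). The current best is returned
    with probability p_tournament; otherwise it is discarded and the next
    best is considered. The last remaining candidate is always returned.
    """
    q = sorted(rng.sample(population, n_s), key=loss)
    for candidate in q[:-1]:
        if rng.random() < p_tournament:
            return candidate
    return q[-1]

# Toy population: each "expression" is represented by its own loss value.
population = [5.0, 1.0, 3.0, 2.0, 4.0, 0.5, 6.0]
rng = random.Random(0)
picked = tournament_select(population, lambda e: e, n_s=4, p_tournament=0.9, rng=rng)
print(picked)
```

Under these assumptions the k-th best candidate in the subset is chosen with probability p(1 − p)^(k−1), so a high `p_tournament` favors the best candidate while still occasionally letting weaker ones through.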
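  8. The expressions finally obtained

    PySR returns several expressions. In the end it outputs the expression with the highest score among those whose loss is below a constant multiple of the minimum loss (1.5× in the PySR documentation for model_selection="best").
    $\mathrm{score}_i = -\log(\mathrm{loss}_i / \mathrm{loss}_{i-1}) / (C_i - C_{i-1})$
    $\mathrm{loss}_i$: loss (prediction error) of expression $i$; $C_i$: complexity of expression $i$.
    Intuitively, the output is the expression at which the loss drops sharply relative to the increase in complexity.
    Which expression is finally output can be set with the model_selection argument (lowest loss or highest score).
    Figure: example PySR output. Expressions of several complexities are listed together with their loss and score.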
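The score and the final selection rule can be worked through numerically. The sketch below assumes a list of (complexity, loss) pairs sorted by increasing complexity, assigns the first entry a placeholder score of 0.0 because it has no predecessor, and takes the loss threshold factor as a parameter. The data are made up for illustration, and the function names are hypothetical rather than PySR's API.

```python
import math

def scores(frontier):
    """Score each entry of a list of (complexity, loss) pairs.

    score_i = -log(loss_i / loss_{i-1}) / (C_i - C_{i-1})
    The first entry has no predecessor, so 0.0 is used as a placeholder.
    """
    out = [0.0]
    for (c_prev, l_prev), (c, l) in zip(frontier, frontier[1:]):
        out.append(-math.log(l / l_prev) / (c - c_prev))
    return out

def select_best(frontier, loss_factor):
    """Index of the highest-scoring entry with loss < loss_factor * min loss."""
    s = scores(frontier)
    min_loss = min(l for _, l in frontier)
    eligible = [i for i, (_, l) in enumerate(frontier) if l < loss_factor * min_loss]
    return max(eligible, key=lambda i: s[i])

# Made-up (complexity, loss) pairs, not results from the paper.
frontier = [(1, 10.0), (3, 1.0), (5, 0.9), (7, 0.8)]
for (c, l), s in zip(frontier, scores(frontier)):
    print(f"complexity={c} loss={l} score={s:.3f}")
print(select_best(frontier, loss_factor=1.5))
```

In this toy data the loss falls tenfold between complexity 1 and 3 and only slightly afterwards, so the complexity-3 entry has by far the highest score and is the one selected.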
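  9. Summary and features compared with other SR methods

    PySR is an SR library specialized for scientific data.
    • Distributed processing across multiple cores and multiple nodes is supported, so it can be applied to large datasets.
    • It has features for handling a variety of data, for example denoising by Gaussian process regression and feature selection based on gradient boosting.
    • It is an open-source library usable from Python and Julia.
    • It is highly extensible: custom operators and loss functions are easy to add.
    • It integrates easily with SymPy, PyTorch, and JAX, and combines flexibly with other tools.
    Colab demo link: https://colab.research.google.com/github/MilesCranmer/PySR/blob/master/examples/pysr_demo.ipynb

    Table (excerpt from the paper): feature comparison of PySR and Eureqa.

    Category       Feature                   PySR    Eureqa
    Scalability    Compiled                  ✓       ✓
    Scalability    Multi-core                ✓       ✓
    Scalability    Multi-node                ✓       ✗
    Scalability    GPU-capable               ✗       ✗
    Practicality   No pre-training           ✓       ✓
    Practicality   Denoising                 ✓       ✓
    Practicality   Feature selection         ✓       ✓
    Practicality   Differential equations    ✗       ✓
    Practicality   High-dimensional          ✗       ✗
    Practicality   Full Pareto curve         ✓       ✓
    Interfacing    API                       ✓       ✗
    Interfacing    SymPy Interface           ✓       ✗
    Interfacing    Deep Learning export      ✓       ✗
    Extensibility  Expressivity score        4       5
    Extensibility  Open-source               ✓       ✗
    Extensibility  Real Constants            ✓       ✓
    Extensibility  Custom operators          ✓       ✗
    Extensibility  Discontinuous operators   ✓       ✓
    Extensibility  Custom losses             ✓       ✓
    Extensibility  Symbolic Constraints      ✓       ✗
    Extensibility  Custom complexity         ✓       ✓
    Extensibility  Custom types              ✓       ✗
                   Citation                  [self]  [11]

    Table note (truncated on the slide): Expressivity scores: (1a) Pre-trained on equations gener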