Paper introduction: Unsupervised word embeddings capture latent knowledge from materials science literature

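These are the slides presented at the 1st paper reading session of Materials Informatics若手の会 (the Materials Informatics young researchers' group) on 2020/7/5.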

The original paper:
Tshitoyan, V., Dagdelen, J., Weston, L. et al.
Unsupervised word embeddings capture latent knowledge from materials science literature.
Nature 571, 95–98 (2019).
https://www.nature.com/articles/s41586-019-1335-8

The full text of the paper can be read for free here (download not available):
https://www.nature.com/articles/s41586-019-1335-8.epdf?author_access_token=NB1RRPZTDGRDUyjJsVicPtRgN0jAjWel9jnR3ZoTv0P9QxlcO86f_GXZRxwYijrqVZp6i8RcDehbFoibDsaMWW41O3qexhJAZZaR8aHNX-gDwSaeWiSaMEe291D7g-msWZFrZ9mOotgkboEp2Pl1XQ%3D%3D

Materials Informatics若手の会 is a group on Facebook. Anyone can join, so if you are interested, see:
https://www.facebook.com/groups/234530303620699

Yuta Suzuki

July 05, 2020

Transcript

  1. Unsupervised word embeddings capture latent knowledge from materials science literature
     Tshitoyan, V., Dagdelen, J., Weston, L. et al., Nature 571, 95–98 (2019).
     Presenter: Yuta Suzuki (SOKENDAI, The Graduate University for Advanced Studies)
     Materials Informatics若手の会 paper reading session
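  2. Self-introduction
     • Research interest: understanding substances and materials using machine learning
       • Developing materials measurement techniques that apply machine learning
       • Mainly analysis of X-ray measurement data of materials (diffraction and spectroscopy of magnets, batteries, etc.)
       • Data mining of measurement data
     • Education
       • SOKENDAI (The Graduate University for Advanced Studies), School of High Energy Accelerator Science
         • Ono Laboratory, PhD ([…] to present)
       • Tokyo University of Science, Graduate School of Industrial Science and Technology
         • Kotsugi Laboratory, BS, MS in Engineering
     • Funded by JST ACT-I, JSPS DC
     • Yuta Suzuki (鈴木雄太) https://resnant.github.io
     • High Energy Accelerator Research Organization (KEK), Tsukuba, Ibaraki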
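  3. The paper introduced today
     • Unsupervised word embeddings capture latent knowledge from materials science literature
       Tshitoyan, V., Dagdelen, J., Weston, L. et al., Nature 571, 95–98 (2019). https://www.nature.com/articles/s41586-019-1335-8
     • In one sentence: a study that surveyed the materials literature with natural language processing (NLP) techniques and extracted knowledge that had not yet been discovered
     • Why this paper?
       • Applying NLP in materials informatics (MI) is a fairly new area, and one I am interested in
       • The results are easy to understand and likely to interest people from a wide range of specialties
       • As expected of a paper accepted by Nature, the structure and the writing are good, so there is a lot to learn from it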
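  4. Background and the problem being solved
     • Scientific knowledge is published in the form of papers, which makes it hard to handle with statistics or machine learning
     • Examples of knowledge:
       • Attributes of a material:
         • LiCoO2 → inorganic material, metal oxide, Li-ion secondary battery cathode material, …
       • Relationships between materials:
         • e.g. "SmFe and NdFeB are similar in that both are permanent magnets"
     • The paper represents words in the literature (e.g. material names) as high-dimensional vectors (embeddings) that capture their abstract meaning, which makes it possible to handle them within a machine learning framework
       • Abstract concepts are expressed in a form that is easy for computers to handle (a fixed-length sequence of numbers)
       • embedding: called 埋め込み ("embedding") or 分散表現 ("distributed representation") in Japanese; a frequent term in NLP
       • Put another way: assuming there is a well-behaved space that reflects knowledge about materials, "a function that maps any word into that space was obtained by learning"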
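  5. Supplementary notes on the key techniques
     • A quick review of Deep Neural Networks (DNNs)
       • Structure: input layer → hidden layer → output layer
       • (assuming an ordinary feed-forward NN)
       • The hidden layer represents abstract features of the input data
       • This output is the embedding
     • In other words, an embedding can be obtained from a DNN trained on some task
       • Most DNNs can output embeddings (this is not limited to Word2Vec)
       • The question is which DNN to train on which task
     • This is where Word2Vec comes in (2013)
     • [Figure: Word2Vec, a DNN that converts words into embeddings; labels: input layer, hidden layer, output layer, embedding]
       (strictly speaking, Word2Vec is not deep, but …)
     • Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 (2013)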
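  6. Supplementary notes on the key techniques
     • Word2Vec proposed models specialized for word embedding, together with new training tasks
       • skip-gram (the focus here)
       • Continuous Bag-of-Words
     • skip-gram: for each word in a text, predict the words before and after it
       • Training data can be generated from nothing but raw text
       • The task is well suited to capturing the meaning of words relative to one another
       • High-quality embeddings can be obtained
     • The DNN trained this way is then used to output embeddings
     • (Suzuki is not an NLP specialist, so please point out any mistakes in this explanation)
     • [Figure: Word2Vec, a DNN that converts words into embeddings; skip-gram example, partly adapted from the Word2Vec paper. Example: "data" in "I have to learn data"]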
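To make the skip-gram task on slide 6 concrete, here is a minimal sketch of how (center word, context word) training pairs can be generated from raw text alone. The function name `skipgram_pairs` and the window size of 2 are assumptions chosen for illustration; they are not taken from the slides or from the Word2Vec implementation.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions.

    Illustrative helper only; the window size is an assumed value.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# The example sentence from the slide, split on whitespace.
sentence = "I have to learn data".split()
pairs = skipgram_pairs(sentence, window=2)

# Context words the model is asked to predict when the input is "learn".
learn_contexts = [ctx for center, ctx in pairs if center == "learn"]
print(learn_contexts)
```

No labels are needed beyond the text itself, which is why the slide describes the task as one where text alone is enough to create the training data.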
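  7. Purpose and method
     • Purpose: obtain word embeddings that reflect materials science knowledge from a large body of papers
     • Method
       • Collected the abstracts of […] × 10,000 materials-related papers published in […]
       • The vocabulary contains about […] × 10,000 words
       • Trained a skip-gram Word2Vec model on this data and obtained a […]-dimensional embedding for each word
       • In the paper, this method is named mat2vec
       • The code is also public: https://github.com/materialsintelligence/mat2vec
     • [Figure: Fig. 1a, an example of skip-gram prediction for LiCoO2 and LiMnO2]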
  8. Two results are reported
     1. Properties of the embeddings
     2. Prediction of unknown material properties using the embeddings

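  9. Results: properties of the embeddings
     • The embeddings are visualized in low dimensions and analyzed
     • The authors conclude that embeddings reflecting materials knowledge were obtained
       • The PCA result suggests that the learned space captures crystal structure and chemical state (Fig. 1b)
       • When the embeddings of element names are visualized with t-SNE, clusters that capture the structure of the periodic table appear to form (Extended Data Fig. 1a)
     • [Figures: Fig. 1b; Extended Data Fig. 1a]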
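  10. Results: properties of the embeddings
     • Analogies obtained by arithmetic on embedding vectors are also verified
       • The original idea is king - man + woman ≈ queen and similar examples in the Word2Vec paper
       • Material names: helium - He + Fe ≈ iron
       • Principal oxides: Al2O3 - Al + Si ≈ SiO2
     • [Table: Extended Data Table 1]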
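The analogy test on slide 10 can be sketched with plain vector arithmetic and cosine similarity. The three-dimensional vectors below are hand-made toy values, chosen so that the example works; they are not mat2vec embeddings, and the function names `cosine` and `analogy` are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def analogy(emb, a, b, c):
    """Return the word closest to emb[a] - emb[b] + emb[c], excluding the inputs."""
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))


# Toy vectors (assumed): axes loosely mean "is a full name", "helium", "iron".
toy = {
    "helium": [1.0, 1.0, 0.0],
    "He": [0.0, 1.0, 0.0],
    "Fe": [0.0, 0.0, 1.0],
    "iron": [1.0, 0.0, 1.0],
    "oxide": [0.5, 0.5, 0.5],
}

# helium - He + Fe ≈ ?
result = analogy(toy, "helium", "He", "Fe")
print(result)
```

With real embeddings the target vector rarely coincides with any word vector exactly, so the nearest neighbour by cosine similarity is taken as the answer.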
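  11. Results: prediction of unknown material properties
     • New thermoelectric materials are predicted from the similarity of embeddings
       • The idea: if the embedding of a material name is highly similar to the embedding of "thermoelectric", that material may well be a thermoelectric material
       • For the candidate materials found this way, the thermoelectric power factor / ZT was estimated by DFT calculations, and promising new thermoelectric materials were identified (Fig. 2a, b)
       • The paths connecting each material name to "thermoelectric" are visualized (Fig. 2c); whether this is meaningful is unclear to me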
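The ranking idea on slide 11 can be sketched as follows: score each material by the cosine similarity between its embedding and that of the word "thermoelectric", skip materials already known to have that property, and keep the top candidates. The two-dimensional vectors, the placeholder names `material_A` to `material_D`, and the function `rank_candidates` are all hypothetical; the paper's actual pipeline may differ in its details.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_candidates(emb, query, materials, known, top_k):
    """Rank materials not in `known` by similarity to the query word."""
    unseen = [m for m in materials if m not in known]
    unseen.sort(key=lambda m: cosine(emb[m], emb[query]), reverse=True)
    return unseen[:top_k]


# Toy embeddings (assumed values, not mat2vec output).
emb = {
    "thermoelectric": [1.0, 0.2],
    "material_A": [0.9, 0.3],
    "material_B": [0.1, 1.0],
    "material_C": [0.6, 0.6],
    "material_D": [1.0, 0.2],  # treated as already studied, so excluded
}
materials = ["material_A", "material_B", "material_C", "material_D"]

ranking = rank_candidates(emb, "thermoelectric", materials,
                          known={"material_D"}, top_k=2)
print(ranking)
```

Excluding known materials matters because otherwise the top of the list is dominated by compounds that the literature already describes as thermoelectric.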

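  12. Results: prediction of unknown material properties
     • Historical validation of the property prediction performance
       • With a model trained on papers up to a given year, the top […] thermoelectric candidates are output in order of similarity, and the number of them reported as thermoelectric materials after that year is evaluated (Fig. 3a)
       • Example: train on papers up to […], validate against papers from […]
       • The same validation is done for ferromagnetic materials, photovoltaic materials, and topological insulators (Extended Data Fig. 2)
     • [Figures: Fig. 3a; Extended Data Fig. 2]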
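The historical validation on slide 12 amounts to counting how many of the top-ranked predictions from a model trained on older papers were later reported in the literature. Below is a minimal sketch under stated assumptions: the material names, the years, the cutoff, and the helper `historical_hits` are all hypothetical and are not data from the paper.

```python
def historical_hits(ranked_predictions, first_reported, cutoff_year, top_k):
    """Return the top-k predictions that were first reported after the cutoff.

    ranked_predictions: materials ordered by similarity, from a model trained
        only on papers published up to `cutoff_year`.
    first_reported: material -> year it was first reported to have the property
        (materials never reported are simply absent).
    """
    top = ranked_predictions[:top_k]
    return [m for m in top if first_reported.get(m, cutoff_year) > cutoff_year]


# Hypothetical toy data for illustration only.
ranked = ["m1", "m2", "m3", "m4"]
first_reported = {"m1": 2005, "m3": 2012, "m4": 2003}
cutoff = 2000  # assumed cutoff year

hits = historical_hits(ranked, first_reported, cutoff, top_k=3)
hit_rate = len(hits) / 3
print(hits, hit_rate)
```

Repeating this for several cutoff years gives a curve of hit rate over time, which is the kind of evaluation the slide describes.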
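  13. Summary and comments
     • Summary
       • By analyzing materials papers with natural language processing techniques, the authors developed a method (mat2vec) that outputs, for each word, an embedding reflecting scientific knowledge
       • Using the similarity between the embeddings of material names and of properties, they predicted that certain materials exhibit properties not previously known for them
     • Comments
       • Whether or not it helps in developing new materials, the concept and the presentation of the results are good
       • Representing materials as words has its limits, so progress along this direction alone may be difficult
         • Composition formulas and element names alone cannot describe a material: microstructure, crystal structure, and so on are missing
         • Conversely, for rough use in development work, a natural-language representation may be easy to use
       • Literature surveys using NLP look especially promising for predicting materials synthesis processes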
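  14. Related work
     • Learning atoms for materials discovery
       Quan Zhou, Peizhe Tang, Shenxiu Liu, et al., PNAS 115 (28), E6411-E6417 (2018).
       https://www.pnas.org/content/115/28/E6411
       • A paper that leads up to the present study
       • Proposes Atom2Vec and obtains embeddings of element names
       • As the name suggests, a variant of Word2Vec
       • For a given atom in a crystal, uses a skip-gram that predicts the names of the neighbouring elements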