
Kai Nakaishi
August 22, 2025

SNLP2025 Nakaishi presentation slides (Physics of Language Models: Part 3.3)


Transcript

  1. Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws

    Zeyuan Allen-Zhu & Yuanzhi Li (ICLR 2025 Spotlight). 17th State-of-the-Art NLP Study Group (SNLP). Presenter: Kai Nakaishi (RIKEN; National Institute for Japanese Language and Linguistics). August 31, 2025
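  2. ▶ This presentation is based on the SSRN version: https://ssrn.com/abstract=5250617

    ▶ Explanatory video and page by the authors (covering Physics of LMs as a whole):
    ▶ https://www.youtube.com/watch?v=yBL7J0kgldU
    ▶ https://physics.allen-zhu.com/
    ▶ Unless otherwise noted, figures are quoted from the original paper.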
  3. Part 3.3, Knowledge Capacity Scaling Laws

    ▶ Refines scaling laws using the Physics of LMs approach.
  4. Knowledge

    ▶ Piece of knowledge: a triple $(n, a, v^*(n, a))$ of a name $n$, an attribute $a$, and a value $v^*(n, a)$. Example: (Anya Briar Forger, birthday, 10/2/1996)
    ▶ Knowledge set: $\mathcal{Z} \overset{\text{def}}{=} \{(n, a, v^*(n, a)) \mid n \in \mathcal{N},\ a \in \mathcal{A}\}$.
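  5. Synthetic data

    ▶ Names: person names, chosen uniformly at random.
    ▶ Attributes: birthday, birthplace, university attended, major, employer, work location.
    ▶ Values: chosen uniformly at random for each name and attribute. Only the work location depends on the employer.
    ▶ Each person is described by six template sentences. There are 50 templates per attribute.
      Anya Briar Forger was born on October 2, 1996. She spent her early years in Princeton, NJ. She received mentorship and guidance from faculty members at Massachusetts Institute of Technology. She completed her education with a focus on Communications. She had a professional role at Meta Platforms. She was employed in Menlo Park, CA.
    ▶ The sentences are generated randomly on the fly, including their order.
    ▶ Across the whole dataset, the same piece of knowledge is exposed multiple times in different wordings (exposures).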
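The generation procedure on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' generator: the value pools, the two templates per attribute (the slide says 50), and names such as `sample_values`, `render_biography`, and "Example Corp" are all hypothetical. For simplicity every sentence repeats the full name instead of using a pronoun.

```python
import random

ATTRIBUTES = ["birthday", "birthplace", "university", "major", "employer", "work_location"]

# Toy value pools (hypothetical; a few entries are borrowed from the slide's example).
VALUE_POOLS = {
    "birthday": ["October 2, 1996", "March 14, 1988"],
    "birthplace": ["Princeton, NJ", "Austin, TX"],
    "university": ["Massachusetts Institute of Technology", "Example University"],
    "major": ["Communications", "Physics"],
    "employer": ["Meta Platforms", "Example Corp"],
}
# The work location is not sampled independently: it is determined by the employer.
EMPLOYER_LOCATION = {"Meta Platforms": "Menlo Park, CA", "Example Corp": "Springfield, IL"}

# Two toy templates per attribute (the slide states 50 per attribute).
TEMPLATES = {
    "birthday": ["{name} was born on {value}.", "{name}'s birthday is {value}."],
    "birthplace": ["{name} spent their early years in {value}.", "{name} grew up in {value}."],
    "university": ["{name} studied at {value}.", "{name} is a graduate of {value}."],
    "major": ["{name} completed their education with a focus on {value}.", "{name} majored in {value}."],
    "employer": ["{name} had a professional role at {value}.", "{name} worked for {value}."],
    "work_location": ["{name} was employed in {value}.", "{name} worked in {value}."],
}


def sample_values(rng):
    """Sample one value per attribute uniformly; derive the work location from the employer."""
    values = {attr: rng.choice(pool) for attr, pool in VALUE_POOLS.items()}
    values["work_location"] = EMPLOYER_LOCATION[values["employer"]]
    return values


def render_biography(name, values, rng):
    """Render one exposure: random sentence order, random template per attribute."""
    order = list(ATTRIBUTES)
    rng.shuffle(order)
    return " ".join(rng.choice(TEMPLATES[a]).format(name=name, value=values[a]) for a in order)


rng = random.Random(0)
values = sample_values(rng)
# Three exposures of the same knowledge pieces, each worded and ordered independently.
exposures = [render_biography("Anya Briar Forger", values, rng) for _ in range(3)]
```

Calling `render_biography` again for the same person yields another exposure of the same six knowledge pieces, which is how the number of exposures is controlled in this sketch.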
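  6. Capacity ratio

    ▶ Capacity ratio: the amount of knowledge the model stores per parameter.
    ▶ The capacity ratio of a model $F$ is
      $$R(F) \overset{\text{def}}{=} \frac{N \log_2 \frac{N_0}{\exp(p_1)} + N \log_2 \frac{S_0}{\exp(p_2)}}{P},$$
      $$p_1 \overset{\text{def}}{=} \mathbb{E}_n\left[-\log \Pr(\text{generate } n)\right], \qquad p_2 \overset{\text{def}}{=} \mathbb{E}_n\left[\sum_a -\log \Pr\left(\text{generate } v^*(n, a) \mid n, a\right)\right].$$
      $P$: number of model parameters; $N$: size of the set of names; $N_0$: size of the set of candidate names; $S_0$: total number of possible value patterns.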
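As a quick check of the formula, here is a minimal Python sketch of the capacity ratio. The function name `capacity_ratio` and the toy numbers in the usage note are assumptions for illustration; `p1` and `p2` are taken to be in nats, matching the natural logarithm inside `exp`.

```python
import math


def capacity_ratio(P, N, N0, S0, p1, p2):
    """Bits of knowledge stored per parameter, following the slide's R(F).

    log2(N0 / exp(p1)) is evaluated as log2(N0) - p1 / ln(2), which is
    algebraically identical but avoids overflow in exp() for large losses.
    """
    name_bits = N * (math.log2(N0) - p1 / math.log(2))
    value_bits = N * (math.log2(S0) - p2 / math.log(2))
    return (name_bits + value_bits) / P
```

With made-up values `N=1000`, `N0=2**10`, `S0=2**30`, `P=20000` and zero loss (`p1 = p2 = 0`), the numerator is 1000·10 + 1000·30 = 40000 bits, giving a ratio of 2.0. If the model is no better than uniform guessing (`p1 = ln N0`, `p2 = ln S0`), the ratio drops to 0.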
  7. How to measure the capacity ratio

    ▶ $p_1$ and $p_2$ can be computed from the cross-entropy loss.
    ▶ When evaluating on "Anya Briar Forger's ID7 is v7":
      −log Pr(generate "Anya Briar Forger") = (sum of the losses over all tokens of "Anya Briar Forger"),
      −log Pr(generate v7 | "Anya Briar Forger", ID7) = (sum of the losses over all tokens of v7).
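The measurement above amounts to summing per-token cross-entropy losses over the name span and over each value span, then averaging over names. A minimal sketch, assuming the per-token losses (in nats) and the token spans are already available from some evaluation loop; `span_nll`, `estimate_p1_p2`, and the tokenization in the usage note are hypothetical:

```python
from statistics import fmean


def span_nll(token_losses, start, end):
    """Negative log-likelihood of a span: sum of per-token losses over [start, end)."""
    return sum(token_losses[start:end])


def estimate_p1_p2(per_name):
    """Estimate p1 and p2 from per-name measurements.

    per_name: list of (name_nll, value_nlls) pairs, where value_nlls holds
    one span NLL per attribute of that name.
    """
    p1 = fmean([name_nll for name_nll, _ in per_name])
    p2 = fmean([sum(value_nlls) for _, value_nlls in per_name])
    return p1, p2
```

For example, if the sentence were tokenized as `["Anya", " Briar", " Forger", "'s", " ID7", " is", " v7"]` with losses `[0.5, 0.25, 0.25, 0.1, 0.2, 0.05, 1.5]`, the name span is `span_nll(losses, 0, 3)` and the value span is `span_nll(losses, 6, 7)`.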
  8. Results

    ▶ GPT-2 is trained on data with 1000 exposures.
    ▶ The capacity ratio is 2 bits per parameter: R(F) ≈ 2.
    ▶ Rewriting the data with an LLM does not reduce the capacity ratio.
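  9. Impressions

    ▶ Interesting overall.
    ▶ "Let's run experiments in a controlled environment" is entirely reasonable.
    ▶ Abundant experiments, abundant results.
    ▶ The definition of the capacity ratio seems fairly well thought out.
    ▶ Information-theoretic background.
    ▶ It can be measured.
    ▶ Derivation of a theorem.
    ▶ Is the knowledge capacity scaling law "like a law of physics"?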
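  10. General knowledge sets

    ▶ The "complexity" of the knowledge set is also controlled through parameters other than N.
    ▶ K: number of attributes; C: number of values; D: diversity of values; L: token length of values; T: vocabulary size of the tokenizer.
    ▶ Consistently R(F) ≥ 2.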
  11. Quantization

    ▶ R(F) ≈ 2 even at 8 bits.
    ▶ The theoretical limit at 8 bits is R(F) ≲ 8.
    ▶ A high efficiency: 1/4 of the theoretical limit.
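The 1/4 figure follows from simple counting. As a sketch of the arithmetic (the subscripts "limit" and "observed" are labels introduced here, not the slide's notation): an 8-bit parameter takes one of $2^8$ values, so it can hold at most 8 bits of information.

```latex
\[
R(F)_{\text{limit}} = \log_2 2^{8} = 8 \ \text{bits/parameter},
\qquad
\frac{R(F)_{\text{observed}}}{R(F)_{\text{limit}}} \approx \frac{2}{8} = \frac{1}{4}.
\]
```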
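  12. Mixture of Experts

    ▶ An MoE with about 11.3 times as many parameters.
    ▶ If the capacity were unchanged from the case without MoE, the capacity ratio would be 1/11.3.
    ▶ If all parameters were used for storing knowledge, the capacity ratio would be 1.
    ▶ Only a small fraction of the parameters is used at inference, but the full set of parameters is used for storing knowledge.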
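The two reference values can be made explicit with a short derivation. This reads the slide's "1/11.3" and "1" as capacity ratios normalized by the dense (non-MoE) model, which is an assumption; $B$ denotes the stored knowledge in bits and $\rho$ the normalized ratio, both symbols introduced here for illustration.

```latex
\[
\rho \;=\; \frac{R(F_{\text{MoE}})}{R(F_{\text{dense}})}
     \;=\; \frac{B_{\text{MoE}} / P_{\text{MoE}}}{B_{\text{dense}} / P_{\text{dense}}},
\qquad P_{\text{MoE}} \approx 11.3\, P_{\text{dense}}.
\]
\[
B_{\text{MoE}} = B_{\text{dense}} \;\Rightarrow\; \rho \approx \frac{1}{11.3} \approx 0.088,
\qquad
B_{\text{MoE}} \approx 11.3\, B_{\text{dense}} \;\Rightarrow\; \rho \approx 1.
\]
```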