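Make use of AI, and collaborate with AI
Reference: PaperBench: Evaluating AI’s Abilities to Replicate AI Research - OpenAI
https://openai.com/index/paperbench/
Summary: AI performs well over short time horizons, while humans perform well over long ones
(However, the sample is small and the human participants were experts from PhD programs, so the result is not strong enough to generalize)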
Slide 10
AI 🤝 Human?
Rails Tutorial × Flipped Learning
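• Results of this survey (raw data, Google Form)
https://bit.ly/learn-to-code-with-chatgpt-rawdata
• GatesNotes: The Age of AI has begun
https://www.gatesnotes.com/The-Age-of-AI-Has-Begun
• OpenAI ChatGPT roundtable discussion (YouTube)
https://www.youtube.com/watch?v=abnGvGEmwoA
• Case study from the AIIT framework development course (フレームワーク開発特論)
https://twitter.com/yasulab/status/…
• Recommendations toward formulating guidelines for generative AI in primary and secondary education
https://prtimes.jp/main/html/rd/p/….html
• An overview of the ethical, legal, and social issues (ELSI) of generative AI
https://elsi.osaka-u.ac.jp/research
• Georgia Tech: none of the students noticed that their TA was an AI
https://www.gizmodo.jp/…/post_….html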
Related links (you can follow each link from the description!)
Slide 39
Rails Tutorial × Flipped Learning
• Mitou Junior: a creator support program for elementary, junior high, and high school students
https://jr.mitou.org/final
• Rails Tutorial: learn the 0→1 of product development
https://railstutorial.jp
• Rails Tutorial: AI support feature
https://railstutorial.jp/ai_support
• Rails Tutorial: note magazine
https://note.com/yasslab/m/m…
• YouTube channel: Rails Tutorial
https://youtube.com/YassLab
• YouTube channel: CoderDojo Japan
https://youtube.com/CoderDojoJapan
• YouTube channel: Mitou Junior
https://youtube.com/MitouJr
Related links (you can follow each link from the description!)