
Using and Evaluating LLMs (Large Language Models) for Safe AI Use / japanr2025

Uryu Shinya
December 06, 2025


Transcript

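  1. Background: why LLM evaluation is needed

    Evaluating a machine learning model: data preparation, model training (learning), evaluation on test data, computing accuracy, recall, and similar metrics. Reproducible in code.
    Evaluating an LLM: prompt design, inference with the LLM, scoring. Reproducible in code?

    Three points an evaluation should establish:
    ✓ Which model performed best
    ✓ Why that conclusion was reached
    ✓ Whether the procedure and process can be recorded

    • Trying ChatGPT a few times and thinking "this seems pretty good"
    • Choosing a model based only on the rumor that "GPT is smart"
    • Judging a model "excellent" from a prompt that happened to succeed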
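  2. Background: why LLM evaluation is needed

    Uryu, S. (2025). Evaluating Large Language Models for IUCN Red List Species Information. arXiv. https://arxiv.org/abs/2510.02830

    • A case study: assessing IUCN threatened species
    • LLMs are expected to be useful in biodiversity conservation, but their reliability in expert judgment remains in question.
    • Two major challenges (common to current LLMs)
      • Gap between knowledge and reasoning → the model knows the facts but struggles to make judgments that apply them
      • Inherent bias → strong on vertebrates (popular species), weak on invertebrates

    Accuracy gap: taxonomic knowledge 94.9%, reasoning about conservation status 27.2%
    An objective and rigorous evaluation framework is essential.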
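  3. The evaluation framework "Inspect AI", led by the UK AI Safety Institute
    https://inspect.aisi.org.uk/

    • Open source: a highly transparent implementation
    • Safety-oriented: safety and reliability are the top priority
    • Reproducibility: reproducible scientific verification
    • Flexibility and extensibility: evaluate a wide range of models (OpenAI, Google, Anthropic, xAI, local environments such as Ollama, and others) without vendor lock-in.

    J.J. Allaire (founder of RStudio) leads the project.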
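  4. Modularizing evaluation: Task, Dataset, Solver, Scorer
    Evaluation logic is managed systematically as code, which makes it reusable.

    Task (experiment plan): defines the workflow of the whole evaluation
    Dataset (input data): the set of input data and correct labels used for the evaluation
    Solver (answer strategy): strategies, such as prompt engineering, for eliciting an answer from the model
    Scorer (evaluation criteria): criteria for comparing the model's output with the correct answer and computing a score

        Task(
            dataset=...,
            solver=chain(...),
            scorer=...,
        )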
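The Task / Dataset / Solver / Scorer split can be illustrated with a small standard-library sketch. This is not Inspect AI's API: `Sample`, `EvalTask`, `stub_model`, `with_system_message`, and `exact_match` are hypothetical names invented here, the model is a stub that always returns the same answer, and the two questions are made-up toy data. It only aims to show how bundling the three components makes a run repeatable from code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Sample:
    """One dataset row: the model input and the expected answer."""
    input: str
    target: str


# A solver turns an input into a model answer; a scorer grades one answer.
Solver = Callable[[str], str]
Scorer = Callable[[str, str], float]


@dataclass(frozen=True)
class EvalTask:
    """Bundles dataset, solver, and scorer so a run is described in code."""
    dataset: Sequence[Sample]
    solver: Solver
    scorer: Scorer

    def run(self) -> float:
        """Return the mean score over the dataset."""
        scores = [self.scorer(self.solver(s.input), s.target) for s in self.dataset]
        return sum(scores) / len(scores)


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: always answers 'LC'."""
    return "LC"


def with_system_message(message: str, model: Callable[[str], str]) -> Solver:
    """Solver that prepends a fixed instruction before calling the model."""
    return lambda question: model(f"{message}\n\n{question}")


def exact_match(answer: str, target: str) -> float:
    """Scorer: 1.0 when the stripped answer equals the target, else 0.0."""
    return 1.0 if answer.strip() == target else 0.0


task = EvalTask(
    dataset=[
        Sample("Red List category of species A?", "LC"),  # toy data
        Sample("Red List category of species B?", "EN"),  # toy data
    ],
    solver=with_system_message("Answer with a category code only.", stub_model),
    scorer=exact_match,
)
mean_score = task.run()
print(mean_score)
```

Because each component is an ordinary value, swapping the scorer or the solver leaves the rest of the run unchanged, which is the reuse the slide describes.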
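  5. Implementing the IUCN evaluation tasks with Inspect AI
    Example of applying it to the four tasks in Uryu (2025)
    https://github.com/uribo/iucn-redlist-evals

    Task / Purpose / Example Solver and Scorer used

    Taxonomic classification: have the model select the correct taxon
      chain(), optimize_choices()*, system_message(), multiple_choice_with_cache()*, taxon_partial_scorer()*
    Red List category assessment: identify the category from the set of categories
      system_message(), generate(), match()
    Geographic distribution: generate a list of country names
      system_message(), generate(), geo_distribution_scorer()*
    Threat identification: select multiple threat categories
      system_message(), generate(), threat_assessment_scorer()*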
  6. Implementing the IUCN evaluation tasks with Inspect AI
    Example of applying it to the four tasks in Uryu (2025)
    https://github.com/uribo/iucn-redlist-evals

    Input: Aquila chrysaetos (golden eagle)
    Photo: Rocky, CC BY (https://creativecommons.org/licenses/by/…), via Wikimedia Commons

    Choices: A. Animalia (Kingdom) > Chordata (Phylum) > Aves (Class) > Accipitriformes (Order) > Pandionidae (Family), B. … (Kingdom) > … (Phylum) > … (Class) > … (Order) > Accipitridae (Family), C. … (Kingdom) > … (Phylum) > … (Class) > … (Order) > Cathartidae (Family), D. … (Kingdom) > … (Phylum) > … (Class) > … (Order) > Sagittariidae (Family), E. … (Kingdom) > … (Phylum) > … (Class) > … (Order) > Elanidae (Family)
    Target: B
    Answer: B
    Evaluate: Correct

    Choices: EX, EW, CR, EN, VU, NT, LC, DD
    Target: LC
    Answer: NT
    Evaluate: Incorrect

    Choices: Country list
    Target: Montenegro; Italy; France; Albania etc.
    Answer: Montenegro; France; Iraq etc.
    Evaluate: Partial (missing and redundant entries)

    Choices: Threats list
    Target: Agriculture & aquaculture; Pollution; Energy production & mining; Transportation & service corridors etc.
    Answer: None
    Evaluate: Incorrect
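The Correct / Partial / Incorrect grading of list answers on this slide can be sketched as a set comparison. This is not the repository's `geo_distribution_scorer()` or `threat_assessment_scorer()`: `parse_items` and `grade_list_answer` are hypothetical helpers, and the rule that any overlap short of an exact match counts as Partial is an assumption made for illustration. The inputs use only the countries visible on the slide, whose lists are truncated with "etc.".

```python
from __future__ import annotations


def parse_items(text: str) -> set[str]:
    """Split a semicolon-separated answer into a set of normalized items."""
    return {item.strip().lower() for item in text.split(";") if item.strip()}


def grade_list_answer(answer: str, target: str) -> dict:
    """Grade a list answer as Correct, Partial, or Incorrect.

    Assumption: any overlap short of an exact match counts as Partial.
    """
    got, expected = parse_items(answer), parse_items(target)
    missing = sorted(expected - got)    # expected items the model omitted
    redundant = sorted(got - expected)  # items the model added
    if got == expected:
        grade = "Correct"
    elif got & expected:
        grade = "Partial"
    else:
        grade = "Incorrect"
    return {"grade": grade, "missing": missing, "redundant": redundant}


countries = grade_list_answer(
    answer="Montenegro; France; Iraq",
    target="Montenegro; Italy; France; Albania",
)
threats = grade_list_answer(
    answer="None",
    target="Agriculture & aquaculture; Pollution",
)
print(countries)
print(threats)
```

Returning the missing and redundant items alongside the grade keeps a record of why an answer was only partially right, not just that it was.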
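  7. There is an R version too! The vitals package
    Interaction with the LLM goes through the ellmer package.
    https://vitals.tidyverse.org/

        library(vitals)
        library(ellmer)

        # Dataset (input data): questions and their expected answers
        simple_qa <- tibble::tibble(
          input = c(
            "日本の初代総理大臣は誰か",  # "Who was the first Prime Minister of Japan?"
            "Posit(旧RStudio)のチーフサイエンティストは誰か"  # "Who is the chief scientist at Posit (formerly RStudio)?"
          ),
          target = c("伊藤博文", "Hadley Wickham")  # "伊藤博文" is Ito Hirobumi
        )

        # Task (experiment plan)
        tsk <- Task$new(
          dataset = simple_qa,
          # Solver (answer strategy); chat_ollama() specifies the inference model
          solver = generate(chat_ollama(model = "gpt-oss:20b")),
          # Scorer (evaluation criteria)
          scorer = model_graded_fact()
        )
        tsk$eval()
        tsk$score()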