Slide 15
References
1. Inan, H. et al. "Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations." Meta AI Research, 2023.
2. Meta. "Prompt Guard." Hugging Face, 2024.
3. Deng, Y. et al. "Multilingual Jailbreak Challenges in Large Language Models." ICLR 2024.
4. National Institute of Informatics, Research and Development Center for Large Language Models. "AnswerCarefully Dataset." 2024.
5. Wang, Y. et al. "Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs." Findings of the Association for Computational Linguistics: EACL 2024, pages 896-911, 2024.
6. Anthropic. "Many-shot jailbreaking." Anthropic Research, 2024.
7. Gandhi, S. et al. "Gandalf the Red: Adaptive Security for LLMs." arXiv preprint arXiv:2501.07927, 2025.
8. Shen, X. et al. "'Do Anything Now': Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models." ACM Conference on Computer and Communications Security, 2023.