A Prompt Injection Attack is an assault where special instructions are inputted into an AI chatbot, resulting in outputs not intended by the developer/operator. Through this attack, one can potentially steal confidential information stored within the AI chatbot or induce it to engage in unethical behaviors, thereby damaging the site's reputation. It is expected that AI chatbot services using large-scale language models will increase in the future. Vulnerability assessors must thus learn diagnostic techniques for these chatbots. Many chatbots incorporate rule-based filtering functions to block malicious inputs or outputs of unethical/confidential information. But is that enough? We will introduce payloads and bypass techniques for input/output filters, gleaned from managing a CTF (Capture The Flag) for Prompt Injection Attacks.
【ja】https://github.com/satoki/AVTOKYO2023/blob/main/AVTOKYO2023_ja.pdf
【en】https://github.com/satoki/AVTOKYO2023/blob/main/AVTOKYO2023_en.pdf