INTERFACE by apidays 2023 - Securing LLM and NLP APIs: A Journey to Avoiding Data breaches, Attacks and More, Ads Dawson & Jared Krause, Cohere

Language AI Security at the API level: Avoiding Hacks, Injections
and Breaches Ads Dawson github.com/GangGreenTemperTatum linkedin.com/in/adamdawson0 Jared Krause github.com/kravse kravse.dev

Language AI Security at the API level: Avoiding Hacks, Injections
and Breaches Ads Dawson github.com/GangGreenTemperTatum linkedin.com/in/adamdawson0 Jared Krause github.com/kravse kravse.dev Or… Research Notes Detailing the Hacks and Attacks in the AI Wild West.

Cohere, one of the leading pioneers in generative AI, runs
a platform based on state-of-the-art AI models enabling it to provide developers with a range of tools to create customized NLP solutions. Ads Dawson Jared Krause

Lets Cover Some Basics LLM ➠ Large Language Model. These
are neural networks trained on large collections of data that we use to process and analyze text. NLP ➠ Natural Language Processing The branch of computer science focused on teaching computers to speak.

How does security inﬂuence LLM API Security?

Threat Model of an API based ChatBot Application Resource sourced
from GTKlondike

TB01 - STRIDE table & vulnerability list Resource sourced from
GTKlondike

9 The Institute for Ethical AI & Machine Learning #
OWASP Vulnerability MLSecOps Equivalent 1 Broken Access Control Unrestricted Model Endpoints 2 Cryptographic Failures Access to Model Artifacts 3 Injection Artifact Exploit Injection 4 Insecure Design Insecure ML Systems/Pipeline Design 5 Security Misconfigurations Data & ML Infrastructure Misconfigurations 6 Vulnerable & Outdated Components Supply Chain Vulnerabilities in ML Code 7 Identification & Auth Failures IAM & RBAC Failures for ML Services 8 Software and Data Integrity Failures ML Infra / ETL / CI / CD Integrity Failures 9 Logging and Monitoring Failures Observability, Reproducibility & Lineage 10 Server-side Request Forgery ML-Server Side Request Forgery Resources sourced from https://ethical.institute/security.html

Lets focus on a few important LLM API Vulnerabilities 1.
Prompt Hacking (aka Jailbreaks, adversarial prompting) ➔ What the AI is a Jailbreak? 2. Prompt Injection ➔ CPRF (Cross-Plugin Request Forgery) ➔ Package hallucinations ➔ XSS - Data Exﬁltration 3. Training Data Poisoning

Disclaimer! • The information contained in these slides are not
a direct vulnerability of a speciﬁc company, organization or service • It is advised not to perform these types of vulnerability detections without explicit consent

Prompt Injection -> CPRF | Cross-Plugin Request Forgery Resources sourced
from https://embracethered.com

13 Prompt Injection -> GitHub PWN - “The Confused Deputy”
Resources sourced from https://embracethered.com/blog/posts/2023/chatgpt-plugin-vulns-chat-with-code/

Code Injection - What Does Prompt Injection Look Like in
NLP Code?

Prompt Injection in the Wild - LLM hallucination Resources sourced
from https://vulcan.io/blog Popular techniques for spreading malicious packages 1. Typosquatting 2. Masquerading 3. Dependency Confusion 4. Software Package Hijacking 5. Trojan Package

Prompt Injection -> Data Exfiltration (XSS - Cross-Site Scripting) Resources
sourced from https://embracethered.com/blog/posts/2023/bing-chat-data-exfiltration-poc-and-fix/ & https://wuzzi.net/posts/data-exfiltration/

➔ Continually evaluating our models against prompt injection techniques ➔
Benchmarking this analysis How Can We Mitigate This Risk? Our Journey and Lessons Learnt ➔ Integrating LLM vulnerability scanning into our rigorous testing QA ➔ Regularly testing our LLM’s with prompt injection to verify results

Analysing and Benchmarking Reports for Further Analysis FMEA - Failure
Mode and Eﬀect Analysis Intentional failures where the failure is caused by an active adversary attempting to subvert the system to attain their goals — either to misclassify the result, infer private training data, or to steal the underlying algorithm. Unintentional failures where the failure is because an ML system produces a formally correct but completely unsafe outcome.

Current frameworks within an early lifecycle of a technology which
is rapidly developing at scale 🚀 ➔ OWASP Top 10 for Large Language Model Applications ➔ Google’s Secure AI Framework ➔ NIST 1.0 (“AI RMF”) ➔ CSA Security Implications of ChatGPT ➔ Ethical ML Institute ➔ PROPOSED: Fundamental Limitations of Alignment in Large Language Models (Behavior Expectation Bounds (BEB))

How is Cohere the & joining LLM NLP Security Space?

Thank you! Q&A?

INTERFACE by apidays 2023 - Securing LLM and NL...

INTERFACE by apidays 2023 - Securing LLM and NLP APIs: A Journey to Avoiding Data breaches, Attacks and More, Ads Dawson & Jared Krause, Cohere

apidays PRO

More Decks by apidays

Other Decks in Programming

Featured

Transcript

Language AI Security at the API level: Avoiding Hacks, Injections

Language AI Security at the API level: Avoiding Hacks, Injections

Cohere, one of the leading pioneers in generative AI, runs

Lets Cover Some Basics LLM ➠ Large Language Model. These

How does security inﬂuence LLM API Security?

Threat Model of an API based ChatBot Application Resource sourced

TB01 - STRIDE table & vulnerability list Resource sourced from

9 The Institute for Ethical AI & Machine Learning #

Lets focus on a few important LLM API Vulnerabilities 1.

Disclaimer! • The information contained in these slides are not

Prompt Injection -> CPRF | Cross-Plugin Request Forgery Resources sourced

13 Prompt Injection -> GitHub PWN - “The Confused Deputy”

Code Injection - What Does Prompt Injection Look Like in

Prompt Injection in the Wild - LLM hallucination Resources sourced

Prompt Injection -> Data Exﬁltration (XSS - Cross-Site Scripting) Resources

➔ Continually evaluating our models against prompt injection techniques ➔

Analysing and Benchmarking Reports for Further Analysis FMEA - Failure

Current frameworks within an early lifecycle of a technology which

How is Cohere the & joining LLM NLP Security Space?

Thank you! Q&A?