Slide 11
Slide 11 text
Prompt Hardener
Prompt Hardener
12
Security Evaluation of System Prompts
Output evaluation results for improvement
System prompt
+
Hardening criteria
• Spotlighting
• Random Sequence Enclosure
• Instruction Defense
• Role Consistency
LLM
(OpenAI, Claude,
Bedrock)
Input Output
{
“Spotlighting”: {
“Tag user inputs”: {
“satisfaction”: 8,
“mark”: “ ”,
“comment”: ”…”
},
…
“Instruction Defense”: {
“Handle inappropriate…”: {
“satisfaction”: 5,
“mark”: “ ”,
“comment”: ”…”
},
…
“critique”: “…”,
“recommendation”: “…”
}