

Nero Un
September 08, 2025

PyCon Taiwan 2025: AI Guardrails: Building Enterprise-Level LLM Safety Strategies with Python by Nero Un 阮智軒

When introducing large language models (LLMs) into an enterprise, lacking appropriate guardrails is like driving at high speed without a seatbelt: unnoticeable most of the time, but hard to recover from once an incident occurs. This presentation draws on common risks the speaker has observed in enterprise consulting practice, including sensitive-information leakage, uncontrolled model outputs, and hallucinations. It then explores how to design and implement scalable validator architectures with Python and open-source tools such as Guardrails.ai and LiteLLM. Beyond the basic implementation details, it also explains how to balance risk control against cost-effectiveness when bringing guardrails into production, helping enterprises build LLM application architectures that are secure and resilient as well as scalable. Some prior experience with LLM applications and Python development is recommended for following the context and topics discussed.

By Nero Un
A developer from Macao, currently serving as a Consultant at IBM, with hands-on expertise in data science, data engineering, and artificial intelligence. He graduated from Kaohsiung Medical University and holds a master's degree in Medical Informatics from National Cheng Kung University, and previously served as an R&D engineer and TPM at a biomedical startup. He is passionate about exploring technology that drives transformative change, and firmly believes in the power of technology to influence and reshape the world.



Transcript

  1. ©PyCon TW 2025. AI Guardrails: Building Enterprise-Level LLM Safety Strategies with Python. PyCon Taiwan 2025, Nero Un 阮智軒
  2. # WHOAMI
     • Who I am: Nero Un 阮智軒, a developer from Macao; Consultant @IBM Taiwan
     • Focus: data science | data engineering | generative AI
     • Lately: set up k8s at home to practice distributed applications and architecture; curated 100k+ QA pairs in preparation for fine-tuning an SLM; hoping to meet fellow tech enthusiasts at PyCon
     Medium: @NeroHin LinkedIn: @nerouch
  3. # TAKEAWAY: covered vs. not covered
     ✓ An accessible walkthrough of how Guardrails work
     ✓ Implementing a Guardrails service in Python
     ✓ Comparing architectures, costs, and results for enterprise adoption
     ✗ Specific client business scenarios (mostly because I can't share them)
     ✗ Selection and comparison of AI Guardrails tools
  5. # AGENDA
     1. Common risks when enterprises adopt LLMs
     2. How AI Guardrails work and how they are structured
     3. Hands-on: building a Hallucination Detector
     4. Lessons from enterprise deployments
     5. Summary and Q&A
  6. Common risks when enterprises adopt LLMs. Just as we happily finish building an LLM application, the boss suddenly asks: "How do you plan to stop it from making things up?" (Introduction)
  7. How AI Guardrails work and how they are structured. The boss's question keeps echoing in your head; after asking an AI, you get a solution: "adopt AI Guardrails." (Methodology & Architecture)
  8. A quick recap of the Taipei Metro assistant scenario: Prompt → Foundation Model (e.g., LLM) → Response.
     { "prompt": "I want JS code for an event listener that detects clicks on a web page" }
     { "response": "<!DOCTYPE html> <html lang="zh-Hant"> ………" }
  9. AI Guardrails 101: actively prevent and handle problems before or while they occur.
     Prompt → Input Guardrails (Intention Detector, Prompt Injection, Content Safety, HAP Detector) → Foundation Model (e.g., LLM) → Output Guardrails → Response
     { "prompt": "I want JS code for an event listener that detects clicks on a web page" }
     { "guard": "outside the intended usage scenario" }
  10. When to apply AI Guardrails: at input, retrieval, generation, and output.
     • Input, e.g., PII Detector: user says "Please Check TEL:09666666" → mask or reject → "I can't help you to check <PHONE_NUMBER>" → LLM
     • Retrieval, e.g., Document Relevancy: retrieval returns "Topic A, Topic B" → filter by relevancy (keep only Topic A) → LLM
     • Generation, e.g., Faithfulness Detector: LLM says "Sky is Yellow" but the reference says "Sky is Blue" → detector → respond "No result."
     • Output, e.g., Schema Check: LLM returns [123, 123] → JSON validator → respond "No result."
  11. Two questions: is it factual, and does it stray from the context?
     ✗ "China was founded in 1911" ✓ "The Republic of China was founded in 1911" {generated while citing Wikipedia content}
  12. Implementation architecture, using a Hallucination Detector service as the example:
     User → OpenAI SDK → AI Gateway → Hallucination Detector Service (Guardrails module, Testing module, Generator module) → LLM API providers (OpenAI GPT, Google Gemini)
  13. Dataset: Q&A pairs from HaluEval ("HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models", arXiv): hallucination data generated by GPT and labeled by human annotators.
  14. HaluEval composition ("HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models", arXiv): QA, 10K items from HotpotQA; dialogue, 10K from OpenDialKG; summarization, 10K from CNN/Daily Mail summaries; general, 5K annotated LLM responses to Alpaca instructions.
     Example: Question: Where was the film A Taxi Driver released? Knowledge (context): Jang Hoon (born May 4, 1975) is a South Korean film director. It was selected as the South Korean entry for the Best Foreign Language Film at the 90th Academy Awards. Right answer: South Korea. Hallucinated answer: The film A Taxi Driver was released in North Korea.
  15. Workflow: Guardrails.ai is a toolkit for building guardrails, covering the same pipeline as before: Prompt → Input Guardrails (Intention Detector, Prompt Injection, Content Safety, HAP Detector) → Foundation Model (e.g., LLM) → Output Guardrails → Response
  16. Workflow: building a detector with Guardrails.ai
     • Overview: the figure shows the Guardrails validation flow; the example checks whether the input contains bias, using an SLM or keyword rules to decide
     • Pros: many ready-made validators (there is a Guardrails Hub); high development flexibility: just swap the model and the task prompt, or build fully custom guardrails
     • Cons: validators on the Hub are currently mostly English-only
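The validation flow this slide describes (a configurable rule decides whether input passes) can be mimicked in plain Python. This is a hand-rolled illustration of the validator pattern, not the actual Guardrails.ai API, and the banned-phrase list is invented for the example:

```python
from dataclasses import dataclass

@dataclass
class ValidationResult:
    passed: bool
    reason: str = ""

class KeywordBiasValidator:
    """Toy bias check using keyword rules.

    A real validator would typically swap these rules for an SLM classifier,
    which is exactly the "replace the model and the task prompt" flexibility
    the slide describes.
    """

    def __init__(self, banned_phrases: list[str]):
        self.banned_phrases = [p.lower() for p in banned_phrases]

    def validate(self, text: str) -> ValidationResult:
        hits = [p for p in self.banned_phrases if p in text.lower()]
        if hits:
            return ValidationResult(False, f"flagged phrases: {hits}")
        return ValidationResult(True)

guard = KeywordBiasValidator(banned_phrases=["are always lazy"])
print(guard.validate("Engineers are always lazy").passed)  # False
print(guard.validate("Engineers write tests").passed)      # True
```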
  17. Workflow: designing the Hallucination Detector (full code in the appendix repo, detector.py)
     Input (question, knowledge/context, answer under test, e.g., a hallucinated answer) → LLM as a judge → result parsing: Markdown (```json\n{"is_factual": true}\n```) → JSON string ('{"is_factual": true}') → JSON ({ "is_factual": true }) → validator decides pass or fail → YES: return the response; NO: retry, alert, ignore, etc.
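The result-parsing step (Markdown → JSON string → JSON) can be sketched with `re` and `json`. The fence-stripping regex is an assumption about how the judge model formats its reply, not the repo's actual parser:

```python
import json
import re

# Matches a JSON object, optionally wrapped in a markdown code fence.
FENCE = chr(96) * 3  # three backticks; avoids literal backticks in this listing
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, re.DOTALL)

def parse_judge_output(raw: str) -> dict:
    """Extract the JSON verdict from an LLM-as-a-judge reply.

    Handles both a bare JSON string and one wrapped in a json code fence.
    Raises ValueError when parsing fails, so the caller can retry, alert,
    or ignore, as the slide suggests.
    """
    match = FENCED_JSON.search(raw)
    candidate = match.group(1) if match else raw.strip()
    try:
        verdict = json.loads(candidate)
    except json.JSONDecodeError as exc:
        raise ValueError(f"judge reply is not valid JSON: {raw!r}") from exc
    if "is_factual" not in verdict:
        raise ValueError("judge reply is missing the 'is_factual' field")
    return verdict

wrapped = FENCE + 'json\n{"is_factual": true}\n' + FENCE
print(parse_judge_output(wrapped))  # {'is_factual': True}
```

Raising instead of returning a default is deliberate: it forces the caller to choose an explicit failure policy (retry, alert, ignore) rather than silently treating a malformed reply as a verdict.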
  18. Workflow: writing test cases with pytest (full code in the appendix repo, tests/simple_test.py)
     Question: Chuck Russell and Russ Meyer have which mutual occupations?
     Knowledge (context): Charles "Chuck" Russell (born May 9, 1958) is an American film director, producer, screenwriter and actor, known for his work on several genre films. Russell Albion "Russ" Meyer (March 21, 1922 – September 18, 2004) was an American film director, producer, screenwriter, cinematographer, film editor, actor, and photographer.
     The detector should return { "is_factual": true } for the right answer ("film director, producer, screenwriter and actor") and { "is_factual": false } for the hallucinated answer ("Chuck Russell and Russ Meyer have different occupations.").
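A pytest case following this pattern might look like the sketch below. The `detect` function here is a stub standing in for the real detector (which would call the LLM judge over an API); only the shape of the tests is meant to match the slide:

```python
# Illustrative shape of tests/simple_test.py; `detect` is a stub standing in
# for the real detector, which would call the LLM-as-a-judge service.

def detect(question: str, knowledge: str, answer: str) -> dict:
    """Stub: 'factual' iff every word of the answer appears in the knowledge."""
    words = [w.strip(".,").lower() for w in answer.split()]
    return {"is_factual": all(w in knowledge.lower() for w in words)}

KNOWLEDGE = (
    'Charles "Chuck" Russell is an American film director, producer, '
    "screenwriter and actor. Russ Meyer was an American film director, "
    "producer, screenwriter, cinematographer, film editor, actor, and photographer."
)
QUESTION = "Chuck Russell and Russ Meyer have which mutual occupations?"

def test_right_answer_is_factual():
    verdict = detect(QUESTION, KNOWLEDGE,
                     "film director, producer, screenwriter and actor")
    assert verdict == {"is_factual": True}

def test_hallucinated_answer_is_flagged():
    verdict = detect(QUESTION, KNOWLEDGE,
                     "Chuck Russell and Russ Meyer have different occupations.")
    assert verdict == {"is_factual": False}
```

Run with `pytest tests/`; because the expected verdicts come from HaluEval's human labels, the same pair doubles as a regression test when the judge model or prompt changes.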
  19. Workflow: building the API service with FastAPI (full code in the appendix repo, detector/app.py). FastAPI should need no introduction; if you ask, just use it: 1. high performance 2. the API doubles as its own documentation 3. integration with Python typing. Two endpoints were designed: a single-item endpoint for ordinary online checks, and a batch endpoint for batch/offline computation.
  20. Practical considerations for enterprise adoption: benchmarking
     1. Latency: execution-speed differences across LLM APIs, all accessed through OpenRouter for consistency
     2. Cost: comparing the results of models from 1B to 70B parameters
     3. Effectiveness: tested on 100 Q&A items from HaluEval
  21. Practical considerations for enterprise adoption: benchmarking candidates
     • Open-source SLMs: Llama-3.2-1B, Gemma-3-4B, Qwen-2.5-7B
     • Open-source LLMs (>20B): Mistral-small-3.2 24B, GLM-4-32B, Llama-3-70B
     • Commercial models: GPT-4o, GPT-4.1-nano, Gemini-2.0-Flash-Lite, Gemini-2.0-Flash
     • Open-source on LPU: Llama-3.1-8B, Llama-3.3-70B, Llama-4-Scout-17B-16e
  22. Findings: 1. open-source LLMs (>20B) perform on par with Gemini; 2. GLM-4 is currently the fastest among non-LPU models; 3. model size does not guarantee effectiveness.
  23. Findings: 1. strong value for money: Qwen-2.5-7B is 8% more accurate than Gemini-2.0-Flash at 2.4x lower API cost; 2. self-hosting advantage: it can be deployed locally at low cost, suiting enterprises that need to control data privacy or reduce API spend.
  24. How do enterprises integrate Guardrails into services or products? Flow: user query → application layer → middleware layer → platform layer → generation → response.
     • Application layer: Input Guardrails (e.g., PII redaction), Output Guardrails (e.g., content safety)
     • Middleware layer: context construction (e.g., RAG, agents), read-only actions (e.g., vector search, running SQL queries, web search), middleware guardrails (RAG guardrails), databases (e.g., documents, tables, chat history, vector DB)
     • Platform layer: AI Gateway, platform guardrails (e.g., policy checks, whitelist checks)
  25. Self-assessment: an AI Guardrails maturity model
     • L0 Implementation (practice and adoption): validating generated content; input/output screening (HAP, PII)
     • L2 Modularization (shared modules): Guardrails reusable across the enterprise and its solutions
     • L3 Automation (process automation): Red Team testing; Guardrails routing; a dedicated team and operations process (R&R)
     • L4 Governance (alignment with governance and compliance): meeting industry or regulatory safety-governance requirements, e.g., ISO 42001, ISO 23894, or NIST AI RMF
  26. Technology trend: Guardrails are becoming standard equipment for AI services, across open source (e.g., LiteLLM), cloud providers, OpenAI Agents, and business services (*private preview).
  27. Whether you are a developer, PM/PO, data scientist, or in any other role, practice the "three mores":
     • Observe more: do products and solutions on the market ship Guardrails features? How are they designed and applied?
     • Analyze more: does your industry or project have regulatory or business needs for Guardrails? What difficulties and challenges came up during adoption?
     • Think more: as a user of AI products, how would you rank creativity, latency, and safety, and why?
  28. Further reading
     • Implementation code: https://github.com/NeroHin/2025-pycon-tw-ai-guardrails
     • Tools: OpenAI Guardrails; OpenAI Moderation; Safety-Prompts; A practical guide to building agents
     • Articles: Custom LLM as a Judge to Detect Hallucinations with Braintrust; Building low-latency guardrails to secure your agents; Measuring the Effectiveness and Performance of AI Guardrails in Generative AI Applications; What are AI guardrails?; LLM Guardrails: Your Guide to Building Safe AI Applications; Deploying Enterprise LLM Applications with Inference, Guardrails, and Observability
     • Speaker's Medium posts: "How To Guard Your LLMs Output": validating LLM outputs with LiteLLM and Guardrails; Dive into the world of Guardrails: building custom validators with LiteLLM and Guardrails.ai, and exploring safety design and optimization for generative AI