Avoid common LLM pitfalls

Avoid common LLM pitfalls Mete Atamel Developer Advocate @ Google
@meteatamel atamel.dev speakerdeck.com/meteatamel github.com/meteatamel/genai-beyond-basics

Introduction Pitfalls Solutions Summary 01 02 03 04 Agenda

Introduction 01

Artificial Intelligence NLP AI Landscape Data Science Machine Learning —
Unsupervised, Supervised, Reinforcement Learning Deep Learning — Artificial, Convolution, Recurrent Neural Networks Generative AI — GAN, VAE, Transformers LLMs — Transformers Image Gen — GAN, VAE

LLM Landscape

Gemini (brand) Gemini App previously Bard Gemini Cloud Assist previously
Duet AI Gemini Code Assist previously Duet AI for developers … Google AI Landscape Vertex AI Google AI Studio previously MakerSuite Model Garden Codey Imagen Gemma Llama 3 Claude 3 Falcon Vicuna Stable Diffusion … Search & Conversation Vector Search Notebooks Pipelines AutoML Gemini (model) … Vision, Video, TTS / STT, NL APIs

Natively multimodal Large context window Sophisticated reasoning

Proprietary + Confidential

Open model derived from Gemini

Gemini Gemma Type Closed, proprietary Open Size Very large Smaller
(2B & 7B versions) Modality Text, image, video, speech Only text Languages 39 languages English-only Function calling ✅ ❌ Context window 32K for 1.0 Pro (8K out max) 1M+ for 1.5 Pro 8K tokens (in + out) Performance State-of-the-art in large models, high quality out-of-the-box State-of-the-art in its class, but can require ﬁne-tuning Use cases Enterprise, scale, SLOs, model updates, etc. Experimentation, research, education Can run locally, privacy Pricing & Management Fully managed API Pay per character Manage yourself Pay for your own hardware & hosting Customization Through managed tuning: supervised, RLHF, distillation Programmatically modify underlying weights

Pitfalls 02

⚠ LLMs require pre and post processing

⚠ LLMs hallucinate

⚠ LLMs rely on outdated public data

⚠ LLM outputs can be chaotic

⚠ LLM inputs can get expensive

⚠ LLM outputs are hard to measure

⚠ LLM outputs can contain PII, harmful content, etc.

Solutions 03

LangChain is the most popular one Firebase Genkit, Semantic Kernel,
AutoGen and others github.com/meteatamel/genai-beyond-basics/tree/main/samples/frameworks/langchain github.com/meteatamel/genai-beyond-basics/tree/main/samples/frameworks/semantic-kernel ⚠ LLMs require pre and post processing 💡LLM frameworks

Grounding with Google Search for public data Grounding with Vertex
AI Search for private data github.com/meteatamel/genai-beyond-basics/tree/main/samples/grounding/google-search github.com/meteatamel/genai-beyond-basics/tree/main/samples/grounding/vertexai-search ⚠ LLMs hallucinate 💡Grounding (easy way)

At some point, you’ll need Retrieval-Augmented Generation (RAG) to ground
on your own private data and for more control ⚠ LLMs hallucinate 💡Grounding (RAG)

LLM Vector DB vector embeddings chunks DOCS calculate split store
vector + chunk ❶ INGESTION RAG

Chatbot app LLM Vector DB vector embeddings chunks DOCS calculate
prompt vector embedding split calculate ﬁnd similar answer prompt + chunks as context store vector + chunk ❶ INGESTION ❷ QUERYING RAG

• How to parse & chunk docs? • What embedding
model to use? • What vector database to use? • How to retrieve similar docs and add to the prompt? • What about images? RAG get complicated github.com/meteatamel/genai-beyond-basics/tree/main/samples/grounding/rag-pdf-langchain-firestore

Function calling: Augment LLMs with external APIs for more real-time
data ⚠ LLMs rely on outdated public data 💡Function calling

Chatbot app Gemini What’s the weather like in Antwerp ?
It’s sunny in Antwerp! External API or service user prompt + getWeather(String) function contract call getWeather(“Antwerp”) for me please 󰚦 getWeather(“Antwerp”) {“forecast”:”sunny”} function response is {“forecast”:”sunny”} Answer: “It’s sunny in Antwerp!” Function calling github.com/meteatamel/genai-beyond-basics/tree/main/samples/function-calling/weather

LLMs now support response type (JSON) and response schemas to
control the output format better github.com/meteatamel/genai-beyond-basics/tree/main/samples/controlled-generation ⚠ LLM outputs can be chaotic 💡Response type and schema

Reduce costs (not necessarily latency) when a large context is
referenced repeatedly by shorter requests github.com/meteatamel/genai-beyond-basics/tree/main/samples/context-caching ⚠ LLM inputs can get expensive 💡Context caching

Send multiple prompts at once and get results async when
latency is not important at a discounted price ⚠ LLM inputs can get expensive 💡Batch generation github.com/meteatamel/genai-beyond-basics/tree/main/samples/batch-generation

DeepEval and Promptfoo are open-source evaluation frameworks Vertex AI has
rapid evaluation and AutoSxS evaluation github.com/meteatamel/genai-beyond-basics/tree/main/samples/evaluation/deepeval ⚠ LLM outputs are hard to measure 💡Evaluation frameworks

Rely on the safety settings of the library for basic
safety measures Promptfoo and LLMGuard are open-source testing/security frameworks github.com/meteatamel/genai-beyond-basics/tree/main/samples/evaluation/promptfoo github.com/meteatamel/genai-beyond-basics/tree/main/samples/evaluation/llmguard ⚠ LLM outputs can contain PII, harmful content, etc. 💡Testing/security frameworks

Summary 04

LLM frameworks to orchestrate LLM calls Grounding and function calling
for private and real-time data Response type and schemas to structure outputs Context caching and batch processing to optimize costs Testing & Security frameworks to evaluate, test, and secure LLM inputs/outputs 📋 Summary

Thank you! Mete Atamel Developer Advocate at Google @meteatamel atamel.dev
speakerdeck.com/meteatamel github.com/meteatamel/genai-beyond-basics

Avoid common LLM pitfalls

Avoid common LLM pitfalls

More Decks by Mete Atamel

Other Decks in Technology

Featured

Transcript