Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Avoid common LLM pitfalls

Mete Atamel
November 06, 2024

Avoid common LLM pitfalls

It’s easy to generate content with a Large Language Model (LLM), but the output often suffers from hallucinations (fake content), outdated information (not based on the latest data), and reliance on public data only (no private data). Additionally, the output format can be chaotic, often littered with harmful or personally identifiable information (PII), and using a large context window can become expensive—making LLMs less than ideal for real-world applications.

In this talk, we’ll begin with a quick overview of the latest advancements in LLMs. We’ll then explore various techniques to overcome common LLM challenges: grounding and Retrieval-Augmented Generation (RAG) to enhance prompts with relevant data; function calling to provide LLMs with more recent information; batching and context caching to control costs; frameworks for evaluating and security testing your LLMs and more!

By the end of this session, you’ll have a solid understanding of how LLMs can fail and what you can do to address these issues.

Mete Atamel

November 06, 2024
Tweet

More Decks by Mete Atamel

Other Decks in Technology

Transcript

  1. Avoid common LLM pitfalls Mete Atamel Developer Advocate @ Google

    @meteatamel atamel.dev speakerdeck.com/meteatamel github.com/meteatamel/genai-beyond-basics
  2. Artificial Intelligence NLP AI Landscape Data Science Machine Learning —

    Unsupervised, Supervised, Reinforcement Learning Deep Learning — Artificial, Convolution, Recurrent Neural Networks Generative AI — GAN, VAE, Transformers LLMs — Transformers Image Gen — GAN, VAE
  3. Gemini (brand) Gemini App previously Bard Gemini Cloud Assist previously

    Duet AI Gemini Code Assist previously Duet AI for developers … Google AI Landscape Vertex AI Google AI Studio previously MakerSuite Model Garden Codey Imagen Gemma Llama 3 Claude 3 Falcon Vicuna Stable Diffusion … Search & Conversation Vector Search Notebooks Pipelines AutoML Gemini (model) … Vision, Video, TTS / STT, NL APIs
  4. Gemini Gemma Type Closed, proprietary Open Size Very large Smaller

    (2B & 7B versions) Modality Text, image, video, speech Only text Languages 39 languages English-only Function calling ✅ ❌ Context window 32K for 1.0 Pro (8K out max) 1M+ for 1.5 Pro 8K tokens (in + out) Performance State-of-the-art in large models, high quality out-of-the-box State-of-the-art in its class, but can require fine-tuning Use cases Enterprise, scale, SLOs, model updates, etc. Experimentation, research, education Can run locally, privacy Pricing & Management Fully managed API Pay per character Manage yourself Pay for your own hardware & hosting Customization Through managed tuning: supervised, RLHF, distillation Programmatically modify underlying weights
  5. LangChain is the most popular one Firebase Genkit, Semantic Kernel,

    AutoGen and others github.com/meteatamel/genai-beyond-basics/tree/main/samples/frameworks/langchain github.com/meteatamel/genai-beyond-basics/tree/main/samples/frameworks/semantic-kernel ⚠ LLMs require pre and post processing 💡LLM frameworks
  6. Grounding with Google Search for public data Grounding with Vertex

    AI Search for private data github.com/meteatamel/genai-beyond-basics/tree/main/samples/grounding/google-search github.com/meteatamel/genai-beyond-basics/tree/main/samples/grounding/vertexai-search ⚠ LLMs hallucinate 💡Grounding (easy way)
  7. At some point, you’ll need Retrieval-Augmented Generation (RAG) to ground

    on your own private data and for more control ⚠ LLMs hallucinate 💡Grounding (RAG)
  8. Chatbot app LLM Vector DB vector embeddings chunks DOCS calculate

    prompt vector embedding split calculate find similar answer prompt + chunks as context store vector + chunk ❶ INGESTION ❷ QUERYING RAG
  9. • How to parse & chunk docs? • What embedding

    model to use? • What vector database to use? • How to retrieve similar docs and add to the prompt? • What about images? RAG get complicated github.com/meteatamel/genai-beyond-basics/tree/main/samples/grounding/rag-pdf-langchain-firestore
  10. Function calling: Augment LLMs with external APIs for more real-time

    data ⚠ LLMs rely on outdated public data 💡Function calling
  11. Chatbot app Gemini What’s the weather like in Antwerp ?

    It’s sunny in Antwerp! External API or service user prompt + getWeather(String) function contract call getWeather(“Antwerp”) for me please 󰚦 getWeather(“Antwerp”) {“forecast”:”sunny”} function response is {“forecast”:”sunny”} Answer: “It’s sunny in Antwerp!” Function calling github.com/meteatamel/genai-beyond-basics/tree/main/samples/function-calling/weather
  12. LLMs now support response type (JSON) and response schemas to

    control the output format better github.com/meteatamel/genai-beyond-basics/tree/main/samples/controlled-generation ⚠ LLM outputs can be chaotic 💡Response type and schema
  13. Reduce costs (not necessarily latency) when a large context is

    referenced repeatedly by shorter requests github.com/meteatamel/genai-beyond-basics/tree/main/samples/context-caching ⚠ LLM inputs can get expensive 💡Context caching
  14. Send multiple prompts at once and get results async when

    latency is not important at a discounted price ⚠ LLM inputs can get expensive 💡Batch generation github.com/meteatamel/genai-beyond-basics/tree/main/samples/batch-generation
  15. DeepEval and Promptfoo are open-source evaluation frameworks Vertex AI has

    rapid evaluation and AutoSxS evaluation github.com/meteatamel/genai-beyond-basics/tree/main/samples/evaluation/deepeval ⚠ LLM outputs are hard to measure 💡Evaluation frameworks
  16. Rely on the safety settings of the library for basic

    safety measures Promptfoo and LLMGuard are open-source testing/security frameworks github.com/meteatamel/genai-beyond-basics/tree/main/samples/evaluation/promptfoo github.com/meteatamel/genai-beyond-basics/tree/main/samples/evaluation/llmguard ⚠ LLM outputs can contain PII, harmful content, etc. 💡Testing/security frameworks
  17. LLM frameworks to orchestrate LLM calls Grounding and function calling

    for private and real-time data Response type and schemas to structure outputs Context caching and batch processing to optimize costs Testing & Security frameworks to evaluate, test, and secure LLM inputs/outputs 📋 Summary
  18. Thank you! Mete Atamel Developer Advocate at Google @meteatamel atamel.dev

    speakerdeck.com/meteatamel github.com/meteatamel/genai-beyond-basics