


Tracking Knowledge Diversity in LLM-Generated Responses

PyCon DE 2026 Talk based on our work, Epistemic Diversity and Knowledge Collapse in Large Language Models https://arxiv.org/abs/2510.04226


_themessier

April 15, 2026



Transcript

  1. Tracking Knowledge Diversity in LLM-Generated Responses Based on our research

    by Dustin Wright, Sarah Masud, Jared Moore, Srishti Yadav, Maria Antoniak, Peter Ebert Christensen, Chan Young Park, Isabelle Augenstein PyCon DE, 2026
  2. We all have been there Prompted your favourite Large Language

    Model (LLM) or AI chat system to describe a concept, e.g. “Democracy”. • You have examined your 10th prompt variant. • You have lost track of the unique information captured across the prompt variants. • You are thinking, “what would Google say?”
  3. The goal • Measure how diverse the LLM responses are

    ◦ Across prompt variants ◦ Across model size and version (8B vs 13B vs 70B) ◦ Across model family, aka Qwen vs Llama ◦ With RAG or without RAG
  4. But how to measure epistemic diversity? Assume the LLM gives you

    a list of information about a topic, say “democracy”. • A simple approach: count the items in the list. ◦ ✅ More items → more diversity? • Loophole? ◦ 🤔 What about synonyms/similar sentences that can inflate the count?
  5. But how to measure epistemic diversity? Assume the LLM gives you

    a list of information about a topic, say “democracy”. • A better approach: cluster the sentences. • ✅ More clusters → more unique types of information → more diversity? • Loophole? ◦ 🤔 What about the long tail: a model may produce many singleton clusters, aka noisy clusters.
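The clustering step can be sketched with a toy greedy clusterer over bag-of-words vectors. This is a stand-in for whatever sentence-clustering method the paper actually uses: the cosine measure, tokenization, and the 0.5 threshold are all illustrative assumptions.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def greedy_cluster(sentences, threshold=0.5):
    """Assign each sentence to the first cluster whose centroid it resembles,
    otherwise start a new cluster."""
    clusters = []  # list of (centroid counts, member sentences)
    for s in sentences:
        vec = Counter(s.lower().split())
        for centroid, members in clusters:
            if cosine(vec, centroid) >= threshold:
                centroid.update(vec)
                members.append(s)
                break
        else:
            clusters.append((Counter(vec), [s]))
    return [members for _, members in clusters]

claims = [
    "democracy means rule by the people",
    "rule by the people defines democracy",
    "elections are held at regular intervals",
]
clusters = greedy_cluster(claims)
# The two paraphrases land in one cluster; the elections claim gets its own.
```

Counting `len(clusters)` instead of `len(claims)` is what keeps paraphrases from inflating the diversity estimate.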
  6. But how to measure epistemic diversity? Assume the LLM gives you

    a list of information about a topic, say “democracy”. • A better approach: cluster the sentences. • ✅ More clusters → more unique types of information → more diversity? • A good cluster: all the sentences/claims in it answer one specific question about the topic, and no more.
  7. But how to measure epistemic diversity? Assume the LLM gives you

    a list of information about a topic, say “democracy”. • A better approach: cluster the sentences. • Loophole? ◦ 🤔 What about the long tail: a model may produce many singleton clusters, aka noisy clusters. • ❌ Naively counting the number of items or the number of clusters is not a good measure.
  8. But how to measure epistemic diversity? Assume the LLM gives you

    a list of information about a topic, say “democracy”. • An even better approach: measure the entropy of the cluster sizes instead. • ✅ Uneven cluster sizes (low diversity) → balanced cluster sizes (higher diversity). For model m prompted on topic t: K is the number of clusters obtained, and p_i is the size of cluster i, i.e. the number of claims in the ith cluster divided by the total number of claims. Then H = −Σ_{i=1..K} p_i log p_i.
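The entropy of the cluster-size distribution can be sketched as follows; the cluster sizes in the example are made up to show the uneven-vs-balanced contrast:

```python
import math

def cluster_entropy(cluster_sizes):
    """Shannon entropy of the cluster-size distribution
    for one (model, topic) pair.

    cluster_sizes: number of claims in each cluster.
    """
    total = sum(cluster_sizes)
    probs = [size / total for size in cluster_sizes]
    return -sum(p * math.log(p) for p in probs if p > 0)

# One dominant cluster -> low entropy (low diversity).
uneven = cluster_entropy([18, 1, 1])
# Balanced clusters -> higher entropy (higher diversity).
balanced = cluster_entropy([7, 7, 6])
```

Note that the same number of clusters (three in both examples) yields very different entropy, which is exactly why entropy beats raw cluster counts here.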
  9. But how to measure epistemic diversity? Assume the LLM gives you

    a list of information about a topic, say “democracy”. • An even better approach: measure the Hill–Shannon entropy of the cluster sizes instead, i.e. exp(H). • ✅ The exponent undoes the log → we are back on a linear scale!
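Exponentiating the Shannon entropy gives an "effective number of clusters" on a linear scale, a minimal sketch:

```python
import math

def hill_shannon(cluster_sizes):
    """Hill-Shannon diversity: exp of the Shannon entropy of the
    cluster-size distribution, i.e. the effective number of clusters."""
    total = sum(cluster_sizes)
    probs = [size / total for size in cluster_sizes]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return math.exp(entropy)

# Four perfectly balanced clusters behave like ~4 effective clusters,
# while one dominant cluster drags the effective number toward 1.
```

The linear scale is what makes the number interpretable: doubling the Hill–Shannon value means roughly twice as many equally represented kinds of information.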
  10. What to compare? For a given general-concept topic we:

    • Use the top-40 and top-20 Google (USA) search results as baselines. • Use 27 LLMs of varying size, family, and release date. • Prompt each LLM with 200 input variations of writing/information-seeking queries. • Compare RAG vs non-RAG. • Cluster the output of each system and obtain its diversity score.
  11. Finding 1: Models are getting better with time
  12. Finding 2: But you are better off with the top-20 search results
  13. Finding 3: Using search results as RAG context helps!
  14. Practical takeaway 1: LLMs vs search • LLMs don’t always

    agree → don’t trust a single model → compare outputs across multiple LLMs. • Open‑weight models tend to agree more with each other → better for consistency. • Search ≠ LLMs → use search to spot what LLMs might miss. Similarity = 1 − divergence
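The "Similarity = 1 − divergence" relation can be illustrated with Jensen–Shannon divergence between two systems' cluster distributions. Treating JSD as the divergence here is an assumption for illustration; the paper's exact divergence measure may differ.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions
    (log base 2, so the value lies in [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def similarity(p, q):
    """Similarity = 1 - divergence, on the same [0, 1] scale."""
    return 1 - js_divergence(p, q)

# Identical distributions -> similarity 1; disjoint ones -> similarity 0.
```

With a bounded divergence like this, similarity 1 means two models cover the same clusters of information in the same proportions, and similarity 0 means they cover entirely disjoint ones.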
  15. Practical takeaway 2: RAG • Smaller models + RAG do

    better → use large models standalone; otherwise use a smaller model + RAG. • Localize your RAG → RAG only improves diversity when the underlying search results are themselves diverse. • Guard your knowledge base (KB) → populating the KB with LLM output will lead to knowledge collapse.
  16. Useful links GitHub https://github.com/dwright37/llm-knowledge arXiv https://arxiv.org/abs/2510.04226 My socials https://bsky.app/profile/themessier.bsky.social Based

    on our research by Dustin Wright, Sarah Masud, Jared Moore, Srishti Yadav, Maria Antoniak, Peter Ebert Christensen, Chan Young Park, Isabelle Augenstein PyCon DE, 2026