Summaraizer - Lessons learned along the way

Given at AI Dev Day with AI Hub FFM 2024

"How can AI help me in my day-to-day job?" I asked myself.

The question was answered quite quickly the next day when I had to read through a 22-comment-long discussion on GitHub, our developer platform of choice at ioki. I thought about adding a comment like "AI, please summarize!" to that discussion to get an AI-generated summary of it.

That idea became a reality, and we built a small helper tool called summaraizer that does exactly that.

Along the way, we discovered a few interesting things about Large Language Models (LLMs) such as:
* What is a token limit, why does it matter, and how can it be addressed?
* Why does an LLM stream data?
* How to instruct the model to summarize a series of comments?
* And why doesn't the model always follow the instructions given?
* Why do various model types exist, and how do they differ from one another?

In this talk, I want to provide a brief overview of summaraizer and address some of the questions that arose during its development.

Stefan M.

June 13, 2024

Transcript

  1. StefMa.guru Stefan May Android Developer since 2014 Principal Android Developer

    @ioki since 2020 github.com/@StefMa StefMa.medium.com x.com/StefMa91
  2. StefMa.guru Stefan May Android Developer since 2014 Principal Android Developer

    @ioki since 2020 github.com/@StefMa StefMa.medium.com x.com/StefMa91 ki = künstliche intelligenz = artificial intelligence
  3. Summaraizer 👉 https://github.com/ioki-mobility/summaraizer

     Go, CLI and Module
     Supports Multiple Sources (GitHub, Reddit, GitLab, more to come)
     Supports Multiple Providers (Ollama, OpenAI, Mistral, more to come)
     👉 https://github.com/ioki-mobility/summaraizer-action
     JavaScript, Supports Multiple Providers
  4. Tokens

     Model    | Token (context) window
     gpt4o    | 128,000
     Llama3   | 8,000
     Claude 3 | 200,000
     Gemini   | 1,000,000 ("soon" 2,000,000)
  5. Tokens Example: Token window: 5 tokens (5 * 4 ~= 20 chars)

     Input: Why is the sky blue? (20 chars, "5 tokens")
     Output: (0 chars left, "0 tokens")
  6. Tokens Example: Token window: 5 tokens (5 * 4 ~= 20 chars)

     Input: The sky is (10 chars, "2.5 tokens")
     Output: blue! (5 chars, "1.25 tokens")
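The ~4-characters-per-token rule of thumb used in the two examples above can be sketched in Go. Note this is only an approximation for illustration; real BPE tokenizers produce different counts per model.

```go
package main

import "fmt"

// estimateTokens approximates the token count of a text using the
// common ~4 characters per token rule of thumb from the slides.
func estimateTokens(text string) float64 {
	return float64(len(text)) / 4.0
}

func main() {
	fmt.Println(estimateTokens("Why is the sky blue?")) // 20 chars -> 5 tokens
	fmt.Println(estimateTokens("The sky is"))           // 10 chars -> 2.5 tokens
}
```

A check like this is useful before sending a prompt, to decide whether the input still fits into the model's token window or needs to be chunked first.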
  7. Tokens

     Stuffing: Just put all the data in (and hope for the best)
     MapReduce: Summarize chunks of the data and put all the summaries into a final prompt
  8. Tokens

     Stuffing: Just put all the data in (and hope for the best)
     MapReduce: Summarize chunks of the data and put all the summaries into a final prompt
     Refine: Summarize a chunk of data, then put that summary plus the next chunk into the prompt, until your data ends
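The MapReduce and Refine strategies above can be sketched as follows. This is a minimal illustration, not summaraizer's actual code: the `summarize` function is a stand-in for a real LLM call (e.g. to Ollama or OpenAI) that here just truncates, so the chunking logic is runnable on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// summarize stands in for a real LLM call; it truncates to 20 bytes
// to mimic "returns something shorter than its input".
func summarize(text string) string {
	if len(text) > 20 {
		return text[:20]
	}
	return text
}

// mapReduce summarizes each chunk independently, then summarizes
// the concatenated partial summaries in one final prompt.
func mapReduce(chunks []string) string {
	var partials []string
	for _, c := range chunks {
		partials = append(partials, summarize(c))
	}
	return summarize(strings.Join(partials, "\n"))
}

// refine folds each new chunk into the running summary, so every
// prompt holds only the previous summary plus one chunk.
func refine(chunks []string) string {
	summary := ""
	for _, c := range chunks {
		if summary == "" {
			summary = summarize(c)
			continue
		}
		summary = summarize(summary + "\n" + c)
	}
	return summary
}

func main() {
	comments := []string{
		"Why is the sky blue?",
		"I actually don't know.",
		"Because of Rayleigh scattering.",
	}
	fmt.Println(mapReduce(comments))
	fmt.Println(refine(comments))
}
```

Both strategies keep every individual prompt under the token window; MapReduce can run its chunk summaries in parallel, while Refine is sequential but preserves more cross-chunk context.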
  10. Streaming Example: Input: The sky is → Tokenizer → Neural Network

      (next) Token | Probability
      blue         | 0.9
      nice         | 0.4
      dog          | 0.1
  11. Streaming Example: Input: The sky is → Tokenizer → Neural Network

      (next) Token | Probability
      blue         | 0.9
      nice         | 0.4
      dog          | 0.1
      → Greedy decoding
  12. Streaming Example: Input: The sky is blue → Tokenizer → Neural Network

      (next) Token | Probability
      because      | 0.7
      AI           | 0.1
      frankfurt    | 0.2
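The greedy decoding step shown on the streaming slides, picking the candidate next token with the highest probability, can be sketched like this (probabilities hardcoded from the example):

```go
package main

import "fmt"

// nextToken implements greedy decoding: of all candidate next
// tokens, pick the one with the highest probability.
func nextToken(probs map[string]float64) string {
	best, bestP := "", -1.0
	for tok, p := range probs {
		if p > bestP {
			best, bestP = tok, p
		}
	}
	return best
}

func main() {
	// Candidates after "The sky is", as on the slide.
	fmt.Println(nextToken(map[string]float64{"blue": 0.9, "nice": 0.4, "dog": 0.1})) // prints "blue"
	// Candidates after "The sky is blue".
	fmt.Println(nextToken(map[string]float64{"because": 0.7, "AI": 0.1, "frankfurt": 0.2})) // prints "because"
}
```

Because each token is produced by one pass of this loop, the model can stream tokens to the client as soon as they are chosen, instead of waiting for the full answer.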
  14. Prompting How to separate comments? Good old <HTML> Solution: Separate comments using enclosing tags

      Example:
      <comment>Why is the sky blue?</comment>
      <comment>I actually don't know. Maybe ask @john</comment>
      <comment>The sky is blue because…</comment>
  15. Model variants

      llama3:latest llama3:70b llama3:8b
      mistral:7b mistral:instruct
      gemini:pro gemini:flash
      gemma:2b gemma:7b gemma:2b-instruct gemma:text
      codellama:[7b|13b|34b|70b]
      codegemma:[2b|instruct|code]
  17. Model variants [model]:[x]b

      More parameters: are "better" at a variety of tasks, use more resources, are slower, tend to have a bias on (a) topic(s)
      Fewer parameters: might be "optimized" for a specific task, use fewer resources, are faster, might not have a bias on (a) topic(s)
  18. Model variants [model]:[text|instruct|...]

      Text: Is optimized for general text processing like translations, text summarization, or text generation
      Instruct: Is optimized for responding with completions for a specific instruction