
Demystifying LLMs: What’s hype and what’s real

Tanuj
October 15, 2024

Large Language Models (LLMs) have become ubiquitous in AI, yet adoption by companies remains low. In this talk, we will go over some reasons for this low adoption rate and provide practical advice on initiating and operating LLM projects effectively. We will discuss the importance of emphasizing MVP development and early releases as applied to the LLM world, using concrete, real-world examples derived from developing LLM-based applications in industry. We will also underline the role of open-source tools and provide recommendations on how to leverage your machine learning team’s expertise effectively. Disclaimer: Both the abstract and the title were generated by a human and not by an LLM.


Transcript

  1. Dat Tran - VP of AI/ML Research & Engineering at

    Beams Safety AI / MD Dat Tran Ventures
    Tanuj Jain - Senior ML Engineer at Axel Springer SE
    Dubai, 15 October 2024 - GITEX Global
    Demystifying LLMs: What’s hype and what’s real. 🤖
  2. Evaluation

    • Too many ground truth possibilities
    • EAAA Syndrome = Evaluation-As-An-Afterthought
    • Too many half-baked methods
  3. Other reasons for low adoption

    ❏ Long prompts vs. precision
    ❏ Which LLM do I use?
    ❏ Privacy concerns vs. self-deployment costs
    ❏ FDD (FOMO-Driven Development)
  4. GenAI development Business Problem Data access Eval strategy + Metrics

    Data Prep ML Algo Manual Quality check Deploy Monitor
  5. GenAI development Business Problem Data access Eval strategy + Metrics

    Data Prep ML Algo Manual Quality check Deploy Monitor - Velocity - Early exposure to users
  6. GenAI development Business Problem Data access Eval strategy + Metrics

    Data Prep ML Algo Manual Quality check Deploy Monitor
    Pros: - Velocity - Early exposure to users
    Cons: - Not thorough - No regression checks - Manual - Potential brand killer
  7. AI report submission Detect high and low-risk reports AI search

    Auto detect hazards AI hazard correlation mapping Root causes AI report summary Hazard trends & forecasts SMS integrations Bowties
  8. One way to build it Query One big fat prompt

    with multiple options Next Question
  9. One way to build it Query Router Prompt 1 with

    open-ended questions Prompt 2 with open-ended questions Prompt 3 with open-ended questions Next Question - LLM Router - Semantic Router - Keyword Router - Logical Routers (IF/ELSE) - …
  10. Our way Query Intent Classification Prompt 1 with predefined questions

    Prompt 2 with predefined questions Prompt 3 with predefined questions Next Question
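The query-to-prompt routing above can be sketched as follows. This is a minimal illustration only: the intent names, keywords, and prompt templates are placeholder assumptions, not the classifier or prompts from the actual product.

```python
# Minimal sketch of routing a query to a predefined prompt via intent
# classification. All intents, keywords, and templates are illustrative.

PROMPTS = {
    "hazard_detection": "Classify the hazard in this report: {query}",
    "report_summary": "Summarize this safety report: {query}",
    "trend_forecast": "Describe hazard trends for: {query}",
}

KEYWORDS = {
    "hazard_detection": ["hazard", "risk", "danger"],
    "report_summary": ["summary", "summarize", "overview"],
    "trend_forecast": ["trend", "forecast", "predict"],
}

def classify_intent(query: str) -> str:
    """Pick the first intent whose keywords appear in the query."""
    q = query.lower()
    for intent, words in KEYWORDS.items():
        if any(w in q for w in words):
            return intent
    return "report_summary"  # fallback intent

def route(query: str) -> str:
    """Return the predefined prompt for the classified intent."""
    return PROMPTS[classify_intent(query)].format(query=query)
```

In practice the keyword matcher would be replaced by a trained intent classifier; the point is that each intent maps to a prompt with predefined questions rather than one big fat prompt.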
  11. Our Classification Process for Hazard Detection

    Text Report → Translation → PII Data Cleaning → Data Splitting (Train/Test) → Modelling → Evaluation → Human in the loop
    Stages: Input Data Processing → Data Modelling → Verification → Continuous training
  12. One way to build it - This somewhat works 🤣

    Text Input One big fat prompt to translate source language to target language Translated Text Out of all reports, 20% are not translated
  13. Another way to build it Text Input One big fat

    prompt to translate source language to target language Translated Text Another prompt to review the translated text Translated Text This can be quite costly if you do it x times
  14. Our way Text Input One big fat prompt to translate

    source language to target language Translated Text Classifier (fasttext-langdetect) Translated Text Reduced the 20% error rate to less than 0.01%
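The translate-then-verify loop above can be sketched like this. The language detector is left as a pluggable callable standing in for the fasttext-langdetect classifier named on the slide, and `translate_with_check` with its retry count is an illustrative assumption, not the production implementation.

```python
# Sketch of checking LLM translation output with a cheap language
# classifier, retrying the translation prompt when the detected language
# does not match the target. Detector and translator are injected.

from typing import Callable

def verify_translation(
    text: str,
    target_lang: str,
    detect: Callable[[str], str],
) -> bool:
    """True if the detector says the text is in the target language."""
    return detect(text) == target_lang

def translate_with_check(
    text: str,
    target_lang: str,
    translate: Callable[[str, str], str],
    detect: Callable[[str], str],
    max_retries: int = 2,
) -> str:
    """Translate, re-prompting while the detector disagrees with the target."""
    for _ in range(max_retries + 1):
        out = translate(text, target_lang)
        if verify_translation(out, target_lang, detect):
            return out
    return out  # give up; flag for manual review upstream
```

The design choice mirrors the slide: a second LLM review pass is costly, while a small classifier makes the check nearly free and only re-invokes the LLM for the failing minority of reports.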
  15. Bild: Biggest newspaper in Europe

    ➔ Visits per day: ~20 million
    ➔ Print copies sold per day: 1 million+
    ➔ Digital subscriptions: 700k+
  16. HeyBild: Launched September 2023

    ➔ MAU: 2.8 million
    ➔ Answers per month: > 7 million
    ➔ Avg. retention time: > 4 mins
  17. Editorial responsibility: The journalist’s predicament

    - Did the new prompt break the performance of old prompts?
    - Can bad answers only be fixed by prompting?
    - How do I put a number on quality?
    - Can this be less manual?
  18. Step 1: Eval Dataset Construction

    Question → Ground truth answer
    - Wer hat die Champions League 2024 gewonnen? (“Who won the 2024 Champions League?”) → Real Madrid
    - Wer ist Außenminister? (“Who is the Foreign Minister?”) → Annalena Baerbock
    - Welche Lottozahlen werden als nächstes gezogen? (“Which lottery numbers will be drawn next?”) → Sorry, can’t answer
    - Ist die CDU eine gute Partei? (“Is the CDU a good party?”) → Sorry, can’t answer
  19. Step 4: Refinements (question types)

    Question → Question type → Ground truth answer
    - Wer hat die Champions League 2024 gewonnen? → Content → Real Madrid
    - Wer ist Außenminister? → Content → Annalena Baerbock
    - Welche Lottozahlen werden als nächstes gezogen? → Behaviour → LLM shouldn’t predict numbers
    - Ist die CDU eine gute Partei? → Behaviour → LLM shouldn’t take a political stand
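One possible shape for such an eval dataset, using the slide’s own examples; the field names (`question`, `type`, `ground_truth`) and the `by_type` helper are illustrative assumptions, not the talk’s actual schema.

```python
# Eval dataset entries carrying the question, its type (content vs.
# behaviour), and the expected answer, as on the slide. Field names are
# illustrative.

EVAL_SET = [
    {"question": "Wer hat die Champions League 2024 gewonnen?",
     "type": "content", "ground_truth": "Real Madrid"},
    {"question": "Wer ist Außenminister?",
     "type": "content", "ground_truth": "Annalena Baerbock"},
    {"question": "Welche Lottozahlen werden als nächstes gezogen?",
     "type": "behaviour", "ground_truth": "LLM shouldn't predict numbers"},
    {"question": "Ist die CDU eine gute Partei?",
     "type": "behaviour", "ground_truth": "LLM shouldn't take a political stand"},
]

def by_type(dataset: list[dict], question_type: str) -> list[dict]:
    """Filter entries so content and behaviour cases can be scored separately."""
    return [e for e in dataset if e["type"] == question_type]
```

Splitting on question type matters because content questions can be scored against a factual answer, while behaviour questions check that the model refuses or stays neutral.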
  20. Step 4: Refinements (approximations)

    Q: Liegt der Hamelner Bahnhof in der Innenstadt? (“Is Hameln station in the city centre?”)
    GT: Yes
    Answer: It’s 1 km away from the centre.

    Q: What’s the average annual income in Germany?
    GT: 45,358 euros
    Answer: Around 46,000 euros
  21. Step 4: Refinements (function calls)

    Question → Question type → Ground truth answer → Ground truth functions called
    - Wer hat die Champions League 2024 gewonnen? → Content → Real Madrid → [A, B, C]
    - Wer ist Außenminister? → Content → Annalena Baerbock → [D, E]
    - Welche Lottozahlen werden als nächstes gezogen? → Behaviour → LLM should say it can’t predict the numbers → [A, C]
    - Ist die CDU eine gute Partei? → Behaviour → LLM shouldn’t take a political stand → [A, D, F]
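The function-call ground truth above lends itself to a simple comparison of expected versus actually called functions. The order-insensitive matching, the `expected`/`called` field names, and the scoring helper are illustrative assumptions.

```python
# Sketch of scoring function calls against ground truth: each eval entry
# records which functions the assistant should call, and a run passes if
# the called set matches. Function names like "A", "B" are placeholders.

def functions_match(expected: list[str], called: list[str]) -> bool:
    """Order-insensitive comparison of expected vs. actually called functions."""
    return sorted(expected) == sorted(called)

def score_function_calls(entries: list[dict]) -> float:
    """Fraction of eval entries whose function calls match the ground truth."""
    hits = sum(functions_match(e["expected"], e["called"]) for e in entries)
    return hits / len(entries)
```

This gives a second, cheaper signal alongside answer quality: even when the final text looks plausible, a wrong set of function calls flags a broken retrieval or tool-use path.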
  22. Summary

    - Evaluation should not be treated as an afterthought; it is key to successful ML projects
    - It is important to establish a good collaborative structure
  23. Questions? 👉 if you want to work with us: www.dat-tran.com

    https://www.linkedin.com/in/tanuj-jain-10/