[Nov24]: Bringing your first LLM into Production or Beware of your Boss’s Nephew by Christian Hidber

Blog posts and notebooks promising «build your chatbot in 5 minutes» or «chat with your data now» are abundant. Deploying a Large Language Model (LLM) application in a production environment, and keeping it running smoothly, is a significantly more complex endeavor.
Using demos of LLM applications in industrial planning software, we'll explore how to select promising applications and how ad-hoc prompts differ from production prompts. We dive hands-on into LLM-as-a-judge evaluations and the online monitoring of retrieval-based solutions. The talk concludes with some of the issues you might encounter, as well as some (opinionated) recommendations.

About Christian:
Christian lives in Zurich, Switzerland, and works as a consultant focusing on real-world machine learning applications. He earned his PhD in mathematics from ETH Zurich and completed a postdoctoral fellowship at the International Computer Science Institute in Berkeley. Christian has been developing and architecting IT solutions for the last two decades. Currently, he is applying artificial intelligence to Geberit’s planning software ProPlanner.

https://www.linkedin.com/in/christian-hidber/

Azure Zurich User Group

November 11, 2024

Transcript

  1. Bringing your first LLM into production, or: Beware of your boss’s nephew

    Christian Hidber, bSquare; Oliver Zeigermann, TK. LLM Soirée, Azure Zurich User Group, November 2024
  2. How does a Decoder Model work?

    Q: What is Pluvia?
    «the model»: • Trained on huge datasets • Does not change • Same for all users
    «the context»: • Depends on the user’s goal • Unique for each chat & user • Contains the chat history
  3. How does a Decoder Model work?

    Q: What is Pluvia? A: Pluvia
    «the model»: • Trained on huge datasets • Does not change • Same for all users
    «the context»: • Depends on the user’s goal • Unique for each chat & user • Contains the chat history
    «the token»: • Single «word» • Depends on context and model
  4. How does a Decoder Model work?

    Q: What is Pluvia? A: Pluvia is
  5. How does a Decoder Model work?

    Q: What is Pluvia? A: Pluvia is a Latin …
  6. How does a Decoder Model work?

    Q: What is Pluvia? A: Pluvia is a Latin word meaning rainfall. EOT (end-of-text token)
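The build-up over slides 2 to 6 is autoregressive decoding: the model stays fixed, and each generated token is appended to the context before the next one is predicted, until EOT. A minimal sketch of that loop, assuming greedy decoding and whole words as tokens; `toy_next_token` and its lookup table are hypothetical stand-ins for a real model, which would compute a probability distribution over a subword vocabulary.

```python
from typing import Callable, List

EOT = "<EOT>"  # hypothetical end-of-text token

def generate(next_token: Callable[[List[str]], str], context: List[str],
             max_tokens: int = 50) -> str:
    """Greedy autoregressive decoding: the model stays fixed, the context grows by one token per step."""
    answer: List[str] = []
    for _ in range(max_tokens):
        token = next_token(context + answer)  # the next token depends on model and context
        if token == EOT:
            break
        answer.append(token)
    return " ".join(answer)

# Toy stand-in for a trained model (assumption): it only looks at the last token.
TOY_MODEL = {"A:": "Pluvia", "Pluvia": "is", "is": "a", "a": "Latin", "Latin": "word",
             "word": "meaning", "meaning": "rainfall.", "rainfall.": EOT}

def toy_next_token(tokens: List[str]) -> str:
    return TOY_MODEL.get(tokens[-1], EOT)

context = ["Q:", "What", "is", "Pluvia?", "A:"]
print(generate(toy_next_token, context))  # Pluvia is a Latin word meaning rainfall.
```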
  7. Naïve Approach

    User: Asking a Question → Hypothetical GeberitBot: Generating an Answer → answer
    Prompt: You are an expert in ….. Q: What is Pluvia? A: Pluvia is a Latin word meaning rainfall.
  8. RAG System Architecture

    • Doc Loader: Image2Text → chunks
    • Anonymizer: Enforcing Privacy → chunks
    • Vector DB: Searching facts matching the question → chunks
    • User: Asking a Question → question
    • LLM: Generating an Answer → answer
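A minimal sketch of the RAG flow on slide 8, assuming the anonymizer runs on chunks before they are indexed and the retrieved chunks are pasted into the prompt. Everything here is hypothetical and in-memory: `anonymize` only masks e-mail addresses, `embed` is a bag-of-words stand-in for a real embedding model, `ToyVectorDB` replaces the vector DB, and the Image2Text doc loader is left out.

```python
import re
from collections import Counter
from math import sqrt
from typing import List, Tuple

def anonymize(text: str) -> str:
    """Hypothetical privacy step: mask e-mail addresses before a chunk is indexed."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "<EMAIL>", text)

def embed(text: str) -> Counter:
    """Toy 'embedding' (assumption): a bag of lower-cased words instead of a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyVectorDB:
    """In-memory stand-in for the vector DB: stores (vector, chunk) pairs."""
    def __init__(self) -> None:
        self.items: List[Tuple[Counter, str]] = []

    def add(self, chunk: str) -> None:
        self.items.append((embed(chunk), chunk))

    def search(self, question: str, k: int = 2) -> List[str]:
        q = embed(question)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]

def build_prompt(question: str, chunks: List[str]) -> str:
    """Put the retrieved facts into the prompt so the LLM answers from them."""
    facts = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only these facts:\n{facts}\nQ: {question}\nA:"

# Ingestion: chunks (e.g. from a doc loader) are anonymized, then indexed.
db = ToyVectorDB()
for chunk in ["Pluvia is a Latin word meaning rainfall. Contact: jane.doe@example.com",
              "Gravel roofs have a peak runoff coefficient of 0.8."]:
    db.add(anonymize(chunk))

# Query time: retrieve matching chunks and build the LLM prompt.
prompt = build_prompt("What is Pluvia?", db.search("What is Pluvia?", k=1))
print(prompt)
```

Anonymizing at ingestion time means personal data never reaches the index, so it cannot leak into any prompt later.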
  9. Low Risk, but Nice Benefit

    Low risk
    • What is the worst thing that could happen, and how can we mitigate it?
    • Low profile
    • Failures should be OK
    • Human in the loop
    Nice benefit
    • Impossible for humans to do, or
    • Something humans don’t like to do
    • Let the whole organization learn
    • Management likes it, but is afraid
    • Can it be used for (internal) marketing?
  10. Writing a PoC vs. an Engineering Task

    Ad-hoc prompting is very different from writing a prompt for a service.
    With ad-hoc prompting
    • you can immediately see whether it works,
    • there’s a high level of human oversight,
    • it only needs to work for a specific example.
    With prompting for a system
    • it needs to generalize to all expected use cases,
    • there is little or no human supervision,
    • stability is expected.
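Because a system prompt runs without a human checking each reply, one common way to get stability is to fix the output format and validate every reply in code. A minimal sketch under that assumption: the template text, the JSON reply shape and the retry policy are all hypothetical, and `llm` stands for any function that sends a prompt to a model and returns its text.

```python
import json
from typing import Callable, Optional

# Hypothetical production template: fixed instructions plus a strict output
# format, so every reply can be checked automatically instead of by a human.
TEMPLATE = (
    "You are an assistant for a planning software.\n"
    "Answer the question. Reply with JSON only, in the form "
    '{{"answer": "<text>", "confident": true or false}}.\n'
    "Question: {question}"
)

def ask(llm: Callable[[str], str], question: str, retries: int = 2) -> Optional[dict]:
    """Call the LLM, validate its reply, retry on malformed output; None if every attempt fails."""
    prompt = TEMPLATE.format(question=question)
    for _ in range(retries + 1):
        try:
            reply = json.loads(llm(prompt))
        except json.JSONDecodeError:
            continue  # malformed JSON: try again
        if (isinstance(reply, dict) and isinstance(reply.get("answer"), str)
                and isinstance(reply.get("confident"), bool)):
            return reply
    return None  # the caller needs a defined fallback, e.g. "no answer available"
```

In an ad-hoc chat you would simply reread and rephrase; here, malformed output is retried and, failing that, handled by an explicit fallback.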
  11. Evaluation on Text Results

    User: Asking a Question → LLM: Generating an Answer → answer → Human Eval
    Question: • What is Pluvia?
    Answers: • Pluvia is a Latin word meaning rainfall. • The Latin word for rainfall. • ….
    => Testing for equality is not an option.
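A small illustration of why equality is not an option: two correct answers differ as strings, and a softer lexical metric such as token-overlap F1 can even rank a wrong answer above a correct paraphrase. The metric below is a generic sketch, not the speakers' evaluation code; the "wrong" answer is made up for illustration.

```python
import re
from typing import List

def tokens(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())

def exact_match(prediction: str, reference: str) -> bool:
    return prediction == reference

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1: softer than equality, but blind to meaning."""
    pred, ref = tokens(prediction), tokens(reference)
    common = sum(min(pred.count(w), ref.count(w)) for w in set(pred))
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)

reference = "Pluvia is a Latin word meaning rainfall."
paraphrase = "The Latin word for rainfall."          # correct
wrong = "Pluvia is a Latin word meaning snow."        # incorrect (made-up example)
print(exact_match(paraphrase, reference))            # False
print(round(token_f1(paraphrase, reference), 2))     # 0.5
print(round(token_f1(wrong, reference), 2))          # 0.86
```

The wrong answer scores higher because it shares more words with the reference, which is why the following slides turn to human and LLM-based judgment.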
  12. Evaluation on Text Results

    User: Asking a Question → LLM: Generating an Answer → answer
    Human Eval → Statistics
    Evaluation criteria: • Correct • Complete • Concise • Relevant • Contradiction-free • Language • Style • … • Generation successful
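Turning per-answer ratings into statistics can be as simple as a pass rate per criterion. A minimal sketch, assuming each rated answer is recorded as a dict of criterion to pass/fail; the criterion keys mirror the slide, and the sample verdicts are hypothetical.

```python
from typing import Dict, List

# Criteria from the slide; the slide's "…" stands for further ones not listed here.
CRITERIA = ["correct", "complete", "concise", "relevant", "contradiction_free",
            "language", "style", "generation_successful"]

def pass_rates(verdicts: List[Dict[str, bool]]) -> Dict[str, float]:
    """Per-criterion share of answers rated as passing; a missing verdict counts as a fail."""
    if not verdicts:
        return {c: 0.0 for c in CRITERIA}
    return {c: sum(bool(v.get(c, False)) for v in verdicts) / len(verdicts) for c in CRITERIA}

# One dict per evaluated answer, e.g. filled in by human reviewers (hypothetical data).
verdicts = [
    {"correct": True, "complete": True, "concise": False, "generation_successful": True},
    {"correct": True, "complete": False, "concise": True, "generation_successful": True},
]
for criterion, rate in pass_rates(verdicts).items():
    print(f"{criterion:22s} {rate:.0%}")
```

Counting missing verdicts as failures keeps gaps in the rating visible instead of silently inflating the numbers.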
  13. Evaluation on Text Results: LLM as a Judge

    User: Asking a Question → LLM: Generating an Answer → answer
    LLM as a Judge, Human Eval → Statistics
    Evaluation criteria: • Correct • Complete • Concise • Relevant • Contradiction-free • Language • Style • … • Generation successful
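With an LLM as a judge, a second model rates each answer against the same criteria a human would use. A minimal sketch, assuming the judge is asked for a JSON object of true/false verdicts; the prompt wording is hypothetical, and `llm` stands for any function that sends a prompt to a model and returns text.

```python
import json
from typing import Callable, Dict, List

def judge_prompt(question: str, answer: str, criteria: List[str]) -> str:
    listed = "\n".join(f"- {c}" for c in criteria)
    return ("You are grading an answer produced by a chatbot.\n"
            f"Question: {question}\nAnswer: {answer}\n"
            f"Decide for each criterion whether the answer satisfies it:\n{listed}\n"
            "Reply with a JSON object mapping each criterion to true or false.")

def llm_judge(llm: Callable[[str], str], question: str, answer: str,
              criteria: List[str]) -> Dict[str, bool]:
    """Per-criterion verdicts from a judge LLM; anything unparsable or missing counts as a fail."""
    try:
        raw = json.loads(llm(judge_prompt(question, answer, criteria)))
    except json.JSONDecodeError:
        raw = {}
    if not isinstance(raw, dict):
        raw = {}
    return {c: raw.get(c) is True for c in criteria}
```

Because the verdicts have the same shape as human ratings, both can feed the same statistics, and a sample of human evaluations can be used to check how far the judge agrees with people.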
  14. RAG System Architecture: Evaluation

    • Doc Loader: Image2Text → chunks
    • Anonymizer: Enforcing Privacy → chunks
    • Vector DB: Searching facts matching the question → chunks
    • User: Asking a Question → question
    • LLM: Generating an Answer → answer
    SystemPrompt • Question • Chunks
    Metrics: • Contextual Relevance • Faithfulness • Answer Relevance • Conciseness
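For online monitoring, each production turn can be scored on these four metrics and flagged when a score drops. The slide names the metrics but not exactly which inputs each one compares; the pairing below follows the common "RAG triad" reading (contextual relevance: question vs. chunks; faithfulness: answer vs. chunks; answer relevance: answer vs. question) and is an assumption. `score` is a hypothetical scorer, e.g. an LLM judge returning a value between 0 and 1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class RagTurn:
    system_prompt: str  # kept for traceability; not used by the metrics below
    question: str
    chunks: List[str]
    answer: str

def metric_inputs(turn: RagTurn) -> Dict[str, Dict[str, str]]:
    """Assumed pairing of metrics and inputs (common 'RAG triad' reading, plus conciseness)."""
    context = "\n".join(turn.chunks)
    return {
        "contextual_relevance": {"question": turn.question, "chunks": context},
        "faithfulness": {"chunks": context, "answer": turn.answer},
        "answer_relevance": {"question": turn.question, "answer": turn.answer},
        "conciseness": {"answer": turn.answer},
    }

def monitor(turn: RagTurn, score: Callable[[str, Dict[str, str]], float],
            threshold: float = 0.5) -> Dict[str, float]:
    """Score one production turn; return only the metrics below the alert threshold."""
    scores = {name: score(name, inputs) for name, inputs in metric_inputs(turn).items()}
    return {name: s for name, s in scores.items() if s < threshold}
```

Separating the metrics this way helps locate a failure: low contextual relevance points at retrieval, low faithfulness at generation.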
  15. PDF Tables: Simple Chunking

    Raw text that simple chunking produced from a German table of surface types with their peak runoff coefficient (CS) and mean runoff coefficient (CM):
    Art der Flächen Spitzen-\nabfluss-\nbeiwert \nCSMittlerer \nAbfluss-\nbeiwert \nCM\nWasser- \nundurch-lässige Flächen, z. B. Flachdach (≤ 3°) 1,0 0,9\nBetonflächen 1,0 0,9\nRampen 1,0 1,0\nBefestigte Flächen mit Fugendichtung1,0 0,8\nSchwarzdecken (Asphalt) 1,0 0,9\nPflaster mit Fugenver- guss1,0 0,8\nKiesschüttdächer 0,8 0,8\nBegrünte Dach- flächenFür Intensivbegrünun-gen ab 30 cm Aufbau- dicke (≤ 5°)0,2 0,1\nFür Extensivbegrünun-gen ab 10 cm Aufbau-dicke (≤ 5°)0,4 0,2\nFür Extensivbegrünun- gen unter 10 cm Aufbau-dicke (≤ 5°)0,5 0,3\nFür Extensivbegrünung (> 5°)0,7 0,4QRArCS\uf0d7\uf0d7
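Part of what goes wrong with simple chunking is that fixed-size splitting ignores table structure: rows are cut mid-word and values lose their column headers. A minimal sketch with a generic character chunker (not necessarily the chunker used in the talk); the three rows are taken from the table above, translated and re-typed as plain text for illustration.

```python
from typing import List

def simple_chunks(text: str, size: int, overlap: int = 0) -> List[str]:
    """Fixed-size character chunking that ignores document structure such as table rows."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    return [text[i:i + size] for i in range(0, len(text), step)]

# Three rows of the table above, translated and re-typed as plain text (illustrative).
table_text = ("Type of surface | CS | CM\n"
              "Concrete surfaces | 1.0 | 0.9\n"
              "Ramps | 1.0 | 1.0\n"
              "Gravel roofs | 0.8 | 0.8\n")
for chunk in simple_chunks(table_text, size=30):
    print(repr(chunk))
```

The first chunk ends in "Conc" and the second starts with "rete surfaces", so neither chunk states on its own that the 0.9 is the mean runoff coefficient of concrete surfaces.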
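  16. PDF Tables: Multi-Vector

    Table Caption: Tabelle 85: Dachaufbauten und Abflussbeiwerte…. (Table 85: Roof constructions and runoff coefficients….)
    Table Summary: Die Tabelle beschreibt….. (The table describes…..)
    Table CSV: ,Art der Flächen,Spitzenabflussbeiwert,…. Wasser-undurchlässige Flächen, Flachdach (<=3),1.0,0.9 …. (,Type of surface,Peak runoff coefficient,…. Impermeable surfaces, flat roof (<=3),1.0,0.9 ….)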
  17. PDF Tables: Multi-Vector

    Vector DB: Searching facts matching the question
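The multi-vector idea on slides 16 and 17 indexes several representations of one table (caption, summary, CSV) while retrieval returns the whole table once. A minimal sketch under these assumptions: `embed` is a toy bag-of-words stand-in for an embedding model, `MultiVectorIndex` is a hypothetical in-memory class, and the summary text is invented because the slide's summary is elided.

```python
import re
from collections import Counter
from math import sqrt
from typing import Dict, List, Tuple

def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption): stands in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class MultiVectorIndex:
    """Index several representations per table, but return the whole parent table."""

    def __init__(self) -> None:
        self.vectors: List[Tuple[Counter, str]] = []  # (vector, parent id)
        self.parents: Dict[str, str] = {}             # parent id -> full table text

    def add(self, parent_id: str, full_text: str, representations: List[str]) -> None:
        self.parents[parent_id] = full_text
        for rep in representations:
            self.vectors.append((embed(rep), parent_id))

    def search(self, question: str, k: int = 1) -> List[str]:
        q = embed(question)
        ranked = sorted(self.vectors, key=lambda v: cosine(q, v[0]), reverse=True)
        hits: List[str] = []
        for _, parent_id in ranked:
            if len(hits) >= k:
                break
            if parent_id not in hits:  # deduplicate: one hit per parent
                hits.append(parent_id)
        return [self.parents[p] for p in hits]

table_csv = ("Type of surface,Peak runoff coefficient,Mean runoff coefficient\n"
             "Gravel roofs,0.8,0.8\n")
index = MultiVectorIndex()
index.add("table-85", table_csv, [
    "Table 85: Roof constructions and runoff coefficients",              # caption
    "The table describes runoff coefficients for different roof types",  # hypothetical summary
    table_csv,                                                            # CSV
])
index.add("pluvia", "Pluvia is a Latin word meaning rainfall.",
          ["Pluvia is a Latin word meaning rainfall."])
print(index.search("Which runoff coefficient do gravel roofs have?"))
```

Each representation gives the question another chance to match, while the LLM still receives the complete table rather than a fragment.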
  18. Reranking

    Vector DB: Searching facts matching the question … → LLM: Generating an Answer
    Reranking prompt: Rank the following documents according to their relevance for the given question. Q: What is Pluvia?
    Retrieved order: 1 2 3 4 5 6 → reranked order: 6 1 2 5 3 4
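A minimal sketch of LLM-based reranking between retrieval and answer generation. It reuses the slide's instruction sentence; the numbered-document layout, the reply format (numbers separated by spaces) and the fallback (unranked documents appended in retrieval order) are assumptions, and `llm` is any hypothetical prompt-to-text function.

```python
import re
from typing import Callable, List

def rerank_prompt(question: str, docs: List[str]) -> str:
    numbered = "\n".join(f"[{i}] {d}" for i, d in enumerate(docs, start=1))
    return ("Rank the following documents according to their relevance for the given question.\n"
            f"Q: {question}\n{numbered}\n"
            "Reply with the document numbers, most relevant first, separated by spaces.")

def rerank(llm: Callable[[str], str], question: str, docs: List[str]) -> List[str]:
    """Reorder docs by the LLM's ranking; skip invalid or repeated numbers, append unranked docs."""
    reply = llm(rerank_prompt(question, docs))
    order: List[int] = []
    for token in re.findall(r"\d+", reply):
        idx = int(token)
        if 1 <= idx <= len(docs) and idx not in order:
            order.append(idx)
    order += [i for i in range(1, len(docs) + 1) if i not in order]  # nothing gets lost
    return [docs[i - 1] for i in order]
```

With the reply "6 1 2 5 3 4" from the slide, six retrieved documents come back in exactly that order; a malformed reply degrades to the original retrieval order instead of failing.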
  19. Nonetheless, beware of your boss’s nephew…