Bringing your first LLM into Production, or: Beware of your Boss’s Nephew – by Christian Hidber
Blog posts and notebooks on «build your chatbot in 5 minutes» or «chat with your data now» are abundant. Deploying a Large Language Model (LLM) application in a production environment—and keeping it running smoothly—is a significantly more complex endeavor.
Using LLM application demos from an industrial planning software product, we'll explore how to select promising applications and understand the differences between ad-hoc and production prompts. We then dive hands-on into LLM-as-a-judge evaluations and online monitoring of retrieval-based solutions. The talk concludes with some of the issues you might encounter, as well as some – opinionated – recommendations.
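To give a flavour of the LLM-as-a-judge idea ahead of the talk, here is a minimal sketch (not taken from the talk material): a judge prompt asks a second model to grade an answer against the retrieved context on a 1–5 faithfulness scale and return JSON. The OpenAI Python client, the model name gpt-4o-mini, and the example data are assumptions for illustration only.

```python
# Minimal LLM-as-a-judge sketch (illustrative only; not from the talk).
# Assumes the OpenAI Python client and an API key in OPENAI_API_KEY.
import json
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are a strict evaluator for a retrieval-based assistant.
Given the retrieved CONTEXT, the user QUESTION and the assistant ANSWER,
rate how faithful the answer is to the context on a scale from 1 (contradicts
or ignores the context) to 5 (fully grounded in the context).
Respond with JSON: {{"score": <1-5>, "reason": "<one sentence>"}}.

CONTEXT:
{context}

QUESTION:
{question}

ANSWER:
{answer}
"""

def judge_answer(context: str, question: str, answer: str) -> dict:
    """Ask a judge model to score one answer; returns {'score': ..., 'reason': ...}."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",                        # assumed judge model, swap in your own
        temperature=0,                              # deterministic grading
        response_format={"type": "json_object"},    # ask for parseable JSON
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(
                       context=context, question=question, answer=answer)}],
    )
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    # Hypothetical planning-software example, purely for illustration.
    verdict = judge_answer(
        context="Pipe DN 90 requires a minimum slope of 1.5 %.",
        question="What slope does a DN 90 pipe need?",
        answer="A DN 90 pipe needs a minimum slope of 1.5 %.",
    )
    print(verdict)
```

In online monitoring, a scorer like this would typically run asynchronously over a sample of production traffic, with the scores tracked as a metric rather than inspected one by one.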
About Christian:
Christian lives in Zurich, Switzerland, and works as a consultant focusing on real-world machine learning applications. He earned his PhD in mathematics from ETH Zurich and completed a postdoctoral fellowship at the International Computer Science Institute in Berkeley. Christian has been developing and architecting IT solutions for the last two decades. Currently he’s applying artificial intelligence to Geberit’s planning software ProPlanner.