Building Production-Ready Apps with AI

Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

Have you ever interacted with a multi-turn conversational model trained through extensive transformer architectures for dynamic user engagement?

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Have you ever used ChatGPT?

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Foundation Models Prompts RAG Vector Database Fine-Tuning Few-Shot Learning Context Hallucinations Zero-Shot Learning

Slide 7

Slide 7 text

Building production- ready apps with LLMs on AWS, without the confusing slang

Slide 8

Slide 8 text

@slobodan_ How do the Large Language Models (LLMs) work?

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

Source: https://platform.openai.com/tokenizer

Slide 22

Slide 22 text

@slobodan_ Prompts

Slide 23

Slide 23 text

@slobodan_ Prompts •Prompts are just instructions. •You tell an LLM what you want, and the LLM tries to reply based on its training. •More detailed and better explanation = better answer. •LLM will always answer, but answers are not always based on truth.

Slide 24

Slide 24 text

@slobodan_ Slobodan Stojanović CTO and co-founder of Vacation Tracker co-author of Serverless Apps with Node.js book AWS Serverless Hero JS Belgrade meetup organizer

Slide 25

Slide 25 text

@slobodan_ Models

Slide 26

Slide 26 text

@slobodan_ LLM Models

Slide 27

Slide 27 text

@slobodan_ LLM Models

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

@slobodan_ OpenAI - GPT-4 Turbo •OpenAI has multiple models, but GPT-4 Turbo is the best one. •Price: •Input: US$ 10.00 / 1M tokens •Output: US$ 30.00 / 1M tokens •Quality: ChatGPT level*

Slide 31

Slide 31 text

@slobodan_ Generative Pre-trained Transformer (GPT)

Slide 32

Slide 32 text

@slobodan_ Multimodal Models

Slide 33

Slide 33 text

@slobodan_

Slide 34

Slide 34 text

@slobodan_

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

@slobodan_ Anthropic Claude •Claude 3 oﬀers 3 models: Opus, Sonnet and Haiku •Claude 3 Opus is at the "GPT-4 level." •Price: •Input: US$ 15 / 1M tokens (Opus), US$ 0.25 / 1M (Haiku) •Output: US$ 75 / 1M tokens (Opus), US$ 1.25 / 1M (Haiku)

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

@slobodan_ Google Gemini •People had high hopes for Google LLM. •Good quality and an impressive 1M context. •Price*: •Input: US$ 7 / 1M tokens •Output: US$ 21 / 1M tokens

Slide 39

Slide 39 text

@slobodan_ Open-source models

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

@slobodan_ Mistral & Mixtral •First "WOW!" open-source model. •Both open-source (Mixtral 8x7B and 8x22B) and commercial models (Mistral Large). •Price*: •Input: US$ 0.45 / 1M tokens (Mixtral 8x7B), US$ 8 / 1M (Mistral Large). •Output: US$ 0.7 / 1M tokens (Mixtral 8x7B), US$ 24 / M (Mistral Large)

Slide 42

Slide 42 text

@slobodan_ Mixture of Experts

Slide 43

Slide 43 text

@slobodan_

Slide 44

Slide 44 text

@slobodan_

Slide 45

Slide 45 text

No content

Slide 46

Slide 46 text

@slobodan_ Meta Llama3 •Llama 3 is Faaaast! And open-source. •Not multi-modal yet, but very good. •Price*: •Input: US$ 0.4 / 1M tokens (8B), US$ 2.65 / 1M (70B) •Output: US$ 0.6 / 1M tokens (8B), US$ 3.5 / 1M (70B)

Slide 47

Slide 47 text

@slobodan_ Parameters (8 billion vs 70 billion)

Slide 48

Slide 48 text

@slobodan_ Small bits of memory that help an LLM decide what to say next.

Slide 49

Slide 49 text

@slobodan_ How can you use these models today?

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

@slobodan_ OpenAI models are available via OpenAI API and Microsoft Azure AI

Slide 52

Slide 52 text

@slobodan_ Azure OpenAI •OpenAI models integrated in Azure. •Same pricing as OpenAI API, but paid with your Azure subscription. •You can control the region where the model is deployed. •Microso"'s ToC and SLA: •"Azure OpenAI doesn't use customer data to retrain models."

Slide 53

Slide 53 text

No content

Slide 54

Slide 54 text

@slobodan_ Amazon Bedrock •Amazon's Generative AI platform •Amazon hosts foundation models and oﬀers APIs for them. Select your region. •Amazon's ToC and SLA: •"Your training data isn't used to train the base Titan models or distributed to third parties."

Slide 55

Slide 55 text

@slobodan_ Amazon Bedrock Models •Claude (all models) •Mistral AI (most models) •LLama (all models) •Stable Diﬀusion •Cohere (I just tried it initially and forgot about it) •Jurassic (I have no idea what's this) •Amazon's Titan (mostly useless at the moment)

Slide 56

Slide 56 text

@slobodan_

Slide 57

Slide 57 text

@slobodan_ Some models are available in speciﬁc regions only!

Slide 58

Slide 58 text

@slobodan_ US East (N. Virginia) and US West (Oregon) regions has most of the models.

Slide 59

Slide 59 text

@slobodan_

Slide 60

Slide 60 text

@slobodan_ Amazon Bedrock Features •API •Playgrounds •Agents •Fine tuning •Guardrails •Knowledge base (managed RAG*)

Slide 61

Slide 61 text

@slobodan_ What are AI Agents?

Slide 62

Slide 62 text

@slobodan_ LLMs are good at responding to wel-deﬁned and focused tasks

Slide 63

Slide 63 text

@slobodan_ It's better to split a complex task into many smaller and focused sub-tasks

Slide 64

Slide 64 text

@slobodan_

Slide 65

Slide 65 text

@slobodan_ AI Agent frameworks and tools •Langchain •AutoGPT •AutoGen, BabyAGI, and many others •OpenAI Assistants •Bedrock Agents •Or build your own simple agent!

Slide 66

Slide 66 text

@slobodan_ Can I host/run a dedicated model on Bedrock?

Slide 67

Slide 67 text

@slobodan_ Yes, but…

Slide 68

Slide 68 text

@slobodan_

Slide 69

Slide 69 text

@slobodan_

Slide 70

Slide 70 text

@slobodan_ Don't like Azure and AWS? There are also many other places with AI models!

Slide 71

Slide 71 text

@slobodan_ Other platforms •Cloudflare AI: mostly open-source models. •Groq: open-source models, fast! •LPU™ (Language Processing Unit) •Google Vertex AI Studio: Gemini models •Most of the models also oﬀer platforms. •Many others…

Slide 72

Slide 72 text

@slobodan_ Self-hosting of open-source models

Slide 73

Slide 73 text

@slobodan_ How to run an LLM locally •LM Studio (many open-source LLMs, easy download, oﬀers playgrounds) •Ollama •Hugging Face

Slide 74

Slide 74 text

@slobodan_ LLMs require strong machines and a lot of memory!

Slide 75

Slide 75 text

@slobodan_ What can you build using LLMs today?

Slide 76

Slide 76 text

No content

Slide 77

Slide 77 text

@slobodan_ What are LLMs good at? •Text summarization, labeling, and structuring •Text generation •Personalization and translations •And many more things…

Slide 78

Slide 78 text

@slobodan_ LLMs can improve UX for many products

Slide 79

Slide 79 text

@slobodan_

Slide 80

Slide 80 text

@slobodan_

Slide 81

Slide 81 text

@slobodan_

Slide 82

Slide 82 text

@slobodan_ But we can go a step further: Send a simple email!

Slide 83

Slide 83 text

@slobodan_ Show me the prompts!

Slide 84

Slide 84 text

@slobodan_ Simplified version

Slide 85

Slide 85 text

@slobodan_ Simplified version

Slide 86

Slide 86 text

@slobodan_ Simplified version

Slide 87

Slide 87 text

@slobodan_ "Prompt engineering"

Slide 88

Slide 88 text

No content

Slide 89

Slide 89 text

@slobodan_ Zero-shot vs few-shot prompting

Slide 90

Slide 90 text

@slobodan_ Zero-shot prompting

Slide 91

Slide 91 text

@slobodan_ Few-shot prompting

Slide 92

Slide 92 text

@slobodan_ Few-shot is more expensive, but it gives more accurate answers

Slide 93

Slide 93 text

@slobodan_ Context is a problem, you can't ﬁt everything in a single prompt

Slide 94

Slide 94 text

@slobodan_ Retrieval-augmented generation (RAG)

Slide 95

Slide 95 text

@slobodan_

Slide 96

Slide 96 text

No content

Slide 97

Slide 97 text

@slobodan_ Source: https://cthiriet.com/blog/infinite-memory-llm

Slide 98

Slide 98 text

@slobodan_ Vector Databases

Slide 99

Slide 99 text

@slobodan_ Do you actually need a vector database?

Slide 100

Slide 100 text

@slobodan_ Popular Vector Databases •Pinecone, etc. •ElasticSearch, OpenSearch •Pgvector PostgreSQL extension •Or you can use S3 and similar!

Slide 101

Slide 101 text

@slobodan_ Security

Slide 102

Slide 102 text

@slobodan_ Should I host my own model?

Slide 103

Slide 103 text

The correct answer in 99.999% of cases. NO!

Slide 104

Slide 104 text

@slobodan_ All LLM platforms “We don't train our models with your data!” Every single one of them!

Slide 105

Slide 105 text

@slobodan_ Prompt injections

Slide 106

Slide 106 text

@slobodan_

Slide 107

Slide 107 text

@slobodan_ OWASP Top 10 for LLM •Prompt Injection •Insecure Output Handling •Training Data Poisoning •Model Denial of Service •Supply Chain Vulnerabilities Source: https://owasp.org/www-project-top-10-for-large-language-model-applications/

Slide 108

Slide 108 text

@slobodan_ OWASP Top 10 for LLM •Sensitive Information Disclosure •Insecure Plugin Design •Excessive Agency •Overreliance •Model The" Source: https://owasp.org/www-project-top-10-for-large-language-model-applications/

Slide 109

Slide 109 text

@slobodan_ What about AGI?

Slide 110

Slide 110 text

@slobodan_ Artiﬁcial General Intelligence (AGI)

Slide 111

Slide 111 text

@slobodan_ ChatGPT “AGI is like a Swiss Army knife for the brain, brilliantly juggling any task you throw at it—from cracking jokes to solving quantum physics puzzles!” Not an AGI yet :)

Slide 112

Slide 112 text

@slobodan_

Slide 113

Slide 113 text

@slobodan_ LLMs are not magic, but you can build amazing things using them.

Slide 114

Slide 114 text

No content

Slide 115

Slide 115 text

@slobodan_ Questions? Twitter: @slobodan_ Linkedin: sstojanovic I guess you can simply use ChatGPT, too.