Slide 1

LLM Development Landscape Kamolphan Liwprasert (Fon) MLOps Consultant, AIMET.tech Google Developer Expert - Cloud

Slide 2

Self-introduction

Slide 3

AIMET aimet.tech

Slide 4

LLM Development Landscape
✨ Overview: the big picture of building LLM apps
✨ Concepts worth knowing about LLMs
✨ Ways to develop an LLM application
✨ Frameworks you can choose from

Slide 5

No content

Slide 6

No content

Slide 7

No content

Slide 8

Overview: LLM application development

Slide 9

Definition: a large language model (LLM) is a computational model capable of language generation or other natural language processing tasks. https://en.wikipedia.org/wiki/Large_language_model

Slide 10

Definition: multimodal LLM. Multimodal = characterized by several different modes of activity or occurrence. https://research.google/blog/multimodal-medical-ai/

Slide 11

In what ways can we call an AI model?

Slide 12

Model serving: the application (📱 💻 🌐) calls the model (🤖) through an API, i.e. the classic client-server pattern.
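The client-server pattern above can be sketched in a few lines. This is a minimal illustration assuming an OpenAI-compatible chat-completions endpoint; the URL, key, and model name are placeholders, not a specific provider's values.

```python
import json
import urllib.request

# Placeholder endpoint and key: substitute your provider's values.
API_URL = "https://api.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

def build_chat_request(prompt: str, model: str = "example-model") -> dict:
    """Build the JSON body for a chat-completion call (the client side)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def call_llm(prompt: str) -> str:
    """POST the request to the model server and return the generated text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

The point is the shape of the exchange: the app owns only the HTTP request and response parsing, while the model lives entirely on the server.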

Slide 13

On-device AI ("edge"): the application (📱 💻 🌐) runs the model directly on the device.

Slide 14

Challenge?

Slide 15

No content

Slide 16

Language Model APIs

Slide 17

🏆 LMSYS Chatbot Arena Leaderboard https://chat.lmsys.org/?leaderboard

Slide 18

Artificial Analysis: a site for comparing AI models https://artificialanalysis.ai/

Slide 19

Artificial Analysis: Quality vs Price https://artificialanalysis.ai/

Slide 20

Artificial Analysis: API Prices https://artificialanalysis.ai/

Slide 21

Services to Host Language Models

Slide 22

Why self-host an LLM?
💲 Cost-efficient in the long term (e.g. on-premise), though you will need to tune latency to make the model fast enough
⚙ Customization and fine-tuning, with no lock-in to a particular model
🔒 Security compliance and data residency / privacy

Slide 23

Run an LLM locally:
LlamaFile github.com/Mozilla-Ocho/llamafile
Ollama ollama.com/
LM Studio lmstudio.ai/
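Once one of these local runners is up, calling it looks just like calling a hosted API, only against localhost. A minimal sketch against Ollama's default local REST endpoint (`/api/generate` on port 11434); the model name is a placeholder for whatever model you have pulled.

```python
import json
import urllib.request

# Ollama's default local endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> dict:
    """JSON body for Ollama's /api/generate (stream disabled for one reply)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate_local(model: str, prompt: str) -> str:
    """Call the locally running Ollama server and return the response text."""
    body = json.dumps(build_generate_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

Because the server runs on your machine, no API key is needed and no data leaves the device, which is exactly the privacy argument from the self-hosting slide.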

Slide 24

LLM Development Frameworks

Slide 25

LangChain 🦜🔗 A Python / JS framework for developing applications powered by large language models (LLMs). https://www.langchain.com/langchain

Slide 26

LlamaIndex: turn your enterprise data into production-ready LLM applications. (Python / TypeScript) https://www.llamaindex.ai/

Slide 27

Semantic Kernel from Microsoft Semantic Kernel is an SDK that integrates Large Language Models (LLMs) like OpenAI, Azure OpenAI, and Hugging Face with conventional programming languages like C#, Python, and Java. https://github.com/microsoft/semantic-kernel

Slide 28

It's fine not to use any of these frameworks.

Slide 29

RAG Concept: Retrieval-Augmented Generation

Slide 30

RAG: Ask → Retrieve from DB → Generate answer
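The three steps above fit in a small sketch. This toy version swaps the real embedding model for bag-of-words counts and the vector DB for an in-memory list, purely to make the ask → retrieve → generate flow visible; all names and documents here are invented for illustration.

```python
import math
from collections import Counter

# Toy corpus standing in for a document database.
DOCS = [
    "vLLM is a fast serving engine for large language models",
    "RAG retrieves relevant documents before generating an answer",
    "Vector databases store embeddings for similarity search",
]

def embed(text: str) -> Counter:
    """Stand-in 'embedding': word counts (real systems use an embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, k: int = 1) -> list[str]:
    """Ask -> retrieve: rank documents by similarity to the question."""
    q = embed(question)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def rag_prompt(question: str) -> str:
    """Retrieve -> generate: stuff the top documents into the LLM prompt."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

The final `rag_prompt` string is what you would send to the model: the retrieval step grounds the generation step in your own data.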

Slide 31

Document search example using a vector DB

Slide 32

Vector Database https://www.graft.com/blog/top-vector-databases-for-ai-projects

Slide 33

RAG vs Fine-tuning

Slide 34

Agentic Workflow Agentic = behaves like an agent
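"Behaves like an agent" concretely means a loop: observe, decide, act, feed the result back. A minimal sketch of that control flow, with the LLM replaced by a hard-coded rule so the loop itself is the focus; the decision function, tool names, and question format are all invented for illustration.

```python
import re

def llm_decide(question: str, observations: list[str]) -> dict:
    """Stub 'LLM' policy: either request a tool call or produce an answer.
    A real agent would get this decision from a model."""
    if observations:  # a tool already returned a result: answer with it
        return {"action": "answer", "text": f"The result is {observations[-1]}."}
    m = re.search(r"(\d+)\s*\+\s*(\d+)", question)
    if m:             # arithmetic spotted: delegate to the 'add' tool
        return {"action": "tool", "tool": "add",
                "args": [int(m.group(1)), int(m.group(2))]}
    return {"action": "answer", "text": "I don't know."}

# The agent's toolbox: name -> callable.
TOOLS = {"add": lambda a, b: str(a + b)}

def run_agent(question: str, max_steps: int = 3) -> str:
    """Minimal agent loop: decide, act, observe, repeat until an answer."""
    observations: list[str] = []
    for _ in range(max_steps):
        decision = llm_decide(question, observations)
        if decision["action"] == "answer":
            return decision["text"]
        result = TOOLS[decision["tool"]](*decision["args"])
        observations.append(result)  # feed the tool output back in
    return "Gave up."
```

Frameworks like CrewAI and AutoGen (next slides) package exactly this loop, plus multi-agent coordination, tool schemas, and memory.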

Slide 35

Why Agentic? https://www.vellum.ai/blog/agentic-workflows-emerging-architectures-and-design-patterns

Slide 36

Agentic Workflow https://www.vellum.ai/blog/agentic-workflows-emerging-architectures-and-design-patterns

Slide 37

Crew AI https://www.crewai.com/

Slide 38

AutoGen https://github.com/microsoft/autogen

Slide 39

Azure: Copilot Studio
GCP: Vertex AI Agent Builder

Slide 40

Inference / Serving

Slide 41

Text Generation Inference https://huggingface.co/docs/text-generation-inference/index

Slide 42

vLLM: model serving for LLMs. Easy, fast, and cheap LLM serving for everyone. vLLM is fast with:
✅ State-of-the-art serving throughput
✅ Efficient management of attention key and value memory with PagedAttention
✅ Continuous batching of incoming requests
✅ Fast model execution with CUDA/HIP graphs
✅ Quantization: GPTQ, AWQ, SqueezeLLM, FP8 KV cache
✅ Optimized CUDA kernels
https://github.com/vllm-project/vllm
(Benchmark chart: throughput, higher is better)

Slide 43

Responsible AI

Slide 44

https://ai.google/responsibility/responsible-ai-practices/

Slide 45

Google's Secure AI Framework https://safety.google/cybersecurity-advancements/saif/

Slide 46

Responsible AI
✅ Always verify correctness
✅ Human-centered design: design for the people who will use it
⚠ Watch out for data privacy
⚠ Biases and fairness: keep the system fair to users

Slide 47

Resources

Slide 48

https://www.promptingguide.ai/

Slide 49

A couple of events to plug :)
Technologista, by PyLadies x Women Techmakers: Saturday 26 October 2024 @ Cleverse. Register now: bit.ly/technologista-2024
DevFest Cloud Bangkok, by GDG Cloud Bangkok: Sunday 3 November 2024 @ K+ Building Samyan. Register now: bit.ly/devfest-cloud-bkk24

Slide 50

LLM Development Landscape Kamolphan Liwprasert (Fon) MLOps Consultant, AIMET.tech Google Developer Expert - Cloud