Slide 1

No content

Slide 2

Apache Airflow
Building and deploying LLM applications with Apache Airflow!
ODSC West, 31 Oct 2024

Slide 3

Kaxil Naik
Apache Airflow Committer & PMC Member
Senior Director of Engineering @ Astronomer
@kaxil

Slide 4

The Community
■ 30M Monthly Downloads
■ 3.1K Contributors
■ 37K GitHub Stars
■ 54K Slack Community

Slide 5

Chat about Chatbots IRL
Business Leader: Shouldn’t we be doing something with LLMs?
Intern: Hey, I built this notebook with LangChain!
Data Engineer: Okay, yeah, let me put that into production.

Slide 6

Going from “Idea to Production” with LLM apps involves solving a lot of data engineering problems:
■ Ingestion from several sources
■ Day 2 operations on data pipelines: changing data sources, network blips, etc.
■ Data preparation: data cleaning & transformation
■ Data privacy: redacting PII, tracking data lineage for audits
■ Data freshness: timeliness, SLAs
■ Model deployment & monitoring
■ Experimentation & fine-tuning: different models, LLMs, SLMs, etc.
■ Feedback loops

Slide 7

Typical architecture for a Q&A use case using an LLM: Retrieval Augmented Generation (RAG), with Airflow handling RAG data ingestion & processing.
Document Loading (PDFs, URLs) → Splitting (splits) → Storage (vectorstore / database) → Retrieval (relevant splits) → Output (prompt + query → LLM)
Source: https://python.langchain.com/docs/use_cases/question_answering/
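The stages above can be sketched as plain Python functions. This is a minimal, library-free sketch: the `embed` and `retrieve` helpers below are toy stand-ins for a real embedding model and vector store, and the chunk size is arbitrary.

```python
# Minimal RAG flow: load -> split -> embed -> store -> retrieve.
# All components are toy stand-ins for real loaders, vector DBs, and LLMs.

def split(document: str, chunk_size: int = 40) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]

def embed(text: str) -> list[float]:
    """Toy embedding: normalized letter-frequency vector (real systems use a model)."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]

def retrieve(query: str, store: list[tuple[list[float], str]], k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings are closest to the query's."""
    q = embed(query)
    scored = sorted(store, key=lambda item: sum((a - b) ** 2 for a, b in zip(q, item[0])))
    return [chunk for _, chunk in scored[:k]]

# Ingestion: split documents and store (embedding, chunk) pairs.
docs = ["Airflow schedules and monitors workflows.", "Weaviate stores vector embeddings."]
store = [(embed(c), c) for d in docs for c in split(d)]

# Retrieval: fetch relevant splits for a query, then hand them to an LLM prompt.
relevant = retrieve("How are workflows scheduled?", store)
```

In the real architecture, `store` is a vector database and the final step interpolates `relevant` into a prompt for the LLM call.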

Slide 8

Airflow is a Natural Fit…
■ Python Native: the language of data scientists and ML engineers.
■ Pluggable Compute: GPUs, Kubernetes, EC2, VMs, etc.
■ Common Interface: between data engineering, data science, ML engineering, and operations.
■ Data Agnostic: but data aware.
■ Extensible: standardize custom operators and templates for common DS tasks across the organization.
■ Monitoring & Alerting: built-in features for logging, monitoring, and alerting to external systems.
■ Ingestion: extract and load data into vector DBs and other destinations.
■ Day 2 Ops: handle retries, dependencies, and all other day 2 ops associated with data pipelines.
■ Document Parsing: decorator and Pythonic interfaces for standard LLM tools.

Slide 9

Let’s Talk About a Real Use Case

Slide 10

Problem Statement: We have customers, employees, and community members that ask questions about our product (Astro) and Airflow with answers that exist across several sources of documentation. How do we provide an easy interface for folks to get their questions answered without adding further strain to the team and Airflow Contributors?

Slide 11

ask.astronomer.io

Slide 12

No content

Slide 13

No content

Slide 14

No content

Slide 15

Data Ingestion, Processing, and Embedding
■ Airflow gives a framework to load data from APIs & other sources into LangChain
■ LangChain helps pre-process and split documents into smaller chunks depending on content type
■ After content is split into chunks, each chunk is embedded into vectors (semantic representations)
■ Those vectors are written to Weaviate for later retrieval
Diagram: sources (docs (.md) files, Slack messages, GitHub issues) → pre-process and split into chunks (🦜🔗 LangChain) → embed chunks → write to Weaviate
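The "split depending on content type" step might look like the following. This is a hypothetical, stdlib-only sketch; Ask Astro itself uses LangChain's splitters, and the header-based strategy and chunk size here are illustrative assumptions.

```python
def split_markdown(doc: str) -> list[str]:
    """Split a Markdown doc on '## ' section headers, keeping each section whole."""
    sections, current = [], []
    for line in doc.splitlines():
        if line.startswith("## ") and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections

def split_plain(doc: str, size: int = 200) -> list[str]:
    """Fallback: fixed-size character chunks for unstructured text (Slack, issues)."""
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def split(doc: str, content_type: str) -> list[str]:
    """Dispatch on content type, mirroring per-source chunking strategies."""
    return split_markdown(doc) if content_type == "markdown" else split_plain(doc)

chunks = split("# Title\nIntro\n## Install\npip install\n## Usage\nrun it", "markdown")
```

Keeping structural units (headers, sections) together tends to produce chunks whose embeddings are more semantically coherent than arbitrary character windows.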

Slide 16

RAG (Ingestion) as an Airflow DAG
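The shape of such a DAG can be approximated with plain functions. This is a hedged, stdlib-only sketch: the task names, stub data, and length-based "embedding" are hypothetical; with Airflow installed, each function would carry a @task decorator and the chain would live inside a @dag-decorated function.

```python
# Sketch of an ingestion DAG's task graph as plain functions.
# In Airflow, each function would be a @task and the final line would be
# expressed as task dependencies inside a @dag(schedule=..., ...) function.

def extract_docs() -> list[dict]:
    """Pull raw documents from sources (docs, Slack, GitHub); stubbed here."""
    return [{"source": "docs", "text": "Airflow schedules workflows."}]

def split_docs(docs: list[dict]) -> list[dict]:
    """Split each document into chunks (fixed-size here for simplicity)."""
    return [{"source": d["source"], "chunk": d["text"][i:i + 20]}
            for d in docs for i in range(0, len(d["text"]), 20)]

def embed_chunks(chunks: list[dict]) -> list[dict]:
    """Attach a toy embedding (length-based), standing in for a real model call."""
    return [dict(c, vector=[float(len(c["chunk"]))]) for c in chunks]

def load_to_vector_db(records: list[dict]) -> int:
    """Write records to the vector store; here we just count them."""
    return len(records)

# Task order: extract -> split -> embed -> load.
written = load_to_vector_db(embed_chunks(split_docs(extract_docs())))
```

Splitting the pipeline into tasks like this is what lets Airflow retry, monitor, and assign compute to each stage independently.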

Slide 17

Vector Embedding for Unstructured Data
Chunk of unstructured text: “Apache Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.”
→ Embedding Model →
Vector: [0.13450, 0.72421, 0.20943, 0.18699, 0.75932, 0.69794, …]

Slide 18

Prompt Orchestration and Answering
Users can interact with the UI or Slack bot; they both use the same API.
■ Original prompt gets reworded 3x using gpt-3.5-turbo
■ Answer is generated by combining docs from each prompt and making a gpt-4 call
■ State is stored in Firestore and prompt tracing is done through LangSmith
Diagram: user asks a question (web app or Slack bot) → original prompt reworded 3x (🦜🔗 LangChain) to get more related documents → vector DB search with each prompt → combine docs and make final LLM call to answer
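A minimal sketch of that multi-query pattern (reword, search per prompt, dedupe, answer). Both LLM calls are stubbed: `reword` stands in for the gpt-3.5-turbo paraphrasing call and the final string stands in for the gpt-4 call, and the keyword-lookup "vector search" is a hypothetical simplification.

```python
def reword(prompt: str, n: int = 3) -> list[str]:
    """Stub for the gpt-3.5-turbo rewording call: produce n paraphrases."""
    return [f"{prompt} (variant {i + 1})" for i in range(n)]

def search(prompt: str, db: dict[str, list[str]]) -> list[str]:
    """Stub vector search: look up documents keyed by keywords in the prompt."""
    return [doc for kw, docs in db.items() if kw in prompt.lower() for doc in docs]

def answer(question: str, db: dict[str, list[str]]) -> str:
    """Search with the original prompt and every rewording, dedupe, then answer."""
    seen, combined = set(), []
    for p in [question, *reword(question)]:
        for doc in search(p, db):
            if doc not in seen:
                seen.add(doc)
                combined.append(doc)
    # The final gpt-4 call would take `combined` as context; stubbed as a join.
    return f"Answer grounded in {len(combined)} source(s): " + "; ".join(combined)

db = {"airflow": ["Airflow docs on scheduling"], "astro": ["Astro CLI guide"]}
result = answer("How do I schedule a DAG in Airflow?", db)
```

The point of the rewording step is recall: different phrasings land near different documents in embedding space, and deduping the union gives the final call a broader context.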

Slide 19

Using Embeddings
Question → Vector A: [0.13450, 0.72421, 0.20943, 0.18699, 0.75932, 0.69794, …]
Related Document → Vector B: [0.17450, 0.22621, 0.10643, 0.18699, 0.55932, 0.99794, …]
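Relatedness between a question and a document is typically measured as cosine similarity between their vectors; a stdlib-only sketch using the truncated vectors from the slide:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

vector_a = [0.13450, 0.72421, 0.20943, 0.18699, 0.75932, 0.69794]  # question
vector_b = [0.17450, 0.22621, 0.10643, 0.18699, 0.55932, 0.99794]  # related document

score = cosine_similarity(vector_a, vector_b)
```

The high score is what makes the document a retrieval candidate; vector databases like Weaviate perform this comparison (or an approximate version of it) across millions of vectors.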

Slide 20

LLM & Product Feedback Loops
When a user submits feedback, it gets stored in Firestore and LangSmith for later use.
■ Airflow DAGs process feedback async to evaluate answers on helpfulness, relevance, and publicness
■ If an answer is good, it gets stored in Weaviate and can be used as a source for future questions
■ The UI also shows the most recent good prompts on the homepage
Diagram: user rates answer → on a schedule, fetch new runs (input, output, user feedback) → classify Q&A by helpfulness, relevance, and publicness (🦜🔗 LangChain) → if a good answer, write to the vector DB for future answers and mark as good to show on the Ask Astro homepage
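The scheduled feedback DAG's logic amounts to a filter over run records. In this sketch the field names are hypothetical and `classify` is a stub for what is really an LLM classification call:

```python
def classify(run: dict) -> dict:
    """Stub for the LLM classification call: judge helpfulness, relevance,
    and publicness. Here we trust the user's rating and a naive privacy check."""
    return {
        "helpful": run["user_feedback"] == "up",
        "relevant": run["user_feedback"] == "up",
        "public": "internal" not in run["question"].lower(),
    }

def process_feedback(runs: list[dict]) -> list[dict]:
    """Keep only Q&A pairs judged good on every axis; these would be written
    back to the vector DB and surfaced on the homepage."""
    good = []
    for run in runs:
        if all(classify(run).values()):
            good.append({"question": run["question"], "answer": run["answer"]})
    return good

runs = [
    {"question": "How do I pin Airflow?", "answer": "Use constraints files.", "user_feedback": "up"},
    {"question": "What is on the internal roadmap?", "answer": "...", "user_feedback": "up"},
    {"question": "What is a DAG?", "answer": "Wrong answer.", "user_feedback": "down"},
]
good = process_feedback(runs)
```

Running this asynchronously on a schedule keeps the expensive classification calls off the user-facing request path.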

Slide 21

Running this in production meant:
■ Experimenting with different sources of data to ingest
■ Running the pipelines on a schedule and ad hoc (e.g. a new Airflow release)
■ Running the same workloads with variable chunking strategies
■ Needing to retry tasks due to finicky Python libraries and unreliable external services
■ Giving different parts of the workload variable compute
■ Creating standard interfaces to interact with external systems

Slide 22

Running this in production meant:
■ Experimenting with different sources of data to ingest
■ Running the pipelines on a schedule and ad hoc (e.g. a new Airflow release)
■ Running the same workloads with variable chunking strategies
■ Needing to retry tasks due to finicky Python libraries and unreliable external services
■ Giving different parts of the workload variable compute
■ Creating standard interfaces to interact with external systems
Which is what Airflow’s great at!
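The "retry finicky tasks" point is first-class in Airflow (the `retries` and `retry_delay` task arguments); the behaviour it provides amounts to something like this stdlib-only sketch, where `flaky_embed` is a hypothetical stand-in for a call to an unreliable external service:

```python
import time

def run_with_retries(task, retries: int = 3, delay: float = 0.0):
    """Call task(); on failure, wait `delay` seconds and try again, up to
    `retries` extra attempts - roughly what Airflow's `retries` and
    `retry_delay` task arguments do for each task instance."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            attempt += 1
            if attempt > retries:
                raise
            time.sleep(delay)

calls = {"n": 0}

def flaky_embed():
    """Fails twice (an 'unreliable external service'), then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("service blip")
    return "embedded"

result = run_with_retries(flaky_embed, retries=3)
```

Declaring this per task, rather than wrapping calls by hand, is one of the "day 2 ops" wins the deck keeps returning to.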

Slide 23

ask.astronomer.io
github.com/astronomer/ask-astro

Slide 24

Ask Astro - A Reference Implementation

Slide 25

a16z’s Emerging LLM App Stack
Gray boxes show key components of the stack, with leading tools/systems listed; arrows show the flow of data through the stack.
■ Data Pipelines (Databricks, Airflow, Unstructured, etc.)
■ Embedding Model (OpenAI, Cohere, Hugging Face)
■ Vector Database (Pinecone, Weaviate, Chroma, pgvector)
■ Playground (OpenAI, nat.dev, Humanloop)
■ Orchestration (Python/DIY, LangChain, LlamaIndex, ChatGPT)
■ APIs/Plugins (Serp, Wolfram, Zapier, etc.)
■ LLM Cache (Redis, SQLite, GPTCache)
■ Logging/LLMops (Weights & Biases, MLflow, PromptLayer, Helicone)
■ Validation (Guardrails, Rebuff, Guidance, LMQL)
■ App Hosting (Vercel, Steamship, Streamlit, Modal)
■ LLM APIs and Hosting: Proprietary API (OpenAI, Anthropic); Open API (Hugging Face, Replicate); Opinionated Cloud (Databricks, Anyscale, Mosaic, Modal, Runpod); Cloud Provider (AWS, GCP, Azure, Coreweave)
Legend: contextual data provided by app developers to condition LLM outputs; prompts and few-shot examples sent to the LLM; queries submitted by users; output returned to users.

Slide 26

Ask Astro has a few parts of this…
(Same a16z LLM app stack diagram as the previous slide.)

Slide 27

…but there’s even more to consider. Airflow is foundational to best practices for all of this.
Data Governance
■ How do you account for private data?
■ How do you provide transparency into data lineage?
Fine Tuning
■ Does it improve results?
■ How much does it cost?
Feedback Loops
■ Semantic cache for correct responses
■ Ranking sources based on accuracy
■ Prompt clustering – what are people asking?

Slide 28

Community Collaboration
■ Providers
■ Interfaces
■ Patterns and Use Cases

Slide 29

Airflow already supports 5 providers in this space!
(Provider logos shown, including pgvector.)

Slide 30

Thank You! Any Questions?
@kaxil