Gen AI using Airflow 3 | Airflow Summit 2024

Slide 1

Slide 1 text

Gen AI using Airflow 3

Slide 2

Slide 2 text

Introduction
- Kaxil Naik: Airflow Committer & PMC Member, Engineering Leader @ Astronomer
- Ash Berlin-Taylor: Airflow Committer & PMC Member, Engineering Leader @ Astronomer

Slide 3

Slide 3 text

The Changing AI Landscape: Why New Solutions Are Needed!

Slide 4

Slide 4 text

Evolving AI Landscape
- Explosion of AI Models
- Increased Focus on Data Privacy & Control
- GPUs are easily accessible
- Cost Optimization
- Growing Complexity of AI Workflows
- Increasing Need for Experimentation

Slide 5

Slide 5 text

RAG: Retrieval-Augmented Generation

Slide 6

Slide 6 text

What is RAG?
Typical architecture for a Q&A use case using an LLM
Diagram labels: Document Loading, PDFs, URLs, Database, Splitting, Splits, Storage, Vectorstore, Data Store, Retrieval, Query, Relevant Splits, Prompt, LLM, Output
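To make the stages named on this slide concrete, here is a minimal sketch of splitting, storing, retrieving, and prompt building in plain Python. It is an illustration under assumptions, not the speakers' implementation: the word-count `embed` function is a toy stand-in for a real embedding model, the "vector store" is an in-memory list, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 8) -> list:
    """Split a document into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_store(documents: list, chunk_size: int = 8) -> list:
    """'Vector store' as a list of (split, embedding) pairs."""
    return [(c, embed(c)) for d in documents for c in split_text(d, chunk_size)]


def retrieve(query: str, store: list, k: int = 2) -> list:
    """Return the k splits most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, splits: list) -> str:
    """Combine the relevant splits and the query into the prompt sent to the LLM."""
    context = "\n".join(f"- {s}" for s in splits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In a real pipeline the store would be a vector database and `embed` a call to an embedding model; the control flow stays the same.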

Slide 7

Slide 7 text

RAG (Ingestion) as an Airflow DAG
- Large data sets: Dynamic Mapping for a large number of incoming datasets (website content, directories of files, …)
- Unstructured Data: Reading, chunking, and transformation; Python libraries and frameworks for the above, e.g. Unstructured, LangChain, etc.
- Generate and Store Embeddings: Using AI providers: OpenAI, Cohere, etc. Store into Weaviate, PgVector, …
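The fan-out and chunking steps on this slide can be sketched in plain Python. This is a simplified stand-in, assuming character-based windows with overlap; the `Chunk` type and function names are hypothetical. In an Airflow DAG, each call to `chunk_source` would typically become one mapped task instance created with dynamic task mapping (`.expand()`), so that every incoming source is processed independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str  # where the text came from (file path, URL, ...)
    index: int   # position of the chunk within its source
    text: str


def chunk_source(source: str, text: str, size: int = 200, overlap: int = 50) -> list:
    """Split one source's text into overlapping character windows."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append(Chunk(source, index, text[start:start + size]))
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def ingest(sources: dict, size: int = 200, overlap: int = 50) -> list:
    """Fan out over sources (one unit of work per source), then flatten the results."""
    per_source = [chunk_source(name, text, size, overlap) for name, text in sources.items()]
    return [chunk for chunks in per_source for chunk in chunks]
```

The flattened list of chunks is what a downstream step would embed and write to the vector store.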

Slide 8

Slide 8 text

Ask Astro: Data Ingestion, Processing, and Embedding
■ Airflow gives a framework to load data from APIs & other sources into LangChain
■ LangChain helps pre-process and split documents into smaller chunks depending on content type
■ After content is split into chunks, each chunk is embedded into vectors (semantic representations)
■ Those vectors are written to Weaviate for later retrieval
Diagram: sources (Docs (.md) files, Slack Messages, GitHub issues) → Pre-process and split into chunks (🦜🔗 LangChain) → Embed chunks → Write to Weaviate

Slide 9

Slide 9 text

RAG (Ingestion) as an Airflow DAG

Slide 10

Slide 10 text

Challenges
- Python Dependencies: Supporting varied Python configurations and dependencies between tasks
- Selective GPU Execution: Keeping main execution on CPUs, only selectively call out to GPUs on remote clusters
- Dynamic model choice: Change LLM model in response to cost/performance/new features

Slide 11

Slide 11 text

How Airflow 3 Helps

Slide 12

Slide 12 text

Solution part 1: Task Execution Interface
Python dependencies:
- Different Python dependencies for different tasks
Cost-optimal Task Execution:
- Data cleaning, Data transformation with CPUs
- Model training w/ GPU as needed
- less than 10% of tasks in a DAG

Slide 13

Slide 13 text

Current Airflow architecture
Diagram components: DAG File Processor(s), Scheduler(s), Web Server, Worker(s), Airflow Meta Database

Slide 14

Slide 14 text

Architectural decoupling: Task Execution Interface
Diagram components (3.0): DAG File Processor(s), Scheduler, Worker(s), Airflow Meta Database, Web Server, API Server, Task SDK, Task Execution Interface

Slide 15

Slide 15 text

Solution part2 common.llm Selective model choice: - Different model performance & accuracy - Complexity vs. Cost & response time tradeoff - Dynamic selection based on task requirements and constraints AI provider selection: - Based on execution environment (e.g., GPUs, CPUs) - Data security constraints for external vs local models

Slide 16

Slide 16 text

Solution part2 common.llm

Slide 17

Slide 17 text

Solution part2 common.llm

Slide 18

Slide 18 text

Example: Inference as an Airflow DAG
- Rephrase the question: Use both original and re-phrased versions
- Submit and get results: Query all versions of the question; de-duplicate the results; optionally verify and rank the results
- Return results: Return results with sources
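The inference steps on this slide (rephrase, query every version, de-duplicate, rank, return with sources) can be sketched as follows. Everything here is an assumption made for illustration: `rephrase` is a stub where an LLM call would go, `search` is a toy keyword match standing in for a vector search, and the document dictionaries use hypothetical `id`, `text`, and `source` keys.

```python
import re


def tokens(text: str) -> set:
    """Lowercase word set used by the toy keyword search."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rephrase(question: str) -> list:
    """Stub for an LLM rewording call; returns the original plus fixed variants."""
    return [question, question.lower(), f"Explain: {question}"]


def search(query: str, index: list) -> list:
    """Toy search: score each document by the number of words shared with the query."""
    q = tokens(query)
    hits = []
    for doc in index:
        score = len(q & tokens(doc["text"]))
        if score:
            hits.append((doc, score))
    return hits


def answer_sources(question: str, index: list, rephrase_fn=rephrase, top_k: int = 3) -> list:
    """Query all versions of the question, de-duplicate, rank, and return with sources."""
    best = {}  # doc id -> (doc, best score seen); this is the de-duplication step
    for version in rephrase_fn(question):
        for doc, score in search(version, index):
            if doc["id"] not in best or score > best[doc["id"]][1]:
                best[doc["id"]] = (doc, score)
    ranked = sorted(best.values(), key=lambda pair: (-pair[1], pair[0]["id"]))
    return [{"text": d["text"], "source": d["source"], "score": s} for d, s in ranked[:top_k]]
```

In a DAG, the rephrasing, the per-version queries, and the final ranking would naturally be separate tasks, with the per-version queries running in parallel.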

Slide 19

Slide 19 text

AI SQL Assistant: Inference
Users enter a question in natural language in the AI Assist Editor on the UI
■ Original prompt gets reworded 3x using gpt-3.5-turbo
■ DB Schema incl. table & column names & type is retrieved
■ Answer is generated by combining answers from each prompt and making a gpt-4 call
Diagram (🦜🔗 LangChain): User Asks a Question (Web App) → Original Prompt, Rewording 1, Rewording 2, Rewording 3 (reword to get more related SQL queries) → Vector DB search with prompts → DB: DB schema + table & column names and col type → Combine and make final LLM call to answer

Slide 20

Slide 20 text

Challenges and upcoming enhancements
- Batch-triggered Dag Runs & Experimentation: Eliminate the execution date constraint. Concurrent runs of the same DAG, i.e. non-data-interval DAGs.
- Dynamic model choice: common.llm to dynamically change AI provider and model
- Synchronous DAG run: Inference DAGs return results upon completion. Trigger API to support synchronous execution.

Slide 21

Slide 21 text

Solution part 3: Ad-hoc Dag Runs
Batch-triggered Dag Runs
- Non-data-interval based: No reliance on execution dates or schedules.
- Ad-hoc invocation via API calls for inference, allowing multiple instances to be triggered by API calls at the same time.
Enables Experimentation
- Run the same DAG with different parameters simultaneously, independent of the execution date.
- Ideal for AI/ML workflows like:
  - Experimenting with multiple models for embedding
  - Retraining models
  - Experimenting with a new data source for RAG
  - Hyperparameter tuning
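As an illustration of running the same DAG with different parameters, the sketch below builds one trigger request per parameter combination without sending anything. The endpoint path and the `dag_run_id` and `conf` body fields are modeled on the Airflow 2 stable REST API; the Airflow 3 API may differ, so treat the URL shape as an assumption. The DAG id and parameter names in the usage note are hypothetical.

```python
import itertools
import json
import uuid


def build_trigger_requests(base_url: str, dag_id: str, param_grid: dict) -> list:
    """Return (url, json_body) pairs, one per combination of parameters in the grid."""
    # Assumption: Airflow 2 style endpoint; adjust for the API version you target.
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    keys = sorted(param_grid)
    requests_to_send = []
    for values in itertools.product(*(param_grid[k] for k in keys)):
        body = {
            # A unique run id per request so runs do not collide with each other.
            "dag_run_id": f"experiment-{uuid.uuid4().hex[:8]}",
            "conf": dict(zip(keys, values)),
        }
        requests_to_send.append((url, json.dumps(body)))
    return requests_to_send
```

For example, a grid of `{"embedding_model": ["m1", "m2"], "chunk_size": [256, 512]}` yields four requests for a DAG such as `rag_ingest`, each of which could then be POSTed by an authenticated HTTP client.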

Slide 22

Slide 22 text

Solution part 4: Experimentation Tracking
Data Assets
- Dataset renamed to Data Asset to include Models, Reports, Embeddings, etc.
- Versioned Assets: Improved experiment tracking & iterative changes
- Enhanced UI support that allows visualization of “Data Asset Metadata”.
  - Example: RMSE value changes due to different parameters
- Audit: Every version of data assets can be audited and compared across different experimental runs.

Slide 23

Slide 23 text

Solution part5 Synchronous DAG run Consumer of Inference DAG runs need results: - Current model: Final Task in DAG to store results in Blob storage - Ideal to add API support for it - Will support long-running DAGs, since timing is unpredictable Example: - Laurel: Automated timekeeping - Does not require “real-time chatbot style responsesˮ Other examples: - Evaluation of mortgage applications

Slide 24

Slide 24 text

Solution part 5: "Synchronous" DAG run

Slide 25

Slide 25 text

How Airflow 3 helps?
Challenges:
- Explosion of AI Models
- Increased Focus on Data Privacy & Control
- GPUs are easily accessible
- Cost Optimization
- Growing Complexity of AI Workflows
- Increasing Need for Experimentation
Airflow 3 features (labels as they appear on the slide): common.llm, Task Execution Interface, common.llm, Ad-hoc Dag Runs, Data Assets, Task Execution Interface, Sync. DAG run

Slide 26

Slide 26 text

In Summary
- Many organizations already using Airflow for Gen AI applications
- We need your feedback as we add these capabilities into Airflow 3
- Recruiting beta users:
  - Building Gen AI platforms and use cases
- Come speak at the next Airflow Summit about your use case on Airflow 3!