Slide 1

Slide 1 text

Integrating LLMs into Database Systems Education
Kishore Prakash, Shashwat Rao, Rayan Hamza, Jack Lukich, Vatsal Chaudhari, Arnab Nandi

Slide 2

Slide 2 text

LLM-based services are taking over everything

Slide 3

Slide 3 text

LLM-based services are taking over education

Slide 4

Slide 4 text

LLMs taking over education
• Initial Reaction: ban immediately!
• “New Calculator”… “Plagiarism”
• Detect and penalize
• Understandable: Assignments and Exams
• Synthesis and Essay Questions
• Multiple Choice Questions: B+
• Unsupervised / Take-homes?

Slide 5

Slide 5 text

“Banning ChatGPT” is not an option
• Too late: Pervasive use, variants
• Readying students for an AI-enabled future
• Onus is on educators to discover how to integrate LLMs into educational infrastructure

Slide 6

Slide 6 text

Where does an LLM fit into the education landscape?

Slide 7

Slide 7 text

Class Roles: Where does an LLM fit in?
• Instructor
• Teaching Assistant
• Textbook
• Teaching Tools / Software / Autograder
• Tutor

Slide 8

Slide 8 text

Intuition behind “Tutor”
With infinite resources, what would we give every student?
A personal tutor who assists the student in their learning journey.

Slide 9

Slide 9 text

Our Vision: DB Tutor
• Provide the students with an LLM-powered chat-based interface that prioritizes personalized learning
• Leverage opportunities that are unique to database systems
• Building such a system will take some thought and iteration

Slide 10

Slide 10 text

Why LLMs are not the best fit
• LLMs are designed and trained to get to the right answer as quickly and efficiently as possible
• Getting to the right answer without explanations can impede learning

Slide 11

Slide 11 text

DB Tutor: Challenges
• Bias in Responses
• Students’ over-reliance, critical thinking
• Cheating and Misuse
• Data Privacy and Security
• Sensitivity to prompting

Slide 12

Slide 12 text

Challenge: Bias in Responses
• LLMs have an inherent bias issue
• Training data bias
• Recency bias
• Demographics bias
• Use in learning: amplified effects
• Fix training data, or model output

Slide 13

Slide 13 text

Challenge: Over-Reliance, Critical Thinking
• High convenience = pervasive use
• Long-term dependency
• Loss of independent skills
• Impedes deeper understanding
• Loss of critical thinking (especially ability to notice LLM errors)

Slide 14

Slide 14 text

Challenge: Cheating and Misuse
• “Super Tool” for Misuse
• Easy to generate human-sounding content
• Essay questions, multiple choice
• Are take-home assignments still an option?
• Detection is an arms race
• Previous Disruptions
• Web search, Wikipedia, Calculators

Slide 15

Slide 15 text

Envisioned System Architecture
LLM Infrastructure (software and data we will set up)
• Course Materials: Syllabus, Slides, Tests
• LLM: Llama v2 or GPT4 via API
Virtual Tutor Portal (what the student interacts with)
• Learning Outcomes Report
• Chatbot
• Database: SQLite DBMS
Virtual Tutor Engine (software we will build in this research activity)
• Data Analysis Engine
• Prompt Engineering

Slide 16

Slide 16 text

Elements of a DB Tutor
• Can we go beyond “ChatGPT for Database Education”?
• What are some gaps we can fill?

Slide 17

Slide 17 text

Elements of a DB Tutor
• Implicit Query Execution
• Data Personalization
• Learning Outcomes Report
• Visual Step Throughs
• Pop Quizzes

Slide 18

Slide 18 text

Implicit Query Execution: NL2SQL
• LLMs hallucinate; let’s run all generated code against a runtime (Google Bard)
• DBTutor: Before queries are shared with the student, execute them against a sandboxed DB
• Generate Synthetic Data and Schema
• Use results (or errors) to improve query and explanations
• Prompt: “What are some possible errors to anticipate with this query?”
[Diagram labels: Student, Prompt, SQL, SQLite, Result, Annotated SQL]
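The execute-before-sharing step can be sketched as follows. This is a minimal illustration and not the DB Tutor implementation: the function name `run_in_sandbox`, the `student` table, and its rows are all hypothetical, and an in-memory SQLite database is only a loose stand-in for a real sandbox.

```python
import sqlite3


def run_in_sandbox(schema_sql: str, query: str, max_rows: int = 5) -> dict:
    """Run a generated query against a throwaway in-memory SQLite database.

    Returns the first rows on success, or the error message on failure, so
    that either outcome can be fed back into the next prompt.
    """
    conn = sqlite3.connect(":memory:")  # nothing persists after close()
    try:
        conn.executescript(schema_sql)  # load synthetic schema and data
        cursor = conn.execute(query)
        columns = [d[0] for d in cursor.description] if cursor.description else []
        return {"ok": True, "columns": columns, "rows": cursor.fetchmany(max_rows)}
    except sqlite3.Error as exc:
        return {"ok": False, "error": str(exc)}
    finally:
        conn.close()


# Hypothetical synthetic schema and data, standing in for LLM-generated content.
SCHEMA = """
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, gpa REAL);
INSERT INTO student VALUES (1, 'Ada', 3.9), (2, 'Grace', 3.7), (3, 'Edgar', 3.2);
"""

good = run_in_sandbox(SCHEMA, "SELECT name FROM student WHERE gpa > 3.5 ORDER BY name")
bad = run_in_sandbox(SCHEMA, "SELECT nme FROM student")  # misspelled column
print(good)
print(bad)
```

In this sketch a failed query returns its SQLite error text instead of raising, so the error can be appended to a follow-up prompt asking the LLM to repair or annotate the query. A real deployment would likely need stronger isolation than a fresh in-memory connection.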

Slide 19

Slide 19 text

Data Personalization
• Students are more engaged when examples are personalized
• Use LLMs to generate sample data that they can relate to
[Example panels: Travis Kelce (American Football) Queries; Taylor Swift (Music) Queries]
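One way to picture personalization is to keep the exercise fixed and swap the data underneath it. The sketch below assumes a hypothetical `roster` table and made-up placeholder rows; in the envisioned system an LLM would generate the rows, which is not shown here.

```python
import sqlite3

# Made-up rows per interest; a stand-in for LLM-generated sample data.
SAMPLE_ROWS = {
    "football": [("Player A", "Tight End"), ("Player B", "Quarterback"), ("Player C", "Tight End")],
    "music": [("Artist A", "Vocalist"), ("Artist B", "Drummer"), ("Artist C", "Vocalist")],
}


def build_personalized_db(interest: str) -> sqlite3.Connection:
    """Create an in-memory database whose rows match the student's interest."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE roster (name TEXT, role TEXT)")
    conn.executemany("INSERT INTO roster VALUES (?, ?)", SAMPLE_ROWS[interest])
    return conn


# The same GROUP BY exercise, run over two differently themed datasets.
EXERCISE = "SELECT role, COUNT(*) FROM roster GROUP BY role ORDER BY role"
football_result = build_personalized_db("football").execute(EXERCISE).fetchall()
music_result = build_personalized_db("music").execute(EXERCISE).fetchall()
print(football_result)
print(music_result)
```

The schema and query stay identical across themes, so the concept being taught (here, grouping and counting) does not change with the student's interest.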

Slide 20

Slide 20 text

Learning Outcomes Report
• Case Studies and Applications
• Entity-Relationship (ER) Model
• ER-to-Relational Model
• Relational Algebra
• Relational Calculus
• Functional Dependencies and Normalization
• SQL
• Object Relational Databases
• Embedded SQL
• Graphical User Interfaces
• Indexing and Query Optimization
• XML
• Active Databases
• Concurrency and Transaction Management
[Checklist graphic: eight ✅ marks appear against topics in this list]
Keep track of and share what the student is learning; rewrite prompts to highlight gaps or assume knowledge
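Tracking covered topics and rewriting prompts around them might look roughly like the sketch below. The class `OutcomesTracker`, its methods, and the prompt wording are assumptions made for illustration; only the topic names come from the slide, and only a subset is used.

```python
from dataclasses import dataclass, field

# A subset of the course topics listed on the slide.
TOPICS = [
    "Relational Algebra",
    "SQL",
    "Functional Dependencies and Normalization",
    "Indexing and Query Optimization",
]


@dataclass
class OutcomesTracker:
    """Hypothetical per-student record of which topics have been covered."""
    covered: set = field(default_factory=set)

    def mark_covered(self, topic: str) -> None:
        if topic not in TOPICS:
            raise ValueError(f"unknown topic: {topic}")
        self.covered.add(topic)

    def gaps(self) -> list:
        # Preserve syllabus order when reporting what is still missing.
        return [t for t in TOPICS if t not in self.covered]

    def rewrite_prompt(self, question: str) -> str:
        known = ", ".join(t for t in TOPICS if t in self.covered) or "none yet"
        missing = ", ".join(self.gaps()) or "none"
        return (
            f"The student has already covered: {known}. "
            f"Not yet covered: {missing}. "
            "Assume the covered topics and explain anything else from first principles.\n"
            f"Question: {question}"
        )


tracker = OutcomesTracker()
tracker.mark_covered("SQL")
prompt = tracker.rewrite_prompt("Why is my join slow?")
print(prompt)
```

The same record can drive both halves of the idea: `gaps()` feeds the report shared with the student, and `rewrite_prompt()` prepends that context to each question before it reaches the LLM.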

Slide 21

Slide 21 text

Takeaways
• Standard LLMs are not designed for education and pose several challenges
• Many unique integration opportunities in database systems education
• LLM-powered “DB Tutor” that prioritizes student learning

Slide 22

Slide 22 text

Thank you