Integrating LLMs into Database Systems Education

Arnab Nandi
June 09, 2024
Large Language Models (LLMs) have sparked a drastic improvement in the ways computers can understand, process, and generate language. As LLM-based offerings become mainstream, we explore the incorporation of such LLMs into introductory or undergraduate database systems education. Students and instructors are both faced with the calculator dilemma: while the use of LLM-based tools may “solve” tasks such as assignments and exams, do they impede or accelerate the learning itself? We review deficiencies of using existing off-the-shelf tools for learning, and further articulate the differentiated needs of database systems students as opposed to trained data practitioners. Building on our exploration, we outline a vision that integrates LLMs into database education in a principled manner, keeping pedagogical best practices in mind. If implemented correctly, we posit that LLMs can drastically amplify the impact of existing instruction, minimizing costs and barriers towards learning database systems fundamentals.

Full paper at https://arnab.org/files/Nandi_DataEd24_Integrating_LLMs_into_Database_Systems_Education.pdf


Transcript

  1. Integrating LLMs into Database Systems Education
    Kishore Prakash, Shashwat Rao, Rayan Hamza, Jack Lukich, Vatsal Chaudhari, Arnab Nandi
  2. LLMs taking over education
    • Initial reaction: ban immediately!
    • “New Calculator”… “Plagiarism”
    • Detect and penalize
    • Understandable: assignments and exams
    • Synthesis and essay questions
    • Multiple choice questions: B+
    • Unsupervised / take-homes?
  3. “Banning ChatGPT” is not an option
    • Too late: pervasive use, variants
    • Readying students for an AI-enabled future
    • Onus is on educators to discover how to integrate LLMs into educational infrastructure
  4. Class Roles: Where does an LLM fit in?
    • Instructor
    • Teaching Assistant
    • Textbook
    • Teaching Tools / Software / Autograder
    • Tutor
  5. Intuition behind “Tutor”
    With infinite resources, what would we give every student? A personal tutor who assists the student in their learning journey.
  6. Our Vision: DB Tutor
    • Provide students with an LLM-powered, chat-based interface that prioritizes personalized learning
    • Leverage opportunities that are unique to database systems
    • Building such a system will take some thought and iteration
  7. Why LLMs are not the best fit
    • LLMs are designed and trained to get to the right answer as quickly and efficiently as possible
    • Getting to the right answer without explanations can impede learning
  8. DB Tutor: Challenges
    • Bias in responses
    • Students’ over-reliance, critical thinking
    • Cheating and misuse
    • Data privacy and security
    • Sensitivity to prompting
  9. Challenge: Bias in Responses
    • LLMs have an inherent bias issue
    • Training data bias
    • Recency bias
    • Demographics bias
    • Use in learning: amplified effects
    • Fix training data, or model output
  10. Challenge: Over-Reliance, Critical Thinking
    • High convenience = pervasive use
    • Long-term dependency
    • Loss of independent skills
    • Impedes deeper understanding
    • Loss of critical thinking (especially the ability to notice LLM errors)
  11. Challenge: Cheating and Misuse
    • “Super Tool” for misuse
    • Easy to generate human-sounding content
    • Essay questions, multiple choice
    • Are take-home assignments still an option?
    • Detection is an arms race
    • Previous disruptions: web search, Wikipedia, calculators
  12. Envisioned System Architecture
    • LLM Infrastructure (software and data we will set up)
    • Course Materials: syllabus, slides, tests
    • LLM: Llama v2 or GPT4 via API
    • Virtual Tutor Portal (what the student interacts with)
    • Learning Outcomes Report
    • Chatbot
    • Database: SQLite DBMS
    • Virtual Tutor Engine (software we will build in this research activity)
    • Data Analysis Engine
    • Prompt Engineering
  13. Elements of a DB Tutor
    • Can we go beyond “ChatGPT for Database Education”?
    • What are some gaps we can fill?
  14. Elements of a DB Tutor
    • Implicit Query Execution
    • Data Personalization
    • Learning Outcomes Report
    • Visual Step-Throughs
    • Pop Quizzes
  15. Implicit Query Execution: NL to SQL
    • LLMs hallucinate; let’s pipe all generated code against a runtime (Google Bard)
    • DBTutor: before queries are shared with the student, execute them against a sandboxed DB
    • Generate synthetic data and schema
    • Use results (or errors) to improve the query and explanations
    • Prompt: “What are some possible errors to anticipate with this query?”
    • Diagram labels: SQLite, Prompt, SQL, Annotated SQL Result, Student
  16. Data Personalization
    • Students are more engaged when examples are personalized
    • Use LLMs to generate sample data that they can relate to
    • Examples shown: Travis Kelce (American football) queries; Taylor Swift (music) queries
  17. Learning Outcomes Report
    • Keep track and share what the student is learning; rewrite prompts to highlight gaps or assume knowledge
    • Topic checklist (the slide marks eight of these with ✅): Case Studies and Applications; Entity-Relationship (ER) Model; ER-to-Relational Model; Relational Algebra; Relational Calculus; Functional Dependencies and Normalization; SQL; Object Relational Databases; Embedded SQL; Graphical User Interfaces; Indexing and Query Optimization; XML; Active Databases; Concurrency and Transaction Management
  18. Takeaways
    • Standard LLMs are not designed for education and pose several challenges
    • Many unique integration opportunities in database systems education
    • LLM-powered “DB Tutor” that prioritizes student learning
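
The "Implicit Query Execution" slide describes running each generated query against a sandboxed database before the student sees it, then using the result or error to improve the query and its explanation. The following is a minimal sketch of that loop, not the authors' implementation. It assumes Python's standard `sqlite3` module with a throwaway in-memory database standing in for the sandbox. The names `QueryCheck`, `check_query`, `feedback_for_llm`, and `SETUP_SQL`, along with the sample schema and rows, are hypothetical and chosen only for illustration.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class QueryCheck:
    """Outcome of trying one generated query (hypothetical helper type)."""
    ok: bool
    columns: list = field(default_factory=list)
    rows: list = field(default_factory=list)
    error: str = ""


def check_query(sql: str, setup_sql: str, max_rows: int = 5) -> QueryCheck:
    """Run one generated query against a fresh in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")  # discarded when the connection closes
    try:
        conn.executescript(setup_sql)   # load the synthetic schema and data
        cur = conn.execute(sql)         # accepts a single statement only
        columns = [d[0] for d in cur.description or []]
        return QueryCheck(ok=True, columns=columns, rows=cur.fetchmany(max_rows))
    except (sqlite3.Error, sqlite3.Warning) as exc:
        # Any failure is captured as text so it can be fed back to the model.
        return QueryCheck(ok=False, error=str(exc))
    finally:
        conn.close()


def feedback_for_llm(sql: str, check: QueryCheck) -> str:
    """Turn the execution outcome into text for a follow-up prompt."""
    if check.ok:
        return f"Query ran. Columns: {check.columns}. Sample rows: {check.rows}"
    return f"Query failed with: {check.error}. Revise the query and explain the fix."


# Illustrative synthetic schema and data (an assumption, not from the talk).
SETUP_SQL = """
CREATE TABLE albums (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
INSERT INTO albums (title, year)
VALUES ('Fearless', 2008), ('Red', 2012), ('Midnights', 2022);
"""

good_sql = "SELECT title FROM albums WHERE year > 2010 ORDER BY year"
bad_sql = "SELECT name FROM albums"  # references a column that does not exist

good = check_query(good_sql, SETUP_SQL)
bad = check_query(bad_sql, SETUP_SQL)

print(feedback_for_llm(good_sql, good))
print(feedback_for_llm(bad_sql, bad))
```

In a real tutor, the string from `feedback_for_llm` would presumably be appended to the next model prompt, for example alongside the slide's question about which errors to anticipate. An in-memory database keeps each check isolated, but it is not a full security sandbox for untrusted SQL.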