
JKU Thesis, Project and Seminar

Mahdi Khashan
September 16, 2025

Transcript

  1. Motivations
    - Students learn from multiple educational sources, such as YouTube playlists.
    - These playlists are lengthy to watch.
    - They contain duplicated content.
    - Having context around a topic enhances learning.
    - Students populate knowledge bases from multiple sources.
    - Knowledge graphs in multimodal RAG are not well explored.
    - Agent-based RAG: explore interactive, self-guided feedback.
  2. Methodologies
    Data Ingestion
      1. Retrieve the video, transcript, and thumbnails/slides.
      2. Extract audio embeddings to capture lecture-style emphasis.
    Multimodal Encoding
      3. Text encoder for transcripts and video titles.
      4. Visual encoder for slides and key frames.
      5. Audio encoder for emphasis (intonation, important points).
  3. Methodologies
    Retriever (RAG)
      1. Build a course knowledge base with per-lecture embeddings.
      2. Enable retrieval at the video, section, or concept level.
    Generator (LLM)
      3. Summarize at different granularities:
         ◦ Lecture-level summary
         ◦ Module-level overview
         ◦ Entire-playlist synthesis
      4. Explore a knowledge graph of concepts.
  4. Challenges
    - Evaluation
    - Classroom videos: how to learn whiteboard content
    - Reasoning (coherence and contextual understanding)
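
The Multimodal Encoding step in slide 2 does not say how the text, visual, and audio embeddings are combined. A minimal sketch, assuming each encoder already projects its output to a shared dimension and using simple weighted late fusion (an illustrative choice, not necessarily the project's method). `Segment`, `fuse`, and the modality weights are hypothetical:

```python
import math
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A time-aligned chunk of a lecture video (hypothetical schema)."""
    video_id: str
    start_s: float
    end_s: float
    transcript: str
    # Per-modality embeddings, assumed to come from separate text,
    # visual (slides/key frames), and audio (emphasis) encoders.
    embeddings: dict[str, list[float]] = field(default_factory=dict)


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(embeddings: dict[str, list[float]], weights: dict[str, float]) -> list[float]:
    """Weighted late fusion: normalize each modality, sum with weights, renormalize.

    Assumes every modality shares one dimension; modalities missing from
    `weights` get weight 0.
    """
    if not embeddings:
        raise ValueError("no embeddings to fuse")
    dims = {len(v) for v in embeddings.values()}
    if len(dims) != 1:
        raise ValueError(f"mismatched embedding dimensions: {sorted(dims)}")
    fused = [0.0] * dims.pop()
    for modality, vec in embeddings.items():
        weight = weights.get(modality, 0.0)
        for i, x in enumerate(l2_normalize(vec)):
            fused[i] += weight * x
    return l2_normalize(fused)


# Toy 2-D embeddings standing in for real encoder outputs.
seg = Segment(
    video_id="lec01", start_s=0.0, end_s=30.0, transcript="gradient descent",
    embeddings={"text": [3.0, 0.0], "visual": [0.0, 2.0], "audio": [0.0, 0.0]},
)
fused = fuse(seg.embeddings, {"text": 0.5, "visual": 0.5, "audio": 0.2})
```

Normalizing each modality first keeps one encoder's raw scale from dominating; the weights (for example, raising audio to stress intonation cues) would still have to be tuned or learned.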
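
Slide 3's retriever keeps embeddings at the video, section, and concept levels. A minimal sketch of level-filtered retrieval over such a knowledge base, assuming brute-force cosine similarity (a real system would likely use a vector index). `KBEntry` and `retrieve` are hypothetical names:

```python
import math
from dataclasses import dataclass


@dataclass
class KBEntry:
    """One knowledge-base item at a given granularity (hypothetical schema)."""
    entry_id: str
    level: str  # "video", "section", or "concept"
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(kb: list[KBEntry], query: list[float], level: str, k: int = 3) -> list[KBEntry]:
    """Return the k entries at `level` most similar to `query`, best first."""
    candidates = [e for e in kb if e.level == level]
    candidates.sort(key=lambda e: cosine(query, e.embedding), reverse=True)
    return candidates[:k]


# Toy knowledge base with 2-D embeddings.
kb = [
    KBEntry("lec01", "video", [1.0, 0.0]),
    KBEntry("lec01/sec2", "section", [0.9, 0.1]),
    KBEntry("lec02", "video", [0.0, 1.0]),
    KBEntry("backpropagation", "concept", [0.7, 0.7]),
]
video_hits = [e.entry_id for e in retrieve(kb, [1.0, 0.2], level="video")]
concept_hits = [e.entry_id for e in retrieve(kb, [1.0, 0.2], level="concept")]
```

Filtering by level before ranking lets the same query return whole lectures, individual sections, or single concepts, depending on what the student asks for.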
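
The Generator's three summary granularities in slide 3 map onto a bottom-up, map-reduce style pass: summarize each lecture, merge lecture summaries into module overviews, then merge those into a playlist synthesis. A sketch assuming lectures are already grouped into modules; `summarize_playlist` is hypothetical, and `summarize` stands in for an LLM call, replaced here by a toy function so the example runs:

```python
from typing import Callable

# A summarizer maps a list of texts to one summary; in practice this would
# wrap an LLM prompt (assumed, not shown).
Summarizer = Callable[[list[str]], str]


def summarize_playlist(
    modules: dict[str, list[str]], summarize: Summarizer
) -> tuple[dict[str, list[str]], dict[str, str], str]:
    """Build lecture-, module-, and playlist-level summaries bottom-up.

    `modules` maps each module name to its lecture transcripts, in playlist order.
    """
    lecture_summaries = {
        module: [summarize([transcript]) for transcript in transcripts]
        for module, transcripts in modules.items()
    }
    module_overviews = {
        module: summarize(summaries)
        for module, summaries in lecture_summaries.items()
    }
    playlist_synthesis = summarize(list(module_overviews.values()))
    return lecture_summaries, module_overviews, playlist_synthesis


def toy_summarize(texts: list[str]) -> str:
    """Toy stand-in for an LLM: keep the first word of each input."""
    return " / ".join(text.split()[0] for text in texts if text.split())


lectures, overviews, synthesis = summarize_playlist(
    {
        "Optimization": ["gradient descent basics", "momentum methods"],
        "Vision": ["convolution layers"],
    },
    toy_summarize,
)
```

Because each level reads only the level below it, module and playlist summaries can be regenerated without re-reading full transcripts.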
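
For the concept knowledge graph in slide 3, one simple starting point is a co-occurrence graph. This is an assumption, since the slides do not say how the graph is built. Concepts mentioned in the same segment are linked, and edge weights count how often that happens. Concept extraction is assumed to happen upstream; `build_concept_graph` and `related` are hypothetical helpers:

```python
from collections import defaultdict
from itertools import combinations


def build_concept_graph(segment_concepts: list[set[str]]) -> dict[str, dict[str, int]]:
    """Undirected weighted graph: graph[a][b] = number of segments mentioning both."""
    graph: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for concepts in segment_concepts:
        # Each unordered pair of distinct concepts in a segment adds one co-occurrence.
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def related(graph: dict[str, dict[str, int]], concept: str, k: int = 3) -> list[str]:
    """Concepts most often co-mentioned with `concept`, strongest first (ties by name)."""
    neighbors = graph.get(concept, {})
    return sorted(neighbors, key=lambda c: (-neighbors[c], c))[:k]


# Toy per-segment concept sets.
segments = [
    {"gradient descent", "learning rate"},
    {"gradient descent", "backpropagation"},
    {"backpropagation", "chain rule", "gradient descent"},
]
graph = build_concept_graph(segments)
neighbors_of_gd = related(graph, "gradient descent")
```

`related` could seed follow-up retrieval or suggest neighboring topics; typed relations such as "prerequisite of" would need more than co-occurrence counts.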