JKU Thesis, Project and Seminar

AI Master’s Thesis, Practicum and Seminar
Multimodal Educational Video Summarization RAG
Mahdi Khashan

Motivations
- Students learn from multiple educational sources, such as YouTube playlists
- These playlists take a long time to watch
- They contain duplicated content
- Having context around a topic enhances learning
- Students populate knowledge bases from multiple sources
- Knowledge graphs in multimodal RAG are not well explored
- Agent-based RAG: explore interactive, self-guided feedback

Applications
- Q&A
- Summarize playlists
- Concept knowledge graph

Methodologies
Data Ingestion
1. Retrieve video + transcript + thumbnails/slides.
2. Extract audio embeddings for lecture style emphasis.
Multi-Modal Encoding
3. Text encoder for transcripts and video titles.
4. Visual encoder for slides and key frames.
5. Audio encoder for emphasis (intonation/important points).
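The ingestion and encoding steps above can be sketched as a small per-lecture record that yields one embedding per modality. This is only an illustration under stated assumptions: `LectureRecord`, `toy_encode`, and `DIM` are hypothetical names, and the hash-based encoder is a placeholder for whichever text, visual, and audio encoders the thesis ends up using. Slides and audio are represented by short strings here so the sketch stays self-contained.

```python
from __future__ import annotations

import hashlib
import math
from dataclasses import dataclass

DIM = 8  # toy embedding size (assumption); real encoders use far more dimensions


def toy_encode(payload: str, dim: int = DIM) -> list[float]:
    """Hypothetical stand-in for a real encoder.

    Deterministically maps the input to a unit-length vector via a hash.
    """
    digest = hashlib.sha256(payload.encode("utf-8")).digest()
    raw = [b / 255.0 - 0.5 for b in digest[:dim]]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


@dataclass
class LectureRecord:
    video_id: str
    title: str
    transcript: str
    slide_captions: list[str]  # placeholder for slide / key-frame images
    audio_note: str            # placeholder for an audio segment

    def encode(self) -> dict[str, list[float]]:
        """Return one embedding per modality (steps 3 to 5 above)."""
        return {
            "text": toy_encode(self.title + " " + self.transcript),
            "visual": toy_encode(" ".join(self.slide_captions)),
            "audio": toy_encode(self.audio_note),
        }
```

Keeping the modalities separate, rather than fusing them at ingestion time, leaves the choice of fusion strategy open for the retriever.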

Methodologies
Retriever (RAG)
1. Build a course knowledge base with per-lecture embeddings.
2. Enable retrieval at the video, section, or concept level.
Generator (LLM)
3. Summarize at different granularities:
◦ Lecture-level summary
◦ Module-level overview
◦ Entire playlist synthesis
4. Explore a knowledge graph of concepts.
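A minimal sketch of the retriever and generator steps above, assuming a toy bag-of-words embedding in place of learned multimodal embeddings. `CourseKnowledgeBase`, `Entry`, `embed`, and `build_summary_prompt` are hypothetical names introduced for illustration, and the LLM call itself is left out: the sketch only shows level-filtered retrieval and how the retrieved context could be packed into a summarization prompt.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption; stands in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Entry:
    level: str   # "video", "section", or "concept"
    source: str  # hypothetical identifier, e.g. "lecture-01#2"
    text: str


class CourseKnowledgeBase:
    def __init__(self) -> None:
        self._items: list[tuple[Entry, Counter]] = []

    def add(self, entry: Entry) -> None:
        self._items.append((entry, embed(entry.text)))

    def retrieve(self, query: str, level: str, k: int = 2) -> list[Entry]:
        """Return the top-k entries at the requested level."""
        q = embed(query)
        scored = [(cosine(q, vec), e) for e, vec in self._items if e.level == level]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]


def build_summary_prompt(entries: list[Entry], granularity: str) -> str:
    """Pack retrieved context into a prompt for a generator (LLM call not shown)."""
    context = "\n".join(f"[{e.source}] {e.text}" for e in entries)
    return f"Write a {granularity} summary of the following material:\n{context}"
```

For example, `kb.retrieve("gradient descent learning rate", level="section", k=1)` restricts the search to section-level entries, and the result can be passed to `build_summary_prompt(hits, "lecture-level")`.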

Challenges
- Evaluations
- Classroom videos: how to learn whiteboard contents
- Reasoning (coherence and contextual understanding)

References
- A Comprehensive Survey on Multimodal Retrieval-Augmented Generation (https://aclanthology.org/2025.findings-acl.861.pdf)
