Fardina Fathmiul Alam
Creative Commons Attribution 4.0 International license
Undergraduate data science education faces a scalability challenge: addressing a high volume of diverse student questions, stemming from varying levels of prior knowledge, technical skills, and learning styles, while ensuring timely and accurate responses. Traditional solutions like manual replies or generic chatbots often fall short in terms of contextual relevance, speed, and efficiency. To tackle this, we introduce RAGent, a Retrieval-Augmented Generation (RAG) agent tailored for a university-level data science course at the University of Maryland. RAGent integrates course-specific materials (lecture notes, assignments, and syllabi) to deliver fast, context-aware answers while maintaining low computational overhead. A central innovation of RAGent is its query classification system, which categorizes student questions into: (i) directly answerable, (ii) relevant but unresolved (requiring instructor input), and (iii) irrelevant or out-of-scope. This system uses semantic similarity, keyword relevance, and dynamic thresholds to drive a targeted prompting strategy, enhancing response accuracy. Another key feature is RAGent’s self-learning loop, which continuously improves performance by integrating resolved queries into its knowledge base and flagging unresolved ones for review and retraining. This dual mechanism ensures both immediate adaptability and long-term scalability. We evaluate RAGent using standard NLP metrics (accuracy, precision, recall, F1-score) and report strong performance in filtering and answering student queries. In a user study with 125 students, over 94% expressed a desire to keep RAGent in the course, citing improved clarity and helpfulness. These results suggest that RAGent significantly enhances support in data science education by providing accurate, contextual responses and reducing instructor workload, offering a scalable, adaptive alternative to conventional support methods.
Future work will explore deployment across additional courses and institutions to further validate RAGent’s adaptability.
@InProceedings{vetluzhskikh_et_al:OASIcs.ICPEC.2025.8,
author = {Vetluzhskikh, Mariia and Alam, Fardina Fathmiul},
title = {{RAGent: A Self-Learning RAG Agent for Adaptive Data Science Education}},
booktitle = {6th International Computer Programming Education Conference (ICPEC 2025)},
pages = {8:1--8:10},
series = {Open Access Series in Informatics (OASIcs)},
ISBN = {978-3-95977-393-5},
ISSN = {2190-6807},
year = {2025},
volume = {133},
editor = {Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Portela, Filipe and Sim\~{o}es, Alberto},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.ICPEC.2025.8},
URN = {urn:nbn:de:0030-drops-240387},
doi = {10.4230/OASIcs.ICPEC.2025.8},
annote = {Keywords: RAG, Agent, Chatbot, Data Science, Education, Query Classification, Information Retrieval, LLM}
}