Using Learning Analytics & Machine Learning to Enhance Early Detection of At-Risk Students in Higher Educational Institutions

Lam, Tu
2026-01-29
http://hdl.handle.net/10393/51329
https://doi.org/10.20381/ruor-31717

The rapid transition to remote learning during the Coronavirus Disease 2019 (COVID-19) pandemic significantly disrupted student-instructor interaction, contributing to increased dropout rates in higher education. According to the 2022 Canadian Student Wellbeing Study, 72% of students experienced reduced in-person engagement, and 40% considered withdrawing due to insufficient institutional support. In response, Learning Analytics (LA) has gained prominence as a data-driven approach to enhancing student engagement, academic performance, and retention. This thesis addresses two core challenges: the need for a scalable LA framework and the development of an effective Machine Learning (ML)-based LA approach to identify at-risk students. Despite growing interest in LA, many studies overlook foundational implementation challenges and the limited capabilities of existing LA tools, such as Brightspace’s Student Success System (S3). To address these gaps, this study begins with a review of LA adoption in global and Canadian contexts, highlighting that while over half of Canadian Higher Education Institutions (HEIs) are engaging in LA initiatives, empirical evidence of impact remains scarce. The thesis proposes a comprehensive LA architecture tailored to the University of Ottawa (uOttawa), emphasizing early risk detection and addressing functional gaps in current Learning Management System (LMS) tools. Building on this foundation, the core of this work involves the development of an ML model for at-risk student prediction. Following the Cross-Industry Standard Process for Data Mining (CRISP-DM), six classification algorithms are evaluated across four temporal checkpoints in the academic term. The models incorporate features from the Student Information System (SIS) and Brightspace LMS, including demographic attributes, academic history, and time-aware performance metrics.
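As a rough illustration of checkpoint-based evaluation, the sketch below computes recall on the at-risk class and overall accuracy for predictions made at two points in a term. It is not the thesis's pipeline: the function name, the checkpoint labels, and the toy labels and predictions are all hypothetical, and real predictions would come from the trained classifiers.

```python
from typing import Dict, List, Tuple


def recall_and_accuracy(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Return (recall on the at-risk class, overall accuracy); label 1 = at-risk."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    # Recall: share of truly at-risk students that the model flagged.
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    # Accuracy: share of all students classified correctly.
    accuracy = correct / len(pairs) if pairs else 0.0
    return recall, accuracy


# Hypothetical toy data: 4 at-risk students (1) and 6 not at-risk (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hypothetical predictions from one model at an early and a late checkpoint.
predictions_by_checkpoint: Dict[str, List[int]] = {
    "checkpoint_1": [1, 1, 1, 0, 1, 1, 1, 0, 0, 0],
    "checkpoint_4": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
}

results = {
    name: recall_and_accuracy(y_true, y_pred)
    for name, y_pred in predictions_by_checkpoint.items()
}

for name, (recall, accuracy) in results.items():
    print(f"{name}: recall={recall:.2f}, accuracy={accuracy:.2f}")
```

Reporting recall alongside accuracy matters for an early warning setting: a model can miss few at-risk students (high recall) while still raising many false alarms, which shows up as lower overall accuracy.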
Feature importance analysis reveals a dynamic shift from reliance on historical SIS data to LMS-derived grade indicators as the course progresses. Notably, time-based features, especially phase-specific grade averages, proved crucial for late-phase predictions. Among the models tested, Random Forest (RF) and Extreme Gradient Boosting (XGB) consistently achieved the highest recall and accuracy, identifying 75–91% of at-risk students with overall accuracy between 60% and 81%. This thesis offers three key contributions: (1) a critical review of LA adoption and implementation strategies; (2) a proposed institutional LA architecture to support predictive analytics; and (3) an empirical methodology for comparing classification models and time-aware feature dynamics, as a foundation for a deployable Early Warning System (EWS) to inform proactive interventions for improving student success and retention.

en
Attribution-NonCommercial-NoDerivatives 4.0 International
http://creativecommons.org/licenses/by-nc-nd/4.0/
Learning Analytics; Machine Learning; At-risk Student Prediction; Higher Education; Academic Performance; Early Risk Detection; Datawarehouse; Predictive Analytics; Classification Algorithms; Cross-Industry Standard Process for Data Mining (CRISP-DM); Feature Importance Analysis; Learning Management System (LMS)
Thesis