Please use this identifier to cite or link to this item: https://hdl.handle.net/10321/4072
Title: Early prediction of students at risk in a virtual learning environment using ensemble machine learning techniques
Authors: Soobramoney, Ranjin 
Keywords: Students at Risk;Ensemble learning;Lazypredict;Machine Learning Algorithms;Virtual Learning Environment
Issue Date: 13-Dec-2021
Abstract: 
Students at risk (SAR) are those students who are considered to have a higher probability of
failing academically or dropping out of an academic programme. The literature reveals that
SAR is a global problem at Higher Education Institutions (HEIs). A high failure rate not only
harms the reputation of HEIs but, if left unchecked, can be detrimental to them.
The problem of identifying SAR is a pervasive and persistent one. However, early
identification of SAR will allow for timely and focused interventions, thereby reducing the
problem. Various techniques have been used by HEIs to identify SAR. The traditional
statistical approach is one such technique. One of the key challenges with this technique,
however, is that it often requires a large amount of manual analysis of the data to predict SAR,
which in turn makes early prediction of SAR more computationally challenging. To
overcome some of the challenges of the traditional statistical approach, machine learning-based
techniques have been proffered to predict SAR. Since machine learning (ML) models are based
on the input data rather than the underlying problem, they are expected to have better predictive
capabilities than traditional statistical models. Several ML-based techniques have been applied
to predict SAR with varying degrees of success. This study proposes the use of ensemble ML
techniques for early and accurate prediction of SAR using students’ demographic and weekly
online Virtual Learning Environment (VLE) data. Aggregating the predictions of a group of
ML classifiers is expected to provide a better generalization performance than each of the
individual classifiers on their own. The use of ensemble ML techniques in this study is
therefore expected to provide an improved solution to the problem of predicting SAR. To this
end, this study focused
on training forty different ML predictive models, one for each week of the semester, using
twenty-five different ML classifiers. Each model was trained using students’ demographic data
combined with data from their weekly interactions with a VLE. Based on the training results,
four classifiers, namely AdaBoostClassifier, LGBMClassifier, RandomForestClassifier, and
XGBClassifier were selected as base learners for the ensemble classifier. Hyperparameter
optimization was performed using Random Search on each of the four classifiers. These
classifiers were then used to create a voting classifier ensemble for each of the forty weeks,
with 10-fold cross validation being used to evaluate the predictive models. The results show
that the voting classifier ensemble method outperformed the individual classifiers overall across
the forty weeks and can thus provide an improved solution to the problem of predicting SAR.
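
The voting and cross-validation steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the thesis implementation: the four base learners named above are replaced by a toy threshold learner (the hypothetical MeanThresholdStump), majority ("hard") voting is an assumption because the abstract does not say whether hard or soft voting was used, the folds are contiguous and unshuffled for simplicity, and the three features (weekly clicks, assessments submitted, days active) and all data values are made up for illustration.

```python
from collections import Counter
from statistics import mean


class MeanThresholdStump:
    """Toy base learner: thresholds one feature at the midpoint of the two class means."""

    def __init__(self, feature):
        self.feature = feature
        self.threshold = 0.0
        self.high_label = 1

    def fit(self, X, y):
        # Assumes both classes (0 and 1) occur in the training rows.
        pos = [row[self.feature] for row, label in zip(X, y) if label == 1]
        neg = [row[self.feature] for row, label in zip(X, y) if label == 0]
        self.threshold = (mean(pos) + mean(neg)) / 2
        # Label predicted for values above the threshold.
        self.high_label = 1 if mean(pos) > mean(neg) else 0
        return self

    def predict(self, row):
        above = row[self.feature] > self.threshold
        return self.high_label if above else 1 - self.high_label


def hard_vote(learners, row):
    """Return the label predicted by the most base learners (assumed hard voting)."""
    votes = Counter(learner.predict(row) for learner in learners)
    return votes.most_common(1)[0][0]


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) for k contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(X, y, features, k=10):
    """Mean held-out accuracy of a voting ensemble with one stump per feature."""
    fold_scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        learners = [MeanThresholdStump(f).fit(X_train, y_train) for f in features]
        hits = sum(hard_vote(learners, X[i]) == y[i] for i in test_idx)
        fold_scores.append(hits / len(test_idx))
    return mean(fold_scores)


# Made-up data for one week: [weekly_clicks, assessments_submitted, days_active].
# Label 1 = at risk, label 0 = not at risk.
X, y = [], []
for i in range(20):
    at_risk = i % 2
    X.append([10 + i, 1, 2] if at_risk else [100 + i, 4, 6])
    y.append(at_risk)
X[4][1] = 0  # one misleading value that fools the stump on feature 1
X[7][2] = 7  # one misleading value that fools the stump on feature 2

ensemble_acc = cross_val_accuracy(X, y, features=[0, 1, 2])
single_acc = cross_val_accuracy(X, y, features=[1])
```

On this toy data the single stump on feature 1 misclassifies one held-out row, while the three-stump vote does not, which is the intuition behind aggregating classifiers. In the study, one such ensemble would be built per week; repeating the call with each week's feature table is one plausible way to mirror that.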
Description: 
Submitted in fulfillment of the requirements for the Degree of Masters of Information and Communication Technology, Durban University of Technology, Durban, South Africa, 2021.
URI: https://hdl.handle.net/10321/4072
DOI: https://doi.org/10.51415/10321/4072
Appears in Collections: Theses and dissertations (Accounting and Informatics)

Files in This Item:
File: Soobramoney_R_2021_Redacted.pdf
Size: 5.27 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.