Boosting Student Dropout Prediction using Synthetic Minority Oversampling Technique and Binary Harris Hawks Optimization
This study introduces an enhanced framework for student dropout prediction that integrates Binary Harris Hawks Optimization (BHHO) with boosting algorithms and the Synthetic Minority Over-sampling Technique (SMOTE). We evaluated the framework on two datasets: a multi-class dataset of 6,847 student records and a binary dataset of 1,044 instances. The methodology uses BHHO to optimize the hyperparameters of four boosting algorithms (AdaBoost, Gradient Boosting, XGBoost, and CatBoost) and SMOTE to address class imbalance. Experimental results show that the framework outperforms competing metaheuristic approaches (BAO and BGWO), with XGBoost-BHHO achieving 97.8% accuracy on multi-class prediction and 90.8% on binary classification. The models maintained high F1-scores (0.889 and 0.921 on the two tasks), precision (94.1%), and recall (84.2%), and ROC analysis confirmed robust performance (AUC ≈ 0.90). These findings advance educational data mining by providing institutions with a reliable early-warning system for identifying at-risk students and enabling timely interventions.
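To make the class-imbalance step concrete, the following is a minimal sketch of SMOTE's core idea, generating synthetic minority samples by interpolating between a minority point and one of its k nearest minority-class neighbors. This is an illustrative implementation on toy data, not the authors' code or the imbalanced-learn library; the `smote` function, the `k` setting, and the toy arrays are all assumptions for demonstration.

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, rng=None):
    """Create n_synthetic points by interpolating between each chosen
    minority sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    k = min(k, n - 1)
    neighbors = np.argsort(d, axis=1)[:, :k]
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for i in range(n_synthetic):
        j = rng.integers(n)                   # base minority sample
        nb = X_min[rng.choice(neighbors[j])]  # one of its k neighbors
        gap = rng.random()                    # interpolation factor in [0, 1)
        synthetic[i] = X_min[j] + gap * (nb - X_min[j])
    return synthetic

# Toy imbalanced data: 50 majority points, 8 minority points in 2-D
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(50, 2))
X_min = rng.normal(3.0, 0.5, size=(8, 2))

X_new = smote(X_min, n_synthetic=42, k=3, rng=1)
X_bal = np.vstack([X_maj, X_min, X_new])
y_bal = np.array([0] * 50 + [1] * (8 + 42))
print(X_bal.shape, np.bincount(y_bal))   # balanced: 50 per class
```

In the paper's pipeline, the balanced data would then be passed to a boosting classifier whose hyperparameters are searched by BHHO; since BHHO has no standard library implementation, that search step is omitted here.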