Title: Comparing random forest and AdaBoost with resampling for modeling imbalanced late payment tuition fee data
Abstract: This paper compares the Random Forest and AdaBoost classifiers, each combined with resampling, for modeling imbalanced late payment tuition fee data. We use Random Undersampling (RUS), Random Oversampling (ROS), and the Synthetic Minority Oversampling Technique (SMOTE) to obtain more balanced data. The data are late payment tuition fee records for the IPB undergraduate program with regular admission from 2016 to 2018. The results show that the best Random Forest classifier uses seven explanatory variables and 500 trees with the Random Oversampling (ROS) method. The best AdaBoost classifier uses the optimal 80 iterations with the Random Undersampling (RUS) method. The Random Forest-ROS and AdaBoost-RUS classifiers have ROC-AUC values of 58.70% and 52.90%, respectively, indicating that the Random Forest-ROS classifier predicts better than AdaBoost-RUS. The important variables for predicting late tuition fee payment are the household's electric capacity, the father's income, and the number of children in the family.
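
To make the two random resampling schemes named in the abstract concrete, the following is a minimal sketch of RUS and ROS in plain Python. It is not the authors' implementation: the function names `random_undersample` and `random_oversample`, the 90/10 class split, and the toy feature rows are all assumptions made for illustration only. SMOTE is left out because it synthesizes new minority points by interpolating between neighbors rather than by resampling existing rows.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_undersample(rows, labels, seed=0):
    """RUS: randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample without replacement, so no row is duplicated.
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def random_oversample(rows, labels, seed=0):
    """ROS: randomly duplicate rows until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Keep every original row, then add copies drawn with replacement.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 90 on-time payers (label 0) and 10 late payers (label 1).
# Each row is a made-up feature vector, not a record from the study.
toy_rows = [[i, i % 7] for i in range(100)]
toy_labels = [0] * 90 + [1] * 10

rus_rows, rus_labels = random_undersample(toy_rows, toy_labels)
ros_rows, ros_labels = random_oversample(toy_rows, toy_labels)

print("original:", Counter(toy_labels))
print("after RUS:", Counter(rus_labels))
print("after ROS:", Counter(ros_labels))
```

In a sketch like this, resampling would normally be applied to the training split only, so that the ROC-AUC is measured on data that keeps the original class imbalance.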