
1. INTRODUCTION
Failure to achieve educational goals is a serious problem that negatively affects society as a whole.
This problem can manifest itself most strongly during periods of drastic change, one of which
was the introduction of distance learning during the COVID-19 pandemic. To quantify the influence of
this event on the educational system, a variety of quantitative models based on modern statistical methods
in combination with Big Data approaches can be used, as shown by Li et al. [2021].
Machine learning (ML) is a new and actively developing family of analysis methods, combining
approaches that can "learn" from the data they receive, which allows them to perform a wide range of
tasks. ML can be used to solve problems of detection, recognition, prediction,
diagnostics, and optimization.
A large number of huge datasets have recently been accumulated in the educational system, which can be
used to analyze and then improve the educational process, as was demonstrated by Park [2020]. For
example, Livieris et al. [2019] analyze a dataset consisting of the performance of 3716 students in
Mathematics courses during the first 5 years of secondary school. They develop two semi-supervised machine
learning algorithms to predict students’ performance in the final examinations and then evaluate
the methods’ accuracy. The authors compare these two methods with a supervised machine learning method;
the semi-supervised approaches outperform it, and the final accuracy exceeds 80%.
Jeslet et al. [2021] used two well-known machine learning algorithms, Logistic Regression and Support
Vector Machine, to predict whether a student is eligible to acquire a degree. The authors analyzed
a dataset of 1460 students’ final-year results and obtained models with 99.27% and 99.72%
accuracy. Nuanmeeseri et al. [2022] analyzed a dataset of the academic performance of 1650 university
students. After adjusting the model’s parameters, the authors achieved an accuracy of 96.98%;
their model outperformed the other machine learning methods considered and can be used effectively to
evaluate significant academic performance factors in a period of drastic change.
In our work, we study changes in the academic performance of whole school grades using a
variety of machine learning methods, followed by feature importance analysis to identify the
parameters that affect academic performance the most after the introduction of the distance
learning format due to the COVID-19 pandemic.
2. MACHINE LEARNING METHODS AND FEATURE IMPORTANCE
2.1 MACHINE LEARNING TECHNIQUES
Hastie et al. [2009] introduce machine learning as a set of mathematical techniques that give computer
algorithms the ability to learn. This methodology is based on the inputs and required outputs of the algorithms
and can automate tasks that humans are able to carry out, as stated by Mnih et al. [2015].
Ensemble methods are groups of algorithms that use several machine learning methods at once so that they
correct each other's errors. Bostanabad et al. [2016] define supervised learning as a type of algorithm
in which the method is supplied with example inputs along with the required outputs, which allows it to
learn a rule that maps inputs to outputs. Bengio et al. [2013] state that in unsupervised learning, by
contrast, only the inputs are supplied, and the learning algorithm is required to determine the structure of
the input and perform according to characteristics that are unknown in advance [10].
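The distinction between supervised and unsupervised learning can be illustrated with a minimal sketch. It is not taken from the study: the toy data (hours of study and exam scores) and the function names `knn_predict` and `two_means` are hypothetical and chosen only for illustration. The supervised function receives inputs together with the required outputs, while the unsupervised one receives inputs only and has to find structure in them.

```python
from statistics import mean


def knn_predict(train_x, train_y, query, k=2):
    """Supervised: average the known outputs of the k nearest labelled inputs."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return mean(train_y[i] for i in nearest)


def two_means(xs, iterations=10):
    """Unsupervised: split unlabelled 1-D inputs into two groups (k-means, k=2)."""
    centers = [min(xs), max(xs)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            nearer = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearer].append(x)
        # Keep the old center if a group happens to be empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical toy data: hours of study (input) and exam score (output).
hours = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
scores = [50.0, 55.0, 60.0, 85.0, 90.0, 95.0]

# Supervised: inputs AND outputs are supplied; a score is predicted for 2.5 hours.
predicted = knn_predict(hours, scores, query=2.5, k=2)

# Unsupervised: only the inputs are supplied; the grouping is discovered.
centers, groups = two_means(hours)
```

In this sketch the regressor never sees the grouping and the clustering never sees the scores, which mirrors the difference in the information that each family of methods is given.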
In this work we use the following supervised machine learning methods: Decision Tree, Gradient Boosting,
K-nearest neighbors (KNN) Regressor, Lasso Regression, Linear Regression, MultiLayer Perceptron neural
networks, and Support Vector Regressor; and one ensemble method: Random Forest.
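To make the ensemble idea more concrete, the following is a simplified sketch of bagging, the averaging principle on which Random Forest is built. It is an illustration under stated assumptions, not the implementation used in this work: each base model is a one-split regression tree (a "stump") on a single feature, whereas an actual Random Forest grows deeper trees and also samples features at random. The names `fit_stump` and `fit_bagged_stumps` and the toy data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree that minimises the squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue
        l_mean, r_mean = mean(left), mean(right)
        error = (sum((y - l_mean) ** 2 for y in left)
                 + sum((y - r_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, l_mean, r_mean)
    if best is None:  # all inputs identical: fall back to a constant prediction
        constant = mean(ys)
        return lambda x: constant
    _, threshold, l_mean, r_mean = best
    return lambda x: l_mean if x <= threshold else r_mean


def fit_bagged_stumps(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap resample and average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


# Hypothetical toy data: hours of study (input) and exam score (output).
hours = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
scores = [50.0, 55.0, 60.0, 85.0, 90.0, 95.0]

stump = fit_stump(hours, scores)           # a single base model
forest = fit_bagged_stumps(hours, scores)  # an averaged ensemble of base models
low, high = forest(2.0), forest(9.0)
```

Because every stump is trained on a different resample, their individual errors differ, and averaging them tends to smooth out the errors of any single model.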
https://doi.org/10.17993/3ctic.2022.112.136-144
3C TIC. Cuadernos de desarrollo aplicados a las TIC. ISSN: 2254-6529
Ed. 41 Vol. 11 N.º 2 August - December 2022