Abstract
Educational data mining has become an effective tool for exploring the hidden relationships in
educational data and predicting students’ academic achievements. This study proposes a new model
based on machine learning algorithms to predict the final exam grades of undergraduate students, taking
their midterm exam grades as the source data. The performances of the random forests, nearest
neighbour, support vector machines, logistic regression, Naïve Bayes, and k-nearest neighbour
algorithms were calculated and compared in predicting the students’ final exam grades.
The dataset consisted of the academic achievement grades of
1854 students who took the Turkish Language-I course at a state university in Turkey during the fall
semester of 2019–2020. The results show that the proposed model achieved a classification accuracy of
70–75%. The predictions were made using only three types of parameters: midterm exam grades,
department data and faculty data. Such data-driven studies are very important in terms of establishing
a learning analysis framework in higher education and contributing to the decision-making processes.
Finally, this study presents a contribution to the early prediction of students at high risk of failure and
determines the most effective machine learning methods.

Keywords: Machine learning, Educational data mining, Predicting achievement, Learning analytics, Early warning systems

Open Access © The Author(s) 2022. This article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format,
as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons licence, and indicate if changes were made. The images or other third party material
in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a
credit line to the material. If material is not included in the article’s Creative Commons licence and your
intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to
obtain permission directly from the copyright holder. To view a copy of this licence, visit
http://creativecommons.org/licenses/by/4.0/.

RESEARCH. Yağcı, Smart Learning Environments (2022) 9:11, https://doi.org/10.1186/s40561-022-00192-z

*Correspondence: [email protected], Kırşehir Ahi Evran University, Faculty of Engineering and Architecture,
40100 Kırşehir, Turkey

students’ asking
questions. In recent years, EDM has become an effective tool used to identify hidden patterns in
educational data, predict academic achievement, and improve the learning/teaching environment.
Learning analytics has gained a new dimension through the use of EDM (Waheed et al., 2020). Learning
analytics covers the various aspects of collecting student information together, better understanding the
learning environment by examining and analysing it, and revealing the best student/teacher
performance (Long & Siemens, 2011). Learning analytics is the compilation, measurement and reporting
of data about students and their contexts in order to understand and optimize learning and the
environments in which it takes place. It also concerns how institutions develop new strategies.
Another dimension of learning analytics is predicting student academic performance, uncovering
patterns of system access and navigational actions, and determining students who are potentially at risk
of failing (Waheed et al., 2020). Learning management systems (LMS), student information systems
(SIS), intelligent teaching systems (ITS), MOOCs, and other web-based education systems leave digital
data that can be examined to evaluate students’ possible behavior. Using EDM methods, these data can
be employed to analyse the activities of successful students and those who are at risk of failure, to
develop corrective strategies based on student academic performance, and therefore to assist educators
in the development of pedagogical methods (Casquero et al., 2016; Fidalgo-Blanco et al., 2015). The data
collected on educational processes offer new opportunities to improve the learning experience and to
optimize users’ interaction with technological platforms (Shorfuzzaman et al., 2019). The processing of
educational data yields improvements in many areas such as predicting student behaviour, analytical
learning, and new approaches to education policies (Capuano & Toti, 2019; Viberg et al., 2018). This
comprehensive collection of data will not only allow education authorities to make data-based policies,
but also form the basis of software to be developed with artificial intelligence on the learning process.
EDM enables educators to predict situations such as dropping out of school or loss of interest in the
course, analyse internal factors affecting students’ performance, and apply statistical techniques to predict
students’ academic performance. A variety of DM methods are employed to predict student
performance and to identify slow learners and dropouts (Hardman et al., 2013; Kaur et al., 2015). Early
prediction is a new phenomenon that includes assessment methods to support students by proposing
appropriate corrective strategies and policies in this field (Waheed et al., 2020). Especially during the
pandemic period, learning management systems, quickly put into practice, have become an
indispensable part of higher education. As students use these systems, the log records they produce
have become ever more accessible (Macfadyen & Dawson, 2010; Kotsiantis et al., 2013; Saqr et al.,
2017). Universities should now improve their capacity to use these data to predict academic success and
ensure student progress (Bernacki et al., 2020). As a result, EDM provides the educators with new
information by discovering hidden patterns in educational data. Using this model, some aspects of the
education system can be evaluated and improved to ensure the quality of education.
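
To make the prediction task described in the abstract more concrete, the following is a minimal sketch of one of the named algorithms, k-nearest neighbour, applied to the same three kinds of input (midterm grade, department, faculty). It is not the implementation used in this study. The student records are synthetic, and the grade bands, the distance function, and all function names are assumptions introduced only for illustration.

```python
import random
from collections import Counter


def grade_band(score):
    """Map a 0-100 score to a class label (hypothetical bands, not the study's)."""
    if score < 50:
        return "fail"
    if score < 75:
        return "pass"
    return "good"


def distance(a, b):
    """Assumed mixed distance: scaled midterm gap plus 1 per mismatched category."""
    midterm_gap = abs(a[0] - b[0]) / 100.0
    dept_mismatch = 0 if a[1] == b[1] else 1
    faculty_mismatch = 0 if a[2] == b[2] else 1
    return midterm_gap + dept_mismatch + faculty_mismatch


def knn_predict(train, features, k=5):
    """Majority vote among the k training rows closest to `features`."""
    nearest = sorted(train, key=lambda row: distance(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Share of test rows whose predicted band equals the true band."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def make_synthetic_students(n, seed=0):
    """Generate (features, label) rows; the final grade is midterm plus noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        faculty = rng.randrange(3)                  # 3 made-up faculties
        department = faculty * 4 + rng.randrange(4) # 4 departments per faculty
        midterm = rng.uniform(0, 100)
        final = min(100.0, max(0.0, midterm + rng.gauss(0, 15)))
        rows.append(((midterm, department, faculty), grade_band(final)))
    return rows


# Hold out a quarter of the synthetic students for evaluation.
students = make_synthetic_students(400)
train, test = students[:300], students[300:]
acc = accuracy(train, test)
print(f"k-NN classification accuracy on synthetic data: {acc:.2f}")
```

The printed accuracy reflects only the synthetic noise model above and says nothing about the results reported in this study. A comparison of several algorithms, as described in the abstract, would repeat the same train/test evaluation for each classifier.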