Predictive Analytics of Academic Performance of Senior High School (SHS) Students: A Case Study of Sunyani SHS
Adjei-Pokuaa Henrietta 1 and Adebayo F. Adekoya 2
1, 2 Department of Computer Science and Informatics, University of Energy and Natural Resources, Ghana
ABSTRACT
Due to the availability and increasing adoption of technology such as learning
management systems, online admission systems, and school management systems,
educational databases have expanded in recent years.
Motivation/Background: The literature shows that these data contain vital and relevant
information that could be used to monitor and advise students so that their
performance can be enhanced. In this study, the random forest algorithm is
proposed to identify and examine the factors that influence students’ performance in
the West African Senior School Certificate Examination (WASSCE) and to predict the
future performance of students in the WASSCE.
Method: Data on a total of one thousand five hundred and twenty students were
selected from Sunyani SHS.
Results: The results revealed that demographic data (age and gender) do not
influence the performance of students in their final WASSCE. However, an accuracy
of 89.4%, with a root mean square error (RMSE) of 0.001639 and a mean absolute
percentage error (MAPE) of 0.001321, revealed that the proposed model could
effectively predict the performance of students in the WASSCE.
Keywords: Learning Management, Demographic Data, Effectively Predict, Random
Forest Algorithm
Received 03 January 2022
Accepted 01 February 2022
Published 21 February 2022
Corresponding Author: Adjei-Pokuaa Henrietta, herttydonkor@[Link]
DOI 10.29121/ijetmr.v9.i2.2022.1088
Funding: This research received no specific grant from any funding agency in
the public, commercial, or not-for-profit sectors.
Copyright: © 2022 The Author(s). This is an open access article distributed under
the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided
the original author and source are credited.
1. INTRODUCTION
The size of the educational database keeps proliferating every three
years and these databases contain useful information that can be effectively
employed to improve the academic performance of students (Yadav & Pal,
2012). There is a growing focus on institutional data mining by
administrators, educational planners, and managers due to the exponential
growth of educational data (Anuradha & Velmurugan, 2015). Data mining
techniques have been applied in educational systems to increase the
understanding of the process of learning by concentrating on key issues such
as identifying, mining, and evaluating parameters and variables which are
associated with the learning progress of students (Yadav & Pal, 2012).
Students’ academic performance is a critical factor in every educational
setting, especially in senior high schools, because educational
institutions are rated based on the academic excellence achieved by their
students and academic staff (Mohamed et al., 2016). Forecasting the academic
performance of students in the real world has many challenges, but it has
been made easier by the growth in information and communication
technologies, which grants access to large amounts of information that can
support critical decision-making.
How to cite this article (APA): Henrietta, A. P., and Adekoya, A. F. (2022). Predictive Analytics of Academic Performance of Senior
High School (SHS) Students: A Case Study of Sunyani SHS. International Journal of Engineering Technologies and Management
Research, 9(2), 64-81. doi: 10.29121/ijetmr.v9.i2.2022.1088
Knowledge Discovery in Databases (KDD), also called Data Mining (DM), is the process of extracting hidden
patterns and discovering associations between parameters in massive amounts
of data (Ahmad et al., 2015; Kaur et al., 2015). DM techniques have recorded numerous
achievements in areas such as marketing, education, engineering, finance,
healthcare, and sports by helping decision-makers find solutions to everyday
problems in these fields. This study focused on predicting the final WASSCE scores of SHS
students using the random forest algorithm, based on their demographic data and past
examination records. Predicting students’ performance from their demographic data
and previous examination scores offers several benefits. First, teachers will be
able to adjust the learning activities in the classroom to enhance students' skills.
Second, seating positions in class can be arranged to maintain a balanced education
and to promote student-to-student interaction among fast and slow learners. Third,
knowing the performance of students at an early stage helps teachers and school
authorities to take charge and provide the needed assistance to the students involved.
General Factors Affecting the Academic Performance of Students: The literature
shows that academic performance is affected by four main factors: teaching and
learning materials, the administrative power of organizational practices,
teacher-related factors, and socioeconomic background. Teaching and learning
resources and the learning environment have a great impact on students’ academic
performance. According to Kieti (2017), the educational creativity of teachers and
students is believed to depend on the facilities available within the educational
environment. An abundance of teaching and learning facilities and resources, such as
habitable classrooms and lecture rooms, computer rooms, equipped libraries,
laboratories, better dining halls, qualified and experienced teachers, and
teaching and learning materials (TLMs), contributes immensely to the success of
students. As stated by Kieti (2017), Tran et al. (2017), and Tuen et al. (2019),
current textbooks, learning videos and software, and other materials related to
learning reflect well on students’ performance when they are supplied in
sufficient quantity. Bydovska and Popelinsky (2013) and Li et al. (2013) concluded
that excellent illumination, clean and fresh air, and safe, quiet, and comfortable
learning environments are essential factors for positive academic achievement by
students. James et al. (2015), Morrison (2017), and Tuen et al. (2019) confirmed
that encouraging physical exercise, diminishing stress, and decreasing air and noise
pollution and exposure to heat make for an excellent teaching and learning
atmosphere. Studies have shown that organizational practices, including but not
limited to instruction, leadership, and management, affect students’ academic
performance (Kieti, 2017; Buabeng-Andoh, 2015; Egbenya & Halm, 2016). Scholars
have also reported that teacher-related factors such as the teacher’s
commitment, motivation, workload, and frequency of absenteeism (Affum-osei et
al., 2014) affect students’ academic performance.
The fourth factor is the socio-economic background of the student. The Socio-
Economic Status (SES) of a child is generally determined by combining parents’
occupational status, educational level, and income level. Research has repeatedly
confirmed that SES affects students’ academic performance and that students
who have a low SES earn lower examination and quiz scores and are more likely to
drop out of school (Al-Rahmi & Zeki, 2017; Oladejo et al., 2011; Seah et al., 2015). The
effect of SES on students’ academic performance has been found to dominate
other influential educational factors, such as parental involvement. Low SES is
strongly associated with poor academic performance. Other
scholars have argued that some parents are economically less capable of paying for
the cost of education of their children at advanced levels and, consequently, their
children do not work at their maximum potential (Nguyen, 2017). The home
environment plays a remarkably significant part in the academic performance of
every child.
Application of Data Mining in Academic Performance Prediction: Knowing
students’ performance in advance is key in educational settings (Bhardwaj &
Pal, 2011; Kaur et al., 2015; Ming et al., 2014). The performance of a student
depends on many factors, including personal, psychological, social, and other
environmental features. An effective and efficient way to predict students’ academic
performance is Data Mining (DM), often referred to as Knowledge Discovery in
Databases (KDD) (Ahmad et al., 2015; Kaur et al., 2015). DM techniques are applied
to extensive databases to uncover hidden relationships and patterns that are helpful
in decision making (Argiddi & Apte, 2012; Kaur et al., 2015; Khasanah & Harwati,
2017; Ramesh et al., 2013). DM can also be
employed in educational institutions to promote understanding of the pedagogical
and learning process by discovering and evaluating the variables related to
students’ learning (Bhardwaj & Pal, 2011; Devasia et al., 2016). DM applies
Artificial Intelligence (AI) and machine learning algorithms to discover these
hidden details in educational databases.
Numerous researchers have applied machine learning (ML) algorithms to
predict and detect the factors that influence student academic performance.
The literature shows that the academic success of students depends on one factor
or another; hence, there is no dependable agreement among different studies.
Table 1 presents details of the input data used for predicting students’ academic
performance.
Here are some of these studies. Dole and Rajurkar (2014) proposed a student
performance prediction system based on a decision support system (DSS)
using the Naïve Bayes (NB) algorithm. Livieris et al. (2015) presented a novel DSS
framework for predicting students’ performance in the final examinations of a
school using a hybrid machine learning algorithm. Pandey and Taruna (2016)
implemented a hybrid classifier consisting of three parallel algorithms, namely
decision tree (DT), k-nearest neighbours (KNN), and Aggregating One-Dependence
Estimators (AODE), for accurate prediction of students’ performance. Yadav and Pal
(2012) applied three DT algorithms (C4.5, ID3, and CART) to the data of engineering
students to predict their performance in their final examinations.
Li et al. (2013) studied engineering students’ academic performance
by combining several machine learning algorithms, namely principal component
analysis, K-means and hierarchical clustering, and KNN and NB classifiers. Musso
et al. (2013) used the intellectual and non-intellectual actions of students,
together with demographic information, as input parameters to a predictive
framework based on Artificial Neural Networks (ANN). Osmanbegović et al. (2014)
and Osmanbegovic and Suljic (2012) proposed a combination of NB, Multilayer
Perceptron (MLP), and DT algorithms to predict students’ academic performance.
Berhanu (2015) implemented a decision tree classifier to predict the performance
of students.
1. Make a list of classes C = {c_1, c_2, ..., c_t}, where t is the total number of classes; for this study, t = 5.
2. For each c_i in C, where i = 1, 2, ..., t:
3. Make a list of all the students (population N) in the class c_i (the list contains their index
numbers).
4. Assign a sequential number (1, 2, 3, ..., n) to each student in the class c_i. This represents the
sampling frame, that is, the list from which the study draws its random sample (a percentage of the
students in the class).
6. Select the sample (k) of c_i using a random number generator (rand()), based on the
sampling frame (population size of c_i).
7. Associate every generated number in (A) with its equivalent sequential number (n) and select the
index numbers of the corresponding students in c_i.
8. Repeat steps 3 to 7 for all c_i in C.
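The sampling steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stratified_sample`, the `fraction` parameter, the class names, and the index numbers are all hypothetical, since the text does not state the sampling percentage or the data layout.

```python
import random


def stratified_sample(classes, fraction, seed=None):
    """Draw a simple random sample of students from every class.

    classes  : dict mapping a class name to its list of student index numbers
    fraction : share of each class to sample (hypothetical parameter)
    seed     : optional seed so that the draw can be reproduced
    """
    rng = random.Random(seed)
    sample = {}
    for class_name, index_numbers in classes.items():
        # Step 4: the sampling frame pairs sequential numbers 1..n with index numbers.
        frame = dict(enumerate(index_numbers, start=1))
        # Sample size k for this class, capped at the class size.
        k = min(len(frame), max(1, round(fraction * len(frame))))
        # Step 6: draw k distinct sequential numbers at random.
        drawn = rng.sample(list(frame), k)
        # Step 7: map each drawn number back to the student's index number.
        sample[class_name] = [frame[number] for number in drawn]
    return sample


# Hypothetical classes and index numbers, for illustration only.
classes = {
    "Class A": [f"A{i:03d}" for i in range(1, 11)],
    "Class B": [f"B{i:03d}" for i in range(1, 21)],
}
selected = stratified_sample(classes, fraction=0.3, seed=42)
print({name: len(students) for name, students in selected.items()})
```

Sampling within each class separately keeps every class represented in proportion to its size, which appears to be the intent of looping over all c_i.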
Training Techniques: The training technique adopted for the current study is
supervised machine learning, where the input variables
(demographic data and past terminal examination records) are fed into the model to
produce the required output variable (WASSCE score).
The RF is applied to the training dataset (Train_D) so that the model learns the
patterns hidden in the dataset. We then measure the accuracy and error metrics and
determine whether they are within the accepted values. If they are, the learned
model is applied to the test dataset (Test_D) to make predictions.
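As a rough illustration of how a dataset might be divided into Train_D and Test_D, the sketch below shuffles the records and holds out a fixed share for testing. The 80/20 ratio, the function name `split_dataset`, and the placeholder records are assumptions; the text does not state the split that was actually used.

```python
import random


def split_dataset(records, test_fraction=0.2, seed=0):
    """Shuffle the records and split them into (train_d, test_d)."""
    rng = random.Random(seed)
    shuffled = list(records)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    test_d = shuffled[:n_test]
    train_d = shuffled[n_test:]
    return train_d, test_d


# Placeholder records standing in for the 1520 student rows.
records = list(range(1520))
train_d, test_d = split_dataset(records, test_fraction=0.2, seed=0)
print(len(train_d), len(test_d))
```

Fixing the seed makes the split reproducible, so the error metrics measured on Train_D and the predictions on Test_D can be regenerated later.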
Random forest (RF): This is an ensemble-learning technique that combines the
outputs of several decision trees (DTs) to predict or classify the value of a
variable (Rodriguez-Galiano et al., 2015; Wakefield, 2013). RF can be used for both
classification and regression ML tasks. In the RF technique, many DTs are created,
and each observation is fed into every DT. For classification, the most common
outcome across the trees (the majority vote) is used as the final output for each
observation. An error
estimate is made for the cases that were not used while building each tree.
This is known as the out-of-bag (OOB) error estimate, which is stated as a percentage.
When an RF receives an input x, where x is a vector consisting of the values of the
different evidential features examined for a given training area, the RF builds
N regression trees and averages their results. Thus, for N trees
$\{T_n(x)\}_{n=1}^{N}$, the RF regression predictor is given by Equation 1.
$f_{rf}^{N}(x) = \frac{1}{N} \sum_{n=1}^{N} T_n(x)$ Equation 1
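The averaging in Equation 1 and the out-of-bag idea can be sketched as follows. This is a minimal illustration, not the study's implementation: the names `bootstrap_indices` and `forest_predict` are hypothetical, and the three constant "trees" are toy stand-ins for fitted regression trees.

```python
import random


def bootstrap_indices(n_rows, rng):
    """Sample n_rows row indices with replacement; also return the unused (OOB) rows."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag


def forest_predict(trees, x):
    """Equation 1: average the predictions T_n(x) of the N trees."""
    return sum(tree(x) for tree in trees) / len(trees)


# Toy stand-ins for three fitted regression trees (illustrative values only).
trees = [lambda x: 60.0, lambda x: 66.0, lambda x: 72.0]
prediction = forest_predict(trees, x=[17, 65.0])

in_bag, out_of_bag = bootstrap_indices(10, random.Random(1))
print(prediction, len(in_bag), out_of_bag)
```

Rows that land in `out_of_bag` were never seen by the tree grown on `in_bag`, which is why they can be used for the OOB error estimate.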
Choosing Variables to Split On: For each bootstrap sample, an unpruned regression
tree is grown using the following step: at each node, randomly sample
K variables and select the best split among those K variables, rather
than picking the best split among all predictors. This practice is sometimes
$\hat{c}_t = \mathrm{ave}(y_i \mid x_i \in R_t)$ Equation 4
Consider a splitting feature j and a split point s, and define the pair of half-planes
$R_1(j, s) = \{X \mid X_j \le s\}$ and $R_2(j, s) = \{X \mid X_j > s\}$ Equation 5
The best split is the pair (j, s) that minimizes the following quantity:
$\left[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \right]$ Equation 6
When the best split is obtained, the dataset is partitioned into the two resulting
segments and the splitting procedure is repeated on each of them. This
splitting procedure is reiterated until a predefined ending criterion (threshold) is
satisfied; five was set as the threshold for this study.
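A brute-force search for the best split described by Equations 4 to 6 might look like the sketch below. The function names and the toy data are hypothetical, and the stopping threshold is not modelled because the text does not say what quantity the value five applies to. In a random forest, `candidate_features` would be the K variables sampled at the node, for example with `random.sample(range(n_features), K)`.

```python
def sum_squared_error(values):
    """Squared error around the segment mean, which is the optimal c (Equation 4)."""
    c = sum(values) / len(values)
    return sum((y - c) ** 2 for y in values)


def best_split(rows, targets, candidate_features):
    """Return (cost, j, s) for the split that minimises Equation 6, or None."""
    best = None
    for j in candidate_features:
        for s in sorted({row[j] for row in rows}):
            # Equation 5: R1 holds rows with X_j <= s, R2 holds the rest.
            left = [y for row, y in zip(rows, targets) if row[j] <= s]
            right = [y for row, y in zip(rows, targets) if row[j] > s]
            if not left or not right:
                continue  # a split must leave data on both sides
            cost = sum_squared_error(left) + sum_squared_error(right)
            if best is None or cost < best[0]:
                best = (cost, j, s)
    return best


# Toy data: feature 0 separates the targets, feature 1 is constant.
rows = [[1, 5], [2, 5], [3, 5], [4, 5]]
targets = [10, 10, 20, 20]
split = best_split(rows, targets, candidate_features=[0, 1])
print(split)
```

Because the segment mean minimises the squared error within a segment, the inner minimisations over c_1 and c_2 never need an explicit search.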
$RMSE = \sqrt{\frac{1}{m} \sum_{v=1}^{m} (t_v - y_v)^2}$ Equation 7
Mean absolute percentage error (MAPE): This index is the average of the
absolute percentage errors; the lower the MAPE, the better.
Here, $t_v$ is the actual value, $y_v$ is the predicted value produced by the model,
and m is the total number of observations.
The correlation coefficient (R): This criterion
measures the strength of the association between the actual values and the predicted
values. The correlation coefficient ranges from -1 to 1, and a model with R closer
to 1 has better performance.
$R = \frac{\sum_{v=1}^{m} (t_v - \bar{t})(y_v - \bar{y})}{\sqrt{\sum_{v=1}^{m} (t_v - \bar{t})^2 \cdot \sum_{v=1}^{m} (y_v - \bar{y})^2}}$ Equation 9
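The three evaluation criteria can be computed directly from paired lists of actual and predicted values. The sketch below follows Equation 7 and Equation 9; for MAPE it assumes the common definition, the mean of |t_v - y_v| / |t_v| expressed as a fraction, because the corresponding equation does not appear in the text. The function names and sample values are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean square error (Equation 7)."""
    m = len(actual)
    return math.sqrt(sum((t - y) ** 2 for t, y in zip(actual, predicted)) / m)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (assumed definition).

    Undefined when an actual value is 0, because of the division by t.
    """
    m = len(actual)
    return sum(abs((t - y) / t) for t, y in zip(actual, predicted)) / m


def correlation(actual, predicted):
    """Correlation coefficient R (Equation 9)."""
    m = len(actual)
    t_bar = sum(actual) / m
    y_bar = sum(predicted) / m
    numerator = sum((t - t_bar) * (y - y_bar) for t, y in zip(actual, predicted))
    denominator = math.sqrt(
        sum((t - t_bar) ** 2 for t in actual)
        * sum((y - y_bar) ** 2 for y in predicted)
    )
    return numerator / denominator


# Illustrative scores only, not data from the study.
actual = [50, 60, 70, 80]
predicted = [52, 58, 71, 79]
print(rmse(actual, predicted), mape(actual, predicted), correlation(actual, predicted))
```

A perfectly inverted prediction gives R = -1, which is why R is best read together with RMSE and MAPE rather than on its own.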
Table 2 shows the students’ demographic information. Out of one thousand five
hundred and twenty (1520) students, a majority (n = 850; 55.9%) were females
while 670 (44.1%) were males. This outcome shows that there were about 27% more
female students than male students (180 more on a base of 670). The majority
(n = 950; 62.5%) of the students were within the age group of 16-18 years, which
confirms the normal age for one to be in SHS, while 180 (11.8%) were in the age
group of 14-15 years and 390 (25.7%) were 19 years or older.
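The shares implied by the reported counts can be checked with simple arithmetic. The helper `share` below is a hypothetical name; the counts are the ones given in the text.

```python
def share(count, total):
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)


total = 1520
gender = {"female": 850, "male": 670}
age = {"14-15": 180, "16-18": 950, "19+": 390}

gender_shares = {group: share(n, total) for group, n in gender.items()}
age_shares = {group: share(n, total) for group, n in age.items()}
# Relative excess of female over male students, in percent.
female_excess = share(gender["female"] - gender["male"], gender["male"])

print(gender_shares, age_shares, female_excess)
```

Both groupings sum to 1520, so the percentages within each grouping should add up to 100 apart from rounding.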
However, the minimum score (0) and maximum score (80.02) for core maths
indicate that the performance of candidates in core maths needs more
improvement. This outcome affirms Ablakwa’s report that a high percentage of
candidates fail core maths (Ablakwa, 2014).
Additional statistical analysis was performed on the students’ continuous
assessment scores. Table 4 shows a one-sample t-test with a 95% confidence interval.
Again, the small differences between the mean values of the students’ continuous
assessment scores show a close margin in the general performance of students.
Descriptive Statistics
Subject             N     Minimum  Maximum  Mean     Std. Deviation
CM                  1520  0        80.02    66.2028  7.6342
EG                  1520  44.33    81.2     66.3631  7.59182
IS                  1520  46       84.25    63.672   6.46139
SS                  1520  45.5     84.75    64.8148  7.96228
EL1                 1520  39.17    87.6     63.8926  7.54045
EL2                 1520  34.17    83       65.2059  6.88485
EL3                 1520  46.5     84.8     66.0408  6.87008
EL4                 1520  29.83    89.6     65.7409  7.50126
Valid N (listwise)  1520
Test Value = 0
                                                     95% Confidence Interval of the Difference
Subject  t        df    Sig. (2-tailed)  Mean diff.  Lower    Upper
CM       153.664  1520  0                66.20277    65.3551  67.0505
EG       174.377  1520  0                63.52882    62.812   64.2456
IS       144.143  1520  0                64.62707    63.7449  65.5092
SS       150.192  1520  0                63.70774    62.8731  64.5423
EL1      166.222  1520  0                65.05242    64.2824  65.8224
EL2      169.554  1520  0                65.97631    65.2107  66.7419
EL3      153.81   1520  0                65.6108     64.7715  66.4501
EL4      168.268  1520  0                65.36828    64.6039  66.1326
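For readers who want to reproduce this kind of table, the sketch below computes a one-sample test statistic and an approximate confidence interval for the mean difference. It is a simplified illustration: the interval uses a normal critical value, which is only a reasonable stand-in for the t distribution when the sample is large (as with about 1520 students), and the function name and scores are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_test(scores, test_value=0.0, confidence=0.95):
    """One-sample test of the mean against test_value (large-sample sketch)."""
    n = len(scores)
    mean_diff = mean(scores) - test_value
    standard_error = stdev(scores) / math.sqrt(n)   # sample SD / sqrt(n)
    t_stat = mean_diff / standard_error
    # Normal critical value used as an approximation to the t critical value.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {
        "t": t_stat,
        "df": n - 1,   # degrees of freedom of a one-sample t-test
        "mean_diff": mean_diff,
        "lower": mean_diff - z * standard_error,
        "upper": mean_diff + z * standard_error,
    }


# Illustrative scores only, not data from the study.
result = one_sample_test([60, 62, 64, 66, 68])
print(result)
```

With a test value of 0, the mean difference is simply the sample mean, so the interval is a confidence interval for the mean score itself.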
also highly related to the candidate’s success in the final WASSCE. This affirms the
reason for core maths being one of the compulsory subjects in all areas of study.
Figure 4 shows the grade analysis based on the model predictions against the actual
grades on the test dataset. It was observed that a good number of the students
recorded at least an A1 in one or more of the subjects in the final WASSCE. Also,
the closeness of the predicted grades to the actual grades confirms the accuracy of
the proposed model (see Table 5 and 4.7). Based on this outcome, it can be inferred
that a good predictive model can help school authorities to ascertain the grade
analysis of their students before the WASSCE.
results. However, based on the accuracy recorded by the model on the test data, it is
hoped that the predicted results will not be far from the actual results when these
students take their final exams.
Table 6 Model predicted values for 20 students yet to write their WASSCE
(gender and age) have no effect on the performance of students in the WASSCE. A
high correlation was found between Integrated Science (IS) and Social Studies (SS)
scores and the performance of students in the WASSCE. The study found that students
who scored between 40 and 55 in core maths were likely not to perform well in the
final WASSCE. The performance of students in the WASSCE can be predicted with an
accuracy of 89.4%.
REFERENCES
Adejo, O. W., & Connolly, T. (2018). Predicting Student Academic Performance Using
Multi-Model Heterogeneous Ensemble Approach. Journal of Applied
Research in Higher Education, 10(1), 61-75. Retrieved from
[Link]
Agrawal, H., & Mavani, H. (2015). Student Performance Prediction Using Machine
Learning. 4(03), 111-113. Retrieved from
[Link]
Ahmed, A. B. E. D., & Ibrahim, S. E. (2014). Data Mining: A Prediction for Student's
Performance Using Classification Method. World Journal of Computer
Application and Technology, 2(2), 43-47. Retrieved from
[Link]
[Link]. (2010). Random Forest Algorithm. Retrieved from
Https://[Link]/Imgres?Imgurl=Https%3A%2F%[Link]
[Link]%2Fwp-Content%2Fuploads%2F2015%2F06%2Frandom-
[Link]&Imgrefurl=Https%3A%2F%[Link]%2
Fblog%2F2015%2F06%2Ftuning-Random-Forest-
Model%2F&Tbnid=Bldygobmf_Oqom&Vet=12ahukewibtpbop-
_Qahvtlkqkhbfndnuqmygieguiarc3aq.I&Docid=Gp-
Attuquayefio, Niiboi, S., & Addo, H. (2014). Using the UTAUT Model to Analyze
Students' ICT Adoption. International Journal of Education & Development
Using Information & Communication Technology, 10(3), 75-86. Retrieved
from
Http://[Link]/Login?Url=Http://[Link]/Logi
[Link]?Direct=True&Db=Ehh&AN=97923459&Site=Ehost-Live
Berhanu, F. (2015). Students' Performance Prediction Based on Their Academic
Record. International Journal of Computer Applications, 131(5), 27-35.
Retrieved from [Link]
Bhardwaj, B. K., & Pal, S. (2011). Data Mining: A Prediction for Performance
Improvement Using Classification. International Journal of Computer
Science and Information Security, 9(4), 136-140.
Bosson-Amedenu, S. (2017). Predictive Validity of Mathematics Mock Examination
Results of Senior and Junior High School Students' Performance in WASSCE