Educational Data Mining: A Literature Review
Chapter in Advances in Intelligent Systems and Computing · September 2017
DOI: 10.1007/978-3-319-46568-5_9
2 authors:
Carla SOFIA Silva Jose Manuel Fonseca
Universidade Atlântica Institute for the Development of New Technologies
All content following this page was uploaded by Jose Manuel Fonseca on 13 October 2017.
Educational Data Mining:
a literature review
Track 2 - Artificial Intelligence in Education
Distributed Artificial Intelligence in Education (DAIED) and Web-based AIED systems
Carla Silva1, José Fonseca2
Centre of Technologies and Systems (CTS) of Uninova, Lisbon, Portugal
Abstract. The adoption of learning management systems in education has been increasing
in the last few years. Various data mining techniques, such as prediction, clustering and
relationship mining, can be applied to educational data to study the behavior and
performance of students. This paper explores the different data mining approaches and
techniques that can be applied to educational data to build a new environment and give
new predictions on the data. This study also looks into recent applications of Big Data
technologies in education and presents a literature review on Educational Data Mining and
Learning Analytics.
Keywords: EDM, Prediction, Clustering, Relationship Mining, learning management
systems
1 Introduction
A great deal of research is currently under way in the data mining field. Educational Data
Mining, also known as EDM, is a major research field. It aims at devising and using
algorithms to improve educational results and explain educational strategies for
further decision making. This paper discusses some of the data mining algorithms
applied in education-related areas. These algorithms are applied to extract knowledge
from educational data and to study the attributes that can contribute to maximizing
performance. Learning initially started in the classroom and was based on
behavioral, cognitive and constructivist models [1],[2]. Behavioral models rely on
observable changes in the behavior of the student to assess the learning outcome.
Cognitive models are based on the active involvement of the teacher in the learning
process. In the constructivist models, the students have to learn on their own from the
available knowledge. A new term, “Connectivism”, which is characterized as the
“amplification of learning, knowledge and understanding through the extension of
personal network”, has appeared in recent years. According to Siemens, learning is no
longer an internal, isolated activity [2],[3]. It is considered to be an act in a network of
nodes which improves the learning experience of students and reduces the need for
the direct involvement of a Professor. Indeed, traditional learning environments
have gradually mutated into community-based learning environments [4].
1 Carla Silva - [Link]@[Link]
2 José Manuel Fonseca – jmrf@[Link]
1.1 Data Mining, a concept and a challenge
Educational data mining can be defined as “An emerging discipline concerned with
developing methods for exploring the unique types of data that come from educational
settings and using those methods to better understand students, and the settings which
they learn in” [5]. EDM is the process of transforming raw data compiled by
educational systems into useful information that can be used to make informed decisions
and answer research questions. The development of data mining and analytics in
the field of Education came fairly late compared to other fields. Part of the
challenge of educational data mining in online learning lies in the specific features
of its data. While many types of data have sequential aspects, the distribution of
educational data over time has unique characteristics; for instance, a skill may be
encountered many times during a school year, but separated over time and in the
context of quite different activities [6]. Nevertheless, educational data mining methods
have been successful at modeling a range of phenomena relevant to student learning
in online intelligent systems, and models are achieving better accuracy every year and
are being validated to be more generalizable over time. Three aspects need to be
discussed to justify a dedicated line of development for educational data: the growing
realization that not all key information is stored in one data stream; the improvement
in model quality, driven by continuing improvements in methodology; and the fact that
there are more published examples of detectors than there are of detectors being used
to drive intervention. An example of the latter is Ellucian
[6][7], which provided Professors with reports of whether students were at risk of
dropping or failing a course, and scaffolded Professors in how to intervene, leading to
better outcomes for learners. Research in education [8] has resulted in several new
pedagogical improvements. Computer-based technologies have transformed the way
we live and learn. Today, the use of data collected through these technologies [6] is
supporting a second round of transformation in all areas of learning, with differing
levels of achievement.
Data mining is a powerful new technology with great potential to help Schools and
Universities focus on the most important information in the data they have collected
about the behavior of their students and potential learners [9]. Data mining involves
the use of data analysis tools to discover previously unknown patterns and
relationships in large data sets. These tools can include statistical models,
mathematical algorithms and machine learning methods. These techniques are able to
discover information within the data that queries and reports cannot effectively reveal.
1.2 Literature Review
Many investigations have been carried out to demonstrate the importance of data
mining techniques in education, showing that they offer a new way of extracting
valid and accurate information about behavior and
effectiveness in the learning process [10][11].
In the field of education, data mining techniques have also been used to analyze
curricula and current research topics, as well as to analyze
student performance [12]. Several investigations have addressed this
object of study. For example, Bhardwaj used the Naïve Bayes algorithm to
predict student performance based on 13 variables [13]. The results were used to build
a model that identifies in advance the students who are at risk of failure and thus
triggers a guidance and counseling program. Varghese, Tommy and Jacob [14] in
their research used the K-means algorithm to cluster 8000 students based on five
variables (university entrance average, average test and exam scores, average
paper scores, seminar grades and attendance-related grades). The results
showed a strong relationship between attendance and student performance. Gulati and
Sharma [15] claim that knowledge obtained through data mining analysis can improve
the education system in terms of guidance, student performance and organizational
management. Ayesha et al. [16] conducted a study on evaluation, taking into account
the evolution of learning and the analysis of tests at the beginning and end of courses.
Bresfelean [17] conducted a study based on students’ results and how easily these
can be predicted. Cortez and Silva [18] conducted research on the education system
in Portugal, and the results showed that a good and accurate prediction can be
achieved. Such predictions support the development of tools that help improve the
management of education in schools and the effectiveness of learning, which is a very
important return. According to Sun [9], the relationship between assessment and
learning is an important tool to monitor and guide quality education. Noaman and
Al-Twijri [19] published a recent study applied to the entry requirements of the
University of Saudi Arabia. Using algorithms and techniques they had developed, they
built a model that fits the target population and the variables that describe it. They took
into account admission inputs such as grades from previous education,
admission grades and even the characteristics that describe the needs of the University.
Some studies show the impact of the use of Moodle by applying data mining [20].
Sun [9] describes the different data mining techniques that can be applied to promote
student learning on digital platforms. Aslam and Ashraf [21] used a clustering
algorithm to provide a model of student learning. Other work [22] discussed
how data mining can improve the education system and enhance knowledge in
the classroom. Vince Kellen, in his case study, described the implementation of a
structured analysis tool for data mining, SAP's HANA, at the University of
Kentucky, which estimates a “k-score” value for each student. This value is used to
determine the student's involvement and the subsequent guidance needed for good
performance. Grafsgaard, Wiggins, Boyer, Wiebe, and Lester [23] developed a system that
recognizes facial expressions associated with frustration or understanding of students
in the classroom. They also used algorithms to detect unspoken behaviors and associate
them with the knowledge acquired. Seong Jae Lee et al. [24] also describe the use
of human behavior prediction models.
2 Approaches of data mining in educational data
Data mining is the field of computer science that aims to find out different potential
factors and patterns to help decision making.
Fig. 1. Intelligent System Model for Educational Data Mining
The model in Fig. 1 outlines the design of an Educational Data Mining system. In this
way, data mining can facilitate Institutional Memory. Data mining [25], also popularly
known as Knowledge Discovery in Databases, refers to extracting or “mining”
knowledge from large amounts of data. An educational system typically holds a large
amount of educational data. This data [26] may be students’ data, teachers’ data,
alumni data, resource data, etc. EDM focuses on the development of methods for
exploring the unique types of data that come from an educational context. These data
come from several sources, including traditional face-to-face classroom
environments, educational software, online courseware, etc.
Data mining techniques are used to operate on large volumes of data to discover
hidden patterns and relationships helpful for decision-making. Various algorithms and
techniques such as Classification, Clustering, Regression, Artificial Intelligence,
Neural Networks, Association Rules, Decision Trees, Genetic Algorithm, Nearest
Neighbor method etc., are used for knowledge discovery from databases.
2.1 Clustering
Clustering can be defined as the identification and classification of objects into
different groups, or more precisely, the partitioning of a data set into subsets (clusters)
so that the data in each subset (ideally) share some common trait
(Fig. 2).
Fig. 2. Example3 of K-means clustering using R
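The partitioning idea described above can be illustrated with a minimal K-means sketch. This is not the R example of Fig. 2: it is a plain Python illustration of Lloyd's algorithm, and the function name `kmeans`, the two features (attendance rate and average score) and the toy student records are all assumptions made for this example. The initial centroids are simply the first k points, which keeps the run deterministic but is not a recommended initialization in practice.

```python
import math


def kmeans(points, k, iterations=20):
    """Minimal Lloyd's K-means on a list of numeric tuples (illustrative only)."""
    # Assumption: deterministic initialization with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Invented toy data: (attendance rate, average score out of 20) per student.
students = [
    (0.95, 16.0), (0.90, 15.0), (0.92, 17.5),
    (0.40, 8.0), (0.35, 9.5), (0.50, 7.0),
]
labels, centroids = kmeans(students, k=2)
print(labels)
print(centroids)
```

On this toy data the first three students and the last three end up in separate clusters. With real data the features would normally be rescaled first, since here the score dominates the Euclidean distance.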
2.2 Classification
Classification models describe data relationships and predict values for future
observations (Fig. 3). Classification is the task of learning a target function that
maps each attribute set X to one of the predefined class labels Y. There are different
classification techniques, namely Decision Tree based Methods, Rule-based Methods,
Memory based reasoning, Neural Networks, Naïve Bayes and Bayesian Belief
Networks, and Support Vector Machines. The classifier-training algorithm uses
pre-classified examples to determine the set of parameters required for proper
discrimination. Test data [26] are then used to estimate the accuracy of the
classification rules. If the accuracy is acceptable, the rules can be applied to new
data tuples.
Fig.3. Classification as a task
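As a hedged illustration of the train-then-test workflow described in this section, the sketch below uses a k-nearest-neighbour classifier, one simple form of memory based reasoning. It is not the method of any of the cited studies. The function names `knn_predict` and `accuracy`, the features and the pass/fail records are hypothetical and chosen only to show how pre-classified examples and held-out test data fit together.

```python
import math
from collections import Counter


def knn_predict(train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training examples."""
    neighbours = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of held-out examples whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


# Invented pre-classified examples: (attendance rate, average score) -> outcome.
train = [
    ((0.95, 16.0), "pass"), ((0.90, 15.0), "pass"), ((0.85, 13.0), "pass"),
    ((0.40, 8.0), "fail"), ((0.35, 9.5), "fail"), ((0.50, 7.0), "fail"),
]
# Invented held-out test data used only to estimate accuracy.
test = [((0.88, 14.0), "pass"), ((0.45, 8.5), "fail")]

acc = accuracy(train, test)
print(f"held-out accuracy: {acc:.2f}")
```

Here the attribute set X is the pair of features and the class label Y is the outcome. If the estimated accuracy were judged acceptable, `knn_predict` could then be applied to new, unlabeled students.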
3 [27] Silva presented a cluster analysis of outcomes and incomes in education in Europe
2.4 Prediction
Regression techniques (Fig. 4) can be adapted for prediction [25]. Regression
analysis can be used to model the relationship between one or more independent
variables and dependent variables. In data mining, independent variables are attributes
already known and response variables are what we want to predict. Unfortunately,
many real-world problems are not simple prediction problems, since the response may
depend on complex interactions among several predictor variables. Therefore, more
complex techniques (e.g., logistic regression, decision trees, or neural nets) may be
necessary to forecast future values.
Fig.4. Prediction as a task
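The simplest case of the regression approach described above, one independent variable and one response variable, can be sketched with ordinary least squares. The function name `fit_line`, the variable names and the study-hours data below are assumptions made for illustration and do not come from any of the cited studies.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: weekly study hours (independent) and final score (response).
hours = [1, 2, 3, 4, 5]
scores = [8, 10, 11, 13, 15]

slope, intercept = fit_line(hours, scores)


def predict(x):
    """Predicted response for a new value of the independent variable."""
    return intercept + slope * x


print(slope, intercept, predict(6))
```

A single straight line like this cannot capture interactions among several predictors, which is why the more complex techniques mentioned above are often needed.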
3 Future Scope
There is increasing research interest in using data mining in education. This new
emerging domain, called Educational Data Mining, is concerned with developing
methods that discover knowledge from data originating from educational
environments. Data mining is a tremendously vast area that includes employing
different techniques and algorithms for pattern finding. This paper is just a simple
review of this emerging field and aims to highlight the importance of its study.
4 References
[1] F. Bell, “Connectivism: Its place in Theory-informed research and innovation
in technology-enabled learning,” Int. Rev. Res. Open Distance Learn., vol. 12,
no. 3, pp. 98–118, 2011.
[2] G. Siemens, “Connectivism: A Learning Theory for the Digital Age,” Int. J.
Instr. Technol. Distance Learn., vol. 1, pp. 1–8, 2014.
[3] G. Siemens, “Connectivism,” A Learn. Theory Digit. Age,
http://www.elearnspace.org/Articles/connectivism.htm, no. 2000, pp. 1–15, 2004.
[4] P. A. Ertmer and T. J. Newby, “Behaviorism, cognitivism, and
constructivism: Connecting yesterday’s theories to today’s contexts,”
Perform. Improv. Q., vol. 26, no. 2, pp. 43–71, 2013.
[5] S. A. Kumar and M. . Vijayalakshmi, “A Novel Approach in Data Mining
Techniques for Educational Data,” 3rd Int. Conf. Mach. Learn. Comput.
(ICMLC 2011) A, no. Icmlc, pp. 152–154, 2011.
[6] R. S. Baker, “Educational Data Mining: An Advance for Intelligent Systems
in Education,” AI Educ., pp. 78–82, 2014.
[7] R. S. Baker and P. S. Inventado, “Educational data mining and learning
analytics,” in Learning Analytics: From Research to Practice, Springer New
York, 2014, pp. 61–75.
[8] K. Sin and L. Muthu, “Application of big data in education data mining and
learning analytics-A literature review,” Ictact J. Soft Comput. Spec. Issue Soft
Comput. Model. Big Data, vol. 5, no. 4, pp. 1035–1049, 2015.
[9] H. Sun, “Research on Student Learning Result System based on Data
Mining,” Int. J. Comput. Sci. Netw. Secur., vol. 10, no. 4, pp. 203–205, 2010.
[10] M. Ramaswami and R. Bhaskaran, “A CHAID Based Performance Prediction
Model in Educational Data Mining,” Int. J. Comput. Sci. Issues, vol. 7, no. 1,
pp. 10–18, 2010.
[11] M. Ramaswami and R. Bhaskaran, “A Study on Feature Selection Techniques
in Educational Data Mining,” J. Comput., vol. 1, no. 1, pp. 7–11, 2009.
[12] A. Permata Alfiani and F. Ayu Wulandari, “Mapping Student’s Performance
Based on Data Mining Approach (A Case Study),” Ital. Oral Surg., vol. 3, pp.
173–177, 2015.
[13] V. Kumar, “An Empirical Study of the Applications of Data Mining
Techniques in Higher Education,” Int. J. Adv. Comput. Sci. Appl., vol. 2, no.
3, pp. 80–84, 2011.
[14] B. M. Varghese, J. Tomy, A. Unnikrishnan, and J. Poulose, “Clustering
Student Data to Characterize Performance Patterns,” Int. J. Adv. Comput. Sci.
Appl., pp. 138–140, 2010.
[15] P. Gulati and A. Sharma, “Educational data mining for improving educational
quality,” Int. J. Comput. Sci. Inf. Technol. Secur., vol. 2, no. 3, pp. 648–650,
2012.
[16] S. Ayesha, T. Mustafa, A. Raza Sattar, and M. I. Khan, “Data mining model
for higher education system,” Eur. J. Sci. Res., vol. 43, no. 1, pp. 24–29,
2010.
[17] V. P. Breşfelean, “Analysis and predictions on students’ behavior using
decision trees in weka environment,” in Proceedings of the International
Conference on Information Technology Interfaces, ITI, 2007, pp. 51–56.
[18] P. Cortez and A. Silva, “Using Data Mining To Predict Secondary School
Student Performance,” 5th Annu. Futur. Bus. Technol. Conf., vol. 2003, no.
2000, pp. 5–12, 2008.
[19] M. I. Al-Twijri and A. Y. Noaman, “A New Data Mining Model Adopted for
Higher Institutions,” Procedia Comput. Sci., vol. 65, no. Iccmit, pp. 836–844,
2015.
[20] C. Romero, S. Ventura, and E. García, “Data mining in course management
systems: Moodle case study and tutorial,” Comput. Educ., vol. 51, no. 1, pp.
368–384, 2008.
[21] S. Aslam and I. Ashraf, “Data Mining Algorithms and their applications in
Education Data Mining,” Int. J., vol. 7782, pp. 50–56, 2014.
[22] A. P. K. H. Rashan, Data Mining Applications in the Education Sector. 2011.
[23] J. F. Grafsgaard, J. B. Wiggins, K. E. Boyer, E. N. Wiebe, and J. C. Lester,
“Predicting Learning and Affect from Multimodal Data Streams in Task-
Oriented Tutorial Dialogue,” Proc. 7th Int. Conf. Educ. Data Min., no. Edm,
pp. 122–129, 2014.
[24] S. J. Lee, Y. Liu, and Z. Popović, “Learning Individual Behavior in an
Educational Game: A Data-Driven Approach,” Proc. 7th Int. Conf. Educ.
Data Min., no. Edm, pp. 114–121, 2014.
[25] F. Ranadive and A. Z. Surti, “Hybrid Agent Based Educational Data Mining
Model for Student Performance Improvement,” no. 4, pp. 45–47, 2014.
[26] M. Swamy and M. Hanumanthappa, “Predicting academic success from
student enrolment data using decision tree technique,” Int. J. Appl. Inf. Syst.,
vol. 4, no. 3, pp. 1–6, 2012.
[27] C. Silva, “Does Education matter? Vocational education and social mobility
strategies in young people of Barcelona and Lisbon. A comparative study.,”
ULHT, 2014.