ABSTRACT

Educational Data Mining (EDM) is a leading research area that mines data sets to answer educational research questions and shed light on the learning process. It is a new trend in the data mining and Knowledge Discovery in Databases (KDD) field that focuses on mining useful patterns and discovering useful knowledge from educational information systems. One application area of educational data mining is the analysis and prediction of students’ academic performance. The vision of any higher educational institution is to improve the quality of managerial decisions and to impart quality education. Good prediction of students’ success in a higher learning institution is one way to reach the highest level of quality in higher education systems. Identifying low-performing students at the beginning of the learning process and offering them academic advice will enhance their academic performance and further improve overall educational quality. Measuring students’ academic performance is challenging because it hinges on diverse factors, both personal and academic. This research explores multiple factors theoretically assumed to affect students’ performance in higher education and finds a qualitative model that best classifies and predicts postgraduate students’ performance based on related factors. Two existing techniques, K-Nearest Neighbour (KNN) and Naïve Bayes, were used to predict students’ performance but were limited in prediction accuracy, so a hybrid model, K-Bay, which combines KNN and Naïve Bayes, was proposed. The dataset used for the analysis includes student attributes such as academic grades, demographic attributes, work-related attributes, social attributes and school-related attributes. Questionnaires were used to collect data from the students.
Educational data mining techniques, the Object Oriented Analysis and Design Methodology (OOADM) and the Knowledge Discovery in Databases (KDD) process were adopted. In building the classification model, a student data set consisting of 499 instances with 33 attributes was run through the algorithms. Analysis and ranking of the factors/attributes affecting students’ performance were achieved using Correlation Based Feature Selection (CFS), through which five highly influential factors were selected. The results were evaluated and compared for accuracy of prediction. Using all the attributes, the system realized an accuracy of 95.92%, as against the single classifiers Naïve Bayes and KNN, which had accuracies of 69.39% and 71.43% respectively. Execution time for the new model was 0.134 seconds, while KNN and Naïve Bayes took 0.357 and 0.18 seconds respectively. Using only the highly influential attributes, the system realized an accuracy of 99%, as against the single classifiers Naïve Bayes and KNN, which had accuracies of 75.51% and 59.18% respectively. Hence the K-Bay prediction model produced a more reliable and accurate system for predicting students’ academic performance.

CHAPTER ONE

INTRODUCTION

1.1 Background of the Study

Academic performance, or “academic achievement”, is the extent to which a student, teacher or institution has attained their short- or long-term educational goals. It is commonly measured through examination or continuous assessment, but there is no general agreement on how it is best evaluated or which aspects are most important. Academic achievement plays an important role in determining the worth of graduates who will be responsible for the social and economic growth of the country.

Educational Data Mining (EDM) is a new discipline that focuses on studying methods and creating models for using educational data, and on applying those methods to better understand students and their performance. Educational data mining has become a vital need for academic institutions seeking to improve the quality of education. Kumar et al. (2011) described it as the process of transforming raw data compiled by educational systems into useful information that can be used to make informed decisions and answer research questions. Educational data mining methods have been successful at modeling a range of phenomena relevant to student learning in online intelligent systems. Models are also achieving better accuracy every year and are being validated as more generalizable over time. Research in education has resulted in several new pedagogical improvements. Computer-based technologies have transformed the way we live and learn. Today, the use of data collected through these technologies is supporting a second round of transformation in all areas of learning, with varying levels of achievement.

Baker and Yacef (2009) summarized the four goals of educational data mining:

1. Predicting students’ future learning behavior by creating student models that categorize a student’s characteristics or states, which make up the student’s knowledge, motivation, meta-cognition and attitudes.
2. Discovering or improving knowledge domain models that explain the interrelationship between the knowledge in a domain and the materials that characterize the content to be learned.
3. Studying the most effective pedagogical support for students’ learning that can be achieved through learning systems.
4. Establishing empirical evidence to support pedagogical theories, frameworks and educational phenomena, in order to determine the core influential components of learning and enable the design of better learning systems.

Educational Data Mining involves the application of data mining techniques to the following educational problems:

1. Providing feedback for supporting instructors
2. Student modeling
3. Detecting student behavior
4. Predicting students’ performance
5. Recommendations for students
6. Grouping students
7. Constructing courseware
8. Planning and scheduling
9. Students’ social network analysis

Predictive modeling is the general concept of building a model that is capable of making predictions. Typically, such a model includes a machine learning algorithm that learns certain properties from a training dataset in order to make those predictions. Predictive modeling can be divided further into two sub areas: Regression and pattern classification. Regression models are based on the analysis of relationships between variables and trends in order to make predictions about continuous variables, e.g., the prediction of the maximum temperature for the upcoming days in weather forecasting. In contrast to regression models, the task of pattern classification is to assign discrete class labels to particular observations as outcomes of a prediction.
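The contrast between the two sub-areas can be illustrated with a minimal sketch. The data, the function names and the pass mark of 50 are all assumptions made up for illustration; a simple threshold rule stands in for a classifier here, and nothing in it is taken from the study's own implementation.

```python
# Toy data (made up for illustration): hours studied per week -> exam score.
hours = [2.0, 4.0, 6.0, 8.0, 10.0]
scores = [45.0, 55.0, 65.0, 75.0, 85.0]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_score(x, slope, intercept):
    """Regression: the output is a continuous value."""
    return slope * x + intercept


def classify_score(score, pass_mark=50.0):
    """Classification: the output is a discrete class label.

    The pass mark is an assumed threshold, not a value from the study.
    """
    return "Pass" if score >= pass_mark else "Fail"


slope, intercept = fit_line(hours, scores)
predicted = predict_score(5.0, slope, intercept)  # continuous prediction
label = classify_score(predicted)                 # discrete prediction
print(predicted, label)
```

The regression step returns a number on a continuous scale, while the classification step returns one of a fixed set of labels.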

Predictive modeling requires four components: the methodology followed to deploy the model, the data mining techniques adopted to build the model, the input attributes used by the model and the performance metrics used to evaluate the system.

Academic performance prediction involves the application of educational data mining techniques for the purpose of predicting students’ performance. If students’ needs identified from prediction results are met in time, overall results and performance can improve year by year. The success of any educational institution depends on the success of its students. Predicting and analyzing students’ performance is essential for improving student outcomes such as final grades and attendance. Prediction helps teachers identify weak students and help them improve in their studies. Improving student performance and enhancing the quality of education are of utmost importance for all educational institutions.

For the purpose of performance analysis and prediction, important attributes and previous records of students are gathered. Subsequently, various data mining techniques are applied to gain deeper insights and make predictions. Data mining techniques are the algorithms used to extract patterns from a large repository of data. In recent years, various data mining techniques have been used, such as Naïve Bayes, decision trees, nearest neighbour, support vector machines, neural networks, outlier detection and advanced statistical techniques. These techniques are applied to student data in order to obtain information, support decision making and extract patterns.

Universities focus on the most important information in the data they have collected about the behavior of their students and potential learners. Data mining involves the use of data analysis tools to discover previously unknown patterns and relationships in large data sets. These tools can include statistical models, mathematical algorithms and machine learning methods. These techniques are able to discover information within the data that queries and reports cannot effectively reveal.

Academic advising is a decision-making process by which students realize their maximum educational potential through communication and information exchanges with an advisor. The purpose of academic advising is to assist students in the development of meaningful educational and career goals. Academic advisors assist students in developing educational plans that will help them achieve their life goals. Academic advisors at the university level provide information about academic progress and degree requirements, and carefully review students’ academic and educational needs, performance, and challenges.

Differences in students’ performance in tertiary institutions are a source of great concern and research interest to higher education managements, government, parents and other stakeholders because of the importance of education to national development. There is a need to extract useful information from the large student datasets available and to inform academic policies on how best to improve student retention rates, allocate teaching and support resources, or create intervention strategies to mitigate factors that affect student performance (Kuyoro et al., 2013). Maximizing the potential of students, providing evidence of value for money to the bodies that fund them, and performing up to expectation are crucial to tertiary institutions. Most institutions are judged by the quality of the awards they provide; for instance, the more honours-level graduates a course produces, the better the course is perceived to be. This gives institutions an additional incentive to take proactive steps to investigate students’ data with a view to finding useful information that can aid planning activities, decision making and student intervention strategies. It is necessary to carefully measure student outcomes or expected outcomes that may provide evidence as to whether student potential is being realized against some benchmarks (Kuyoro et al., 2013).

Students’ academic performance is affected by many factors, such as personal, socio-economic and other environmental variables (Baradwaj, 2011). Knowledge about these factors and their effect on student performance can help in managing that effect. According to Ventura and Romero (2011), poor performance of students in tertiary institutions has been partly traced to poor academic background and a wide range of other predictors, including personality factors, intelligence, gender, aptitude tests, academic achievement, previous college achievements, and demographic data. These predictors have been identified in the literature as contributors to students’ academic performance, and many researchers have come to interesting conclusions as to which of them have impacted students’ academic performance in tertiary institutions. There is growing interest and concern in many countries about the problem of school failure and the determination of its main contributing factors. This problem has been referred to as “the one hundred factors problem”.

Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn automatically, identify patterns, make decisions and improve from experience without being explicitly programmed. The primary aim of machine learning is to develop computer programs that can access data and use it to learn for themselves without human intervention or assistance. Machine learning is divided into supervised, unsupervised, semi-supervised and reinforcement learning.

Supervised learning is the machine learning task of inferring a function from labeled training data. Labeled data is a dataset that contains both the input and the output data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.

Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns and differences without any prior training on the data.

Semi-supervised learning is a class of machine learning tasks and techniques that also make use of unlabeled data for training, typically a small amount of labeled data with a large amount of unlabeled data. Semi-supervised learning falls between unsupervised learning (without any labeled training data) and supervised learning (with completely labeled training data). Many machine learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce considerable improvement in learning accuracy.

Reinforcement learning is a kind of machine learning in which artificial intelligent agents attempt to find the optimal way to accomplish a particular goal or improve performance on a specific task. As the agent takes an action that moves it toward the goal, it receives a reward. The overall aim is to predict the best next step to take in order to earn the biggest final reward.

Classification is a supervised learning approach in which the computer program learns from the input data given to it and then uses this learning to classify new observations. The problem may be binary (such as identifying whether a person is male or female, or whether a mail is spam or non-spam) or multi-class (such as identifying whether an element is high, medium or low).
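As an illustration of multi-class classification, the following is a minimal sketch of a K-Nearest Neighbour classifier, one of the techniques named in this work. The two features, the six training records and the choice of k = 3 are hypothetical and are not drawn from the study's dataset.

```python
import math
from collections import Counter

# Hypothetical training data: (attendance %, previous grade %) -> class label.
training = [
    ((95.0, 85.0), "High"),
    ((90.0, 80.0), "High"),
    ((75.0, 65.0), "Medium"),
    ((70.0, 60.0), "Medium"),
    ((50.0, 40.0), "Low"),
    ((45.0, 35.0), "Low"),
]


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Sort all training records by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


print(knn_predict(training, (88.0, 78.0)))
print(knn_predict(training, (48.0, 42.0)))
```

The classifier stores the labeled examples and assigns a new observation the label held by most of its nearest neighbours, so the output is always one of the discrete classes seen in training.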

Regression is a supervised learning approach used to predict a continuous value. A regression problem is one in which the output variable is a real value, such as “dollars” or “weight”. Predicting the price of a house from features such as its size and location is a common example of regression.

Hybrid machine learning systems combine or integrate different machine learning models. Since each machine learning method works differently and exploits a different part of the problem (input) space, usually by using a different set of features, their combination or integration often gives better performance than using each individual machine learning or decision-making model alone. Hybrid models can reduce the individual limitations of basic models and can exploit their different generalization mechanisms.
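One possible way of combining two classifiers is sketched below: the class scores from a K-Nearest Neighbour vote and from a Naïve Bayes model are averaged, and the class with the highest blended score is chosen. This is an illustrative assumption only; the text does not specify how K-Bay combines the two models, and the records, feature names and equal weighting here are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical categorical records: (study_time, attendance) -> class label.
train = [
    (("high", "good"), "High"),
    (("high", "good"), "High"),
    (("high", "poor"), "Medium"),
    (("low", "good"), "Medium"),
    (("low", "poor"), "Low"),
    (("low", "poor"), "Low"),
]
labels = sorted({label for _, label in train})


def knn_scores(query, k=3):
    """Fraction of the k nearest neighbours (Hamming distance) in each class."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: votes[label] / k for label in labels}


def naive_bayes_scores(query):
    """Normalized Naive Bayes posteriors with add-one smoothing."""
    class_counts = Counter(label for _, label in train)
    value_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for features, label in train:
        for i, value in enumerate(features):
            value_counts[(label, i)][value] += 1
            values_seen[i].add(value)
    scores = {}
    for label in labels:
        score = class_counts[label] / len(train)  # class prior
        for i, value in enumerate(query):
            score *= (value_counts[(label, i)][value] + 1) / (
                class_counts[label] + len(values_seen[i])
            )
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


def hybrid_predict(query, k=3, weight=0.5):
    """Blend the two score sets; weight is the share given to k-NN (assumed 0.5)."""
    knn = knn_scores(query, k)
    nb = naive_bayes_scores(query)
    combined = {
        label: weight * knn[label] + (1 - weight) * nb[label] for label in labels
    }
    return max(combined, key=combined.get)


print(hybrid_predict(("high", "good")))
print(hybrid_predict(("low", "poor")))
```

Averaging scores is only one design choice; other hybrids apply one model inside the neighbourhood found by the other, or use one model's output as a feature for the other.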

Intelligent agents are autonomous entities that perceive their environment through sensors and act upon it through actuators or effectors (Wooldridge, 2002). A human agent has eyes, ears and other organs that serve as sensors, and hands, legs and a vocal tract that serve as actuators. A robotic agent can have cameras, infrared range finders and natural language processing (NLP) as sensors, and various motors as actuators. Agents are task-oriented, active, modeled to perform specific tasks and capable of autonomous action and decision-making. When multiple agents are combined in one system to solve a problem, the system becomes a Multi-Agent System (MAS). Such systems are composed of agents that individually solve problems simpler than the overall problem. They can communicate and assist each other in achieving larger and more complex goals. Agents and data mining can work together to achieve a required target.

Data mining and intelligent agents have emerged as two fields with immense potential for research. Every intelligent agent is self-sufficient, acting independently within its boundary while collaborating with other agents to perform the assigned task efficiently. The ability of agents to learn from their experience complements the data mining process. Agent mining helps to overcome the challenges faced by data mining in a distributed heterogeneous environment. Data mining agents perform various functions of data mining. It is increasingly significant to develop better methods and techniques to organize the data for better decision-making processes (Albashiri, 2010). The distributed nature of agent mining brings several advantages to data mining such as autonomy, scalability, reliability, security, interactivity and high speed (Fariz et al., 2015). Agents can be used to automate the various tasks like data selection, data cleansing, and data pre-processing, to perform classification, clustering and knowledge representation. As an emerging area, a lot of research can be performed in this field.

A data mining agent is a pseudo-intelligent computer program designed to find specific types of data and to identify patterns among those data types. These agents are typically used to detect trends in data, alerting organizations to paradigm shifts so that effective strategies can be implemented either to take advantage of changes in trends or to minimize the damage from them. In addition to reading patterns, data mining agents can also “pull” or “retrieve” relevant data from databases, alerting end-users to the presence of selected information.

In the last few years, agent technology has come to the forefront in the software industry because of the advantages that multi-agent systems have in complex and distributed environments. Multi-agent systems (MAS) are commonly understood as computational systems in which several autonomous entities, called agents, interact or work together to perform some tasks. In MAS, communication enables the agents to exchange information, on the basis of which they coordinate their actions or cooperate with each other; this is done through Agent Communication Languages (ACL) (Yasser et al., 2015).
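The exchange of messages between agents can be sketched as follows. The field names loosely follow the message structure of FIPA ACL (performative, sender, receiver, content), which the text does not name; the agent names, the task string and the reply format are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclMessage:
    performative: str  # the communicative act, e.g. "request" or "inform"
    sender: str
    receiver: str
    content: str


class Agent:
    def __init__(self, name):
        self.name = name
        self.inbox = []

    def send(self, other, performative, content):
        """Deliver a message to another agent's inbox and return it."""
        message = AclMessage(performative, self.name, other.name, content)
        other.inbox.append(message)
        return message

    def handle_requests(self, directory):
        """Answer every pending request with an inform message."""
        replies = []
        for message in list(self.inbox):
            if message.performative == "request":
                requester = directory[message.sender]
                replies.append(
                    self.send(requester, "inform", "done: " + message.content)
                )
        self.inbox.clear()
        return replies


# Hypothetical agents: a coordinator asks a mining agent to clean the data.
coordinator = Agent("coordinator")
miner = Agent("miner")
directory = {agent.name: agent for agent in (coordinator, miner)}

coordinator.send(miner, "request", "clean-data")
replies = miner.handle_requests(directory)
print(replies[0])
```

Because every message states its intent in the performative field, the receiving agent can decide how to react without knowing anything about the sender's internals.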

Designing a predictive model requires a data set that has the essential attributes to predict future academic performance. After careful consultation with relevant experts, a questionnaire containing the attributes most relevant to students’ academic attainment was designed. The survey consists of 38 questions spanning demographic/socio-economic, academic-related, work-related and social-related questions.

1.2 Statement of the Problem

Providing quality education to students and improving the quality of managerial decisions are the major objectives of any academic institution. Information is requested from students from time to time, and the institution’s data bank is updated periodically with such information. These data banks are, however, hardly used to improve decision making. Knowledge discovered from them can be used to offer helpful and constructive recommendations to academic planners in higher education institutions to enhance their decision-making process, improve students’ academic performance and reduce the failure rate, better understand students’ behavior, assist instructors, improve teaching, and much more.
Some of the problems highlighted include:

Difficulty in selecting the best machine learning classification model for classifying students’ academic performance with a significant accuracy rate.

Identifying the main key indicators that could help in creating the classification model for predicting students’ dissertation project grades.

Selecting the right variables/attributes for correct prediction.

Using the right predictive techniques and tools to discover hidden characteristics for early identification of “at risk” students, so that they can be helped.

Accurate prediction of student performance remains a challenging task because many factors are involved. This work therefore addresses the capabilities of educational data mining in improving the quality of the decision-making process in higher learning institutions by proposing a student performance prediction model.

1.3 Aim and Objectives of the Study

The aim of this study is to develop a predictive model for classifying students’ academic performance.

The objectives that are considered here are to:

Identify and analyze the different factors assumed to affect students’ performance, and determine those with the biggest impact on academic performance.
Provide a platform for classifying students’ performance into High, Medium or Low categories and for offering academic advice based on that performance.
Provide an effective database that can be queried for students’ academic standing and personal attributes.
Increase the efficiency of the prediction system and reduce execution time.

1.4 Significance of the Study

The system offers enormous benefits to the following users:

- Lecturers/Academic Advisors: The prediction model will help teachers and tutors identify weak and strong students, so that teachers can lay more emphasis on instructions and procedures when dealing with weak students and thereby enhance overall academic performance. An academic advisor can refer to the prediction results when advising students who perform poorly in their studies, so that preventive measures can be taken much earlier.
- Department and Faculty: Curriculum committees can use prediction results to guide changes to the curriculum and evaluate the effects of those changes. In addition, an instructor can further improve his/her teaching and learning approach, as well as plan interventions and support services for weak students.
- University: Academic performance is an important factor people consider before applying for postgraduate studies. An institution known for producing low-performing postgraduates is at risk of low intakes. A performance prediction system helps in the early identification of weak students and helps them focus on their weak areas. The results of academic performance prediction can also inform policy, so that students unlikely to do well are identified at an early stage of their academic pursuit; this prevents continued waste of human and material resources and allows such students to be directed to departments into which they could fit.
- Parents/Guardians/Partners: Results have shown that parents, guardians and partners affect the academic performance of students. The study helps to analyse the influence of family background on students’ performance. Attributes such as size of family, encouragement/motivation from parents/spouses/siblings and highest qualification of sponsor help determine the factors that affect performance, which in turn helps in proffering solutions to those problems.
- Government: Recent multi-country case studies have highlighted the capacity of quality assurance in higher education to support nation-building in multiple ways, ranging from promoting a more open and transparent society to supporting economic goals and increasing graduate employability. Education is vital for economic development.

1.5 Scope of the Research

The scope of this research is to create a students’ performance prediction model using psychometric factors of students as predictor variables and a hybrid of Naïve Bayes and K-Nearest Neighbour as the classifier. The sample data came from student academic databases and from a survey of the intrinsic motivation and behavior of postgraduate students of the Accountancy Department, Nnamdi Azikiwe University, Awka. The research is limited to investigating the effects of a student’s prior achievement, domain-specific prior knowledge and learning progression on academic performance in the Masters course. Demographic, academic, work-related and social factors were included in constructing the predictive model.

1.6 Limitations of the Study

Although the research has achieved its aims, there were some unavoidable limitations that should be discussed:

The work deals with supervised learning. This means that the class of every instance in the training data had to be determined manually, which was time-consuming.
Extra care had to be taken in preparing the training data, because an input that does not belong to any of the classes in the training data may be assigned a wrong class label.
In assigning classes with the Naïve Bayes algorithm, care had to be taken because if a value of a categorical attribute does not appear in the training data set, the model assigns it a zero probability and a prediction cannot be made.
Lack of easy access to reputable journals: the researcher’s limited access to certain reputable journals in trusted online libraries was a serious impediment to this study. Journals such as those of IEEE and Elsevier, in the IEEE and ScienceDirect digital libraries respectively, could not be accessed without a full subscription and payment. Some of the papers from notable libraries that were accessed had only their abstracts available, which provided only vague information about their content.
The reliability and validity of the information supplied by the students in the questionnaire could not be ascertained.
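The zero-probability limitation of Naïve Bayes noted above can be made concrete with a short sketch. Additive (Laplace) smoothing is a standard remedy for it; the text does not state whether the study applied it, and the attribute values and function name below are hypothetical.

```python
from collections import Counter

# Hypothetical values of one categorical attribute among the training
# instances of a single class; "none" never occurs in this class.
observed = ["full-time", "full-time", "part-time", "full-time"]
categories = ["full-time", "part-time", "none"]


def conditional_probability(value, observed_values, all_categories, alpha=0.0):
    """Estimate P(value | class) with additive smoothing.

    alpha=0 gives the raw relative frequency; alpha=1 is Laplace smoothing.
    """
    counts = Counter(observed_values)
    numerator = counts[value] + alpha
    denominator = len(observed_values) + alpha * len(all_categories)
    return numerator / denominator


unsmoothed = conditional_probability("none", observed, categories)
smoothed = conditional_probability("none", observed, categories, alpha=1.0)
print(unsmoothed)
print(smoothed)
```

Without smoothing, the unseen value contributes a factor of zero and wipes out the whole product of probabilities for that class; with smoothing, every category keeps a small non-zero probability.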

1.6.1 Definition of Terms

Agents: Agents are sophisticated computer programs that act autonomously on behalf of their users, across open and distributed environments, to solve a growing number of complex problems.

Agent Communication Language (ACL): ACL is a language that provides a set of application-independent primitives that allow an agent to state its intention in an attempt to communicate with other agents.

Data Mining: The process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.

Educational Data Mining: An emerging discipline concerned with developing methods for exploring the unique types of data that come from educational settings and using those methods to better understand students and the settings in which they learn.

Artificial Intelligence (AI): Intelligence exhibited by any manufactured system.

Distributed Artificial Intelligence (DAI): A subfield of AI research dedicated to the development of solutions for complex problems that are not easily solvable with classic algorithmic programs. There are three main streams in DAI research: parallel problem solving, distributed problem solving, and agent-based problem solving.

Intelligent Agent: A program that gathers information or performs some other service without the user’s immediate presence and on some regular schedule.

Machine Learning: The study of computer programs that can learn from experience with respect to some class of tasks and performance measure.

Multi-Agent Systems (MAS): Systems composed of multiple agents.

Model: An abstract and simplified representation of a given reality, either already existing or just planned.


DEVELOPING A PREDICTIVE MODEL FOR CLASSIFYING STUDENT’S ACADEMIC PERFORMANCE
