DEVELOPMENT AND CALIBRATION OF A BASIC SCIENCE ACHIEVEMENT TEST USING THE TWO-PARAMETER LOGISTIC MODEL OF ITEM RESPONSE THEORY (IRT)


Abstract

Assessment is a critical component of the educational process, and tests are the instruments used for assessment. The purpose of this study was to develop and calibrate a basic science achievement test (BSAT) using the two-parameter logistic IRT model. Eight research questions and three hypotheses were formulated to guide the study. The research questions included: what are the estimates of the item difficulty parameter of the BSAT, and what are the ability estimates of the students using the two-parameter logistic model? The three hypotheses were tested at the 0.05 level of significance; one of them was that there is no significant fit between the estimates of the item parameters and the two-parameter logistic model. The design of the study was instrumentation research design. The population comprised all 31,205 JSS III students in government-owned secondary schools in Enugu State during the 2009/2010 academic session, from which a sample of 3,119 students was drawn. The instrument for the study consisted of 30 multiple-choice items on basic science developed by the researcher. The validity and reliability of the instrument were established according to the two-parameter IRT model; the instrument had a reliability coefficient of 0.8189. The research questions were answered using the logistic item response model and the maximum likelihood parameterization procedure as implemented in the BILOG-MG program, while the hypotheses were tested at the 0.05 level of significance using the chi-square goodness-of-fit test and analysis of variance (ANOVA). The results showed, among others, that all the item parameter estimates and person parameter estimates were within the acceptable range, that all the items except one showed fit to the two-parameter IRT model, and that there was variation in the mean ability estimates of students across the different schools and local government areas. The results have far-reaching implications for counselors/teacher-counselors, test developers and examination bodies. They could help examination bodies to generate a large pool of items with known item parameters which may be stored in item banks, retrieved at will and used for computerized adaptive testing. Teachers of different schools could generate items of known parameters suited to the academic standard of students of similar abilities. The study recommended, among other things, that test developers and other experts in measurement and evaluation should adopt IRT methods in test construction and analysis to avoid taking erroneous decisions about students' achievement.

CHAPTER ONE

INTRODUCTION

Background of the Study

Assessment is a critical component of educational practice, and tests are instruments for measurement. The majority of assessment practices have been based on classical test theory (CTT), developed during the 1920s. Schumacker (2005) noted that CTT was conceived after three ideas were conceptualized: the presence of errors of measurement, error as a variable, and correlation and how to index it. It was Charles Spearman who, in 1904, worked out how to correct a correlation coefficient for attenuation due to measurement error and how to obtain the index of reliability needed in making the correction (Schumacker, 2005). Classical test theory assumes that each person has a true score, T, which would be obtained if there were no errors in measurement. A person's true score can be defined as the expected number-correct score over an infinite number of independent administrations of the test. Unfortunately, test users never observe a person's true score, only an observed score, X. It is assumed that the observed score equals the true score plus some error.

This is why Schumacker (2005) stated that CTT is built on early 20th-century approaches to measuring differences and introduces three measurement concepts: the test score or observed score, the true score and the error score. The observed score is considered to be composed of a true score and a measurement error. Classical test analysis therefore links the observed test score (X) to the sum of the true score (T), a latent score that is not directly observed but can be inferred, and the error score (E), i.e. X = T + E (Courville, 2004; Wiberg, 2004; Schumacker, 2005). Classical test theory is concerned with the relations among the three variables, X, T and E, in the population.
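To make the relation X = T + E concrete, the short Python sketch below simulates observed scores as a true score plus random error and recovers the reliability as the ratio of true-score variance to observed-score variance. It is only an illustrative simulation; the sample size and variances are arbitrary assumptions, not values from this study.

```python
import numpy as np

# Illustrative simulation of the CTT relation X = T + E.
# The sample size and the variances chosen are arbitrary assumptions.
rng = np.random.default_rng(seed=1)

n_examinees = 200
true_scores = rng.normal(loc=50, scale=8, size=n_examinees)   # T
errors = rng.normal(loc=0, scale=4, size=n_examinees)         # E, mean zero
observed = true_scores + errors                                # X = T + E

# In CTT, reliability is the proportion of observed-score variance
# attributable to true-score variance: var(T) / var(X).
reliability = true_scores.var() / observed.var()
print(f"Simulated reliability: {reliability:.3f}")
```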

According to Wiberg (2004) and Schumacker (2005), several benefits are obtainable through the application of good instructional objectives and item writing using classical test analysis. First, analysis can be performed with small, representative samples of examinees. Secondly, CTT analysis employs relatively simple mathematical procedures, and model parameter estimation is conceptually simple. Again, classical test analysis is often referred to as a "weak model" because its assumptions are easily met by traditional testing procedures, and it is relatively easy to interpret.

However, CTT has several important limitations. According to Stage (2003) and Wiberg (2004), one of them is that examinee characteristics and test characteristics cannot be separated: each can only be interpreted in the context of the other. The item difficulty and item discrimination indices are both sample dependent and group dependent because their values depend on the group of examinees from which they have been obtained. Another shortcoming is that observed and true test scores are test dependent because they rise and fall with changes in test difficulty. There is also the assumption of equal errors of measurement for all examinees: CTT presumes that the variance of errors of measurement is the same for every examinee, so that in a school assessment, measurement error is assumed to be the same for all examinees. For CTT, there is only one overall index of the standard error of measurement in a testing situation, yet the same individual tested in two different samples may obtain two different errors of measurement and estimates of true score. Also, Douglas (1990) stated that the variant nature of the indices used to describe the item parameters is a major limitation of CTT. Another limitation of CTT is that it considers only the total raw score obtained by the examinee on the test, without taking into consideration the examinee's answer pattern. This tends not to reveal why an examinee did not succeed on a particular test item. The examinee's answer pattern is important because it may reveal the strategies used by the examinee in answering each of the items. This kind of information is important in a classroom situation where the teacher needs to understand how the examinees have answered each of the items.

Choppin (1985) and El-Korashy (1995) observed that CTT is no longer considered valid for ensuring objectivity in measurement, while Andrich (1988), El-Korashy (1995) and Stage (1998b) hold that item response theory (IRT) is a model that can ensure objectivity in the measurement of achievement. Objective measurement yields scores that show the relationship between the empirical test performance and the unobservable trait underlying that performance. According to Baker (2001), a new test theory, conceptually more powerful than CTT and based upon items rather than test scores, is known as item response theory. Stage (1998b) also reported that, during the last decades, a new measurement system, item response theory, has been developed and has become an important complement to CTT in the design and evaluation of tests.

Item response theory is a class of measurement models that offers a variety of methods not only to measure latent properties but also to assess and improve the quality of such measurement. The models make it possible to choose test items appropriate to the examinee's level of proficiency during testing, and each item of a set of items measures the underlying trait or traits (Verstralen, Bechger & Maris, 2001). Item response theory assumes that each examinee responding to a test item possesses some amount of the underlying ability. Each of these underlying attributes, most often referred to as latent traits or abilities, is assumed to vary continuously along a single dimension, usually denoted θ.

The most distinct feature of IRT is that it adopts explicit models for the probability of each possible response to an item, so its alternative name, probabilistic test theory, may be a more apt one (Partchev, 2004). Item response theory derives the probability of each response as a function of the latent trait and some item parameters. A latent trait refers to a latent continuum or dimension onto which all individuals are mapped, based on their pattern of responses to a set of categorical variables (Rost & Langeheine, 1997; Courville, 2004; Partchev, 2004). The relationship between the observable and unobservable abilities is described by a mathematical function. Thus, one can consider each examinee to have a numerical value, a score that places him or her somewhere on the ability scale. At each ability level, there will be a certain probability that an examinee with that ability will give a correct response to the item.

The three most commonly used IRT models are the one-parameter, two-parameter and three-parameter logistic models (Baker, 2001; Gruijter & Kamp, 2002). The one-parameter logistic model (1PLM) contains only the item difficulty parameter (denoted b). The two-parameter logistic model (2PLM) contains both the item difficulty parameter (b) and the item discrimination parameter (a). The three-parameter logistic model (3PLM) acknowledges a chance response, designated c, which Stage (1998a, 1998b, 2003) called a pseudo-guessing parameter. If the probability of a correct response is plotted as a function of ability, the result is a smooth S-shaped curve which describes the relationship between the probability of a correct response to an item and the ability scale. This S-shaped curve is called the item characteristic curve (ICC).

Item characteristic curves constitute the cornerstones of IRT models because they express the assumed relationship between an individual's probability of passing a given item and his or her level of ability. Each item in a test has its own ICC (Baker, 2001; Verstralen, Bechger & Maris, 2001; Yu, 2007). The types of information contained in an item may include the degree to which the item discriminates among individuals of differing levels of ability (the discrimination parameter, a), the difficulty parameter, b, and the guessing parameter, c. The two technical properties used in describing the ICC are the difficulty of the item, which describes where the item functions along the ability scale (i.e. an easy item functions among the low-ability examinees while a hard item functions among the high-ability examinees), and the discrimination, which differentiates between examinees having abilities below the item location and those having abilities above the item location. The steeper the curve, the better the item can discriminate; the flatter the curve, the less the item is able to discriminate (Verstralen, Bechger & Maris, 2001).

Item response theory has some advantages. Stage (1998b, 2003) stated that one great advantage of IRT is item parameter invariance, and that the property of invariance of ability and item parameters is the cornerstone of IRT. Shermis and Chang (1997) also noted that IRT can be used to calculate sample-invariant estimates of item difficulty and to measure change over several occasions. If one picks different samples and estimates the item characteristic curves (ICCs), he or she should get the same values of a, b and c, that is, the same ICC, because the same expected values should be obtained. The sample-based nature of all the estimation procedures used to establish reliability coefficients is a weakness of the classical test model which the invariance property of IRT has been able to solve. With IRT, it is possible to develop a classroom achievement test that provides an index of standard error for each person's ability and for each item.

This, therefore, makes it possible to select achievement test items that will make maximum contributions to the test design and to management efficiency at a particular ability level. Guiton and Ironsen (1983) opine that a more precise confidence statement can be made than with the conventional single overall standard error of measurement for the total test computed under the CTT model. This also implies an improvement in test security. With IRT it is possible to assemble a new test consisting entirely of previously administered items whose characteristics have been estimated on a common scale, i.e., items selected from an item bank. This pre-equating of items speeds up the process of scoring (Gruijter & Kamp, 2002). If guessing is likely, as in multiple-choice and other selected-response items, then a model with a pseudo-chance parameter is used, and this can only be done using IRT. An item response theory model is a measurement model that takes individual differences into consideration while evaluating learning outcomes; it contains person parameters and ability estimates which can be used to measure individual differences.

One of the major assumptions of IRT is the unidimensionality assumption (Torsten & Postlethwaite, 1995; Svend & Christensen, 2002). This assumption states that each item in a test measures a single, or unidimensional, ability or trait. Most IRT models assume that only a single latent trait underlies performance on an item, because most tests are constructed to measure a single trait, for example verbal ability. The assumption of unidimensionality implies that the responses to different items are independent given the latent trait. Another assumption is that of local independence: an examinee's responses to different items in a test are statistically independent (Ponocny, 2002). This means that an examinee's response on one item must not affect his or her responses on other items in the test, that is, the content of an item should not provide clues to the answers of other test items. The third assumption is that each item can be described with an ICC (Stage, 2003).
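Local independence means that, given θ, the probability of a whole response pattern is simply the product of the item probabilities. The sketch below, using hypothetical 2PL item parameters, computes this likelihood for one examinee's pattern of right (1) and wrong (0) answers; maximum likelihood procedures of the kind implemented in BILOG-MG search for the θ that maximises such a quantity. The parameter values and response pattern are illustrative assumptions only.

```python
import numpy as np

def icc_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def pattern_likelihood(theta, responses, a, b):
    """Likelihood of a 0/1 response pattern at ability theta,
    assuming local independence (item probabilities multiply)."""
    p = icc_2pl(theta, a, b)
    return np.prod(np.where(responses == 1, p, 1.0 - p))

# Hypothetical item parameters and one examinee's answers.
a = np.array([0.8, 1.2, 1.5, 1.0])
b = np.array([-1.0, 0.0, 0.5, 1.5])
responses = np.array([1, 1, 0, 0])

# Evaluating the likelihood over a grid of theta values shows where it peaks;
# the peak is the maximum likelihood estimate of the examinee's ability.
grid = np.linspace(-3, 3, 61)
likelihoods = [pattern_likelihood(t, responses, a, b) for t in grid]
print("ML ability estimate (grid search):", grid[int(np.argmax(likelihoods))])
```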

Livingston (2004) explained that although the assumptions and applications of IRT are complex, both conceptually and procedurally, their practical advantages are sufficient to warrant their usage. Also, Lord (1977) stated that IRT models have been advocated as possible improvements over the traditional methods, and argued from theoretical considerations that traditional measurement methods are not appropriate for developing items to be placed in an item bank, whereas IRT methods have such capacity. This is an improvement in the measurement and assessment of learning outcomes.

Proper classroom assessment can provide diagnostic information for all students. Therefore, there is a need to improve assessment so that the objectives of the curriculum, which clearly define the important constructs reflecting what all students should know and be able to do, can be achieved. McDonnell (2004) and Sireci (2007) noted that clarification and articulation of content standards are the foundation of good assessment if the measurement of achievement is to be maximized, and that quality instruction requires continuous interaction among instruction, curriculum and assessment. One of the important means of achieving this interaction is through the use of tests. Tests are instruments used for measurement. Tests are given for many reasons in the educational system. They are designed to meet the accountability requirement of the teaching and learning process in order to give better information about the performance of students. They are used to assign grades to students and as requirements for graduation from school, scholarships or eligibility to secure jobs. They play vital roles in making teaching and learning effective. Sireci (2007) opined that educational tests, if developed carefully, used properly and interpreted appropriately, have enormous utility, such as improving students' learning.

In many situations, such as the testing of achievement in the classroom, teacher-made tests are used in assessing ability, achievement or performance. Mehrens and Lehmann (1987), Stiggins and Bridgeford (1985) and Harbor-Peters (1999) found that teachers lack understanding of measurement. Hills (1991) noted that teachers lack sufficient training in test development, fail to analyze tests, do not establish reliability or validity, do not use test blueprints, weight all content equally, rarely test above the basic knowledge level and use tests with grammatical and spelling errors. Classroom teachers' tests are simplistic and depend upon short-answer, true-false and other easily prepared and scored items. Their multiple-choice items often have flaws, especially in the distractors (Sireci, 2007).

The selection of item types and test format should be based on the kinds of skills to be measured and not on a personal inclination towards a particular test format. If the aim is to determine whether an examinee can write an essay, then an essay or free-response format is clearly more appropriate than a multiple-choice format. Bay-Borelli, Rozunick, Way and Weisman (2010) warned that policy makers must have a clearly defined vision regarding what the test is designed to measure and the purposes for which the resulting data on student performance will be used before test design and item development issues can be meaningfully addressed. This explains, for instance, why a test developed to provide a classroom teacher with information about an individual student's achievement, for the purpose of informing instruction, would have a very different structure from a test designed to inform judges about whether students have demonstrated sufficient mastery of skills to meet the passing standards for a given course. The test coverage, the modalities in which the test will be administered, the organization of the test and the purpose of the assessment must all be taken into consideration.

Achievement tests are used in measuring proficiency and mastery of general and specific areas of knowledge. They are also used in education in measuring the effectiveness of instruction and learning. Test specifications to be developed in the classroom must ultimately describe how student achievement relative to the curriculum standards will be measured. The test specification forms the base of the assessment and characterizes the content boundaries of the assessment programme. A clearly articulated set of test specifications contributes to improved reliability and validity of the test instruments, thereby setting the groundwork for reliable standard setting (Bay-Borelli, Rozunick, Way & Weisman, 2010). These points indicate that, without a clear vision of the test and its parameters, it is not possible to develop achievement test items or build tests that will fairly and reliably measure students' learning. If the instrument used in measuring achievement is faulty, erroneous decisions on issues affecting students may be made. Accurate decisions are more likely to be made when they are based on accurate information concerning the teaching and learning process.

The assessment of students' achievement is therefore a complex process that must meet many technical and policy requirements (Reckase, 2010). The technical requirements are many because the assessment programme must give accurate measures of achievement on constructs that depend on the content area. The technical requirements are further driven by the nature of the uses of the results. There is also a natural desire to answer questions about how students learn from educational activities. Item response theory scales are often preferred for representing the amount of achievement because the scales do not have fixed limits. Alagoz (2005) noted that this reduces the floor and ceiling effects that can occur for tests scored using percent correct or similar scoring methods. For all these to work, the psychometric properties of the items must be known.

Evaluation of the effectiveness of teaching and learning has raised a number of questions concerning the psychometric properties of classroom achievement tests. This raises the question of whether an achievement test, such as a basic science achievement test, provides satisfactory psychometric properties for the concept it was designed to measure. Basic science is a three-year course for junior secondary schools. It is a course that integrates all the necessary scientific facts into a broader conceptual framework which stresses the ways in which an understanding of science can enrich and enlighten students in their daily lives. It helps students to understand the overwhelming number of facts and terms in science and gives them an insight into the underlying principles of chemistry, biology and physics and their relevance to life. The extent to which test scores, such as scores from basic science tests, can be placed on the same scale so that test items of appropriate difficulty are adopted for each examinee depends on the psychometric properties of the test. In order to develop parallel achievement tests and equate the test scores, the psychometric properties of the test must be given due consideration. Any attempt at testing is therefore preceded by a calibration study (Partchev, 2004).

Partchev defines calibration as a process whereby the items are given to a sufficient number of test persons whose responses are used to estimate the item parameters. Crocker and Algina (1986) noted that item calibration is part of the larger topic of IRT. They described person-free item calibration as the process by which the parameters of large numbers of items can be estimated even though each item is not answered by every examinee. Item parameters differ depending on the IRT model used, but all models include the item difficulty level. The two-parameter model adds item discrimination, while the three-parameter model in addition has a pseudo-chance or guessing parameter.
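As a much simplified illustration of what calibration involves, the sketch below simulates responses to a single item under the 2PL model and recovers its a and b parameters by maximising the binomial log-likelihood, assuming the examinees' abilities are already known. Operational calibration programs such as BILOG-MG do not make that assumption (they use marginal maximum likelihood over all items and persons), so this is only a pedagogical sketch with made-up values.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(seed=2)

def icc_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Simulate one item answered by 3,000 examinees with known abilities.
# The true parameters (a = 1.3, b = 0.4) are arbitrary illustrative values.
theta = rng.normal(size=3000)
responses = rng.binomial(1, icc_2pl(theta, a=1.3, b=0.4))

def neg_log_likelihood(params):
    a, b = params
    p = np.clip(icc_2pl(theta, a, b), 1e-9, 1 - 1e-9)   # guard against log(0)
    return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=[1.0, 0.0], method="Nelder-Mead")
print("Estimated a, b:", result.x)                       # close to (1.3, 0.4)
```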

The goal of item calibration is to develop a pool or bank of items which are on the same scale. The advantages of an item bank are most evident in computer-based testing. There are two major applications of a calibrated item pool (Crocker & Algina, 1986). The first is when fixed-form tests are offered continuously on computer. Test assembly using calibrated items enables the passing score for a test form to be calculated as soon as the form is assembled. In contrast, traditional common-item equating requires the collection of candidate responses over a period of time in order to use the statistics from those administrations to determine the passing score for the test. The second major use of a pool of calibrated items is for adaptive testing. In adaptive testing, a candidate's ability estimate is obtained after administration of a small number of initial items, and subsequent item selections are targeted to the candidate's estimated ability. One advantage of this is that if the test pool is large, overlap of items between individual tests is minimal. Each candidate gets a somewhat different test, and the items may also be selected to meet a target test characteristic curve instead of being selected based on the candidate's estimated ability. Partchev (2004) noted that different examinees can get different items and yet obtain comparable estimates of ability. As a result, tests can be tailored to the needs of the individual while still providing objective measurement. With calibration, new items get into the mix by being presented as unscored pretest items. Once sufficient candidate responses have been obtained, the pretest items are taken offline and calibrated. They can then be introduced into the item pool as scored items.
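The following sketch illustrates how an adaptive test can target items to a candidate's current ability estimate. It uses the 2PL item information function, I(θ) = a²·P(θ)·(1 − P(θ)), to pick the most informative unadministered item from a small hypothetical pool, and reports the standard error of the ability estimate as the reciprocal square root of the accumulated test information. The pool, the current estimate and the administered set are all assumed values for illustration.

```python
import numpy as np

def icc_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = icc_2pl(theta, a, b)
    return a**2 * p * (1.0 - p)

# Hypothetical calibrated item pool (a, b pairs), current ability estimate,
# and the set of items already administered.
pool_a = np.array([0.7, 1.1, 1.4, 0.9, 1.6])
pool_b = np.array([-1.5, -0.5, 0.0, 0.8, 1.2])
theta_hat = 0.3
administered = {2}

info = item_information(theta_hat, pool_a, pool_b)
info[list(administered)] = -np.inf            # never select an item twice
print("Next item to administer:", int(np.argmax(info)))

# Standard error of the ability estimate from the items given so far.
given = sorted(administered)
test_info = item_information(theta_hat, pool_a[given], pool_b[given]).sum()
print("SE(theta):", 1.0 / np.sqrt(test_info))
```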

Hendrickson and Kolen (2003) said that large-scale testing organizations are increasingly considering the use of item response theory models for test development, scoring and equating, especially as they ponder the implementation of computerized testing options. Also, Sireci and Berberoglu (2001) noted that adapted tests are increasingly used to assess the knowledge and skills of individuals from other cultures who speak different languages. McCutcheon and Hagenaars (1997) used latent class models to analyze American and Dutch responses concerning attitudes towards protesters and legal demonstrations; the results suggest substantial similarity in the latent structure of American and Dutch attitudes towards public protest. Ene (2005) also applied the Rasch model in assessing the attitude of students towards Biology in senior secondary schools. All these studies involved the calibration of items.

Item response theory is useful in test construction because it places item difficulty and student performance on the same scale, tells how much information each item or item score level contributes to the test, describes differential item functioning for each group of students, helps test developers create parallel test forms, and helps test developers pick items targeted to particular student achievement levels. Item response theory is also useful for test scoring and interpretation because it describes the amount of measurement error in each score, provides statistically optimal item weights that produce the most accurate scores and, by placing item and student performance on the same scale, can facilitate standard setting and be used in criterion-referenced score interpretation.

Statement of the Problem

Achievement tests are administered for many reasons in the educational system. They are used to measure the level of proficiency, mastery and understanding of knowledge. They are also used to measure the effectiveness of instruction and learning, evaluate the educational process and redesign instructional programmes. Classroom teachers and state examination bodies, such as the states' examinations development centres, are required to assess students' achievement, and the results are used to hold schools and states accountable for all their students' performance in the examinations. Tests are the instruments for this assessment.

Each classroom teacher generates his or her own items and uses them for assessment. At the state level, classroom teachers from different schools are called upon annually to generate items that will be used for assessing students' achievement at the junior secondary school certificate examinations. The items are kept in an item bank or pool and are retrieved and administered without calibration of the item parameters. Therefore, variations and errors occur in tests administered by individual teachers (teacher-made tests) in different schools because the item parameters and the true ability of students cannot be guaranteed. The item parameters of each item and the true ability of each student can be determined using IRT. Any attempt at testing using IRT is preceded by a calibration study whereby the items are given to a sufficient number of test persons whose responses are used to estimate the item parameters (Partchev, 2004; Kirkpatrick, 2005). When the model used in calibration is appropriate and the estimates of the item parameters are reasonably accurate, the testing will have attractive properties, one of which is that examinees may get different items and yet obtain comparable estimates of ability. As a result, tests can be tailored to the needs of the individual while still providing objective measurement. Classical test theory has some disadvantages which make it a poor basis for developing achievement tests. One such disadvantage is the variant nature of the indices used in describing the item parameters. There is also the sample-based nature of the estimation procedures used to establish reliability coefficients in CTT. It does not ensure objectivity in measurement, and the performance of students on individual items is not considered because CTT is test oriented rather than item oriented.

For proper evaluation of the effectiveness of teaching and learning, important information regarding the psychometric properties of each item to be included in the final form of tests administered to students should be determined. Such tests should take into consideration the extent to which the probability of success varies as a function of ability and of the other parameters represented by item difficulty, item discrimination and the degree of guessing. This can be done through the calibration/parameterization of test items. Therefore, this study developed and calibrated a basic science achievement test using the two-parameter logistic IRT model.

Purpose of the Study

The purpose of this study was to develop and calibrate a basic science achievement test (BSAT) using the two-parameter IRT model. Specifically, the study sought to:

1. Estimate the item logits of the BSAT items.

2. Calibrate the item difficulty parameter (i.e., the b-parameter) of the BSAT items.

3. Calibrate the item discrimination parameter (i.e., the a-parameter) of the BSAT items.

4. Determine whether the estimates fit the two-parameter logistic model.

5. Calibrate the ability estimates of the students using the two-parameter logistic model.

6. Determine the mean ability estimates of the students in the different schools.

7. Determine the mean ability estimates of the students in the different local government areas.

8. Determine the standard error of measurement of each item.

9. Determine the standard error of measurement of each student.

Significance of the Study

This study is of immense significance to teachers, principals, guidance counselors, test developers, the Ministry of Education, examination bodies and curriculum planners. Teachers would be able to construct and select items whose psychometric properties are known, so that each individual student's achievement can be estimated. This helps in making accurate decisions about issues concerning students' performance.

It also helps principals and teachers to apply the knowledge of tailored testing in their work. In tailored testing, examinees are administered test items that are matched to their ability, and the scores can still be placed on the same scale. This helps in the selection of items, so that items of appropriate difficulty are administered to each examinee. It also has implications for test security because different people get different tests. In addition, teachers can estimate examinee abilities and identify their strengths and weaknesses with regard to the trait being measured. This diagnosis helps the teacher to remedy some of the problems of the examinees and to evaluate them in terms of how much underlying ability they possess. Comparisons among examinees can also be made for purposes of assigning grades, awarding scholarships and so on. Guidance counselors are equipped with a better knowledge of the educational preferences, abilities, strengths and weaknesses of the students, so that the counselors can guide them properly. This in turn helps the students in making a proper choice of their educational career.

Test developers, examinations development centres (EDC), other examination bodies such as the National Examinations Council (NECO) and the West African Examinations Council (WAEC), and the Ministry of Education would also benefit from this study, as it would help them generate items whose psychometric properties are known. These items can be included in an item bank and retrieved when needed. Thus, valid and reliable items can be generated for assessing the effectiveness of teaching and, subsequently, for evaluating the curriculum and the programmes of learning.

To curriculum planners, the benefit of this research lies in the knowledge of the level of ability of the students, which can be used to help them plan a better curriculum that encourages students to learn better. Parents and the larger society also benefit when students have a positive attitude towards learning, since this gives the society people who are able to handle not only their own problems but also those of the society in general. Theoretically, the study is significant because there is a need for information on the application of IRT to testing in schools in order to make teaching and learning more effective. It also provides a foundation for studies on IRT in tertiary institutions. Finally, this research provides a base for further research on the application of the principles of measurement using IRT.

Scope of the Study

This study was delimited to the development and calibration of a basic science achievement test using the two-parameter logistic IRT model. The model was used to estimate the logits of the items and the difficulty and discrimination parameters. The study also sought to explore the validity and reliability of the instrument by finding out the extent to which the estimates fit the two-parameter logistic model. The ability of the students was also calibrated. The standard errors of measurement were estimated for both the items and the ability estimates. There was a comparison of the means of the ability estimates based on the two-parameter logistic model. The content areas of the BSAT covered include the nervous system and sense organs; elements, compounds and mixtures; reproductive health; continuity of the family; and metals and non-metals.

Research Questions

1. What are the estimates of the item logits of the BSAT items?

2. What are the estimates of the item difficulty parameter of the BSAT items?

3. What are the estimates of the item discrimination parameter of the BSAT items?

4. What are the ability estimates of the students using the two-parameter logistic model?

5. What are the mean ability estimates of the students in the different schools used?

6. What are the mean ability estimates of the students in the different local government areas?

7. What are the estimates of the standard error of measurement for each of the basic science achievement test items?

8. What are the estimates of the standard errors of measurement for each examinee's ability?

Hypotheses

The following null hypotheses were tested at the 0.05 level of significance:

H01: There is no significant fit between the estimates of the item parameters and the two-parameter logistic model.

H02: There is no significant difference among the mean ability estimates obtained from the different schools.

H03: There is no significant difference among the mean ability estimates of the students in the different local government areas.


