Item Characteristic Curve
Date: 2024-11-03

Educational measurement has undergone a quiet revolution during the past few decades. The revolution has produced a modernized item characteristic curve theory, represented by the one-parameter (Rasch) model and the three-parameter logistic mental test model. The three-parameter logistic model and its procedures were developed by Lord (1952), who worked on item characteristic curve theory early in his career.
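For reference, the three-parameter logistic model writes the probability of a correct response to item $i$ as a function of ability $\theta$; the Rasch model is the one-parameter special case with a common discrimination and no guessing parameter:

$$P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}$$

where $a_i$ is the item's discrimination, $b_i$ its difficulty, and $c_i$ its lower asymptote (the guessing parameter).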

What Are Item Response Theory and the Item Characteristic Curve?

The typical method of assessing an ability is to create a test composed of various items (questions), each of which measures some facet of the targeted ability. From a purely technical standpoint, these items should be free-response questions that allow the testee to submit any suitable answer. Under traditional test theory, the testee's raw test score would be the sum of their scores on the test's items. Item response theory states that rather than focusing on a test taker's overall test score, the main concern should be whether or not they answered each item correctly.

This is because the fundamental ideas of item response theory apply to individual test items rather than to an aggregate of item responses, like a total test score. From a practical standpoint, it is challenging to include free-response questions on a test; they are particularly difficult to score accurately. As a result, multiple-choice questions make up most item response theory tests. Items are dichotomously scored: if the testee's response is correct, they earn a score of one; if incorrect, they receive a score of zero. It is reasonable to assume that each test taker who responds to an item has some amount of underlying ability. As a result, each testee can be considered to have an ability score that places them somewhere along the ability scale.

The Greek letter theta, θ, will represent this ability score. At each ability level, there is some probability that a testee with that ability will answer the item correctly; this probability is denoted P(θ). For a given item, this probability will be small for low-ability test takers and large for high-ability test takers. Plotting P(θ) as a function of ability produces a smooth S-shaped curve: the probability of a correct response is near zero at the lowest ability levels and rises with ability until it approaches one. This S-shaped curve shows the relationship between the ability scale and the probability of answering the item correctly. In item response theory, it is referred to as the item characteristic curve, and every test item has its own item characteristic curve.
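As an illustration, here is a minimal Python sketch of one such curve, assuming the two-parameter logistic form with illustrative values a = 1.2 and b = 0.0; printing P(θ) from low to high ability shows the S shape:

```python
import numpy as np

def icc(theta, a=1.2, b=0.0):
    """Two-parameter logistic item characteristic curve.

    theta : ability level(s)
    a     : item discrimination (slope at the item's location)
    b     : item difficulty (location on the ability scale)
    """
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Probability of a correct response from low to high ability.
thetas = np.linspace(-3, 3, 7)
for t, p in zip(thetas, icc(thetas)):
    print(f"theta = {t:+.1f}  P(theta) = {p:.3f}")
```

The printed probabilities climb from near zero toward one, tracing the S-shaped curve described above.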

Properties of Item Characteristic Curve

An item characteristic curve has two technical properties, which together describe its general form. The first is the item's difficulty. In item response theory, an item's difficulty indicates where the item functions along the ability scale. Difficulty is a location index because, for example, an easy item functions among low-ability examinees and a hard item functions among high-ability examinees. The second technical property, discrimination, describes how well an item can distinguish between examinees whose abilities fall below the item's location and those whose abilities fall above it. This property is reflected in the steepness of the central region of the item characteristic curve: the steeper the curve, the better the item discriminates. The flatter the curve, the less discriminating the item, because the probability of a correct response at low ability levels is roughly the same as at high ability levels.
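A small sketch (with made-up parameter values, again using the two-parameter logistic form) makes both properties concrete: changing b shifts the curve along the ability scale, while changing a flattens or steepens its centre:

```python
import numpy as np

def icc(theta, a, b):
    # Two-parameter logistic curve: a = discrimination, b = difficulty.
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 121)

# An easy item (b = -1) functions among low-ability examinees,
# a hard item (b = +1) among high-ability examinees.
easy, hard = icc(theta, a=1.0, b=-1.0), icc(theta, a=1.0, b=1.0)

# A flat curve (small a) separates examinees poorly; a steep one (large a) well.
flat, steep = icc(theta, a=0.4, b=0.0), icc(theta, a=2.5, b=0.0)

print("P at theta = 0:", easy[60].round(3), "vs", hard[60].round(3))  # location shift
print("P range, flat :", flat.min().round(3), "-", flat.max().round(3))
print("P range, steep:", steep.min().round(3), "-", steep.max().round(3))
```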

Item Difficulty

An item's difficulty is determined by the percentage of people who answer it correctly. Note that the greater the percentage, the easier the item: an item answered correctly by 60% of respondents has a p (for percentage) value of .60. A challenging item with only 10% correct answers has p = .10, while a simple item with 90% correct answers has p = .90. Not every test item has a correct answer, however.

Tests of attitudes, personality, political views, and so on may present the respondent with statements that call for agreement or disagreement but have no correct answer. Most such items nevertheless have a keyed answer, which, if endorsed, is scored. A "yes" response to the question "Are you worried most of the time?" on an anxiety scale may be counted as reflecting anxiety and would be the keyed response. If the test were designed to assess "calmness," a "no" response to that item might be the keyed response. In such cases, item difficulty represents the percentage of people who endorsed the keyed response.
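As a sketch, classical item difficulty can be computed directly from a 0/1 scored response matrix (rows = respondents, columns = items; the data below are made up):

```python
import numpy as np

# Hypothetical scored responses: 1 = keyed/correct answer, 0 = otherwise.
# Rows are 8 respondents, columns are 3 items.
responses = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
])

# Item difficulty p = proportion giving the keyed response.
p_values = responses.mean(axis=0)
print("p values:", p_values)  # higher p = easier item
```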

We want to know the difficulty level of items so that we can build tests of varying difficulty by selecting items carefully. In general, psychometric tests should be of average difficulty, with average defined as p = .50. Note that this results in a mean score near 50%, which may appear to be a harsh standard. The reason is that p = .50 yields the most discriminating items, the ones that best reflect individual differences: an item's variance, p(1 − p), is largest at p = .50. Consider extremely difficult items (p = .00) or extremely easy ones (p = 1.00). Such items are useless psychometrically, since they do not reflect any variation between people. Items are valuable to the extent that different individuals give different responses and those responses are tied to some criterion, so the most useful items have p near .50.

However, the situation is more complicated. Assume we have an arithmetic test in which every item has p = .50. Children taking the test are unlikely to answer randomly; if Johnny gets item 1 right, he is likely to get item 2 right, and so on, and if Mark misses item 1, he is likely to miss item 2, and so on. At least theoretically, then, half of the children would get all of the items correct and the other half would get them all wrong, yielding just two raw scores, zero or 100, a highly unsatisfactory state of affairs. To get around this, we pick items whose average difficulty is .50 but whose individual difficulties span a range of values, from roughly .30 to .70 or comparable values.
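A sketch of that selection heuristic, applied to a made-up bank of item difficulties:

```python
# Hypothetical bank of item difficulties (p values).
p_values = [0.05, 0.28, 0.35, 0.42, 0.50, 0.55, 0.63, 0.70, 0.88, 0.95]

# Keep items whose difficulty falls in the .30-.70 band.
selected = [p for p in p_values if 0.30 <= p <= 0.70]

mean_p = sum(selected) / len(selected)
print(f"selected items: {selected}")
print(f"mean difficulty: {mean_p:.2f}")  # close to the .50 target
```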

Item Discrimination

If we have an arithmetic test, each item on the test should ideally distinguish between individuals who know the subject matter and those who do not. If we have a depression scale, each item should ideally distinguish between people who are and are not depressed. Item discrimination refers to an item's capacity to appropriately "discriminate" between individuals who score higher and those who score lower on the variable in question. For most variables, we do not assume a dichotomy but rather a continuum. That is, we do not believe the world is populated by two sorts of individuals, depressed and nondepressed, but rather that different people can exhibit varying degrees of depression.

There are various methods for computing item-discrimination indices, but most are very similar and entail comparing the performance of high scorers against that of low scorers on each item. Assume, for example, that we have given an arithmetic test to 100 children. For each child, we have a total raw score on the test and a record of performance on each item. To compute a discrimination index for each item, we must first define "high scorer" versus "low scorer."

We could take all 100 children, compute the median of their total test scores, and label those who scored above the median high scorers and those who scored below it low scorers. The benefit of this technique is that we use all of our data, all 100 protocols. The disadvantage is that there is a lot of "noise" in the middle of the distribution. Consider Sarah, who scored slightly above the median and is classified as a high scorer; if she retook the test, she might score below the median and be labelled a low scorer.

At the opposite extreme, we might classify the five children with the highest scores as high scorers and the five with the lowest scores as low scorers. The benefit here is that these extreme scores are unlikely to change much on a retest; they are most likely not the result of guessing and most likely reflect "real-life" differences. The disadvantage is that we now have very small samples and cannot be sure our computations are genuinely stable. Is there a happy medium that keeps "noise" to a minimum while maximising sample size? Kelley (1939) showed years ago that the optimal technique is to compare the upper 27% with the lower 27%; small variations, such as 25% or 30%, do not matter much.
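A minimal sketch of that computation (the scored responses here are randomly generated stand-ins; the index is simply p for the upper 27% minus p for the lower 27%):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 children x 20 dichotomously scored items.
responses = rng.integers(0, 2, size=(100, 20))
totals = responses.sum(axis=1)

# Kelley (1939): compare the upper 27% with the lower 27% of scorers.
order = np.argsort(totals)
n_group = int(round(0.27 * len(totals)))
low, high = order[:n_group], order[-n_group:]

# Discrimination index per item: proportion correct in the high group
# minus proportion correct in the low group.
d_index = responses[high].mean(axis=0) - responses[low].mean(axis=0)
print(np.round(d_index, 2))
```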

Applications of Item Response Theory and the Item Characteristic Curve

These include

Adaptive Testing − Computerized adaptive testing is one of the most important and intriguing applications of item response theory. A test is most accurate for an individual when the difficulty of each item matches that person's ability. Item response theory can be used to help tailor tests to different test takers. When a person takes a test at a computer terminal, the computer can estimate their ability level at each step of testing and then choose the next item to match that level. For example, the first question on an adaptive test can be relatively challenging. If an examinee passes it, the computer may choose a more challenging question as the test's second item; if the examinee fails it, a less challenging item may be chosen next. A toy sketch of this selection step appears after this list.

Screening Tests − Screening tests are used to obtain preliminary results or to determine whether candidates possess more of some knowledge or skill than is required to be considered for a position. Such tests can be studied using item response theory. Consider a test meant to weed out applicants in the lowest half of the medical school candidate pool. At the point on the ability distribution where the school wants to draw the line, the ideal item's curve would be steep, with a low probability of a correct answer in the low group and a high probability in the high group. A handful of such items could make up a brief but useful test for this initial screening; the second sketch below shows one way to pick them.
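First, the adaptive selection step. This is a minimal sketch assuming a hypothetical item bank with Rasch-style difficulties; the fixed ±0.5 ability update is purely illustrative, standing in for proper re-estimation of θ after each response:

```python
# Hypothetical item bank: Rasch-style difficulties (b) on the theta scale.
bank = {"item_a": -1.5, "item_b": -0.5, "item_c": 0.0, "item_d": 0.8, "item_e": 1.6}

theta_hat = 0.0       # current ability estimate
administered = set()  # items already given

def next_item(theta):
    # Pick the unused item whose difficulty is closest to the current estimate.
    remaining = {k: b for k, b in bank.items() if k not in administered}
    return min(remaining, key=lambda k: abs(remaining[k] - theta))

for correct in [True, True, False]:  # simulated right/right/wrong responses
    item = next_item(theta_hat)
    administered.add(item)
    # Crude stand-in for re-estimation: nudge the estimate up or down.
    theta_hat += 0.5 if correct else -0.5
    print(f"gave {item} (b = {bank[item]:+.1f}) -> theta_hat = {theta_hat:+.1f}")
```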
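Second, the screening case. One way to formalize "steep at the cut point" is the two-parameter logistic item information function, I(θ) = a²·P(θ)·(1 − P(θ)), which peaks where the curve is steepest; the item parameters below are made up for illustration:

```python
import numpy as np

def info(theta, a, b):
    # Item information for a two-parameter logistic item: a^2 * P * (1 - P).
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

theta_cut = 0.0  # cut score: the median of the applicant pool

# Hypothetical item bank as (discrimination a, difficulty b) pairs.
bank = {"q1": (0.5, 0.0), "q2": (1.8, 0.1), "q3": (2.0, 1.5), "q4": (1.5, -0.2)}

# Rank items by how much information they give exactly at the cut score.
for name in sorted(bank, key=lambda k: info(theta_cut, *bank[k]), reverse=True):
    print(name, round(info(theta_cut, *bank[name]), 3))
```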

Conclusion

Using item characteristic curves (ICCs) in educational and psychological testing provides several benefits. ICCs make it simpler to understand and analyze item performance by giving a visual representation of the link between ability and the likelihood of a correct response. This can help pinpoint problematic items, such as those that are too easy or too difficult, and identify which items best distinguish between people of differing ability. ICCs can also guide decisions on item replacement or revision. A test's reliability and validity can be improved by examining the curve's shape to find items that need revision and refining their psychometric qualities.