Total score. Establishing test performance standards


Scaling Test Results

Stevens (1946) identified four levels of measurement scales, which differ in the degree to which the values assigned on them retain the properties of the real numbers. These scales are:

Nominal (naming) scale

Ordinal scale

Interval scale

Ratio scale

Interpretation of test results

In tests with norm-oriented (norm-referenced) interpretation, the main task is to determine the relative standing of each examinee within the overall group of subjects. Clearly, this standing depends on the group against which the examinee is evaluated: the same result can be classified as fairly high if the group is weak and as rather low if the group is strong. That is why, wherever possible, one should use norms that reflect the test performance of a large, representative sample of subjects.

In tests with criterion-oriented (criterion-referenced) interpretation, the task is to compare each student's educational achievements with the body of knowledge, skills and abilities that was planned to be mastered. Here a specific content domain, rather than a particular sample of subjects, serves as the interpretive frame of reference. The main problem is to establish a passing score that separates those who have mastered the tested material from those who have not.

Establishing test performance standards

To make the interpretation independent of the results of the other examinees, special test performance standards (norms) are used: the primary score of an individual subject is compared with these standards. Norms are a set of indicators established empirically from the test results of a well-defined sample of subjects. Developing these indicators and the procedures for obtaining them constitutes the norming (or standardization) of the test. The most common norms are the mean and standard deviation of the set of individual scores. Relating a subject's primary score to the norms establishes the subject's place in the sample used to standardize the test.

Types of scales used to convert primary scores

The best-known conversions of primary scores are:

Percentile rank, which gives the percentage of subjects in the normative group whose results are lower than or equal to a given primary score;

Linear Z-score, defined as the ratio of the deviation of an individual test score from the group mean to the standard deviation for the group of subjects;

Scores that are linear transformations of Z-scores (T-scale, standard IQ scores, etc.);

Stanine and sten scales, which are obtained by dividing the scale of primary scores into a fixed number of intervals.

Percentile rank scale

Percentiles establish the rank of a subject's primary score within the normative group. The percentile rank corresponding to a given primary score shows the percentage of subjects in the normative sample whose results are not higher than that primary score.

Percentiles should not be confused with percentages that express the share of tasks completed correctly by the subjects of the group. Unlike the latter, which is a primary indicator, the percentile is a derived indicator that gives a share of the total number of subjects in the group.

Alongside the convenience of easy interpretation, percentile ranks have significant drawbacks. The percentile rank scale is non-linear: in different regions of the primary score scale, an increase of 1 point may correspond to different increases on the percentile scale. As a result, percentiles not only fail to reflect the real differences in test results but can even distort them.

For this reason the use of percentiles is rather limited. Because of their convenience and simplicity, they are mainly used in norm-oriented tests intended for students' self-assessment, where results are reported to the students themselves and to their parents.
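The non-linearity described above can be illustrated with a minimal Python sketch. The function name `percentile_rank` and the normative sample are hypothetical and chosen only for illustration; the rank is computed as the percentage of scores not higher than the given score, as in the definition above.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the normative group scoring at or below `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)

# Hypothetical normative sample of 20 raw scores, dense in the middle.
norm_scores = [2, 4, 5, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 10, 10, 11, 12, 14]

# The same 1-point raw gain moves the percentile rank by different amounts
# depending on where it falls on the raw-score scale.
middle_gain = percentile_rank(8, norm_scores) - percentile_rank(7, norm_scores)
tail_gain = percentile_rank(13, norm_scores) - percentile_rank(12, norm_scores)
print(middle_gain, tail_gain)
```

In this sample, moving from 7 to 8 points changes the percentile rank considerably, while moving from 12 to 13 points does not change it at all.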

Z-scale

The Z-scale converts individual results to a standard scale with a common mean and a common measure of spread. The Z-score of the i-th student is found by the formula

$$Z_i = \frac{X_i - \bar{X}}{S_X},$$

where $X_i$ is the primary score of the i-th subject, $\bar{X}$ is the mean of the individual scores of the $N$ subjects in the tested group ($i = 1, 2, \ldots, N$), and $S_X$ is the standard deviation of the set of primary scores.

The Z-scale is a standard scale with zero mean and unit standard deviation. It brings the scores that students obtain on different tests to a single form that is convenient for comparison.

The value of a Z-score equals the distance between the primary score in question and the group mean, expressed in standard deviation units: it shows how many standard deviations the subject's primary score lies below or above the group mean.
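As a rough illustration, the conversion to Z-scores might be sketched in Python as follows. The data are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption: the text does not say whether the divisor is N or N - 1.

```python
from statistics import mean, pstdev

def z_scores(raw_scores):
    """Convert raw scores to Z-scores relative to their own group."""
    m = mean(raw_scores)
    s = pstdev(raw_scores)  # assumption: population SD (divisor N)
    return [(x - m) / s for x in raw_scores]

# Hypothetical raw scores of eight subjects (mean 5, SD 2).
raw = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(raw)
print(z)
```

A subject with 9 points lies two standard deviations above the group mean, and a subject with 2 points lies one and a half standard deviations below it.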

With rare exceptions, Z-scores take values in the interval (-3, +3). While convenient for scientific analysis during the development of new tests, the Z-scale is inconvenient for the practical assessment of subjects' knowledge. Z-scores can be fractional and negative, which makes them awkward to use in calculations and hard for test users to interpret. Rounding Z-scores to integers is not always acceptable, because the main purpose of a test is to reveal differences in the subjects' preparation. Negative Z-scores, which indicate results below the group mean, cause a further inconvenience: the students who receive them are likely to reject them outright. Taken together, this makes the Z-score unsuitable for reporting results to subjects and calls for special conversion methods when scoring students.

Z-score transformations

Z-score transformations aim to convert Z-scores into values that are easier to record and explain. The transformation must be linear in order to preserve the shape of the distribution of Z-scores. The general formula for such a transformation is

$$Z_1 = M + \sigma Z,$$

where $Z_1$ is the transformed score, $M$ is the new mean (the mean of the scores after the transformation) and $\sigma$ is the new standard deviation. Different transformations use different values of $M$ and $\sigma$. Some of the best-known transformations of Z-scores are listed below.

T-scale (McCall, 1939, for reporting children's performance on mental ability tests). The mean is set to M = 50 and the standard deviation to σ = 10, which gives $Z_1 = 50 + 10Z$.

CEEB scale (ETS, for informing prospective students of their college entrance examination scores). The mean is set to M = 500 and the standard deviation to σ = 100, which gives $Z_1 = 500 + 100Z$.

IQ scale (Wechsler, 1939, for interpreting adult intelligence scores). The mean is set to M = 100 and the standard deviation to σ = 15, which gives $Z_1 = 100 + 15Z$.
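The three transformations differ only in the constants M and σ, so they can be expressed by one function. The following is a minimal sketch; the names `SCALES` and `transform` are hypothetical.

```python
# (new mean M, new standard deviation sigma) for each scale named in the text.
SCALES = {
    "T": (50, 10),
    "CEEB": (500, 100),
    "IQ": (100, 15),
}

def transform(z, scale):
    """Apply the linear transformation Z1 = M + sigma * Z."""
    m, sigma = SCALES[scale]
    return m + sigma * z

print(transform(1.5, "T"))      # a Z-score of 1.5 on the T-scale
print(transform(-0.5, "CEEB"))  # a Z-score of -0.5 on the CEEB scale
print(transform(2.0, "IQ"))     # a Z-score of 2.0 on the IQ scale
```

Because the transformation is linear, the ordering of subjects and the shape of the score distribution are unchanged; only the units and the origin differ.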

Stanine and sten scales

Sometimes results are reported on scales made up of single-digit or small integers, for example from 1 to 9 or from 1 to 10. Such scales are convenient for reporting test results because of their obvious simplicity.

Dividing the normal distribution into 9 intervals gives the stanine scale with 9 standard units. On this scale the mean is 5 and the standard deviation is approximately 2. For any test with any number of items, the lowest 4% of scores are assigned stanine 1 and the highest 4% stanine 9. The next 7% from the bottom and from the top are assigned stanines 2 and 8, respectively; the next 12% receive stanines 3 and 7; the next 17% receive stanines 4 and 6; and, finally, the middle 20% of results correspond to stanine 5.
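The percentages 4, 7, 12, 17, 20, 17, 12, 7, 4 add up to cumulative boundaries of 4, 11, 23, 40, 60, 77, 89 and 96 percent. The sketch below assigns a stanine from a percentile rank using these boundaries. The function name is hypothetical, and treating a rank that falls exactly on a boundary as belonging to the lower stanine is an assumption.

```python
from bisect import bisect_left

# Cumulative upper bounds (in percent) of stanines 1..8; stanine 9 takes the rest.
CUM_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine_from_percentile(pr):
    """Map a percentile rank (0 < pr <= 100) to a stanine from 1 to 9."""
    # bisect_left counts the bounds strictly below pr, so pr == 4 stays in stanine 1.
    return bisect_left(CUM_BOUNDS, pr) + 1

print([stanine_from_percentile(p) for p in (3, 10, 50, 90, 99)])
```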

In the sten scale, often called the Cattell scale, the entire range of results is divided into 10 parts, each 0.5 standard deviation wide. The arithmetic mean of the sten scale is taken to be 5.5, and two adjacent standard units are 0.5 standard deviation apart.
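With intervals of 0.5 standard deviation and a mean of 5.5, the boundary between the fifth and sixth units falls at Z = 0. Under that reading, a conversion from a Z-score can be sketched as below. The function name is hypothetical, and assigning a Z-score that lies exactly on a boundary to the upper unit is an assumption.

```python
import math

def sten_from_z(z):
    """Map a Z-score to a 10-unit scale with 0.5 SD intervals centered at 5.5."""
    unit = math.floor(2 * z) + 6  # Z in [0, 0.5) -> 6, Z in [-0.5, 0) -> 5
    return max(1, min(10, unit))  # the two outer units are open-ended

print([sten_from_z(z) for z in (-2.3, -0.25, 0.25, 1.7, 2.5)])
```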

Sometimes an eleven-point scale is obtained from the Stanine scale by identifying one percent of the strongest and weakest subjects and assigning them the maximum and minimum scores, respectively.

Setting a passing score

There are many methods for establishing a passing score in criterion-oriented testing. All methods are divided into absolute and relative. Almost all methods involve experts in the procedure for determining the passing score. Let's take a look at some of the known methods.

Task-centered methods

Nedelsky method (1954), for closed-form (multiple-choice) tasks.

Each expert analyzes every task and crosses out the answer options that a minimally competent subject would be able to reject as incorrect. For each task the expert then records the reciprocal of the number of remaining options. For example, if the expert crossed out two options in a task with five options, he records 1/3 for this task. All these reciprocals are then summed. The resulting number can be regarded as this expert's estimate of the probable score of a minimally competent subject. Finally, the estimates of all experts are averaged.
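The arithmetic of the Nedelsky procedure can be sketched as follows. The function name and the experts' judgments are hypothetical; exact fractions are used only to keep the reciprocals free of rounding error.

```python
from fractions import Fraction

def nedelsky_cut_score(remaining_by_expert):
    """Average, over experts, of the summed reciprocals of remaining options.

    remaining_by_expert: one list per expert, giving for each task the
    number of answer options left after crossing out.
    """
    expert_scores = [
        sum(Fraction(1, remaining) for remaining in expert)
        for expert in remaining_by_expert
    ]
    return sum(expert_scores) / len(expert_scores)

# Hypothetical judgments of two experts on a four-task test.
judgments = [
    [3, 2, 1, 4],  # expert 1: options remaining in tasks 1..4
    [2, 2, 1, 5],  # expert 2
]
cut = nedelsky_cut_score(judgments)
print(cut, float(cut))
```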

Angoff method (1971). Experts are asked to imagine a group of minimally competent subjects and, for each task, to estimate the proportion of subjects in this group who would answer it correctly. (This is the same as estimating the probability that a minimally competent subject will answer the task correctly.) These probabilities are summed for each expert, and the sums are averaged over all experts.
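A minimal sketch of the Angoff calculation, with a hypothetical function name and hypothetical ratings:

```python
def angoff_cut_score(ratings_by_expert):
    """Sum each expert's per-task probabilities, then average the sums."""
    expert_sums = [sum(ratings) for ratings in ratings_by_expert]
    return sum(expert_sums) / len(expert_sums)

# Hypothetical probabilities assigned by two experts to four tasks.
ratings = [
    [0.5, 0.75, 0.25, 1.0],  # expert 1
    [0.5, 0.5, 0.5, 0.75],   # expert 2
]
cut = angoff_cut_score(ratings)
print(cut)
```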

Ebel method (1972). This method uses a two-dimensional grid to categorize each task. Experts are asked to classify all tasks by difficulty (three levels are offered: easy, medium, difficult) and by the relevance of their content (four levels are offered: essential, important, acceptable, controversial). All tasks are thus distributed over the cells of this grid. The experts must then judge how a minimally competent subject will perform on the tasks in each cell, i.e. indicate the percentage of the tasks in the cell that such a subject must answer correctly.
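The text stops at the per-cell percentages. In the usual form of the method, the passing score is then obtained by multiplying the number of tasks in each cell by its percentage and summing over the cells. The sketch below follows that reading for one expert; the grid contents and the function name are hypothetical.

```python
def ebel_cut_score(grid):
    """Sum over cells of (number of tasks) * (expected percent correct) / 100."""
    return sum(n_tasks * percent / 100 for n_tasks, percent in grid.values())

# Hypothetical grid: (relevance, difficulty) -> (number of tasks, percent correct).
# Only non-empty cells are listed.
grid = {
    ("essential", "easy"): (10, 90),
    ("important", "medium"): (8, 50),
    ("acceptable", "difficult"): (4, 25),
    ("controversial", "difficult"): (2, 0),
}
cut = ebel_cut_score(grid)
print(cut)  # expected raw score of a minimally competent subject on 24 tasks
```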

Subject-centered methods (Nedelsky, 1954; Zieky and Livingston, 1977)

Contrasting groups method

The experts first agree on what test performance at the level of minimum competence looks like. They then divide all subjects into two groups, competent and incompetent (excluding those who, in their opinion, are on the border). Next, the score distributions of the two groups are plotted on a single graph. The point where the two curves intersect is taken as the passing score.
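The intersection point is normally read off a graph. One way to approximate it numerically, used here purely as an illustrative assumption, is to fit a normal curve to each group and search for the point between the two group means where the fitted densities are equal. The function name and the data are hypothetical, and the sketch assumes that the incompetent group has the lower mean and that the curves cross once between the means.

```python
from statistics import NormalDist

def contrasting_groups_cut(incompetent, competent, iterations=60):
    """Bisection search for the crossing point of two fitted normal curves."""
    low = NormalDist.from_samples(incompetent)
    high = NormalDist.from_samples(competent)
    lo, hi = low.mean, high.mean  # assumption: low.mean < high.mean
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if low.pdf(mid) > high.pdf(mid):
            lo = mid  # still on the side dominated by the incompetent group
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical scores of the two groups formed by the experts.
cut = contrasting_groups_cut([4, 5, 6, 7, 8], [10, 11, 12, 13, 14])
print(round(cut, 3))
```

With real data the empirical distributions may cross more than once or not at all, so a fitted or smoothed curve is only one possible choice.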

Borderline group method

In contrast to the previous method, the experts are asked to identify the subjects who, in their opinion, lie on the border between the two contrasting groups that differ in competence. The median of the score distribution of this selected group is taken as the passing score.
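In code, this step reduces to taking a median. A minimal sketch with hypothetical scores:

```python
from statistics import median

# Hypothetical test scores of the subjects judged to be on the border.
borderline_scores = [11, 12, 12, 13, 15, 16]

passing_score = median(borderline_scores)
print(passing_score)
```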

Critics of this approach point out that setting a passing score on the basis of the subjects' test performance is at odds with the main goal of criterion-oriented testing, since this approach is not tied to the content of the test.

Standardization

Standardization is the unification of the testing procedure and of test scoring, bringing them to common standards. Standardizing a method makes the results obtained for different subjects comparable and makes it possible to express test scores relative to the standardization sample.

1) Standardization as the elaboration and regulation of the testing procedure: unification of instructions, answer forms, methods of recording results, conditions of administration, and characteristics of the groups of subjects. Strict regulation of the testing procedure is a prerequisite for ensuring the reliability of the test and for determining the test norms used to evaluate results.

2) Standardization as the transformation of the ordinary (raw) score scale into a new scale based not on the quantitative values of the measured indicator but on its relative place in the distribution of results in the sample of subjects.

Stages of standardization

Stage 1. Creation of a uniform testing procedure.

It consists in specifying the following aspects of the diagnostic situation:

· Testing conditions (room, lighting, and other external factors).

· The content of the instruction and the features of its presentation (tone of voice, pauses, speed of speech, etc.).

· The presence of standard stimulus material (for example, Rorschach cards).

· Time limits for this test.

· A standard answer form for the test.

· Allowance for the influence of situational factors on the testing process and results.

· Allowance for the influence of the diagnostician's behavior on the testing process and results.

· Allowance for the influence of the subject's previous testing experience.

Stage 2. Creation of a uniform evaluation of test performance: standard interpretation of the obtained results and preliminary standard processing. At this stage, the obtained indicator is compared with the norm for performing this test at a given age.

Stage 3. Determination of test performance standards. Norms are developed for different ages, professions, genders, etc.

z-score

The most common transformations of primary scores are centering and normalization by the standard deviation. Normalization amounts to a change of measurement units. The normalizing function is usually the Z-score (standard score), which expresses the deviation of an individual result X from the mean in units of the standard deviation.

More widespread in psychodiagnostics are standard scores calculated by a linear or non-linear transformation of primary scores that are distributed normally or close to normally. In this calculation a z-transformation of the scores is performed. To obtain the standard z-score, take the difference between the individual primary result and the mean of the normative group, and divide this difference by the standard deviation σ of the normative sample:

$$z = \frac{X - M_x}{\sigma},$$

where

X is the raw score (number of tasks completed);

Mx is the mean number of completed tasks for the entire sample;

σ is the standard deviation (SD in the English-language literature).

The mathematician Carl Gauss proposed a function describing the normal distribution. The graph of the normal distribution equation is a symmetrical, unimodal bell-shaped curve.

Denote the arithmetic mean by Mx and the standard deviation by σ (lowercase sigma). In a normal distribution, practically all values lie within Mx ± 5σ.

Within Mx ± σ lie 68.27% of the values; the remaining 31.73% are split symmetrically, about 15.87% in each tail.

Within Mx ± 2σ lie 95.45% of the values.

Within Mx ± 3σ lie 99.73% of the values.
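The coverage figures for one, two and three standard deviations can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `coverage` is hypothetical. Values computed this way may differ in the last digit from figures quoted in older textbooks.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def coverage(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * coverage(k), 2))
```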

PERCENTILES

A percentile is the percentage of individuals in the standardization sample who score below a given primary score. The percentile scale can be regarded as a set of 100 rank gradations, counted from the 1st rank, which corresponds to the lowest result.

The 50th percentile (P50) corresponds to the median of the distribution of results.

Percentiles should not be confused with ordinary percentages. The latter represent the proportion of correct solutions out of the total number of test items in an individual result. Ranks P1 and P100 are assigned, respectively, to the lowest and highest results observed in the sample; these ranks, however, may correspond to results that are far from zero (no correct solutions) or from the maximum (all solutions correct). For example, with a total of 120 tasks, the minimum result corresponding to the first rank may be 6 correct solutions, while the maximum result, corresponding to rank P100, may be 95 correctly solved tasks. This situation arises, for example, in speed tests.

The main disadvantage of percentile scales is the inequality of their units of measurement. In a normal distribution, individual values are closely grouped near the center of the distribution and become sparser toward the edges. Therefore, equal frequencies of cases near the center correspond to shorter intervals along the x-axis than at the edges of the distribution. Percentiles show the relative position of each subject in the normative sample, but not the magnitude of the difference between results. This creates some inconvenience in interpreting individual results. Thus, the difference in primary scores corresponding to the interval P70 to P80 may be 10 points, while the difference in the number of correct solutions in the rank interval P50 to P60 may be only 1 to 3 points.

However, percentile scores also have a number of advantages: they are easy for users of psychodiagnostic information to understand, are applicable to a wide variety of methods, and are easy to calculate.

Statistical norms

A. Statistical norms. Boundary values on the scale of test scores, formed from the frequency distribution of test scores in the standardization sample. As a rule, these boundary values cut off a fixed percentage of the subjects in the sample: 10 (decile), 25 (quartile), 50 (median). For a normal distribution, the statistical norm is described by its parameters (the mean plus or minus sigma, the standard deviation). Statistical norms serve for making a "comparative decision" and do not provide information for making "normative decisions".

B. Age norms. Particular variants of psychodiagnostic norms collected for children of different ages.

C. Criterion norms. Diagnostic norms that specify the correspondence between test scores on the scale of the measured property and the level of a criterion indicator. When the criterion is a behavior, criterion norms indicate the probability of that behavior occurring for a given value of the test score.

D. School norms. Developed on the basis of school achievement tests or scholastic aptitude tests.

E. Professional norms. Established on the basis of tests for various professional groups.

F. Local norms. Established for narrow categories of people who share a common characteristic: age, gender, geographical area or socioeconomic status.

G. National norms. Developed for representatives of a given nation or for the country as a whole.

STANINES

An example of a non-linear conversion to a standard scale is the stanine scale (from English "standard nine"), in which scores take values from 1 to 9, with M = 5 and σ = 2.

The stanine scale is becoming increasingly widespread because it combines the advantages of standard scores with the simplicity of percentiles. Primary scores are easily converted to stanines: the subjects are ranked in ascending order of results and divided into groups whose sizes are proportional to the frequencies of the corresponding scores in a normal distribution of test results.

STENS

When scores are converted to the sten scale (from English "standard ten"), a similar procedure is carried out, the only difference being that the scale is based on ten standard intervals.

Assessment of the physical development of children on the Z-score scale

An integral part of any program for studying the health and nutrition of children, whether at the population level or when assessing an individual child, is the tracking of children's anthropometric parameters against standard growth curves. The World Health Organization recommends assessing the nutritional status of children using basic body measurements (body length and body weight). Anthropometric data are evaluated by calculating the number of standard deviations (SD) by which the measured body weight or length differs from the median of the standard population (the WHO international standards are based on data from studies of the anthropometric parameters of children in the USA and Great Britain). The resulting number of standard deviations is called the Z-score.

The anthropometric data of each child are characterized by their Z-score. If the child's measurement is below the median of the standard, the Z-score is negative; if it is above the median, the Z-score is positive.

The Z-score is calculated for three indicators:

1. Body weight for age (W/A),

2. Body length for age (L/A),

3. Body weight for body length (W/L).

The W/L indicator is used only up to the age of 10 years for girls and up to 11.5 years for boys.

For diagnostic purposes, cut-off values in SD units are defined that distinguish the following categories of the assessed indicators:

- low, indicating insufficient body length or body weight: Z-score below -2;

- high, indicating excess body length or body weight: Z-score above +2;

- normal: Z-score in the range from -2 to +2.
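The calculation and the three categories can be sketched as follows. This is only an illustration of the simple formula described in the text (deviation from the reference median divided by the reference standard deviation); the function names and the reference values are hypothetical and are not taken from any WHO table.

```python
def anthropometric_z(value, ref_median, ref_sd):
    """Number of reference standard deviations separating a measurement from the median."""
    return (value - ref_median) / ref_sd

def classify_z(z):
    """Categories from the text: below -2 is low, above +2 is high, otherwise normal."""
    if z < -2:
        return "low"
    if z > 2:
        return "high"
    return "normal"

# Hypothetical reference values for one age group (not real WHO data).
z = anthropometric_z(value=7.5, ref_median=10.0, ref_sd=1.0)
print(z, classify_z(z))
```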

The length-for-age index characterizes linear growth and assesses long-term growth retardation, i.e. Z-score less than -2 may indicate chronic malnutrition resulting in stunted growth.

Z-score body weight for body length reflects body proportions or harmonious development, and it is very sensitive to acute malnutrition.

Z-score body weight for age is sensitive to acute malnutrition and reflects the child's current or recent malnutrition.

To process anthropometric data and calculate WHO indices, a special computer program ANTHRO v.1.01, 1990 has been developed and distributed free of charge. The program automatically takes into account the child's age in months. In practice, when using the program, it is necessary to register the date of birth and the date of the examination of the child.

For a group or population of children, a group Z-score can be calculated and statistically evaluated. The Z-score of the standard population is zero. The more the Z-score of the study population differs from zero, the greater the difference between the study group of children and the reference population. The group Z-score can be used for comparative analysis of groups of children and in health monitoring systems.
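One possible way to summarize a group is sketched below: the mean of the individual Z-scores, its standard error, and the ratio of the two as a rough measure of how far the group lies from the reference value of zero. The text does not say which statistic is meant, so this choice is an assumption, and the Z-scores in the example are hypothetical.

```python
import math
import statistics


def group_z_summary(z_scores):
    """Return (mean Z, standard error of the mean, mean divided by standard error)."""
    n = len(z_scores)
    mean_z = statistics.mean(z_scores)
    se = statistics.stdev(z_scores) / math.sqrt(n)  # sample SD (N - 1 denominator)
    return mean_z, se, mean_z / se


# Hypothetical individual Z-scores for a small study group.
group = [-1.2, -0.4, -0.9, 0.3, -1.6, -0.7, 0.1, -1.1]

mean_z, se, ratio = group_z_summary(group)
print(f"group mean Z = {mean_z:.3f}, standard error = {se:.3f}, mean/SE = {ratio:.2f}")
```

A group mean close to zero suggests the group resembles the reference population; a mean far from zero relative to its standard error suggests a real difference.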

Derived indicators can be subdivided according to their purpose. Some of them serve to determine the level of proficiency achieved on a given scale, while others establish the relative position of the subject in a given normative group. Percentiles serve the second purpose: they establish the rank of the subject's primary indicator in the normative group. The percentile rank of an indicator is the percentage of subjects from the standardization sample whose results are not higher than this primary indicator. Constructing a percentile scale consists in determining the percentile ranks of the primary indicators of the normative group.

The method of constructing a percentile scale can be shown on a small example: the results of a test taken by a group of 25 subjects. Such a small sample is unlikely in practice, of course; these scales are usually built on large data sets. Suppose that 25 students tested in one subject obtained the primary results presented in Table 7.2:

Table 7.2. Test results

The first row of Table 7.2 contains the observed scores of the subjects in the sample, ordered from lowest to highest (left to right). For large groups, simple ordering is usually impractical, and it is more convenient to use grouped data, which involves introducing score categories (class intervals) for individual groups of scores (see Section 5.2 for details).

The second row gives the number of subjects who have the same test score. Each element of the second row shows how many times the score occurs and is therefore called the frequency of the observed raw score. Summing the frequencies from left to right gives the accumulated (cumulative) frequencies. The cumulative frequency of a score is the sum of the frequencies observed at or below that score. For example, 9 subjects scored 7 or lower, because the cumulative frequency for a score of 7 is 9.
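The counting described above can be sketched as follows. Since the contents of Table 7.2 are not reproduced here, the list of 25 scores is hypothetical; it was chosen only to agree with the figures quoted in the text (one score of 3, one of 4, two of 5, and a cumulative frequency of 9 for a score of 7).

```python
from collections import Counter
from itertools import accumulate

# Hypothetical raw scores for 25 subjects (NOT the actual data of Table 7.2).
scores = [3, 4, 5, 5, 6, 6, 7, 7, 7] + [8] * 4 + [9] * 4 + [10] * 3 + [11] * 3 + [12] * 2

counts = Counter(scores)                       # frequency of each observed score
values = sorted(counts)                        # scores ordered from lowest to highest
frequencies = [counts[v] for v in values]
cumulative = list(accumulate(frequencies))     # running sum from left to right

for v, f, cf in zip(values, frequencies, cumulative):
    print(f"score {v:2d}  frequency {f}  cumulative frequency {cf}")
```

The last cumulative frequency always equals the number of subjects in the group.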



Calculating the percentile ranks for the fourth row of the table is complicated by the need to determine the actual boundaries of the confidence interval (see Section 5.5) containing the true score of each subject in the sample. The actual length of the interval depends on the standard error of measurement. However, 0.5 units of raw score are usually used to define the boundaries of the interval. In this case, if the subject received a score of 5, the true value of the score lies in the range from 4.5 to 5.5, i.e. in the interval (4.5; 5.5), and the numbers 4.5 and 5.5 are called the lower and upper bounds of the unit score interval, respectively.

The concepts of "upper" and "lower" boundaries are used to construct a scale of percentile ranks, assuming that the results of the subjects are distributed uniformly within the confidence interval. For example, when calculating the percentile rank for a test score of 5, it is assumed that the results of the two subjects are spread evenly over the interval (4.5; 5.5) (Table 7.3).

Table 7.3. Building a Percentile Rank Scale

Most likely, one result lies below the point corresponding to 5 and one above it. Thus, the subjects whose true score is below 5 include three students: one with a score of 3, one with a score of 4, and one of the two who received a score of 5. As a percentage this is (3/25) · 100% = 12%. This is the percentile rank corresponding to a score of 5, and it gives a convenient interpretation of the student's result: 12% of students in the normative sample completed 5 or fewer test items. In accordance with the definition introduced earlier, the 12th percentile in the group of 25 subjects is 5. Referring to the data in the third column of Table 7.2, we can say definitely that a primary result of 5 points is poor, since it exceeds the results of only 12% of the subjects in the standardization sample. This is a concrete and easily understood result, convenient above all for students comparing their achievements across several tests. A primary result below every score in the standardization sample has a percentile rank of zero. A result above every score in the sample has a percentile rank of 100. Of course, neither a rank of 0 nor a rank of 100 indicates zero or complete knowledge of the tested subject.
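The reasoning for the score of 5 generalizes to a simple rule: count all results strictly below the score, add half of the results equal to it, and express the total as a percentage of the group. The sketch below implements that rule; the function name `percentile_rank` is introduced here, and the 25 scores are hypothetical data chosen to match the frequencies quoted in the text.

```python
from collections import Counter


def percentile_rank(score, scores):
    """Percentile rank assuming results are spread evenly over (score - 0.5, score + 0.5)."""
    counts = Counter(scores)
    below = sum(freq for value, freq in counts.items() if value < score)
    at_score = counts[score]          # 0 if the score does not occur in the sample
    return 100.0 * (below + 0.5 * at_score) / len(scores)


# Hypothetical raw scores for 25 subjects (NOT the actual data of Table 7.2).
scores = [3, 4, 5, 5, 6, 6, 7, 7, 7] + [8] * 4 + [9] * 4 + [10] * 3 + [11] * 3 + [12] * 2

print(percentile_rank(5, scores))    # 12.0, as in the worked example
print(percentile_rank(2, scores))    # 0.0: below every score in the sample
print(percentile_rank(13, scores))   # 100.0: above every score in the sample
```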

The inverse problem can also be solved: determining the p-th percentile, that is, the point below which p% of the results lie. Determining the p-th percentile takes 5 steps, which use the data of Table 7.4 and are given in Table 7.5.

Table 7.4. Relationship between raw scores and frequencies

Table 7.5. Determination of percentiles

Step  Calculation step  Calculation example
1  Calculation of $(p \cdot n)/100\%$, where $n$ is the cumulative frequency in the group of scores
2  Determination of the actual lower bound $L$ of the score category containing the result of step 1
3  Subtraction of the frequencies accumulated up to $L$ (cum.f) from the result of step 1 (determination of the frequencies lying below $(p \cdot n)/100\%$)
4  Determination of the fraction of the score category interval that lies below the frequency $(p \cdot n)/100\%$
5  Addition of the result of step 4 to the result of step 2. Final formula
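The five steps can be sketched in code under two assumptions that the table itself does not state explicitly: each score category has a width of one raw-score unit (bounds at score ± 0.5, as in the earlier example), and the fraction in step 4 is obtained by dividing the result of step 3 by the frequency of the category. The function name `percentile_point` and the data are hypothetical.

```python
from collections import Counter


def percentile_point(p, scores):
    """Point on the raw-score scale below which p% of the results lie."""
    counts = Counter(scores)
    target = p * len(scores) / 100            # step 1: (p * n) / 100
    cum_below = 0                             # frequencies accumulated below the category
    for value in sorted(counts):
        freq = counts[value]
        if cum_below + freq >= target:
            lower = value - 0.5               # step 2: actual lower bound L
            excess = target - cum_below       # step 3: subtract cum.f up to L
            fraction = excess / freq          # step 4: fraction of the unit interval
            return lower + fraction           # step 5: add step 4 to step 2
        cum_below += freq
    raise ValueError("p must lie between 0 and 100")


# Hypothetical raw scores for 25 subjects (NOT the actual data of Table 7.4).
scores = [3, 4, 5, 5, 6, 6, 7, 7, 7] + [8] * 4 + [9] * 4 + [10] * 3 + [11] * 3 + [12] * 2

print(percentile_point(12, scores))   # 5.0: the 12th percentile, matching the rank example
print(percentile_point(50, scores))   # 8.375: the median of this hypothetical sample
```

On these data the function is the inverse of the percentile-rank calculation: a score of 5 has percentile rank 12, and the 12th percentile is 5.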

Percentiles should not be confused with percentages that express the share of tasks correctly completed by the subjects of the group. Unlike the latter, a percentile is a derived indicator that expresses a share of the total number of subjects in the group.

Although they are easy to interpret, percentile ranks have two significant drawbacks. The first is that percentile ranks are values on an ordinal scale: they show the relative position of each individual in the normative sample but do not reveal the size of the difference between the results of individual subjects. The second drawback aggravates the first: percentiles not only fail to reflect the real differences in test results but even distort them. This follows from the distribution of percentiles, which is rectangular (uniform). The distribution of primary indicators differs considerably from a rectangular one and, for good norm-oriented tests, approaches a normal curve. As a result, small deviations from the mean at the center of the distribution of observed results are greatly magnified by percentiles, while relatively large deviations at the edges of the bell curve are compressed.
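The distortion can be made concrete for the idealized case of exactly normally distributed raw scores (an assumption made only for this illustration). The sketch compares the percentile gap produced by the same half-standard-deviation difference near the mean and in the tail, using `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def percentile_of(z):
    """Percentile rank of a result lying z standard deviations from the mean."""
    return 100 * std_normal.cdf(z)


# The same raw difference (half a standard deviation) at two places on the curve.
center_gap = percentile_of(0.5) - percentile_of(0.0)
edge_gap = percentile_of(2.5) - percentile_of(2.0)

print(f"gap near the mean: {center_gap:.1f} percentile points")
print(f"gap in the tail:   {edge_gap:.1f} percentile points")
```

Near the mean the gap is about 19 percentile points, while in the tail the same raw difference amounts to less than 2 points.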

The disadvantages mentioned are the main reason why the use of percentiles is rather limited. Due to their convenience and simplicity, they are mainly used in tests for self-assessment of students' knowledge.

Z-SCALE

The simplest way to locate the result of the i-th student ($X_i$) relative to the results of others is to calculate the deviation of the score $X_i$ from the mean $\bar{X}$ of the scores for the group of tested students. The deviation is the difference $X_i - \bar{X}$. If $X_i - \bar{X} > 0$, the result of the i-th student is above the group average. A negative difference indicates a result below the mean $\bar{X}$.

Since the arithmetic means obtained for different tests and in different groups differ considerably, the problem of comparability of deviations arises. The same score $X_i$ may be above average in a weak group and well below average in a strong group. In addition, the deviation scale is stretched differently depending on the length of the test.

A convenient way to overcome these difficulties is to convert individual results to a standard Z-scale with a common mean and a common measure of score variation. In general, standard scales are constructed by linear or non-linear transformations of raw scores. In a linear transformation, standard scores express the deviation of individual scores from the mean raw score in units of the standard deviation of the distribution. In this case, the scaled result of the i-th student is found by the formula

$$Z_i = \frac{X_i - \bar{X}}{S_X},$$

where $X_i$ is the raw score of the i-th subject; $\bar{X}$ is the mean of the individual scores of the subjects in the group ($i = 1, 2, ..., N$); and $S_X$ is the standard deviation of the set of raw scores, calculated using the formula in Section 5.2.

Because $\bar{X}$ is subtracted from each original value $X_i$, the same $\bar{X}$ is also subtracted from the mean of the original scores. Therefore, the arithmetic mean of the differences $X_i - \bar{X}$ ($i = 1, 2, ..., N$) obtained for the group of tested students is zero. This is illustrated by calculating the mean of the differences $X_i - \bar{X}$ for the matrix of test results of 10 subjects (Section 5.2). The sum of the differences is equal to zero:

$$\sum_{i=1}^{10} (X_i - \bar{X}) = 0.$$

Similarly, it is easy to show that the standard deviation of the set of Z values is 1. Thus, the Z-scale is a standard scale with zero mean and unit standard deviation. By normalizing individual results in this way, the scores that students obtain on different tests can be brought to a single form convenient for comparison.
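Both properties can be checked numerically. The sketch below converts a set of raw scores to Z-scores and verifies that their mean is 0 and their standard deviation is 1. The ten scores are hypothetical (they are not the data of Section 5.2), and the population form of the standard deviation (`statistics.pstdev`, divisor N) is an assumption, since the formula of Section 5.2 is not reproduced here; the same check passes with the sample form if it is used consistently.

```python
import statistics


def to_z_scores(raw_scores):
    """Linear transformation Z_i = (X_i - mean) / S_X for every raw score."""
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)   # assumption: population SD (divisor N)
    return [(x - mean) / sd for x in raw_scores]


# Hypothetical raw scores of 10 subjects with a mean of 5.
raw = [6, 2, 1, 9, 4, 4, 8, 5, 6, 5]
z = to_z_scores(raw)

print([round(value, 2) for value in z])
print("mean of Z:", round(statistics.mean(z), 10))
print("standard deviation of Z:", round(statistics.pstdev(z), 10))
```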

For the example above, the scores of 10 subjects on the Z-scale are obtained by dividing the calculated differences by the standard deviation of 2.6:

It is interesting to compare the obtained scaled results with the raw scores of 10 subjects (Table 7.6).

Table 7.6. Comparative results

Subject number $i$   Number of completed items $X_i$   $X_i - \bar{X}$   $Z_i$
0,38
-0 -1,14
-4" -1,52
-1,52
-1 -0,38
-1 -0,38
-1 -0,38
-1,52
0,38
$\bar{X} = 5$   $S_X = 2.6$   Sum = 0   $\bar{Z} = 0$   $S_Z = 1$

For tests that have passed a long standardization stage and have stable estimates of the population parameters, raw scores are converted to the Z-scale by the formula

$$Z = \frac{X - M}{\sigma_X},$$

where $M$ and $\sigma_X$ are the population mean and the population standard deviation, respectively.

Obviously, for a raw score exactly equal to the mean, the Z-score is zero. Negative Z values indicate below-average performance, while positive Z values indicate good performance, above the group mean of the raw scores.

Z-scores are especially useful in the case of a normal distribution of primary scores, when all Z values ​​generally vary between -3 and +3. Sometimes they try to expand the variation interval and take into account all scores ranging from -5 to +5, which, no doubt, is meaningless, since the values ​​at the ends of the interval are determined with a very large measurement error.

The undoubted advantage of the Z-scale is the common arithmetic mean and the overall measure of data variation, which makes it possible to achieve comparability of results across different tests. However, in addition to the obvious advantages, there are also disadvantages. Being convenient for scientific analysis in the process of developing new tests, the Z-scale is inconvenient for practical use in assessing the knowledge of the subjects of the group. This is primarily due to the fact that Z values ​​often have to be calculated to multiple decimal places, since the average of individual scores is rarely an integer. Since identifying differences in test preparation is the main purpose of test design, it is easy to understand that rounding Z-scores is not always acceptable, as it can nullify the initial differences in individual scores and thereby reduce the differentiating effect of the test.

The loss of differentiating ability caused by rounding Z-scores can be illustrated with the data in Table 7.6. The results of the second and third subjects, which differ before rounding ($Z_2 = -1.14$ and $Z_3 = -1.52$), become the same score $Z_2 = Z_3 = -1$ once the decimal part is dropped.

Certain inconveniences are caused by negative values ​​of the Z-score, indicating results below the average for the group of tested students. It is clear that in the practice of control, negative values ​​of Z-scores will cause obvious rejection among the students who received them. In general, all this makes the Z-score inconvenient for reporting results to the subjects of the group and forces the use of special conversion methods for grading students.