How to determine the significance level in Pearson's test. Solving problems on testing statistical hypotheses

Purpose of the χ² (Pearson) criterion

The χ² criterion is used for two purposes: 1) to compare the empirical distribution of a feature with a theoretical one (uniform, normal, or some other); 2) to compare two, three or more empirical distributions of the same feature.

Description of the criterion

The χ² criterion answers the question of whether different values of a feature occur with equal frequency in the empirical and theoretical distributions, or in two or more empirical distributions. The advantage of the method is that it allows comparing distributions of features presented on any scale, starting with the nominal scale. Already in the simplest case of an alternative distribution ("yes - no", "married - not married", "solved the problem - did not solve the problem", etc.) the χ² criterion can be applied. The greater the discrepancy between the two compared distributions, the greater the empirical value of χ².

Automatic calculation of the χ² (Pearson) criterion

To calculate the χ² criterion automatically, three steps are needed: Step 1 - specify the number of empirical distributions (from 1 to 10); Step 2 - enter the empirical frequencies in the table; Step 3 - get the answer.

The advantage of the Pearson criterion is its universality: it can be used to test hypotheses about various distribution laws.

1. Testing the hypothesis of a normal distribution.

Let a sample of sufficiently large size n be obtained, containing many different values of the variants. For convenience of processing, we divide the interval from the smallest to the largest variant value into s equal parts and assume that the values of the variants falling into each interval are approximately equal to the midpoint of that interval. Having counted the number of variants falling into each interval, we form the so-called grouped sample:

options: $x_1, x_2, \dots, x_s$

frequencies: $n_1, n_2, \dots, n_s$,

where $x_i$ are the midpoints of the intervals and $n_i$ is the number of variants falling into the $i$-th interval (the empirical frequencies).



Based on the data obtained, one can calculate the sample mean $\bar{x}_B$ and the sample standard deviation $\sigma_B$. Let us test the assumption that the general population is distributed according to the normal law with parameters $M(X) = \bar{x}_B$, $D(X) = \sigma_B^2$. Then we can find the number of values from a sample of size $n$ that should fall into each interval under this assumption (that is, the theoretical frequencies). To do this, using the table of values of the Laplace function, we find the probability of falling into the $i$-th interval:

$$p_i = \Phi\left(\frac{b_i - \bar{x}_B}{\sigma_B}\right) - \Phi\left(\frac{a_i - \bar{x}_B}{\sigma_B}\right),$$

where $a_i$ and $b_i$ are the boundaries of the $i$-th interval. Multiplying the resulting probabilities by the sample size $n$, we find the theoretical frequencies: $n_i' = n p_i$. Our goal is to compare the empirical and theoretical frequencies, which of course differ from each other, and to find out whether these differences are insignificant and do not disprove the hypothesis that the random variable under study is normally distributed, or whether they are so large that they contradict this hypothesis. For this, a criterion in the form of a random variable is used:

$$\chi^2 = \sum_{i=1}^{s} \frac{(n_i - n_i')^2}{n_i'}. \qquad (20.1)$$

Its meaning is clear: the summands are the squared deviations of the empirical frequencies from the theoretical ones, divided by the corresponding theoretical frequencies. It can be proved that, regardless of the actual distribution law of the general population, the distribution law of the random variable (20.1) tends, as $n \to \infty$, to the $\chi^2$ distribution (see Lecture 12) with the number of degrees of freedom $k = s - 1 - r$, where $r$ is the number of parameters of the assumed distribution estimated from the sample data. The normal distribution is characterized by two parameters, so $k = s - 3$. For the chosen criterion a right-tailed critical region is constructed, determined by the condition

$$P\left(\chi^2 > \chi^2_{cr}(\alpha, k)\right) = \alpha, \qquad (20.2)$$

where $\alpha$ is the significance level. Thus the critical region is given by the inequality $\chi^2 > \chi^2_{cr}(\alpha, k)$, and the acceptance region of the hypothesis by $\chi^2 < \chi^2_{cr}(\alpha, k)$.

So, to test the null hypothesis $H_0$ (the population is normally distributed), one needs to calculate the observed value of the criterion from the sample:

$$\chi^2_{obs} = \sum_{i=1}^{s} \frac{(n_i - n_i')^2}{n_i'}, \qquad (20.1')$$

and then, from the table of critical points of the $\chi^2$ distribution, find the critical point $\chi^2_{cr}(\alpha, k)$ using the known values of $\alpha$ and $k = s - 3$. If $\chi^2_{obs} < \chi^2_{cr}$, the null hypothesis is accepted; if $\chi^2_{obs} > \chi^2_{cr}$, it is rejected.
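As an illustration of this procedure, here is a minimal Python sketch (assuming NumPy and SciPy are available); the simulated sample, the number of intervals s, and the significance level are illustrative assumptions, not values taken from the text.

```python
# Minimal sketch of the chi-square test for normality described above.
# Assumptions: simulated data, s = 8 intervals, alpha = 0.05.
import numpy as np
from scipy import stats

sample = stats.norm.rvs(loc=5.0, scale=2.0, size=500, random_state=0)

s = 8                                         # number of equal intervals
edges = np.linspace(sample.min(), sample.max(), s + 1)
n_obs, _ = np.histogram(sample, bins=edges)   # empirical frequencies n_i

x_bar, sigma = sample.mean(), sample.std()    # estimates of M(X) and sigma
edges[0], edges[-1] = -np.inf, np.inf         # outer intervals catch the tails
p_i = np.diff(stats.norm.cdf(edges, loc=x_bar, scale=sigma))
n_exp = len(sample) * p_i                     # theoretical frequencies n*p_i

chi2_obs = ((n_obs - n_exp) ** 2 / n_exp).sum()   # formula (20.1')
k = s - 3                                     # two parameters were estimated
chi2_crit = stats.chi2.ppf(1 - 0.05, df=k)
print(chi2_obs, chi2_crit, chi2_obs < chi2_crit)  # True -> accept H0
```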

2. Testing the hypothesis of uniform distribution.

When using the Pearson criterion to test the hypothesis of a uniform distribution of the general population with the assumed probability density

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & x \in [a, b], \\ 0, & x \notin [a, b], \end{cases}$$

it is necessary, having calculated $\bar{x}_B$ and $\sigma_B$ from the available sample, to estimate the parameters $a$ and $b$ by the formulas

$$a^* = \bar{x}_B - \sqrt{3}\,\sigma_B, \qquad b^* = \bar{x}_B + \sqrt{3}\,\sigma_B, \qquad (20.3)$$

where $a^*$ and $b^*$ are the estimates of $a$ and $b$. Indeed, for a uniform distribution $M(X) = \dfrac{a+b}{2}$ and $\sigma = \dfrac{b-a}{2\sqrt{3}}$, from which one obtains a system for determining $a^*$ and $b^*$:

$$\frac{a^* + b^*}{2} = \bar{x}_B, \qquad \frac{b^* - a^*}{2\sqrt{3}} = \sigma_B,$$

whose solution gives expressions (20.3).

Then, assuming that $f(x) = \dfrac{1}{b^* - a^*}$ on $[a^*, b^*]$, one can find the theoretical frequencies by the formula

$$n_i' = n p_i = \frac{n\,(b_i - a_i)}{b^* - a^*},$$

where $a_i$ and $b_i$ are the boundaries of the $i$-th interval, the left boundary of the first interval being taken as $a^*$ and the right boundary of the last as $b^*$.

Here s is the number of intervals into which the sample is divided.

The observed value of the Pearson criterion is calculated by formula (20.1'), and the critical value is found from the table, taking into account that the number of degrees of freedom is $k = s - 3$. After that, the boundaries of the critical region are determined in the same way as when testing the hypothesis of a normal distribution.
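A similar hedged sketch for the uniform case, with a and b estimated by formulas (20.3); the simulated data and the number of intervals are again illustrative assumptions.

```python
# Sketch of the chi-square test for uniformity with method-of-moments
# estimates a* = x_bar - sqrt(3)*sigma, b* = x_bar + sqrt(3)*sigma (20.3).
import numpy as np
from scipy import stats

sample = stats.uniform.rvs(loc=2.0, scale=6.0, size=400, random_state=1)

x_bar, sigma = sample.mean(), sample.std()
a_star = x_bar - np.sqrt(3.0) * sigma
b_star = x_bar + np.sqrt(3.0) * sigma

s = 8
edges = np.linspace(sample.min(), sample.max(), s + 1)
n_obs, _ = np.histogram(sample, bins=edges)

# Under U(a*, b*) an interval's probability is its length / (b* - a*);
# the outer interval bounds are replaced by a* and b*, as in the text.
e = edges.copy()
e[0], e[-1] = a_star, b_star
p_i = np.diff(np.clip(e, a_star, b_star)) / (b_star - a_star)
n_exp = len(sample) * p_i

chi2_obs = ((n_obs - n_exp) ** 2 / n_exp).sum()
chi2_crit = stats.chi2.ppf(0.95, df=s - 3)    # k = s - 3
print(chi2_obs, chi2_crit, chi2_obs < chi2_crit)
```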

3. Testing the hypothesis about the exponential distribution.

In this case, having divided the available sample into intervals of equal length, we consider a sequence of equally spaced variants (we assume that all variants falling into the $i$-th interval take a value coinciding with its midpoint) and their corresponding frequencies $n_i$ (the number of sample variants falling into the $i$-th interval). From these data we calculate $\bar{x}_B$ and take as the estimate of the parameter $\lambda$ the value $\lambda^* = 1/\bar{x}_B$. The theoretical frequencies are then calculated by the formula

$$n_i' = n p_i = n\left(e^{-\lambda^* a_i} - e^{-\lambda^* b_i}\right).$$

Then the observed and critical values of the Pearson criterion are compared, taking into account that the number of degrees of freedom is $k = s - 2$.
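And a corresponding sketch for the exponential case, with λ estimated as 1/x̄; the simulated data are an assumption for illustration only.

```python
# Sketch of the exponential case: lambda* = 1/x_bar, theoretical
# frequencies n*(exp(-lambda*a_i) - exp(-lambda*b_i)), k = s - 2.
import numpy as np
from scipy import stats

sample = stats.expon.rvs(scale=2.0, size=300, random_state=2)

lam = 1.0 / sample.mean()                     # estimate of lambda
s = 7
edges = np.linspace(0.0, sample.max(), s + 1)
n_obs, _ = np.histogram(sample, bins=edges)

edges[-1] = np.inf                            # last interval extends to +inf
n_exp = len(sample) * (np.exp(-lam * edges[:-1]) - np.exp(-lam * edges[1:]))

chi2_obs = ((n_obs - n_exp) ** 2 / n_exp).sum()
chi2_crit = stats.chi2.ppf(0.95, df=s - 2)    # one parameter was estimated
print(chi2_obs, chi2_crit, chi2_obs < chi2_crit)
```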

Let us consider the application of Pearson's chi-square test in MS EXCEL for testing simple hypotheses.

After experimental data have been obtained (i.e. when there is some sample), a distribution law is usually chosen that best describes the random variable represented by this sample. Checking how well the experimental data are described by the chosen theoretical distribution law is carried out using goodness-of-fit criteria. The null hypothesis is usually the hypothesis that the distribution of the random variable follows some theoretical law.

Let us first look at the application of Pearson's goodness-of-fit test X² (chi-square) to simple hypotheses (the parameters of the theoretical distribution are assumed known). Then we turn to the case when only the form of the distribution is specified, and the parameters of this distribution, as well as the value of the X² statistic, are estimated/calculated from the same sample.

Note: In the English-language literature, the procedure for applying Pearson's goodness-of-fit test X² is called the chi-square goodness of fit test.

Recall the procedure for testing hypotheses:

  • based on the sample, the value of a statistic is calculated that corresponds to the type of hypothesis being tested (for example, the t-statistic is used for a hypothesis about the mean when the standard deviation is unknown);
  • assuming the null hypothesis is true, the distribution of this statistic is known and can be used to calculate probabilities (for example, for the t-statistic this is Student's distribution);
  • the value of the statistic calculated from the sample is compared with the critical value for the given significance level (α);
  • the null hypothesis is rejected if the value of the statistic is greater than the critical value (or, equivalently, if the probability of obtaining this or a more extreme value of the statistic, the p-value, is less than the significance level).

Let us carry out hypothesis testing for different distributions.

Discrete case

Suppose two people are playing dice. Each player has his own set of dice. The players take turns rolling 3 dice at once. A round is won by the one who rolls more sixes at a time. The results are recorded. After 100 rounds, one of the players suspected that his opponent's dice were not symmetrical, because the opponent wins often (rolls sixes often). He decided to analyze how likely such a number of the opponent's outcomes is.

Note: Since there are 3 dice, one can roll 0, 1, 2 or 3 sixes at a time, i.e. the random variable can take 4 values.

From probability theory we know that if the dice are symmetrical, the number of sixes obeys the binomial distribution. Therefore, after 100 rounds, the expected frequencies of sixes can be calculated using the formula
=BINOM.DIST(A7,3,1/6,FALSE)*100

The formula assumes that cell A7 contains the corresponding number of sixes rolled in one round.

Note: Calculations are given in example file on sheet Discrete.

For comparing the observed (Observed) and theoretical (Expected) frequencies, it is convenient to use a histogram.

If the observed frequencies deviate significantly from the theoretical distribution, the null hypothesis that the random variable is distributed according to the theoretical law should be rejected. That is, if the opponent's dice are not symmetrical, the observed frequencies will differ "significantly" from the binomial distribution.

In our case, at first glance the frequencies are quite close, and it is difficult to draw an unambiguous conclusion without calculations. Let us apply Pearson's goodness-of-fit test X², so that instead of the subjective statement "significantly different", which could be made by comparing the histograms, we use a mathematically correct statement.

Let us use the fact that, by the law of large numbers, the observed frequency (Observed) tends, as the sample size n grows, to the probability corresponding to the theoretical law (in our case, the binomial law). In our case the sample size n is 100.

Let us introduce the test statistic, which we denote X²:

$$X^2 = \sum_{l=1}^{L} \frac{(O_l - E_l)^2}{E_l},$$

where $O_l$ is the observed frequency of the event that the random variable took certain admissible values, $E_l$ is the corresponding theoretical frequency (Expected), and $L$ is the number of values the random variable can take (in our case, 4).

As can be seen from the formula, this statistic is a measure of the closeness of the observed frequencies to the theoretical ones, i.e. it can be used to estimate the "distances" between these frequencies. If the sum of these "distances" is "too large", the frequencies differ "substantially". It is clear that if our dice are symmetrical (i.e. the binomial law applies), the probability that the sum of the "distances" turns out "too large" is small. To calculate this probability, we need to know the distribution of the X² statistic (the X² statistic is calculated from a random sample, so it is a random variable and therefore has its own probability distribution).

From a multidimensional analogue of the de Moivre-Laplace integral theorem it is known that, as n → ∞, our random variable X² is asymptotically distributed according to the χ² distribution with L − 1 degrees of freedom.

So, if the computed value of the X² statistic (the sum of the "distances" between the frequencies) is greater than a certain limit value, we have grounds to reject the null hypothesis. As with testing parametric hypotheses, the limit value is set via the significance level. If the probability that the X² statistic takes a value greater than or equal to the calculated one (the p-value) is less than the significance level, the null hypothesis can be rejected.

In our case, the statistic value is 22.757. The probability that the X² statistic takes a value greater than or equal to 22.757 is very small (0.000045); it can be calculated with the formulas
=CHISQ.DIST.RT(22.757, 4-1) or
=CHISQ.TEST(Observed, Expected)

Note: The CHISQ.TEST() function is specifically designed to test the relationship between two categorical variables.

The probability 0.000045 is much less than the usual significance level of 0.05. So the player has every reason to suspect his opponent of dishonesty (the null hypothesis of his honesty is rejected).
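For readers without Excel, the same kind of calculation can be sketched in Python. The observed counts below are invented for illustration (the article's actual data live in the example file); only the expected frequencies follow the Binomial(3, 1/6) law discussed above.

```python
# Dice example: expected counts over 100 rounds under Binomial(3, 1/6)
# versus hypothetical observed counts of 0, 1, 2, 3 sixes per round.
from scipy import stats

n_rounds = 100
observed = [50, 35, 12, 3]                    # invented counts, sum = 100
expected = [n_rounds * stats.binom.pmf(k, 3, 1/6) for k in range(4)]

stat, p_value = stats.chisquare(observed, f_exp=expected)
print(stat, p_value)                          # reject H0 if p_value < 0.05
```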

When applying the X² criterion, care must be taken that the sample size n is large enough, otherwise the approximation of the distribution of the X² statistic is invalid. It is usually considered sufficient that the expected frequencies (Expected) are greater than 5. If this is not the case, low frequencies are combined into one or joined to other frequencies, the combined value is assigned the total probability, and the number of degrees of freedom of the X² distribution is reduced accordingly.

To improve the quality of the X² criterion, the partition intervals should be made smaller (increasing L and, accordingly, the number of degrees of freedom); however, this is limited by the restriction on the number of observations falling into each interval (it must be greater than 5).

Continuous case

Pearson's goodness-of-fit test X² can be applied in the same way in the case of a continuous distribution.

Consider a sample consisting of 200 values. The null hypothesis states that the sample is drawn from the standard normal distribution.

Note: The random values in the example file on the Continuous sheet are generated using the formula =NORM.ST.INV(RAND()). Therefore, new sample values are generated each time the sheet is recalculated.

Whether the available data set is adequate to the assumed distribution can be assessed visually from a chart.

As can be seen from the diagram, the sample values fit quite well along the straight line. Nevertheless, for formal hypothesis testing we apply Pearson's goodness-of-fit test X².

To do this, we divide the range of variation of the random variable into intervals with a step of 0.5 and calculate the observed and theoretical frequencies: the observed frequencies with the FREQUENCY() function, and the theoretical ones with the NORM.ST.DIST() function.

Note: As in the discrete case, it is necessary to ensure that the sample is large enough and that more than 5 values fall into each interval.

Let us calculate the X² statistic and compare it with the critical value for the given significance level (0.05). Since we divided the range of variation of the random variable into 10 intervals, the number of degrees of freedom is 9. The critical value can be calculated by the formula
=CHISQ.INV.RT(0.05, 9) or
=CHISQ.INV(1-0.05, 9)

The chart above shows that the statistic value is 8.19, which is well below the critical value, so the null hypothesis is not rejected.
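The same continuous-case calculation can be sketched outside Excel as well; the binning below (10 intervals of width 0.5, outer bins extended to the tails) is an assumed reading of the worksheet, not its exact layout.

```python
# Continuous case: 200 standard-normal values, observed vs expected
# frequencies over 10 intervals, critical value CHISQ.INV.RT(0.05, 9).
import numpy as np
from scipy import stats

sample = stats.norm.rvs(size=200, random_state=3)

edges = np.arange(-2.5, 3.0, 0.5)             # 10 intervals of width 0.5
edges[0], edges[-1] = -np.inf, np.inf         # outer bins catch the tails
n_obs, _ = np.histogram(sample, bins=edges)
n_exp = 200 * np.diff(stats.norm.cdf(edges))  # NORM.ST.DIST analogue

chi2_obs = ((n_obs - n_exp) ** 2 / n_exp).sum()
chi2_crit = stats.chi2.ppf(1 - 0.05, df=len(n_obs) - 1)   # df = 9
print(chi2_obs, chi2_crit, chi2_obs < chi2_crit)
```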

Below is a case in which the sample took an unlikely form, and on the basis of Pearson's goodness-of-fit test X² the null hypothesis was rejected (even though the random values were generated using the formula =NORM.ST.INV(RAND()), which provides a sample from the standard normal distribution).

The null hypothesis is rejected, although visually the data lie quite close to the straight line.

As another example, let us take a sample from U(-3; 3). In this case, it is clear even from the graph that the null hypothesis must be rejected.

Pearson's goodness-of-fit test X² also confirms that the null hypothesis must be rejected.

In some cases, the researcher does not know in advance by which law the observed values of the trait under study are distributed, but may have good reasons to assume that the distribution obeys one law or another, for example, the normal or the uniform. In this case, the main and the alternative statistical hypotheses are put forward in the following form:

    H 0: the distribution of the observed feature is subject to the distribution law A,

    H 1: the distribution of the observed feature differs from A;

where A can be one distribution law or another: normal, uniform, exponential, etc.

Testing the hypothesis about the proposed distribution law is carried out using so-called goodness-of-fit criteria. There are several such criteria. The most universal of them is Pearson's χ² criterion, since it is applicable to any kind of distribution.

Pearson's χ² criterion

Usually the empirical and theoretical frequencies differ. Is the discrepancy accidental? Pearson's criterion answers this question; however, like any statistical criterion, it does not prove the validity of the hypothesis in a strictly mathematical sense, but only establishes its agreement or disagreement with the observational data at a certain significance level.

So, let a statistical distribution of feature values be obtained from a sample of size n, where $x_i$ are the observed feature values and $n_i$ are the corresponding frequencies:

The essence of Pearson's criterion consists in calculating the criterion by the following formula:

$$\chi^2 = \sum_{i=1}^{s} \frac{(n_i - n_i')^2}{n_i'},$$

where $s$ is the number of digits (intervals) of the observed values and $n_i'$ are the theoretical frequencies of the corresponding values.

It is clear that the smaller the differences $n_i - n_i'$, the closer the empirical distribution is to the theoretical one; therefore, the smaller the value of the criterion, the more reliably it can be argued that the empirical and theoretical distributions obey the same law.

Pearson's criterion algorithm

The algorithm of Pearson's criterion is simple and consists of the following steps:

1. Compute the theoretical frequencies $n_i'$ corresponding to the assumed distribution law.

2. Compute the observed value $\chi^2_{obs}$ by the formula above.

3. Find the critical point $\chi^2_{cr}(\alpha, k)$ from the table for the chosen significance level $\alpha$ and the appropriate number of degrees of freedom $k$.

4. Compare the two values: if $\chi^2_{obs} < \chi^2_{cr}$, accept the hypothesis; otherwise, reject it.

So, the only non-trivial action in this algorithm is the determination of the theoretical frequencies. They, of course, depend on the distribution law and are therefore determined differently for different laws.
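The algorithm above can be captured in a short, hedged helper function; the interval probabilities under H 0 are assumed to be supplied by the caller, since (as just noted) computing them is the law-specific step.

```python
# Generic Pearson chi-square check: observed frequencies + interval
# probabilities under H0 -> statistic, critical point, decision.
from scipy import stats

def pearson_chi2(n_obs, p_theor, n_params_estimated=0, alpha=0.05):
    """Return (chi2_obs, chi2_crit, reject_H0)."""
    n = sum(n_obs)
    chi2_obs = sum((o - n * p) ** 2 / (n * p) for o, p in zip(n_obs, p_theor))
    k = len(n_obs) - 1 - n_params_estimated    # degrees of freedom
    chi2_crit = stats.chi2.ppf(1 - alpha, df=k)
    return chi2_obs, chi2_crit, chi2_obs > chi2_crit
```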

A goodness-of-fit criterion serves to test the hypothesis about the distribution law of the random variable under study. In many practical problems the exact distribution law is unknown. Therefore a hypothesis is put forward about the correspondence of the empirical law, built from observations, to some theoretical one. This hypothesis requires statistical verification, whose results will either confirm or refute it.

Let X be the random variable under study. It is required to test the hypothesis H 0 that this random variable obeys the distribution law F(x). To do this, one takes a sample of n independent observations and builds from it the empirical distribution law F*(x). To compare the empirical and hypothetical laws, a rule called a goodness-of-fit criterion is used. One of the most popular is K. Pearson's chi-square goodness-of-fit test.

It calculates the chi-square statistic

$$\chi^2 = n \sum_{i=1}^{N} \frac{(p_i^e - p_i^t)^2}{p_i^t},$$

where N is the number of intervals used to build the empirical distribution law (the number of columns of the corresponding histogram), i is the interval number, $p_i^t$ is the probability of the random variable falling into the i-th interval under the theoretical distribution law, and $p_i^e$ is the probability of the random variable falling into the i-th interval under the empirical distribution law. The statistic must obey the chi-square distribution.

If the calculated value of the statistic exceeds the quantile of the chi-square distribution with k − p − 1 degrees of freedom for a given significance level, the hypothesis H 0 is rejected. Otherwise it is accepted at the given significance level. Here k is the number of intervals and p is the number of estimated parameters of the distribution law.

Pearson's χ² criterion allows one to compare the empirical and theoretical (or two empirical) distributions of one feature. This criterion is mainly applied in two cases:

To compare the empirical distribution of a trait with a theoretical distribution (normal, exponential, uniform, or some other law);

To compare two empirical distributions of the same trait.

The idea of the method is to determine the degree of divergence between the corresponding frequencies $n_i$ and $n_i'$; the greater this discrepancy, the greater the value of $\chi^2$.

The sample sizes must be at least 50, and the sums of the frequencies must be equal.

Null hypothesis H 0: the two distributions practically do not differ from each other; alternative hypothesis H 1: the discrepancy between the distributions is significant.

The scheme for applying the criterion to compare two empirical distributions is the same: compute the discrepancy statistic for the two frequency series and compare it with the critical value.
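As a hedged sketch of that scheme (the frequency counts below are invented for illustration), the two frequency series can be arranged as the rows of a contingency table and handed to a standard homogeneity test:

```python
# Comparing two empirical distributions of the same feature: the rows
# are the two frequency series; chi2_contingency computes the statistic.
import numpy as np
from scipy import stats

dist_1 = [30, 45, 15, 10]                     # first empirical distribution
dist_2 = [25, 50, 12, 13]                     # second empirical distribution

stat, p_value, df, expected = stats.chi2_contingency(np.array([dist_1, dist_2]))
print(stat, p_value, df)                      # reject H0 if p_value < alpha
```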

The χ² criterion is a statistical criterion for testing the hypothesis that an observed random variable obeys some theoretical distribution law.


Depending on the value of the criterion χ², the hypothesis can be accepted or rejected:

§ if χ² falls between the two tails of the distribution, the hypothesis is fulfilled;

§ if χ² is too small (falls into the left "tail" of the distribution), the theoretical and practical values are suspiciously close. If, for example, a random number generator that produced n numbers from a segment is being checked, and the hypothesis is "the sample is uniformly distributed on the segment", then the generator cannot be called random (the randomness hypothesis is not fulfilled), because the sample is too evenly distributed; however, the uniformity hypothesis is satisfied;

§ if χ² is too large (falls into the right "tail" of the distribution), the hypothesis is rejected.
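A small sketch of the left-tail reading, applied to a suspiciously even "random" generator output (the counts are invented):

```python
# Two-tailed reading of chi-square: a statistic that is *too small*
# (left tail) signals an implausibly even fit.
from scipy import stats

observed = [100, 100, 100, 100, 100]          # implausibly even counts
expected = [100] * 5

stat, p_right = stats.chisquare(observed, f_exp=expected)
p_left = stats.chi2.cdf(stat, df=len(observed) - 1)
# p_left near 0: the uniformity hypothesis is formally satisfied, but the
# sample is "too uniform" to have come from a random generator.
print(stat, p_left)
```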

Definition: Let a random variable X be given.

Hypothesis: the r.v. X obeys the distribution law F(x).

To test the hypothesis, consider a sample consisting of n independent observations of the r.v. X: $X_1, X_2, \dots, X_n$. From the sample we construct the empirical distribution of the r.v. X. The comparison of the empirical distribution with the theoretical one (the one assumed in the hypothesis) is carried out with the help of a specially chosen function, a goodness-of-fit criterion. Consider Pearson's goodness-of-fit (χ²) criterion:

Hypothesis: the sample $X^n = (X_1, \dots, X_n)$ is generated by the distribution function F(x).

Divide the range of values of X into k non-overlapping intervals;

let $n_j$ be the number of observations falling into the j-th interval;

let $p_j$ be the probability of an observation falling into the j-th interval when the hypothesis holds;

then $n p_j$ is the expected number of hits in the j-th interval;

Statistic: $\chi^2 = \sum_{j=1}^{k} \dfrac{(n_j - n p_j)^2}{n p_j}$, which asymptotically follows the chi-squared distribution with k − 1 degrees of freedom.

The criterion is unreliable on samples with low-frequency (rare) events. This problem can be solved by discarding the rare events or by combining them with other events (pooling). (A separate small-sample adjustment for 2×2 tables is known as Yates' continuity correction.)
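A minimal sketch of the pooling step (merging bins whose expected count falls below 5 into a neighbour); the threshold of 5 follows the rule of thumb quoted above.

```python
# Pool bins with expected counts below a threshold before computing
# the chi-square statistic; degrees of freedom shrink accordingly.
import numpy as np

def pool_bins(n_obs, n_exp, min_expected=5.0):
    obs, exp = list(n_obs), list(n_exp)
    i = 0
    while i < len(exp):
        if exp[i] < min_expected and len(exp) > 1:
            j = i - 1 if i > 0 else i + 1     # merge into a neighbour
            exp[j] += exp.pop(i)
            obs[j] += obs.pop(i)
            i = 0                             # re-scan after each merge
        else:
            i += 1
    return np.array(obs), np.array(exp)
```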

Pearson's goodness-of-fit test (χ²) is used to test the hypothesis that the empirical distribution corresponds to the expected theoretical distribution F(x) for large sample sizes (n ≥ 100). The criterion is applicable to any kind of function F(x), even with unknown values of its parameters, which is usually the case when analyzing the results of mechanical tests. This is where its versatility lies.

The use of the χ² criterion involves dividing the range of sample variation into intervals and determining the number of observations (the frequency) n_j for each of the e intervals. For convenience of estimating the distribution parameters, the intervals are chosen to be of equal length.

The number of intervals depends on the sample size. Usually one takes: for n = 100, e = 10-15; for n = 200, e = 15-20; for n = 400, e = 25-30; for n = 1000, e = 35-40.

Intervals containing fewer than five observations are combined with neighboring ones. However, if the number of such intervals is less than 20% of their total number, intervals with a frequency of n_j ≥ 2 are allowed.

The Pearson test statistic is the value

$$\chi^2 = \sum_{j=1}^{e} \frac{(n_j - n p_j)^2}{n p_j}, \qquad (3.91)$$
where p_j is the probability of the random variable under study falling into the j-th interval, calculated in accordance with the hypothetical distribution law F(x). When calculating the probabilities p_j, one must keep in mind that the left boundary of the first interval and the right boundary of the last must coincide with the boundaries of the region of possible values of the random variable. For example, for a normal distribution the first interval extends to -∞ and the last to +∞.

The null hypothesis that the sample distribution corresponds to the theoretical law F(x) is tested by comparing the value calculated by formula (3.91) with the critical value χ²_α found from Table VI of the Appendix for significance level α and the number of degrees of freedom k = e₁ − m − 1. Here e₁ is the number of intervals after merging, and m is the number of parameters estimated from the sample under consideration. If the inequality
χ² ≤ χ²_α (3.92)
is satisfied, the null hypothesis is not rejected. If this inequality does not hold, the alternative hypothesis is accepted: the sample belongs to an unknown distribution.

A disadvantage of Pearson's goodness-of-fit criterion is the loss of part of the initial information, caused by the need to group the observations into intervals and to merge individual intervals with a small number of observations. In this regard, it is recommended to supplement the χ² check of the correspondence of distributions with other criteria. This is especially necessary for relatively small sample sizes (n ≈ 100).

The table shows the critical values ​​of the chi-squared distribution with a given number of degrees of freedom. The desired value is at the intersection of the column with the corresponding probability value and the row with the number of degrees of freedom. For example, the critical value of the chi-squared distribution with 4 degrees of freedom for a probability of 0.25 is 5.38527. This means that the area under the density curve of the chi-squared distribution with 4 degrees of freedom to the right of the value of 5.38527 is 0.25.
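The quoted table entry is easy to verify programmatically (a one-line check, assuming SciPy):

```python
# Upper-tail 0.25 critical value of chi-squared with 4 degrees of freedom.
from scipy import stats

print(stats.chi2.isf(0.25, df=4))             # ~5.38527, as in the table
```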

Pearson's criterion for testing the hypothesis about the form of the distribution law of a random variable. Testing hypotheses about normal, exponential and uniform distributions with Pearson's criterion. Kolmogorov's criterion. An approximate method for checking the normality of a distribution based on estimates of the skewness and kurtosis coefficients.

In the previous lecture we considered hypotheses in which the distribution law of the general population was assumed known. Now let us test hypotheses about the supposed law of an unknown distribution, that is, the null hypothesis that the population is distributed according to some known law. Statistical tests for testing such hypotheses are usually called goodness-of-fit tests; the most universal of them, Pearson's χ² criterion, has been discussed in detail above.
