Mann-Whitney. The Mann-Whitney U test in theses, coursework, and master's papers in psychology

The Mann-Whitney U test

Purpose of the test. The test is designed to assess differences between two samples in the level of any quantitatively measured trait. It can detect differences between small samples when $n_1, n_2 \ge 3$ or $n_1 = 2$, $n_2 \ge 5$, and it is more powerful than Rosenbaum's Q test.

This method determines if the area of ​​overlapping values ​​between two series is small enough. We remember that we call the 1st row (sample, group) the row of values ​​in which the values, according to a preliminary estimate, are higher, and the 2nd row is the one where they are supposedly lower.

The smaller the overlap area, the more likely it is that the differences are significant. These differences are sometimes referred to as differences in the location of the two samples. The empirical value of the U test reflects how large the zone of overlap between the series is. Therefore, the smaller $U_{emp}$, the more likely it is that the differences are significant.

Hypotheses.

The level of non-verbal intelligence in the group of physics students is higher than in the group of psychology students.

Graphical representation of the U test. Fig. 7.25 presents three of the many possible relationships between two series of values.

In option (a), the second series lies lower than the first, and the series barely overlap. The overlap area ($S_1$) is too small to obscure the differences between the series. There is a chance that the differences between them are significant. We can determine this exactly using the U test.

In option (b), the second series also lies lower than the first, but the area of overlapping values is fairly extensive ($S_2$). It may not yet reach the critical size at which the differences would have to be recognized as insignificant. Whether this is so can be determined only by an exact calculation of the U test.

In option (c), the second series lies lower than the first, but the overlap is so extensive ($S_3$) that the differences between the series are obscured.

Fig. 7.25. Possible relationships between series of values in two samples

Note. $S_1$, $S_2$, and $S_3$ mark the areas of possible overlap.

Limitations of the U test.

  • 1. Each sample must contain at least three observations: $n_1, n_2 \ge 3$; one sample may contain two observations, but then the other must contain at least 5.
  • 2. Each sample should contain no more than 60 observations: $n_1, n_2 \le 60$. However, already at $n_1, n_2 > 20$ ranking becomes quite laborious.

Let us return to the results of a survey of students of the physics and psychology faculties of Leningrad University using D. Wechsler's method for measuring verbal and non-verbal intelligence. Using Rosenbaum's Q test, it was established at a high level of significance that the level of verbal intelligence is higher in the sample of physics students. Let us now try to establish whether this result is reproduced when the samples are compared on the level of non-verbal intelligence. The data are given in the table.

…the level of the trait in sample 2 is below the level of the trait in sample 1 at a statistically significant level. The smaller the value of U, the higher the significance of the differences.

Now let us carry out all of this work on the material of our example. After steps 1-6 of the algorithm, we construct a table (Table 7.4).

Table 7.4

Calculation of rank sums for samples of students of physical and psychological faculties

Physics students ($n_1 = 14$)

Psychology students ($n_2 = 12$)

Non-verbal intelligence score

Average 107.2

The total sum of ranks is 165 + 186 = 351. The expected sum according to formula (5.1) is:

$\sum R_i = \frac{N(N+1)}{2} = \frac{26 \cdot 27}{2} = 351$

The actual and expected sums agree. We see that, in terms of non-verbal intelligence, the sample of psychology students is the "higher" one. It is this sample that has the larger rank sum: 186. Now we are ready to formulate the statistical hypotheses:

$H_0$: the group of psychology students does not surpass the group of physics students in the level of non-verbal intelligence;

$H_1$: the group of psychology students surpasses the group of physics students in the level of non-verbal intelligence.

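In accordance with the next step of the algorithm, we determine the empirical value of U:

$U_{emp} = 14 \cdot 12 + \frac{12 \cdot (12 + 1)}{2} - 186 = 60$

Because in our case $n_1 \ne n_2$, we also calculate the empirical value of U for the second rank sum (165), substituting the corresponding $n_x$ into formula (7.4):

$U_{emp} = 14 \cdot 12 + \frac{14 \cdot (14 + 1)}{2} - 165 = 108$

For comparison with the critical value we take the smaller of the two: $U_{emp} = 60$.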

According to Appendix 8, we determine the critical values for $n_1 = 14$, $n_2 = 12$:

Recall that the U test is one of two exceptions to the general rule for deciding whether differences are significant: we can declare the differences significant if $U_{emp} \le U_{cr\,0.05}$. Here $U_{emp} = 60$, and $U_{emp} > U_{cr\,0.05}$.

Consequently, $H_0$ is accepted: the group of psychology students does not surpass the group of physics students in the level of non-verbal intelligence.
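The arithmetic of this example can be checked with a short Python sketch. The function name `u_from_rank_sum` is hypothetical and chosen only for illustration; the sample sizes (14 and 12) and rank sums (165 and 186) are those quoted above, and the formula assumed is U = n1·n2 + nx·(nx + 1)/2 − Tx.

```python
def u_from_rank_sum(n1, n2, nx, tx):
    """U = n1*n2 + nx*(nx + 1)/2 - Tx, for the sample of size nx with rank sum Tx."""
    return n1 * n2 + nx * (nx + 1) / 2 - tx


n_physics, n_psych = 14, 12      # sample sizes quoted in the text
t_physics, t_psych = 165, 186    # rank sums quoted in the text

# Check: the two rank sums must add up to N*(N + 1)/2.
n_total = n_physics + n_psych
expected_total = n_total * (n_total + 1) // 2

# U computed from each rank sum; the smaller one is compared with the table.
u_psych = u_from_rank_sum(n_physics, n_psych, n_psych, t_psych)
u_physics = u_from_rank_sum(n_physics, n_psych, n_physics, t_physics)
u_emp = min(u_psych, u_physics)

print(expected_total, u_psych, u_physics, u_emp)
```

The two U values always add up to n1·n2, which gives one more consistency check on the ranking.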

Let's pay attention to the fact that for this case the Rosenbaum Q-criterion is not applicable, since the range of variability in the group of physicists is wider than in the group of psychologists: both the highest and the lowest values ​​of non-verbal intelligence fall on the group of physicists (see Table 7.4) .

This statistical method was proposed by Frank Wilcoxon in 1945. In 1947, however, the method was improved and extended by H. B. Mann and D. R. Whitney, so the U test is more commonly referred to by their names.

The test is designed to assess differences between two samples in the level of any quantitatively measured trait. It can detect differences between small samples when $n_1, n_2 \ge 3$ or $n_1 = 2$, $n_2 \ge 5$, and it is more powerful than the Rosenbaum test.

Description of the Mann-Whitney U test

There are several ways to use the criterion and several options for tables of critical values ​​corresponding to these methods (Gubler E. V., 1978; Runion R., 1982; Zakharov V. P., 1985; McCall R., 1970; Krauth J., 1988) .

This method determines if the area of ​​overlapping values ​​between two series is small enough. We remember that we call the 1st row (sample, group) the row of values ​​in which the values, according to a preliminary estimate, are higher, and the 2nd row is the one where they are supposedly lower.

The smaller the crossover area, the more likely the differences are to be significant. Sometimes these differences are called differences in the location of two samples (Welkowitz J. et al., 1982).

The empirical value of the U criterion reflects how large the zone of coincidence between the rows is. Therefore, the smaller U emp, the more likely it is that the differences are significant.

Hypotheses of the Mann-Whitney U test

H0: The level of the trait in group 2 is not lower than the level of the trait in group 1.
H1: The level of the trait in group 2 is lower than the level of the trait in group 1.

Limitations of the Mann-Whitney U test

1. Each sample must contain at least 3 observations: $n_1, n_2 \ge 3$; one sample may contain 2 observations, but then the other must contain at least 5.

2. Each sample should contain no more than 60 observations: $n_1, n_2 \le 60$.


The Mann-Whitney U test is a statistical test used to assess differences between two independent samples in the level of a quantitatively measured trait. It can detect differences in the value of a parameter between small samples.

Other names include the Wilcoxon rank-sum test. Less common: the number-of-inversions test.

History

This method for detecting differences between samples was proposed in 1945 by Frank Wilcoxon. In 1947 it was substantially revised and extended by H. B. Mann and D. R. Whitney.

Description of the criterion

  1. There should be no matching values ​​in the sample data (all numbers are different) or there should be very few such matches (up to 10).

Using a Criterion

  1. Compile a single ranked series from both compared samples, arranging the elements in ascending order of the trait and assigning the lower rank to the lower value. The total number of ranks is $N = n_1 + n_2$, where $n_1$ is the number of elements in the first sample and $n_2$ is the number of elements in the second sample.
  2. Split the single ranked series into two, consisting of the elements of the first and second samples respectively. Calculate the sum of ranks of the elements of the first sample and, separately, the sum of ranks of the elements of the second sample. Determine the larger of the two rank sums ($T_x$), which corresponds to the sample with $n_x$ elements.
  3. Determine the value of the Mann-Whitney U test by the formula: $U = n_1 \cdot n_2 + \frac{n_x \cdot (n_x + 1)}{2} - T_x$.
  4. Using the table for the chosen level of statistical significance, determine the critical value of the test for the given $n_1$ and $n_2$. If the obtained value of $U$ is less than or equal to the tabular value, a significant difference between the levels of the trait in the two samples is recognized (the alternative hypothesis is accepted). If the obtained value of $U$ is greater than the tabular value, the null hypothesis is accepted. The smaller the value of $U$, the higher the significance of the differences.
  5. If the null hypothesis is true, the test statistic has expectation $M(U) = \frac{n_1 \cdot n_2}{2}$ and variance $D(U) = \frac{n_1 \cdot n_2 \cdot (n_1 + n_2 + 1)}{12}$, and for sufficiently large samples ($n_1 > 19$, $n_2 > 19$) it is approximately normally distributed.
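Steps 1-3 can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `average_ranks` and `mann_whitney_u` and the sample data are hypothetical, and tied values are assumed to share the average of the ranks they occupy. The sketch follows the list literally and uses only the larger rank sum; with unequal sample sizes some texts compute U from both rank sums and take the smaller value.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return (U, T1, T2) computed as in steps 1-3."""
    n1, n2 = len(sample1), len(sample2)
    # Step 1: rank the pooled series.
    ranks = average_ranks(list(sample1) + list(sample2))
    # Step 2: rank sums per sample, then the larger sum Tx and its sample size nx.
    t1, t2 = sum(ranks[:n1]), sum(ranks[n1:])
    tx, nx = (t1, n1) if t1 >= t2 else (t2, n2)
    # Step 3: U = n1*n2 + nx*(nx + 1)/2 - Tx.
    u = n1 * n2 + nx * (nx + 1) / 2 - tx
    return u, t1, t2


# Hypothetical data, for illustration only.
u, t1, t2 = mann_whitney_u([1.1, 2.3, 3.0, 4.8], [3.9, 5.2, 6.1, 7.4, 8.0])
print(u, t1, t2)
```

The resulting U would then be compared with the tabulated critical value for the given sample sizes, as described in step 4.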


Mann-Whitney test: example, table

A criterion in mathematical statistics is a strict rule according to which a hypothesis with a certain level of significance is accepted or rejected. To build it, you need to find a certain function. It should depend on the final results of the experiment, that is, on empirically found values. It is this function that will be a tool for assessing the discrepancy between samples.

Statistically significant value. General information

A result is called statistically significant if it, or a more extreme result, would be unlikely to occur by chance. A difference is said to be statistically significant if the observed data would be unlikely to arise if the difference did not exist. This does not mean that the difference must be large or important.

The level of statistical significance of the test

This term should be understood as the probability of rejecting the null hypothesis when it is true. Such a rejection is called a Type I error or a false positive decision. In most cases, the decision relies on the p-value: the probability, calculated under the assumption that the null hypothesis is true, of obtaining a value of the test statistic at least as extreme as the one observed in the sample. The null hypothesis is rejected if the p-value is less than the level declared by the analyst. The smaller the p-value, the stronger the grounds for rejecting the hypothesis.
The significance level is usually denoted by the letter α (alpha). Commonly used levels are 0.1%, 1%, 5% and 10%. If, say, the chance of a coincidence is stated to be 1 in 1000, the significance level in question is 0.1%. Different α-levels have their pros and cons. A lower level gives more confidence that an alternative hypothesis accepted at that level is real. However, it raises the risk that a false null hypothesis will not be rejected. The choice of α therefore depends on the balance between significance and power, that is, on the trade-off between the probabilities of false positive and false negative decisions. In the Russian-language literature, "reliability" is used as a synonym for "statistical significance".

Null Hypothesis Definition

In mathematical statistics, this is an assumption that is tested for consistency with the available empirical data. In most cases, the null hypothesis states that there is no relationship between the variables under study or that the distributions under study do not differ. In a standard study, the researcher tries to refute the null hypothesis, that is, to show that it is inconsistent with the experimental data. There must also be an alternative hypothesis, which is accepted in place of the null one.

Key Definition

The U criterion (Mann-Whitney) in mathematical statistics allows you to evaluate the differences between two samples. They can be given according to the level of some trait, which is measured quantitatively. This method is ideal for estimating differences in small samples. This simple criterion was proposed by Frank Wilcoxon in 1945. And already in 1947, the method was revised and supplemented by scientists H. B. Mann and D. R. Whitney, whose names it is called to this day. The Mann-Whitney criterion in psychology, mathematics, statistics and many other sciences is one of the fundamental elements of the mathematical substantiation of the results of theoretical research.

Description

The Mann-Whitney test is a relatively simple nonparametric method. Its power is considerable and is noticeably higher than that of the Rosenbaum Q test. The method evaluates how small the area of overlapping values is between the samples, namely between the ranked series of values of the first and second samples. The smaller the value of the test statistic, the more likely it is that the differences in the parameter values are significant. To apply the Mann-Whitney U test correctly, some limitations must be kept in mind. Each sample must contain at least 3 values of the trait. One sample may contain two values, but then the other must contain at least five. The samples should contain as few coinciding values as possible; ideally, all the numbers are different.

Usage

How to use the Mann-Whitney test correctly? The table compiled by this method contains certain critical values. The first step is to create a single series from both matched samples, which is then ranked. That is, the elements are lined up according to the degree of growth of the attribute, and a lower rank is assigned to a lower value. As a result, we get the following total number of ranks:

N = N1 + N2,

where N1 and N2 are the numbers of units in the first and second samples, respectively. Next, the single ranked series is split into two series, consisting of the units of the first and second samples. The sum of the ranks is then calculated for each series in turn. The larger of the two sums (Tx) is determined; it corresponds to the sample with nx units. The value of the test is then calculated by the formula U = N1·N2 + nx·(nx + 1)/2 − Tx. Finally, the critical value of the test for the given N1 and N2 is looked up in the table for the chosen significance level.
If the resulting value is less than or equal to the value from the table, a significant difference in the levels of the trait in the two samples is stated. If the value obtained is greater than the table value, the null hypothesis is accepted. If the null hypothesis is true, the test statistic has mean N1·N2/2 and variance N1·N2·(N1 + N2 + 1)/12. For sufficiently large samples, the statistic is approximately normally distributed. The lower the value of the Mann-Whitney test, the higher the significance of the differences.
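For large samples, the normal approximation mentioned above can be sketched as follows. The function name `u_normal_approx` and the input values (U = 200 with 25 observations per sample) are hypothetical, and the sketch assumes the usual null mean n1·n2/2 and variance n1·n2·(n1 + n2 + 1)/12; it applies no correction for ties or continuity.

```python
import math


def u_normal_approx(u, n1, n2):
    """Return (mean, variance, z, lower-tail p) for U under the null hypothesis."""
    mean = n1 * n2 / 2
    variance = n1 * n2 * (n1 + n2 + 1) / 12
    z = (u - mean) / math.sqrt(variance)
    # Standard normal CDF at z: probability of a U this small or smaller.
    p_lower = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return mean, variance, z, p_lower


# Hypothetical input values, for illustration only.
mean, variance, z, p_lower = u_normal_approx(u=200, n1=25, n2=25)
print(mean, variance, round(z, 3), round(p_lower, 4))
```

Because small values of U indicate significant differences, the relevant tail is the lower one; a two-sided p-value would double it.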

Values ​​of the Pearson criterion (criterion)

  1. Tables of probabilities associated with the values of the Mann-Whitney test (Tables 3-6). For the experimental value of the criterion (the smaller of the two values) and the sample sizes, find the probability that both groups belong to the same general population. Thus, a low probability value, for example, P…

  2. Table of critical values of the Mann-Whitney test for a given significance level, indexed by the sample sizes N1 and N2. If the calculated value does not exceed the critical value, the difference between the samples is significant at that level, that is, the null hypothesis should be rejected.

2. The Mann-Whitney U test

The criterion is designed to assess the differences between two samples in terms of the level of any trait, quantitatively measured. It allows you to detect differences between small samples when n1 and n2 are greater than or equal to 3 (or n1 = 2, and n2 is then greater than or equal to 5.)

The method determines if the area of ​​overlapping values ​​between two series is small enough. The smaller this area, the more likely it is that the differences are significant. The empirical (actually obtained) value of the U criterion reflects how large the zone of coincidence between the rows is. The lower Uemp., the more likely it is that the differences are significant.

Hypotheses.

H0: The level of the trait in group 2 is not lower than the level of the trait in group 1.

H1: The level of the trait in group 2 is lower than the level of the trait in group 1.

Limitations of the U criterion.

1. There must be at least 3 observations in each sample or, in extreme cases, a ratio of 2 to 5 or more is allowed.

2. There should be no more than 60 observations in each sample.

Algorithm for calculating the Mann-Whitney U test.

1. Transfer all sample data to individual cards (marking by color or in some other way which sample each value belongs to).

2. Lay out all the cards in one common row in ascending order of the trait, regardless of which sample they belong to.

3. Rank the values on the cards (according to the ranking algorithm), assigning the lower rank to the lower value. There should be n1 + n2 ranks in total (the size of the first sample plus the size of the second sample).

4. Re-arrange the cards into two rows according to whether they belong to sample 1 or sample 2.

5. Calculate the sum of ranks separately for each of the two rows.

6. Determine the larger of the two rank sums.

7. Determine the value of U by the formula U = n1·n2 + nx·(nx + 1)/2 − Tx, where Tx is the larger of the two rank sums and nx is the size of the sample with the larger rank sum.

8. Determine the critical values of U from the tables and, on that basis, accept or reject hypothesis H0.

3. The Kruskal-Wallis H test

The H test is used to assess differences in the level of the analyzed trait among three, four, or more samples simultaneously. It detects whether the trait level changes across the samples, without, however, indicating the direction of these changes.

The test is based on the principle that the smaller the overlap between the samples, the higher the significance level of $H_{emp}$. It should be emphasized that the samples may contain different numbers of subjects, although in the tasks below the samples are of equal size.

Working with data begins with the fact that all samples are conditionally combined in the order of occurring values ​​into one sample, and the values ​​of this combined sample are ranked. Then the obtained ranks are affixed to the original sample data, and the sum of the ranks is calculated separately for each sample. The criterion is based on the following idea – if the differences between the samples are insignificant, then the sums of ranks will not differ significantly from one another and vice versa.

The value of $H_{emp}$ is calculated by the formula:

$H_{emp} = \frac{12}{N(N+1)} \sum_{i} \frac{T_i^2}{n_i} - 3(N+1)$

where $N$ is the total number of members in the pooled sample;

$n_i$ is the number of members in each individual sample;

$T_i^2$ are the squares of the rank sums for each sample.

When determining the critical values of the test for four or more samples, use the chi-square table, having first calculated the number of degrees of freedom $\nu$. For $c = 4$ samples, $\nu = c - 1 = 4 - 1 = 3$.

Note that if we used tests that compare only two series of values at a time, four samples would require six pairwise comparisons: the first sample with the second, with the third, and so on.

To use the criterion H the following conditions must be observed:

1. The measurement must be taken on a scale of order, intervals or ratios.

2. Samples must be independent.

3. A different number of subjects in the compared samples is allowed.

4. When comparing three samples, it is allowed that one of them contains n = 3, and in the other two n = 2. However, in this case, the differences can be recorded only at the 5% significance level.

5. Table 9 of the Appendix covers only three samples with $n_1, n_2, n_3 \le 5$, that is, each of the three samples may contain at most 5 subjects.

6. With a larger number of samples, or with larger numbers of subjects in the samples, use the chi-square table. In this case, the number of degrees of freedom is determined by the formula $\nu = c - 1$, where $c$ is the number of compared samples.
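The calculation of the H statistic can be sketched in Python as below. The function names `average_ranks` and `kruskal_wallis_h` and the three small samples are hypothetical; the sketch assumes the standard formula H = 12/(N(N + 1)) · Σ(Tᵢ²/nᵢ) − 3(N + 1), assigns average ranks to tied values, and applies no tie correction.

```python
def average_ranks(values):
    """Average 1-based rank of each value within the pooled list."""
    ordered = sorted(values)
    # A value first found at 0-based index f and repeated c times occupies
    # positions f+1 .. f+c, whose average is f + (c + 1)/2.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def kruskal_wallis_h(*samples):
    """Return (H, degrees of freedom) for two or more independent samples."""
    pooled = [x for sample in samples for x in sample]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted_sum = 0.0
    start = 0
    for sample in samples:
        t_i = sum(ranks[start:start + len(sample)])  # rank sum of this sample
        weighted_sum += t_i ** 2 / len(sample)
        start += len(sample)
    h = 12 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)
    return h, len(samples) - 1


# Hypothetical data: three fully separated samples of three values each.
h, df = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 3), df)
```

The value of H is then compared with the critical value from the special table for small samples, or from the chi-square table with the returned number of degrees of freedom.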


The Mann-Whitney U test is a statistical test used to assess differences between two samples in the level of a quantitatively measured trait. It can detect differences in the value of a parameter between small samples.

Other names: the Mann-Whitney-Wilcoxon test (MWW), the Wilcoxon rank-sum test, or the Wilcoxon-Mann-Whitney test.

History

This method of detecting differences between samples was proposed in 1945 by Frank Wilcoxon. In 1947 it was substantially revised and extended by H. B. Mann and D. R. Whitney, by whose names it is usually called today.

Description of the criterion

A simple nonparametric test. The power of the test is higher than that of the Rosenbaum Q-test.

This method determines if the area of ​​overlapping values ​​between two series (the ranked series of parameter values ​​in the first sample and the same in the second sample) is small enough. The smaller the criterion value, the more likely it is that the differences between the parameter values ​​in the samples are significant.

Criterion Applicability Limitations

  1. Each of the samples must contain at least 3 feature values. It is allowed that in one sample there are two values, but in the second there are at least five.
  2. There should be no matching values ​​in the sample data (all numbers are different) or there should be very few such matches.

Using a Criterion

To apply the Mann-Whitney U-test, you need to perform the following operations.


See also

  • The Kruskal-Wallis test is a generalization of the Mann-Whitney U test to more than two samples.

Literature

  • Mann H.B., Whitney D.R. On a test of whether one of two random variables is stochastically larger than the other. // Annals of Mathematical Statistics. - 1947. - No. 18. - P. 50-60.
  • Wilcoxon F. Individual Comparisons by Ranking Methods. // Biometrics Bulletin 1. - 1945. - P. 80-83.
  • Gubler E. V., Genkin A. A. Application of non-parametric statistics criteria in biomedical research. - L., 1973.
  • Sidorenko E.V. Methods of mathematical processing in psychology. - St. Petersburg, 2002.


The U test is a rank test, so it is invariant under any monotone transformation of the measurement scale.

Other names: Mann-Whitney-Wilcoxon (MWW), Wilcoxon rank-sum test or Wilcoxon-Mann-Whitney test (WMW).

Task examples

Example 1 The first sample is patients who were treated with drug A. The second sample is patients who were treated with drug B. The values in the samples are some characteristic of the effectiveness of treatment (the level of a metabolite in the blood, temperature three days after the start of treatment, time to recovery, number of bed-days, etc.). It is required to find out whether there is a significant difference in the effectiveness of drugs A and B, or whether the differences are purely random and are explained by the "natural" variance of the chosen characteristic.

Example 2 The first sample is the fields treated with cultivation method A. The second sample is the fields treated with cultivation method B. The values in the samples are the yields. It is required to find out whether one of the methods is more effective than the other, or whether the yield differences are due to random factors.

Example 3 The first sample is the days when a type A promotion (red price tags with a discount) was held in the supermarket. The second sample is the days of the type B promotion (every fifth pack is free). The values ​​in the samples are an indicator of the effectiveness of the promotion (sales volume, or revenue in rubles). It is required to find out which type of promotion is more effective.

Description of the criterion

Two samples are given.

Additional assumptions:

It is sometimes mistakenly believed that the U test tests the null hypothesis of equal medians in the two samples. There exist distributions for which the hypothesis tested by the U test holds while their medians differ.

The U test can be used to test the null hypothesis against a shift alternative, in which the distribution of one sample is the distribution of the other shifted by some non-zero constant. Against this alternative the U test is consistent. It is appropriate when two series of measurements of two values of some physical quantity are made with the same instrument. In that case one distribution function describes the measurement errors for one value, and the other distribution function describes those for the other value. However, in many applications (econometrics in particular) there is no particular reason to assume that the distribution of the second sample is only shifted and does not change in any other way.

The U-test is a non-parametric analogue of Student's t-test. If the samples are normal, then it is preferable to apply the more powerful Student's t-test to test the shift hypothesis.

History

This method of detecting differences between samples was proposed in 1945 by Frank Wilcoxon. It was substantially revised and expanded in 1947 by Mann and Whitney, by whose names it is commonly referred to today.

Literature

  1. Mann H.B., Whitney D.R. On a test of whether one of two random variables is stochastically larger than the other. // Annals of Mathematical Statistics. - 1947, No. 18. - Pp. 50-60.
  2. Wilcoxon F. Individual Comparisons by Ranking Methods. // Biometrics Bulletin 1. 1945. - Pp. 80–83.
  3. Orlov A.I. Econometrics. - M.: Exam, 2003. - 576 p. (§4.5 What hypotheses can be tested using the two-sample Wilcoxon test?)
  4. Kobzar A.I. Applied mathematical statistics. - M.: Fizmatlit, 2006. - 816 p.

The Mann-Whitney test is a non-parametric alternative to the t-test for independent samples. Its advantage is that it dispenses with the assumptions of normality and equal variances. It is essential that the data be measured at least on an ordinal scale.

STATISTICA assumes that the data are arranged in the same way as for the t-test for independent samples. The file must contain a grouping (independent) variable with at least two different codes that uniquely identify the group to which each observation belongs.

Assumptions and interpretation. The Mann-Whitney test assumes that the variables in question are measured at least on an ordinal scale (ranked). The interpretation of the test is essentially the same as for the t-test for independent samples, except that the U statistic is calculated as the sum of indicators of pairwise comparisons of the elements of the first sample with the elements of the second sample. The U test is the most powerful (sensitive) non-parametric alternative to the t-test for independent samples; in fact, in some cases it is even more powerful than the t-test.

If the sample size is greater than 20, then the sample distribution for the U statistic converges rapidly to a normal distribution (see Siegel, 1956). Therefore, along with the U statistic, the z value (for a normal distribution) and the corresponding p-value will be shown.

Exact probabilities for small samples. For small samples, STATISTICA will calculate the exact probability associated with the corresponding U statistic. This probability is based on counting all possible U values ​​for a given number of observations in two samples (see Dinneen & Blakesley, 1973). The program will report (in the last column of the results table) the value 2 * p, where p is equal to 1 minus the cumulative (one-tailed) probability of the corresponding U statistic. Note that this usually does not lead to a large underestimation of the statistical significance of the relevant effects (see Siegel, 1956).

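The test statistic has the form

$U = W - \frac{n_2 (n_2 + 1)}{2} = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \delta_{ij}$

where $W$ is the Wilcoxon statistic, designed to test the same hypothesis (here, the sum of the ranks of the second sample in the pooled series), and

$\delta_{ij} = 1$ if $x_i < y_j$, and $\delta_{ij} = 0$ otherwise.

So the U statistic counts the total number of cases in which elements of the second sample exceed elements of the first sample. If the hypothesis is correct, then

$M(U) = \frac{n_1 n_2}{2}, \quad D(U) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$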

The Mann-Whitney test assumes that the variables in question are measured at least on an ordinal scale (ranked). The interpretation of the test is essentially the same as for the t-test for independent samples, except that the U statistic is calculated as the sum of indicators of pairwise comparisons of the elements of the first sample with the elements of the second sample. The U test is the most powerful (sensitive) non-parametric alternative to the t-test for independent samples; in fact, in some cases it is even more powerful than the t-test.

If the sample size is greater than 20, the sampling distribution of the U statistic converges rapidly to a normal distribution. Therefore, along with the U statistic, the z value (for a normal distribution) and the corresponding p-value are shown.
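The pairwise-comparison view of U can be sketched as follows. The function name `u_by_pairs` and the data are hypothetical, and counting a tied pair as 0.5 is a common convention assumed here rather than something stated in the text. The sketch also cross-checks the count against the rank sum W of the second sample, using the standard identity U = W − n2·(n2 + 1)/2.

```python
def u_by_pairs(first, second):
    """Count pairs (x, y) with y from the second sample exceeding x from the first."""
    total = 0.0
    for x in first:
        for y in second:
            if y > x:
                total += 1.0
            elif y == x:
                total += 0.5  # assumed convention for tied pairs
    return total


# Hypothetical data without ties, for illustration only.
first = [1, 3, 5]
second = [2, 4, 6]
u = u_by_pairs(first, second)

# Cross-check through ranks: W is the rank sum of the second sample in the
# pooled series (a plain position lookup is valid because there are no ties).
pooled = sorted(first + second)
w = sum(pooled.index(y) + 1 for y in second)
n2 = len(second)
u_from_ranks = w - n2 * (n2 + 1) / 2

print(u, w, u_from_ranks)
```

Both routes give the same number, which is why the rank-sum procedure and the pairwise-comparison description refer to the same statistic.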

Detailed instructions on how to use the criteria can be found later in the application example section.

Example

Let us test the hypothesis that the compared independent samples belong to the same general population using the nonparametric Mann-Whitney U test. We compare the results obtained in the example "Basic Statistics and Student's t-test" for the 2nd and 3rd columns of the Student's t-test table with the results of the nonparametric comparison.

To calculate the U test, we arrange the values of the compared samples in ascending order into one pooled series and assign ranks from 1 to n1 + n2 to the values of the pooled series. The first row contains the values of the first sample, the second row the values of the second sample, and the third row the corresponding ranks in the pooled series:

Note that identical values are assigned the average of the ranks they occupy, and that the ranks must cover positions 1 through n1 + n2 (in our case, 20); when the largest values are tied, they share the average of the last positions (here 19.5). This rule is used to check that the ranking is correct.

Separately for each sample, we calculate the sum of the ranks of their R1 and R2 variants. In our case:

R1 = 1 + 2.5 + 2.5 + 5 + 5 + 9 + 9 + 9 + 12 + 14 = 69

R2 = 5 + 9 + 9 + 14 + 14 + 17 + 17 +17 + 19.5 + 19.5 = 141

To check the correctness of the calculations, you can use another rule: R1 + R2 = 0.5 * (n1 + n2) * (n1 + n2 + 1). In our case, R1 + R2 = 210.

Statistics: U1 = 69 - 10*11/2 = 14; U2 = 141 - 10*11/2 = 86.

For the one-tailed test, we take the smaller statistic U1 = 14 and compare it with the critical value for n1 = n2 = 10 at the 1% significance level, which equals 19.

Since the calculated value of the test is less than the tabular value, the null hypothesis is rejected at the chosen significance level, and the differences between the samples are considered statistically significant. Thus, the conclusion about the existence of differences, made using the parametric Student's test, is confirmed by this non-parametric method.
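The numbers in this example can be reproduced with a few lines of Python. The ranks are those listed above for R1 and R2, and the critical value 19 is the one quoted in the text; the variable names are chosen only for illustration. The sketch also shows that the formula based on the larger rank sum, U = n1·n2 + nx·(nx + 1)/2 − Tx, gives the same minimum.

```python
# Ranks of the two samples as listed in the example.
ranks_1 = [1, 2.5, 2.5, 5, 5, 9, 9, 9, 12, 14]
ranks_2 = [5, 9, 9, 14, 14, 17, 17, 17, 19.5, 19.5]
n1, n2 = len(ranks_1), len(ranks_2)

r1, r2 = sum(ranks_1), sum(ranks_2)

# Check: R1 + R2 = 0.5 * (n1 + n2) * (n1 + n2 + 1).
ranking_ok = r1 + r2 == 0.5 * (n1 + n2) * (n1 + n2 + 1)

# U computed from each rank sum.
u1 = r1 - n1 * (n1 + 1) / 2
u2 = r2 - n2 * (n2 + 1) / 2
u_min = min(u1, u2)

# Same result from the larger rank sum Tx (here R2) and its sample size nx.
u_from_larger = n1 * n2 + n2 * (n2 + 1) / 2 - max(r1, r2)

u_critical = 19  # critical value quoted in the text for n1 = n2 = 10 at the 1% level
reject_null = u_min <= u_critical

print(r1, r2, ranking_ok, u1, u2, u_from_larger, reject_null)
```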