Determining the arithmetic mean by the method of moments. Calculating the weighted arithmetic mean by the method of moments

When the number of observations is large, or when the variants have large numerical values, a simplified way of calculating the arithmetic mean is used: the method of moments.

$M = A + i \cdot \frac{\sum a p}{n}$

where M is the arithmetic mean; A is the conditional average; i is the interval between groups of variants; Σ is the summation sign; a is the conditional deviation of each variant from the conditional average; p is the frequency of occurrence of the variant; n is the number of observations.

An example of calculating the arithmetic mean by the method of moments (average body weight of boys under the age of 18)

V (weight, kg) | p | a = V - A | a·p
64 | 2 | +2 | +4
63 | 3 | +1 | +3
62 (Mo) | 9 | 0 | 0
61 | 6 | -1 | -6
60 | 4 | -2 | -8
59 | 1 | -3 | -3
Total | n = 25 | | Σap = -10 kg

Stages of calculating the average by the method of moments:

1) we take as the conditional average A the variant that occurs most often, the mode: A = Mo = 62 kg;

2) we determine "a", the conditional deviation of each variant from the conditional average, by subtracting the conditional average from each variant: a = V - A (for example, a = 64 - 62 = +2, etc.);

3) we multiply the conditional deviation "a" by the frequency "p" of each variant and obtain the product a·p;

4) we find the sum Σap = -10 kg;

5) we calculate the arithmetic mean by the method of moments:

$M = A + i \cdot \frac{\sum a p}{n} = 62 + 1 \times \frac{-10}{25} = 62 - 0.4 = 61.6$ kg

Thus, we can conclude that in the group of young men we studied, the average body weight is 61.6 kg.
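The stages above can be sketched in Python. This is only an illustration, not part of the original method description: the function name `mean_by_moments` is made up here, and the frequencies p are an assumption inferred from the example's a·p column (p = a·p / a, with the modal frequency taken as n minus the rest).

```python
def mean_by_moments(variants, freqs, A, i=1):
    """Arithmetic mean by the method of moments: M = A + i * sum(a*p) / n."""
    n = sum(freqs)
    # conditional deviation of each variant, expressed in units of the interval i
    a = [(v - A) / i for v in variants]
    sum_ap = sum(ai * p for ai, p in zip(a, freqs))
    return A + i * sum_ap / n

# Body-weight example; the frequencies are inferred (assumed), not quoted from the text.
V = [64, 63, 62, 61, 60, 59]
p = [2, 3, 9, 6, 4, 1]
M = mean_by_moments(V, p, A=62)
print(round(M, 1))  # 61.6
```

Any other conditional average gives the same result (for instance A = 60), which is why the choice of A only affects the convenience of the arithmetic.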

The arithmetic mean by itself says nothing about the variation series from which it was calculated. How typical (reliable) it is depends on the homogeneity of the material under consideration and on the variability of the series.

Example: two variation series with the same number of observations are given, presenting measurements of head circumference in children aged 1 to 2 years.

Although they have the same number of observations and the same arithmetic mean (M = 46 cm), the series differ in their internal distribution. The variants of the first series deviate, on the whole, less from the arithmetic mean than the variants of the second series, which suggests that the arithmetic mean (46 cm) is more typical of the first series than of the second.

In statistics, the diversity of a variation series is characterized by the standard deviation (σ).

There are two ways to calculate the standard deviation: the arithmetic mean method and the method of moments. With the arithmetic mean method, the formula is:

$\sigma = \sqrt{\frac{\sum d^2}{n - 1}}$

where d is the true deviation of each variant from the true mean M (d = V - M). This formula is used with a small number of observations (n < 30).

Formula for determining σ by the method of moments:

$\sigma = i \sqrt{\frac{\sum a^2 p}{n} - \left(\frac{\sum a p}{n}\right)^2}$

where a is the conditional deviation of the variants from the conditional average; $\frac{\sum a^2 p}{n}$ is the moment of the second degree, and $\left(\frac{\sum a p}{n}\right)^2$ is the moment of the first degree, squared.
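A short Python sketch may help show that the moments formula and the direct calculation from true deviations agree. The function names and the data are hypothetical, and both functions use n in the denominator; the (n - 1) correction for small samples is deliberately left out so that the two results can be compared.

```python
import math

def sigma_by_moments(devs, freqs, i=1):
    """sigma = i * sqrt(sum(a^2 p)/n - (sum(a p)/n)^2)."""
    n = sum(freqs)
    m1 = sum(a * p for a, p in zip(devs, freqs)) / n      # moment of the first degree
    m2 = sum(a * a * p for a, p in zip(devs, freqs)) / n  # moment of the second degree
    return i * math.sqrt(m2 - m1 ** 2)

def sigma_direct(variants, freqs):
    """sigma = sqrt(sum(d^2 p)/n), with d = V - M the true deviation."""
    n = sum(freqs)
    M = sum(v * p for v, p in zip(variants, freqs)) / n
    return math.sqrt(sum((v - M) ** 2 * p for v, p in zip(variants, freqs)) / n)

# Hypothetical series with conditional average A = 62 and interval i = 1
V = [59, 60, 61, 62, 63, 64]
p = [1, 4, 6, 9, 3, 2]
a = [v - 62 for v in V]          # conditional deviations
s_mom = sigma_by_moments(a, p)
s_dir = sigma_direct(V, p)
print(round(s_mom, 2), round(s_dir, 2))
```

The moments version never needs the true mean M, only sums of small whole numbers, which is what makes it convenient for manual calculation.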

It has been proved theoretically and confirmed in practice that, with a large number of observations, the interval obtained by adding 1σ to the arithmetic mean and subtracting 1σ from it (M ± 1σ) contains 68.3% of all variants of the variation series. The interval M ± 2σ contains 95.5% of all variants, and M ± 3σ contains 99.7% of all variants of the variation series.

On this basis, the typicality of the arithmetic mean for the variation series from which it was calculated can be checked. To do so, add three σ to the arithmetic mean and subtract three σ from it (M ± 3σ). If the given variation series fits within these limits, the arithmetic mean is typical, i.e. it expresses the basic regularity of the series and can be used.

This provision is widely used in developing various standards (clothing, shoes, school furniture, etc.).
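The M ± 3σ check can be written as a few lines of Python. This is a minimal sketch; the function name `is_mean_typical` and all numbers are hypothetical.

```python
def is_mean_typical(variants, M, sigma):
    """True if every variant lies within M - 3*sigma .. M + 3*sigma."""
    low, high = M - 3 * sigma, M + 3 * sigma
    return all(low <= v <= high for v in variants)

# Hypothetical series: limits are roughly 58.0 .. 65.2
fits = is_mean_typical([59, 60, 61, 62, 63, 64], M=61.6, sigma=1.2)
outlier = is_mean_typical([59, 60, 61, 62, 63, 70], M=61.6, sigma=1.2)
print(fits, outlier)  # True False
```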

The degree of diversity of a trait in a variation series can be estimated by the coefficient of variation (the ratio of the standard deviation to the arithmetic mean, multiplied by 100%):

$C_v = \frac{\sigma}{M} \times 100\%$

With Cv below 10% the diversity of the trait is weak, with Cv of 10-20% it is average, and above 20% it is strong.
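As an illustration, the coefficient of variation and its gradation can be computed as follows. The function names and the sample values are assumptions made for this sketch.

```python
def coefficient_of_variation(sigma, M):
    """Cv = sigma / M * 100, in percent."""
    return sigma / M * 100

def diversity_grade(cv):
    """Gradation from the text: <10% weak, 10-20% average, >20% strong."""
    if cv < 10:
        return "weak"
    if cv <= 20:
        return "average"
    return "strong"

# Hypothetical values of sigma and M
for sigma, M in [(1.2, 61.6), (3, 20), (5, 20)]:
    cv = coefficient_of_variation(sigma, M)
    print(round(cv, 1), diversity_grade(cv))
```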

Assessment of the reliability of the results of a statistical study

As noted above, the most reliable results are obtained by the continuous method, i.e. by studying the entire general population.

However, studying the general population is very laborious. Therefore, biomedical research as a rule relies on sample observations. For the data obtained from the sample to be transferred to the general population, the reliability of the results of the statistical study must be assessed. The sample may not represent the general population fully, so sample observations are always accompanied by a representativeness error. The size of the mean error (m) shows how much the sample mean differs from the mean of the general population. A small error indicates that the two are close; a large error gives no such confidence.

The size of the mean error of the arithmetic mean depends on two circumstances. First, the homogeneity of the collected material: the smaller the scatter of the variants around their mean, the smaller the representativeness error. Second, the number of observations: the larger the number of observations, the smaller the mean error.

The mean error of the arithmetic mean is calculated by the formula:

$m = \frac{\sigma}{\sqrt{n}}$

The mean error (representativeness error) for relative values is determined by the formula:

$m_p = \sqrt{\frac{p \, q}{n}}$

where m_p is the mean error of the indicator;

p is the indicator, in % or in ‰;

q = (100 - p) or (1000 - p), respectively;

n is the total number of observations.

Example: 289 patients left the medical institution; 12 of them died.

Relative value (mortality rate): p = (12 : 289) × 100 = 4.1%; q = 100 - p = 100 - 4.1 = 95.9, whence

$m_p = \pm\sqrt{\frac{4.1 \times 95.9}{289}} \approx \pm 1.2\%$

Thus, on repeated study the relative value will correspond to 4.1 ± 1.2%.
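The arithmetic of this example can be checked with a short Python sketch. The function name `relative_value_error` and the `base` parameter are introduced here for illustration only; the inputs p = 4.1% and n = 289 come from the example.

```python
import math

def relative_value_error(p, n, base=100):
    """Mean error of a relative value: m_p = sqrt(p * q / n), with q = base - p."""
    q = base - p          # base is 100 for percent, 1000 for per mille
    return math.sqrt(p * q / n)

m_p = relative_value_error(4.1, 289)
print(round(m_p, 1))  # 1.2
```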

Confidence limits are the maximum and minimum values within which, for a given probability of an error-free forecast, the relative indicator or the average in the general population can lie.

The confidence limits of a relative value in the general population are determined by the formula:

$P_{gen} = P_{sample} \pm t \, m_p$

The confidence limits of the arithmetic mean in the general population are determined by the formula:

$M_{gen} = M_{sample} \pm t \, m_M$

where P_gen and M_gen are the relative and average values for the general population;

P_sample and M_sample are the relative and average values obtained for the sample;

m_p and m_M are the representativeness errors of the relative and average values;

t is the reliability criterion.

It is established that t = 1 corresponds to a confidence level of about 68%, t = 2 to about 95%, and t = 3 to about 99.7%.

In medical and biological research a reliability criterion of t ≥ 2 (95% confidence) is considered sufficient.

When the number of observations is 30 or fewer (n ≤ 30), the criterion t must be found from a special table.

As the representativeness error decreases, the confidence limits of average and relative values narrow, i.e. the results of the study become more precise and approach the corresponding values of the general population. If the representativeness error is large, the confidence limits are wide, which may contradict a logical assessment of the sought value in the general population. The confidence limits also depend on the probability of an error-free forecast chosen by the researcher: the higher this probability, the wider the range of the confidence limits.
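How the limits widen with t can be seen in a small Python sketch. The function name `confidence_limits` is hypothetical; the inputs reuse the mortality-rate example (p = 4.1%), with m_p ≈ 1.2% taken as the value of sqrt(p·q/n) for n = 289.

```python
def confidence_limits(value, m, t=2):
    """Limits of a mean or relative value: value - t*m .. value + t*m."""
    return value - t * m, value + t * m

low2, high2 = confidence_limits(4.1, 1.2, t=2)   # about 95% confidence
low3, high3 = confidence_limits(4.1, 1.2, t=3)   # about 99.7% confidence
print(round(low2, 1), round(high2, 1))  # 1.7 6.5
print(round(low3, 1), round(high3, 1))  # 0.5 7.7
```

A larger t gives a wider interval for the same error m, and a smaller m gives a narrower interval for the same t.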

M, calculated by the method of moments, is 61.6 kg.

The arithmetic mean has three properties.

1. The mean occupies the middle position in the variation series. In a strictly symmetrical series: M = Mo = Me.

2. The mean is a generalizing value: random fluctuations and differences between individual data are not visible behind it, and it reveals what is typical of the entire population. The mean is used whenever it is necessary to exclude the random influence of individual factors, to identify common features and existing patterns, and to get a complete idea of the most common and characteristic features of the entire group.

3. The sum of the deviations of all variants from the mean is zero: Σ(V - M) = 0. This is because the mean is larger than some variants and smaller than others.

In other words, the true deviation of a variant from the true mean (d = V - M) can be positive or negative, so the sum Σ of all positive and negative d is zero.

This property of the mean is used to check the correctness of the calculation of M. If the sum of the deviations of the variants from the mean is zero, the mean has been calculated correctly. The same property underlies the method of moments for determining M: if the conditional average A equals the true M, the sum of the deviations of the variants from the conditional average is zero.

The role of averages in biology is extremely great. On the one hand, they are used to characterize phenomena as a whole, on the other hand, they are necessary to evaluate individual quantities. When comparing individual values ​​with averages, valuable characteristics are obtained for each of them. The use of averages requires strict adherence to the principle of population homogeneity. Violation of this principle distorts the idea of ​​real processes.

The calculation of averages from a socio-economically heterogeneous population makes them fictitious, distorted. Therefore, in order to use averages correctly, one must be sure that they characterize homogeneous statistical populations.

CHARACTERISTICS OF THE DIVERSITY OF A TRAIT IN A STATISTICAL POPULATION

The value of a given trait is not the same for all members of a population, despite the population's relative homogeneity. For example, in a group of children homogeneous in age, sex, and place of residence, the height of each child differs from that of their peers. The same is true of the number of visits individuals make to the polyclinic, the blood protein level of each patient with rheumatism, the blood pressure of individuals with hypertension, etc. This is the diversity, or variability, of a trait in the population under study. Variability can be illustrated clearly by the example of height in groups of adolescents.



Statistics makes it possible to characterize this with special criteria that determine the level of diversity of each trait in a particular group. These criteria are the limit (lim), the amplitude of the series (Am), the standard deviation (σ), and the coefficient of variation (Cv). Since each of these criteria has its own meaning, they are considered separately.

Limit (lim): determined by the extreme values of the variants in the variation series.

Amplitude (Am): the difference between the extreme variants.

The limit and the amplitude give some information about the degree of diversity of height in each group. However, both have one significant drawback: they take into account only the extreme variants and give no information about the diversity of the trait in the population as a whole, with its internal structure. Diversity shows itself not so much in the extreme variants as in the entire internal structure of the group. Therefore, these criteria can be used only for an approximate characterization of diversity, especially with a small number of observations (n < 30).

The most complete description of the diversity of a trait in a population is given by the standard deviation, denoted by the Greek letter sigma (σ).

There are two ways to calculate the standard deviation: the arithmetic mean method and the method of moments.

With the arithmetic mean method, the formula is

$\sigma = \sqrt{\frac{\sum d^2}{n - 1}}$

where d is the true deviation of a variant from the true mean (V - M).

This formula is used with a small number of observations (n < 30), when all frequencies in the variation series are p = 1.

When p > 1, the following formula is used:

$\sigma = \sqrt{\frac{\sum d^2 p}{n}}$

When computers are available, this formula is also used for a large number of observations.

The following formula determines sigma by the method of moments:

$\sigma = i \sqrt{\frac{\sum a^2 p}{n} - \left(\frac{\sum a p}{n}\right)^2}$

where a is the conditional deviation of a variant from the conditional average (V - A); p is the frequency of the variant; n is the number of observations; i is the size of the interval between groups.

This method is used when no computer is available and the variation series is cumbersome, either because of the large number of observations or because the variants are multi-digit numbers. When the number of observations is 30 or fewer, n in the moment of the second degree is replaced by (n - 1).

In the formula for the standard deviation the denominator is (n - 1): when the number of observations is 30 or fewer (n ≤ 30), (n - 1) must be taken in the denominator. All elements of the series are used in determining the arithmetic mean M, so in calculating σ one case fewer (n - 1) is taken.

With a large number of observations (n > 30), the denominator of the formula is n, since subtracting one does not noticeably change the result of the calculation and is therefore omitted.

Note that the standard deviation is a named quantity, so it must carry the same unit as the variants and the arithmetic mean (kg, cm, km, etc.).

The standard deviation is calculated by the method of moments after the mean has been calculated.

There is another criterion that characterizes the level of diversity of trait values in a population: the coefficient of variation.

Coefficient of variation (Cv): a relative measure of diversity, calculated as the standard deviation (σ) expressed as a percentage of the arithmetic mean (M). The formula for the coefficient of variation is:

$C_v = \frac{\sigma}{M} \times 100\%$

For an approximate assessment of the degree of diversity of a trait, the following gradations of the coefficient of variation are used. If the coefficient is above 20%, diversity is strong; at 10-20% it is average; below 10% it is weak.

The coefficient of variation is used to compare the degree of diversity of traits that differ in magnitude or are measured in different units. Suppose we want to compare the degree of diversity of body weight in newborns and in 7-year-old children. Newborns will always have a smaller sigma than 7-year-old children, because their individual weights are smaller. The standard deviation is smaller where the value of the trait itself is smaller. In this case, to determine the difference in the degree of diversity, one must rely not on the standard deviation but on the relative measure of diversity, the coefficient of variation Cv.

The coefficient of variation is also important for assessing and comparing the degree of diversity of several traits measured in different units. The standard deviations alone do not allow the diversity of such traits to be compared; the coefficient of variation Cv is needed for that.

The standard deviation is related to the structure of the feature distribution series. Schematically, this can be represented as follows.

Statistical theory has shown that, with a normal distribution, 68.3% of all cases lie within M ± σ, 95.5% within M ± 2σ, and 99.7% within M ± 3σ. Thus, M ± 3σ covers almost the entire variation series.

This theoretical position on the regularities of the structure of a series is of great importance for the practical application of the standard deviation. The rule can be used to decide whether the mean is typical. If 95% of all variants lie within M ± 2σ, the mean is characteristic of the series, and the number of observations in the population need not be increased. To determine the typicality of the mean, the actual distribution is compared with the theoretical one by calculating the sigma deviations.

The practical significance of the standard deviation also lies in the fact that, knowing M and σ, it is possible to construct the variation series needed for practical use. Sigma (σ) is also used to compare the degree of diversity of homogeneous traits, for example when comparing the variability of the height of children in urban and rural areas. Knowing sigma (σ), one can calculate the coefficient of variation (Cv) needed to compare the degree of diversity of traits expressed in different units of measurement (centimeters, kilograms, etc.). This makes it possible to identify the more stable and the less stable traits in the population.

By comparing coefficients of variation (Cv), one can conclude which trait is the most stable in a set of traits. The standard deviation (σ) is also used to evaluate individual traits of a single object: it shows by how many sigmas (σ) an individual measurement deviates from the mean (M).

The standard deviation (σ) can be used in biology and ecology in studying problems of norm and pathology.

Finally, the standard deviation is an important component of the formula for m_M, the mean error of the arithmetic mean (representativeness error):

$m_M = \frac{\sigma}{\sqrt{n}}$

where m_M is the mean error of the arithmetic mean (representativeness error) and n is the number of observations.

Representativeness. The most important theoretical foundations of representativeness were set out above in the section on the sample and the general population. Representativeness means that all the characteristics under consideration (sex, age, profession, length of service, etc.) of the units of observation that make up the general population are represented in the sample. This representativeness of the sample in relation to the general population is achieved by special selection methods, which are described below.

The assessment of the reliability of the results of the study is based on the theoretical foundations of representativeness.

RELIABILITY ASSESSMENT OF RESEARCH RESULTS

The reliability of statistical indicators should be understood as the degree of their compliance with the reality they reflect. Reliable results are those that do not distort and correctly reflect the objective reality.

To assess the reliability of the results of the study means to determine with what probability it is possible to transfer the results obtained on the sample population to the entire population.

In most studies, the researcher, as a rule, has to deal with a part of the phenomenon under study, and transfer the conclusions based on the results of such a study to the entire phenomenon as a whole - to the general population.

Thus, an assessment of reliability is necessary in order to judge the phenomenon as a whole, its regularities, by part of the phenomenon.

Assessing the reliability of the results of a study involves determining:

1) representativeness errors (mean errors of arithmetic means and relative values), m;

2) confidence limits of average (or relative) values;

3) the reliability of the difference between average (or relative) values (by the criterion t);

4) the reliability of the difference between the compared groups by the criterion χ².

1. Determination of the mean error of the average (or relative) value (representativeness error), m.

The representativeness error (m) is the most important statistical quantity needed to assess the reliability of the results of a study. It arises whenever a phenomenon as a whole has to be characterized from a part of it. Such errors are inevitable. They stem from the nature of sampling: the general population can be characterized from the sample only with some error, measured by the representativeness error.

Representativeness errors should not be confused with errors in the usual sense: methodological errors, errors of measurement accuracy, arithmetic errors, etc.

The size of the representativeness error shows how much the results obtained by sample observation differ from the results that would be obtained by a continuous study of all elements of the general population without exception.

This is the only type of error that is accounted for by statistical methods, and it cannot be eliminated without moving to a continuous study. Representativeness errors can, however, be reduced to a sufficiently small value, i.e. to the permissible error. This is done by including a sufficient number of observations (n) in the sample.

Each average M (average duration of treatment, average height, average body weight, average blood protein level, etc.) and each relative value P (mortality rate, morbidity, etc.) should be presented with its mean error m. Thus, the arithmetic mean of the sample (M) has a representativeness error, which is called the mean error of the arithmetic mean (m_M) and is determined by the formula:

$m_M = \frac{\sigma}{\sqrt{n}}$

As this formula shows, the mean error of the arithmetic mean is directly proportional to the degree of diversity of the trait (σ) and inversely proportional to the square root of the number of observations (n). Therefore, for a given degree of diversity (σ), the error can be reduced by increasing the number of observations.
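The square-root dependence on n can be illustrated with a few lines of Python. The function name `mean_error` and the value σ = 1.2 are assumptions for this sketch.

```python
import math

def mean_error(sigma, n):
    """Mean error of the arithmetic mean: m = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Hypothetical sigma; quadrupling n halves the error
m_25 = mean_error(1.2, 25)
m_100 = mean_error(1.2, 100)
print(round(m_25, 2), round(m_100, 2))  # 0.24 0.12
```

Because the error falls only with the square root of n, halving it requires four times as many observations.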

This principle is the basis for the method of determining a sufficient number of observations for a sample study.

Relative values (P) obtained in a sample study also have a representativeness error, which is called the mean error of the relative value and is denoted m_p.

The mean error of a relative value (P) is determined by the formula:

$m_p = \sqrt{\frac{P \, q}{n}}$

where P is the relative value. If the indicator is expressed as a percentage, q = 100 - P; if P is in per mille, q = 1000 - P; if P is per ten thousand, q = 10000 - P, etc.; n is the number of observations. When the number of observations is less than 30, (n - 1) should be taken in the denominator.

Each arithmetic mean or relative value obtained from a sample population must be presented with its own mean error. This makes it possible to calculate the confidence limits of the average and relative values, as well as to determine the reliability of the difference between the compared indicators (research results).

There are three types of averages: the mode (Mo), the median (Me), and the arithmetic mean (M).

They cannot replace one another; only taken together do they describe the features of a variation series fully and concisely.

Mode (Mo): the variant that occurs most frequently in the distribution series. It gives an idea of the center of the distribution of the variation series. It is used:

To determine the center of the distribution in open variation series

To determine the average level in series with a sharply asymmetric distribution

Median (Me): the middle variant, the central member of the ranked series. The name is taken from geometry, where a median is the segment that joins a vertex of a triangle to the midpoint of the opposite side, dividing that side into two equal parts.

The median is applied:

To determine the average level of a feature in numerical series with unequal intervals in groups

To determine the average level of a feature, when the source data are presented as qualitative features and when the only way to indicate a certain center of gravity of the population is to indicate the variant (variant group) that occupies a central position

When calculating some demographic indicators (average life expectancy)

When determining the most rational location for health facilities, communal facilities, etc. (meaning taking into account the optimal distance of institutions from all service facilities)

At present, various surveys (marketing, sociological, etc.) are very common, in which respondents are asked to give scores to products, politicians, etc. Average scores are then calculated from these ratings and treated as the overall marks given by the group of respondents. The arithmetic mean is usually used for this. Strictly speaking, however, it should not be, because such scores are measured on an ordinal scale, where the distances between adjacent scores are not necessarily equal. In this case it is reasonable to use the median or the mode as the average score.
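A small Python example with made-up survey scores shows how the three averages can differ; it uses the standard library `statistics` module, and the data are purely hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical scores on a 1-5 scale given by nine respondents
scores = [1, 5, 5, 5, 4, 5, 1, 5, 2]

me = median(scores)   # middle value of the ranked series
mo = mode(scores)     # most frequent score
m = mean(scores)      # arithmetic mean, pulled down by the two low scores
print(me, mo, round(m, 2))  # 5 5 3.67
```

Here the median and the mode both report the score most respondents actually gave, while the arithmetic mean yields a value that no respondent chose.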

To characterize the average level of a trait, the arithmetic mean (M) is most often used in medicine.

Arithmetic mean: a general quantitative characteristic of a certain trait of the phenomena under study that make up a qualitatively homogeneous statistical population.

A distinction is made between the simple arithmetic mean and the weighted arithmetic mean.

The simple arithmetic mean is calculated for an ungrouped variation series by summing all the variants and dividing this sum by the total number of variants in the series.

The simple arithmetic mean is calculated by the formula:

$M = \frac{\sum V}{n}$

The weighted arithmetic mean is calculated for a series in which the variants occur with different frequencies, by the formula:

$M = \frac{\sum V p}{n}$

where M is the weighted arithmetic mean,

ΣVp is the sum of the products of the variants and their frequencies,

n is the number of observations.

Besides this direct method of calculating the weighted arithmetic mean there are others, in particular the method of moments, which simplifies the arithmetic somewhat.

The arithmetic mean is calculated by the method of moments according to the formula:

$M = A + \frac{\sum d p}{n}$

where A is the conditional average (most often the mode Mo is taken as the conditional average),

d is the deviation of each variant from the conditional average (V - A),

Σdp is the sum of the products of the deviations and their frequencies.

The order of calculation is presented in the table (we take Mo = 76 beats per minute as the conditional average).

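Pulse rate V | p | d = V - A | dp
60 | 1 | -16 | -16
62 | 2 | -14 | -28
64 | 3 | -12 | -36
66 | 3 | -10 | -30
68 | 3 | -8 | -24
70 | 9 | -6 | -54
72 | 6 | -4 | -24
74 | 7 | -2 | -14
... | ... | ... | ...
Total | n = 54 | | Σdp = -200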

For a grouped series, the formula also includes the interval between groups:

$M = A + i \cdot \frac{\sum d p}{n}$

where i is the interval between groups.

The order of calculation is presented in the table (for the conditional average we take Mo = 73 beats per minute, with i = 3).

Determination of the arithmetic mean by the method of moments

n = 54; Σdp = -13

$M = A + i \cdot \frac{\sum d p}{n} = 73 + 3 \times \frac{-13}{54} = 73 - 0.7 = 72.3$ (beats per minute)
Thus, the value of the arithmetic mean obtained by the method of moments is identical to that found in the usual way.
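The agreement of the two pulse-rate calculations can be checked from the totals quoted in the text (A = 76, Σdp = -200 for the ungrouped series; A = 73, i = 3, Σdp = -13 for the grouped one; n = 54 in both). The helper name `moments_mean_from_totals` is made up for this sketch.

```python
def moments_mean_from_totals(A, sum_dp, n, i=1):
    """M = A + i * sum(dp) / n, computed from the table totals."""
    return A + i * sum_dp / n

m_ungrouped = moments_mean_from_totals(A=76, sum_dp=-200, n=54)
m_grouped = moments_mean_from_totals(A=73, sum_dp=-13, n=54, i=3)
print(round(m_ungrouped, 1), round(m_grouped, 1))  # 72.3 72.3
```

The two results coincide to one decimal place; the small difference in further digits comes from replacing the variants by group midpoints.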

Methods for calculating the arithmetic mean (simple and weighted arithmetic mean, by the method of moments)

We determine the average values:

Mode (Mo) = 11, because this variant occurs most often in the variation series (p = 6).

Median (Me): the serial number of the variant occupying the middle position is 23, and this place in the variation series is occupied by the variant equal to 11, so Me = 11.

The arithmetic mean (M) characterizes the average level of the trait under study most fully. Two methods are used to calculate it: the arithmetic mean method and the method of moments.

If the frequency of each variant in the variation series is 1, the simple arithmetic mean is calculated by the arithmetic mean method: $M = \frac{\sum V}{n}$.

If the frequencies of the variants differ from 1, the weighted arithmetic mean is calculated by the arithmetic mean method: $M = \frac{\sum V p}{n}$.

By the method of moments:

$M = A + \frac{\sum d p}{n} = 11 + \frac{\sum d p}{n} = 10.4$, where A is the conditional average, A = Mo = 11, and d = V - A.

If the number of variants in the variation series exceeds 30, a grouped series is built. Building a grouped series:

1) determine Vmin and Vmax: Vmin = 3, Vmax = 20;

2) determine the number of groups (from the table);

3) calculate the interval between groups: i = 3;

4) determine the beginning and end of each group;

5) determine the frequency of the variants in each group (Table 2).

Table 2

Technique for constructing a grouped series

Duration of treatment, days

n=45 p=480 p=30 2 p=766

The advantage of a grouped variational series is that the researcher does not work with every variant, but only with the variants that are average for each group. This makes it much easier to calculate the average.

The value of this or that feature is not the same for all members of the population, despite its relative homogeneity. This feature of the statistical population is characterized by one of the group properties of the general population - trait diversity. For example, let's take a group of 12 year old boys and measure their height. After the calculations, the average level of this trait will be 153 cm. But the average characterizes the general measure of the studied trait. Among the boys of this age there are boys whose height is 165 cm or 141 cm. The more boys who have a height other than 153 cm, the greater the diversity of this trait in the statistical population.

Statistics makes it possible to characterize this property by the following criteria:

limit (lim),

amplitude (Amp),

standard deviation (σ),

coefficient of variation (Cv).

Limit (lim) is determined by the extreme values of the variants in the variation series:

lim = from Vmin to Vmax

Amplitude (Amp) is the difference between the extreme variants:

Amp = Vmax - Vmin

These values take into account only the extreme variants and give no information about the diversity of the trait in the population as a whole, with its internal structure. Therefore, these criteria can be used only for an approximate characterization of diversity, especially with a small number of observations (n < 30).


Calculating the arithmetic mean can be cumbersome if the variants (trait values) and weights are very large or very small. In that case a number of properties of the arithmetic mean are used to simplify the calculation:

1) if all variants are reduced (increased) by an arbitrary number A, the new mean decreases (increases) by the same number A, i.e. it changes by ± A;

2) if all variants (trait values) are reduced by the same factor k, the mean decreases by the factor k, and if they are increased k times, the mean increases k times;

3) if the weights (frequencies) of all variants are reduced or increased by the same constant factor, the arithmetic mean does not change;

4) the sum of the deviations of all variants from the overall mean is zero.

These properties make it possible, where necessary, to simplify calculations by replacing absolute frequencies with relative ones, reducing the variants (trait values) by some number A, dividing them by k, calculating the arithmetic mean of the reduced variants, and then passing to the mean of the original series.
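The four properties can be verified numerically. The sketch below uses hypothetical data and a made-up helper `weighted_mean`; property 3 is checked for frequencies scaled by a common factor (absolute frequencies replaced by relative ones).

```python
import math

def weighted_mean(xs, fs):
    """Weighted arithmetic mean: sum(x * f) / sum(f)."""
    return sum(x * f for x, f in zip(xs, fs)) / sum(fs)

# Hypothetical data, chosen only for illustration
x = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
f = [10, 20, 30, 25, 10, 5]
A, k = 3.5, 0.5
m = weighted_mean(x, f)

# 1) subtracting A from every variant lowers the mean by A
prop1 = math.isclose(weighted_mean([xi - A for xi in x], f), m - A)
# 2) dividing every variant by k divides the mean by k
prop2 = math.isclose(weighted_mean([xi / k for xi in x], f), m / k)
# 3) scaling all frequencies by one factor leaves the mean unchanged
prop3 = math.isclose(weighted_mean(x, [fi / 100 for fi in f]), m)
# 4) frequency-weighted deviations from the mean sum to zero
prop4 = math.isclose(sum((xi - m) * fi for xi, fi in zip(x, f)), 0.0, abs_tol=1e-9)
print(prop1, prop2, prop3, prop4)
```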

The method of calculating the arithmetic mean using these properties is known in statistics as the "conditional zero method", the "conditional average method", or the "method of moments".

Briefly, the method can be written as the formula

$\bar{x} = \frac{\sum \frac{x - A}{k} f}{\sum f} \cdot k + A$

If the reduced variants (trait values) are denoted by $x' = \frac{x - A}{k}$, the formula can be rewritten as $\bar{x} = \bar{x}' \cdot k + A$.

When this formula is used to simplify the calculation of the weighted arithmetic mean of an interval series, the number A is chosen in one of the following ways.

The value A is taken equal to:

1) the midpoint of the first interval (continuing the example problem, where A = 1.5 million dollars and k = 1).

Calculation of the mean of the reduced variants

Interval | Interval midpoint x | Number of factories f | Reduced variant x' = x - A | Product x'f
Up to 2 | 1.5 | | (1.5 - 1.5) = 0 |
2-3 | 2.5 | | (2.5 - 1.5) = 1 |
3-4 | 3.5 | | (3.5 - 1.5) = 2 |
4-5 | 4.5 | | (4.5 - 1.5) = 3 |
5-6 | 5.5 | | (5.5 - 1.5) = 4 |
Over 6 | 6.5 | | (6.5 - 1.5) = 5 |
Total: 3.7

2) the midpoint of the interval with the highest frequency, in this case A = 3.5 (f = 30), or the middle variant, or the largest variant (in this case the largest trait value x = 6.5); the deviations are then divided by the interval size k (1 in this example).

Calculation of the mean with A = 3.5, f = 30, k = 1 in the same example.

Calculation of the average method of moments

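| Intervals | Interval midpoint x | Number of factories f | (x − A) : K | Product |
|---|---|---|---|---|
| Up to 2 | 1.5 | | (1.5 − 3.5) : 1 = −2 | −20 |
| 2–3 | 2.5 | | (2.5 − 3.5) : 1 = −1 | −20 |
| 3–4 | 3.5 | | (3.5 − 3.5) : 1 = 0 | |
| 4–5 | 4.5 | | (4.5 − 3.5) : 1 = 1 | |
| 5–6 | 5.5 | | (5.5 − 3.5) : 1 = 2 | |
| Over 6 | 6.5 | | (6.5 − 3.5) : 1 = 3 | |

Total: 3.7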


The essence of the method of moments (conditional zero, or conditional average) is that, in the reduced calculation of the arithmetic mean, A and K are chosen so that one of the feature values in the new series becomes zero, i.e. we set $\frac{x - A}{K} = 0$ for that variant and from this select the values of A and K.

It must be kept in mind that if $x' = (x - A) : K$, where K is the common width of the equal intervals, then in an equal-interval series the new variants form a series of consecutive integers (1, 2, 3, etc.), positive below zero in the table and negative above it. The arithmetic mean of these new variants is called the moment of the first order and is expressed by the formula

$$m_1 = \frac{\sum \frac{x - A}{K} f}{\sum f}.$$

To determine the arithmetic mean, multiply the moment of the first order $m_1$ by the interval width (K) by which all variants were divided, and add to the resulting product the value (A) that was subtracted:

$$\bar{x} = m_1 \cdot K + A.$$

Thus, using the method of moments or conditional zero, it is much easier to calculate the arithmetic mean from the variational series, if the series is equal-interval.
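The calculation described above can be sketched in a few lines of Python. The function name `mean_by_moments` is hypothetical, and the frequency list is an assumption made for illustration: the frequency column of the source table did not survive, and only f = 30 for the 3–4 interval is stated in the text. The subtracted constant is passed as `A` and the divisor as `K`.

```python
from math import isclose

def mean_by_moments(midpoints, freqs, A, K):
    """Arithmetic mean of an interval series via the first-order moment."""
    # Reduced variants: subtract A, then divide by K.
    reduced = [(x - A) / K for x in midpoints]
    # Moment of the first order: weighted mean of the reduced variants.
    m1 = sum(r * f for r, f in zip(reduced, freqs)) / sum(freqs)
    # Return to the scale of the original series.
    return m1 * K + A

midpoints = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]   # interval midpoints from the example
freqs = [10, 20, 30, 25, 10, 5]              # assumed frequencies (illustrative only)

by_moments = mean_by_moments(midpoints, freqs, A=3.5, K=1)
direct = sum(x * f for x, f in zip(midpoints, freqs)) / sum(freqs)
assert isclose(by_moments, direct)
```

Whatever values of A and K are chosen, the result matches the directly computed weighted mean; the choice only affects how convenient the intermediate arithmetic is.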

Mode

Mode is the value of a feature (variant) that is most frequently repeated in the studied population.

For discrete distribution series, the mode is the variant with the highest frequency.

Example. When determining the plan for the production of men's shoes, the factory studied consumer demand based on the results of the sale. The distribution of shoes sold was characterized by the following indicators:

Shoes of size 41 were in the greatest demand and accounted for 30% of the quantity sold. In this distribution series Mo = 41.
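For a discrete series, finding the mode amounts to picking the variant with the largest frequency. The sketch below assumes a hypothetical distribution of sales by shoe size; only the 30% share of size 41 comes from the text, and the function name `discrete_mode` is an illustration, not a standard API.

```python
def discrete_mode(distribution):
    """Return the variant with the highest frequency in a {variant: frequency} mapping."""
    return max(distribution, key=distribution.get)

# Hypothetical shares of sales by shoe size, in percent.
# Only the 30% for size 41 is taken from the text.
shoe_sales = {38: 5, 39: 10, 40: 20, 41: 30, 42: 20, 43: 10, 44: 5}

print(discrete_mode(shoe_sales))  # 41
```

If several variants share the highest frequency, this sketch returns the first of them; a multimodal series would need separate handling.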

For interval distribution series with equal intervals, the mode is determined by the formula

$$M_o = x_0 + h \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})},$$

where $x_0$ is the lower boundary of the interval containing the mode; $h$ is the width of the modal interval; $f_{Mo}$ is the frequency of the modal interval; $f_{Mo-1}$ is the frequency of the interval preceding the modal one, i.e. the premodal interval; $f_{Mo+1}$ is the frequency of the interval following the modal one, i.e. the postmodal interval.

First of all, it is necessary to find the interval in which the mode is located, i.e. the modal interval. In a variational series with equal intervals, the modal interval is determined by the highest frequency; in a series with unequal intervals, by the highest distribution density.

An example of calculating the mode in an interval series

The grouping of enterprises according to the number of industrial and production personnel is given. Find the mode. In our problem, the largest number of enterprises (30) falls in the group with 400 to 500 employees. Therefore, this interval is the modal interval of the equal-interval distribution series. Let us introduce the following notation:

Substitute these values into the mode calculation formula and calculate:

Thus, we have determined the modal value of the attribute contained in this interval (400–500), i.e. Mo = 467 people.
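The calculation for an interval series can be sketched as follows. The lower boundary (400), the interval width (100) and the modal frequency (30) come from the example; the premodal and postmodal frequencies (20 and 25) are assumptions, since the original notation did not survive, chosen so that the result agrees with the stated answer of 467. The function name `interval_mode` is hypothetical.

```python
def interval_mode(lower, width, f_prev, f_modal, f_next):
    """Mode of an equal-interval series by linear interpolation inside the modal interval."""
    d_prev = f_modal - f_prev   # excess over the premodal frequency
    d_next = f_modal - f_next   # excess over the postmodal frequency
    return lower + width * d_prev / (d_prev + d_next)

# f_prev and f_next are assumed values, not taken from the source.
mode = interval_mode(lower=400, width=100, f_prev=20, f_modal=30, f_next=25)
print(round(mode))  # 467
```

The sketch does not guard against the degenerate case in which all three frequencies are equal, where the denominator is zero.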

In many cases, when a population is characterized by a generalizing indicator, preference is given to the mode rather than the arithmetic mean. Thus, when studying market prices, it is not the average price of a product that is recorded and tracked over time, but the modal price. When studying consumer demand for a particular size of shoes or clothes, it is the modal size that is of interest, not the average size, which has no practical meaning. If the arithmetic mean is close in value to the mode, then it is typical.

TASKS FOR SOLUTION

Task 1

At the variety seed station, when the quality of wheat seeds was determined, the following distribution of seeds by germination percentage was obtained:

Determine the mode.

Task 2

When registering prices during the busiest trading hours, individual sellers recorded the following actual selling prices (USD per kg):

Potato: 0.2; 0.12; 0.12; 0.15; 0.2; 0.2; 0.2; 0.15; 0.15; 0.15; 0.15; 0.12; 0.12; 0.12; 0.15.

Beef: 2; 2.5; 2; 2; 1.8; 1.8; 2; 2.2; 2.5; 2; 2; 2; 2; 3; 3; 2.2; 2; 2; 2; 2.

What prices for potatoes and beef are modal?

Task 3

There is data on the wages of 16 workshop mechanics. Find the modal value of wages.

In dollars: 118; 120; 124; 126; 130; 130; 130; 130; 132; 135; 138; 140; 140; 140; 142; 142.

Median Calculation

In statistics, the median is the variant located in the middle of the variation series. If the discrete distribution series has an odd number of members, the median is the variant located in the middle of the ranked series: add 1 to the sum of the frequencies and divide the result by 2 to obtain the ordinal number of the median.

If there is an even number of variants in the variational series, the median is half the sum of the two middle variants.
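The two cases, odd and even, can be sketched as follows. The function name `median_of_ranked` and the sample values are assumptions used only for illustration.

```python
def median_of_ranked(values):
    """Median of a discrete series: middle variant, or half the sum of the two middle ones."""
    ranked = sorted(values)          # the series must be ranked first
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the variant at 1-based position (n + 1) / 2.
        return ranked[mid]
    # Even count: half the sum of the two middle variants.
    return (ranked[mid - 1] + ranked[mid]) / 2

print(median_of_ranked([7, 3, 5]))      # 5
print(median_of_ranked([7, 3, 5, 9]))   # 6.0
```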

To find the median in an interval variation series, we first determine the median interval from the accumulated frequencies. This is the first interval whose cumulative (accumulated) frequency is equal to or exceeds half the sum of the frequencies. Accumulated frequencies are formed by successively summing the frequencies, starting from the interval with the lowest value of the attribute.

Calculation of the median in the interval variation series

| Intervals | Frequencies (f) | Cumulative (accumulated) frequencies |
|---|---|---|
| 60–70 | 10 | 10 |
| 70–80 | 30 | 40 (10 + 30) |
| 80–90 | 50 | 90 (40 + 50) |
| 90–100 | 60 | 150 (90 + 60) |
| 100–110 | 145 | 295 (150 + 145) |
| 110–120 | 110 | 405 (295 + 110) |
| 120–130 | 80 | 485 (405 + 80) |
| 130–140 | 15 | 500 (485 + 15) |
| Sum | Σf = 500 | |

Half the sum of the frequencies in the example is 250 (500 : 2). Therefore, the median interval is the interval with feature values of 100–110.

Before this interval, the accumulated frequency was 150. Therefore, to reach the median, another 100 units (250 − 150) are needed. When determining the median, it is assumed that the feature values are distributed evenly within the interval. Therefore, if the 145 units in this interval are distributed evenly over its width of 10, then 100 units correspond to the value:

10 : 145 × 100 ≈ 6.9.

Adding the obtained value to the lower boundary of the median interval, we obtain the desired value of the median: 100 + 6.9 = 106.9.

Alternatively, the median in an interval variation series can be calculated by the formula:

$$M_e = x_0 + h \cdot \frac{\frac{\sum f}{2} - S_{Me-1}}{f_{Me}},$$

where $x_0$ is the lower boundary of the median interval ($x_0 = 100$); $h$ is the width of the median interval ($h = 10$); $\sum f$ is the sum of the frequencies of the series (the size of the series is 500); $S_{Me-1}$ is the accumulated frequency of the interval preceding the median one ($S_{Me-1} = 150$); $f_{Me}$ is the frequency of the median interval ($f_{Me} = 145$).
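The whole procedure (accumulating the frequencies, locating the median interval, interpolating inside it) can be sketched as below. The frequencies are those implied by the sums in the cumulative column of the example table; the function name `interval_median` is hypothetical.

```python
from itertools import accumulate

def interval_median(lower_bounds, width, freqs):
    """Median of an equal-interval series by interpolation inside the median interval."""
    half = sum(freqs) / 2
    cumulative = list(accumulate(freqs))
    # Median interval: the first one whose accumulated frequency reaches half the total.
    idx = next(i for i, c in enumerate(cumulative) if c >= half)
    # Accumulated frequency before the median interval (0 if it is the first interval).
    before = cumulative[idx - 1] if idx > 0 else 0
    return lower_bounds[idx] + width * (half - before) / freqs[idx]

lower_bounds = [60, 70, 80, 90, 100, 110, 120, 130]
freqs = [10, 30, 50, 60, 145, 110, 80, 15]

median = interval_median(lower_bounds, 10, freqs)
print(round(median, 1))  # 106.9
```

The result agrees with the step-by-step calculation: 100 + 10 × (250 − 150) / 145 ≈ 106.9.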