Variation series: what the numerical values of a variation series are called

A distribution series built on a quantitative attribute is called a variation series. The values of a quantitative attribute are not constant across the units of a population; they differ from one another to a greater or lesser degree.

Variation is the fluctuation, or variability, of the value of an attribute across the units of a population. The individual numerical values of the attribute that occur in the studied population are called variants. Because the average alone does not fully characterize a population, averages have to be supplemented with indicators that measure the fluctuation (variation) of the attribute under study and thereby show how typical those averages are.

The presence of variation is due to the influence of a large number of factors on the formation of the trait level. These factors act with unequal force and in different directions. Variation indicators are used to describe the measure of trait variability.

Tasks of the statistical study of variation:

  • 1) studying the nature and degree of variation of attributes across individual units of the population;
  • 2) determining the role of individual factors, or groups of factors, in the variation of particular attributes of the population.

Statistics uses special methods for studying variation, based on a system of indicators by which variation is measured.

The study of variation is essential. Variation has to be measured when conducting sample observation, correlation analysis, analysis of variance, and so on (Ermolaev O.Yu. Mathematical Statistics for Psychologists: Textbook. Moscow: Flint Publishing House of the Moscow Psychological and Social Institute, 2012. 335 p.).

The degree of variation makes it possible to judge the homogeneity of the population, the stability of individual attribute values, and the typicality of the average. Variation indicators also serve as the basis for measures of the closeness of the relationship between attributes and for measures of the accuracy of sample observation.

There is variation in space and variation in time.

Variation in space is the fluctuation of attribute values across population units that represent separate territories. Variation in time is the change in attribute values over different periods of time.

To study variation in a distribution series, all variants of the attribute are arranged in ascending or descending order. This process is called ranking the series.

The simplest indicators of variation are the minimum and the maximum: the smallest and the largest value of the attribute in the population. The number of times an individual variant occurs is called its frequency ($f_i$). It is often convenient to replace frequencies with relative frequencies ($w_i$). A relative frequency expresses a frequency as a share of the total; it can be given as a fraction of one or as a percentage, and it makes it possible to compare variation series with different numbers of observations. It is expressed by the formula:

$$w_i = \frac{f_i}{\sum f_i}$$

When the series is divided into equal intervals, the interval width $i$ (not to be confused with the index in $f_i$) is

$$i = \frac{X_{max} - X_{min}}{n}$$

where $X_{max}$, $X_{min}$ are the maximum and minimum values of the attribute in the population, and $n$ is the number of groups.

To measure the variation of a trait, various absolute and relative indicators are used. The absolute indicators of variation include the range of variation, the average linear deviation, variance, standard deviation. The relative indicators of fluctuation include the coefficient of oscillation, the relative linear deviation, the coefficient of variation.

An example of finding a variation series

Exercise. For this sample:

  • a) Find a variation series;
  • b) Construct the distribution function;

N = 42. Sample items:

1 5 1 8 1 3 9 4 7 3 7 8 7 3 2 3 5 3 8 3 5 2 8 3 7 9 5 8 8 1 2 2 5 1 6 1 7 6 7 7 6 2

Solution.

  • a) construction of a ranked variational series:
    • 1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 4 5 5 5 5 5 6 6 6 7 7 7 7 7 7 7 8 8 8 8 8 8 9 9
  • b) construction of a discrete variational series.

Let us calculate the number of groups in the variation series using the Sturges formula:

$$n = 1 + 3.322 \lg N = 1 + 3.322 \lg 42 \approx 6.39$$

Rounding up, we take the number of groups equal to 7.

Knowing the number of groups, we calculate the interval width:

$$i = \frac{X_{max} - X_{min}}{n} = \frac{9 - 1}{7} \approx 1.14$$

For convenience in constructing the table, we take the number of groups equal to 8, so that the interval is 1.
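The two parts of the exercise can be reproduced with a short Python sketch. The names `sample`, `ranked` and `variation_series` are illustrative and do not come from the source; the data is the 42-item sample given above.

```python
from collections import Counter

# Sample from the exercise above (N = 42).
sample = [1, 5, 1, 8, 1, 3, 9, 4, 7, 3, 7, 8, 7, 3, 2, 3, 5, 3, 8, 3, 5,
          2, 8, 3, 7, 9, 5, 8, 8, 1, 2, 2, 5, 1, 6, 1, 7, 6, 7, 7, 6, 2]

# a) Ranked series: the observations in ascending order.
ranked = sorted(sample)

# b) Discrete variation series: each variant x paired with its frequency f.
variation_series = sorted(Counter(sample).items())

for x, f in variation_series:
    print(f"x = {x}: f = {f}")
```

The frequencies add up to the sample size, which is a quick check that no observation was lost during grouping.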

Fig. 1. Volume of goods sold by the store over a certain period of time

The grouping method also makes it possible to measure the variation (variability, fluctuation) of attributes. When the number of population units is relatively small, variation is measured on the basis of a ranked series of the units that make up the population. A series is called ranked if its units are arranged in ascending (or descending) order of the attribute.

That said, ranked series are quite informative when a comparative characterization of variation is needed. In addition, in many cases one has to deal with statistical populations consisting of a large number of units, which are difficult to present as a specific series. For this reason, to get a first general view of the statistical data, and especially to make the variation of attributes easier to study, the phenomena and processes under study are usually combined into groups, and the results of the grouping are presented as group tables.

If the group table has only two columns, the groups by the selected attribute (variants) and the sizes of the groups (frequencies or relative frequencies), it is called a distribution series.

A distribution series is the simplest type of structural grouping by a single attribute, displayed in a group table with two columns that contain the variants and the frequencies of the attribute. In many cases the study of the initial statistical material begins with such a structural grouping, i.e. with compiling distribution series.

A structural grouping in the form of a distribution series can be turned into a true structural grouping if the selected groups are characterized not only by frequencies, but also by other statistical indicators. The main purpose of distribution series is to study the variation of features. The theory of distribution series is developed in detail by mathematical statistics.

Distribution series are divided into attributive series (grouping by qualitative attributes, for example, the division of the population by sex, nationality, marital status, etc.) and variation series (grouping by quantitative attributes).

A variation series is a group table with two columns: the grouping of units by one quantitative attribute and the number of units in each group. The intervals in a variation series are usually made equal and closed. An example of a variation series is the following grouping of the Russian population by average per capita cash income (Table 3.10).

Table 3.10. Distribution of Russia's population by average per capita cash income in 2004-2009

Population groups by average per capita cash income, rub./month | Population in the group, in % of the total
8,000.1-10,000.0 |
10,000.1-15,000.0 |
15,000.1-25,000.0 |
Over 25,000.0 |
Total population |

Variation series, in turn, are divided into discrete and interval series. Discrete variation series combine the variants of discrete attributes that vary within narrow limits. An example of a discrete variation series is the distribution of Russian families by the number of children they have.

Interval variation series combine the variants of either continuous attributes or discrete attributes that vary over a wide range. The distribution of the Russian population by average per capita cash income is an interval series.

Discrete variational series are not used very often in practice. Meanwhile, compiling them is not difficult, since the composition of the groups is determined by the specific variants that the studied grouping characteristics actually possess.

Interval variational series are more widespread. In compiling them, the difficult question arises of the number of groups, as well as the size of the intervals that should be established.

The principles for resolving this issue are set out in the chapter on the methodology for constructing statistical groupings (see paragraph 3.3).

Variation series are a means of collapsing or compressing diverse information into a compact form; they can be used to make a fairly clear judgment about the nature of the variation, to study the differences in the signs of the phenomena included in the set under study. But the most important significance of the variational series is that on their basis the special generalizing characteristics of the variation are calculated (see Chapter 7).

The concept of a variation series. The first step in systematizing the materials of statistical observation is counting the number of units that have one or another feature. Having arranged the units in ascending or descending order of their quantitative attribute and counting the number of units with a specific attribute value, we obtain a variation series. The variation series characterizes the distribution of units of a certain statistical population according to some quantitative attribute.

The variation series consists of two columns. The left column contains the values of the varying attribute, called variants and denoted by (x); the right column contains absolute numbers showing how many times each variant occurs. The values in this column are called frequencies and are denoted by (f).

Schematically, the variation series can be represented in the form of Table 5.1:

Table 5.1. General form of a variation series

Variants (x) | Frequencies (f)

The right column may also hold relative indicators that characterize the share of each variant's frequency in the total sum of frequencies. These relative indicators are called relative frequencies and are conventionally denoted by $\omega$, i.e. $\omega = \frac{f}{\sum f}$. The sum of all relative frequencies is equal to one. Relative frequencies can also be expressed as percentages, and then their sum is equal to 100%.
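As a small illustration, relative frequencies can be computed from absolute ones as follows. This is a sketch only: the frequencies are those of the 42-item exercise sample above, and the names `frequencies`, `relative` and `percent` are illustrative.

```python
from fractions import Fraction

# Frequencies f of the variants x = 1..9 in the 42-item exercise sample.
frequencies = {1: 6, 2: 5, 3: 7, 4: 1, 5: 5, 6: 3, 7: 7, 8: 6, 9: 2}

total = sum(frequencies.values())

# Relative frequency of each variant: its frequency divided by the total.
relative = {x: Fraction(f, total) for x, f in frequencies.items()}

# The same shares expressed as percentages.
percent = {x: float(w) * 100 for x, w in relative.items()}

print(sum(relative.values()))  # exact sum of the shares
```

Exact fractions are used here so that the sum of the shares equals one without floating-point rounding.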

Varying attributes can differ in nature. The variants of some attributes are expressed as integers, for example, the number of rooms in an apartment, the number of published books, etc. Such attributes are called discontinuous, or discrete. The variants of other attributes can take any value within certain limits, such as the fulfillment of planned targets, wages, etc. Such attributes are called continuous.

Discrete variation series. If the variants of a variation series are expressed as discrete values, the series is called discrete. Its general form is shown in Table 5.2:

Table 5.2. Distribution of students by grades obtained in the exam

Grades (x) | Number of students (f) | In % of total (ω)

The nature of the distribution in discrete series is depicted graphically as a distribution polygon, Fig.5.1.

Fig. 5.1. Distribution of students by grades obtained in the exam.

Interval variation series. For continuous attributes, variation series are constructed as interval series, i.e. the attribute values in them are expressed as intervals "from ... to ...". The minimum value of the attribute in such an interval is called the lower limit of the interval, and the maximum value is called the upper limit.

Interval variation series are built both for continuous attributes and for discrete attributes that vary over a wide range. Interval series can have equal or unequal intervals. In economic practice, unequal intervals that progressively increase or decrease are used for the most part. The need for them arises especially when the attribute fluctuates unevenly and within wide limits.

Consider the type of interval series with equal intervals, Table. 5.3:

Table 5.3. Distribution of workers by output

Output, thousand rub. (x) | Number of workers (f) | Cumulative frequency (f´)

The interval distribution series is graphically depicted as a histogram, Fig.5.2.

Fig.5.2. Distribution of workers by output

Accumulated (cumulative) frequency. In practice, distribution series often have to be converted into cumulative series, built on the accumulated frequencies. These can be used to determine structural averages, which make the data of a distribution series easier to analyze.

Cumulative frequencies are obtained by successively adding the frequencies (or relative frequencies) of the following groups of the distribution series to the frequency of the first group. Cumulative curves and ogives are used to illustrate distribution series. To build a cumulative curve, the values of a discrete attribute (or the ends of the intervals) are marked on the abscissa axis, and the running totals of the frequencies are marked on the ordinate axis (Fig. 5.3).

Fig. 5.3. Cumulative curve of the distribution of workers by output

If the frequency and variant scales are interchanged, i.e. the accumulated frequencies are plotted on the abscissa axis and the values of the variants on the ordinate axis, the curve that shows the change in frequencies from group to group is called the ogive of the distribution (Fig. 5.4).

Fig. 5.4. Ogive of the distribution of workers by output
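The running-total rule can be sketched in Python with `itertools.accumulate`. The data is the discrete series of the 42-item exercise sample above; the variable names are illustrative.

```python
from itertools import accumulate

# Variants and frequencies of the 42-item exercise sample.
variants = [1, 2, 3, 4, 5, 6, 7, 8, 9]
frequencies = [6, 5, 7, 1, 5, 3, 7, 6, 2]

# Each cumulative frequency is the sum of this and all preceding frequencies.
cumulative = list(accumulate(frequencies))

for x, f, s in zip(variants, frequencies, cumulative):
    print(f"x = {x}: f = {f}, cumulative = {s}")
```

The last cumulative frequency always equals the total number of units, here 42. Plotting `variants` against `cumulative` would give the cumulative curve described above.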

Variation series with equal intervals satisfy one of the most important requirements for statistical distribution series: comparability in time and space.

Distribution density. In series with unequal intervals, however, the frequencies of the individual intervals are not directly comparable. In such cases, to ensure comparability, the distribution density is calculated, i.e. the number of units in each group per unit of interval width.

When constructing a graph of the distribution of a variational series with unequal intervals, the height of the rectangles is determined in proportion not to the frequencies, but to the indicators of the distribution density of the values ​​of the studied trait in the corresponding intervals.
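A minimal sketch of the density calculation is given below. The intervals and frequencies are hypothetical and chosen only to show how unequal widths change the picture; they do not come from the source.

```python
# Hypothetical interval series with unequal interval widths.
intervals = [(0, 10), (10, 30), (30, 70)]
frequencies = [20, 40, 40]

# Distribution density: units in the group per unit of interval width.
densities = [f / (upper - lower)
             for (lower, upper), f in zip(intervals, frequencies)]

for (lower, upper), f, d in zip(intervals, frequencies, densities):
    print(f"{lower}-{upper}: f = {f}, density = {d}")
```

In this made-up example the second and third groups have the same frequency, but the third interval is twice as wide, so its density, and therefore its histogram bar, is half as high.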

Compilation of a variational series and its graphical representation is the first step in processing the initial data and the first step in the analysis of the studied population. The next step in the analysis of variational series is the determination of the main generalizing indicators, called the characteristics of the series. These characteristics should give an idea of ​​the average value of the attribute in the units of the population.

Average value. The average value is a generalized characteristic of the studied attribute in the studied population, reflecting its typical level per population unit under specific conditions of place and time.

The average value is always named, has the same dimension as the attribute of individual units of the population.

Before calculating the average values, it is necessary to group the units of the studied population, highlighting qualitatively homogeneous groups.

The average calculated for the population as a whole is called the general average, and for each group - group averages.

There are two types of averages: power means (arithmetic mean, harmonic mean, geometric mean, quadratic mean) and structural averages (mode, median, quartiles, deciles).

The choice of the average for the calculation depends on the purpose.

Types of power averages and methods for their calculation. In the practice of statistical processing of the collected material, various problems arise, for the solution of which different averages are required.

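Mathematical statistics derives the various means from the power mean formula:

$$\bar{x} = \sqrt[z]{\frac{\sum x^{z}}{n}}$$

where $\bar{x}$ is the average value; $x$ are the individual variants (attribute values); $n$ is the number of variants; $z$ is the exponent ($z = 1$ gives the arithmetic mean, $z = 0$ the geometric mean as the limiting case, $z = -1$ the harmonic mean, $z = 2$ the quadratic mean).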
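The family of power means can be sketched as a single Python function. This assumes the usual definition of the unweighted power mean, $(\sum x^z / n)^{1/z}$, with the geometric mean taken as the limiting case at z = 0; the function name `power_mean` and the test values are illustrative.

```python
import math

def power_mean(values, z):
    """Unweighted power mean of positive values for exponent z."""
    n = len(values)
    if z == 0:
        # Limiting case z -> 0: geometric mean, computed through logarithms.
        return math.exp(sum(math.log(x) for x in values) / n)
    return (sum(x ** z for x in values) / n) ** (1 / z)

data = [1, 2, 4]
harmonic = power_mean(data, -1)
geometric = power_mean(data, 0)
arithmetic = power_mean(data, 1)
quadratic = power_mean(data, 2)
print(harmonic, geometric, arithmetic, quadratic)
```

For data that is not constant, the four results increase with the exponent, which is the ordering that the text later calls the majorant rule.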

However, the question of what type of average should be applied in each individual case is resolved by a specific analysis of the population under study.

The most common type of average in statistics is the arithmetic mean. It is calculated when the total volume of the averaged attribute is formed as the sum of its values over the individual units of the studied population.

Depending on the nature of the initial data, the arithmetic mean is determined in different ways:

If the data are ungrouped, the calculation uses the simple mean formula

$$\bar{x} = \frac{\sum x}{n}$$

In a discrete series, the arithmetic mean is calculated as a weighted mean according to formula 3.4:

$$\bar{x} = \frac{\sum x f}{\sum f} \qquad (3.4)$$

Calculation of the arithmetic mean in the interval series. In an interval variation series, where the middle of the interval is conditionally taken as the value of a feature in each group, the arithmetic mean may differ from the mean calculated from ungrouped data. Moreover, the larger the interval in groups, the greater the possible deviations of the average calculated from grouped data from the average calculated from ungrouped data.

To calculate the average for an interval variation series, one passes from the intervals to their midpoints and then calculates the average by the weighted arithmetic mean formula.
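The midpoint approach can be sketched as follows. The interval series is hypothetical and the variable names are illustrative; the weighted mean is computed as the sum of midpoint times frequency divided by the sum of frequencies.

```python
# Hypothetical interval series: (lower limit, upper limit) and frequency.
intervals = [(10, 20), (20, 30), (30, 40)]
frequencies = [5, 10, 5]

# Step 1: replace each interval by its midpoint.
midpoints = [(lower + upper) / 2 for lower, upper in intervals]

# Step 2: weighted arithmetic mean with the frequencies as weights.
mean = sum(m * f for m, f in zip(midpoints, frequencies)) / sum(frequencies)

print(midpoints, mean)
```

Because every unit in a group is treated as if it sat at the midpoint, the result is an approximation whose error grows with the interval width, as noted above.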

Properties of the arithmetic mean. The arithmetic mean has several properties that simplify calculations. Let us consider them.

1. The arithmetic mean of a constant is equal to that constant.

If $x = a$, then $\bar{x} = \frac{\sum a f}{\sum f} = \frac{a \sum f}{\sum f} = a$.

2. If the weights of all variants are changed proportionally, i.e. increased or decreased by the same factor, the arithmetic mean of the new series does not change.

If all weights $f$ are reduced by a factor of $k$, then $\bar{x}' = \frac{\sum x \frac{f}{k}}{\sum \frac{f}{k}} = \frac{\sum x f}{\sum f} = \bar{x}$.

3. The sum of the positive and negative deviations of the individual variants from the mean, multiplied by the weights, is equal to zero, i.e. $\sum (x - \bar{x}) f = 0$.

If $\bar{x} = \frac{\sum x f}{\sum f}$, then $\sum x f = \bar{x} \sum f$. From here, $\sum (x - \bar{x}) f = \sum x f - \bar{x} \sum f = 0$.

4. If all variants are reduced or increased by some number, the arithmetic mean of the new series decreases or increases by the same amount.

Reduce all variants $x$ by $a$, i.e. $x' = x - a$.

Then $\bar{x}' = \frac{\sum (x - a) f}{\sum f} = \bar{x} - a$.

The arithmetic mean of the initial series can be obtained by adding the previously subtracted number $a$ to the reduced mean, i.e. $\bar{x} = \bar{x}' + a$.

5. If all variants are reduced or increased by a factor of $k$, the arithmetic mean of the new series decreases or increases by the same factor.

Let $x' = \frac{x}{k}$; then $\bar{x}' = \frac{\sum \frac{x}{k} f}{\sum f} = \frac{\bar{x}}{k}$.

Hence $\bar{x} = k \bar{x}'$, i.e. to obtain the average of the original series, the arithmetic mean of the new series (with reduced variants) must be multiplied by $k$.
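These properties can be checked numerically. The sketch below assumes the weighted mean $\sum x f / \sum f$, uses a hypothetical series, and works with exact fractions so that the comparisons are not affected by rounding; all names are illustrative.

```python
from fractions import Fraction

def weighted_mean(xs, fs):
    """Weighted arithmetic mean: sum(x * f) / sum(f), as an exact fraction."""
    return Fraction(sum(x * f for x, f in zip(xs, fs)), sum(fs))

# Hypothetical series of variants and weights (frequencies).
xs = [2, 4, 6, 8]
fs = [10, 20, 30, 40]
mean = weighted_mean(xs, fs)

# Property 1: the mean of a constant is that constant.
constant_mean = weighted_mean([5, 5, 5, 5], fs)

# Property 2: dividing all weights by k = 10 leaves the mean unchanged.
scaled_weights_mean = weighted_mean(xs, [f // 10 for f in fs])

# Property 3: weighted deviations from the mean sum to zero.
deviation_sum = sum((x - mean) * f for x, f in zip(xs, fs))

# Property 4: subtracting a = 2 from every variant lowers the mean by 2.
shifted_mean = weighted_mean([x - 2 for x in xs], fs)

# Property 5: dividing every variant by k = 2 divides the mean by 2.
halved_mean = weighted_mean([x // 2 for x in xs], fs)

print(mean, constant_mean, scaled_weights_mean, deviation_sum,
      shifted_mean, halved_mean)
```

Properties 4 and 5 are what make the shortcut of subtracting a convenient constant and dividing by the interval width work when means are computed by hand.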

Harmonic mean. The harmonic mean is the reciprocal of the arithmetic mean of the reciprocal values of the attribute. It is used when the statistical information does not contain the frequencies of the individual variants but gives the products of the variants and their frequencies (M = xf). The harmonic mean is calculated using formula 3.5:

$$\bar{x}_{harm} = \frac{\sum M}{\sum \frac{M}{x}} \qquad (3.5)$$

The practical application of the harmonic mean is to calculate some indices, in particular, the price index.

Geometric mean. When the geometric mean is used, the individual values of the attribute are, as a rule, relative values of dynamics built as chain values, i.e. as the ratio of each level of a time series to the previous level. The mean thus characterizes the average growth rate.

The geometric mean is also used to find the value that is equidistant, in ratio terms, from the maximum and minimum values of the attribute. For example, an insurance company concludes contracts for auto insurance services. Depending on the specific insured event, the insurance payment may vary from 10,000 to 100,000 dollars per year. The average insurance payout is $\sqrt{10\,000 \cdot 100\,000} \approx 31\,623$ US dollars.

The geometric mean is used as the average of ratios, or in distribution series that form a geometric progression; it corresponds to z = 0. It is convenient when attention is paid not to absolute differences but to the ratios of two numbers.

The calculation formulas are as follows:

$$\bar{x}_{geom} = \sqrt[n]{\prod x}, \qquad \bar{x}_{geom} = \sqrt[\sum f]{\prod x^{f}}$$

where $x$ are the variants of the averaged attribute; $\prod$ is the product of the variants; $f$ is the frequency of the variants.

The geometric mean is used in calculating average annual growth rates.

Quadratic mean. The quadratic mean (root mean square) formula is used to measure the degree of fluctuation of the individual values of an attribute around the arithmetic mean in a distribution series. Thus, when indicators of variation are calculated, the mean is taken of the squared deviations of the individual values from the arithmetic mean.

The quadratic mean is calculated by the formula

$$\bar{x}_{quad} = \sqrt{\frac{\sum x^{2}}{n}}, \qquad \bar{x}_{quad} = \sqrt{\frac{\sum x^{2} f}{\sum f}}$$

In economic research, a modified form of the quadratic mean is widely used in calculating indicators of variation such as the variance and the standard deviation.

Majorant rule for means. Power means are related as follows: the larger the exponent, the greater the value of the mean (Table 5.4):

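Table 5.4. Relationship between averages

z value | -1 | 0 | 1 | 2
The ratio between the averages | $\bar{x}_{harm} \le \bar{x}_{geom} \le \bar{x}_{arith} \le \bar{x}_{quad}$

This relation is called the majorant rule for means.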

Structural averages. To characterize the structure of the population, special indicators are used, which can be called structural averages. These measures include mode, median, quartiles, and deciles.

Mode. The mode (Mo) is the most frequently occurring value of the attribute among the units of the population. It is the value of the attribute that corresponds to the maximum point of the theoretical distribution curve.

The mode is widely used in commercial practice in studying consumer demand (for example, when determining which sizes of clothes and shoes are in greatest demand) and in price registration. A population may have several modes.

Mode calculation in a discrete series. In a discrete series, the mode is the variant with the highest frequency. Consider finding a mode in a discrete series.

Calculation of the mode in an interval series. In an interval variation series, the mode is approximately taken to be the central variant of the modal interval, i.e. the interval with the highest frequency (relative frequency). Within that interval, the value of the attribute that is the mode has to be found. For an interval series, the mode is determined by the formula

$$Mo = x_{Mo} + i_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})}$$

where $x_{Mo}$ is the lower limit of the modal interval; $i_{Mo}$ is the width of the modal interval; $f_{Mo}$ is the frequency of the modal interval; $f_{Mo-1}$ is the frequency of the interval preceding the modal one; $f_{Mo+1}$ is the frequency of the interval following the modal one.
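A sketch of this calculation in Python is given below. It assumes the standard textbook interpolation formula for equal intervals, Mo = lower limit + width × (f_Mo − f_prev) / ((f_Mo − f_prev) + (f_Mo − f_next)); the function name `interval_mode` and the data are hypothetical.

```python
def interval_mode(lower_limits, width, frequencies):
    """Mode of an interval series with equal intervals (interpolation formula)."""
    # Index of the modal interval: the one with the highest frequency.
    m = max(range(len(frequencies)), key=frequencies.__getitem__)
    f_mo = frequencies[m]
    # Neighbouring frequencies; a missing neighbour is treated as zero.
    f_prev = frequencies[m - 1] if m > 0 else 0
    f_next = frequencies[m + 1] if m + 1 < len(frequencies) else 0
    return lower_limits[m] + width * (f_mo - f_prev) / (
        (f_mo - f_prev) + (f_mo - f_next))

# Hypothetical series: intervals 10-20, 20-30, 30-40, 40-50.
mode = interval_mode([10, 20, 30, 40], 10, [5, 15, 25, 10])
print(mode)  # 34.0
```

The mode lies closer to the neighbour with the higher frequency: here the preceding interval (15 units) pulls it below the midpoint of the modal interval 30-40.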

Median. The median (Me) is the value of the attribute in the middle unit of a ranked series. A ranked series is a series in which the attribute values are written in ascending or descending order. Put differently, the median is the value that divides an ordered variation series into two equal parts: in one part the values of the varying attribute are less than the median, and in the other they are greater.

To find the median, its serial number is first determined. To do this, with an odd number of units, one is added to the sum of all frequencies and everything is divided by two. With an even number of units, the median is found as the value of the attribute of the unit, the serial number of which is determined by the total sum of frequencies divided by two. Knowing the ordinal number of the median, it is easy to find its value from the accumulated frequencies.

Calculation of the median in a discrete series. A sample survey produced data on the distribution of families by number of children (Table 5.5). To determine the median, its ordinal number is determined first.

In these families, the number of children is 2; therefore, Me = 2. Thus, in 50% of families the number of children does not exceed 2.
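The procedure can be sketched in Python following the rule stated above for the ordinal number: (Σf + 1)/2 for an odd total and Σf/2 for an even total. The family data below is hypothetical, since the values of Table 5.5 are not reproduced here, and the function name is illustrative.

```python
from itertools import accumulate

def discrete_median(variants, frequencies):
    """Median of a discrete series located through cumulative frequencies."""
    total = sum(frequencies)
    # Ordinal number of the median unit, following the rule in the text.
    position = (total + 1) // 2 if total % 2 else total // 2
    # The median is the first variant whose cumulative frequency
    # reaches the ordinal number of the median.
    for x, cumulative in zip(variants, accumulate(frequencies)):
        if cumulative >= position:
            return x

# Hypothetical distribution of 100 families by number of children.
children = [0, 1, 2, 3, 4, 5]
families = [10, 25, 30, 20, 10, 5]
median = discrete_median(children, families)
print(median)  # 2
```

Note that for an even number of units many texts instead average the two middle values; the sketch keeps the simpler rule used in this section.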

$S_{Me-1}$ is the accumulated frequency preceding the median interval;

On the one hand, this is a very positive property, since in this case the effect of all the causes acting on all units of the population under study is taken into account. On the other hand, even a single observation that entered the initial data by accident can significantly distort the picture of the level of the studied attribute in the population under consideration (especially in short series).

Quartiles and deciles. By analogy with finding the median in variational series, one can find the value of a feature in any ranked series unit in order. So, in particular, one can find the value of a feature for units dividing the series into 4 equal parts, into 10, etc.

Quartiles. Variants that divide the ranked series into four equal parts are called quartiles.

At the same time, the following are distinguished: the lower (or first) quartile (Q1) - the value of the feature of the unit of the ranked series, dividing the population in the ratio of ¼ to ¾ and the upper (or third) quartile (Q3) - the value of the feature of the unit of the ranked series, dividing the population in the ratio ¾ to ¼.

$f_{Q_1}$, $f_{Q_3}$ are the frequencies of the quartile intervals (lower and upper).

The intervals containing Q1 and Q3 are determined from the accumulated frequencies (or frequencies).

Deciles. In addition to quartiles, deciles are calculated - options that divide the ranked series into 10 equal parts.

They are denoted by D, the first decile D1 divides the series in the ratio of 1/10 and 9/10, the second D2 - 2/10 and 8/10, etc. They are calculated in the same way as the median and quartiles.

The median, the quartiles, and the deciles all belong to the so-called order statistics, i.e. variants that occupy a definite ordinal place in a ranked series.

Series built on a quantitative attribute are called variation series.

Distribution series consist of variants (attribute values) and frequencies (the sizes of the groups). Frequencies expressed as relative values (shares, percentages) are called relative frequencies. The sum of all frequencies is called the volume of the distribution series.

By type, distribution series are divided into discrete series (built on discontinuous values of the attribute) and interval series (built on continuous values of the attribute).

A variation series consists of two columns (or rows): one gives the individual values of the varying attribute, called variants and denoted by x; the other gives absolute numbers showing how many times (how often) each variant occurs. The indicators in the second column are called frequencies and are conventionally denoted by f. Note once again that the second column may also hold relative indicators that characterize the share of each variant's frequency in the total sum of frequencies. These relative indicators are called relative frequencies and are conventionally denoted by ω. In this case the sum of all relative frequencies is equal to one. Relative frequencies can also be expressed as percentages, and then their sum is 100%.

If the variants of the variational series are expressed as discrete values, then such a variational series is called discrete.

For continuous attributes, variation series are constructed as interval series, that is, the attribute values in them are expressed as "from ... to ...". The minimum value of the attribute in such an interval is called the lower limit of the interval, and the maximum value is called the upper limit.

Interval variation series are also built for discrete attributes that vary over a wide range. Interval series can have equal or unequal intervals.

Consider how the width of equal intervals is determined. Let us introduce the following notation:

$i$ - interval width;

$x_{max}$ - the maximum value of the attribute among the units of the population;

$x_{min}$ - the minimum value of the attribute among the units of the population;

$n$ - the number of groups.

$$i = \frac{x_{max} - x_{min}}{n},$$ if $n$ is known.

If the number of groups is difficult to determine in advance, the formula proposed by Sturges in 1926 can be recommended for calculating the optimal number of groups when the population is sufficiently large:

n = 1 + 3.322 lg N, where lg is the base-10 logarithm and N is the number of units in the population.
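As a quick illustration, the Sturges formula and the resulting equal interval width can be evaluated in Python. The input values come from the 42-item exercise earlier in the text (N = 42, maximum 9, minimum 1); the function names are illustrative, and rounding the number of groups up is one common convention rather than part of the formula.

```python
import math

def sturges_groups(n_units):
    """Number of groups by the Sturges formula: n = 1 + 3.322 * lg(N)."""
    return 1 + 3.322 * math.log10(n_units)

def interval_width(x_max, x_min, n_groups):
    """Width of equal intervals: (x_max - x_min) / n."""
    return (x_max - x_min) / n_groups

raw = sturges_groups(42)                  # about 6.39
n_groups = math.ceil(raw)                 # rounded up to 7
width = interval_width(9, 1, n_groups)    # 8 / 7, about 1.14

print(round(raw, 2), n_groups, round(width, 2))
```

In practice the computed width is often rounded to a convenient value, as the exercise does when it switches to 8 groups with an interval of 1.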

The value of unequal intervals is determined in each individual case, taking into account the characteristics of the object of study.

The statistical distribution of a sample is the list of variants together with their corresponding frequencies (or relative frequencies).

The statistical distribution of a sample can be given as a table whose first column holds the variants and whose second column holds the frequencies $n_i$ corresponding to these variants, or the relative frequencies $P_i$.

Statistical distribution of the sample

Interval series are variation series in which the values of the attribute underlying their formation are expressed within certain limits (intervals). In this case the frequencies refer not to individual values of the attribute but to the whole interval.

Interval distribution series are constructed for continuous quantitative attributes, as well as for discrete attributes that vary over a wide range.

An interval series can be represented by the statistical distribution of a sample that lists the intervals and their corresponding frequencies. The frequency of an interval is taken to be the sum of the frequencies of the variants that fall into it.

When grouping by quantitative continuous features, it is important to determine the size of the interval.

In addition to the sample mean and sample variance, other characteristics of the variation series are also used.

The mode is the variant that has the highest frequency.

As a result of mastering this chapter, the student must:

know

  • indicators of variation and their relationship;
  • the basic laws of distribution of attributes;
  • the essence of goodness-of-fit criteria;

be able to

  • calculate indicators of variation and goodness-of-fit criteria;
  • determine the characteristics of distributions;
  • evaluate the main numerical characteristics of statistical distribution series;

master

  • methods of statistical analysis of distribution series;
  • the basics of analysis of variance;
  • methods for checking whether statistical distribution series conform to the basic laws of distribution.

Variation indicators

In the statistical study of the features of various statistical populations, it is of great interest to study the variation of the feature of individual statistical units of the population, as well as the nature of the distribution of units according to this feature. Variation - these are the differences in the individual values ​​of the trait among the units of the studied population. The study of variation is of great practical importance. By the degree of variation, one can judge the boundaries of the variation of the trait, the homogeneity of the population for this trait, the typicality of the average, the relationship of factors that determine the variation. Variation indicators are used to characterize and order statistical populations.

The results of the summary and grouping of statistical observation materials, drawn up in the form of statistical distribution series, represent an ordered distribution of units of the studied population into groups according to a grouping (variable) attribute. If a qualitative trait is taken as the basis for grouping, then such a distribution series is called attributive(distribution by profession, gender, color, etc.). If the distribution series is built on a quantitative basis, then such a series is called variational(distribution by height, weight, wages, etc.). To build a variation series means to order the quantitative distribution of population units according to the characteristic values, to count the number of population units with these values ​​(frequency), to arrange the results in a table.

Instead of the frequency of a variant, it is possible to use its ratio to the total volume of observations, which is called the frequency (relative frequency).

There are two types of variation series: discrete and interval. Discrete series- this is such a variational series, the construction of which is based on signs with a discontinuous change (discrete signs). The latter include the number of employees in the enterprise, the wage category, the number of children in the family, etc. A discrete variational series is a table that consists of two columns. The first column indicates the specific value of the attribute, and the second - the number of population units with a specific value of the attribute. If a sign has a continuous change (the amount of income, work experience, the cost of fixed assets of an enterprise, etc., which, within certain limits, can take any values), then for this sign it is possible to build interval variation series. The table when constructing an interval variation series also has two columns. The first indicates the value of the feature in the interval "from - to" (options), the second - the number of units included in the interval (frequency). Frequency (repetition frequency) - the number of repetitions of a particular variant of the attribute values. Intervals can be closed and open. Closed intervals are limited on both sides, i.e. have a border both lower (“from”) and upper (“to”). Open intervals have any one border: either upper or lower. If the options are arranged in ascending or descending order, then the rows are called ranked.

For variation series there are two cumulative characteristics: the cumulative frequency and the cumulative relative frequency. The cumulative frequency shows how many observations have attribute values not exceeding a given value. It is determined by adding the frequency of a given group to the frequencies of all preceding groups. The cumulative relative frequency characterizes the proportion of observation units whose attribute values do not exceed the upper limit of the given group, that is, the share of the population with values not greater than the given one. Frequency, relative frequency, absolute and relative densities, cumulative frequency and cumulative relative frequency are all characteristics of the weight of a variant.
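A minimal Python sketch of how a ranked discrete variation series with its frequencies, relative frequencies and cumulative values could be tabulated. The function name `discrete_series` and the sample data are illustrative assumptions, not taken from the source.

```python
from collections import Counter
from itertools import accumulate

def discrete_series(values):
    """Return rows (variant, frequency, relative frequency,
    cumulative frequency, cumulative relative frequency)."""
    counts = Counter(values)
    variants = sorted(counts)              # ranked (ascending) variants
    freqs = [counts[v] for v in variants]  # absolute frequencies
    n = sum(freqs)                         # total number of observations
    cum = list(accumulate(freqs))          # cumulative frequencies
    return [(v, f, f / n, c, c / n) for v, f, c in zip(variants, freqs, cum)]

# Hypothetical data: number of children in 10 families
children = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]
for row in discrete_series(children):
    print(row)
```

The last row always has a cumulative relative frequency of 1, which is a quick check that no observation was lost.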

The variation of an attribute across the units of a population, as well as the nature of its distribution, is studied using indicators and characteristics of the variation series. These include the average level of the series, the mean linear deviation, the standard deviation, the variance, and the coefficients of oscillation, variation, asymmetry, kurtosis, etc.

Averages are used to characterize the center of the distribution. The average is a generalizing statistical characteristic that quantifies the typical level of an attribute among the members of the studied population. However, arithmetic means can coincide even when the distributions differ in nature. Therefore the so-called structural averages are also calculated as statistical characteristics of the variation series: the mode, the median, and the quantiles that divide the distribution series into equal parts (quartiles, deciles, percentiles, etc.).

The mode is the attribute value that occurs in the distribution series more often than any other value. For discrete series it is the variant with the highest frequency. In interval variation series, the interval containing the mode, the so-called modal interval, must be determined first. In a variation series with equal intervals the modal interval is identified by the highest frequency; in series with unequal intervals, by the highest distribution density. The mode in series with equal intervals is then found by the formula

$$Mo = x_{Mo} + h\,\frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})},$$

where $Mo$ is the mode; $x_{Mo}$ is the lower limit of the modal interval; $h$ is the width of the modal interval; $f_{Mo}$ is the frequency of the modal interval; $f_{Mo-1}$ is the frequency of the pre-modal interval; $f_{Mo+1}$ is the frequency of the post-modal interval. For a series with unequal intervals, the distribution densities of the same three intervals are used in this formula in place of the frequencies $f_{Mo}$, $f_{Mo-1}$, $f_{Mo+1}$.

If there is a single mode, the probability distribution of the random variable is called unimodal; if there is more than one mode, it is called multimodal (polymodal), and in the case of two modes, bimodal. As a rule, multimodality indicates that the distribution under study does not follow the normal distribution law. Homogeneous populations are, as a rule, characterized by unimodal distributions, so multiple peaks also indicate heterogeneity of the studied population. The appearance of two or more peaks makes it necessary to regroup the data in order to isolate more homogeneous groups.

In an interval variation series, the mode can be determined graphically using a histogram. To do this, two intersecting lines are drawn from the top points of the highest column of the histogram to the top points of two adjacent columns. Then, from the point of their intersection, a perpendicular is lowered to the abscissa axis. The feature value on the abscissa corresponding to the perpendicular is the mode. In many cases, when characterizing the population as a generalized indicator, preference is given to the mode, rather than the arithmetic mean.

The median is the central value of the attribute; it is the value of the central member of the ranked distribution series. In discrete series, the ordinal number of the median is determined first. With an odd number of units, one is added to the sum of all frequencies and the result is divided by two. With an even number of units, the series has two central units, and the median is defined as the average of their values. Thus, the median of a discrete variation series is the value that divides the series into two parts containing the same number of variants.
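As an illustration, the rule for odd and even numbers of units could be coded as follows. This is a sketch under the assumption that the series is given as ranked variants with their frequencies; `discrete_median` and the data are hypothetical.

```python
def discrete_median(variants, freqs):
    """Median of a ranked discrete series given variants and frequencies."""
    n = sum(freqs)

    def value_at(position):
        # Value of the unit with the given 1-based ordinal number
        running = 0
        for v, f in zip(variants, freqs):
            running += f
            if running >= position:
                return v

    if n % 2 == 1:
        # Odd number of units: ordinal number (n + 1) / 2
        return value_at((n + 1) // 2)
    # Even number of units: average of the two central units
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2

print(discrete_median([0, 1, 2, 3], [2, 3, 4, 1]))  # 1.5 (10 units)
print(discrete_median([0, 1, 2, 3], [2, 3, 4, 2]))  # 2 (11 units)
```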

In an interval series, after the ordinal number of the median is determined, the median interval is found from the cumulative frequencies (or cumulative relative frequencies), and the median itself is then calculated by the formula

$$Me = x_{Me} + h\,\frac{\frac{1}{2}\sum f_i - S_{Me-1}}{f_{Me}},$$

where $Me$ is the median; $x_{Me}$ is the lower limit of the median interval; $h$ is the width of the median interval; $\sum f_i$ is the sum of the frequencies of the distribution series; $S_{Me-1}$ is the cumulative frequency of the pre-median interval; $f_{Me}$ is the frequency of the median interval.
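A short Python sketch of this interpolation, assuming the usual formula Me = x_Me + h (Σf / 2 - S_Me-1) / f_Me and equal interval widths. The function name `interval_median` and the sample frequencies are illustrative assumptions.

```python
from itertools import accumulate

def interval_median(lower_bounds, width, freqs):
    """Interpolated median of an interval series with equal widths."""
    half = sum(freqs) / 2                      # ordinal position of the median
    cum = list(accumulate(freqs))              # cumulative frequencies
    # Median interval: first one whose cumulative frequency reaches half
    k = next(i for i, c in enumerate(cum) if c >= half)
    cum_before = cum[k - 1] if k > 0 else 0    # cumulative frequency before it
    return lower_bounds[k] + width * (half - cum_before) / freqs[k]

# Hypothetical series: intervals 10-20, 20-30, 30-40, 40-50
print(interval_median([10, 20, 30, 40], 10, [5, 10, 8, 5]))  # 29.0
```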

The median can be found graphically using the cumulate. To do this, on the scale of accumulated frequencies (frequencies) of the cumulate, from the point corresponding to the ordinal number of the median, a straight line is drawn parallel to the abscissa axis until it intersects with the cumulate. Further, from the point of intersection of the indicated straight line with the cumulate, a perpendicular is lowered to the abscissa axis. The value of the feature on the x-axis corresponding to the drawn ordinate (perpendicular) is the median.

The median is characterized by the following properties.

  • 1. It does not depend on the attribute values located on either side of it.
  • 2. It has the property of minimality: the sum of the absolute deviations of the attribute values from the median is smaller than the sum of their absolute deviations from any other value.
  • 3. When two distributions with known medians are combined, the median of the new distribution cannot be predicted in advance.

These properties of the median are widely used in designing the location of public service points - schools, clinics, gas stations, water pumps, etc. For example, if it is planned to build a polyclinic in a certain quarter of the city, then it is more expedient to locate it at a point in the quarter that bisects not the length of the quarter, but the number of inhabitants.
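The minimality property behind this siting rule can be checked numerically. The sketch below uses hypothetical positions of residents along a street and compares the total distance to the median with the total distance to the mean and to every other integer point.

```python
from statistics import mean, median

def total_abs_deviation(values, point):
    """Sum of absolute deviations of the values from a given point."""
    return sum(abs(v - point) for v in values)

# Hypothetical positions of five residents along a street
positions = [1, 2, 3, 4, 100]
me = median(positions)   # 3
avg = mean(positions)    # 22

print(total_abs_deviation(positions, me))   # 101
print(total_abs_deviation(positions, avg))  # 156
```

A facility placed at the median position gives the smallest total travel distance, even though one resident lives far from everyone else.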

The relationship between the mode, the median and the arithmetic mean indicates the nature of the distribution of the attribute in the population and makes it possible to assess the symmetry of the distribution. If $Mo < Me < \bar{x}$, the series has right-hand asymmetry; if $\bar{x} < Me < Mo$, it has left-hand asymmetry. For a normal distribution, $\bar{x} = Me = Mo$.

K. Pearson, by fitting various types of curves, established that for moderately asymmetric distributions the following approximate relationship between the arithmetic mean, the median and the mode holds:

$$\bar{x} - Mo \approx 3\,(\bar{x} - Me),$$

where $Me$ is the median; $Mo$ is the mode; $\bar{x}$ is the arithmetic mean.

If the structure of the variation series needs to be studied in more detail, attribute values analogous to the median are calculated. Such values divide the distribution units into equal parts and are called quantiles or gradients. Quantiles are subdivided into quartiles, deciles, percentiles, etc.

Quartiles divide the population into four equal parts. The first quartile is calculated in the same way as the median, after the first quartile interval has been determined:

$$Q_1 = x_{Q_1} + h\,\frac{\frac{1}{4}\sum f_i - S_{Q_1-1}}{f_{Q_1}},$$

where $Q_1$ is the first quartile; $x_{Q_1}$ is the lower limit of the first quartile interval; $h$ is the width of the first quartile interval; $f_i$ are the frequencies of the interval series; $S_{Q_1-1}$ is the cumulative frequency of the interval preceding the first quartile interval; $f_{Q_1}$ is the frequency of the first quartile interval.

The first quartile shows that 25% of the population units are below its value and 75% are above. The second quartile is equal to the median, i.e. $Q_2 = Me$.

The third quartile is calculated by analogy, after the third quartile interval has been found:

$$Q_3 = x_{Q_3} + h\,\frac{\frac{3}{4}\sum f_i - S_{Q_3-1}}{f_{Q_3}},$$

where $Q_3$ is the third quartile; $x_{Q_3}$ is the lower limit of the third quartile interval; $h$ is the width of the third quartile interval; $f_i$ are the frequencies of the interval series; $S_{Q_3-1}$ is the cumulative frequency of the interval preceding the third quartile interval; $f_{Q_3}$ is the frequency of the third quartile interval.

The third quartile shows that 75% of the population units are less than its value, and 25% are more.

The difference between the third and first quartiles is the interquartile range:

$$\Delta_Q = Q_3 - Q_1,$$

where $\Delta_Q$ is the interquartile range; $Q_3$ is the third quartile; $Q_1$ is the first quartile.
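The quartile calculations follow the same interpolation as the median, with the position Σf / 2 replaced by Σf / 4 or 3Σf / 4. A hedged Python sketch, assuming equal interval widths; `interval_quantile` and the sample series are hypothetical.

```python
from itertools import accumulate

def interval_quantile(lower_bounds, width, freqs, share):
    """Interpolated quantile of an equal-width interval series.

    share is the fraction of units below the quantile,
    e.g. 0.25 for Q1, 0.5 for the median, 0.75 for Q3.
    """
    target = share * sum(freqs)                # ordinal position of the quantile
    cum = list(accumulate(freqs))              # cumulative frequencies
    k = next(i for i, c in enumerate(cum) if c >= target)
    cum_before = cum[k - 1] if k > 0 else 0
    return lower_bounds[k] + width * (target - cum_before) / freqs[k]

# Hypothetical series: intervals 10-20, 20-30, 30-40, 40-50
bounds, h, f = [10, 20, 30, 40], 10, [5, 10, 8, 5]
q1 = interval_quantile(bounds, h, f, 0.25)
q3 = interval_quantile(bounds, h, f, 0.75)
print(q1, q3, q3 - q1)  # 22.0 37.5 15.5
```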

Deciles divide the population into 10 equal parts. A decile is an attribute value in the distribution series that corresponds to tenths of the population. By analogy with quartiles, the first decile shows that 10% of the population units are below its value and 90% are above, while the ninth decile shows that 90% of the units are below its value and 10% are above. The ratio of the ninth decile to the first, i.e. the decile coefficient, is widely used in the study of income differentiation to measure the ratio of the income levels of the wealthiest 10% and the least wealthy 10% of the population. Percentiles divide the ranked population into 100 equal parts. The calculation, meaning and use of percentiles are analogous to those of deciles.
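For ungrouped data, the decile coefficient could be sketched with Python's standard library. Note the assumption: `statistics.quantiles` uses its own default ("exclusive") interpolation, which may differ slightly from an interval-series calculation. The function name and the income data are hypothetical.

```python
from statistics import quantiles

def decile_coefficient(values):
    """Ratio of the ninth decile to the first decile."""
    deciles = quantiles(values, n=10)  # nine cut points D1..D9
    return deciles[8] / deciles[0]

# Hypothetical incomes: 1, 2, ..., 99 monetary units
incomes = list(range(1, 100))
print(decile_coefficient(incomes))
```

For these incomes the ninth decile is nine times the first, so the coefficient is 9.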

Quartiles, deciles and other structural characteristics can be determined graphically by analogy with the median using the cumulate.

To measure the size of the variation, the following indicators are used: the range of variation, the mean linear deviation, the standard deviation, and the variance. The range of variation depends entirely on the chance distribution of the extreme members of the series. This indicator is of interest when it is important to know the amplitude of fluctuations in the attribute values:

$$R = x_{\max} - x_{\min},$$

where $R$ is the range of variation; $x_{\max}$ is the maximum value of the attribute; $x_{\min}$ is the minimum value of the attribute.

The range of variation ignores the values of the vast majority of the members of the series, whereas variation is associated with the value of every member. Indicators that are averages of the deviations of individual attribute values from their mean are free of this shortcoming: the mean linear deviation and the standard deviation. There is a direct relationship between the individual deviations from the mean and the variability of a given attribute: the stronger the variability, the greater the absolute size of the deviations from the mean.

The mean linear deviation is the arithmetic mean of the absolute values of the deviations of the individual variants from their mean.

Mean linear deviation for ungrouped data:

$$\bar{l} = \frac{\sum |x_i - \bar{x}|}{n},$$

where $\bar{l}$ is the mean linear deviation; $x_i$ is the attribute value; $\bar{x}$ is the mean value of the attribute for the studied population; $n$ is the number of population units.

Mean linear deviation for grouped data:

$$\bar{l} = \frac{\sum |x_i - \bar{x}|\, f_i}{\sum f_i},$$

where $\bar{l}$ is the mean linear deviation; $x_i$ is the attribute value; $\bar{x}$ is the mean value of the attribute for the studied population; $f_i$ is the number of population units in a given group.

The signs of the deviations are ignored here, since otherwise the sum of all deviations would equal zero. Depending on whether the analyzed data are grouped, the mean linear deviation is calculated by different formulas: one for grouped and one for ungrouped data. Because of its conventional nature, the mean linear deviation is used relatively rarely in practice on its own, apart from other indicators of variation (in particular, to characterize the fulfillment of contractual obligations in terms of the uniformity of supply; in the analysis of foreign trade turnover, the composition of employees, the rhythm of production, and product quality, taking into account the technological features of production, etc.).
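A small Python sketch of the range of variation and the mean linear deviation, covering both ungrouped data (every weight equal to 1) and grouped data (frequencies as weights). The function names and sample values are illustrative assumptions.

```python
def variation_range(values):
    """Range of variation: maximum minus minimum."""
    return max(values) - min(values)

def mean_linear_deviation(values, freqs=None):
    """Mean of absolute deviations from the mean; freqs are optional weights."""
    if freqs is None:
        freqs = [1] * len(values)          # ungrouped data: unit weights
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    return sum(abs(x - mean) * f for x, f in zip(values, freqs)) / n

ungrouped = [2, 4, 4, 6, 9]                        # mean is 5
print(variation_range(ungrouped))                  # 7
print(mean_linear_deviation(ungrouped))            # 2.0
print(mean_linear_deviation([1, 2, 3], [1, 2, 1])) # 0.5 (grouped, mean is 2)
```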

The standard deviation characterizes how much, on average, the individual values of the studied attribute deviate from the population mean, and is expressed in the units of the attribute. As one of the main measures of variation, it is widely used in assessing the limits of variation of an attribute in a homogeneous population, in determining the ordinates of the normal distribution curve, and in calculations related to organizing sample observation and establishing the accuracy of sample characteristics. For ungrouped data the standard deviation is calculated as follows: each deviation from the mean is squared, the squares are summed, the sum is divided by the number of terms in the series, and the square root of the quotient is taken:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}},$$

where $\sigma$ is the standard deviation; $x_i$ is the attribute value; $\bar{x}$ is the mean value of the attribute for the studied population; $n$ is the number of population units.

For grouped data, the standard deviation is calculated by the weighted formula

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}},$$

where $\sigma$ is the standard deviation; $x_i$ is the attribute value; $\bar{x}$ is the mean value of the attribute for the studied population; $f_i$ is the number of population units in a given group.
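The algorithm (square each deviation, sum, divide by the number of units, take the root) can be sketched for both the simple and the weighted case. The name `std_deviation` and the data are hypothetical; the divisor is the number of units, as in the description above, not n - 1.

```python
from math import sqrt

def std_deviation(values, freqs=None):
    """Population standard deviation; freqs are optional group frequencies."""
    if freqs is None:
        freqs = [1] * len(values)          # ungrouped data: unit weights
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    variance = sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / n
    return sqrt(variance)

# The same hypothetical data, first ungrouped, then grouped
print(std_deviation([2, 4, 4, 4, 5, 5, 7, 9]))             # 2.0
print(std_deviation([2, 4, 5, 7, 9], [1, 3, 2, 1, 1]))     # 2.0
```

Both calls give the same result because the grouped series is just the ungrouped data with repeated values counted as frequencies.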

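The expression under the root in both cases is called the variance. Thus, the variance is calculated as the mean square of the deviations of the attribute values from their mean. For unweighted (simple) attribute values, the variance is defined as follows:

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}.$$

For weighted attribute values:

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}.$$

There is also a simplified way to calculate the variance. In general form:

$$\sigma^2 = \overline{x^2} - (\bar{x})^2;$$

for unweighted (simple) attribute values:

$$\sigma^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2;$$

for weighted attribute values:

$$\sigma^2 = \frac{\sum x_i^2 f_i}{\sum f_i} - \left(\frac{\sum x_i f_i}{\sum f_i}\right)^2;$$

using the method of counting from a conditional zero:

$$\sigma^2 = \frac{\sum \left(\frac{x_i - A}{h}\right)^2 f_i}{\sum f_i}\, h^2 - (\bar{x} - A)^2,$$

where $\sigma^2$ is the variance; $x_i$ is the attribute value; $\bar{x}$ is the mean value of the attribute; $h$ is the group interval width; $f_i$ is the weight; $A$ is the conditional zero (a constant).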
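The equivalence of the direct and simplified calculations can be checked numerically. The sketch below assumes three standard forms: the mean squared deviation, the mean of squares minus the square of the mean, and the conditional-zero form with a constant A and interval width h. All function names and data are hypothetical.

```python
def variance_direct(x, f):
    """Weighted mean of squared deviations from the mean."""
    n = sum(f)
    mean = sum(xi * fi for xi, fi in zip(x, f)) / n
    return sum((xi - mean) ** 2 * fi for xi, fi in zip(x, f)) / n

def variance_shortcut(x, f):
    """Mean of squares minus square of the mean."""
    n = sum(f)
    mean = sum(xi * fi for xi, fi in zip(x, f)) / n
    mean_sq = sum(xi ** 2 * fi for xi, fi in zip(x, f)) / n
    return mean_sq - mean ** 2

def variance_conditional_zero(x, f, a, h):
    """Count deviations from a constant a in units of h, then correct."""
    n = sum(f)
    mean = sum(xi * fi for xi, fi in zip(x, f)) / n
    second = sum(((xi - a) / h) ** 2 * fi for xi, fi in zip(x, f)) / n
    return second * h ** 2 - (mean - a) ** 2

# Hypothetical interval midpoints and frequencies (mean is 25)
x, f = [10, 20, 30, 40], [1, 2, 2, 1]
print(variance_direct(x, f))
print(variance_shortcut(x, f))
print(variance_conditional_zero(x, f, a=20, h=10))
```

All three calls print the same value, 550 / 6, up to floating-point rounding.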

The variance has independent significance in statistics and is one of the most important indicators of variation. It is measured in units equal to the square of the units of measurement of the attribute under study.

The variance has the following properties.

  • 1. The variance of a constant is zero.
  • 2. Reducing all attribute values by the same amount A does not change the variance. This means that the mean square of deviations can be calculated not from the given attribute values but from their deviations from some constant number.
  • 3. Reducing all attribute values by a factor of k reduces the variance by a factor of k² and the standard deviation by a factor of k. That is, all attribute values can be divided by some constant number (say, the interval width of the series), the standard deviation calculated, and the result then multiplied by that constant.
  • 4. The mean square of deviations calculated from any value A that differs from the arithmetic mean is always greater than the mean square of deviations calculated from the arithmetic mean. It is greater by a well-defined amount: the square of the difference between the mean and this conditionally chosen value.
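Properties 2 to 4 can be verified on a small data set. This sketch uses `statistics.pvariance` (the population variance, with divisor n) on hypothetical values; the constants 3 and 2 are arbitrary choices.

```python
from statistics import mean, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]                # hypothetical values, mean 5
var = pvariance(data)                          # 4

# Property 2: subtracting a constant leaves the variance unchanged
shifted = pvariance([x - 3 for x in data])

# Property 3: dividing by k = 2 divides the variance by k squared
scaled = pvariance([x / 2 for x in data])

# Property 4: mean square of deviations from a = 3 exceeds the variance
# by (mean - a) squared
a = 3
msd_from_a = sum((x - a) ** 2 for x in data) / len(data)

print(var, shifted, scaled, msd_from_a)
```

Here the mean square of deviations from 3 is 8, which is the variance 4 plus (5 - 3)² = 4.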

The variation of an alternative attribute consists in the presence or absence of the studied property in the units of the population. Quantitatively, it is expressed by two values: the presence of the property in a unit is denoted by one (1), and its absence by zero (0). The proportion of units that have the property is denoted by p, and the proportion of units that do not have it by q, so that q = 1 - p. The variance of an alternative attribute is equal to the product of the proportion of units that have the property (p) and the proportion of units that do not (q). The greatest variation is reached when 50% of the population has the attribute and the other 50% does not; the variance then reaches its maximum value of 0.25, i.e. p = 0.5, q = 1 - p = 1 - 0.5 = 0.5 and σ² = 0.5 · 0.5 = 0.25. The lower limit of this indicator is zero, which corresponds to a population with no variation. In practice, the variance of an alternative attribute is used to build confidence intervals in sample observation.
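A brief numerical check that the variance of a 0/1 attribute equals p(1 - p) and peaks at p = 0.5. The flags and names are hypothetical.

```python
def alternative_variance(p):
    """Variance of an alternative (0/1) attribute with share p of ones."""
    return p * (1 - p)

# Hypothetical flags: 1 if a unit has the property, 0 otherwise
flags = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
p = sum(flags) / len(flags)                              # share of ones: 0.6
direct = sum((x - p) ** 2 for x in flags) / len(flags)   # variance by definition

print(direct, alternative_variance(p))
print(alternative_variance(0.5))  # 0.25, the maximum
```

The first line prints two values that agree up to rounding (0.24), confirming that the product of the two shares is the ordinary variance of the 0/1 data.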

The smaller the variance and the standard deviation, the more homogeneous the population and the more typical the mean. In statistical practice it is often necessary to compare the variation of different attributes. For example, it is of interest to compare the variation in the age of workers and their qualifications, length of service and wages, cost and profit, length of service and labor productivity, etc. Indicators of absolute variability are unsuitable for such comparisons: the variability of length of service, expressed in years, cannot be compared with the variation of wages, expressed in rubles. For such comparisons, as well as for comparing the variability of the same attribute in several populations with different arithmetic means, relative indicators of variation are used: the oscillation coefficient, the linear coefficient of variation and the coefficient of variation, which measure the variability of the values around the mean.

Oscillation coefficient:

$$V_R = \frac{R}{\bar{x}} \cdot 100\%,$$

where $V_R$ is the oscillation coefficient; $R$ is the range of variation; $\bar{x}$ is the mean value of the attribute for the population under study.

Linear coefficient of variation:

$$V_{\bar{l}} = \frac{\bar{l}}{\bar{x}} \cdot 100\%,$$

where $V_{\bar{l}}$ is the linear coefficient of variation; $\bar{l}$ is the mean linear deviation; $\bar{x}$ is the mean value of the attribute for the population under study.

Coefficient of variation:

$$V_\sigma = \frac{\sigma}{\bar{x}} \cdot 100\%,$$

where $V_\sigma$ is the coefficient of variation; $\sigma$ is the standard deviation; $\bar{x}$ is the mean value of the attribute for the population under study.

The oscillation coefficient is the range of variation expressed as a percentage of the mean value of the attribute, and the linear coefficient of variation is the mean linear deviation expressed as a percentage of the mean. The coefficient of variation is the standard deviation expressed as a percentage of the mean. As a relative value, the coefficient of variation is used to compare the degree of variation of different attributes. It is also used to assess the homogeneity of a statistical population. If the coefficient of variation is less than 33%, the population is considered homogeneous and the variation weak. If it is greater than 33%, the population is considered heterogeneous, the variation strong, and the mean atypical, so that it cannot be used as a generalizing indicator of the population. In addition, coefficients of variation are used to compare the variability of one attribute in different populations, for example to assess the variation in the length of service of workers at two enterprises. The larger the coefficient, the more significant the variation of the attribute.
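The three relative indicators and the 33% rule of thumb can be sketched as follows, assuming each coefficient is the corresponding absolute measure divided by the mean and multiplied by 100. The function name, dictionary keys and data are hypothetical.

```python
from math import sqrt

def relative_variation(values):
    """Oscillation, linear and standard coefficients of variation, in percent."""
    n = len(values)
    mean = sum(values) / n
    value_range = max(values) - min(values)
    linear_dev = sum(abs(x - mean) for x in values) / n
    sigma = sqrt(sum((x - mean) ** 2 for x in values) / n)
    return {
        "oscillation": 100 * value_range / mean,
        "linear": 100 * linear_dev / mean,
        "variation": 100 * sigma / mean,
    }

coeffs = relative_variation([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data
homogeneous = coeffs["variation"] < 33                 # 33% threshold from the text
print(coeffs, homogeneous)
```

For this data the coefficient of variation is about 40%, so by the 33% criterion the population would be treated as heterogeneous.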

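Based on the calculated quartiles, a relative indicator of quartile variation can also be calculated by the formula

$$V_Q = \frac{Q_3 - Q_1}{2\,Q_2} \cdot 100\%,$$

where $Q_2$ is the second quartile (the median) and $Q_1$, $Q_3$ are the first and third quartiles.

The interquartile range is determined by the formula

$$\Delta_Q = Q_3 - Q_1.$$

The quartile deviation is used instead of the range of variation to avoid the disadvantages associated with using extreme values:

$$Q = \frac{Q_3 - Q_1}{2}.$$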
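A hedged sketch of the quartile-based measures, assuming the common definitions: interquartile range Q3 - Q1, quartile deviation (Q3 - Q1) / 2, and relative quartile variation as the quartile deviation in percent of the median. The function name and the quartile values are hypothetical.

```python
def quartile_measures(q1, q2, q3):
    """Interquartile range, quartile deviation, relative quartile variation (%)."""
    iqr = q3 - q1                   # interquartile range
    deviation = iqr / 2             # quartile deviation
    relative = 100 * deviation / q2 # in percent of the median (assumed form)
    return iqr, deviation, relative

# Hypothetical quartiles of an interval series
iqr, deviation, relative = quartile_measures(22.0, 29.0, 37.5)
print(iqr, deviation, round(relative, 2))
```

Because only the quartiles enter the calculation, a single extreme value in the data does not affect these measures.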

For variation series with unequal intervals, the distribution density is also calculated. It is defined as the quotient of the corresponding frequency or relative frequency divided by the interval width. In series with unequal intervals, absolute and relative distribution densities are used. The absolute distribution density is the frequency per unit length of the interval. The relative distribution density is the relative frequency per unit length of the interval.
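A minimal Python sketch of both densities for a series with unequal intervals. The function name, the interval bounds and the frequencies are illustrative assumptions.

```python
def distribution_densities(bounds, freqs):
    """Absolute and relative densities for intervals given as (lower, upper)."""
    total = sum(freqs)
    absolute = [f / (hi - lo) for (lo, hi), f in zip(bounds, freqs)]
    relative = [(f / total) / (hi - lo) for (lo, hi), f in zip(bounds, freqs)]
    return absolute, relative

# Hypothetical series with unequal intervals 0-10, 10-30, 30-40
bounds = [(0, 10), (10, 30), (30, 40)]
freqs = [20, 30, 25]
absolute, relative = distribution_densities(bounds, freqs)
print(absolute)  # [2.0, 1.5, 2.5]

# The densest interval need not be the one with the highest frequency
densest = max(range(len(absolute)), key=lambda i: absolute[i])
most_frequent = max(range(len(freqs)), key=lambda i: freqs[i])
print(bounds[densest], bounds[most_frequent])  # (30, 40) (10, 30)
```

In this example the interval 10-30 has the highest frequency only because it is twice as wide; by density, 30-40 is the modal interval.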

All of the above is true for distribution series whose distribution law is well described by the normal distribution law or is close to it.