Discrete variation series in statistics definition. Variation Series

A set of values of the parameter studied in a given experiment or observation, ranked by magnitude (in ascending or descending order), is called a variation series.

Suppose we measured the blood pressure of ten patients and recorded only the upper value, the systolic pressure, i.e. a single number per patient.

Imagine that a series of observations (statistical population) of arterial systolic pressure in 10 observations has the following form (Table 1):

Table 1

The components of a variational series are called variants. Variants represent the numerical value of the trait being studied.

The construction of a variational series from a statistical set of observations is only the first step towards comprehending the features of the entire set. Next, it is necessary to determine the average level of the studied quantitative trait (the average level of blood protein, the average weight of patients, the average time of onset of anesthesia, etc.)

The average level is measured using criteria that are called averages. The average value is a generalizing numerical characteristic of qualitatively homogeneous values, characterizing by one number the entire statistical population according to one attribute. The average value expresses the general that is characteristic of a trait in a given set of observations.

Three types of averages are in common use: the mode (Mo), the median (Me) and the arithmetic mean (M).

To determine any average value, it is necessary to use the results of individual observations, writing them in the form of a variation series (Table 2).

Mode (Mo): the value that occurs most frequently in a series of observations. In our example, Mo = 120. If no value in the variation series is repeated, the series is said to have no mode. If several values are repeated the same number of times, the smallest of them is taken as the mode.

Median (Me): the value that divides the distribution into two equal parts, i.e. the central value of a series of observations ordered in ascending or descending order. If the variation series has 5 values, its median is the third member of the series. If the series has an even number of members, the median is the arithmetic mean of its two central observations; for example, with 10 observations the median is the arithmetic mean of the 5th and 6th observations. In our example, the 5th and 6th values are both 120, so Me = 120.

Note an important feature of the mode and median: their values ​​are not affected by the numerical values ​​of the extreme variants.

The arithmetic mean is calculated by the formula:

M = (V1 + V2 + ... + Vn) / n = ∑Vi / n,

where Vi is the observed value in the i-th observation and n is the number of observations. In our case, M = 1190 / 10 = 119.

The arithmetic mean has three properties:

The mean occupies the middle position in the variation series. In a strictly symmetrical series, M = Mo = Me.

The mean is a generalizing value: random fluctuations and differences between individual data are not visible behind it. It reflects what is typical of the entire population.

The sum of the deviations of all variants from the mean is equal to zero: ∑(V − M) = 0. The deviation of a variant from the mean is denoted d = V − M.

The variation series consists of variants and their corresponding frequencies. Of the ten values obtained, the number 120 occurred 6 times, 115 occurred 3 times, and 125 occurred once. The frequency (p) is the absolute number of occurrences of an individual variant in the population, indicating how many times this variant occurs in the variation series.

The variation series can be simple (all frequencies equal to 1) or grouped (shortened, with 3-5 variants per group). A simple series is used with a small number of observations, a grouped series with a large number of observations.
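The three averages for the blood pressure example can be checked with a short Python sketch. The list of readings is rebuilt from the frequencies quoted above (the original order of the measurements is not known), and the variable names are illustrative only.

```python
from statistics import mean, median, multimode

# Ten systolic readings rebuilt from the quoted frequencies:
# 115 three times, 120 six times, 125 once.
readings = [115] * 3 + [120] * 6 + [125]

series = sorted(readings)             # ranked variation series
mode_value = min(multimode(series))   # if several values tie, take the smallest
median_value = median(series)         # mean of the 5th and 6th values for n = 10
mean_value = mean(series)             # sum of the variants divided by n

# Deviations d = V - M; their sum should be zero.
deviations = [v - mean_value for v in series]

print("Mo =", mode_value, "Me =", median_value, "M =", mean_value)
print("sum of deviations =", sum(deviations))
```

Taking the smallest of several tied values follows the convention stated in the text; other textbooks report all tied values as modes.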

Variation series: definition, types, main characteristics. Method of calculating the mode, median and arithmetic mean in medical and statistical studies (shown on a hypothetical example).

A variational series is a series of numerical values of the trait under study that differ from each other in magnitude and are arranged in a certain sequence (in ascending or descending order). Each numerical value of the series is called a variant (V), and the number showing how often a given variant occurs in the series is called its frequency (p).

The total number of observations that make up the variation series is denoted by the letter n. The difference in the values of the studied characteristic is called variation. If the varying trait has no quantitative measure, the variation is called qualitative, and the distribution series is called attributive (for example, distribution by disease outcome, health status, etc.).

If the varying trait has a quantitative expression, the variation is called quantitative, and the distribution series is called variational.

Variational series are divided into discontinuous and continuous - according to the nature of the quantitative trait, simple and weighted - according to the frequency of occurrence of the variant.

In a simple variational series, each variant occurs only once (p=1), in a weighted one, the same variant occurs several times (p>1). Examples of such series will be discussed later in the text. If the quantitative attribute is continuous, i.e. between integer values ​​there are intermediate fractional values, the variational series is called continuous.

For example: 10.0 - 11.9

14.0 - 15.9, etc.

If the quantitative trait is discontinuous, i.e. its individual values (variants) differ from each other by a whole number and have no intermediate fractional values, the variation series is called discontinuous, or discrete.

Using the data on heart rate for 21 students from the previous example, we will build a variation series (Table 1).

Table 1

Distribution of medical students by pulse rate (bpm)

Thus, to build a variational series means to systematize and order the available numerical values (variants), i.e. to arrange them in a certain sequence (in ascending or descending order) together with their corresponding frequencies. In the example under consideration, the variants are arranged in ascending order and are expressed as discontinuous (discrete) whole numbers, and each variant occurs several times, i.e. we are dealing with a weighted, discontinuous (discrete) variational series.

As a rule, if the number of observations in the statistical population under study does not exceed 30, it is enough to arrange all the values of the trait in a variational series in ascending order, as in Table 1, or in descending order.

With a large number of observations (n>30), the number of occurring variants can be very large, in this case an interval or grouped variational series is compiled, in which, to simplify subsequent processing and clarify the nature of the distribution, the variants are combined into groups.

Usually the number of groups ranges from 8 to 15.

There must be at least 5 of them, because otherwise the grouping is too coarse: excessive enlargement distorts the overall picture of variation and strongly affects the accuracy of the averages. When the number of groups exceeds 20-25, the accuracy of the calculated averages increases, but the features of the variation of the trait are significantly distorted and mathematical processing becomes more complicated.

When compiling a grouped series, the following must be taken into account:

- the groups of variants must be placed in a specific order (ascending or descending);

- the intervals in the groups of variants should be the same;

- the boundary values of adjacent intervals should not coincide, because otherwise it will not be clear to which group an individual variant belongs;

- the qualitative features of the collected material must be taken into account when setting the limits of the intervals (for example, when studying the weight of adults an interval of 3-4 kg is acceptable, whereas for children in the first months of life it should not exceed 100 g).

Let's build a grouped (interval) series that characterizes the data on the pulse rate (number of beats per minute) for 55 medical students before the exam: 64, 66, 60, 62, 64, 68, 70, 66, 70, 68, 62, 68, 70, 72, 60, 70, 74, 62, 70, 72, 72, 64, 70, 72, 76, 76, 68, 70, 58, 76, 74, 76, 76, 82, 76, 72, 76, 74, 79, 78, 74, 78, 74, 78, 74, 74, 78, 76, 78, 76, 80, 80, 80, 78, 78.

To build a grouped series, you need:

1. Determine the value of the interval;

2. Determine the middle, beginning and end of each group of variants in the variation series.

● The value of the interval (i) is determined from the expected number of groups (r), which is set depending on the number of observations (n) according to a special table.

Number of groups depending on the number of observations:

In our case, for 55 students, it is possible to make up from 8 to 10 groups.

The value of the interval (i) is determined by the following formula:

i = (Vmax − Vmin) / r

In our example, the value of the interval is (82 − 58) / 8 = 3.

If the interval value is a fractional number, the result should be rounded up to an integer.
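The interval calculation and the resulting grouped series can be sketched in Python. This is only an illustration under stated assumptions: r = 8 groups is taken from the example above, the first group is assumed to start at the smallest observed value, and the group boundaries are chosen as whole numbers so that they do not coincide.

```python
import math

# Pulse rates of the 55 students, as listed in the text.
pulse = [
    64, 66, 60, 62,
    64, 68, 70, 66, 70, 68, 62, 68, 70, 72, 60, 70, 74, 62, 70, 72, 72,
    64, 70, 72, 76, 76, 68, 70, 58, 76, 74, 76, 76, 82, 76, 72, 76, 74,
    79, 78, 74, 78, 74, 78, 74, 74, 78, 76, 78, 76, 80, 80, 80, 78, 78,
]

r = 8                                    # expected number of groups
v_min, v_max = min(pulse), max(pulse)
width = math.ceil((v_max - v_min) / r)   # i = (Vmax - Vmin) / r, rounded up

# Build groups [low, high] with non-coinciding whole-number boundaries.
groups = []
low = v_min
while low <= v_max:
    high = low + width - 1
    count = sum(low <= v <= high for v in pulse)
    groups.append((low, high, count))
    low += width

for low, high, count in groups:
    print(f"{low}-{high}: {count}")
```

With an interval of 3 the range 58-82 needs nine groups, which stays within the 8 to 10 groups suggested for 55 observations.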

There are several types of averages:

● arithmetic mean,

● geometric mean,

● harmonic mean,

● root mean square,

● progressive mean,

● median

In medical statistics, arithmetic averages are most often used.

The arithmetic mean (M) is a generalizing value that determines the typical value that is characteristic of the entire population. The main methods for calculating M are: the arithmetic mean method and the method of moments (conditional deviations).

The arithmetic mean method is used to calculate the simple arithmetic mean and the weighted arithmetic mean. The choice of method for calculating the arithmetic mean value depends on the type of variation series. In the case of a simple variational series, in which each variant occurs only once, the simple arithmetic mean is determined by the formula:

M = ∑V / n,

where: M is the arithmetic mean;

V is the value of the varying trait (the variants);

Σ denotes summation;

n is the total number of observations.

An example of calculating the simple arithmetic mean. Respiratory rate (number of breaths per minute) in 9 men aged 35: 20, 22, 19, 15, 16, 21, 17, 23, 18.

To determine the average respiratory rate in men aged 35, it is necessary to:

1. Build a variational series by placing all variants in ascending or descending order: 15, 16, 17, 18, 19, 20, 21, 22, 23. This is a simple variational series, because each variant value occurs only once.

2. Calculate the simple arithmetic mean: M = ∑V/n = 171/9 = 19 breaths per minute

Conclusion. The respiratory rate in men aged 35 is on average 19 breaths per minute.
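The same calculation can be reproduced in a few lines of Python; the variable names are illustrative and not part of the original method description.

```python
# Respiratory rates (breaths per minute) of the 9 men from the example.
rates = [20, 22, 19, 15, 16, 21, 17, 23, 18]

series = sorted(rates)                        # ranked variation series
is_simple = len(set(series)) == len(series)   # every variant occurs once (p = 1)

total = sum(series)                           # sum of the variants
M = total / len(series)                       # M = sum(V) / n

print(series, "simple series:", is_simple)
print("M =", M, "breaths per minute")
```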

If individual variant values are repeated, there is no need to write out each variant separately; it is enough to list the variant values that occur (V) and indicate next to each the number of its repetitions (p). Such a variational series, in which the variants are, as it were, weighted by their corresponding frequencies, is called a weighted variational series, and the average calculated from it is the weighted arithmetic mean.

The weighted arithmetic mean is determined by the formula: M = ∑Vp/n

where n is the number of observations, equal to the sum of the frequencies (Σp).

An example of calculating the arithmetic weighted average.

The duration of disability (in days) in 35 patients with acute respiratory infections (ARI) treated by a local doctor during the first quarter of the current year was: 6, 7, 5, 3, 9, 8, 7, 5, 6, 4, 9, 8, 7, 6, 6, 9, 6, 5, 10, 8, 7, 11, 13, 5, 6, 7, 12, 4, 3, 5, 2, 5, 6, 6, 7 days.

The methodology for determining the average duration of disability in patients with acute respiratory infections is as follows:

1. Let's build a weighted variational series, because individual variant values ​​are repeated several times. To do this, you can arrange all the options in ascending or descending order with their corresponding frequencies.

In our case, the options are in ascending order.

2. Calculate the arithmetic weighted average using the formula: M = ∑Vp/n = 233/35 = 6.7 days

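Distribution of patients with acute respiratory infections by duration of disability:

Duration of disability, days (V) | Number of patients (p) | Vp
2 | 1 | 2
3 | 2 | 6
4 | 2 | 8
5 | 6 | 30
6 | 8 | 48
7 | 6 | 42
8 | 3 | 24
9 | 3 | 27
10 | 1 | 10
11 | 1 | 11
12 | 1 | 12
13 | 1 | 13
Total | ∑p = n = 35 | ∑Vp = 233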

Conclusion. The duration of disability in patients with acute respiratory diseases averaged 6.7 days.
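As a cross-check, the weighted series and the weighted arithmetic mean can be derived from the raw observations with a short Python sketch. `collections.Counter` is used here simply as a convenient way to count frequencies; it is not part of the original method.

```python
from collections import Counter

# Duration of disability (days) for the 35 patients, as listed in the text.
days = [6, 7, 5, 3, 9, 8, 7, 5, 6, 4, 9, 8, 7, 6, 6, 9, 6, 5, 10, 8,
        7, 11, 13, 5, 6, 7, 12, 4, 3, 5, 2, 5, 6, 6, 7]

freq = Counter(days)                              # variant V -> frequency p
n = sum(freq.values())                            # n = sum of the frequencies
sum_vp = sum(v * p for v, p in freq.items())      # sum of the products V * p
M = sum_vp / n                                    # weighted arithmetic mean

for v in sorted(freq):                            # weighted series, ascending
    print(v, freq[v], v * freq[v])
print("n =", n, "sum Vp =", sum_vp, "M =", round(M, 1))
```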

Mode (Mo) is the most common variant in the variation series. For the distribution presented in the table, the mode corresponds to the variant equal to 10, it occurs more often than others - 6 times.

Distribution of patients by length of stay in a hospital bed (in days)

V
p

Sometimes it is difficult to determine the exact value of the mode, since there may be several observations in the data being studied that occur “most often”.

Median (Me) is a non-parametric indicator that divides the variation series into two equal halves: the same number of options is located on both sides of the median.

For example, for the distribution shown in the table, the median is 10, because 14 variants lie on each side of this value, i.e. the number 10 occupies the central position in this series and is its median.

Given that the number of observations in this example is even (n=34), the position of the median can be determined as follows:

(2+3+4+5+6+5+4+3+2)/2 = 34/2 = 17

This means that the middle of the series falls on the seventeenth variant, which has the value 10, so Me = 10. For the distribution presented in the table, the arithmetic mean is:

M = ∑Vp/n = 334/34 = 10.1

So, for the 34 observations from Table 8, we obtained Mo=10, Me=10 and an arithmetic mean (M) of 10.1. In our example, all three indicators turned out to be equal or close to each other, although they are fundamentally different measures.

The arithmetic mean is the resultant sum of all influences; all options, without exception, take part in its formation, including extreme ones, often atypical for a given phenomenon or set.

The mode and median, in contrast to the arithmetic mean, do not depend on all the individual values of the varying trait (on the values of the extreme variants and on the degree of scatter of the series). The arithmetic mean characterizes the entire mass of observations, while the mode and median characterize the bulk of them.

The grouping method also makes it possible to measure the variation (variability, fluctuation) of traits. With a relatively small number of population units, variation is measured on the basis of a ranked series of the units that make up the population. A series is called ranked if the units are arranged in ascending (or descending) order of the trait.

However, ranked series are rather indicative when a comparative characteristic of variation is needed. In addition, in many cases one has to deal with statistical aggregates consisting of a large number of units, which are practically difficult to represent in the form of a specific series. In this regard, for the initial general acquaintance with statistical data and especially to facilitate the study of the variation of signs, the studied phenomena and processes are usually combined into groups, and the results of the grouping are drawn up in the form of group tables.

If the group table has only two columns, the groups according to the selected trait (variants) and the size of the groups (frequencies or relative frequencies), it is called a distribution series.

A distribution series is the simplest type of structural grouping by one attribute, displayed in a group table with two columns containing the variants and frequencies of the attribute. In many cases the study of the initial statistical material begins with such a structural grouping, i.e. with the compilation of distribution series.

A structural grouping in the form of a distribution series can be turned into a true structural grouping if the selected groups are characterized not only by frequencies, but also by other statistical indicators. The main purpose of distribution series is to study the variation of features. The theory of distribution series is developed in detail by mathematical statistics.

Distribution series are divided into attributive (grouping by attributive characteristics, for example, the division of the population by sex, nationality, marital status, etc.) and variational (grouping by quantitative characteristics).

A variation series is a group table that contains two columns: a grouping of units according to one quantitative attribute and the number of units in each group. The intervals in a variation series are usually equal and closed. An example of a variation series is the following grouping of the Russian population by average per capita cash income (Table 3.10).

Table 3.10

Distribution of Russia's population by average per capita income in 2004-2009

Population groups by average per capita cash income, rub./month | Population in the group, % of the total
8,000.1-10,000.0
10,000.1-15,000.0
15,000.1-25,000.0
Over 25,000.0
All population

Variational series, in turn, are divided into discrete and interval. Discrete variation series combine variants of discrete features that vary within narrow limits. An example of a discrete variation series is the distribution of Russian families according to the number of children they have.

Interval variational series combine variants of either continuous features or discrete features that change over a wide range. The interval series is the variational series of the distribution of the Russian population in terms of average per capita cash income.

Discrete variational series are not used very often in practice. Meanwhile, compiling them is not difficult, since the composition of the groups is determined by the specific variants that the studied grouping characteristics actually possess.

Interval variational series are more widespread. In compiling them, the difficult question arises of the number of groups, as well as the size of the intervals that should be established.

The principles for resolving this issue are set out in the chapter on the methodology for constructing statistical groupings (see paragraph 3.3).

Variation series are a means of collapsing or compressing diverse information into a compact form; they can be used to make a fairly clear judgment about the nature of the variation, to study the differences in the signs of the phenomena included in the set under study. But the most important significance of the variational series is that on their basis the special generalizing characteristics of the variation are calculated (see Chapter 7).

A special place in statistical analysis belongs to the determination of the average level of the studied trait or phenomenon. The average level of a feature is measured by average values.

The average value characterizes the general quantitative level of the studied trait and is a group property of the statistical population. It levels, weakens the random deviations of individual observations in one direction or another and highlights the main, typical property of the trait under study.

Averages are widely used:

1. To assess the health status of the population: characterizing physical development (height, weight, chest circumference, etc.), identifying the prevalence and duration of various diseases, and analyzing demographic indicators (natural population movement, average life expectancy, population reproduction, average population size, etc.).

2. To study the activities of medical institutions, medical personnel and assess the quality of their work, planning and determining the needs of the population in various types of medical care (average number of requests or visits per inhabitant per year, average length of stay of a patient in a hospital, average duration of examination patient, average provision with doctors, beds, etc.).

3. To characterize the sanitary and epidemiological state (average dustiness of the air in the workshop, average area per person, average consumption of proteins, fats and carbohydrates, etc.).

4. To determine the medical and physiological parameters in the norm and pathology, in the processing of laboratory data, to establish the reliability of the results of a selective study in socio-hygienic, clinical, experimental studies.

Averages are calculated on the basis of variation series. A variation series is a qualitatively homogeneous statistical set whose individual units characterize the quantitative differences of the studied trait or phenomenon.

Quantitative variation can be of two types: discontinuous (discrete) and continuous.

A discontinuous (discrete) trait is expressed only as a whole number and cannot have any intermediate values (for example, the number of visits, the population of a district, the number of children in a family, the severity of a disease in points, etc.).

A continuous trait can take any value within certain limits, including fractional values, and is expressed only approximately (for example, weight, which for adults can be rounded to kilograms and for newborns to grams; height; blood pressure; time spent seeing a patient, etc.).



The digital value of each individual feature or phenomenon included in the variation series is called a variant and is indicated by the letter V . There are also other notations in the mathematical literature, for example x or y.

A variational series, where each option is indicated once, is called simple. Such series are used in most statistical problems in the case of computer data processing.

As the number of observations increases, variant values usually begin to repeat. In this case a grouped variation series is created, in which the number of repetitions is indicated (the frequency, denoted by the letter "p").

Ranked variation series consists of options arranged in ascending or descending order. Both simple and grouped series can be composed with ranking.

Interval variation series are made up in order to simplify subsequent calculations performed without using a computer, with a very large number of observation units (more than 1000).

Continuous variation series includes variant values, which can be any value.

If in the variation series the values ​​of the attribute (options) are given in the form of separate specific numbers, then such a series is called discrete.

The general characteristics of the values of the trait reflected in the variation series are the averages. The most commonly used are the arithmetic mean (M), the mode (Mo) and the median (Me). Each of these characteristics is unique. They cannot replace one another, and only taken together do they describe the features of the variational series fully and concisely.

The mode (Mo) is the value of the most frequently occurring variant.

The median (Me) is the value of the variant that divides the ranked variational series in half (half of the variants lie on each side of the median). In rare cases, when the variation series is symmetrical, the mode and median are equal to each other and coincide with the arithmetic mean.

The most typical characteristic of variant values is the arithmetic mean (M). In the mathematical literature it is denoted x̄.

The arithmetic mean (M, x̄) is a general quantitative characteristic of a certain trait of the studied phenomena, which make up a qualitatively homogeneous statistical set. A distinction is made between the simple arithmetic mean and the weighted arithmetic mean. The simple arithmetic mean is calculated for a simple variational series by summing all the variants and dividing this sum by the total number of variants in the series. The calculation follows the formula:

M = ∑V / n,

where: M is the simple arithmetic mean;

ΣV is the sum of the variants;

n is the number of observations.

In the grouped variation series, a weighted arithmetic mean is determined. The formula for its calculation:

M = ∑Vp / n,

where: M is the weighted arithmetic mean;

ΣVp is the sum of the products of the variants and their frequencies;

n is the number of observations.

With a large number of observations in the case of manual calculations, the method of moments can be used.

The arithmetic mean has the following properties:

- the sum of the deviations of the variants from the mean (Σd) is equal to zero (see Table 15);

- when all variants are multiplied (divided) by the same factor (divisor), the arithmetic mean is multiplied (divided) by the same factor (divisor);

- when the same number is added to (subtracted from) all variants, the arithmetic mean increases (decreases) by the same number.
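These three properties can be verified numerically. The sketch below reuses the respiratory rate data from the earlier example; the factor 2 and the shift 5 are arbitrary illustrative choices.

```python
# Respiratory rates from the earlier example (simple variation series).
rates = [20, 22, 19, 15, 16, 21, 17, 23, 18]

def arithmetic_mean(values):
    """Simple arithmetic mean: sum of the variants divided by n."""
    return sum(values) / len(values)

M = arithmetic_mean(rates)

# Property 1: the deviations d = V - M sum to zero.
sum_d = sum(v - M for v in rates)

# Property 2: multiplying every variant by 2 multiplies the mean by 2.
M_scaled = arithmetic_mean([2 * v for v in rates])

# Property 3: adding 5 to every variant raises the mean by 5.
M_shifted = arithmetic_mean([v + 5 for v in rates])

print(M, sum_d, M_scaled, M_shifted)
```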

Arithmetic means taken by themselves, without regard to the variability of the series from which they are calculated, may not fully reflect the properties of the variation series, especially when they have to be compared with other means. Means that are close in value can be obtained from series with different degrees of scatter. The closer the individual variants are to each other in their quantitative characteristics, the smaller the scatter (fluctuation, variability) of the series and the more typical its mean.

The main parameters that allow the variability of a trait to be assessed are:

· range;

· amplitude;

· standard deviation;

· coefficient of variation.

The fluctuation of a trait can be judged approximately from the range and the amplitude of the variation series. The range is given by the maximum (Vmax) and minimum (Vmin) variants in the series. The amplitude (Am) is the difference between these variants: Am = Vmax − Vmin.

The main, generally accepted measure of the fluctuation of a variational series is the variance (D). Most often, however, a more convenient parameter calculated from the variance is used: the standard deviation (σ). It takes into account the deviation (d) of each variant of the variation series from its arithmetic mean (d = V − M).

Since the deviations of the variants from the mean can be positive and negative, they sum to zero (Σd = 0). To avoid this, the deviations (d) are squared and averaged. Thus, the variance of a variational series is the mean square of the deviations of the variants from the arithmetic mean and is calculated by the formula:

D = ∑d² / n.

It is the most important characteristic of variability and is used to calculate many statistical tests.

Because the variance is expressed in squared units of the deviations, its value cannot be compared directly with the arithmetic mean. For this purpose the standard deviation is used, denoted by the sign "sigma" (σ). It characterizes the average deviation of all variants of the variation series from the arithmetic mean in the same units as the mean itself, so the two can be used together.

The standard deviation is determined by the formula:

σ = √(∑d² / n)

This formula is applied when the number of observations (n) is greater than 30. With a smaller n, the value of the standard deviation carries an error associated with the mathematical bias (n − 1). A more accurate result can therefore be obtained by taking this bias into account in the formula:

s = √(∑d² / (n − 1))

The sample standard deviation (s) is an estimate of the standard deviation of the random variable X relative to its mathematical expectation, based on an unbiased estimate of its variance.

For n > 30, the standard deviation (σ) and the sample standard deviation (s) are practically the same (σ ≈ s). Therefore, most practical manuals treat the two measures as equivalent. In Excel, s can be calculated with the function =STDEV(range); σ can be calculated with =STDEVP(range) or by entering the corresponding formula.
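The difference between dividing by n and by n − 1 can be seen on a small series. The sketch below reuses the nine respiratory rate values from the earlier example purely as illustrative data; the names `sigma` and `s` mirror the symbols in the text, and `pstdev` and `stdev` from Python's `statistics` module are used only as a cross-check.

```python
import math
from statistics import mean, pstdev, stdev

rates = [20, 22, 19, 15, 16, 21, 17, 23, 18]
n = len(rates)
M = mean(rates)

amplitude = max(rates) - min(rates)      # Am = Vmax - Vmin
d = [v - M for v in rates]               # deviations d = V - M
sum_d2 = sum(x * x for x in d)           # sum of squared deviations

D = sum_d2 / n                           # variance
sigma = math.sqrt(D)                     # divides by n
s = math.sqrt(sum_d2 / (n - 1))          # divides by n - 1

print("Am =", amplitude, "D =", round(D, 2))
print("sigma =", round(sigma, 2), "s =", round(s, 2))
print("library check:", round(pstdev(rates), 2), round(stdev(rates), 2))
```

With only nine observations s is noticeably larger than σ; as n grows the two values converge.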

The root mean square or standard deviation allows you to determine how much the values ​​of a feature can differ from the mean value. Suppose there are two cities with the same average daily temperature in summer. One of these cities is located on the coast, and the other on the continent. It is known that in cities located on the coast, the differences in daytime temperatures are less than in cities located inland. Therefore, the standard deviation of daytime temperatures near the coastal city will be less than that of the second city. In practice, this means that the average air temperature of each particular day in a city located on the continent will differ more from the average value than in a city on the coast. In addition, the standard deviation makes it possible to estimate possible temperature deviations from the average with the required level of probability.

According to probability theory, in phenomena that obey the normal distribution law there is a strict relationship between the arithmetic mean, the standard deviation and the variants (the three sigma rule). For example, 68.3% of the values of a varying trait lie within M ± 1σ, 95.5% within M ± 2σ and 99.7% within M ± 3σ.
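These percentages follow from the normal distribution and can be reproduced with the error function from Python's standard library. The helper name `share_within` is illustrative only.

```python
import math

def share_within(k):
    """Percentage of a normal distribution lying within M ± k·σ."""
    return 100 * math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"M ± {k}σ: {share_within(k):.2f}%")
```

The value for 2σ comes out at about 95.45%, which the text rounds to 95.5%.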

The value of the standard deviation makes it possible to judge the nature of the homogeneity of the variation series and the group under study. If the value of the standard deviation is small, then this indicates a sufficiently high homogeneity of the phenomenon under study. The arithmetic mean in this case should be recognized as quite characteristic of this variational series. However, a too small sigma makes one think of an artificial selection of observations. With a very large sigma, the arithmetic mean characterizes the variation series to a lesser extent, which indicates a significant variability of the studied trait or phenomenon or the heterogeneity of the study group. However, comparison of the value of the standard deviation is possible only for signs of the same dimension. Indeed, if we compare the weight diversity of newborns and adults, we will always get higher sigma values ​​in adults.

The variability of traits of different dimensions can be compared using the coefficient of variation. It expresses diversity as a percentage of the mean, which allows different traits to be compared. The coefficient of variation is denoted "C" in the medical literature and "v" in the mathematical literature, and is calculated by the formula:

C = σ / M × 100%.

Values of the coefficient of variation below 10% indicate small scatter, from 10 to 20% medium scatter, and above 20% strong scatter around the arithmetic mean.
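A minimal Python sketch of the coefficient of variation and the thresholds quoted above. The respiratory rate data from the earlier example serves as illustrative input, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """C = sigma / M * 100%, with sigma computed by dividing by n."""
    return pstdev(values) / mean(values) * 100

def scatter_label(c):
    """Classify scatter using the 10% and 20% thresholds from the text."""
    if c < 10:
        return "small"
    if c <= 20:
        return "medium"
    return "strong"

rates = [20, 22, 19, 15, 16, 21, 17, 23, 18]
C = coefficient_of_variation(rates)
print(f"C = {C:.1f}% ({scatter_label(C)} scatter)")
```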

The arithmetic mean is usually calculated on the basis of sample data. With repeated studies under the influence of random phenomena, the arithmetic mean may change. This is due to the fact that, as a rule, only a part of the possible units of observation, that is, a sample population, is investigated. Information about all possible units representing the phenomenon under study can be obtained by studying the entire general population, which is not always possible. At the same time, in order to generalize the experimental data, the value of the average in the general population is of interest. Therefore, in order to formulate a general conclusion about the phenomenon under study, the results obtained on the basis of the sample population must be transferred to the general population by statistical methods.

In order to determine the degree of coincidence between the sample study and the general population, it is necessary to estimate the amount of error that inevitably arises during sample observation. Such an error is called representativeness error” or “Mean error of the arithmetic mean”. It is, in fact, the difference between the averages obtained from selective statistical observation and similar values ​​that would be obtained from a continuous study of the same object, i.e. when studying the general population. Since the sample mean is a random variable, such a forecast is made with an acceptable level of probability for the researcher. In medical research, it is at least 95%.

The representativeness error should not be confused with registration errors or errors of attention (slips of the pen, miscalculations, misprints, etc.), which should be minimized by an adequate methodology and appropriate tools in the experiment.

The magnitude of the error of representativeness depends on both the sample size and the variability of the trait. The larger the number of observations, the closer the sample to the general population and the smaller the error. The more variable the feature, the greater the statistical error.

In practice, the following formula is used to determine the representativeness error in variational series:

$$m = \frac{\sigma}{\sqrt{n}},$$

where: m – representativeness error;

σ – standard deviation;

n is the number of observations in the sample.

It can be seen from the formula that the size of the average error is directly proportional to the standard deviation, i.e., the variability of the trait under study, and inversely proportional to the square root of the number of observations.
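The two proportionalities can be checked numerically. The sketch below assumes the formula m = σ / √n with σ and n supplied directly; the function name and the numbers are illustrative only.

```python
import math

def representativeness_error(sigma, n):
    """Mean error of the arithmetic mean: m = sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive number of observations")
    return sigma / math.sqrt(n)

# Hypothetical standard deviation of 10 units.
m_25 = representativeness_error(10, 25)
m_100 = representativeness_error(10, 100)
print(m_25, m_100)

# Doubling sigma doubles the error; quadrupling n halves it.
m_double_sigma = representativeness_error(20, 25)
```

Note that halving the error requires four times as many observations, which is why large gains in precision are expensive.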

When performing statistical analysis based on the calculation of relative values, the construction of a variation series is not mandatory. In this case, the determination of the average error for relative indicators can be performed using a simplified formula:

$$m = \sqrt{\frac{P \cdot q}{n}},$$

where: P is the value of the relative indicator, expressed as a percentage, per mille, etc.;

q is the complement of P, expressed as (1 − P), (100 − P), (1000 − P), etc., depending on the base for which the indicator is calculated;

n is the number of observations in the sample.

However, this formula for the representativeness error of relative values can be applied only when the value of the indicator is less than its base. In some calculations of intensive indicators this condition is not met, and the indicator can exceed 100% or 1000‰. In such a situation, a variation series is constructed and the representativeness error is calculated using the formula for mean values, based on the standard deviation.
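A short sketch of the simplified calculation, assuming the formula m = √(P·q / n) with q taken as the base minus P. The function name, the default base of 100, and the example figures are hypothetical; the check on P reflects the restriction stated above.

```python
import math

def relative_indicator_error(p, n, base=100):
    """Mean error of a relative indicator: m = sqrt(P * q / n), q = base - P."""
    if p < 0 or p > base:
        # The simplified formula does not apply when the indicator exceeds its base.
        raise ValueError("P must lie between 0 and the base")
    q = base - p
    return math.sqrt(p * q / n)

# Hypothetical example: an indicator of 20% observed in a sample of 400 units.
m_percent = relative_indicator_error(20, 400)
print(f"P = 20%, m = {m_percent}%")

# The same indicator expressed per 1000 (200 per mille) uses base=1000.
m_per_mille = relative_indicator_error(200, 400, base=1000)
```

The error is returned in the same units as P, so the percent and per mille results differ by a factor of ten.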

The arithmetic mean of the general population is predicted as a pair of values, a minimum and a maximum. These extreme values of possible deviations, within which the desired mean of the general population can fluctuate, are called "confidence limits".

Probability theory shows that, for a normally distributed trait, the deviations of the mean will not exceed three representativeness errors (M ± 3m) with a probability of 99.7%, two representativeness errors (M ± 2m) with a probability of 95.5%, and one representativeness error (M ± 1m) with a probability of 68.3% (Fig. 9).

Fig. 9. Probability density of the normal distribution.

Note that the above statement is true only for a feature that obeys the normal Gaussian distribution law.
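The confidence limits M ± t·m and the quoted probabilities can be illustrated with the standard library. This is a sketch under the normality assumption just stated; the function names and the values of M and m are hypothetical, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist

# Probabilities quoted in the text for t = 1, 2 and 3 errors of the mean, in percent.
QUOTED_COVERAGE = {1: 68.3, 2: 95.5, 3: 99.7}

def confidence_limits(mean, m, t=2):
    """Return (lower, upper) confidence limits M - t*m and M + t*m."""
    return mean - t * m, mean + t * m

def normal_coverage(t):
    """Percentage of a normal distribution lying within t errors of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return (z.cdf(t) - z.cdf(-t)) * 100

# Hypothetical sample mean and representativeness error.
M, m = 124.5, 2.5
for t in (1, 2, 3):
    low, high = confidence_limits(M, m, t)
    print(f"t = {t}: {low} .. {high}  (about {normal_coverage(t):.2f}%)")
```

With t = 2 the limits correspond to the probability of at least 95% that is customary in medical research.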

Most experimental studies, including those in medicine, involve measurements whose results can take almost any value in a given interval; as a rule, they are therefore described by a model of continuous random variables. For this reason, most statistical methods deal with continuous distributions. One of these distributions, which plays a fundamental role in mathematical statistics, is the normal, or Gaussian, distribution.

This is due to a number of reasons.

1. First of all, many experimental observations can be successfully described using a normal distribution. It should be noted at once that no distribution of empirical data is exactly normal, since a normally distributed random variable ranges from −∞ to +∞, which never occurs in practice. However, the normal distribution is very often a good approximation.

Whenever weight, height, or other physiological parameters of the human body are measured, a very large number of random factors (natural causes and measurement errors) influence the results, and, as a rule, the effect of each factor is small. Experience shows that in such cases the results are distributed approximately normally.

2. Many distributions associated with a random sample approach the normal distribution as the sample size increases.

3. The normal distribution is well suited as an approximate description of other continuous distributions (for example, asymmetric ones).

4. The normal distribution has a number of favorable mathematical properties, which largely ensured its widespread use in statistics.

At the same time, it should be noted that medical data include many experimental distributions that cannot be described by the normal distribution model. For such data, statistics has developed methods that are commonly called "nonparametric".

The choice of a statistical method for processing the data of a particular experiment should depend on whether the data follow the normal distribution law. The hypothesis that a trait follows the normal distribution law is tested using a histogram of the frequency distribution (a graph), as well as a number of statistical criteria. Among them:

Asymmetry (skewness) criterion (b);

Kurtosis criterion (g);

Shapiro–Wilk criterion (W).

An analysis of the nature of the distribution of data (it is also called a test for the normality of the distribution) is carried out for each parameter. In order to confidently judge the correspondence of the parameter distribution to the normal law, a sufficiently large number of observation units (at least 30 values) is required.

For a normal distribution, the skewness and kurtosis criteria both equal 0. If b > 0, the distribution is shifted to the right (positive asymmetry); if b < 0, the distribution graph is shifted to the left (negative asymmetry). The kurtosis criterion checks the shape of the distribution curve: for the normal law g = 0; if g > 0, the curve is more sharply peaked, and if g < 0, the peak is flatter than that of the normal distribution.
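The signs of b and g can be illustrated on small data sets. The text does not give formulas for these criteria, so the sketch below assumes the simple moment-based estimators (third and fourth central moments divided by powers of the second, with 3 subtracted for kurtosis); the function name and the data are hypothetical.

```python
def skewness_and_kurtosis(values):
    """Return (b, g): moment-based skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (population form, divided by n).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    b = m3 / m2 ** 1.5      # 0 for a symmetric distribution
    g = m4 / m2 ** 2 - 3    # 0 for a normal distribution
    return b, g

symmetric = [1, 2, 3, 4, 5]       # no asymmetry expected
long_right_tail = [1, 1, 1, 2, 10]  # one large value stretches the right tail

b_sym, g_sym = skewness_and_kurtosis(symmetric)
b_right, g_right = skewness_and_kurtosis(long_right_tail)
print(f"symmetric: b = {b_sym:.2f}, g = {g_sym:.2f}")
print(f"long right tail: b = {b_right:.2f}, g = {g_right:.2f}")
```

Statistical packages often apply small-sample corrections to these estimators, so their output may differ slightly from this sketch.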

To test for normality using the Shapiro–Wilk test, the value of this criterion is found from statistical tables (Appendix 1) for the required significance level and the number of units of observation (degrees of freedom). The hypothesis of normality is rejected for small values of this criterion, as a rule for W < 0.8.

(definition of a variational series; components of a variational series; three forms of a variational series; expediency of constructing an interval series; conclusions that can be drawn from the constructed series)

A variational series is a sequence of all elements of a sample arranged in non-decreasing order; equal elements are repeated.

Variational series are distribution series built on a quantitative attribute.

Variational distribution series consist of two elements: variants and frequencies:

Variants are the numerical values of a quantitative trait in a variational distribution series. They can be positive or negative, absolute or relative. For example, when enterprises are grouped by the results of their economic activity, positive variants represent profit and negative variants represent loss.

Frequencies are the counts of individual variants or of each group of the variational series, i.e. numbers showing how often particular variants occur in the distribution series. The sum of all frequencies is called the volume of the population and equals the number of elements in the entire population.

Relative frequencies are frequencies expressed as relative values (fractions of one or percentages). The sum of the relative frequencies equals one or 100%. Replacing frequencies with relative frequencies makes it possible to compare variational series with different numbers of observations.

There are three forms of variation series: ranked series, discrete series and interval series.

A ranked series is the distribution of individual units of the population in ascending or descending order of the trait under study. Ranking makes it easy to divide quantitative data into groups, immediately detect the smallest and largest values ​​of a feature, and highlight the values ​​that are most often repeated.

Other forms of the variation series are group tables compiled according to the nature of the variation in the values of the trait under study. By the nature of the variation, discrete (discontinuous) and continuous attributes are distinguished.

A discrete series is a variational series built on attributes that change discontinuously (discrete attributes). Examples include the tariff category, the number of children in a family, the number of employees at an enterprise, etc. Such attributes can take only a finite number of definite values.

A discrete variational series is a table that consists of two columns. The first column indicates the specific value of the attribute, and the second - the number of population units with a specific value of the attribute.
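Such a two-column table can be built directly from raw observations. The sketch below also adds relative frequencies as a third column; the function name and the data (number of children in ten families) are hypothetical.

```python
from collections import Counter

def discrete_series(values):
    """Return rows (variant, frequency, relative frequency), ordered by variant."""
    counts = Counter(values)
    n = len(values)  # volume of the population
    return [(variant, freq, freq / n) for variant, freq in sorted(counts.items())]

# Hypothetical data: number of children in each of ten families.
children = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]
rows = discrete_series(children)

print("variant frequency    share")
for variant, freq, share in rows:
    print(f"{variant:>7} {freq:>9} {share:>8.0%}")
```

The frequencies add up to the number of observations and the shares add up to one, which is a convenient check on the table.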

If an attribute varies continuously (the amount of income, length of service, the value of an enterprise's fixed assets, etc., which can take any value within certain limits), an interval variation series must be built for it.



The group table here also has two columns. The first gives the value of the attribute as an interval "from - to" (variants), the second gives the number of units falling into the interval (frequency).

Frequency (repetition frequency) is the number of repetitions of a particular variant of the attribute; it is denoted f_i. The sum of the frequencies equals the volume of the studied population:

$$\sum_{i=1}^{k} f_i = n,$$

where k is the number of variants of the attribute and n is the volume of the population.

Very often, the table is supplemented with a column in which the accumulated frequencies S are calculated, which show how many units of the population have a feature value no greater than this value.
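Accumulated frequencies are running totals of the frequency column. A minimal sketch with hypothetical variants and frequencies:

```python
from itertools import accumulate

# Hypothetical discrete series: variants and their frequencies.
variants = [0, 1, 2, 3]
frequencies = [2, 3, 4, 1]

# S for each variant: how many units have a value no greater than that variant.
accumulated = list(accumulate(frequencies))

for variant, freq, s in zip(variants, frequencies, accumulated):
    print(f"{variant:>7} {freq:>9} {s:>11}")
```

The last accumulated frequency always equals the volume of the population.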

A discrete variational distribution series is a series in which groups are composed according to a trait that varies discretely and takes only integer values.

The interval variation series of distribution is a series in which the grouping attribute, which forms the basis of the grouping, can take any values ​​in a certain interval, including fractional ones.

An interval variational series is an ordered set of intervals of variation of the values of a random variable, together with the frequencies or relative frequencies of the values falling into each interval.

It is expedient to build an interval distribution series primarily when a trait varies continuously, and also when a discrete trait varies over a wide range, i.e. when the number of variants of the discrete trait is large.
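One simple way to build an interval series is to split the range of the data into equal intervals and count the values in each. The text does not prescribe how intervals are chosen, so equal widths and the number of intervals k are assumptions of this sketch; the function name and the data are hypothetical.

```python
def interval_series(values, k):
    """Group values into k equal-width intervals; return rows (low, high, frequency)."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are equal, intervals cannot be formed")
    width = (high - low) / k
    frequencies = [0] * k
    for x in values:
        # Intervals are closed on the left; the maximum goes into the last interval.
        index = min(int((x - low) / width), k - 1)
        frequencies[index] += 1
    return [(low + i * width, low + (i + 1) * width, f)
            for i, f in enumerate(frequencies)]

# Hypothetical continuous attribute, e.g. income in thousands of monetary units.
incomes = [12, 15, 18, 21, 22, 25, 27, 30, 31, 36]
series = interval_series(incomes, 4)

for lo_bound, hi_bound, freq in series:
    print(f"from {lo_bound:g} to {hi_bound:g}: {freq}")
```

A value lying exactly on a boundary is counted in the interval that starts at that boundary, so every unit is counted exactly once.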

Several conclusions can already be drawn from such a series. For example, the middle element of a variation series (the median) can serve as an estimate of the most probable result of a measurement. The first and last elements of the variational series (i.e., the minimum and maximum elements of the sample) show the spread of the elements of the sample. Sometimes, if the first or last element differs greatly from the rest of the sample, it is excluded from the measurement results on the assumption that it was obtained through some gross failure, for example an equipment malfunction.
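These first conclusions can be read off a ranked series with a few lines of Python. The readings below are hypothetical; deciding whether an extreme value is a gross error is left to the researcher and is not automated in this sketch.

```python
import statistics

# Hypothetical measurements, in the order they were recorded.
measurements = [130, 120, 115, 120, 140, 125, 120, 110, 135, 130]

ranked = sorted(measurements)         # the variational (ranked) series
minimum, maximum = ranked[0], ranked[-1]
spread = maximum - minimum            # range covered by the sample
median = statistics.median(ranked)    # mean of the two central values when n is even

print(f"ranked series: {ranked}")
print(f"min = {minimum}, max = {maximum}, spread = {spread}, median = {median}")
```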