The ratio of the standard deviation to the mean. Standard deviation

Variance. Standard deviation

Variance is the arithmetic mean of the squared deviations of each value of a characteristic from the overall mean. Depending on the source data, the variance can be unweighted (simple) or weighted.

The variance is calculated using the following formulas:

for ungrouped data: $\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$

for grouped data: $\sigma^2 = \frac{\sum (x - \bar{x})^2 f}{\sum f}$

The procedure for calculating the weighted variance:

1. determine the weighted arithmetic mean;

2. find the deviation of each variant from the mean;

3. square each deviation;

4. multiply the squared deviations by the weights (frequencies);

5. sum the products;

6. divide the resulting sum by the sum of the weights.
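The six steps above can be sketched in Python as follows. The function name `weighted_variance` and the small data set (values 1, 2, 3 with frequencies 1, 2, 1) are hypothetical, chosen only for illustration.

```python
def weighted_variance(values, weights):
    """Weighted (grouped-data) variance, following steps 1-6 above."""
    total_weight = sum(weights)
    # 1. weighted arithmetic mean
    mean = sum(x * f for x, f in zip(values, weights)) / total_weight
    # 2-3. deviation of each variant from the mean, squared
    squared_devs = [(x - mean) ** 2 for x in values]
    # 4-5. multiply the squared deviations by the weights and sum the products
    weighted_sum = sum(d2 * f for d2, f in zip(squared_devs, weights))
    # 6. divide by the sum of the weights
    return weighted_sum / total_weight


# Hypothetical example: values 1, 2, 3 occurring 1, 2 and 1 times
print(weighted_variance([1, 2, 3], [1, 2, 1]))  # 0.5
```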

The variance formula can be transformed into the following form:

- simple: $\sigma^2 = \overline{x^2} - \bar{x}^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$

The procedure for calculating the simple variance by this formula:

1. determine the arithmetic mean;

2. square the arithmetic mean;

3. square each value in the series;

4. find the sum of the squares;

5. divide the sum of the squares by the number of values, i.e., determine the mean square;

6. subtract the square of the mean from the mean square of the characteristic.
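As a minimal sketch, the shortcut procedure can be coded step by step and compared with the definition (mean of squared deviations). The function names and the sample data (the five values 2, 3, 4, 7, 9) are illustrative only.

```python
def variance_shortcut(values):
    """Simple variance as 'mean of the squares minus the square of the mean'."""
    n = len(values)
    mean = sum(values) / n                  # 1. arithmetic mean
    mean_squared = mean ** 2                # 2. square of the mean
    squares = [x ** 2 for x in values]      # 3. square each value
    sum_of_squares = sum(squares)           # 4. sum of the squares
    mean_of_squares = sum_of_squares / n    # 5. mean square
    return mean_of_squares - mean_squared   # 6. difference


def variance_definition(values):
    """Simple variance computed directly from squared deviations, for comparison."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


data = [2, 3, 4, 7, 9]
print(variance_shortcut(data), variance_definition(data))
```

Both functions give 6.8 for this data, up to floating-point rounding in the last digit.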

The formula for the weighted variance can be transformed in the same way:

$\sigma^2 = \frac{\sum x^2 f}{\sum f} - \left(\frac{\sum x f}{\sum f}\right)^2$,

i.e., the variance equals the difference between the mean of the squares of the values of the characteristic and the square of the arithmetic mean. The transformed formula eliminates the extra step of calculating the deviations of individual values from $\bar{x}$, and with it the calculation error caused by rounding those deviations.
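A sketch of the weighted shortcut formula, assuming the same hypothetical grouped data as a plain list of values and a list of weights (the function name is illustrative):

```python
def weighted_variance_shortcut(values, weights):
    """Weighted variance as the weighted mean of squares minus the squared weighted mean."""
    total_weight = sum(weights)
    mean = sum(x * f for x, f in zip(values, weights)) / total_weight
    mean_of_squares = sum(x * x * f for x, f in zip(values, weights)) / total_weight
    return mean_of_squares - mean ** 2


print(weighted_variance_shortcut([1, 2, 3], [1, 2, 1]))  # 0.5
```

The rounding advantage described above applies to hand calculation. In floating-point code, the shortcut subtracts two nearly equal numbers when the mean is large relative to the spread, which can lose precision; the deviation-based formula is the safer choice there.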

The variance has a number of properties, some of which make it easier to calculate:

1) the variance of a constant is zero;

2) if all values of the characteristic are reduced by the same number, the variance does not change;

3) if all values of the characteristic are reduced by the same factor k, the variance decreases by a factor of $k^2$.
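The three properties can be checked quickly with Python's standard `statistics.pvariance` (population variance, dividing by n). The shift c = 2 and factor k = 2 are arbitrary values chosen for illustration.

```python
from statistics import pvariance

data = [2, 3, 4, 7, 9]
c, k = 2, 2  # arbitrary shift and scale factor for the illustration

print(pvariance([5, 5, 5, 5]))              # property 1: a constant has variance 0
print(pvariance(data))                       # variance of the original data
print(pvariance([x - c for x in data]))      # property 2: shifting leaves it unchanged
print(pvariance([x / k for x in data]))      # property 3: dividing by k divides it by k**2
```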

The standard deviation S is the square root of the variance:

For ungrouped data:

$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$;

For a variation series:

$S = \sqrt{\frac{\sum (x - \bar{x})^2 f}{\sum f}}$

The range of variation, the mean linear deviation and the standard deviation are named quantities: they are expressed in the same units of measurement as the individual values of the characteristic.

Variance and standard deviation are the most widely used measures of variation, because they appear in most theorems of probability theory, which underlies mathematical statistics. In addition, the variance can be decomposed into its components, which makes it possible to assess the influence of the various factors that cause a characteristic to vary.

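The table shows the calculation of the variation indicators for banks grouped by profit (x is the midpoint of each profit interval, f the number of banks):

Profit, million rubles | Number of banks, f | Midpoint, x | xf | Deviation, x - x̄ | Absolute deviation × f | Squared deviation × f
3.7-4.6 | 2 | 4.15 | 8.30 | -1.935 | 3.870 | 7.489
4.6-5.5 | 4 | 5.05 | 20.20 | -1.035 | 4.140 | 4.285
5.5-6.4 | 6 | 5.95 | 35.70 | -0.135 | 0.810 | 0.109
6.4-7.3 | 5 | 6.85 | 34.25 | +0.765 | 3.825 | 2.926
7.3-8.2 | 3 | 7.75 | 23.25 | +1.665 | 4.995 | 8.317
Total | 20 | | 121.70 | | 17.640 | 23.126

Here $\bar{x} = \sum xf / \sum f = 121.70 / 20 = 6.085$ million rubles.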
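The bank example can be recomputed from the grouped data. In this sketch the numbers of banks are recovered as xf / x from the table (8.30 / 4.15 = 2, and so on); the variable names are illustrative.

```python
from math import sqrt

# Interval midpoints (million rubles) and numbers of banks (xf / x from the table)
midpoints = [4.15, 5.05, 5.95, 6.85, 7.75]
banks = [2, 4, 6, 5, 3]

n = sum(banks)                                                   # 20 banks
mean = sum(x * f for x, f in zip(midpoints, banks)) / n          # weighted mean
mean_linear_dev = sum(abs(x - mean) * f for x, f in zip(midpoints, banks)) / n
variance = sum((x - mean) ** 2 * f for x, f in zip(midpoints, banks)) / n
std_dev = sqrt(variance)

print(f"mean = {mean:.3f}, d = {mean_linear_dev:.3f}, S = {std_dev:.3f}")
```

The printed values match the text: mean 6.085, d = 0.882 and S = 1.075 million rubles.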

The mean linear deviation and the standard deviation show how much the value of the characteristic fluctuates, on average, across the units of the population studied. In this case, the average fluctuation in profit is 0.882 million rubles by the mean linear deviation and 1.075 million rubles by the standard deviation. The standard deviation is never smaller than the mean linear deviation. If the distribution of the characteristic is close to normal, the two are related by $S \approx 1.25\bar{d}$, or $\bar{d} \approx 0.8S$. The standard deviation shows how the bulk of the population units are located relative to the arithmetic mean: regardless of the shape of the distribution, at least 75% of the values fall within the interval $\bar{x} \pm 2S$, and at least 89% within $\bar{x} \pm 3S$ (P. L. Chebyshev's theorem).
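Chebyshev's bound (at least $1 - 1/k^2$ of the values within $\bar{x} \pm kS$) and the inequality $S \ge \bar{d}$ can be checked numerically. This sketch uses a hypothetical, deliberately skewed data set and the population standard deviation; the helper names are illustrative.

```python
from statistics import fmean, pstdev


def share_within(values, k):
    """Share of values inside the interval mean ± k·S (population S)."""
    m, s = fmean(values), pstdev(values)
    return sum(1 for x in values if abs(x - m) <= k * s) / len(values)


def mean_linear_deviation(values):
    """Arithmetic mean of the absolute deviations from the mean."""
    m = fmean(values)
    return fmean([abs(x - m) for x in values])


# Hypothetical, strongly right-skewed data
skewed = [1, 1, 1, 1, 2, 2, 3, 5, 8, 20]

for k in (2, 3):
    bound = 1 - 1 / k ** 2  # Chebyshev minimum: 0.75 for k=2, about 0.889 for k=3
    print(k, share_within(skewed, k), ">=", round(bound, 3))

print(pstdev(skewed) >= mean_linear_deviation(skewed))  # S is never below d
```

For this data, 90% of the values lie within ±2S and all of them within ±3S, both above the Chebyshev minimums.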

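The simple geometric mean is calculated by the formula:

$\bar{x}_{geom} = \sqrt[n]{x_1 x_2 \cdots x_n}$

Geometric weighted

The weighted geometric mean is calculated by the formula:

$\bar{x}_{geom} = \sqrt[\sum f]{x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}}$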
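A short sketch of both geometric means: the simple one uses Python's standard `statistics.geometric_mean` (Python 3.8+), and the weighted one is a hypothetical helper that evaluates the formula through logarithms to avoid overflow in long products. All values must be positive.

```python
import math
from statistics import geometric_mean


def weighted_geometric_mean(values, weights):
    """(x1^f1 * x2^f2 * ... * xn^fn) ** (1 / sum(f)), computed via logarithms."""
    total_weight = sum(weights)
    log_sum = sum(f * math.log(x) for x, f in zip(values, weights))
    return math.exp(log_sum / total_weight)


print(geometric_mean([2, 8]))                   # simple: sqrt(2 * 8)
print(weighted_geometric_mean([1, 8], [2, 1]))  # weighted: (1 * 1 * 8) ** (1 / 3)
```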

The quadratic mean (root mean square) is used to determine average diameters of wheels and pipes, average sides of squares, and similar quantities.

Root-mean-square values are used to calculate some indicators, such as the coefficient of variation that characterizes the rhythmicity of output. Here, the standard deviation from the planned output over a given period is determined by the following formula:

Such values accurately characterize how economic indicators change relative to their base level, taken as its average value.

Quadratic simple

The simple quadratic mean is calculated by the formula:

$\bar{x}_{quad} = \sqrt{\frac{\sum x^2}{n}}$

Quadratic weighted

The weighted quadratic mean is:

$\bar{x}_{quad} = \sqrt{\frac{\sum x^2 f}{\sum f}}$
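A minimal sketch of both quadratic means; the function names and the example (squares with sides of 1 cm and 7 cm) are hypothetical. For squares, the quadratic mean of the sides is the side of a square whose area equals the average area: (1 + 49) / 2 = 25, so the average side is 5 cm.

```python
import math


def rms(values):
    """Simple quadratic mean: sqrt(sum(x^2) / n)."""
    return math.sqrt(sum(x * x for x in values) / len(values))


def weighted_rms(values, weights):
    """Weighted quadratic mean: sqrt(sum(x^2 * f) / sum(f))."""
    return math.sqrt(sum(x * x * f for x, f in zip(values, weights)) / sum(weights))


print(rms([1, 7]))                   # 5.0
print(weighted_rms([1, 7], [3, 1]))  # sqrt(13)
```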

Absolute measures of variation include:

- the range of variation;
- the mean linear deviation;
- the variance;
- the standard deviation.

Range of variation (R)

The range of variation is the difference between the maximum and minimum values of the characteristic: $R = x_{max} - x_{min}$.

It shows the limits within which the value of the characteristic varies in the population studied.

Example: the work experience of five applicants in their previous job is 2, 3, 4, 7 and 9 years. Solution: range of variation R = 9 - 2 = 7 years.

To characterize the differences between the values of the characteristic as a whole, average measures of variation are calculated from the deviations of the values from the arithmetic mean, the deviation being the difference $x - \bar{x}$.

Because the sum of the deviations of the values from the mean is always zero (the zero property of the mean), one must either ignore the signs of the deviations, i.e., take their absolute values, or square them.

Mean linear deviation and standard deviation

The mean linear deviation is the arithmetic mean of the absolute deviations of the individual values of the characteristic from the mean.

The simple mean linear deviation:

$\bar{d} = \frac{\sum |x - \bar{x}|}{n}$

Example: the work experience of five applicants in their previous job is 2, 3, 4, 7 and 9 years.

In our example: $\bar{x} = (2 + 3 + 4 + 7 + 9) / 5 = 5$ years; $\bar{d} = (3 + 2 + 1 + 2 + 4) / 5 = 12 / 5 = 2.4$ years.

Answer: 2.4 years.
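The applicants example can be reproduced in a few lines. The sketch also shows the zero property of the mean mentioned above: the raw deviations sum to zero, which is why absolute values are taken.

```python
experience = [2, 3, 4, 7, 9]  # years of work experience of the five applicants

r = max(experience) - min(experience)         # range of variation
mean = sum(experience) / len(experience)      # arithmetic mean
deviations = [x - mean for x in experience]
print(sum(deviations))                        # zero property of the mean: 0.0
d = sum(abs(dev) for dev in deviations) / len(experience)  # mean linear deviation
print(r, mean, d)                             # 7 5.0 2.4
```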

The weighted mean linear deviation is applied to grouped data:

$\bar{d} = \frac{\sum |x - \bar{x}| f}{\sum f}$

Because of its conventional nature, the mean linear deviation is used relatively rarely in practice (in particular, to characterize how uniformly deliveries are made under contractual obligations, and in product-quality analysis that takes the technological features of production into account).

Standard deviation

The most complete characteristic of variation is the standard deviation, also called the root-mean-square deviation (σ). It equals the square root of the mean of the squared deviations of the individual values of the characteristic from the arithmetic mean:

The simple standard deviation:

$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

The weighted standard deviation is applied to grouped data:

$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2 f}{\sum f}}$

For a normal distribution, the standard deviation and the mean linear deviation are related by $\sigma \approx 1.25\,\bar{d}$.
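The coefficient 1.25 comes from the normal law, for which $\sigma / \bar{d} = \sqrt{\pi / 2} \approx 1.2533$. A quick simulation sketch, assuming normally distributed data generated with Python's `random.gauss` and a fixed seed, reproduces the ratio approximately:

```python
import math
import random
from statistics import fmean

random.seed(1)  # fixed seed so the run is reproducible
sample = [random.gauss(0.0, 1.0) for _ in range(100_000)]

m = fmean(sample)
d = fmean([abs(x - m) for x in sample])               # mean linear deviation
s = math.sqrt(fmean([(x - m) ** 2 for x in sample]))  # standard deviation (divides by n)

print(round(s / d, 3))  # close to sqrt(pi / 2) ≈ 1.2533 for normal data
```

For strongly skewed or heavy-tailed data the ratio differs noticeably from 1.25, so the relation should only be used when the distribution is close to normal.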

As the main absolute measure of variation, the standard deviation is used to determine the ordinates of the normal distribution curve, in calculations related to organizing sample surveys and establishing the accuracy of sample characteristics, and in assessing the limits of variation of a characteristic in a homogeneous population.

Standard deviation is one of those statistical terms in the corporate world that raises the standing of people who manage to slip it into a conversation or presentation, while leaving a vague sense of confusion in those who don't know what it means but are embarrassed to ask. In fact, most managers don't understand the concept of standard deviation, and if you're one of them, it's time to stop living a lie. In this article, I'll show you how this underrated statistic can help you better understand the data you're working with.

What does standard deviation measure?

Imagine that you own two stores. To avoid losses, you need tight control over stock levels. To find out which manager handles stock best, you decide to analyze the stock over the past six weeks. The average weekly value of the stock in both stores is about the same, roughly 32 conventional units. At first glance, the average suggests that both managers perform equally well.

But if you look more closely at the second store, you can see that although the average is similar, the stock varies widely (from 10 to 58 conventional units). So the mean alone does not always describe the data well. This is where the standard deviation comes in.

The standard deviation shows how the values in our sample are spread around the mean. In other words, it tells you how much the stock varies from week to week.

In our example, we used Excel's STDEV function to calculate the standard deviation alongside the mean.

For the first manager, the standard deviation was 2. This tells us that the values in the sample deviate from the mean by about 2 on average. Is that good? Look at it from another angle: a standard deviation of 0 would mean that every value in the sample equals the mean (32.2 in our case). A standard deviation of 2 is not much different from 0, which indicates that most values are close to the mean. The closer the standard deviation is to 0, the more representative the mean; a standard deviation close to 0 also indicates little variability in the data. In other words, a stock standard deviation of 2 points to the first manager's remarkable consistency.

For the second store, the standard deviation was 18.9, meaning that the stock value deviates from the mean by about 18.9 from week to week. That is a huge spread! The further the standard deviation is from 0, the less representative the mean. In our case, the figure of 18.9 indicates that the average value (32.8 conventional units per week) simply cannot be trusted. It also tells us that the weekly stock is highly variable.

That is the concept of standard deviation in a nutshell. Although it says nothing about other important statistics (mode, median…), the standard deviation plays a crucial role in most statistical calculations. Understanding how it works will shed light on many of the processes in your business.

How to calculate standard deviation?

So now we know what the standard deviation tells us. Let's see how it is calculated.

Consider the data set from 10 to 70 in steps of 10. I have already calculated its standard deviation with the STDEV function in cell H2 (highlighted in orange).

Below are the steps Excel takes to arrive at 21.6.

Note that the calculations are shown step by step only for clarity; in Excel, the result is computed instantly and all the intermediate steps stay behind the scenes.

Excel first finds the sample mean, which in our case is 40, and subtracts it from each sample value. Each resulting difference is squared, and the squares are summed. The sum, 2800, is divided by the number of sample elements minus 1; since we have 7 elements, we divide 2800 by 6. The square root of the result is the standard deviation: $\sqrt{2800 / 6} \approx 21.6$.

For those who find the visual walkthrough unclear, here is the mathematical definition of this value:

$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
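The same steps can be reproduced in Python. This sketch follows the procedure described above and cross-checks the result against `statistics.stdev`, the standard-library sample standard deviation; treating it as the counterpart of Excel's STDEV is an analogy, not a claim about Excel's internals.

```python
import math
import statistics

data = list(range(10, 71, 10))  # 10, 20, ..., 70 (7 values)

mean = sum(data) / len(data)                    # step 1: mean = 40
squared = [(x - mean) ** 2 for x in data]       # steps 2-3: deviations, squared
total = sum(squared)                            # step 4: sum = 2800
sample_sd = math.sqrt(total / (len(data) - 1))  # steps 5-6: divide by n - 1, take the root

print(round(sample_sd, 1))                              # 21.6
print(math.isclose(sample_sd, statistics.stdev(data)))  # True
```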

Standard deviation calculation functions in Excel

Excel has several standard deviation functions. Just type =STDEV and you will see the list.

Note that the functions STDEV.S and STDEV.P duplicate the functions STDEV and STDEVP, respectively, which were kept for compatibility with earlier versions of Excel.

The suffixes .S and .P indicate whether the standard deviation is calculated for a sample or for a whole population. I explained the difference between these two in a previous article.

A special feature of the STDEVA and STDEVPA functions is that they take logical and text values into account when calculating the standard deviation of an array: TRUE counts as 1, while text and FALSE count as 0. It's hard for me to imagine a situation where I would need these two functions, so I think they can be ignored.
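For readers working outside Excel, Python's standard `statistics` module offers analogous functions: `stdev` divides by n - 1 (like STDEV.S) and `pstdev` divides by n (like STDEV.P). The helper `stdeva_style_values` is hypothetical; it only mimics the coercion rule described above for values in referenced cells and is not part of any library.

```python
import statistics

data = [10, 20, 30, 40, 50, 60, 70]

# Sample standard deviation (divides by n - 1), the analogue of STDEV.S / STDEV
print(statistics.stdev(data))
# Population standard deviation (divides by n), the analogue of STDEV.P / STDEVP
print(statistics.pstdev(data))  # 20.0


def stdeva_style_values(cells):
    """Hypothetical helper mimicking the STDEVA/STDEVPA rule:
    TRUE -> 1, FALSE -> 0, text -> 0, numbers unchanged."""
    result = []
    for cell in cells:
        if isinstance(cell, bool):  # check bool first: bool is a subclass of int
            result.append(1 if cell else 0)
        elif isinstance(cell, (int, float)):
            result.append(cell)
        else:  # text counts as 0; this sketch treats any other non-number the same way
            result.append(0)
    return result


print(stdeva_style_values([10, True, "n/a", False, 30]))  # [10, 1, 0, 0, 30]
```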

Values obtained from experiment inevitably contain errors for a variety of reasons. Among them, systematic and random errors should be distinguished. Systematic errors arise from causes that act in a definite way, and they can always be eliminated or accounted for with sufficient accuracy. Random errors are caused by a very large number of individual causes that cannot be accurately accounted for and that act differently in each measurement. These errors cannot be ruled out completely; they can only be accounted for on average, which requires knowing the laws that random errors obey.

Let A denote the measured quantity and x the random error of a measurement. Since the error x can take any value, it is a continuous random variable, fully characterized by its distribution law.

The simplest law, and the one that reflects reality most accurately in the vast majority of cases, is the so-called normal distribution of errors:

$\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{x^2}{2\sigma^2}}$

This distribution law can be derived from various theoretical premises, in particular from the requirement that the most probable value of an unknown quantity, for which a series of values of equal accuracy has been obtained by direct measurement, be the arithmetic mean of these values. The quantity σ² is called the variance of this normal law.

Arithmetic mean

Estimating the variance from experimental data. If n values $a_i$ of some quantity A are obtained by direct measurement with the same degree of accuracy, and the errors of A follow the normal distribution law, then the most probable value of A is the arithmetic mean:

$\bar{a} = \frac{1}{n} \sum_{i=1}^{n} a_i$, where

$\bar{a}$ is the arithmetic mean,

$a_i$ is the value measured in the i-th measurement.

The deviation of the observed value $a_i$ (for each observation) from the arithmetic mean is $a_i - \bar{a}$.

To determine the variance of the normal distribution of errors in this case, the following formula is used:

σ² is the variance,
$\bar{a}$ is the arithmetic mean,
n is the number of measurements of the parameter.

Standard deviation

The standard deviation shows the absolute deviation of the measured values from the arithmetic mean. In accordance with the formula for the accuracy measure of a linear combination, the root-mean-square error of the arithmetic mean is determined by the formula:

, where

$\bar{a}$ is the arithmetic mean,
n is the number of measurements of the parameter,
$a_i$ is the value measured in the i-th measurement.

The coefficient of variation

The coefficient of variation characterizes the relative degree of deviation of the measured values from the arithmetic mean:

$V = \frac{\sigma}{\bar{a}} \cdot 100\%$, where

V is the coefficient of variation,
σ is the standard deviation,
$\bar{a}$ is the arithmetic mean.

The greater the coefficient of variation, the greater the relative scatter and the lower the uniformity of the values studied. If the coefficient of variation is below 10%, the variability of the series is considered insignificant; from 10% to 20%, average; above 20% and below 33%, significant; and if it exceeds 33%, this indicates that the data are heterogeneous and that the largest and smallest values need to be excluded.
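A minimal sketch of the coefficient of variation and the verbal scale above. Two choices here are assumptions, not taken from the text: the population standard deviation is used (swap in `statistics.stdev` for the sample version), and the boundary values 20% and 33% are assigned to the lower class. The readings are hypothetical.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(values):
    """V = sigma / mean * 100%, using the population standard deviation (assumption)."""
    return pstdev(values) / fmean(values) * 100


def classify_variation(v):
    """Verbal scale from the text; boundary values go to the lower class (assumption)."""
    if v < 10:
        return "insignificant"
    if v <= 20:
        return "average"
    if v <= 33:
        return "significant"
    return "heterogeneous"


readings = [10, 11, 9, 10]  # hypothetical measurements
v = coefficient_of_variation(readings)
print(round(v, 2), classify_variation(v))  # 7.07 insignificant
```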

Average linear deviation

One of the indicators of the range and intensity of variation is the mean linear deviation (the mean absolute deviation) from the arithmetic mean. It is calculated by the formula:

$\bar{d} = \frac{\sum_{i=1}^{n} |a_i - \bar{a}|}{n}$, where

$\bar{d}$ is the mean linear deviation,
$\bar{a}$ is the arithmetic mean,
n is the number of measurements of the parameter,
$a_i$ is the value measured in the i-th measurement.

To check whether the values studied follow the normal distribution law, the ratio of the asymmetry (skewness) index to its error and the ratio of the kurtosis index to its error are used.

Asymmetry index

The asymmetry index (A) and its error ($m_A$) are calculated using the following formulas:

, where

A is the asymmetry index,
σ is the standard deviation,
$\bar{a}$ is the arithmetic mean,
n is the number of measurements of the parameter,
$a_i$ is the value measured in the i-th measurement.

Kurtosis indicator

The kurtosis index (E) and its error ($m_E$) are calculated using the following formulas:
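As an illustration only, here is a sketch of the moment-based asymmetry index $A = m_3 / \sigma^3$ and excess kurtosis $E = m_4 / \sigma^4 - 3$ (where $m_k$ is the k-th central moment), together with the common large-sample error approximations $\sqrt{6/n}$ and $\sqrt{24/n}$. These error formulas are a swapped-in approximation and are not necessarily the exact ones intended by the text; the function name is hypothetical.

```python
import math
from statistics import fmean


def skewness_kurtosis(values):
    """Moment-based asymmetry index A and excess kurtosis E, plus the common
    large-sample errors sqrt(6/n) and sqrt(24/n) (an approximation)."""
    n = len(values)
    m = fmean(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment (sigma^2)
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    a = m3 / m2 ** 1.5                          # A = m3 / sigma^3
    e = m4 / m2 ** 2 - 3                        # E = m4 / sigma^4 - 3
    return a, math.sqrt(6 / n), e, math.sqrt(24 / n)


a, m_a, e, m_e = skewness_kurtosis([1, 2, 3, 4, 5])
print(f"A = {a:.3f} (A/m_A = {a / m_a:.2f}), E = {e:.3f} (E/m_E = {e / m_e:.2f})")
```

The ratios A / m_A and E / m_E are the quantities compared against a threshold when judging closeness to normality; the threshold itself is not specified here.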

Standard deviation is a classic measure of variability in descriptive statistics.

Standard deviation (also called the root-mean-square deviation, RMS deviation or sample standard deviation; abbreviated STD or StDev) is a very common measure of dispersion in descriptive statistics. Since technical analysis is akin to statistics, this indicator can (and should) be used in technical analysis to gauge how widely the price of the analyzed instrument is dispersed over time. It is denoted by the Greek letter sigma, σ.

We owe the standard deviation to Carl Friedrich Gauss and Karl Pearson.

When we use the standard deviation in technical analysis, we turn this "dispersion index" into a "volatility indicator", keeping the meaning but changing the terminology.

What is Standard Deviation

Beyond its role in intermediate calculations, the standard deviation is perfectly suitable as a stand-alone indicator in technical analysis. As an active reader of our magazine, burdock, remarked: "I still don't understand why the standard deviation is not included in the standard set of indicators offered by domestic dealing centers."

Indeed, the standard deviation measures an instrument's variability in a classic, "pure" way. Unfortunately, this indicator is not very common in securities analysis.

Applying the Standard Deviation

Calculating the standard deviation by hand is not very exciting, but it is good practice. The standard deviation can be expressed by the formula $STD = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$, which reads: the square root of the sum of the squared differences between the sample items and the mean, divided by the number of items in the sample.

For small samples (fewer than about 30 elements), the denominator of the fraction under the root is taken as n - 1; for larger samples the difference between n and n - 1 becomes negligible, so n is often used.

Step-by-step standard deviation calculation:

  1. calculate the arithmetic mean of the data sample
  2. subtract this mean from each element of the sample
  3. square all the resulting differences
  4. sum all the resulting squares
  5. divide the resulting sum by the number of elements in the sample (or by n - 1 for a small sample, n < 30)
  6. take the square root of the resulting quotient (the quotient itself is called the variance)
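The six steps can be sketched in Python and then applied over a sliding window of closing prices, which turns the standard deviation into a simple volatility indicator. The window length (`period`), the use of n rather than n - 1 inside the window, and the price series are all assumptions made for this illustration; trading platforms may define their indicator differently.

```python
import math


def std_dev(values, sample=False):
    """Steps 1-6 above; sample=True divides by n - 1 instead of n."""
    n = len(values)
    mean = sum(values) / n                       # 1. arithmetic mean
    squares = [(x - mean) ** 2 for x in values]  # 2-3. differences, squared
    total = sum(squares)                         # 4. sum of the squares
    variance = total / (n - 1 if sample else n)  # 5. divide -> variance
    return math.sqrt(variance)                   # 6. square root


def rolling_std(prices, period):
    """Standard deviation over a sliding window of `period` prices,
    one value per complete window (hypothetical volatility indicator)."""
    return [std_dev(prices[i - period + 1 : i + 1])
            for i in range(period - 1, len(prices))]


closes = [101.0, 102.5, 101.8, 103.2, 104.0, 103.1, 105.4]  # hypothetical closing prices
print([round(s, 3) for s in rolling_std(closes, period=5)])
```

A rising indicator value means prices within the window are spreading further from their average, i.e., volatility is increasing.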