Calculate the confidence interval. Quantitative Analysis Methods: Estimating Confidence Intervals

The confidence interval comes from the field of statistics. It is a specific range that serves to estimate an unknown parameter with a high degree of reliability. The easiest way to explain this is with an example.

Suppose you need to investigate some random variable, for example, the time a server takes to respond to a client request. Each time a user types in the address of a particular site, the server responds in a different amount of time. Thus, the response time under investigation is random. A confidence interval lets you determine the boundaries of this parameter, and then it is possible to assert that, with a probability of 95%, the server response time will be in the range we calculated.

Or suppose you need to find out how many people know about a company's trademark. Once the confidence interval is calculated, it is possible to say, for example, that with a 95% probability the share of consumers who know about it is in the range from 27% to 34%.

Closely related to this term is the confidence level. It represents the probability that the desired parameter falls within the confidence interval. This value determines how wide the range will be. The greater the confidence level, the wider the confidence interval becomes, and vice versa. Usually it is set to 90%, 95% or 99%. The value of 95% is the most popular.

The width of the interval is also influenced by the variance of the observations, and its definition is based on the assumption that the feature under study obeys the normal distribution. This assumption is also known as Gauss' law: the probabilities of a continuous random variable follow a distribution that can be described by the normal probability density. If the assumption of a normal distribution turns out to be erroneous, the estimate may be incorrect.

First, let's figure out how to calculate the confidence interval for the mathematical expectation (the mean). Here, two cases are possible: the variance (the degree of spread of a random variable) may or may not be known. If it is known, the confidence interval is calculated using the following formula:

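x̄ - t*σ/sqrt(n) ≤ α ≤ x̄ + t*σ/sqrt(n), where

x̄ is the sample mean,

α is the estimated characteristic (the true mean),

t is a parameter from the table of the Laplace function (the standard normal distribution),

n is the sample size,

σ is the square root of the variance (the standard deviation).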
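As a minimal sketch of the known-variance case, the formula above can be written in Python. The function name `mean_ci_known_sigma` and the input numbers are illustrative assumptions, not part of the original text; the multiplier t is taken from the standard normal distribution, which is what the Laplace function table provides.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_known_sigma(x_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the standard deviation is known."""
    # Two-sided multiplier: the standard normal quantile at (1 + confidence) / 2.
    t = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = t * sigma / sqrt(n)
    return x_mean - margin, x_mean + margin


# Hypothetical inputs chosen only for illustration.
low, high = mean_ci_known_sigma(x_mean=50.0, sigma=10.0, n=25, confidence=0.95)
print(f"{low:.2f} .. {high:.2f}")
```

Raising `confidence` increases the multiplier and therefore widens the interval.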

If the variance is unknown, it can be calculated when all the values of the feature under study are known. For this, the following formula is used:

σ² = mean(x²) - (x̄)², where

mean(x²) is the average value of the squares of the trait under study,

(x̄)² is the square of the average value of this trait.
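The variance formula can be checked with a few lines of Python. This is only a sketch: the helper name and the data are hypothetical, and the formula as written divides by n, so it yields the population variance rather than the unbiased sample variance.

```python
def variance_from_moments(values):
    """Variance as the mean of squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(v * v for v in values) / n
    return mean_of_squares - mean ** 2


# Hypothetical measurements, used only to illustrate the formula.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_from_moments(data))  # 4.0
```

The square root of this value is the σ (or s) used in the interval formulas.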

The formula by which the confidence interval is calculated in this case changes slightly:

x̄ - t*s/sqrt(n) ≤ α ≤ x̄ + t*s/sqrt(n), where

x̄ is the sample mean,

α is the estimated characteristic (the true mean),

t is a parameter found using the Student's distribution table, t = t(γ; n-1),

sqrt(n) is the square root of the total sample size,

s is the square root of the sample variance.

Consider this example. Assume that, based on the results of 7 measurements, the mean of the trait under study was determined to be 30 and the sample variance to be 36. It is necessary to find, with a probability of 99%, a confidence interval that contains the true value of the measured parameter.

First, let's determine what t is equal to: t = t(0.99; 7-1) = 3.71. Since the sample variance is 36, s = sqrt(36) = 6. Using the above formula, we get:

x̄ - t*s/sqrt(n) ≤ α ≤ x̄ + t*s/sqrt(n)

30 - 3.71*6/sqrt(7) ≤ α ≤ 30 + 3.71*6/sqrt(7)

21.587 ≤ α ≤ 38.413
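The arithmetic of this example can be reproduced in Python. The sketch below takes the table value t = 3.71 as given rather than computing it, and uses the standard deviation sqrt(36) = 6, not the variance itself, in the formula.

```python
from math import sqrt

n = 7            # number of measurements
x_mean = 30.0    # sample mean
variance = 36.0  # sample variance
t = 3.71         # t(0.99; n-1) taken from the Student's table, as in the text

s = sqrt(variance)          # standard deviation: 6.0
margin = t * s / sqrt(n)    # half-width of the interval
low, high = x_mean - margin, x_mean + margin
print(f"{low:.3f} .. {high:.3f}")  # 21.587 .. 38.413
```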

The confidence interval for the variance is calculated both when the mean is known and when there is no data on the mathematical expectation and only the value of the unbiased point estimate of the variance is known. We will not give the formulas for its calculation here, since they are quite complex and, if desired, can always be found online.

We only note that it is convenient to determine the confidence interval using Excel or an online confidence interval calculator.

One of the methods for solving statistical problems is the calculation of the confidence interval. It is used as a preferred alternative to point estimation when the sample size is small. Calculating the confidence interval by hand is rather complicated, but Excel's tools simplify it somewhat. Let's find out how this is done in practice.

This method is used in the interval estimation of various statistical quantities. The main task of this calculation is to get rid of the uncertainty of the point estimate.

In Excel, there are two main options for this calculation: when the variance is known and when it is unknown. In the first case, the CONFIDENCE.NORM function is used, and in the second, CONFIDENCE.T.

Method 1: CONFIDENCE.NORM function

The CONFIDENCE.NORM operator, which belongs to the statistical group of functions, first appeared in Excel 2010. Earlier versions of the program use its counterpart, CONFIDENCE. The task of this operator is to calculate a confidence interval for the population mean using the normal distribution.

Its syntax is as follows:

CONFIDENCE.NORM(alpha, standard_dev, size)

"Alpha" is an argument indicating the significance level that is used to calculate the confidence level. The confidence level is equal to the following expression:

(1-"Alpha")*100

"Standard_dev" is an argument whose meaning is clear from the name: it is the standard deviation of the sample in question.

"Size" is an argument that determines the size of the sample.

All arguments to this operator are required.

The CONFIDENCE function has exactly the same arguments and capabilities as the previous one. Its syntax is:

CONFIDENCE(alpha, standard_dev, size)

As you can see, the only difference is the name of the operator. This function has been retained in Excel 2010 and newer versions in the special "Compatibility" category. In Excel 2007 and earlier, it is present in the main group of statistical operators.

The confidence interval boundaries are determined using a formula of the following form:

X ± CONFIDENCE.NORM

where X is the sample mean, which lies in the middle of the interval.

Now let's look at how to calculate the confidence interval using a specific example. 12 tests were carried out, producing different results, which are listed in a table. This is our population. The standard deviation is 8. We need to calculate the confidence interval at the 97% confidence level.

  1. Select the cell where the result will be displayed. Click the "Insert Function" button.
  2. The Function Wizard appears. Go to the "Statistical" category and highlight the name "CONFIDENCE.NORM". Then click OK.
  3. The arguments window opens. Its fields correspond to the names of the arguments.
    Set the cursor in the first field, "Alpha". Here we should specify the significance level. As we remember, our confidence level is 97%. The significance level is derived from it in this way:

    (100-confidence level)/100

    That is, by substituting the value, we get:

    (100-97)/100

    By simple calculations, we find that the "Alpha" argument equals 0.03. Enter this value in the field.

    As we know, the standard deviation is 8. Therefore, simply enter that number in the "Standard_dev" field.

    In the "Size" field, you need to enter the number of tests performed. As we remember, there are 12. But to automate the formula and avoid editing it every time a new test is performed, let's set this value not as a plain number but using the COUNT operator. So, set the cursor in the "Size" field, and then click the triangle located to the left of the formula bar.

    A list of recently used functions appears. If you have used the COUNT operator recently, it should be on this list. In that case, just click its name. Otherwise, go to the item "More Functions...".

  4. The already familiar Function Wizard appears. Go back to the "Statistical" group, select the name "COUNT" there, and click OK.
  5. The argument window for this operator appears. The function counts the cells in the specified range that contain numeric values. Its syntax is the following:

    COUNT(value1, value2,…)

    The "Value" argument group is a reference to the range in which you want to count the cells filled with numeric data. There can be up to 255 such arguments in total, but in our case we need only one.

    Set the cursor in the "Value1" field and, holding down the left mouse button, select the range on the sheet that contains our population. Its address is then displayed in the field. Click OK.

  6. After that, the application performs the calculation and displays the result in the cell containing the formula. In our particular case, the formula turned out like this:

    =CONFIDENCE.NORM(0.03,8,COUNT(B2:B13))

    The overall result of the calculation was 5.011609.

  7. But that is not all. As we remember, the boundaries of the confidence interval are calculated by adding the result of CONFIDENCE.NORM to the sample mean and subtracting it from the sample mean. This gives the right and left boundaries of the confidence interval, respectively. The sample mean itself can be calculated using the AVERAGE operator.

    This operator is designed to calculate the arithmetic mean of the selected range of numbers. It has the following rather simple syntax:

    AVERAGE(number1, number2,…)

    The "Number" argument can be either a single numeric value or a reference to cells or even entire ranges that contain numbers.

    So, select the cell in which the average value will be displayed, and click the "Insert Function" button.

  8. The Function Wizard opens. Go back to the "Statistical" category and select the name "AVERAGE" from the list. As always, click OK.
  9. The arguments window opens. Set the cursor in the "Number1" field and, with the left mouse button held down, select the entire range of values. After the coordinates are displayed in the field, click OK.
  10. After that, AVERAGE outputs the result of the calculation to the sheet cell.
  11. Now we calculate the right boundary of the confidence interval. To do this, select a separate cell, type the «=» sign and add together the contents of the cells holding the results of the AVERAGE and CONFIDENCE.NORM functions. To perform the calculation, press Enter. In our case, we got the following result:

    Calculation result: 6.953276

  12. In the same way, we calculate the left boundary of the confidence interval, only this time we subtract the result of CONFIDENCE.NORM from the result of AVERAGE. For our example, we get:

    Calculation result: -3.06994

  13. We tried to describe all the steps for calculating the confidence interval in detail, so we went through each formula separately. But you can combine all the actions in one formula. The calculation of the right boundary of the confidence interval can be written as follows:

    =AVERAGE(B2:B13)+CONFIDENCE.NORM(0.03,8,COUNT(B2:B13))

  14. A similar calculation of the left boundary would look like this:

    =AVERAGE(B2:B13)-CONFIDENCE.NORM(0.03,8,COUNT(B2:B13))
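For readers who want to check the Excel result outside the spreadsheet, here is a rough Python counterpart of the steps above. The function `confidence_norm` is a hypothetical helper written to mirror the three Excel arguments, and the list `results` is an invented stand-in for the range B2:B13, whose actual values are not shown in the text.

```python
from math import sqrt
from statistics import NormalDist, mean


def confidence_norm(alpha, standard_dev, size):
    """Half-width of a normal-based interval, mirroring the three Excel arguments."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_dev / sqrt(size)


# Hypothetical stand-in for the 12 test results in B2:B13.
results = [-4.0, 9.5, 3.0, -7.5, 12.0, 0.5, 6.0, -2.0, 8.0, -5.0, 4.5, -1.5]

alpha = (100 - 97) / 100                           # 97% confidence level -> 0.03
margin = confidence_norm(alpha, 8, len(results))   # len() plays the role of COUNT
center = mean(results)                             # plays the role of AVERAGE
print(f"margin = {margin:.4f}")
print(f"interval: {center - margin:.4f} .. {center + margin:.4f}")
```

Because the margin depends only on alpha, the standard deviation and the sample size, it should come out close to the 5.011609 reported for the Excel example, whatever the individual test values are.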

Method 2: CONFIDENCE.T function

In addition, Excel has another function related to the calculation of the confidence interval: CONFIDENCE.T. It has been available only since Excel 2010. This operator calculates the confidence interval for the population mean using Student's distribution. It is very convenient when the variance and, accordingly, the standard deviation are unknown. The operator syntax is:

CONFIDENCE.T(alpha,standard_dev,size)

As you can see, the names of the arguments remain unchanged in this case.

Let's see how to calculate the boundaries of the confidence interval with an unknown standard deviation, using the same population that we considered in the previous method. As last time, we take a confidence level of 97%.

  1. Select the cell in which the calculation will be made. Click the "Insert Function" button.
  2. In the Function Wizard that opens, go to the "Statistical" category. Choose the name "CONFIDENCE.T". Click OK.
  3. The argument window for this operator opens.

    In the "Alpha" field, given that the confidence level is 97%, enter the number 0.03. We will not dwell a second time on how this parameter is calculated.

    After that, set the cursor in the "Standard_dev" field. This time the value is unknown to us and needs to be calculated. This is done using a special function, STDEV.S. To open the window of this operator, click the triangle to the left of the formula bar. If the desired name is not in the list that opens, go to the item "More Functions...".

  4. The Function Wizard starts. Go to the "Statistical" category and select the name "STDEV.S". Then click OK.
  5. The arguments window opens. The task of the STDEV.S operator is to determine the standard deviation of a sample. Its syntax looks like this:

    STDEV.S(number1,number2,…)

    It is easy to guess that the "Number" argument is the address of a sample element. If the sample is placed in a single array, you can reference the whole range with just one argument.

    Set the cursor in the "Number1" field and, as always, holding down the left mouse button, select the population. After the coordinates appear in the field, do not rush to click OK, because the result will be incorrect. First we need to return to the CONFIDENCE.T arguments window to enter the final argument. To do this, click the corresponding name in the formula bar.

  6. The argument window of the already familiar function opens again. Set the cursor in the "Size" field. Again, click the familiar triangle to go to the choice of operators. As you understand, we need the name "COUNT". Since we used this function in the previous method, it is present in this list, so just click it. If you do not find it, follow the algorithm described in the first method.
  7. Once in the COUNT arguments window, put the cursor in the "Value1" field and, with the mouse button held down, select the population. Then click OK.
  8. After that, the program performs the calculation and displays the half-width of the confidence interval.
  9. To determine the boundaries, we again need to calculate the sample mean. But since the calculation with the AVERAGE formula is the same as in the previous method, and even the result has not changed, we will not dwell on it in detail a second time.
  10. Adding the results of AVERAGE and CONFIDENCE.T, we obtain the right boundary of the confidence interval.
  11. Subtracting the result of CONFIDENCE.T from the result of AVERAGE, we obtain the left boundary of the confidence interval.
  12. If the calculation is written as one formula, the calculation of the right boundary in our case looks like this:

    =AVERAGE(B2:B13)+CONFIDENCE.T(0.03,STDEV.S(B2:B13),COUNT(B2:B13))

  13. Accordingly, the formula for calculating the left boundary looks like this:

    =AVERAGE(B2:B13)-CONFIDENCE.T(0.03,STDEV.S(B2:B13),COUNT(B2:B13))
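The Student-based calculation can also be sketched in plain Python. The standard library has no t-distribution quantile, so this sketch obtains it numerically (Simpson's rule for the CDF plus bisection); that is a swapped-in technique chosen for self-containment, not a description of how Excel computes it. The helper names and the `results` list are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0 via Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Inverse CDF for p > 0.5, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:   # expand until the quantile is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_t(alpha, standard_dev, size):
    """Half-width of a Student-based interval, mirroring the Excel arguments."""
    return t_quantile(1 - alpha / 2, size - 1) * standard_dev / sqrt(size)


# Hypothetical stand-in for the 12 values in B2:B13.
results = [-4.0, 9.5, 3.0, -7.5, 12.0, 0.5, 6.0, -2.0, 8.0, -5.0, 4.5, -1.5]
margin = confidence_t(0.03, stdev(results), len(results))  # stdev uses n-1, like STDEV.S
center = mean(results)
print(f"interval: {center - margin:.4f} .. {center + margin:.4f}")
```

As a sanity check, `t_quantile(0.995, 6)` should land near the table value 3.71 used in the earlier worked example.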

As you can see, Excel's tools make it much easier to calculate the confidence interval and its boundaries. Separate operators are used for samples whose variance is known and unknown.

Sample statistics such as the mean, the variance, and others are all estimates of their theoretical counterparts, which could be obtained if we had not a sample but the whole general population. But alas, surveying the general population is very expensive and often impossible.

The concept of interval estimation

Any sample estimate has some scatter, because it is a random variable that depends on the values in a particular sample. Therefore, for more reliable statistical inferences, one should know not only the point estimate but also the interval that covers the estimated parameter θ (theta) with a high probability γ (gamma).

Formally, these are two values (statistics) T1(X) and T2(X), with T1 < T2, for which the following condition is met at a given probability level γ:

P(T1(X) ≤ θ ≤ T2(X)) ≥ γ

In short, with probability γ or more the true value lies between the points T1(X) and T2(X), which are called the lower and upper bounds of the confidence interval.
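The meaning of the probability γ can be illustrated with a small simulation: build many intervals from repeated samples and count how often they contain the true value. This is only a sketch under stated assumptions: normally distributed data, a known standard deviation, and arbitrary illustrative parameters.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def coverage(true_mean, sigma, n, gamma, trials, seed=0):
    """Fraction of simulated intervals [T1, T2] that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        center = mean(sample)
        t1, t2 = center - margin, center + margin
        hits += t1 <= true_mean <= t2
    return hits / trials


print(coverage(true_mean=10.0, sigma=2.0, n=20, gamma=0.95, trials=2000))
```

The printed fraction should be close to the chosen γ, which is exactly what the condition on T1(X) and T2(X) requires.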

One of the requirements when constructing a confidence interval is maximum narrowness, i.e. it should be as short as possible. This desire is quite natural, because the researcher tries to localize the desired parameter as precisely as possible.

It follows that the confidence interval should cover the region of highest probability of the distribution, with the estimate itself at the center.

That is, the probability that the true parameter deviates upwards from the estimate is equal to the probability that it deviates downwards. It should also be noted that for skewed distributions, the part of the interval on the right is not equal to the part on the left.

The figure above clearly shows that the greater the confidence level, the wider the interval - a direct relationship.

This was a small introduction to the theory of interval estimation of unknown parameters. Let's move on to finding confidence limits for the mathematical expectation.

Confidence interval for mathematical expectation

If the original data are normally distributed, then the mean is also normally distributed. This follows from the rule that a linear combination of normal variables also has a normal distribution. Therefore, to calculate the probabilities, we could use the mathematical apparatus of the normal distribution law.

However, this requires knowledge of two parameters, the expected value and the variance, which are usually not known. You can, of course, use estimates instead of the parameters (the arithmetic mean and the sample variance), but then the distribution of the mean will not be quite normal: it will be slightly flattened. William Gosset, who worked in Ireland, astutely noted this fact when he published his discovery in the March 1908 issue of Biometrika. For secrecy, Gosset signed as Student. This is how Student's t-distribution appeared.

However, the normal distribution of data, used by C. F. Gauss in the analysis of errors in astronomical observations, is extremely rare in everyday life and is quite difficult to establish (for high accuracy, about 2 thousand observations are needed). Therefore, it is best to drop the normality assumption and use methods that do not depend on the distribution of the original data.

The question arises: what is the distribution of the arithmetic mean if it is calculated from data with an unknown distribution? The answer is given by the central limit theorem (CLT), well known in probability theory. In mathematics, there are several versions of it (the formulations have been refined over the years), but all of them, roughly speaking, come down to the statement that the sum of a large number of independent random variables approximately obeys the normal distribution law.

The arithmetic mean is calculated from a sum of random variables. It follows that the arithmetic mean has an approximately normal distribution, whose expected value is the expected value of the initial data and whose variance is the variance of the initial data divided by the sample size, σ²/n.

Smart people know how to prove the CLT, but we will verify it with an experiment conducted in Excel. Let's simulate a sample of 50 uniformly distributed random values (using the Excel function RANDBETWEEN). Then we will make 1000 such samples and calculate the arithmetic mean of each. Let's look at their distribution.

It can be seen that the distribution of the mean is close to the normal law. If the sample size and the number of samples are made even larger, the similarity will be even better.
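The same experiment can be repeated outside Excel. In the sketch below, `random.randint` plays the role of RANDBETWEEN; the bounds 1 and 100 and the seed are assumptions, since the text does not state which range was used.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)           # fixed seed so the run is repeatable
LOW, HIGH = 1, 100                # assumed RANDBETWEEN bounds; the text does not give them
SAMPLE_SIZE, N_SAMPLES = 50, 1000

sample_means = [
    mean(rng.randint(LOW, HIGH) for _ in range(SAMPLE_SIZE))
    for _ in range(N_SAMPLES)
]

# Theory: the means center on (LOW + HIGH) / 2 with spread sigma / sqrt(n).
sigma = (((HIGH - LOW + 1) ** 2 - 1) / 12) ** 0.5   # std. dev. of a discrete uniform
print("mean of sample means:", round(mean(sample_means), 2))
print("observed spread:", round(stdev(sample_means), 2))
print("predicted spread:", round(sigma / SAMPLE_SIZE ** 0.5, 2))
```

The observed spread of the 1000 means should be close to the predicted σ/√n, and a histogram of `sample_means` would show the familiar bell shape.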

Now that we have seen the validity of the CLT for ourselves, we can use the normal distribution to calculate confidence intervals for the arithmetic mean that cover the true mean, or mathematical expectation, with a given probability.

To establish the upper and lower bounds, we need to know the parameters of the normal distribution. As a rule, they are not known, so estimates are used: the arithmetic mean and the sample variance. Again, this method gives a good approximation only for large samples. When the samples are small, it is often recommended to use Student's distribution. Don't believe it! Student's distribution for the mean arises only when the original data have a normal distribution, that is, almost never. Therefore, it is better to set a minimum bar for the amount of data required right away and use asymptotically correct methods. They say 30 observations are enough. Take 50 and you can't go wrong.

T1,2 = x̄ ± c_γ * s0/sqrt(n), where

T1,2 are the lower and upper bounds of the confidence interval,

x̄ is the sample arithmetic mean,

s0 is the sample standard deviation (unbiased),

n is the sample size,

γ is the confidence level (usually equal to 0.9, 0.95 or 0.99),

c_γ = Φ^(-1)((1+γ)/2) is the inverse of the standard normal distribution function. In simple terms, this is the number of standard errors from the arithmetic mean to the lower or upper bound (the three probabilities indicated correspond to the values 1.64, 1.96 and 2.58).

The essence of the formula is that the arithmetic mean is taken and then a certain number (c_γ) of standard errors (s0/sqrt(n)) is set aside from it in each direction. Everything is known, so just take it and calculate.
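A short Python sketch of this formula follows. The function names `c_gamma` and `mean_ci` are illustrative assumptions; `NormalDist().inv_cdf` is the inverse of the standard normal distribution function, and `statistics.stdev` uses the n-1 denominator.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def c_gamma(gamma):
    """Number of standard errors for confidence level gamma."""
    return NormalDist().inv_cdf((1 + gamma) / 2)


def mean_ci(data, gamma=0.95):
    """Bounds T1, T2 = mean -/+ c_gamma * s0 / sqrt(n)."""
    n = len(data)
    margin = c_gamma(gamma) * stdev(data) / sqrt(n)
    center = mean(data)
    return center - margin, center + margin


for g in (0.90, 0.95, 0.99):
    print(g, round(c_gamma(g), 2))   # 1.64, 1.96, 2.58
```

Calling `mean_ci(data, 0.95)` on a list of at least two numbers returns the lower and upper bounds as a pair.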

Before the mass use of PCs, special tables were used to obtain the values of the normal distribution function and its inverse. They are still used, but it is more efficient to turn to ready-made Excel formulas. All elements of the formula above can be easily calculated in Excel. But there is also a ready-made function for calculating the confidence interval: CONFIDENCE.NORM. Its syntax is the following.

CONFIDENCE.NORM(alpha, standard_dev, size)

alpha is the significance level, which in the above notation is equal to 1-γ, i.e. the probability that the mathematical expectation will be outside the confidence interval. With a confidence level of 0.95, alpha is 0.05, and so on.

standard_dev is the standard deviation of the sample data. You don't need to calculate the standard error; Excel divides by the root of n itself.

size is the sample size (n).

The result of the CONFIDENCE.NORM function is the second term of the formula for calculating the confidence interval, i.e. the half-width of the interval. Accordingly, the lower and upper bounds are the mean ± the value obtained.

Thus, it is possible to build a universal algorithm for calculating confidence intervals for the arithmetic mean, which does not depend on the distribution of the initial data. The price for universality is its asymptotic nature, i.e. the need to use relatively large samples. However, in the age of modern technology, collecting the right amount of data is usually not difficult.

Testing Statistical Hypotheses Using a Confidence Interval


One of the main problems solved in statistics is hypothesis testing. In a nutshell, its essence is this. An assumption is made, for example, that the expectation of the general population is equal to some value. Then the distribution of sample means that could be observed under that expectation is constructed. Next, we look at where in this conditional distribution the actual sample mean is located. If it goes beyond the allowable limits, then the appearance of such a mean is very unlikely, and with a single repetition of the experiment almost impossible, which contradicts the hypothesis put forward, and the hypothesis is rejected. If the mean does not go beyond the critical level, then the hypothesis is not rejected (but it is not proved either!).

So, with the help of confidence intervals, in our case for the expectation, you can also test some hypotheses. It is very easy to do. Suppose the arithmetic mean of some sample is 100. The hypothesis being tested is that the expected value is, say, 90. Put simply, the question is: can it be that, with a true mean of 90, the observed mean was 100?

To answer this question, additional information on the standard deviation and sample size is required. Let's say the standard deviation is 30 and the number of observations is 64 (so that the root is easy to extract). Then the standard error of the mean is 30/8, or 3.75. To calculate the 95% confidence interval, you need to set aside two standard errors on both sides of the mean (more precisely, 1.96). The confidence interval is approximately 100 ± 7.5, or from 92.5 to 107.5.

Further reasoning is as follows. If the tested value falls within the confidence interval, then it does not contradict the data, since it fits within the limits of random fluctuations (with a probability of 95%). If the tested value is outside the confidence interval, then the probability of such an event is very small, in any case below the acceptable level. Hence, the hypothesis is rejected as contradicting the observed data. In our case, the hypothesized expectation is outside the confidence interval (the tested value of 90 is not included in the interval 100 ± 7.5), so the hypothesis should be rejected. Answering the simple question above, one should say: no, it cannot; in any case, this happens extremely rarely. Often a specific probability of erroneously rejecting the hypothesis (the p-value) is reported rather than the preset level at which the confidence interval was built, but more on that another time.
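The example can be verified with a few lines of Python. The sketch uses the exact multiplier (about 1.96) instead of the rounded value of 2, so the interval comes out slightly narrower than 100 ± 7.5, but the conclusion is the same.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 100.0
hypothesized_mean = 90.0
std_dev = 30.0
n = 64

std_error = std_dev / sqrt(n)          # 30 / 8 = 3.75
z = NormalDist().inv_cdf(0.975)        # about 1.96 for a 95% interval
low = sample_mean - z * std_error
high = sample_mean + z * std_error

reject = not (low <= hypothesized_mean <= high)
print(f"95% interval: {low:.2f} .. {high:.2f}")
print("reject hypothesis:", reject)
```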

As you can see, it is not difficult to build a confidence interval for the mean (or mathematical expectation). The main thing is to grasp the essence, and then things fall into place. In practice, most people use the 95% confidence interval, which extends about two standard errors on either side of the mean.

That's all for now. All the best!

From this article you will learn:

    What is a confidence interval?

    What is the 3 sigma rule?

    How can this knowledge be put into practice?

Nowadays, because of the overabundance of information associated with a large assortment of products, sales channels, employees, activities, etc., it is hard to pick out the main things that deserve attention and management effort first. Determining the confidence interval and analyzing actual values that go beyond its boundaries is a technique that helps you identify situations that influence trends. You will be able to develop positive factors and reduce the influence of negative ones. This technique is used in many well-known global companies.

There are so-called "alerts", which inform managers that the latest value in a certain area went beyond the confidence interval. What does this mean? It is a signal that some non-standard event has occurred, which may change the existing trend in this area. It is a signal to look into the situation and understand what influenced it.

For example, consider several situations. We calculated a sales forecast with forecast boundaries for 100 commodity items for 2011 by month, and compared it with actual sales in March:

  1. For "Sunflower oil", sales broke through the upper limit of the forecast and did not fall into the confidence interval.
  2. For "Dry yeast", sales went below the lower limit of the forecast.
  3. For "Oatmeal porridge", sales broke through the upper limit.

For the rest of the goods, actual sales were within the specified forecast boundaries, i.e. their sales were in line with expectations. So, we identified 3 products that went beyond the boundaries and began to figure out what caused this:

  1. With Sunflower oil, we entered a new retail chain, which gave us additional sales volume and led to going beyond the upper limit. For this product, it is worth recalculating the forecast until the end of the year, taking into account the forecast of sales to this chain.
  2. For Dry yeast, the truck got stuck at customs, and there was a shortage for 5 days, which caused the decline in sales and the drop below the lower boundary. It may be worthwhile to figure out the cause and try not to repeat this situation.
  3. For Oatmeal porridge, a sales promotion was launched, which resulted in a significant increase in sales and led to an overshoot of the forecast.

We identified 3 factors that caused the forecast to be overshot. In real life there can be many more. To improve the accuracy of forecasting and planning, it is worth singling out the factors that cause actual sales to go beyond the forecast, building forecasts and plans for them separately, and then taking into account their impact on the main sales forecast. You can also regularly evaluate the impact of these factors and change the situation for the better by reducing the influence of negative factors and increasing the influence of positive ones.

With a confidence interval, we can:

  1. Highlight the areas that are worth paying attention to, because events have occurred in them that may change the trend.
  2. Determine the factors that actually make a difference.
  3. Make a well-founded decision (for example, about procurement, planning, etc.).

Now let's look at what a confidence interval is and how to calculate it in Excel using an example.

What is a confidence interval?

The confidence interval is the forecast boundaries (upper and lower) within which the actual values fall with a given probability (set by sigma).

That is, we calculate the forecast, which is our main benchmark, but we understand that the actual values are unlikely to be 100% equal to our forecast. The question arises: within what limits may the actual values fall if the current trend continues? Calculating the confidence interval, i.e. the upper and lower bounds of the forecast, helps answer this question.

What is a given probability sigma?

When calculating the confidence interval, we can set the probability that actual values fall within the given forecast boundaries. How is this done? We set the value of sigma, and if sigma is equal to:

    3 sigma, then the probability that the next actual value falls within the confidence interval is 99.7%, or roughly 300 to 1, i.e. there is a 0.3% probability of going beyond the boundaries.

    2 sigma, then the probability that the next value falls within the boundaries is ≈ 95.5%, i.e. the odds are about 20 to 1, or there is a 4.5% chance of going out of bounds.

    1 sigma, then the probability is ≈ 68.3%, i.e. the odds are about 2 to 1, or there is a 31.7% chance that the next value falls outside the confidence interval.

This gives us the 3 sigma rule, which says that the probability that the next random value falls within a three-sigma confidence interval is 99.7%.
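These percentages follow from the normal distribution and can be checked with the Python standard library. The helper name `inside_k_sigma` is an illustrative assumption.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def inside_k_sigma(k):
    """Probability that a normal value lies within k sigma of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    inside = inside_k_sigma(k)
    print(f"{k} sigma: {inside:.2%} inside, {1 - inside:.2%} outside")
```

The figures hold only to the extent that the deviations from the forecast are approximately normally distributed.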

The great Russian mathematician Chebyshev proved an inequality that holds for any distribution with finite variance: the chance of going beyond three-sigma boundaries is at most 1/9, i.e. about 11%. That is, the probability of falling into the 3 sigma confidence interval is at least about 89% even without the normality assumption, while an attempt to estimate the forecast and its boundaries "by eye" is fraught with much more significant errors.
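Chebyshev's inequality states that, for any distribution with finite variance, the probability of deviating from the mean by k sigma or more is at most 1/k². For k = 3 this gives 1/9, about 11%. A tiny sketch (the function name is hypothetical):

```python
def chebyshev_outside_bound(k):
    """Upper bound on P(|X - mean| >= k * sigma) for any finite-variance distribution."""
    return 1 / k ** 2


for k in (2, 3):
    bound = chebyshev_outside_bound(k)
    print(f"{k} sigma: at most {bound:.1%} outside, at least {1 - bound:.1%} inside")
```

This is a worst-case guarantee; for normally distributed data the share inside three sigma is much higher, about 99.7%.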

How to independently calculate the confidence interval in Excel?

Let's consider the calculation of the confidence interval in Excel (ie the upper and lower bounds of the forecast) using an example. We have a time series - sales by months for 5 years. See attached file.

To calculate the boundaries of the forecast, we calculate:

  1. Sales forecast.
  2. Sigma - the standard deviation of the forecast model from the actual values.
  3. Three sigma.
  4. Confidence interval.

1. Sales forecast.

=(RC[-14]-RC[-1])^2, where RC[-14] is the actual value in the time series and RC[-1] is the model value; the difference is squared.


3. For each month, sum the squared deviations from stage 8, Sum((Xi-Ximod)^2), i.e. sum the January values across all years, then the February values, and so on.

To do this, use the formula =SUMIF()

SUMIF(range with the period numbers within the cycle (months 1 to 12); reference to the period number in the cycle; range with the squared differences between the actual data and the model values)
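For readers who want to check the spreadsheet outside Excel, the same grouping can be sketched in Python. The function name and the sample numbers below are hypothetical and are not taken from the attached file.

```python
from collections import defaultdict


def sum_squared_deviations_by_period(periods, actual, model):
    """Equivalent of SUMIF: total (Xi - Ximod)^2 for each period number in the cycle."""
    totals = defaultdict(float)
    for period, x, x_mod in zip(periods, actual, model):
        totals[period] += (x - x_mod) ** 2
    return dict(totals)


# Hypothetical data: two years, only months 1 and 2 shown for brevity.
periods = [1, 2, 1, 2]
actual = [100.0, 120.0, 110.0, 125.0]
model = [98.0, 123.0, 113.0, 121.0]
print(sum_squared_deviations_by_period(periods, actual, model))  # {1: 13.0, 2: 25.0}
```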


4. Calculate the standard deviation for each period in the cycle from 1 to 12 (stage 10 in the attached file).

To do this, we divide the value calculated at stage 9 by the number of periods in this cycle minus 1 and take the square root: SQRT(Sum((Xi-Ximod)^2)/(n-1))

In Excel: =SQRT(R8/(COUNTIF($O$8:$O$67;O8)-1)), where R8 is the reference to Sum((Xi-Ximod)^2), $O$8:$O$67 is the array with the cycle numbers, and O8 is the specific cycle number being counted in that array.

The Excel formula =COUNTIF gives the number n.
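The sigma calculation for each period can also be sketched in Python as a cross-check. The function name and the sample numbers are hypothetical; the n-1 divisor follows the formula given in the text.

```python
import math
from collections import defaultdict


def sigma_by_period(periods, actual, model):
    """Standard deviation of actual values from the model, per period in the cycle."""
    sums = defaultdict(float)   # Sum((Xi - Ximod)^2) per period
    counts = defaultdict(int)   # n per period, as COUNTIF would return
    for period, x, x_mod in zip(periods, actual, model):
        sums[period] += (x - x_mod) ** 2
        counts[period] += 1
    # Periods with a single observation are skipped because n - 1 would be zero.
    return {p: math.sqrt(sums[p] / (counts[p] - 1)) for p in sums if counts[p] > 1}


# Hypothetical data: three years, only months 1 and 2 shown for brevity.
periods = [1, 2, 1, 2, 1, 2]
actual = [100.0, 120.0, 110.0, 125.0, 105.0, 130.0]
model = [98.0, 123.0, 113.0, 121.0, 104.0, 130.0]
print(sigma_by_period(periods, actual, model))
```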


By calculating the standard deviation of the actual data from the forecast model, we obtained the sigma value for each month (stage 10 in the attached file).

3. Calculate 3 sigma.

At stage 11, we set the number of sigmas - in our example, "3" (stage 11 in the attached file):

Other sigma values useful in practice:

1.64 sigma - 10% chance of going out of bounds (1 chance in 10);

1.96 sigma - 5% chance of going out of bounds (1 chance in 20);

2.58 sigma - 1% chance of going out of bounds (1 chance in 100).
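These multipliers can be recovered from the chosen out-of-bounds probability. The sketch below assumes normally distributed deviations and uses the standard library's NormalDist (Python 3.8 or later); the function name is hypothetical.

```python
from statistics import NormalDist


def sigmas_for_outside_probability(p_outside: float) -> float:
    """Number of sigmas such that a normal value falls outside +/- k sigma with probability p_outside."""
    # The out-of-bounds probability is split equally between the two tails.
    return NormalDist().inv_cdf(1 - p_outside / 2)


for p in (0.10, 0.05, 0.01):
    print(f"{p:.0%} outside -> {sigmas_for_outside_probability(p):.3f} sigma")
```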

5) Calculate three sigma: multiply the "sigma" value for each month by "3".

4. Determine the confidence interval.

  1. Upper forecast bound - sales forecast taking into account growth and seasonality, plus 3 sigma;
  2. Lower forecast bound - sales forecast taking into account growth and seasonality, minus 3 sigma.

For the convenience of calculating the confidence interval over a long period (see attached file), we use the Excel formula =Y8+VLOOKUP(W8;$U$8:$V$19;2;0), where

Y8 - the sales forecast;

W8 - the number of the month for which we take the 3 sigma value;

That is, upper forecast bound = "sales forecast" + "3 sigma" (in the example, VLOOKUP(month number; table with 3 sigma values; number of the column from which the sigma value is taken in the row matching the month number; 0)).

Lower forecast bound = "sales forecast" minus "3 sigma".
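The lookup-and-add step can be sketched in Python, with a dictionary keyed by month number playing the role of the VLOOKUP table. The function name and the numbers are hypothetical and only illustrate the arithmetic.

```python
def forecast_bounds(forecast, months, sigma_by_month, n_sigmas=3.0):
    """Return (lower, upper) forecast bounds for each period."""
    bounds = []
    for value, month in zip(forecast, months):
        margin = n_sigmas * sigma_by_month[month]  # the "3 sigma" value for that month
        bounds.append((value - margin, value + margin))
    return bounds


# Hypothetical forecast for two months and their per-month sigma values.
print(forecast_bounds([200.0, 220.0], [1, 2], {1: 5.0, 2: 10.0}))
# [(185.0, 215.0), (190.0, 250.0)]
```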

So, we have calculated the confidence interval in Excel.

Now we have a forecast and a range with boundaries within which the actual values will fall with the probability set by sigma.

In this article, we looked at what sigma and the three sigma rule are, how to determine a confidence interval, and what you can use this technique for in practice.

Accurate forecasts and success to you!

How can Forecast4AC PRO help when calculating the confidence interval?

    Forecast4AC PRO automatically calculates the upper or lower forecast bounds for more than 1000 time series at once;

    The forecast bounds can be compared with the forecast, trend and actual sales on a chart with one keystroke;

In Forecast4AC PRO, the sigma value can be set from 1 to 3.

Often an appraiser has to analyze the real estate market segment in which the appraised property is located. If the market is developed, it can be difficult to analyze the entire set of listed properties, so a sample of properties is used for the analysis. This sample is not always homogeneous; sometimes it needs to be cleared of extremes, i.e. market offers that are too high or too low. A confidence interval is used for this purpose. The purpose of this study is to compare two methods of calculating the confidence interval and to choose the best option for working with different samples in the estimatica.pro system.

Confidence interval - an interval of attribute values, calculated from the sample, that contains the estimated parameter of the general population with a known probability.

The point of calculating a confidence interval is to build, from the sample data, an interval such that one can assert with a given probability that the value of the estimated parameter lies within it. In other words, the confidence interval contains the unknown value of the estimated quantity with a certain probability. The wider the interval, the less precise the estimate.

There are different methods for determining the confidence interval. In this article, we consider two:

  • through the median and standard deviation;
  • through the critical value of the t-statistic (Student's coefficient).

Stages of the comparative analysis of the methods for calculating the CI:

1. form a data sample;

2. process it with statistical methods: calculate the mean, median, variance, etc.;

3. calculate the confidence interval in two ways;

4. analyze the cleaned samples and the resulting confidence intervals.

Stage 1. Data sampling

The sample was formed using the estimatica.pro system. It included 91 offers for the sale of 1-room apartments in the 3rd price zone with the "Khrushchev" layout type.

Table 1. Initial sample

The price of 1 sq.m., c.u.

Fig.1. Initial sample



Stage 2. Processing of the initial sample

Processing the sample by statistical methods requires calculating the following values:

1. Arithmetic mean

2. Median - a number that characterizes the sample: half of the sample elements are greater than the median and the other half are less

(for a sample with an odd number of values, it is the middle element of the sorted sample)

3. Range - the difference between the maximum and minimum values in the sample

4. Variance - used to estimate the variation in the data more accurately

5. Standard deviation of the sample (hereinafter SD) - the most common measure of the dispersion of values around the arithmetic mean.

6. Coefficient of variation - reflects the degree of dispersion of the values

7. Coefficient of oscillation - reflects the relative fluctuation of the extreme price values in the sample around the mean
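The indicators listed above can be computed with Python's standard library. The original formulas are not reproduced here, so this sketch rests on assumptions: the sample variance uses the n-1 divisor, the coefficient of variation is SD divided by the mean, and the coefficient of oscillation is the range divided by the mean. The function name and the prices are hypothetical.

```python
import statistics


def describe(prices):
    """Descriptive statistics of a price sample (definitions are assumptions, see above)."""
    mean = statistics.mean(prices)
    value_range = max(prices) - min(prices)
    sd = statistics.stdev(prices)  # sample standard deviation, n - 1 divisor
    return {
        "mean": mean,
        "median": statistics.median(prices),
        "range": value_range,
        "variance": statistics.variance(prices),  # sample variance, n - 1 divisor
        "sd": sd,
        "cv": sd / mean,                    # coefficient of variation
        "oscillation": value_range / mean,  # coefficient of oscillation
    }


# Hypothetical prices per sq.m.
for name, value in describe([50.0, 52.0, 54.0, 56.0, 58.0]).items():
    print(f"{name}: {value:.4f}")
```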

Table 2. Statistical indicators of the original sample

The coefficient of variation, which characterizes the homogeneity of the data, is 12.29%, but the coefficient of oscillation is too large. Thus, we can state that the original sample is not homogeneous, so let's move on to calculating the confidence interval.

Stage 3. Calculation of the confidence interval

Method 1. Calculation through the median and standard deviation.

The confidence interval is determined as follows: the lower bound is the median minus the standard deviation; the upper bound is the median plus the standard deviation.

Thus, the confidence interval is (47179 CU; 60689 CU).
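Method 1 can be sketched in Python as follows. The assumptions are that the sample standard deviation (n-1 divisor) is used and that values lying exactly on a bound are kept; the function names and the prices are hypothetical and are not the article's sample.

```python
import statistics


def median_sd_interval(prices):
    """Interval (median - SD, median + SD) for a price sample."""
    median = statistics.median(prices)
    sd = statistics.stdev(prices)  # assumption: sample SD with n - 1 divisor
    return median - sd, median + sd


def keep_inside(prices, lower, upper):
    """Cleaned sample: only the values within [lower, upper]."""
    return [p for p in prices if lower <= p <= upper]


# Hypothetical prices with one low and one high extreme.
prices = [40.0, 50.0, 52.0, 54.0, 56.0, 58.0, 80.0]
lower, upper = median_sd_interval(prices)
print(keep_inside(prices, lower, upper))  # [50.0, 52.0, 54.0, 56.0, 58.0]
```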

Fig. 2. Values within confidence interval 1.



Method 2. Building a confidence interval through the critical value of t-statistics (Student's coefficient)

S.V. Gribovsky, in the book "Mathematical methods for assessing the value of property", describes a method for calculating the confidence interval through the Student's coefficient. With this method, the appraiser must set the significance level α, which determines the probability with which the confidence interval is built. Significance levels of 0.1, 0.05 and 0.01 are commonly used; they correspond to confidence probabilities of 0.9, 0.95 and 0.99. With this method, the true values of the mathematical expectation and variance are considered unknown (which is almost always the case in practical appraisal problems).

Confidence interval formula:

n - sample size;

The critical value of the t-statistic (Student's distribution) at significance level α with n-1 degrees of freedom, which is found from special statistical tables or using MS Excel (→ "Statistical" → TINV, STUDRASPOBR in the Russian version);

α - significance level, we take α=0.01.
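As an illustration only, the sketch below assumes the textbook form of the interval, mean ± t·S/√n, where S is the sample standard deviation; this form is an assumption and may differ from the formula in the cited book. The critical value t is passed in as a parameter, taken from a statistical table or from Excel. The function name and the data are hypothetical.

```python
import math
import statistics


def t_interval(sample, t_crit):
    """Interval mean +/- t_crit * S / sqrt(n) (assumed textbook form)."""
    n = len(sample)
    mean = statistics.mean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical sample of 5 prices; 4.604 is the two-sided critical value
# of Student's distribution for alpha = 0.01 and 4 degrees of freedom.
lower, upper = t_interval([50.0, 52.0, 54.0, 56.0, 58.0], t_crit=4.604)
print(f"({lower:.2f}; {upper:.2f})")
```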

Fig. 3. Values within confidence interval 2.

Stage 4. Analysis of the different ways to calculate the confidence interval

The two methods of calculating the confidence interval, through the median and through the Student's coefficient, led to different interval values. Accordingly, two different cleaned samples were obtained.

Table 3. Statistical indicators for three samples.

Indicator; Initial sample; Option 1; Option 2

Mean

Variance

Coefficient of variation

Coefficient of oscillation

Number of removed objects, pcs.

Based on the calculations performed, we can say that the confidence intervals obtained by the two methods overlap, so either method can be used at the discretion of the appraiser.

However, we believe that when working in the estimatica.pro system, it is advisable to choose the method for calculating the confidence interval depending on the degree of market development:

  • if the market is not developed, apply the calculation through the median and standard deviation, since the number of removed objects in this case is small;
  • if the market is developed, apply the calculation through the critical value of the t-statistic (Student's coefficient), since it is possible to form a large initial sample.

The following sources were used in preparing this article:

1. Gribovsky S.V., Sivets S.A., Levykina I.A. Mathematical methods for assessing the value of property. Moscow, 2014

2. Data from the estimatica.pro system