Calculate the confidence interval. Calculating a Confidence Interval in Microsoft Excel

"Katren-Style" continues to publish a cycle of Konstantin Kravchik on medical statistics. In two previous articles, the author touched on the explanation of such concepts as and.

Konstantin Kravchik

Mathematician-analyst. Specialist in the field of statistical research in medicine and the humanities

Moscow city

Articles on clinical trials very often contain a mysterious phrase: "confidence interval" (abbreviated 95% CI). For example, an article might say: "Student's t-test was used to assess the significance of differences, with a 95% confidence interval calculated."

What is the value of the "95% confidence interval" and why calculate it?

What is a confidence interval? It is the range within which the true mean value in the population falls. Does that mean there are "untrue" means? In a sense, yes. In a previous article we explained that it is impossible to measure the parameter of interest in the entire population, so researchers make do with a limited sample. This sample (for example, of body weight) has one mean value (a certain weight), from which we judge the mean in the entire general population. However, the mean weight in the sample (especially a small one) is unlikely to coincide with the mean weight in the general population. It is therefore more correct to calculate and use a range of mean values for the general population.

For example, suppose the 95% confidence interval (95% CI) for hemoglobin is from 110 to 122 g/L. This means that with 95% probability the true mean hemoglobin in the general population lies in the range from 110 to 122 g/L. In other words, we do not know the mean hemoglobin in the general population, but we can state, with 95% probability, a range of values for it.

Confidence intervals are particularly relevant to the difference in means between groups, or what is called the effect size.

Suppose we compared the effectiveness of two iron preparations: one that has been on the market for a long time and one that has just been registered. After the course of therapy, the hemoglobin concentration in the studied groups of patients was measured, and the statistical program calculated that the difference between the mean values of the two groups lies, with 95% probability, in the range from 1.72 to 14.36 g/L (Table 1).

Tab. 1. Criterion for independent samples
(groups are compared by hemoglobin level)

This should be interpreted as follows: in the general population, patients who take the new drug will have hemoglobin that is on average 1.72–14.36 g/L higher than in those who took the already known drug.

In other words, in the general population the difference between the group means for hemoglobin lies within these limits with 95% probability. It is up to the researcher to judge whether this is a lot or a little. The point of all this is that we are working not with a single mean value but with a range of values, and so we estimate the difference in a parameter between groups more reliably.

In statistical packages, at the discretion of the researcher, one can independently narrow or expand the boundaries of the confidence interval. By lowering the probabilities of the confidence interval, we narrow the range of means. For example, at 90% CI, the range of means (or mean differences) will be narrower than at 95% CI.

Conversely, increasing the probability to 99% widens the range of values. When comparing groups, the lower limit of the CI may cross the zero mark. For example, if we raised the confidence level to 99%, the boundaries of the interval would run from –1 to 16 g/L. This means that in the general population there are groups for which the difference between the means of the studied trait is 0 (M=0).
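The effect of the confidence level on the width of the interval can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the normal approximation, takes the midpoint of the 95% interval quoted in the text (1.72 to 14.36 g/L) as the mean difference, and back-calculates a standard error from that interval. The function name normal_ci is hypothetical, and a statistical package that works with Student's t-distribution would give somewhat different limits.

```python
from statistics import NormalDist

def normal_ci(estimate, std_error, level):
    """Two-sided confidence interval under the normal approximation."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # about 1.96 for level=0.95
    return estimate - z * std_error, estimate + z * std_error

# Assumed inputs, back-calculated from the 95% interval 1.72..14.36 g/L.
z95 = NormalDist().inv_cdf(0.975)
mean_diff = (1.72 + 14.36) / 2
std_error = (14.36 - 1.72) / 2 / z95

intervals = {level: normal_ci(mean_diff, std_error, level)
             for level in (0.90, 0.95, 0.99)}

for level, (low, high) in intervals.items():
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f} g/L")
```

Under these assumptions the 90% interval is the narrowest, and the lower limit of the 99% interval falls below zero while the 95% limit does not.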

Confidence intervals can be used to test statistical hypotheses. If the confidence interval crosses zero, then the null hypothesis, which assumes that the groups do not differ in the studied parameter, cannot be rejected. An example is described above, where we raised the confidence level to 99%: somewhere in the general population, we found groups that did not differ in any way.

95% confidence interval of difference in hemoglobin, (g/l)


The figure shows the 95% confidence interval of the mean hemoglobin difference between the two groups as a line. The line crosses the zero mark, so a difference between the means equal to zero is possible, which is consistent with the null hypothesis that the groups do not differ. The difference between the groups ranges from –2 to 5 g/L, which means that hemoglobin can either decrease by 2 g/L or increase by 5 g/L.

The confidence interval is a very important indicator. Thanks to it, you can see if the differences in the groups were really due to the difference in the means or due to a large sample, because with a large sample, the chances of finding differences are greater than with a small one.

In practice, it might look like this. We took a sample of 1000 people, measured the hemoglobin level and found that the confidence interval for the difference in the means lies from 1.2 to 1.5 g/L. The level of statistical significance in this case p

We see that the hemoglobin concentration increased, but almost imperceptibly, therefore, the statistical significance appeared precisely due to the sample size.

Confidence intervals can be calculated not only for averages, but also for proportions (and risk ratios). For example, we are interested in the confidence interval of the proportions of patients who achieved remission while taking the developed drug. Assume that the 95% CI for the proportions, i.e. for the proportion of such patients, is in the range 0.60–0.80. Thus, we can say that our medicine has a therapeutic effect in 60 to 80% of cases.

The mind is not only in knowledge, but also in the ability to apply knowledge in practice. (Aristotle)

Confidence intervals

General overview

Taking a sample from the population, we will obtain a point estimate of the parameter of interest to us and calculate the standard error in order to indicate the accuracy of the estimate.

However, in most cases the standard error by itself is not sufficient. It is much more useful to combine this measure of precision with an interval estimate for the population parameter.

This can be done by using knowledge of the theoretical probability distribution of the sample statistic to calculate a confidence interval (CI) for the parameter.

In general, the confidence interval extends the estimate in both directions by some multiple of the standard error (of the given parameter); the two values (confidence limits) that define the interval are usually separated by a comma and enclosed in parentheses.

Confidence interval for mean

Using the normal distribution

The sample mean has a normal distribution if the sample size is large, so knowledge of the normal distribution can be applied when considering the sample mean.

In particular, 95% of the distribution of the sample means is within 1.96 standard deviations (SD) of the population mean.

When we have only one sample, we call this standard deviation of the sample means the standard error of the mean (SEM) and calculate the 95% confidence interval for the mean as follows:

$$\left(\bar{x} - 1.96 \cdot \mathrm{SEM},\ \ \bar{x} + 1.96 \cdot \mathrm{SEM}\right)$$

where $\bar{x}$ is the sample mean.

If the experiment is repeated many times, the interval will contain the true population mean 95% of the time.

This is usually interpreted as the range of values within which the true population mean (general mean) lies with a 95% confidence level.

Although it is not quite strict (the population mean is a fixed value and therefore cannot have a probability related to it) to interpret the confidence interval in this way, it is conceptually easier to understand.
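The repeated-experiment reading of the interval can be checked with a small simulation. The sketch below is an illustration only: the population mean and standard deviation, the sample size, and the number of repetitions are all hypothetical values chosen for the example, and the interval is built as the sample mean plus or minus 1.96 SEM.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 120.0, 15.0   # hypothetical population parameters
N, REPEATS = 100, 2000             # sample size and number of repeated experiments

covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    sem = stdev(sample) / N ** 0.5          # standard error of the mean
    low = mean(sample) - 1.96 * sem
    high = mean(sample) + 1.96 * sem
    if low <= TRUE_MEAN <= high:            # does this interval cover the true mean?
        covered += 1

coverage = covered / REPEATS
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should be close to 0.95. Any single interval either contains the fixed population mean or it does not, which is why the probability statement refers to the procedure rather than to one particular interval.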

Using the t-distribution

You can use the normal distribution if you know the value of the variance in the population. Also, when the sample size is small, the sample mean follows a normal distribution if the data underlying the population are normally distributed.

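If the data underlying the population are not normally distributed and/or the population variance is unknown and has to be estimated from the sample, the sample mean follows Student's t-distribution.

Calculate the 95% confidence interval for the population mean as follows:

$$\left(\bar{x} - t_{0.05} \cdot \frac{s}{\sqrt{n}},\ \ \bar{x} + t_{0.05} \cdot \frac{s}{\sqrt{n}}\right)$$

where $s$ is the sample standard deviation, $n$ is the sample size, and $t_{0.05}$ is the percentage point (percentile) of Student's t-distribution with (n-1) degrees of freedom that gives a two-tailed probability of 0.05.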

In general, it provides a wider interval than when using a normal distribution, because it takes into account the additional uncertainty that is introduced by estimating the population standard deviation and/or due to the small sample size.

When the sample size is large (of the order of 100 or more), the difference between the two distributions (Student's t and normal) is negligible. However, the t-distribution is always used when calculating confidence intervals, even if the sample size is large.

Usually the 95% CI is reported. Other confidence intervals can be calculated, such as the 99% CI for the mean.

Instead of multiplying the standard error by the tabulated value of the t-distribution that corresponds to a two-tailed probability of 0.05, multiply it by the value that corresponds to a two-tailed probability of 0.01. This gives a wider confidence interval than the 95% one, because it reflects increased confidence that the interval does indeed include the population mean.

Confidence interval for proportion

The sampling distribution of a proportion is binomial. However, if the sample size $n$ is reasonably large, the sampling distribution of the proportion is approximately normal with mean $\pi$, the true proportion in the population.

We estimate $\pi$ by the sample proportion $p = r/n$ (where $r$ is the number of individuals in the sample with the characteristic of interest), and its standard error is estimated by:

$$\mathrm{SE}(p) = \sqrt{\frac{p(1-p)}{n}}$$

The 95% confidence interval for the proportion is estimated by:

$$\left(p - 1.96 \sqrt{\frac{p(1-p)}{n}},\ \ p + 1.96 \sqrt{\frac{p(1-p)}{n}}\right)$$

If the sample size is small (usually when $np$ or $n(1-p)$ is less than 5), the binomial distribution must be used to calculate exact confidence intervals.

Note that if $p$ is expressed as a percentage, then $(1-p)$ is replaced by $(100-p)$.
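A minimal sketch of the normal-approximation interval for a proportion follows. The function name proportion_ci_95 and the data (70 of 100 patients) are hypothetical and serve only as an illustration; the function refuses to use the approximation when np or n(1-p) is below 5, in line with the rule of thumb stated in the text.

```python
from math import sqrt

def proportion_ci_95(r, n):
    """95% CI for a proportion using the normal approximation."""
    p = r / n
    if n * p < 5 or n * (1 - p) < 5:
        # Too few events or non-events: exact binomial limits are needed instead.
        raise ValueError("sample too small for the normal approximation")
    se = sqrt(p * (1 - p) / n)               # estimated standard error of p
    return p - 1.96 * se, p + 1.96 * se

# Hypothetical data: 70 of 100 patients achieved remission.
low, high = proportion_ci_95(70, 100)
print(f"p = 0.70, 95% CI: {low:.3f} to {high:.3f}")
```

With these illustrative numbers the interval runs from roughly 0.61 to 0.79.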

Interpretation of confidence intervals

When interpreting the confidence interval, we are interested in the following questions:

How wide is the confidence interval?

A wide confidence interval indicates that the estimate is imprecise; a narrow one indicates a precise estimate.

The width of the confidence interval depends on the size of the standard error, which in turn depends on the sample size and, for a numerical variable, on the variability of the data. Small studies of highly variable data therefore give wider confidence intervals than large studies of less variable data.

Does the CI include any values of particular interest?

You can check whether a hypothesized value for the population parameter falls within the confidence interval. If it does, the results are consistent with this value. If not, then it is unlikely (for a 95% confidence interval, the chance is at most 5%) that the parameter has this value.

Suppose we have a large number of items with a normal distribution of some characteristics (for example, a full warehouse of vegetables of the same type, the size and weight of which varies). You want to know the average characteristics of the entire batch of goods, but you have neither the time nor the inclination to measure and weigh each vegetable. You understand that this is not necessary. But how many pieces would you need to take for random inspection?

Before giving some formulas useful for this situation, we recall some notation.

First, if we did measure the entire warehouse of vegetables (this set of elements is called the general population), we would know the mean weight of the entire batch with all the accuracy available to us. Let's call this mean $\bar{X}_{gen}$, the general (population) mean. We already know that a normal distribution is completely determined if its mean and standard deviation $s$ are known. True, so far we know neither $\bar{X}_{gen}$ nor $s$ for the general population. We can only take a sample, measure the values we need, and calculate for this sample both the sample mean $\bar{X}_{sample}$ and the sample standard deviation $S_{sample}$.

It is known that if our sample contains a large number of elements (usually n greater than 30) and they are chosen truly at random, then $s$ of the general population will hardly differ from $S_{sample}$.

In addition, for the case of a normal distribution, we can use the following formulas:

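With a probability of 95%:

$$\bar{X}_{sample} - 1.96 \frac{s}{\sqrt{n}} < \bar{X}_{gen} < \bar{X}_{sample} + 1.96 \frac{s}{\sqrt{n}}$$

With a probability of 99%:

$$\bar{X}_{sample} - 2.58 \frac{s}{\sqrt{n}} < \bar{X}_{gen} < \bar{X}_{sample} + 2.58 \frac{s}{\sqrt{n}}$$

In general, with probability P(t):

$$\bar{X}_{sample} - t \frac{s}{\sqrt{n}} < \bar{X}_{gen} < \bar{X}_{sample} + t \frac{s}{\sqrt{n}} \qquad (1)$$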

The relationship between the value of t and the value of the probability P (t), with which we want to know the confidence interval, can be taken from the following table:


Thus, we have determined in what range the average value for the general population is (with a given probability).

If the sample is not large enough, we cannot claim that the population has $s = S_{sample}$. In addition, in this case it is hard to judge how close the sample is to a normal distribution. Here we also use $S_{sample}$ instead of $s$ in the formula:

$$\bar{X}_{sample} - t \frac{S_{sample}}{\sqrt{n}} < \bar{X}_{gen} < \bar{X}_{sample} + t \frac{S_{sample}}{\sqrt{n}}$$

but the value of t for a fixed probability P(t) will depend on the number of elements in the sample n. The larger n, the closer the resulting confidence interval will be to the one given by formula (1). The t values in this case are taken from another table (Student's t-distribution), which we provide below:

Student's t values for probability 0.95 and 0.99


Example 3. 30 people were randomly selected from the employees of a company. In the sample, the average monthly salary turned out to be 30 thousand rubles, with a standard deviation of 5 thousand rubles. Determine the average salary in the company with a probability of 0.99.

Solution: By the conditions, we have n = 30, $\bar{X}_{sample}$ = 30000, S = 5000, P = 0.99. To find the confidence interval, we use the formula corresponding to Student's t-distribution. From the table, for n = 30 and P = 0.99 we find t = 2.756, therefore

$$30000 - 2.756 \cdot \frac{5000}{\sqrt{30}} < \bar{X}_{gen} < 30000 + 2.756 \cdot \frac{5000}{\sqrt{30}}$$

i.e., the desired confidence interval is 27484 < $\bar{X}_{gen}$ < 32516.

So, with a probability of 0.99, it can be stated that the interval (27484; 32516) contains the average salary in the company.
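The arithmetic of this example can be reproduced with a short Python sketch. All numbers are taken from the example itself, including the tabulated value t = 2.756; the variable names are chosen for illustration only.

```python
from math import sqrt

n = 30            # sample size
x_sample = 30000  # sample mean, rubles
s_sample = 5000   # sample standard deviation, rubles
t = 2.756         # Student's t value quoted in the example for n = 30, P = 0.99

half_width = t * s_sample / sqrt(n)   # t * S / sqrt(n)
low, high = x_sample - half_width, x_sample + half_width
print(f"{low:.0f} < population mean < {high:.0f}")
```

Rounded to whole rubles, the limits agree with the interval (27484; 32516) given in the example.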

We hope that you will be able to use this method without having the table with you every time. The calculations can be carried out automatically in Excel. In an Excel file, click the fx button on the top menu. Then select the "Statistical" category and, from the list offered in the box, choose TINV (the inverse of Student's t-distribution). Then, at the prompt, place the cursor in the "probability" field and type the complementary probability (that is, in our case, instead of the probability 0.95 you need to type 0.05). Apparently, the spreadsheet is designed so that the result answers the question of how likely we are to be wrong. Similarly, in the "degrees of freedom" field, enter the value (n-1) for your sample.

One of the methods for solving statistical problems is the calculation of the confidence interval. It is used as a preferred alternative to point estimation when the sample size is small. It should be noted that the process of calculating the confidence interval is rather complicated. But the tools of the Excel program allow you to somewhat simplify it. Let's find out how this is done in practice.

This method is used in the interval estimation of various statistical quantities. The main task of this calculation is to get rid of the uncertainties of the point estimate.

In Excel there are two main ways to perform this calculation: when the variance is known and when it is unknown. In the first case the CONFIDENCE.NORM function is used, and in the second, CONFIDENCE.T.

Method 1: CONFIDENCE.NORM function

The CONFIDENCE.NORM operator, which belongs to the statistical group of functions, first appeared in Excel 2010. Earlier versions of the program use its counterpart, CONFIDENCE. The task of this operator is to calculate a confidence interval for the population mean using the normal distribution.

Its syntax is as follows:

CONFIDENCE.NORM(alpha, standard_dev, size)

"Alpha" is an argument indicating the significance level that is used to calculate the confidence level. The confidence level is equal to the following expression:

(1-"Alpha")*100

"Standard_dev" is an argument whose meaning is clear from its name: it is the standard deviation of the population, which is assumed to be known.

"Size" is an argument that specifies the size of the sample.

All arguments to this operator are required.

The CONFIDENCE function has exactly the same arguments and capabilities as the previous one. Its syntax is:

CONFIDENCE(alpha, standard_dev, size)

As you can see, the only difference is in the name of the operator. This function has been retained in Excel 2010 and newer versions for compatibility reasons, in a special category called "Compatibility". In Excel 2007 and earlier, it is in the main group of statistical operators.

The boundaries of the confidence interval are determined by a formula of the following form:

X ± CONFIDENCE.NORM

where X is the sample mean, which lies in the middle of the interval.

Now let's look at how to calculate the confidence interval using a specific example. 12 tests were carried out, giving different results, which are listed in the table. This is our population. The standard deviation is 8. We need to calculate the confidence interval at the 97% confidence level.

  1. Select the cell where the result of the data processing will be displayed. Click the "Insert Function" button.
  2. The Function Wizard appears. Go to the "Statistical" category and select the name "CONFIDENCE.NORM". Then click the OK button.
  3. The arguments window opens. Its fields correspond to the names of the arguments.
    Place the cursor in the first field, "Alpha". Here we should specify the significance level. As we remember, our confidence level is 97%. At the same time, we said that the two are related in this way:

    (100-confidence level)/100

    That is, substituting the value, we get:

    (100-97)/100

    By simple calculation, we find that the "Alpha" argument equals 0.03. Enter this value in the field.

    As we know, the standard deviation is 8. Therefore, simply enter that number in the "Standard_dev" field.

    In the "Size" field you need to enter the number of tests performed. As we remember, there are 12 of them. But in order to automate the formula and not edit it every time a new test is performed, let's set this value not as an ordinary number but with the COUNT operator. So, place the cursor in the "Size" field, and then click the triangle to the left of the formula bar.

    A list of recently used functions appears. If you have used the COUNT operator recently, it should be in this list; in that case, just click its name. Otherwise, go to the "More Functions..." item.

  4. The already familiar Function Wizard appears. Go to the "Statistical" group again, select the name "COUNT" there, and click the OK button.
  5. The arguments window for this operator appears. The function counts the cells in the specified range that contain numeric values. Its syntax is as follows:

    COUNT(value1, value2,…)

    The "Value" arguments are references to the range in which you want to count the cells filled with numeric data. There can be up to 255 such arguments in total, but in our case we need only one.

    Place the cursor in the "Value1" field and, holding down the left mouse button, select the range on the sheet that contains our population. Its address will then be displayed in the field. Click the OK button.

  6. After that, the application performs the calculation and displays the result in the cell that contains the formula. In our particular case, the formula looks like this:

    =CONFIDENCE.NORM(0.03,8,COUNT(B2:B13))

    The overall result of the calculation is 5.011609.

  7. But that is not all. As we remember, the boundaries of the confidence interval are calculated by adding the result of CONFIDENCE.NORM to the sample mean and by subtracting it from the sample mean. This gives the right and left boundaries of the confidence interval, respectively. The sample mean itself can be calculated with the AVERAGE operator.

    This operator calculates the arithmetic mean of the selected range of numbers. It has the following rather simple syntax:

    AVERAGE(number1, number2,…)

    The "Number" argument can be either a single numeric value or a reference to cells or even entire ranges that contain numbers.

    So, select the cell in which the mean will be displayed, and click the "Insert Function" button.

  8. The Function Wizard opens. Go back to the "Statistical" category and select "AVERAGE" from the list. As always, click the OK button.
  9. The arguments window opens. Place the cursor in the "Number1" field and, holding down the left mouse button, select the entire range of values. After the coordinates are displayed in the field, click the OK button.
  10. After that, AVERAGE outputs the result of the calculation to the cell.
  11. We calculate the right boundary of the confidence interval. To do this, select a separate cell, type the "=" sign, and add together the contents of the cells that hold the results of the AVERAGE and CONFIDENCE.NORM functions. To perform the calculation, press Enter.

    Calculation result: 6.953276

  12. In the same way, we calculate the left boundary of the confidence interval, only this time we subtract the result of CONFIDENCE.NORM from the result of AVERAGE.

    Calculation result: -3.06994

  13. We have tried to describe every step of calculating the confidence interval in detail, so each formula was covered separately. But all the actions can be combined in one formula. The calculation of the right boundary of the confidence interval can be written as follows:

    =AVERAGE(B2:B13)+CONFIDENCE.NORM(0.03,8,COUNT(B2:B13))

  14. A similar calculation of the left boundary looks like this:

    =AVERAGE(B2:B13)-CONFIDENCE.NORM(0.03,8,COUNT(B2:B13))
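For readers who want to cross-check the spreadsheet result outside Excel, here is a minimal Python sketch. It assumes that the function returns the normal quantile for 1 - alpha/2 multiplied by standard_dev / sqrt(size); the name confidence_norm is a hypothetical stand-in, not an Excel or Python library function. The inputs are the ones from the example: alpha = 0.03, a standard deviation of 8, and 12 observations.

```python
from math import sqrt
from statistics import NormalDist

def confidence_norm(alpha, standard_dev, size):
    """Half-width of a normal-based CI for the mean (standard deviation known)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    return z * standard_dev / sqrt(size)

half_width = confidence_norm(0.03, 8, 12)
print(f"{half_width:.6f}")
```

Under that assumption the value should agree, to within rounding, with the 5.011609 reported in step 6; adding it to and subtracting it from the sample mean gives the right and left boundaries.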

Method 2: CONFIDENCE.T function

In addition, Excel has another function related to the calculation of the confidence interval: CONFIDENCE.T. It has been available only since Excel 2010. This operator calculates the confidence interval for the population mean using Student's t-distribution. It is very convenient to use when the variance and, accordingly, the standard deviation are unknown. The operator syntax is:

CONFIDENCE.T(alpha, standard_dev, size)

As you can see, the names of the arguments remain unchanged in this case.

Let's see how to calculate the boundaries of the confidence interval with an unknown standard deviation, using the same population that we considered in the previous method. As last time, we take a confidence level of 97%.

  1. Select the cell in which the calculation will be made. Click the "Insert Function" button.
  2. In the Function Wizard that opens, go to the "Statistical" category. Select the name "CONFIDENCE.T". Click the OK button.
  3. The arguments window for the specified operator opens.

    In the "Alpha" field, given that the confidence level is 97%, enter the number 0.03. We will not dwell a second time on how this parameter is calculated.

    After that, place the cursor in the "Standard_dev" field. This time the value is unknown to us and needs to be calculated. This is done with a special function, STDEV.S. To open this operator's window, click the triangle to the left of the formula bar. If the desired name is not in the list that opens, go to the "More Functions..." item.

  4. The Function Wizard starts. Go to the "Statistical" category and select the name "STDEV.S". Then click the OK button.
  5. The arguments window opens. The task of the STDEV.S operator is to determine the standard deviation of a sample. Its syntax looks like this:

    STDEV.S(number1,number2,…)

    It is easy to guess that the "Number" argument is the address of a sample element. If the sample is placed in a single range, you can refer to that range with just one argument.

    Place the cursor in the "Number1" field and, as always, holding down the left mouse button, select the population. After the coordinates appear in the field, do not rush to click the OK button, because the result would be incorrect. First we need to return to the arguments window of the CONFIDENCE.T operator to fill in the last argument. To do this, click the corresponding name in the formula bar.

  6. The arguments window of the already familiar function opens again. Place the cursor in the "Size" field. Again, click the familiar triangle to go to the choice of operators. As you understand, we need the name "COUNT". Since we used this function in the calculations in the previous method, it is present in this list, so just click it. If you do not find it, follow the algorithm described in the first method.
  7. Once in the COUNT arguments window, place the cursor in the "Value1" field and, holding down the mouse button, select the population. Then click the OK button.
  8. After that, the program performs the calculation and displays the value of the confidence interval.
  9. To determine the boundaries, we again need to calculate the sample mean. But since the calculation with the AVERAGE formula is the same as in the previous method, and even the result has not changed, we will not dwell on it in detail a second time.
  10. Adding the results of AVERAGE and CONFIDENCE.T, we obtain the right boundary of the confidence interval.
  11. Subtracting the result of CONFIDENCE.T from the result of AVERAGE, we obtain the left boundary of the confidence interval.
  12. If the calculation is written as one formula, the calculation of the right boundary in our case looks like this:

    =AVERAGE(B2:B13)+CONFIDENCE.T(0.03,STDEV.S(B2:B13),COUNT(B2:B13))

  13. Accordingly, the formula for calculating the left boundary looks like this:

    =AVERAGE(B2:B13)-CONFIDENCE.T(0.03,STDEV.S(B2:B13),COUNT(B2:B13))

As you can see, the tools of the Excel program make it possible to significantly facilitate the calculation of the confidence interval and its boundaries. For these purposes, separate operators are used for samples whose variance is known and unknown.

And others. All of them are estimates of their theoretical counterparts, which could be obtained if we had the entire general population rather than a sample. But alas, studying the general population is very expensive and often impossible.

The concept of interval estimation

Any sample estimate has some scatter, because it is a random variable that depends on the values in a particular sample. Therefore, for more reliable statistical inference, one should know not only the point estimate but also an interval that covers the estimated parameter θ (theta) with a high probability γ (gamma).

Formally, these are two values (statistics) $T_1(X)$ and $T_2(X)$, with $T_1 < T_2$, for which, at a given probability level γ, the following condition holds:

$$P\left(T_1(X) \le \theta \le T_2(X)\right) \ge \gamma$$

In short, with probability γ or more the true value lies between the points $T_1(X)$ and $T_2(X)$, which are called the lower and upper bounds of the confidence interval.

One of the requirements in constructing a confidence interval is that it be as narrow as possible, i.e. as short as possible. This is a natural wish, because the researcher tries to localize the desired parameter as precisely as possible.

It follows that the confidence interval should cover the region of highest probability of the distribution, with the estimate itself at the center.

That is, the probability of deviation (of the true indicator from the estimate) upwards is equal to the probability of deviation downwards. It should also be noted that for skewed distributions, the interval on the right is not equal to the interval on the left.

The figure above clearly shows that the greater the confidence level, the wider the interval - a direct relationship.

This was a small introduction to the theory of interval estimation of unknown parameters. Let's move on to finding confidence limits for the mathematical expectation.

Confidence interval for mathematical expectation

If the original data are normally distributed, then the mean will also be a normally distributed value. This follows from the rule that a linear combination of normal values also has a normal distribution. Therefore, to calculate the probabilities, we could use the mathematical apparatus of the normal distribution law.

However, this requires knowing two parameters, the expected value and the variance, which are usually unknown. You can, of course, use estimates instead of the parameters (the arithmetic mean and the sample variance), but then the distribution of the mean will not be quite normal; it will be slightly flattened. William Gosset, who worked in Ireland, astutely noted this fact when he published his discovery in the March 1908 issue of Biometrika. For reasons of secrecy, Gosset signed as Student. This is how Student's t-distribution appeared.

However, the normal distribution of data, used by K. Gauss in the analysis of errors in astronomical observations, is extremely rare in terrestrial life and it is quite difficult to establish this (for high accuracy, about 2 thousand observations are needed). Therefore, it is best to drop the normality assumption and use methods that do not depend on the distribution of the original data.

The question arises: what is the distribution of the arithmetic mean if it is calculated from data with an unknown distribution? The answer is given by the central limit theorem (CLT), well known in probability theory. In mathematics there are several versions of it (the formulations have been refined over the years), but all of them, roughly speaking, come down to the statement that the sum of a large number of independent random variables obeys the normal distribution law.

When the arithmetic mean is calculated, a sum of random variables is used. It follows that the arithmetic mean has an approximately normal distribution, in which the expected value is the expected value of the original data and the variance is $\sigma^2/n$, where $\sigma^2$ is the variance of the original data and $n$ is the sample size.

Smart people know how to prove the CLT, but we will verify it with an experiment conducted in Excel. Let's simulate a sample of 50 uniformly distributed random variables (using the Excel function RANDBETWEEN). Then we will make 1000 such samples and calculate the arithmetic mean of each. Let's look at their distribution.

It can be seen that the distribution of the average is close to the normal law. If the volume of samples and their number are made even larger, then the similarity will be even better.
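The same experiment can be sketched in Python instead of Excel. The sample size (50) and the number of samples (1000) follow the description above; the range of the uniform integers (1 to 100) and the random seed are assumptions made for this illustration, since the article does not state them.

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is repeatable

LOW, HIGH = 1, 100               # assumed range of the uniform integers
SAMPLE_SIZE, N_SAMPLES = 50, 1000

# Arithmetic mean of each of the 1000 simulated samples.
sample_means = [
    mean([random.randint(LOW, HIGH) for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
]

# Theoretical values for a discrete uniform distribution on LOW..HIGH.
expected_mean = (LOW + HIGH) / 2
expected_sd = (((HIGH - LOW + 1) ** 2 - 1) / 12) ** 0.5 / SAMPLE_SIZE ** 0.5

# For a normal law, about 95% of the means fall within 1.96 standard errors.
within = sum(abs(m - expected_mean) <= 1.96 * expected_sd
             for m in sample_means) / N_SAMPLES

print(f"mean of sample means: {mean(sample_means):.2f} (theory {expected_mean:.2f})")
print(f"sd of sample means:   {stdev(sample_means):.2f} (theory {expected_sd:.2f})")
print(f"share within 1.96 standard errors: {within:.3f}")
```

If the central limit theorem applies, the simulated mean and spread should be close to the theoretical values, and the share of sample means within 1.96 standard errors should be near 0.95.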

Now that we have seen the validity of the CLT for ourselves, we can use the normal distribution to calculate confidence intervals for the arithmetic mean that cover the true mean, or mathematical expectation, with a given probability.

To establish the upper and lower bounds, we need to know the parameters of the normal distribution. As a rule, they are unknown, so estimates are used: the arithmetic mean and the sample variance. Again, this method gives a good approximation only for large samples. When samples are small, it is often recommended to use Student's distribution. Don't believe it! Student's distribution for the mean arises only when the original data are normally distributed, that is, almost never. Therefore, it is better to set a minimum bar for the amount of data required right away and use asymptotically correct methods. They say 30 observations are enough. Take 50 and you can't go wrong.

$$T_{1,2} = \bar{x} \pm c_\gamma \frac{s_0}{\sqrt{n}}$$

where

$T_{1,2}$ are the lower and upper bounds of the confidence interval;

$\bar{x}$ is the sample arithmetic mean;

$s_0$ is the sample standard deviation (unbiased);

$n$ is the sample size;

$\gamma$ is the confidence level (usually equal to 0.9, 0.95 or 0.99);

$c_\gamma = \Phi^{-1}((1+\gamma)/2)$ is the value of the inverse of the standard normal distribution function. In simple terms, this is the number of standard errors from the arithmetic mean to the lower or upper bound (the three confidence levels above correspond to the values 1.64, 1.96 and 2.58).

The essence of the formula is that the arithmetic mean is taken and then a certain number ($c_\gamma$) of standard errors ($s_0/\sqrt{n}$) is set aside on either side of it. Everything is known, so just take it and calculate.
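As an illustration of the formula, here is a minimal Python sketch built on the standard library. The function names normal_quantile and mean_ci are hypothetical, chosen only for this example. It computes the number of standard errors for a given confidence level and the interval bounds directly from the raw data.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def normal_quantile(gamma):
    """Number of standard errors for confidence level gamma: inverse CDF at (1 + gamma) / 2."""
    return NormalDist().inv_cdf((1 + gamma) / 2)

def mean_ci(data, gamma=0.95):
    """Asymptotic confidence interval for the mean: x_bar +/- c * s0 / sqrt(n)."""
    n = len(data)
    x_bar = fmean(data)
    s0 = stdev(data)  # unbiased estimate (n - 1 in the denominator)
    half_width = normal_quantile(gamma) * s0 / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# The three usual confidence levels give roughly 1.64, 1.96 and 2.58
for gamma in (0.90, 0.95, 0.99):
    print(gamma, round(normal_quantile(gamma), 2))
```

Calling mean_ci on a list of at least 30 to 50 observations returns the lower and upper bounds as a pair of numbers.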

Before the mass use of PCs, special statistical tables were used to obtain the values of the normal distribution function and its inverse. They are still used, but it is more efficient to turn to ready-made Excel formulas. All the elements of the formula above ($\bar{x}$, $s_0$ and $c_\gamma$) can be easily calculated in Excel. But there is also a ready-made function for calculating the confidence interval: CONFIDENCE.NORM. Its syntax is as follows.

CONFIDENCE.NORM(alpha, standard_dev, size)

alpha is the significance level, which in the notation above equals 1 − γ, i.e. the probability that the mathematical expectation lies outside the confidence interval. With a confidence level of 0.95, alpha is 0.05, and so on.

standard_dev is the standard deviation of the sample data. You don't need to calculate the standard error; Excel will divide by the root of n itself.

size is the sample size (n).

The result of the CONFIDENCE.NORM function is the second term from the formula for calculating the confidence interval, i.e. half-interval. Accordingly, the lower and upper points are the average ± the obtained value.
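For readers who want to check the Excel result by hand, here is a minimal Python sketch of the same calculation. The function confidence_norm is a hypothetical stand-in written for this example and only mirrors the argument order of the Excel function; it is not part of any library.

```python
from math import sqrt
from statistics import NormalDist

def confidence_norm(alpha, standard_dev, size):
    """Half-width of the interval: quantile at (1 - alpha / 2) times the standard error."""
    c = NormalDist().inv_cdf(1 - alpha / 2)
    return c * standard_dev / sqrt(size)

# Hypothetical inputs: sample mean 100, standard deviation 30, 64 observations
mean = 100
half_width = confidence_norm(0.05, 30, 64)
print(round(mean - half_width, 2), round(mean + half_width, 2))
```

As in Excel, the function returns only the half-width; the bounds are obtained by subtracting it from and adding it to the mean.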

Thus, it is possible to build a universal algorithm for calculating confidence intervals for the arithmetic mean, which does not depend on the distribution of the initial data. The price for universality is its asymptotic nature, i.e. the need to use relatively large samples. However, in the age of modern technology, collecting the right amount of data is usually not difficult.

Testing Statistical Hypotheses Using a Confidence Interval


One of the main problems solved in statistics is hypothesis testing. In a nutshell, its essence is this. An assumption is made, for example, that the expectation of the general population is equal to some value. Then the distribution of sample means that could be observed under this expectation is constructed. Next, we look at where the actual sample mean falls in this conditional distribution. If it goes beyond the allowable limits, then such a mean is very unlikely to appear, and in a single run of the experiment it is almost impossible. This contradicts the hypothesis put forward, which is therefore rejected. If the mean does not go beyond the critical level, the hypothesis is not rejected (but it is not proved either!).

So confidence intervals, in our case for the expectation, can also be used to test some hypotheses. It's very easy to do. Suppose the arithmetic mean of some sample is 100. We test the hypothesis that the expected value is, say, 90. Put simply, the question sounds like this: if the true mean is 90, can the observed mean turn out to be 100?

To answer this question, we need additional information: the standard deviation and the sample size. Let's say the standard deviation is 30 and the number of observations is 64 (so that the square root is easy to take). Then the standard error of the mean is 30/8, or 3.75. To calculate the 95% confidence interval, we set aside two standard errors on each side of the mean (more precisely, 1.96). The confidence interval will be approximately 100 ± 7.5, or from 92.5 to 107.5.

The reasoning then goes as follows. If the tested value falls within the confidence interval, the hypothesis does not contradict the observed data, since the difference fits within the limits of random fluctuations (with a probability of 95%). If the tested point is outside the confidence interval, the probability of such an event is very small, in any case below the acceptable level. Hence, the hypothesis is rejected as contradicting the observed data. In our case, the hypothesized expectation is outside the confidence interval (the tested value of 90 is not included in the interval 100 ± 7.5), so the hypothesis should be rejected. The answer to the simple question above is: no, it cannot, or at least it happens extremely rarely. Often, a specific probability of erroneously rejecting the hypothesis (the p-value) is reported instead of the preset level at which the confidence interval was built, but more on that another time.
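The worked example can be checked with a few lines of Python. This is a minimal sketch using the numbers from the text (sample mean 100, standard deviation 30, 64 observations, tested value 90); the function name ci_test is hypothetical and introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def ci_test(hypothesized_mean, sample_mean, standard_dev, size, gamma=0.95):
    """Return (lower, upper, rejected): the interval and whether the tested value lies outside it."""
    c = NormalDist().inv_cdf((1 + gamma) / 2)   # about 1.96 for gamma = 0.95
    half_width = c * standard_dev / sqrt(size)  # 1.96 * 3.75 in the example
    lower, upper = sample_mean - half_width, sample_mean + half_width
    rejected = not (lower <= hypothesized_mean <= upper)
    return lower, upper, rejected

lower, upper, rejected = ci_test(90, 100, 30, 64)
print(round(lower, 2), round(upper, 2), rejected)
```

With the exact multiplier 1.96 instead of the rounded 2, the interval is slightly narrower than 100 ± 7.5 (about 92.65 to 107.35), and the value 90 still lies outside it, so the conclusion does not change.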

As you can see, it is not difficult to build a confidence interval for the mean (or mathematical expectation). The main thing is to grasp the essence, and the rest will follow. In practice, most people use the 95% confidence interval, which extends about two standard errors on either side of the mean.

That's all for now. All the best!