Geometric distribution. Discrete distributions in MS EXCEL

Statistics comes to our aid in solving many problems: when a deterministic model cannot be built, when there are too many factors, or when we need to estimate how plausible a model is given the available data. Attitudes toward statistics are ambivalent. On the one hand, it is said that there are three kinds of lies: lies, damned lies and statistics. On the other hand, many "users" of statistics trust it too much without fully understanding how it works: for example, applying a test that assumes normality to arbitrary data without checking that the data are normal. Such negligence can produce serious errors and turn "fans" of a test into haters of statistics. Let's try to dot the i's and figure out which models of random variables should be used to describe particular phenomena and what genetic relationships exist between them.

First of all, this material will be of interest to students studying probability theory and statistics, although "mature" specialists may find it useful as a reference. In one of the following articles, I will show an example of using statistics to construct a test for assessing the significance of the performance indicators of exchange trading strategies.

The work will consider discrete and continuous distributions of random variables and the relationships between them.

At the end of the article, a question will be posed for reflection. I will share my thoughts on it in my next article.

Some of the continuous distributions given here are special cases of the Pearson distribution.

Discrete distributions

Discrete distributions are used to describe events with characteristics defined only at isolated points. Simply put, they describe events whose outcome can be attributed to some discrete category: success or failure, an integer (for example, a game of roulette or a die roll), heads or tails, etc.

A discrete distribution is described by the probabilities of occurrence of each of the possible outcomes of an event. As for any distribution (including continuous ones), the concepts of expectation and variance are defined for discrete events. However, it should be understood that the expectation of a discrete random event is generally not realizable as the outcome of any single event; rather, it is the value toward which the arithmetic mean of the outcomes tends as the number of events grows.

In modeling discrete random events, combinatorics plays an important role, since the probability of an event outcome can be defined as the ratio of the number of combinations that give the desired outcome to the total number of combinations. For example: there are 3 white balls and 7 black ones in the basket. When we choose 1 ball from the basket, we can do it in 10 different ways (the total number of combinations), but in only 3 ways is a white ball chosen (3 combinations give the required outcome). Thus, the probability of choosing a white ball is $P = 3/10 = 0.3$.

It is also necessary to distinguish between sampling with replacement and without replacement. For example, to describe the probability of choosing two white balls, it is important to determine whether the first ball is returned to the basket. If not, we are dealing with sampling without replacement, and the probability is $\frac{3}{10}\cdot\frac{2}{9}=\frac{1}{15}$: the probability of choosing a white ball from the initial contents multiplied by the probability of choosing a white ball again from those remaining in the basket. If the first ball is returned to the basket, this is sampling with replacement; in this case the probability of choosing two white balls is $\left(\frac{3}{10}\right)^2=\frac{9}{100}$.
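The combinatorial probabilities above are easy to verify numerically. Below is a minimal sketch in Python (the ball counts are the ones from the example; `math.comb` and `random.sample` are standard library calls):

```python
import random
from math import comb

# Probability of drawing one white ball: 3 white out of 10.
p_white = comb(3, 1) / comb(10, 1)           # 3/10 = 0.3

# Two white balls without replacement: (3/10) * (2/9) = 1/15.
p_two_without = (3 / 10) * (2 / 9)

# Two white balls with replacement: (3/10)^2 = 9/100.
p_two_with = (3 / 10) ** 2

# Monte Carlo check of the without-replacement case.
basket = ["w"] * 3 + ["b"] * 7
trials = 100_000
hits = sum(random.sample(basket, 2) == ["w", "w"] for _ in range(trials))
print(p_two_without, hits / trials)          # both are close to 0.0667
```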

If we slightly formalize the basket example as follows: let the outcome of an event take one of two values, 0 or 1, with probabilities $q$ and $p$ respectively; then the distribution of the probability of obtaining each of these outcomes is called the Bernoulli distribution:

$P(X=1)=p,\qquad P(X=0)=q=1-p.$

Traditionally, the outcome with value 1 is called "success" and the outcome with value 0 "failure". Obviously, the outcome "success or failure" occurs with probability $p+q=1$.

Expectation and variance of the Bernoulli distribution:

$M(X)=p,\qquad D(X)=p(1-p)=pq.$

The number of successes $k$ in $n$ trials whose outcomes are Bernoulli-distributed with probability of success $p$ (the example with returning the balls to the basket) is described by the binomial distribution:

$P(X=k)=C_n^k\,p^k(1-p)^{n-k}.$

In other words, the binomial distribution describes the sum of $n$ independent random variables, each Bernoulli-distributed with probability of success $p$.
Expectation and variance:

$M(X)=np,\qquad D(X)=np(1-p).$

The binomial distribution is valid only for sampling with replacement, that is, when the probability of success remains constant over the entire series of trials.

If the quantities $X$ and $Y$ have binomial distributions with parameters $(n_1,p)$ and $(n_2,p)$ respectively, then their sum is also binomially distributed, with parameters $(n_1+n_2,p)$.
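These properties are easy to check with `scipy.stats` (a sketch, assuming SciPy is installed; the parameter values are purely illustrative):

```python
import numpy as np
from scipy.stats import binom

n, p = 20, 0.3
print(binom.pmf(5, n, p))                  # P(X = 5) for X ~ Bin(20, 0.3)
print(binom.mean(n, p), n * p)             # expectation np
print(binom.var(n, p), n * p * (1 - p))    # variance np(1-p)

# Additivity: Bin(8, p) + Bin(12, p) behaves like Bin(20, p).
rng = np.random.default_rng(0)
s = rng.binomial(8, p, 200_000) + rng.binomial(12, p, 200_000)
print(s.mean(), n * p)                     # empirical mean close to np
```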

Imagine a situation where we draw balls from the basket and return them back until a white ball is drawn. The number of such operations is described by the geometric distribution. In other words, the geometric distribution describes the number of trials up to the first success, given probability of success $p$ in each trial. If $x$ is the number of the trial in which the success occurred, the geometric distribution is described by the following formula:

$P(X=x)=(1-p)^{x-1}\,p.$
Expectation and variance of the geometric distribution:

$M(X)=\frac{1}{p},\qquad D(X)=\frac{1-p}{p^2}.$

The geometric distribution is genetically related to the exponential distribution, which describes a continuous random variable: the time until an event, given a constant intensity of events. The geometric distribution is also a special case of the negative binomial (Pascal) distribution.
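A quick check with `scipy.stats.geom`, which, like the formula above, counts the number of the trial on which the first success occurs (a sketch; p = 0.3 is arbitrary):

```python
from scipy.stats import geom

p = 0.3
print(geom.pmf(4, p), (1 - p) ** 3 * p)   # P(first success on trial 4)
print(geom.mean(p), 1 / p)                # expectation 1/p
print(geom.var(p), (1 - p) / p ** 2)      # variance (1-p)/p^2
```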

The Pascal distribution is a generalization of the geometric distribution: it describes the distribution of the number of failures $k$ in independent Bernoulli trials with probability of success $p$ before the sum of successes reaches $r$. For $r=1$, we obtain the geometric distribution for the quantity $k+1$:

$P(X=k)=C_{k+r-1}^{k}\,p^r(1-p)^k,$

where $C_n^k$ is the number of combinations of $k$ elements out of $n$.

Expectation and variance of the negative binomial distribution:

$M(X)=\frac{r(1-p)}{p},\qquad D(X)=\frac{r(1-p)}{p^2}.$

The sum of independent random variables distributed according to Pascal is also distributed according to Pascal: let $X$ have distribution $NB(r_1,p)$ and $Y$ have distribution $NB(r_2,p)$, and let $X$ and $Y$ be independent; then their sum has distribution $NB(r_1+r_2,p)$.
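In SciPy the Pascal distribution is `nbinom(r, p)`, parameterized, like the formula above, by the number of failures before the r-th success. A sketch of the moments and the additivity property (the values of r and p are illustrative):

```python
import numpy as np
from scipy.stats import nbinom

r, p = 3, 0.4
print(nbinom.pmf(5, r, p))                   # P(5 failures before the 3rd success)
print(nbinom.mean(r, p), r * (1 - p) / p)    # expectation r(1-p)/p
print(nbinom.var(r, p), r * (1 - p) / p**2)  # variance r(1-p)/p^2

# NB(2, p) + NB(4, p) = NB(6, p): compare the exact pmf with a convolution.
k = np.arange(60)
conv = np.convolve(nbinom.pmf(k, 2, p), nbinom.pmf(k, 4, p))[:60]
print(np.allclose(conv, nbinom.pmf(k, 6, p)))  # True
```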

So far we have looked at examples of sampling with replacement, that is, cases where the probability of an outcome does not change from trial to trial.

Now consider the situation without replacement and describe the probability of the number of successful draws from a population with a predetermined number of successes and failures (a predetermined number of white and black balls in the basket, trump cards in the deck, defective parts in the batch, etc.).

Let the total collection contain $N$ objects, of which $K$ are labeled "1" and $N-K$ are labeled "0". We will consider the selection of an object labeled "1" a success and of an object labeled "0" a failure. Let us carry out $n$ trials, with selected objects no longer participating in further trials. The number of successes $k$ follows the hypergeometric distribution:

$P(X=k)=\frac{C_K^k\,C_{N-K}^{n-k}}{C_N^n},$

where $C_n^k$ is the number of combinations of $k$ elements out of $n$.

Expectation and variance of the hypergeometric distribution:

$M(X)=\frac{nK}{N},\qquad D(X)=\frac{nK}{N}\left(1-\frac{K}{N}\right)\frac{N-n}{N-1}.$

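A sketch with `scipy.stats.hypergeom`; note that SciPy's argument order is (N, K, n): population size, number of marked objects, sample size. The numbers reuse the basket example (10 balls, 3 white, 4 drawn):

```python
from scipy.stats import hypergeom

N, K, n = 10, 3, 4
rv = hypergeom(N, K, n)
print(rv.pmf(2))              # P(exactly 2 white among the 4 drawn) = 0.3
print(rv.mean(), n * K / N)   # expectation nK/N = 1.2
print(rv.var())               # variance with the (N-n)/(N-1) correction
```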
Poisson distribution



The Poisson distribution differs significantly from the distributions considered above in its "subject" area: it considers not the probability of a particular trial outcome, but the intensity of events, that is, the average number of events per unit time.

The Poisson distribution describes the probability of $k$ independent events occurring over time $t$ at an average intensity of events $\lambda$:

$P(k)=\frac{(\lambda t)^k}{k!}\,e^{-\lambda t}.$

The expectation and variance of the Poisson distribution:

$M(X)=\lambda t,\qquad D(X)=\lambda t.$

The variance and mean of the Poisson distribution are identically equal.
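A sketch with `scipy.stats.poisson` (the intensity λ = 2 events per unit time and the interval t = 3 are illustrative):

```python
from scipy.stats import poisson

lam, t = 2.0, 3.0
mu = lam * t                              # average number of events over time t
print(poisson.pmf(5, mu))                 # P(exactly 5 events occur in time t)
print(poisson.mean(mu), poisson.var(mu))  # both equal lambda * t
```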

The Poisson distribution, in combination with the exponential distribution, which describes the time intervals between the onsets of independent events, forms the mathematical basis of reliability theory.

The probability density of the product $z=xy$ of random variables $x$ and $y$ with distributions $f_x(x)$ and $f_y(y)$ can be calculated as follows:

$f_z(z)=\int_{-\infty}^{\infty} f_x(x)\,f_y(z/x)\,\frac{1}{|x|}\,dx.$
Some of the distributions below are special cases of the Pearson distribution, which in turn is the solution of the equation:

$\frac{f'(x)}{f(x)}=\frac{x-a}{b_0+b_1x+b_2x^2},$

where $a$, $b_0$, $b_1$, $b_2$ are distribution parameters. There are 12 types of Pearson distribution, depending on the values of the parameters.

The distributions that will be discussed in this section have close relationships with each other. These connections are expressed in the fact that some distributions are special cases of other distributions, or describe transformations of random variables with other distributions.

The diagram below shows the relationships between some of the continuous distributions that will be discussed in this paper. In the diagram, the solid arrows show the transformation of random variables (the beginning of the arrow indicates the initial distribution, the end of the arrow - the resulting one), and the dotted arrows show the generalization relation (the beginning of the arrow indicates the distribution, which is a special case of the one indicated by the end of the arrow). For special cases of the Pearson distribution above the dotted arrows, the corresponding type of the Pearson distribution is indicated.


The overview of distributions offered below covers many cases that occur in data analysis and process modeling, although, of course, it does not contain absolutely all distributions known to science.

Normal distribution (Gaussian distribution)



The probability density of the normal distribution with parameters $\mu$ and $\sigma$ is described by the Gaussian function:

$f(x)=\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$

If $\mu=0$ and $\sigma=1$, such a distribution is called standard.

Expectation and variance of the normal distribution:

$M(X)=\mu,\qquad D(X)=\sigma^2.$

The domain of definition of a normal distribution is the set of real numbers.

The normal distribution is a Pearson type VI distribution.

The sum of squares of $n$ independent standard normal values has a chi-square distribution with $n$ degrees of freedom, and the ratio of two independent centered Gaussian values has a Cauchy distribution.

The normal distribution is infinitely divisible: the sum of normally distributed quantities $x$ and $y$ with parameters $(\mu_x,\sigma_x)$ and $(\mu_y,\sigma_y)$ respectively also has a normal distribution with parameters $(\mu,\sigma)$, where $\mu=\mu_x+\mu_y$ and $\sigma=\sqrt{\sigma_x^2+\sigma_y^2}$.

The normal distribution well models quantities that describe natural phenomena, noise of a thermodynamic nature, and measurement errors.

In addition, according to the central limit theorem, the sum of a large number of independent terms of the same order converges to a normal distribution regardless of the distributions of the terms. Due to this property, the normal distribution is popular in statistical analysis; many statistical tests are designed for normally distributed data.

The z-test is based on the infinite divisibility of the normal distribution. This test is used to check whether the expectation of a sample of normally distributed values equals some value. The variance must be known. If the variance is unknown and is estimated from the analyzed sample, then a t-test based on the Student's distribution is used instead.

Let us have a sample of $n$ independent normally distributed values drawn from a population with known standard deviation $\sigma$, and let us hypothesize that the population mean equals $\mu_0$. Then the value $z=\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$ has the standard normal distribution. By comparing the obtained $z$ value with the quantiles of the standard distribution, one can accept or reject the hypothesis at the required significance level.
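A minimal z-test sketch under exactly these assumptions (known σ; the data are simulated only to make the example self-contained, and mu0 is the hypothesized mean):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
sigma, mu0 = 2.0, 5.0
x = rng.normal(5.1, sigma, size=50)       # sample with known sigma

z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))
p_value = 2 * norm.sf(abs(z))             # two-sided test
print(z, p_value)                         # reject H0 if p_value < alpha
```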

Due to the prevalence of the Gaussian distribution, many researchers who do not know statistics well forget to check the data for normality, or estimate the density plot "by eye", blindly believing that they are dealing with Gaussian data. They then boldly apply tests designed for the normal distribution and get completely incorrect results. This is probably where the rumor about statistics as the most terrible kind of lie came from.

Consider an example: we need to measure the resistance of a set of resistors of a certain rating. Resistance has a physical nature; it is logical to assume that the distribution of deviations from the nominal value will be normal. We measure and obtain a bell-shaped probability density of the measured values, with a mode in the vicinity of the nominal resistance. Is this a normal distribution? If yes, then we will look for defective resistors using a Student's t-test, or a z-test if we know the variance of the distribution in advance. I think many would do just that.

But let us take a closer look at the measurement technology: resistance is defined as the ratio of the applied voltage to the flowing current. We measured the current and voltage with instruments which, in turn, have normally distributed errors. That is, the measured values of current and voltage are normally distributed random variables with expectations equal to the true values of the measured quantities. And this means that the obtained resistance values are distributed as the ratio of two Gaussian quantities, and not according to Gauss.

The chi-square distribution describes the sum of squares of $n$ random variables, each distributed according to the standard normal law:

$\chi^2=\sum_{i=1}^{n} x_i^2,$

where $n$ is the number of degrees of freedom and $x_i\sim N(0,1)$.

The expectation and variance of the chi-square distribution:

$M(X)=n,\qquad D(X)=2n.$

The domain of definition is the set of non-negative real numbers. Chi-square is an infinitely divisible distribution: if $x$ and $y$ are chi-square distributed with $n$ and $m$ degrees of freedom respectively, then their sum is also chi-square distributed, with $n+m$ degrees of freedom.

It is a special case of the gamma distribution (and therefore a Pearson type III distribution) and a generalization of the exponential distribution. The ratio of chi-square distributed quantities, each divided by its degrees of freedom, is distributed according to Fisher.

Pearson's goodness-of-fit test is based on the chi-square distribution. This test can be used to check whether a sample of a random variable belongs to a certain theoretical distribution.

Suppose we have a sample of $n$ values of some random variable $x$. Based on this sample, we calculate the empirical probabilities $p_i^*$ of the values falling into each of $m$ intervals. Let there also be an assumption about the analytic form of the distribution, according to which the probabilities of falling into the selected intervals should be $p_i$. Then, for large $n$, the observed counts $np_i^*$ will be approximately normally distributed.

We reduce them to the standard normal distribution: $z_i=\frac{np_i^*-np_i}{\sqrt{np_i}}$,
where $np_i^*$ are the observed counts and $np_i$ the expected counts.

The obtained quantities $z_i$ have a normal distribution with parameters $(0, 1)$, and therefore the sum of their squares, $\chi^2=\sum_{i=1}^{m} z_i^2$, is chi-square distributed with $m-1$ degrees of freedom. The loss of one degree of freedom is due to an additional constraint on the sum of the probabilities of falling into the intervals: it must equal 1.

By comparing the obtained $\chi^2$ value with the quantiles of the chi-square distribution, one can accept or reject the hypothesis about the theoretical distribution of the data at the required significance level.
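The same test takes a few lines in Python; `scipy.stats.chisquare` performs exactly this comparison of observed and expected counts (the sample and the binning are illustrative):

```python
import numpy as np
from scipy.stats import chisquare, norm

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 1000)

# Bin the sample and compute expected counts under the hypothesized N(0, 1).
edges = np.array([-np.inf, -1.0, 0.0, 1.0, np.inf])
observed, _ = np.histogram(x, bins=edges)
expected = len(x) * np.diff(norm.cdf(edges))

stat, p_value = chisquare(observed, expected)  # df = m - 1 = 3
print(stat, p_value)                           # a large p-value: no reason to reject
```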

The Student's distribution is used to conduct the t-test: a test of whether the expectation of a sample of normally distributed random variables equals some value, or of whether the expectations of two samples with the same variance are equal (the equality of variances must be checked). The Student's t-distribution describes the ratio of a standard normal random variable to the square root of a chi-square distributed variable divided by its degrees of freedom.

Let $x$ and $y$ be independent chi-square distributed random variables with $d_1$ and $d_2$ degrees of freedom respectively. Then the quantity $F=\frac{x/d_1}{y/d_2}$ has a Fisher distribution with $(d_1, d_2)$ degrees of freedom, and the quantity $1/F$ has a Fisher distribution with $(d_2, d_1)$ degrees of freedom.
The Fisher distribution is defined for non-negative real arguments and has the probability density:

$f(x)=\frac{\sqrt{\dfrac{(d_1x)^{d_1}\,d_2^{d_2}}{(d_1x+d_2)^{d_1+d_2}}}}{x\,B\!\left(\dfrac{d_1}{2},\dfrac{d_2}{2}\right)},$

where $B$ is the beta function.
Expectation and variance of the Fisher distribution:

$M(X)=\frac{d_2}{d_2-2},\qquad D(X)=\frac{2\,d_2^2\,(d_1+d_2-2)}{d_1(d_2-2)^2(d_2-4)}.$

The expectation is defined for $d_2>2$ and the variance for $d_2>4$.

A number of statistical tests are based on the Fisher distribution, such as the assessment of the significance of regression parameters, the test for heteroscedasticity, and the test for equality of sample variances (the F-test, to be distinguished from Fisher's exact test).

F-test: let there be two independent samples of normally distributed data, of sizes $n_1$ and $n_2$ respectively. Let us put forward the hypothesis that the sample variances are equal and test it statistically.

Let us calculate the value $F=\frac{s_1^2}{s_2^2}$. It has a Fisher distribution with $(n_1-1,\ n_2-1)$ degrees of freedom.

By comparing the $F$ value with the quantiles of the corresponding Fisher distribution, we can accept or reject the hypothesis of equal sample variances at the required significance level.
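A sketch of this F-test on two simulated normal samples (`f.sf` gives the upper tail; it is combined with `f.cdf` for a two-sided hypothesis):

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(3)
x = rng.normal(0, 1.0, 40)
y = rng.normal(0, 1.2, 60)

F = x.var(ddof=1) / y.var(ddof=1)          # ratio of sample variances
d1, d2 = len(x) - 1, len(y) - 1
p_value = 2 * min(f.cdf(F, d1, d2), f.sf(F, d1, d2))
print(F, p_value)                          # reject equality if p_value < alpha
```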

Exponential distribution and Laplace distribution (double exponential)



The exponential distribution describes the time intervals between independent events occurring with average intensity $\lambda$. The number of occurrences of such an event over a certain period of time is described by the discrete Poisson distribution. The exponential distribution, together with the Poisson distribution, forms the mathematical basis of reliability theory.

In addition to the theory of reliability, the exponential distribution is used in the description of social phenomena, in economics, in the theory of queuing, in transport logistics - wherever it is necessary to model the flow of events.

The exponential distribution is a special case of the chi-square distribution (for $n=2$), and hence of the gamma distribution. Since an exponentially distributed quantity is a chi-square quantity with 2 degrees of freedom, it can be interpreted as the sum of the squares of two independent standard normally distributed quantities.

Also, the exponential distribution is a special case of the Weibull distribution.

Let a target be fired at until the first hit, with the probability $p$ of hitting the target the same in every shot and independent of the results of previous shots. In other words, the experiment under consideration implements the Bernoulli scheme. As the random variable $X$ we take the number of shots fired. Obviously, the possible values of the random variable $X$ are the natural numbers $x_1=1,\ x_2=2,\ \dots$; the probability that $k$ shots will be needed is equal to

$P(X=k)=q^{k-1}p,\qquad k=1,2,\dots\qquad (6.11)$

Setting $k=1,2,\dots$ in this formula, we get a geometric progression with first term $p$ and ratio $q$:

$p,\ qp,\ q^2p,\ \dots$

For this reason, the distribution defined by formula (6.11) is called geometric.

Using the formula for the sum of an infinitely decreasing geometric progression, it is easy to verify that

$\sum_{k=1}^{\infty} q^{k-1}p=\frac{p}{1-q}=1.$

Let us find the numerical characteristics of the geometric distribution.

By the definition of the mathematical expectation of a discrete random variable, we have

$M(X)=\sum_{k=1}^{\infty} k\,q^{k-1}p=p\sum_{k=1}^{\infty} k\,q^{k-1}=\frac{p}{(1-q)^2}=\frac{1}{p}.$

We calculate the variance by the formula

$D(X)=M(X^2)-(M(X))^2.$

For this we find

$M(X^2)=\sum_{k=1}^{\infty} k^2q^{k-1}p=\frac{1+q}{p^2}.$

Hence,

$D(X)=\frac{1+q}{p^2}-\frac{1}{p^2}=\frac{q}{p^2}.$

So, the mathematical expectation and variance of the geometric distribution are

$M(X)=\frac{1}{p},\qquad D(X)=\frac{q}{p^2}.\qquad (6.12)$

6.4.* Generating function

When solving problems related to discrete random variables, combinatorial methods are often used. One of the most developed theoretical methods of combinatorial analysis is the method of generating functions, which is among the most powerful methods in applications. Let us get briefly acquainted with it.

If the random variable $\xi$ takes only non-negative integer values, i.e.

$P(\xi=k)=p_k,\qquad k=0,1,2,\dots,$

then the generating function of the probability distribution of the random variable $\xi$ is the function

$\varphi(z)=\sum_{k=0}^{\infty} p_kz^k,\qquad (6.13)$

where $z$ is a real or complex variable. Note that between the set of generating functions $\varphi(z)$ and the set of distributions $\{P(\xi=k)\}$ there is a one-to-one correspondence.

Let the random variable $\xi$ have the binomial distribution

$P(\xi=k)=C_n^kp^kq^{n-k},\qquad k=0,1,\dots,n.$

Then, using Newton's binomial formula, we obtain

$\varphi(z)=\sum_{k=0}^{n} C_n^k(pz)^kq^{n-k}=(pz+q)^n,$

i.e. the generating function of the binomial distribution has the form

$\varphi(z)=(pz+q)^n.\qquad (6.14)$

Addendum. The generating function of the Poisson distribution

$P(\xi=k)=\frac{\lambda^k}{k!}e^{-\lambda},\qquad k=0,1,2,\dots$

has the form

$\varphi(z)=e^{\lambda(z-1)}.\qquad (6.15)$

The generating function of the geometric distribution

$P(\xi=k)=q^{k-1}p,\qquad k=1,2,\dots$

has the form

$\varphi(z)=\frac{pz}{1-qz}.\qquad (6.16)$

With the help of generating functions it is convenient to find the main numerical characteristics of a discrete random variable. For example, the first and second initial moments are related to the generating function by the following equalities:

$M(\xi)=\varphi'(1),\qquad (6.17)$

$M(\xi^2)=\varphi''(1)+\varphi'(1).\qquad (6.18)$
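Formulas (6.16)-(6.18) are easy to verify symbolically. A sketch with SymPy: starting from the generating function of the geometric distribution, it recovers the expectation and variance (6.12):

```python
import sympy as sp

z, p = sp.symbols('z p', positive=True)
q = 1 - p

phi = p * z / (1 - q * z)                         # formula (6.16)

m1 = sp.simplify(phi.diff(z).subs(z, 1))          # first moment, formula (6.17)
m2 = sp.simplify(phi.diff(z, 2).subs(z, 1) + m1)  # second moment, formula (6.18)
D = sp.simplify(m2 - m1**2)

print(m1)  # expected: 1/p
print(D)   # expected: (1 - p)/p**2, i.e. q/p^2
```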

The method of generating functions is often convenient because in some cases the distribution function of a discrete random variable is very difficult to determine, while the generating function is easy to find. For example, consider the scheme of consecutive independent Bernoulli trials, but with one change: let the probability of the event $A$ vary from trial to trial. This means that the Bernoulli formula becomes inapplicable for such a scheme. Finding the distribution function in this case presents considerable difficulties. However, for this scheme the generating function is easily found, and consequently the corresponding numerical characteristics are easily found as well.

The widespread use of generating functions rests on the fact that the study of sums of random variables can be replaced by the study of products of the corresponding generating functions. So, if $\xi_1,\ \xi_2,\ \dots,\ \xi_n$ are independent, then

$\varphi(z)=\varphi_1(z)\,\varphi_2(z)\cdots\varphi_n(z).\qquad (6.19)$

Let $p_k=P_k(A)$ be the probability of "success" in the $k$-th trial of the Bernoulli scheme (and $q_k=1-p_k$ the probability of "failure" in the $k$-th trial). Then, in accordance with formula (6.19), the generating function has the form

$\varphi(z)=\prod_{k=1}^{n}(p_kz+q_k).\qquad (6.20)$

Using this generating function, we can write

$M(\xi)=\varphi'(1)=\sum_{k=1}^{n} p_k.$

It is taken into account here that $p_k+q_k=1$. Now, using formula (6.18), we find the second initial moment. To do this, we first compute

$\varphi''(1)=\sum_{k\ne j} p_kp_j$

and

$M(\xi^2)=\varphi''(1)+\varphi'(1)=\sum_{k\ne j} p_kp_j+\sum_{k=1}^{n} p_k,$

whence

$D(\xi)=M(\xi^2)-(M(\xi))^2=\sum_{k=1}^{n} p_kq_k.$

In the particular case $p_1=p_2=\dots=p_n=p$ (i.e. in the case of the binomial distribution) it follows from the obtained formulas that $M(\xi)=np$ and $D(\xi)=npq$.

In the geometric distribution, trials under the Bernoulli scheme are carried out until the first success, with probability of success p in a single trial.
Examples of such values ​​can be:

  • number of shots before the first hit;
  • number of tests of the device before the first failure;
  • the number of balls drawn before the first appearance of a white one;
  • the number of tosses of a coin before the first tails, etc.
The geometric distribution series of a discrete random variable has the form:

X: 1, 2, 3, ..., m, ...
P: p, qp, q^2·p, ..., q^(m-1)·p, ...

The probabilities form a geometric progression with the first term p and the denominator q.
The mathematical expectation and variance of a random variable X having a geometric distribution with parameter p are equal to:

M(X) = 1/p,  D(X) = q/p².
Hypergeometric distribution

A discrete random variable has a hypergeometric distribution with parameters n, k, m if it takes the values 0, 1, 2, ... with probabilities P(X = x) = C_k^x · C_{n-k}^{m-x} / C_n^m.
The hypergeometric distribution describes a random variable X equal to the number of objects with a given property among m objects randomly drawn (without replacement) from a set of n objects, k of which have this property.
For example:
  • In a batch of 10 parts, 3 are defective. 4 parts are drawn at random. X is the number of good parts among those drawn (m = 4, n = 10, k = 3).
The mathematical expectation and variance of a random variable X having a hypergeometric distribution are:

M(X) = mk/n,  D(X) = (mk/n)·(1 - k/n)·(n - m)/(n - 1).
Example #1. An urn contains 2 white and 3 black balls. Balls are drawn at random from the urn without replacement until a white ball appears. As soon as this happens, the process stops. Make a distribution table of a random variable X - the number of experiments performed, find F(x), P(X ≤ 2), M(X), D(X).
Solution: Denote by A the appearance of a white ball. Only one experiment is performed if the white ball appears immediately: P(X = 1) = 2/5 = 0.4. If the white ball did not appear the first time but appeared on the second draw, then X = 2; the probability of such an event is P(X = 2) = (3/5)·(2/4) = 0.3. Similarly: P(X = 3) = (3/5)·(2/4)·(2/3) = 0.2, P(X = 4) = (3/5)·(2/4)·(1/3)·1 = 0.1. Let us write the data in a table:


X: 1, 2, 3, 4
P: 0.4, 0.3, 0.2, 0.1

Find F(x):

F(x) = 0 for x ≤ 1; F(x) = 0.4 for 1 < x ≤ 2; F(x) = 0.7 for 2 < x ≤ 3; F(x) = 0.9 for 3 < x ≤ 4; F(x) = 1 for x > 4.
Find P(X ≤ 2) = P(X = 1) + P(X = 2) = 0.4 + 0.3 = 0.7.
M(X) = 1·0.4 + 2·0.3 + 3·0.2 + 4·0.1 = 2.
D(X) = (1-2)²·0.4 + (2-2)²·0.3 + (3-2)²·0.2 + (4-2)²·0.1 = 1.

Example #2. The box contains 11 parts, 5 of which are defective. The assembler draws 4 pieces at random.
1. Find the probability that among the extracted parts: a) 4 defective; b) one defective; c) two defective; d) at least one is defective.
2. Draw up the distribution law of the random variable X, the number of defective parts among those drawn.
3. Find M(X), D(X), σ(X).
4. Calculate P(1 ≤ X < 4).
Solution:
1. Find the probability that among the drawn parts:
a) 4 are defective:
P(4) = C_5^4·C_6^0 / C_11^4 = 5/330 = 0.015.
b) one is defective:
The total number of possible elementary outcomes for these trials is equal to the number of ways in which 4 parts can be drawn out of 11:
C_11^4 = 330.
Let us calculate the number of outcomes favoring this event (among the 4 parts, exactly 1 is defective):
C_5^1 = 5.
The remaining 3 parts can be selected from the 6 good ones:
C_6^3 = 20.
Therefore, the number of favorable outcomes is 5·20 = 100.
The desired probability is equal to the ratio of the number of favorable outcomes to the number of all elementary outcomes: P(1) = 100/330 = 0.303.
c) two are defective:
P(2) = C_5^2·C_6^2 / C_11^4 = (10·15)/330 = 0.4545.
d) at least one is defective:
The probability that there are no defective parts (X = 0) is:
P(0) = C_6^4 / C_11^4 = 15/330 = 0.0455.
Then the probability that at least one is defective is:
P = 1 - P(0) = 1 - 0.0455 = 0.9545 ≈ 0.95.

2. Compose the distribution law P(X), where X is the number of defective parts among those drawn.
Find the probability of three defective parts: P(3) = C_5^3·C_6^1 / C_11^4 = (10·6)/330 = 0.182. Similarly, P(4) = 5/330 = 0.015 (see above).

X: 0, 1, 2, 3, 4
P: 0.0455, 0.303, 0.4545, 0.182, 0.015

3. Find M(X), D(X), σ(X).
The mathematical expectation is found by the formula M(X) = ∑xᵢpᵢ:
M(X) = 0·0.0455 + 1·0.303 + 2·0.4545 + 3·0.182 + 4·0.015 = 1.818.
The variance is found by the formula D(X) = ∑xᵢ²pᵢ - M(X)²:
D(X) = 0²·0.0455 + 1²·0.303 + 2²·0.4545 + 3²·0.182 + 4²·0.015 - 1.818² = 0.694.
Standard deviation: σ(X) = √D(X) = √0.694 ≈ 0.833.
4. Calculate P(1 ≤ X < 4).
The distribution function:
F(x) = 0 for x ≤ 0;
F(x) = 0.0455 for 0 < x ≤ 1;
F(x) = 0.303 + 0.0455 = 0.349 for 1 < x ≤ 2;
F(x) = 0.4545 + 0.349 = 0.803 for 2 < x ≤ 3;
F(x) = 0.182 + 0.803 = 0.985 for 3 < x ≤ 4;
F(x) = 1 for x > 4.
The probability of a random variable falling into a given interval is found by the formula:
P(a ≤ X < b) = F(b) - F(a).
Find the probability that the random variable will be in the interval 1 ≤ X < 4:
P(1 ≤ X < 4) = F(4) - F(1) = 0.985 - 0.0455 = 0.9395.

Example #3. There are 7 parts in a lot, 3 of which are defective. The inspector draws 4 parts at random. Make a distribution law for the random variable X, the number of good parts in the sample. Find the mathematical expectation and variance of X. Plot the distribution function.
Total good parts: 7 - 3 = 4.
1. Find the probability that among the 4 selected parts one is good.
The total number of possible elementary outcomes for these trials is equal to the number of ways in which 4 parts can be drawn out of 7:
C_7^4 = 35.
Let us calculate the number of outcomes favoring this event (one good part and all 3 defective ones):
C_4^1·C_3^3 = 4, so P(X = 1) = 4/35 ≈ 0.114.
Consider the geometric distribution and calculate its mathematical expectation and variance. Using the MS EXCEL function NEGBINOM.DIST(), we will plot the distribution function and probability density graphs.

The geometric distribution (English: Geometric distribution) is a special case of the negative binomial distribution (for r = 1).

Let trials be carried out, in each of which only the event "success" with probability p or the event "failure" with probability q = 1 - p can occur.

Let us define x as the number of the trial in which the first success was registered. In this case, the random variable x will have a geometric distribution:

P(x) = p·(1 - p)^(x-1)

Geometric distribution in MS EXCEL

In MS EXCEL, starting from version 2010, there is a function for the negative binomial distribution, NEGBINOM.DIST() (in Russian versions, ОТРБИНОМ.РАСП()), which allows you to calculate the probability of a given number of failures occurring before a given number of successes is obtained, at a given probability of success.

For the geometric distribution, the second argument of this function must be 1, since we are interested only in the first success.

This definition is slightly different from the one given above, which calculates the probability that the first success occurs on the x-th trial. The difference comes down to the range over which x varies: if the probability is defined in terms of the number of trials, then x takes values starting from 1; if in terms of the number of failures, then starting from 0. Therefore the formula p(x_failures) = p(x_trials - 1) is valid. See the example file, sheet Example, where both methods of calculation are given.

The approach taken in the MS EXCEL function is used below: through the number of failures.

To calculate the probability density function p(x) (see the formula above), you need to set the fourth argument of NEGBINOM.DIST() to FALSE. To calculate the cumulative distribution function, set the fourth argument to TRUE.

Note: prior to MS EXCEL 2010, EXCEL had the function NEGBINOMDIST(), which can calculate only the probability density. The example file contains a formula based on NEGBINOMDIST() for calculating the cumulative distribution function. There is also a formula for calculating the probability directly from the definition.

The example file contains graphs of the probability density and of the cumulative distribution function.

Note: for the convenience of writing formulas, a named range has been created for the parameter p.

Note: in the function NEGBINOM.DIST(), a non-integer value of x is truncated to an integer. For example, the following formulas will return the same value:
=NEGBINOM.DIST(2; 1; 0.4; TRUE)
=NEGBINOM.DIST(2.9; 1; 0.4; TRUE)

Tasks

Solutions to the problems are given in the example file on the sheet Example.

Task1. An oil company drills wells to extract oil. The probability of finding oil in a well is 20%.
What is the probability that the first oil will be obtained on the third attempt?
What is the probability that no more than three attempts will be needed to find the first oil?
Solution 1:
=NEGBINOM.DIST(3-1, 1, 0.2, FALSE)
=NEGBINOM.DIST(3-1, 1, 0.2, TRUE)

Task 2. A rating agency surveys random passers-by in the city about their favorite car brand. Suppose it is known that for 1% of citizens the favorite car is the LadaGranta. What is the probability of meeting the first admirer of this brand within a survey of 10 people?
Solution 2: =NEGBINOM.DIST(10-1, 1, 0.01, TRUE) = 9.56%
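The same answers can be cross-checked outside of EXCEL, for example with `scipy.stats.nbinom`, which counts failures before the first success exactly like NEGBINOM.DIST with the second argument equal to 1 (a sketch, assuming SciPy is installed):

```python
from scipy.stats import nbinom

# Task 1: the first oil exactly on the 3rd attempt...
print(nbinom.pmf(3 - 1, 1, 0.2))    # 0.128, as NEGBINOM.DIST(2, 1, 0.2, FALSE)
# ...and within at most three attempts.
print(nbinom.cdf(3 - 1, 1, 0.2))    # 0.488, as NEGBINOM.DIST(2, 1, 0.2, TRUE)

# Task 2: the first LadaGranta fan within a survey of 10 people.
print(nbinom.cdf(10 - 1, 1, 0.01))  # 0.0956, i.e. 9.56%
```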

The most common distribution laws of discrete random variables are:

  • Binomial distribution law
  • Poisson distribution law
  • Geometric distribution law
  • Hypergeometric distribution law

For given distributions of discrete random variables, the calculation of the probabilities of their values, as well as numerical characteristics (mathematical expectation, variance, etc.) is carried out according to certain "formulas". Therefore, it is very important to know these types of distributions and their basic properties.


1. Binomial distribution law.

A discrete random variable $X$ is subject to the binomial probability distribution law if it takes the values $0,\ 1,\ 2,\ \dots,\ n$ with probabilities $P\left(X=k\right)=C^k_n\cdot p^k\cdot \left(1-p\right)^{n-k}$. In fact, the random variable $X$ is the number of occurrences of the event $A$ in $n$ independent trials. The probability distribution law of the random variable $X$:

$\begin{array}{|c|c|c|c|c|}
\hline
X_i & 0 & 1 & \dots & n \\
\hline
p_i & P_n\left(0\right) & P_n\left(1\right) & \dots & P_n\left(n\right) \\
\hline
\end{array}$

For such a random variable, the expectation is $M\left(X\right)=np$, the variance is $D\left(X\right)=np\left(1-p\right)$.

Example . There are two children in the family. Assuming the birth probabilities of a boy and a girl equal to $0.5$, find the law of distribution of the random variable $\xi $ - the number of boys in the family.

Let the random variable $\xi$ be the number of boys in the family. The values that $\xi$ can take: $0,\ 1,\ 2$. The probabilities of these values can be found by the formula $P\left(\xi=k\right)=C^k_n\cdot p^k\cdot \left(1-p\right)^{n-k}$, where $n=2$ is the number of independent trials and $p=0.5$ is the probability of the event occurring in a series of $n$ trials. We get:

$P\left(\xi=0\right)=C^0_2\cdot 0.5^0\cdot \left(1-0.5\right)^{2-0}=0.5^2=0.25;$

$P\left(\xi=1\right)=C^1_2\cdot 0.5\cdot \left(1-0.5\right)^{2-1}=2\cdot 0.5\cdot 0.5=0.5;$

$P\left(\xi=2\right)=C^2_2\cdot 0.5^2\cdot \left(1-0.5\right)^{2-2}=0.5^2=0.25.$

Then the distribution law of the random variable $\xi $ is the correspondence between the values ​​$0,\ 1,\ 2$ and their probabilities, i.e.:

$\begin{array}{|c|c|c|c|}
\hline
\xi & 0 & 1 & 2 \\
\hline
P(\xi) & 0.25 & 0.5 & 0.25 \\
\hline
\end{array}$

The sum of the probabilities in the distribution law must equal $1$, i.e. $\sum_{i=1}^{n}P(\xi_i)=0.25+0.5+0.25=1$.

Expectation $M\left(\xi\right)=np=2\cdot 0.5=1$, variance $D\left(\xi\right)=np\left(1-p\right)=2\cdot 0.5\cdot 0.5=0.5$, standard deviation $\sigma\left(\xi\right)=\sqrt{D\left(\xi\right)}=\sqrt{0.5}\approx 0.707$.

2. Poisson distribution law.

If a discrete random variable $X$ can take only non-negative integer values $0,\ 1,\ 2,\ \dots,\ n$ with probabilities $P\left(X=k\right)=\frac{\lambda^k}{k!}\cdot e^{-\lambda}$, then it is said to be subject to the Poisson distribution law with parameter $\lambda$. For such a random variable, the mathematical expectation and the variance are equal to each other and equal to the parameter $\lambda$, i.e. $M\left(X\right)=D\left(X\right)=\lambda$.

Comment. The peculiarity of this distribution is the following: if, based on experimental data, we find the estimates of $M\left(X\right)$ and $D\left(X\right)$, and the obtained estimates are close to each other, then we have reason to assert that the random variable is subject to the Poisson distribution law.

Example . Examples of random variables subject to the Poisson distribution law can be: the number of cars that will be serviced tomorrow by a gas station; the number of defective items in the manufactured product.

Example. A plant sent $500$ products to a warehouse. The probability of a product being damaged in transit is $0.002$. Find the distribution law of the random variable $X$ equal to the number of damaged products; find $M\left(X\right)$ and $D\left(X\right)$.

Let the discrete random variable $X$ be the number of damaged products. Such a random variable is subject to the Poisson distribution law with parameter $\lambda=np=500\cdot 0.002=1$. The probabilities of the values are $P\left(X=k\right)=\frac{\lambda^k}{k!}\cdot e^{-\lambda}$. Obviously, it is impossible to list the probabilities of all values $X=0,\ 1,\ \dots,\ 500$, so we restrict ourselves to the first few.

$P\left(X=0\right)=\frac{1^0}{0!}\cdot e^{-1}=0.368;$

$P\left(X=1\right)=\frac{1^1}{1!}\cdot e^{-1}=0.368;$

$P\left(X=2\right)=\frac{1^2}{2!}\cdot e^{-1}=0.184;$

$P\left(X=3\right)=\frac{1^3}{3!}\cdot e^{-1}=0.061;$

$P\left(X=4\right)=\frac{1^4}{4!}\cdot e^{-1}=0.015;$

$P\left(X=5\right)=\frac{1^5}{5!}\cdot e^{-1}=0.003;$

$P\left(X=6\right)=\frac{1^6}{6!}\cdot e^{-1}=0.001;$

$P\left(X=k\right)=\frac{\lambda^k}{k!}\cdot e^{-\lambda}.$

The distribution law of the random variable $X$:

$\begin{array}{|c|c|c|c|c|c|c|c|c|c|}
\hline
X_i & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \dots & k \\
\hline
P_i & 0.368 & 0.368 & 0.184 & 0.061 & 0.015 & 0.003 & 0.001 & \dots & \frac{\lambda^k}{k!}\cdot e^{-\lambda} \\
\hline
\end{array}$

For such a random variable, the mathematical expectation and variance are equal to each other and equal to the parameter $\lambda $, i.e. $M\left(X\right)=D\left(X\right)=\lambda =1$.

3. Geometric distribution law.

If a discrete random variable $X$ can take only natural values $1,\ 2,\ \dots,\ n,\ \dots$ with probabilities $P\left(X=k\right)=p\left(1-p\right)^{k-1},\ k=1,\ 2,\ 3,\ \dots$, then such a random variable $X$ is said to be subject to the geometric law of probability distribution. In fact, the geometric distribution describes Bernoulli trials up to the first success.

Example . Examples of random variables that have a geometric distribution can be: the number of shots before the first hit on the target; number of tests of the device before the first failure; the number of coin tosses before the first heads up, and so on.

The mathematical expectation and variance of a random variable subject to a geometric distribution are, respectively, $M\left(X\right)=1/p$ and $D\left(X\right)=\left(1-p\right)/p^2$.

Example. On the way of fish to the spawning site there are $4$ locks. The probability of a fish passing through each lock is $3/5$. Construct the distribution series of the random variable $X$, the number of locks passed by the fish before the first detention at a lock. Find $M\left(X\right),\ D\left(X\right),\ \sigma\left(X\right)$.

Let the random variable $X$ be the number of locks passed by the fish before the first detention at a lock. Such a random variable is subject to the geometric law of probability distribution. The values that the random variable $X$ can take: $1,\ 2,\ 3,\ 4$. The probabilities of these values are calculated by the formula $P\left(X=k\right)=pq^{k-1}$, where $p=2/5$ is the probability of the fish being detained at a lock, $q=1-p=3/5$ is the probability of passing through a lock, and $k=1,\ 2,\ 3,\ 4$.

$P\left(X=1\right)=\frac{2}{5}\cdot \left(\frac{3}{5}\right)^0=\frac{2}{5}=0.4;$

$P\left(X=2\right)=\frac{2}{5}\cdot \frac{3}{5}=\frac{6}{25}=0.24;$

$P\left(X=3\right)=\frac{2}{5}\cdot \left(\frac{3}{5}\right)^2=\frac{2}{5}\cdot \frac{9}{25}=\frac{18}{125}=0.144;$

$P\left(X=4\right)=\frac{2}{5}\cdot \left(\frac{3}{5}\right)^3+\left(\frac{3}{5}\right)^4=\frac{27}{125}=0.216.$

$\begin{array}{|c|c|c|c|c|}
\hline
X_i & 1 & 2 & 3 & 4 \\
\hline
P\left(X_i\right) & 0.4 & 0.24 & 0.144 & 0.216 \\
\hline
\end{array}$

Expected value:

$M\left(X\right)=\sum^n_{i=1}{x_ip_i}=1\cdot 0.4+2\cdot 0.24+3\cdot 0.144+4\cdot 0.216=2.176.$

Variance:

$D\left(X\right)=\sum^n_{i=1}{p_i\left(x_i-M\left(X\right)\right)^2}=0.4\cdot \left(1-2.176\right)^2+0.24\cdot \left(2-2.176\right)^2+0.144\cdot \left(3-2.176\right)^2+0.216\cdot \left(4-2.176\right)^2\approx 1.377.$

Standard deviation:

$\sigma\left(X\right)=\sqrt{D\left(X\right)}=\sqrt{1.377}\approx 1.173.$
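Note that $X$ here is a censored geometric variable: the series is cut off at the fourth lock, which is why $P(X=4)$ absorbs the "passed all locks" outcome. A short check of the table and the moments in exact fractions:

```python
from fractions import Fraction

p = Fraction(2, 5)             # detained at a lock
q = 1 - p                      # passed a lock

probs = {k: p * q**(k - 1) for k in (1, 2, 3)}
probs[4] = p * q**3 + q**4     # detained at the 4th lock OR passed all four

M = sum(k * pr for k, pr in probs.items())
D = sum(pr * (k - M) ** 2 for k, pr in probs.items())
print(probs)                   # 2/5, 6/25, 18/125, 27/125
print(float(M), float(D))      # 2.176 and ~1.377
```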

4. Hypergeometric distribution law.

Suppose there are $N$ objects, among which $m$ objects have a given property, and $n$ objects are drawn at random without replacement, among which there are $k$ objects having the given property. The hypergeometric distribution makes it possible to estimate the probability that exactly $k$ objects in the sample have the given property. Let the random variable $X$ be the number of objects in the sample that have the given property. Then the probabilities of the values of the random variable $X$ are:

$P\left(X=k\right)=\frac{C^k_m\,C^{n-k}_{N-m}}{C^n_N}.$

Comment. The HYPGEOMDIST statistical function of the Excel Function Wizard $f_x$ (ГИПЕРГЕОМЕТ in Russian versions) allows you to determine the probability that a certain number of trials will be successful.

$f_x\to$ statistical $\to$ HYPGEOMDIST $\to$ OK. A dialog box will appear that needs to be filled in. In the field Number_of_successes_in_sample, specify the value of $k$. Sample_size equals $n$. In the field Number_of_successes_in_population, specify the value of $m$. Population_size equals $N$.

The mathematical expectation and variance of a discrete random variable $X$ subject to the hypergeometric distribution law are $M\left(X\right)=nm/N$ and $D\left(X\right)=\frac{nm\left(1-\frac{m}{N}\right)\left(1-\frac{n}{N}\right)}{N-1}$.

Example . The credit department of the bank employs 5 specialists with higher financial education and 3 specialists with higher legal education. The bank's management decided to send 3 specialists for advanced training, selecting them randomly.

a) Make a distribution series of the number of specialists with higher financial education who can be directed to advanced training;

b) Find the numerical characteristics of this distribution.

Let the random variable $X$ be the number of specialists with higher financial education among the three selected. The values that $X$ can take: $0,\ 1,\ 2,\ 3$. The random variable $X$ is distributed according to the hypergeometric distribution with the following parameters: $N=8$, the population size; $m=5$, the number of successes in the population; $n=3$, the sample size; $k=0,\ 1,\ 2,\ 3$, the number of successes in the sample. Then the probabilities $P\left(X=k\right)$ can be calculated by the formula $P(X=k)=\frac{C_m^k\cdot C_{N-m}^{n-k}}{C_N^n}$. We have:

$P\left(X=0\right)=\frac{C^0_5\cdot C^3_3}{C^3_8}=\frac{1}{56}\approx 0.018;$

$P\left(X=1\right)=\frac{C^1_5\cdot C^2_3}{C^3_8}=\frac{15}{56}\approx 0.268;$

$P\left(X=2\right)=\frac{C^2_5\cdot C^1_3}{C^3_8}=\frac{15}{28}\approx 0.536;$

$P\left(X=3\right)=\frac{C^3_5\cdot C^0_3}{C^3_8}=\frac{5}{28}\approx 0.179.$

Then the distribution series of the random variable $X$:

$\begin{array}{|c|c|c|c|c|}
\hline
X_i & 0 & 1 & 2 & 3 \\
\hline
p_i & 0.018 & 0.268 & 0.536 & 0.179 \\
\hline
\end{array}$

Let us calculate the numerical characteristics of the random variable $X$ using the general formulas of the hypergeometric distribution.

$M\left(X\right)=\frac{nm}{N}=\frac{3\cdot 5}{8}=\frac{15}{8}=1.875.$

$D\left(X\right)=\frac{nm\left(1-\frac{m}{N}\right)\left(1-\frac{n}{N}\right)}{N-1}=\frac{3\cdot 5\cdot \left(1-\frac{5}{8}\right)\cdot \left(1-\frac{3}{8}\right)}{8-1}=\frac{225}{448}\approx 0.502.$

$\sigma\left(X\right)=\sqrt{D\left(X\right)}=\sqrt{0.502}\approx 0.7085.$