Probability distribution. Binomial distribution of a discrete random variable

Probability theory is a branch of mathematics that studies the patterns of random phenomena: random events, random variables, their properties and operations on them.

For a long time probability theory lacked a clear definition; one was formulated only in 1929. The emergence of probability theory as a science dates back to the Middle Ages and the first attempts at a mathematical analysis of gambling (coin tossing, dice, roulette). The 17th-century French mathematicians Blaise Pascal and Pierre de Fermat, studying the prediction of winnings in games of chance, discovered the first probabilistic regularities that arise when throwing dice.

The theory of probability arose as a science from the belief that certain regularities underlie mass random events. Probability theory studies these regularities.

Probability theory deals with the study of events whose occurrence is not known for certain. It allows one to judge how likely the occurrence of some events is compared to others.

For example: it is impossible to determine unambiguously whether a single coin toss will produce "heads" or "tails", but with repeated tossing approximately equal numbers of heads and tails appear, which means there is a 50% chance of getting heads or tails.

A trial (test) in this case is the realization of a certain set of conditions, here the toss of a coin. The trial can be repeated an unlimited number of times, and the set of conditions includes random factors.

The result of a trial is an event. An event can be:

  1. Reliable (always occurs as a result of testing).
  2. Impossible (never happens).
  3. Random (may or may not occur as a result of the test).

For example, when tossing a coin, an impossible event is the coin landing on its edge, and a random event is the appearance of "heads" or "tails". A specific test outcome is called an elementary event. As a result of a trial, only elementary events occur. The set of all possible, distinct, specific outcomes of a trial is called the space of elementary events.

Basic concepts of the theory

Probability is the degree of possibility of the occurrence of an event. When the reasons for some possible event to actually occur outweigh the opposite reasons, the event is called probable; otherwise it is called unlikely or improbable.

A random variable is a quantity that, as a result of a trial, can take one value or another, and it is not known in advance which one. For example: the number of calls received by a fire station per day, the number of hits in 10 shots, etc.

Random variables can be divided into two categories.

  1. A discrete random variable is a variable that, as a result of a trial, can take certain values with certain probabilities, forming a countable set (a set whose elements can be numbered). This set can be either finite or infinite. For example, the number of shots before the first hit on the target is a discrete random variable, because this quantity can take an infinite, although countable, number of values.
  2. A continuous random variable is a quantity that can take any value from some finite or infinite interval. Obviously, the number of possible values of a continuous random variable is infinite.

A probability space is a concept introduced by A. N. Kolmogorov in the 1930s to formalize the notion of probability; it gave rise to the rapid development of probability theory as a rigorous mathematical discipline.

A probability space is a triple (Ω, F, P) (sometimes written in angle brackets: ⟨Ω, F, P⟩), where

Ω is an arbitrary set whose elements are called elementary events, outcomes or points;
F is a sigma-algebra of subsets of Ω, called (random) events;
P is a probability measure, or probability, i.e. a sigma-additive finite measure such that P(Ω) = 1.

The de Moivre–Laplace theorem is one of the limit theorems of probability theory, established by Laplace in 1812. It states that the number of successes in repetitions of the same random experiment with two possible outcomes is approximately normally distributed. It allows one to find an approximate value of such a probability.

If in each of n independent trials the probability of occurrence of some random event is p (0 < p < 1), and m is the number of trials in which it actually occurs, then the probability that the inequality a < (m − np)/√(npq) < b holds is close (for large n) to the value of the Laplace integral taken from a to b.
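As a numerical illustration (not part of the original text), the sketch below compares an exact binomial probability with its normal approximation; the values n = 600 rolls and p = 1/6 are arbitrary, and the scipy library is assumed to be available.

```python
# Sketch: De Moivre-Laplace approximation of a binomial probability.
from math import sqrt
from scipy.stats import binom, norm

n, p = 600, 1 / 6                 # e.g. 600 die rolls, "success" = rolling a six
q = 1 - p
k1, k2 = 90, 110                  # ask for P(90 <= m <= 110)

exact = binom.cdf(k2, n, p) - binom.cdf(k1 - 1, n, p)

mu, sigma = n * p, sqrt(n * p * q)
approx = norm.cdf((k2 - mu) / sigma) - norm.cdf((k1 - mu) / sigma)

print(f"exact binomial      : {exact:.4f}")
print(f"normal approximation: {approx:.4f}")   # the two values are close for large n
```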

The distribution function in probability theory is a function characterizing the distribution of a random variable or random vector: the probability that the random variable X takes a value less than or equal to x, where x is an arbitrary real number. Under certain conditions it completely determines the random variable.

The expected value (mathematical expectation) is the average value of a random variable with respect to its probability distribution, as considered in probability theory. In the English-language literature it is denoted E[X], in the Russian literature M[X]. In statistics the notation μ is often used.

Let a probability space (Ω, F, P) and a random variable X defined on it be given; that is, by definition, X: Ω → R is a measurable function. Then, if the Lebesgue integral of X over the space Ω exists, it is called the mathematical expectation, or mean value, and is denoted M[X] (E[X]).

The variance of a random variable is a measure of the spread of the given random variable, i.e. of its deviation from the mathematical expectation. It is denoted D[X] in the Russian literature and Var(X) in the foreign literature. In statistics the notation σ² is often used. The square root of the variance is called the standard deviation, or standard spread.

Let X be a random variable defined on some probability space. Then

D[X] = M[(X − M[X])²],

where the symbol M stands for the mathematical expectation.

In probability theory, two random events are called independent if the occurrence of one of them does not change the probability of occurrence of the other. Similarly, two random variables are called dependent if the value of one of them affects the probabilities of the values of the other.

The simplest form of the law of large numbers is Bernoulli's theorem, which states that if the probability of an event is the same in all trials, then as the number of trials increases, the frequency of the event tends to the probability of the event and ceases to be random.

The law of large numbers in probability theory states that the arithmetic mean of a finite sample from a fixed distribution is close to the theoretical mean of that distribution. Depending on the type of convergence, one distinguishes the weak law of large numbers, where convergence in probability holds, and the strong law of large numbers, where almost sure convergence holds.

The general meaning of the law of large numbers: the joint action of a large number of identical and independent random factors leads, in the limit, to a result that does not depend on chance.

Methods for estimating probability from the analysis of a finite sample are based on this property. A good example is the prediction of election results based on a poll of a sample of voters.

Central limit theorems are a class of theorems in probability theory stating that the sum of a sufficiently large number of weakly dependent random variables of approximately the same scale (none of the terms dominates or makes a decisive contribution to the sum) has a distribution close to normal.

Since many random variables in applications are formed under the influence of several weakly dependent random factors, their distribution is considered normal. In this case the condition must hold that none of the factors is dominant. Central limit theorems justify the application of the normal distribution in such cases.

Section 6. Typical distribution laws and numerical characteristics of random variables

The form of the functions F(x), p(x), or the enumeration p(x_i), is called the distribution law of the random variable. While one can imagine an infinite variety of random variables, there are far fewer laws of distribution. First, different random variables can have exactly the same distribution laws. For example: let y take only two values, 1 and −1, with probabilities 0.5; the value z = −y has exactly the same distribution law.
Secondly, random variables very often have similar distribution laws, i.e., for example, p(x) for them is expressed by formulas of the same form, differing only in one or more constants. These constants are called distribution parameters.

Although, in principle, the most diverse distribution laws are possible, only some of the most typical laws will be considered here. It is important to pay attention to the conditions under which they arise and to the parameters and properties of these distributions.

1. Uniform distribution
This is the name of the distribution of a random variable that can take any value in the interval (a, b), where the probability of falling into any segment inside (a, b) is proportional to the length of the segment and does not depend on its position, while the probability of values outside (a, b) is equal to 0.


Fig. 6.1. Distribution function and density of the uniform distribution

Distribution parameters: a , b

2. Normal distribution
A distribution whose density is described by the formula

p(x) = (1 / (σ√(2π))) · exp(−(x − a)² / (2σ²))   (6.1)

is called normal.
Distribution parameters: a , σ


Figure 6.2. Typical form of the density and distribution function of the normal distribution

3. Bernoulli distribution
If a series of n independent trials is carried out, in each of which event A can appear with the same probability p, then the number of occurrences of the event is a random variable distributed according to the Bernoulli law, or the binomial law (another name for this distribution):

P_n(m) = C(n, m) · p^m · q^(n−m)

Here n is the number of trials in the series, m is the random variable (the number of occurrences of event A), P_n(m) is the probability that A occurs exactly m times, and q = 1 − p (the probability that A does not appear in a single trial).

Example 1: A die is rolled 5 times; what is the probability that a six comes up exactly twice?
n = 5, m = 2, p = 1/6, q = 5/6
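A quick check of this example in Python (math.comb assumes Python 3.8 or newer):

```python
# P_n(m) = C(n, m) * p^m * q^(n - m) for n = 5 rolls, m = 2 sixes.
from math import comb

n, m, p = 5, 2, 1 / 6
q = 1 - p
prob = comb(n, m) * p**m * q**(n - m)
print(round(prob, 4))   # approximately 0.1608
```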

Distribution parameters: n, p

4. Poisson distribution
The Poisson distribution is obtained as the limiting case of the Bernoulli distribution when p tends to zero and n tends to infinity in such a way that their product remains constant: np = a. Formally, this passage to the limit leads to the formula

P(m) = (a^m / m!) · e^(−a)

Distribution parameter: a

Many random variables encountered in science and in everyday practice obey the Poisson distribution.

Example 2: the number of calls received by an ambulance station in an hour.
Divide the time interval T (1 hour) into small intervals dt such that the probability of two or more calls during dt is negligible, while the probability of one call p is proportional to dt: p = μdt.
We regard the observation during each interval dt as an independent trial; the number of such trials during the time T is n = T/dt.
If we assume that the probabilities of receiving calls do not change during the hour, then the total number of calls obeys Bernoulli's law with parameters n = T/dt, p = μdt. Letting dt tend to zero, we find that n tends to infinity while the product np remains constant: a = np = μT.
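The following sketch (not in the original text) illustrates this passage to the limit numerically; the intensity a = 6 calls per hour is a made-up value, and scipy is assumed to be available.

```python
# Binomial laws with n = T/dt trials and p = a/n approach the Poisson law with
# parameter a as the subdivision becomes finer (dt -> 0, n -> infinity).
from scipy.stats import binom, poisson

a = 6.0                                  # hypothetical average number of calls per hour
for n in (60, 3600, 3_600_000):          # minutes, seconds, milliseconds in an hour
    p = a / n
    print(n, [round(binom.pmf(k, n, p), 4) for k in range(4)])
print("Poisson", [round(poisson.pmf(k, a), 4) for k in range(4)])
```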

Example 3: the number of molecules of an ideal gas in some fixed volume V.
Divide the volume V into small volumes dV such that the probability of finding two or more molecules in dV is negligible, while the probability of finding one molecule is proportional to dV: p = μdV. We regard the observation of each volume dV as an independent trial; the number of such trials is n = V/dV. If we assume that the probabilities of finding a molecule are the same anywhere inside V, the total number of molecules in the volume V obeys Bernoulli's law with parameters n = V/dV, p = μdV. Letting dV tend to zero, we find that n tends to infinity while the product np remains constant: a = np = μV.

Numerical characteristics of random variables

1. Mathematical expectation (average value)

Definition:
For a discrete random variable, the mathematical expectation is

Mx = Σᵢ xᵢ·p(xᵢ);   (6.4)

The sum is taken over all the values that the random variable takes. The series must be absolutely convergent (otherwise the random variable is said to have no mathematical expectation).

For a continuous random variable: Mx = ∫ x·p(x) dx, the integral being taken over the entire real axis;   (6.5)

The integral must be absolutely convergent (otherwise the random variable is said to have no expected value)


Properties of mathematical expectation:

a. If C is a constant, then MC = C.
b. M(Cx) = C·Mx.
c. The mathematical expectation of a sum of random variables is always equal to the sum of their mathematical expectations: M(x + y) = Mx + My.
d. The concept of conditional mathematical expectation is introduced. If a random variable takes its values xᵢ with different probabilities p(xᵢ/Hⱼ) under different conditions Hⱼ, then the conditional expectation is determined

as M(x/Hⱼ) = Σᵢ xᵢ·p(xᵢ/Hⱼ) or M(x/Hⱼ) = ∫ x·p(x/Hⱼ) dx;   (6.6)

If the probabilities of the events Hⱼ are known, the total mathematical expectation is determined by the formula

Mx = Σⱼ M(x/Hⱼ)·p(Hⱼ);   (6.7)

Example 4: How many times, on average, must a coin be tossed before the first head (the coat of arms) appears? This problem can be solved head-on:

xᵢ:     1    2    3    …    k    …
p(xᵢ):  1/2  1/4  1/8  …  1/2ᵏ  …

but this sum still has to be computed. It is easier to use the concepts of conditional and total mathematical expectation. Consider the hypotheses H₁ (the head appeared on the first toss) and H₂ (it did not appear on the first toss). Obviously, p(H₁) = p(H₂) = 1/2 and Mx/H₁ = 1;
Mx/H₂ is greater by 1 than the desired total expectation, because after the first toss of the coin the situation has not changed, but one toss has already been made. Using the formula of the total mathematical expectation, we have Mx = (Mx/H₁)·p(H₁) + (Mx/H₂)·p(H₂) = 1·0.5 + (Mx + 1)·0.5; solving this equation for Mx, we immediately obtain Mx = 2.
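A simple Monte Carlo sketch (not from the original text) that checks this answer by simulation:

```python
# Estimate the average number of tosses until the first head; the answer should be near 2.
import random

def tosses_until_first_head() -> int:
    n = 0
    while True:
        n += 1
        if random.random() < 0.5:   # "head" with probability 1/2
            return n

trials = 100_000
mean = sum(tosses_until_first_head() for _ in range(trials)) / trials
print(round(mean, 2))   # close to 2
```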

e. If f(x) is a function of the random variable x, then the mathematical expectation of a function of a random variable is defined as follows:

For a discrete random variable: Mf(x) = Σᵢ f(xᵢ)·p(xᵢ);   (6.8)

The sum is taken over all the values that the random variable takes. The series must be absolutely convergent.

For a continuous random variable: Mf(x) = ∫ f(x)·p(x) dx;   (6.9)

The integral must be absolutely convergent.

2. Variance of a random variable
Definition:
The variance (dispersion) of a random variable x is the mathematical expectation of the squared deviation of the value of the quantity from its mathematical expectation: Dx = M(x − Mx)².

For a discrete random variable: Dx = Σᵢ (xᵢ − Mx)²·p(xᵢ);   (6.10)

The sum is taken over all the values that the random variable takes. The series must be convergent (otherwise the random variable is said to have no variance).

For a continuous random variable: Dx = ∫ (x − Mx)²·p(x) dx;   (6.11)

The integral must converge (otherwise the random variable is said to have no variance)

Properties of the variance:
a. If C is a constant value, then DC = 0.
b. D(Cx) = C²·Dx.
c. The variance of a sum of random variables is equal to the sum of their variances only if these variables are independent (see the definition of independent variables above).
d. To calculate the variance, it is convenient to use the formula:

Dx = M(x²) − (Mx)²   (6.12)
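A small sketch verifying formula (6.12) against the direct definition; the value/probability pairs below are illustrative (they happen to be the law obtained later in Task 3):

```python
# Dx computed directly as M(x - Mx)^2 and via the shortcut M(x^2) - (Mx)^2.
xs = [0, 1, 2, 3]
ps = [0.729, 0.243, 0.027, 0.001]      # a valid discrete law (probabilities sum to 1)

mx = sum(x * p for x, p in zip(xs, ps))
d_direct = sum((x - mx) ** 2 * p for x, p in zip(xs, ps))
d_short = sum(x * x * p for x, p in zip(xs, ps)) - mx ** 2
print(round(mx, 4), round(d_direct, 4), round(d_short, 4))   # 0.3 0.27 0.27
```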

Relationship of numerical characteristics
and parameters of typical distributions

Distribution | Parameters | Mx        | Dx
Uniform      | a, b       | (b + a)/2 | (b − a)²/12
Normal       | a, σ       | a         | σ²
Bernoulli    | n, p       | np        | npq
Poisson      | a          | a         | a

In practice, most random variables that are affected by a large number of random factors obey the normal law of probability distribution. Therefore, in various applications of probability theory, this law is of particular importance.

A random variable $X$ obeys the normal probability distribution law if its probability distribution density has the following form

$$f\left(x\right)=\frac{1}{\sigma \sqrt{2\pi }}e^{-\frac{{\left(x-a\right)}^2}{2{\sigma }^2}}$$

Schematically, the graph of the function $f\left(x\right)$ is shown in the figure; it is known as the "Gaussian curve". Next to this graph is the German 10 Mark banknote that was in use before the introduction of the euro. If you look closely, you can see on this banknote the Gaussian curve and its discoverer, the great mathematician Carl Friedrich Gauss.

Let's return to our density function $f\left(x\right)$ and give some explanation of the distribution parameters $a,\ {\sigma }^2$. The parameter $a$ characterizes the center of dispersion of the values of the random variable, that is, it has the meaning of the mathematical expectation. When the parameter $a$ changes and the parameter ${\sigma }^2$ remains unchanged, the graph of the function $f\left(x\right)$ shifts along the abscissa axis, while the density graph itself does not change its shape.

The parameter ${\sigma }^2$ is the variance and characterizes the shape of the density curve $f\left(x\right)$. When the parameter ${\sigma }^2$ changes with the parameter $a$ unchanged, the density graph changes its shape, shrinking or stretching, without shifting along the abscissa axis.

Probability of a normally distributed random variable falling into a given interval

As is known, the probability that a random variable $X$ falls into the interval $\left(\alpha ;\ \beta \right)$ can be calculated as $P\left(\alpha < X < \beta \right)=\int^{\beta }_{\alpha }{f\left(x\right)dx}$. For a normal distribution of the random variable $X$ with parameters $a,\ \sigma$, the following formula holds:

$$P\left(\alpha< X < \beta \right)=\Phi \left({{\beta -a}\over {\sigma }}\right)-\Phi \left({{\alpha -a}\over {\sigma }}\right)$$

Here the function $\Phi \left(x\right)=\frac{1}{\sqrt{2\pi }}\int^x_0{e^{-t^2/2}dt}$ is the Laplace function. The values of this function are taken from tables. The following properties of the function $\Phi \left(x\right)$ can be noted.

1. $\Phi \left(-x\right)=-\Phi \left(x\right)$, i.e. the function $\Phi \left(x\right)$ is odd.

2. $\Phi \left(x\right)$ is a monotonically increasing function.

3. $\lim_{x\to +\infty } \Phi \left(x\right)=0.5$, $\lim_{x\to -\infty } \Phi \left(x\right)=-0.5$.

To calculate the values of the function $\Phi \left(x\right)$, you can also use the $f_x$ function wizard of the Excel package: $\Phi \left(x\right)=NORMDIST\left(x;0;1;1\right)-0.5$. For example, let us calculate the value of the function $\Phi \left(x\right)$ for $x=2$: $\Phi \left(2\right)\approx 0.4772$.
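Outside of Excel, the same value can be obtained in Python (a sketch assuming scipy is installed):

```python
# Laplace function Phi(x) = integral of the standard normal density from 0 to x,
# expressed through the standard normal CDF, as in the Excel formula above.
from scipy.stats import norm

def laplace_phi(x: float) -> float:
    return norm.cdf(x) - 0.5

print(round(laplace_phi(2), 4))   # 0.4772
```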

The probability that a normally distributed random variable $X\in N\left(a;\ (\sigma )^2\right)$ falls into an interval symmetric with respect to the expectation $a$ can be calculated by the formula

$$P\left(\left|X-a\right|< \delta \right)=2\Phi \left({{\delta }\over {\sigma }}\right).$$

Three sigma rule. It is practically certain that a normally distributed random variable $X$ falls into the interval $\left(a-3\sigma ;a+3\sigma \right)$.

Example 1. The random variable $X$ obeys the normal probability distribution law with parameters $a=2,\ \sigma =3$. Find the probability that $X$ falls into the interval $\left(0.5;1\right)$ and the probability that the inequality $\left|X-a\right| < 0.2$ holds.

Using the formula

$$P\left(\alpha< X < \beta \right)=\Phi \left({{\beta -a}\over {\sigma }}\right)-\Phi \left({{\alpha -a}\over {\sigma }}\right),$$

find $P\left(0.5 < X < 1\right)=\Phi \left(\frac{1-2}{3}\right)-\Phi \left(\frac{0.5-2}{3}\right)=\Phi \left(-0.33\right)-\Phi \left(-0.5\right)=\Phi \left(0.5\right)-\Phi \left(0.33\right)=0.191-0.129=0.062.$

$$P\left(\left|X-a\right| < 0.2\right)=2\Phi \left(\frac{\delta }{\sigma }\right)=2\Phi \left(\frac{0.2}{3}\right)=2\Phi \left(0.07\right)=2\cdot 0.028=0.056.$$
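The same Example 1 can be re-checked directly with scipy (assumed available); the small discrepancies with the values above come from rounding the arguments of Φ to two decimal places when using tables.

```python
# Normal law with a = 2, sigma = 3: P(0.5 < X < 1) and P(|X - a| < 0.2).
from scipy.stats import norm

a, sigma = 2, 3
print(round(norm.cdf(1, a, sigma) - norm.cdf(0.5, a, sigma), 3))            # ~0.061
print(round(norm.cdf(a + 0.2, a, sigma) - norm.cdf(a - 0.2, a, sigma), 3))  # ~0.053
```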

Example 2. Suppose that during the year the price of shares of a certain company is a random variable distributed according to the normal law with mathematical expectation equal to 50 conventional monetary units and standard deviation equal to 10. What is the probability that on a randomly chosen day of the period in question the share price will be:

a) more than 70 conventional monetary units?

b) below 50 per share?

c) between 45 and 58 conventional monetary units per share?

Let the random variable $X$ be the share price of the company. By the condition, $X$ obeys a normal distribution with parameters $a=50$ (mathematical expectation) and $\sigma =10$ (standard deviation). We will find the probability $P\left(\alpha < X < \beta \right)$ that $X$ falls into the interval $\left(\alpha ,\ \beta \right)$ using the formula:

$$P\left(\alpha< X < \beta \right)=\Phi \left({{\beta -a}\over {\sigma }}\right)-\Phi \left({{\alpha -a}\over {\sigma }}\right).$$

$$a)\ P\left(X>70\right)=\Phi \left(\frac{\infty -50}{10}\right)-\Phi \left(\frac{70-50}{10}\right)=0.5-\Phi \left(2\right)=0.5-0.4772=0.0228.$$

$$b)\ P\left(X < 50\right)=\Phi \left(\frac{50-50}{10}\right)-\Phi \left(\frac{-\infty -50}{10}\right)=\Phi \left(0\right)+0.5=0+0.5=0.5.$$

$$c)\ P\left(45 < X < 58\right)=\Phi \left(\frac{58-50}{10}\right)-\Phi \left(\frac{45-50}{10}\right)=\Phi \left(0.8\right)-\Phi \left(-0.5\right)=\Phi \left(0.8\right)+\Phi \left(0.5\right)=0.2881+0.1915=0.4796.$$
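All three parts of Example 2 can be checked in the same way (scipy assumed):

```python
# Normal law with a = 50, sigma = 10.
from scipy.stats import norm

a, sigma = 50, 10
print(round(1 - norm.cdf(70, a, sigma), 4))                       # a) ~0.0228
print(round(norm.cdf(50, a, sigma), 4))                           # b) 0.5
print(round(norm.cdf(58, a, sigma) - norm.cdf(45, a, sigma), 4))  # c) ~0.4796
```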

Despite their exotic names, common distributions are related to each other in quite intuitive and interesting ways that make them easy to remember and to talk about confidently. Some follow naturally, for example, from the Bernoulli distribution. Time to show the map of these connections.

Each distribution is illustrated by an example of its probability density function (PDF). This article is only about those distributions whose outcomes are single numbers. So the horizontal axis of each graph is the set of possible outcome numbers, and the vertical axis is the probability of each outcome. Some distributions are discrete: their outcomes must be integers, such as 0 or 5. These are shown as sparse lines, one for each outcome, with a height corresponding to the probability of that outcome. Some are continuous: their outcomes can take any numerical value, such as −1.32 or 0.005. These are shown as dense curves, where the areas under sections of the curve give the probabilities. The sum of the heights of the lines, or the area under the curve, is always 1.

Print it out, cut along the dotted line, and carry it with you in your wallet. This is your guide to the country of distributions and their relatives.

Bernoulli and uniform

You have already met the Bernoulli distribution above, with two outcomes: heads or tails. Imagine it now as a distribution over 0 and 1, 0 being heads and 1 being tails. As is already clear, both outcomes are equally likely, and this is reflected in the diagram: the Bernoulli PDF contains two lines of the same height, representing the two equally likely outcomes, 0 and 1.

The Bernoulli distribution can also represent unequal outcomes, such as flipping a biased coin. Then the probability of heads will not be 0.5 but some other value p, and the probability of tails will be 1 − p. Like many other distributions, it is actually a whole family of distributions defined by certain parameters, like p above. When you think "Bernoulli", think "tossing a (possibly biased) coin".

From here it is a very small step to a distribution over several equiprobable outcomes: the uniform distribution, characterized by a flat PDF. Picture a fair die: its outcomes 1–6 are equally likely. It can be defined for any number of outcomes n, and even as a continuous distribution.

Think of the uniform distribution as "a fair die".

Binomial and hypergeometric

The binomial distribution can be thought of as the sum of the outcomes of several things that follow the Bernoulli distribution.

Flip a fair coin twice: how many times will it come up heads? This is a number that obeys the binomial distribution. Its parameters are n, the number of trials, and p, the probability of "success" (in our case, heads, or 1). Each flip is a Bernoulli-distributed outcome, or trial. Use the binomial distribution when counting the number of successes in things like coin flips, where each flip is independent of the others and has the same probability of success.

Or imagine an urn with equal numbers of white and black balls. Close your eyes, draw a ball, write down its color and put it back. Repeat. How many times was a black ball drawn? This number also follows the binomial distribution.

We introduced this strange situation to make it easier to grasp the meaning of the hypergeometric distribution. It is the distribution of the same number, but in the situation where we do not return the balls. It is certainly a cousin of the binomial distribution, but not the same, since the probability of success changes with each ball drawn. If the number of balls is large enough compared to the number of draws, the two distributions are almost the same, since the chance of success changes very little with each draw.

When someone talks about drawing balls out of urns without replacement, it is almost always safe to say "yes, the hypergeometric distribution", because in my life I have not yet met anyone who actually fills urns with balls and then takes them out and returns them, or the other way around. (I don't even have friends with urns.) More broadly, this distribution should come up when choosing a significant subset of some population as a sample.

Translator's note.

It may not be very clear here, but since this is a tutorial and an express course for beginners, an explanation is in order. The population is what we want to evaluate statistically. To estimate it, we select a certain part (a subset) and make the required estimate on it (this subset is then called a sample), assuming that the estimate will be similar for the entire population. But for this to be true, additional restrictions are often required on how the sample subset is defined (or, conversely, given a known sample, we need to assess whether it describes the population accurately enough).

A practical example: we need to select representatives from a company of 100 people to travel to E3. It is known that 10 people already travelled there last year (but no one admits it). What is the minimum number of people to take so that at least one experienced comrade is likely to be in the group? In this case the population is 100, the marked subgroup is 10, and the requirement on the sample is at least one person who has already been to E3. A sketch of the calculation is given below.
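A sketch of this calculation with the hypergeometric law (scipy assumed); taking "likely" to mean a probability of at least 0.5 is my own reading of the example.

```python
# Smallest group size n with P(at least one of the 10 experienced people is chosen) >= 0.5.
from scipy.stats import hypergeom

M, K = 100, 10                 # company size and number of experienced people
for n in range(1, M + 1):      # n = size of the travelling group
    p_at_least_one = 1 - hypergeom.pmf(0, M, K, n)
    if p_at_least_one >= 0.5:
        print(n, round(p_at_least_one, 3))   # first n that clears the threshold
        break
```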

Wikipedia has a less funny but more practical example about defective parts in a batch.

Poisson

What about the number of customers calling a technical support hotline every minute? This is an outcome whose distribution looks binomial at first glance, if we consider every second as a Bernoulli trial during which a customer either does not call (0) or calls (1). But power supply companies know very well: when the electricity goes out, two people can call in the same second, or even more than a hundred. Presenting it as 60,000 millisecond trials does not help either: there are more trials and the probability of a call per millisecond is smaller, but even if you do not count two or more calls at the same time, technically this is still not a Bernoulli trial. However, the logical reasoning works when taken to infinity. Let n go to infinity and p go to 0 so that np stays constant. It is like dividing time into smaller and smaller fractions with a smaller and smaller chance of a call in each. In the limit we get the Poisson distribution.

Just like the binomial distribution, the Poisson distribution is a distribution of a count: the number of times something happens. It is parametrized not by the probability p and the number of trials n, but by the average intensity λ, which, by analogy with the binomial, is simply the constant value np. The Poisson distribution is what you need to remember when it comes to counting events over a certain time at a constant given intensity.

When there is something like packets arriving at a router, or customers appearing in a store, or something waiting in a queue, think "Poisson".

Geometric and negative binomial

From simple Bernoulli trials another distribution arises. How many times does a coin come up tails before it comes up heads? The number of tails follows a geometric distribution. Like the Bernoulli distribution, it is parametrized by the probability of a successful outcome, p. It is not parametrized by n, the number of trials, because the number of failed trials is precisely the outcome.

If the binomial distribution is "how many successes?", then the geometric distribution is "how many failures before a success?".

The negative binomial distribution is a simple generalization of the previous one. It is the number of failures before there are r successes, not just 1. Therefore it is additionally parametrized by this r. It is sometimes described as the number of successes before r failures. But, as my life coach says, "you decide what is success and what is failure", so it is the same thing, as long as you remember that the probability p must then be the correct probability of success or failure, respectively.

If you need a joke to relieve the tension, you can mention that the binomial and hypergeometric distributions are an obvious pair, but the geometric and negative binomial distributions are also quite similar, and then ask "Well, who names them all like that, huh?"

Exponential and Weibull

Back to calls to technical support: how long will it take before the next call? The distribution of this waiting time seems to be geometric, because every second while nobody calls is like a failure, up to the second in which the call finally occurs. The number of failures is like the number of seconds during which nobody called, and this is practically the time until the next call, but "practically" is not enough for us. The point is that this time would be a sum of whole seconds, and so it would not be possible to account for the wait within the second before the actual call.

Well, as before, we pass in the geometric distribution to the limit with respect to fractions of time, and voila: we get the exponential distribution, which accurately describes the time before the call. This is a continuous distribution, our first one, because the outcome is not necessarily measured in whole seconds. Like the Poisson distribution, it is parametrized by the intensity λ.

Echoing the connection between the binomial and the geometric, Poisson's "how many events in a given time?" is related to the exponential's "how long until an event?". If there are events whose number per unit time obeys the Poisson distribution, then the time between them obeys the exponential distribution with the same parameter λ. This correspondence between the two distributions should be mentioned when either of them comes up.
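A simulation sketch of this correspondence (numpy assumed; the intensity λ = 3 is an arbitrary choice):

```python
# If the gaps between events are exponential with rate lam, then the number of
# events per unit-length window should be Poisson with mean (and variance) lam.
import numpy as np

rng = np.random.default_rng(0)
lam = 3.0
gaps = rng.exponential(scale=1 / lam, size=200_000)
times = np.cumsum(gaps)

counts = np.bincount(times[times < 1000].astype(int), minlength=1000)
print(round(counts.mean(), 2), round(counts.var(), 2))   # both close to lam = 3.0
```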

The exponential distribution should come to mind when thinking about "time to an event", perhaps "time to failure". In fact, this situation is so important that more general distributions exist to describe time to failure, such as the Weibull distribution. While the exponential distribution is appropriate when the wear-out or failure rate is constant, the Weibull distribution can model a failure rate that increases (or decreases) over time. The exponential is, in general, a special case of it.

Think Weibull when it comes to MTBF.

Normal, lognormal, Student's and chi-square

The normal, or Gaussian, distribution is probably one of the most important. Its bell shape is instantly recognizable. It is a particularly curious entity that shows up everywhere, even from the most outwardly simple sources. Take a set of values that obey the same distribution, any distribution, and add them up. The distribution of their sum obeys (approximately) the normal distribution. The more things are summed, the closer their sum corresponds to a normal distribution (caveat: the distribution of the terms must be well-behaved and the terms independent; only then does the sum tend to normal). That this is so regardless of the original distribution is amazing.

Translator's note.

I was surprised that the author does not mention the need for the summed distributions to be of comparable scale: if one significantly dominates the others, the sum will converge extremely poorly. Also, strictly speaking, complete mutual independence is not necessary; weak dependence is sufficient.

Well, it's probably for parties, as he wrote.


This is called " central limit theorem", and you need to know what it is, why it is called that and what it means, otherwise they will instantly laugh at it.

In this context, the normal distribution is related to all distributions, although it is mainly associated with distributions of sums. The sum of Bernoulli trials follows a binomial distribution, and as the number of trials increases this binomial distribution gets closer and closer to the normal distribution. Similarly for its cousin, the hypergeometric distribution. The Poisson distribution, the limiting form of the binomial, also approaches the normal distribution as the intensity parameter increases.

Outcomes that follow a lognormal distribution give values whose logarithm is normally distributed. Put another way: the exponential of a normally distributed value is lognormally distributed. If sums are normally distributed, remember that products are lognormally distributed.

Student's t-distribution is the basis of the t-test, which many non-statisticians study in other fields. It is used to reason about the mean of a normal distribution, and it also tends to the normal distribution as its parameter increases. A distinctive feature of the t-distribution is its tails, which are fatter than those of the normal distribution.

If the fat-tailed anecdote has not impressed your neighbor enough, move on to a rather amusing beer story. Over 100 years ago Guinness used statistics to improve its stout. There, William Sealy Gosset invented a whole new statistical theory for improved barley cultivation. Gosset convinced his boss that other brewers would not figure out how to use his ideas and got permission to publish, but under the pseudonym "Student". Gosset's most famous result is precisely this t-distribution, which, one might say, is named after him.

Finally, the chi-square distribution is the distribution of sums of squares of normally distributed quantities. The chi-square test is built on this distribution, itself based on sums of squared differences, which should be normally distributed.

Gamma and beta

At this point, if you are already talking about something chi-squared, the conversation is getting serious. You are probably talking to real statisticians, and it may be time to bow out, because things like the gamma distribution might come up. It is a generalization of both the exponential and the chi-square distributions. Like the exponential distribution, it is used for sophisticated waiting-time models; for example, the gamma distribution appears when the time until the next n events is modeled. It also appears in machine learning as the "conjugate prior" to a couple of other distributions.

Don't get into a conversation about these conjugate distributions, but if you do, don't forget to mention the beta distribution, because it is the conjugate prior of most of the distributions mentioned here. Data scientists are sure that this is exactly what it was made for. Mention this casually and head for the door.

The Beginning of Wisdom

Probability distributions are something you cannot know too much about. The truly interested can refer to this super-detailed map of all probability distributions.

As is known, a random variable is a variable that can take certain values depending on chance. Random variables are denoted by capital letters of the Latin alphabet (X, Y, Z) and their values by the corresponding lowercase letters (x, y, z). Random variables are divided into discontinuous (discrete) and continuous.

A discrete random variable is a random variable that takes only a finite or countably infinite set of values, each with a certain non-zero probability.

The distribution law of a discrete random variable is a correspondence that links the values of the random variable with their respective probabilities. The distribution law can be specified in one of the following ways.

1. The distribution law can be given by a table that lists the possible values x_i and the corresponding probabilities p_i.

2. The distribution law can be given analytically, i.e. by a formula; for example, for the Poisson law P(X = k) = λ^k·e^(−λ)/k!, where λ > 0, k = 0, 1, 2, … . It can also be specified via the distribution function F(x), which determines for each value x the probability that the random variable X takes a value less than x, i.e. F(x) = P(X < x).

Properties of the function F(x)

3. The distribution law can be specified graphically, by a distribution polygon (see Problem 3).

Note that in order to solve some problems it is not necessary to know the distribution law. In some cases it is enough to know one or several numbers that reflect the most important features of the distribution law. This can be a number that has the meaning of the "average value" of the random variable, or a number that shows the average size of the deviation of the random variable from its mean value. Numbers of this kind are called numerical characteristics of a random variable.

The main numerical characteristics of a discrete random variable are:

  • Mathematical expectation (mean value) of a discrete random variable: M(X) = Σ xᵢ pᵢ.
    For the binomial distribution M(X) = np, for the Poisson distribution M(X) = λ.
  • Variance of a discrete random variable: D(X) = M[(X − M(X))²], or equivalently D(X) = M(X²) − [M(X)]². The difference X − M(X) is called the deviation of the random variable from its mathematical expectation.
    For the binomial distribution D(X) = npq, for the Poisson distribution D(X) = λ.
  • Standard deviation: σ(X) = √D(X).

Examples of solving problems on the topic "The law of distribution of a discrete random variable"

Task 1.

1000 lottery tickets were issued: 5 of them win 500 rubles each, 10 win 100 rubles, 20 win 50 rubles, and 50 win 10 rubles. Determine the probability distribution law of the random variable X, the winnings per ticket.

Solution. According to the condition of the problem, the following values of the random variable X are possible: 0, 10, 50, 100 and 500.

The number of tickets without winning is 1000 - (5+10+20+50) = 915, then P(X=0) = 915/1000 = 0.915.

Similarly, we find all the other probabilities: P(X=10) = 50/1000 = 0.05, P(X=50) = 20/1000 = 0.02, P(X=100) = 10/1000 = 0.01, P(X=500) = 5/1000 = 0.005. We present the resulting law in the form of a table:

X: 0      10    50    100   500
P: 0.915  0.05  0.02  0.01  0.005

Task 2.

A fair die is rolled. Find the mathematical expectation of the number of points rolled.

Solution. Find the mathematical expectation of X: M(X) = 1·1/6 + 2·1/6 + 3·1/6 + 4·1/6 + 5·1/6 + 6·1/6 = (1+2+3+4+5+6)/6 = 21/6 = 3.5
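The same computation as a small reusable helper (a sketch, not part of the original tasks):

```python
# M(X) for any discrete law given as value/probability pairs, checked on the fair-die law.
def expectation(values, probs):
    assert abs(sum(probs) - 1) < 1e-9, "probabilities must sum to 1"
    return sum(v * p for v, p in zip(values, probs))

print(round(expectation([1, 2, 3, 4, 5, 6], [1 / 6] * 6), 2))   # 3.5
```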

Task 3.

A device consists of three independently operating elements. The probability of failure of each element in one experiment is 0.1. Draw up the distribution law for the number of failed elements in one experiment and construct the distribution polygon. Find the distribution function F(x) and plot it. Find the mathematical expectation, variance and standard deviation of the discrete random variable.

Solution. 1. The discrete random variable X = (number of failed elements in one experiment) has the following possible values: x₁ = 0 (none of the elements of the device failed), x₂ = 1 (one element failed), x₃ = 2 (two elements failed) and x₄ = 3 (three elements failed).

Failures of the elements are independent of each other and the probabilities of failure of each element are equal, therefore Bernoulli's formula is applicable. Given that, by the condition, n = 3, p = 0.1, q = 1 − p = 0.9, we determine the probabilities of the values:
P₃(0) = C₃⁰·p⁰·q³ = q³ = 0.9³ = 0.729;
P₃(1) = C₃¹·p¹·q² = 3·0.1·0.9² = 0.243;
P₃(2) = C₃²·p²·q¹ = 3·0.1²·0.9 = 0.027;
P₃(3) = C₃³·p³·q⁰ = p³ = 0.1³ = 0.001;
Check: Σpᵢ = 0.729 + 0.243 + 0.027 + 0.001 = 1.

Thus, the desired binomial distribution law of X has the form:

X: 0      1      2      3
P: 0.729  0.243  0.027  0.001

On the abscissa axis we plot the possible values xᵢ, and on the ordinate axis the corresponding probabilities pᵢ. We construct the points M₁(0; 0.729), M₂(1; 0.243), M₃(2; 0.027), M₄(3; 0.001). Connecting these points with line segments, we obtain the desired distribution polygon.

3. Find the distribution function F(x) = P(X < x).

For x ≤ 0 we have F(x) = P(X < 0) = 0;
for 0 < x ≤ 1 we have F(x) = P(X < 1) = P(X = 0) = 0.729;
for 1 < x ≤ 2: F(x) = P(X < 2) = P(X = 0) + P(X = 1) = 0.729 + 0.243 = 0.972;
for 2 < x ≤ 3: F(x) = P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2) = 0.972 + 0.027 = 0.999;
for x > 3: F(x) = 1, because the event is certain.

Graph of the function F(x)

4. For the binomial distribution of X:
- mathematical expectation M(X) = np = 3·0.1 = 0.3;
- variance D(X) = npq = 3·0.1·0.9 = 0.27;
- standard deviation σ(X) = √D(X) = √0.27 ≈ 0.52.
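The answers to Task 3 can be cross-checked against scipy's binomial law (a sketch assuming scipy is installed):

```python
# Binomial law with n = 3, p = 0.1: probabilities, mean, variance, standard deviation.
from math import sqrt
from scipy.stats import binom

n, p = 3, 0.1
print([round(binom.pmf(k, n, p), 3) for k in range(4)])   # [0.729, 0.243, 0.027, 0.001]
print(round(binom.mean(n, p), 2), round(binom.var(n, p), 2), round(sqrt(binom.var(n, p)), 2))
# 0.3 0.27 0.52
```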