Discrete Probability Distribution. Binomial distribution of a discrete random variable

Despite the exotic names, common distributions are related to each other in quite intuitive and interesting ways that make it easy to remember them and talk about them confidently. Some naturally follow, for example, from the Bernoulli distribution. Time to show the map of these connections.

Each distribution is illustrated by an example of its probability density function (PDF). This article covers only distributions whose outcomes are single numbers. Therefore, the horizontal axis of each graph is the set of possible outcome values, and the vertical axis is the probability of each outcome. Some distributions are discrete: their outcomes must be integers, such as 0 or 5. These are drawn as sparse lines, one per outcome, with the height of each line equal to the probability of that outcome. Some are continuous: their outcomes can take any numerical value, such as -1.32 or 0.005. These are drawn as dense curves, with the areas under sections of the curve giving the probabilities. The sum of the heights of the lines, and the area under the curves, is always 1.

Print it out, cut along the dotted line, and carry it with you in your wallet. This is your guide to the country of distributions and their relatives.

Bernoulli and uniform

You have already met the Bernoulli distribution above, with two outcomes - heads or tails. Imagine it now as a distribution over 0 and 1, 0 being heads and 1 being tails. As is already clear, both outcomes are equally likely, and this is reflected in the diagram: the Bernoulli PDF contains two lines of equal height, representing the two equally likely outcomes 0 and 1.

The Bernoulli distribution can also represent unequal outcomes, such as flipping an unfair coin. Then the probability of heads is not 0.5 but some other value p, and the probability of tails is 1 - p. Like many other distributions, it is actually a whole family of distributions defined by certain parameters, like p above. When you think "Bernoulli", think "tossing a (possibly unfair) coin".

From here it is a very small step to a distribution over several equally likely outcomes: the uniform distribution, characterized by a flat PDF. Picture a fair die. Its outcomes 1-6 are equally likely. It can be defined for any number of outcomes n, and even as a continuous distribution.

Think of the uniform distribution as a "fair die".
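
If you would rather check this at a keyboard than take my word for it, here is a minimal Python sketch (assuming NumPy is installed; the names p, coin and die are purely illustrative) that samples a possibly unfair coin and a fair die:

    # Bernoulli: a single (possibly unfair) coin toss; uniform: a fair die.
    import numpy as np

    rng = np.random.default_rng(0)
    p = 0.7                                     # probability of "success" (1)
    coin = rng.binomial(n=1, p=p, size=10_000)  # Bernoulli = binomial with n=1
    die = rng.integers(1, 7, size=10_000)       # discrete uniform over 1..6

    print("share of 1s:", coin.mean())                           # close to p
    print("face frequencies:", np.bincount(die)[1:] / die.size)  # each near 1/6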

Binomial and hypergeometric

The binomial distribution can be thought of as the sum of the outcomes of those things that follow the Bernoulli distribution.

Flip an honest coin twice - how many times does it come up heads? That count obeys the binomial distribution. Its parameters are n, the number of trials, and p, the probability of "success" (in our case, heads, or 1). Each flip is a Bernoulli-distributed outcome, or trial. Use the binomial distribution when counting the number of successes in things like coin flips, where each flip is independent of the others and has the same probability of success.

Or imagine an urn with the same number of white and black balls. Close your eyes, pull out the ball, write down its color and return it back. Repeat. How many times has the black ball been drawn? This number also follows the binomial distribution.
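
A minimal sketch of this counting idea (assuming SciPy is available; the two-flip numbers are illustrative):

    # Number of heads in n flips of a fair coin ~ Binomial(n, p).
    from scipy.stats import binom

    n, p = 2, 0.5
    for k in range(n + 1):
        print(k, "heads with probability", binom.pmf(k, n, p))  # 0.25, 0.5, 0.25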

We introduced this odd setup to make the meaning of the hypergeometric distribution easier to grasp. It is the distribution of the same count, but in the situation where we do not return the balls. It is certainly a cousin of the binomial distribution, but not the same, since the probability of success changes with each ball drawn. If the number of balls is large enough compared to the number of draws, the two distributions are almost the same, since the chance of success changes very little with each draw.

When someone talks about drawing balls out of urns without returning, it is almost always safe to say “yes, hypergeometric distribution”, because in my life I have not yet met anyone who would actually fill urns with balls and then take them out and return them, or vice versa. I don't even have friends with urns. Even more often, this distribution should come up when choosing a significant subset of some general population as a sample.

Translator's note.

It may not be very clear here, but since this is a tutorial and an express course for beginners, an explanation is in order. The population is the whole collection we want to assess statistically. To estimate it, we select a certain part (subset) and make the required estimate on it (this subset is then called a sample), assuming that the estimate will be similar for the entire population. But for this to hold, additional restrictions are often required on how the sample is chosen (or, conversely, given a known sample, we need to assess whether it describes the population accurately enough).

A practical example: we need to select representatives from a company of 100 people to travel to E3. It is known that 10 people already went last year (but nobody will admit it). What is the minimum number to take so that at least one experienced comrade is likely to be in the group? In this case the population is 100, the marked subset is 10, and the requirement on the sample is at least one person who has already been to E3.

Wikipedia has a less funny but more practical example about defective parts in a batch.
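
The E3 example above can be worked out with the hypergeometric distribution; here is a sketch (assuming SciPy, and interpreting "likely" as probability at least 0.5, which is my own reading):

    # Population of 100 with 10 "experienced" people; find the smallest group
    # size whose chance of containing at least one of them reaches 0.5.
    from scipy.stats import hypergeom

    M, K = 100, 10                      # population size, experienced people
    for N in range(1, M + 1):           # candidate group sizes
        p_at_least_one = 1 - hypergeom.pmf(0, M, K, N)
        if p_at_least_one >= 0.5:
            print(N, round(p_at_least_one, 3))
            break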

Poisson

What about the number of customers calling a technical support hotline every minute? This is an outcome whose distribution looks binomial at first glance, if we treat every second as a Bernoulli trial during which the customer either does not call (0) or calls (1). But utility companies know very well: when the electricity goes out, two people can call in the same second, or even more than a hundred. Treating it as 60,000 millisecond trials doesn't help either - there are more trials and the probability of a call per millisecond is smaller, but you still cannot rule out two or more calls in the same instant, so, technically, it is still not a Bernoulli trial. However, the logic does work if we pass to the limit. Let n go to infinity and p go to 0 so that np stays constant. It is like dividing time into smaller and smaller fractions with a smaller and smaller chance of a call in each. In the limit we get the Poisson distribution.

Just like the binomial distribution, the Poisson distribution is a distribution of a count: the number of times something happens. It is parametrized not by the probability p and the number of trials n, but by the average intensity λ, which, by analogy with the binomial, is simply the constant value np. The Poisson distribution is what you need to remember when it comes to counting events over a given period of time at a constant given intensity.

When there is something like packets arriving at a router or customers appearing in a store or something waiting in line, think Poisson.
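
A sketch of the limiting argument (assuming SciPy; the intensity of 2 calls per minute is an illustrative number):

    # A Binomial(n, p) with huge n and tiny p is almost Poisson(lambda = n*p).
    from scipy.stats import binom, poisson

    lam = 2.0                        # average calls per minute
    n, p = 60_000, lam / 60_000      # one "trial" per millisecond
    for k in range(6):
        print(k, round(binom.pmf(k, n, p), 5), round(poisson.pmf(k, lam), 5))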

Geometric and negative binomial

From simple Bernoulli trials another distribution emerges. How many times does a coin come up tails before it comes up heads? The number of tails follows a geometric distribution. Like the Bernoulli distribution, it is parametrized by the probability of a successful outcome, p. It is not parametrized by n, a number of trials, because the number of failed trials is itself the outcome.

If the binomial distribution is "how many successes", then the geometric distribution is "How many failures before success?".

The negative binomial distribution is a simple generalization of the previous one. It is the number of failures before there are r successes, not just 1. Therefore, it is additionally parametrized by this r. It is sometimes described as the number of successes before r failures. But, as my life coach says, "you decide what is success and what is failure", so this is the same thing, as long as you remember that the probability p must also be the correct probability of success or failure, respectively.
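
A sketch of both counts (assuming SciPy). Note a convention mismatch: scipy's geom counts trials up to and including the first success, while the text counts failures before it, so the two differ by one; nbinom counts failures before the r-th success, matching the text.

    from scipy.stats import geom, nbinom

    p = 0.5
    k = 3                                  # number of failures (tails)
    print(geom.pmf(k + 1, p))              # P(3 tails, then the first head)
    print(nbinom.pmf(k, 1, p))             # same probability, with r = 1
    print(nbinom.pmf(k, 5, p))             # P(3 failures before the 5th success)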

If you need a joke to relieve tension, you can mention that the binomial and hypergeometric distributions are an obvious pair, but the geometric and negative binomial distributions are also quite similar, and then state “Well, who calls them all like that, huh?”

Exponential and Weibull

Back to calls to technical support: how long will it take before the next call? The distribution of this waiting time seems geometric, because every second in which nobody calls is like a failure, right up to the second in which the call finally occurs. The number of failures is like the number of seconds during which nobody called, and that is practically the time until the next call - but "practically" is not enough for us. The catch is that this time would always be a sum of whole seconds, leaving no way to account for the waiting time within the final second before the call itself.

Well, as before, we pass to the limit in the geometric distribution over ever smaller fractions of time - and voilà. We get the exponential distribution, which accurately describes the time until the call. This is our first continuous distribution, because the outcome is not necessarily measured in whole seconds. Like the Poisson distribution, it is parametrized by the intensity λ.

Echoing the connection between the binomial and the geometric, Poisson's "how many events in a time?" is related to the exponential "how long before the event?". If there are events whose number per unit time obeys the Poisson distribution, then the time between them obeys the exponential distribution with the same parameter λ. This correspondence between the two distributions must be noted when either is discussed.
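
A simulation sketch of that correspondence (assuming NumPy; λ = 3 events per unit of time is an arbitrary choice): exponential gaps are drawn first, and the per-unit counts they produce come out approximately Poisson.

    import numpy as np

    rng = np.random.default_rng(1)
    lam, horizon = 3.0, 10_000
    gaps = rng.exponential(scale=1 / lam, size=int(2 * lam * horizon))
    arrivals = np.cumsum(gaps)
    arrivals = arrivals[arrivals < horizon]
    counts = np.bincount(arrivals.astype(int), minlength=horizon)

    print(counts.mean(), counts.var())   # both close to lam, as Poisson predicts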

The exponential distribution should come to mind when thinking about "time to event", perhaps "time to failure". In fact, this is such an important situation that more general distributions exist to describe MTBF, such as the Weibull distribution. While the exponential distribution is appropriate when the wear or failure rate is constant, the Weibull distribution can model a failure rate that increases (or decreases) over time. The exponential is, in general, a special case of it.

Think Weibull when it comes to MTBF.

Normal, lognormal, Student's and chi-square

The normal, or Gaussian, distribution is probably one of the most important. Its bell shape is instantly recognizable. It is a particularly curious entity that shows up everywhere, even from the seemingly simplest sources. Take a set of values that obey the same distribution - any distribution! - and add them up. The distribution of their sum is (approximately) normal. The more things are summed, the closer their sum gets to a normal distribution (caveat: the distributions of the terms must be well-behaved and independent; the sum only tends towards normal). That this holds regardless of the original distribution is amazing.

Translator's note.

I was surprised that the author does not mention the need for the summed distributions to be of comparable scale: if one significantly dominates the others, convergence will be extremely poor. And, in general, absolute mutual independence is not necessary; a sufficiently weak dependence is enough.

Well, that level of detail is probably enough for parties, as the author wrote.


This is called the "central limit theorem", and you need to know what it is, why it is called that, and what it means, otherwise you will instantly be laughed at.
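
A quick sketch of the theorem at work (assuming NumPy): sums of 30 die rolls, a decidedly non-normal ingredient, already behave much like a normal distribution.

    import numpy as np

    rng = np.random.default_rng(2)
    rolls = rng.integers(1, 7, size=(100_000, 30))  # 30 dice per experiment
    sums = rolls.sum(axis=1)

    mean, std = sums.mean(), sums.std()
    share_within_one_sigma = np.mean(np.abs(sums - mean) < std)
    print(round(share_within_one_sigma, 3))         # roughly 0.68, as for a normal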

In this context, the normal distribution is related to all distributions, although it is mostly related to distributions of sums. The sum of Bernoulli trials follows a binomial distribution, and as the number of trials increases, that binomial distribution gets ever closer to a normal distribution. The same goes for its cousin, the hypergeometric distribution. The Poisson distribution - the limiting form of the binomial - also approaches the normal distribution as the intensity parameter grows.

Outcomes that follow a lognormal distribution have values whose logarithm is normally distributed. Or, put another way: the exponential of a normally distributed value is lognormally distributed. If sums of things are normally distributed, remember that products of things are lognormally distributed.

Student's t-distribution is the basis of the t-test, which many non-statisticians study in other fields. It is used to reason about the mean of a normal distribution, and it too tends to the normal distribution as its parameter grows. A distinctive feature of the t-distribution is its tails, which are fatter than those of the normal distribution.

If the fat-tail anecdote hasn't impressed your neighbor enough, move on to a rather amusing beer story. Over 100 years ago, Guinness used statistics to improve its stout. It was then that William Sealy Gosset invented a completely new statistical theory just to grow better barley. Gosset convinced his boss that other brewers would not figure out how to use his ideas and got permission to publish, but only under the pseudonym "Student". Gosset's most famous result is precisely this t-distribution, which, one might say, is named after him.

Finally, the chi-square distribution is the distribution of the sums of squares of normally distributed quantities. A chi-square test is built on this distribution, itself based on the sum of the squared differences, which should be normally distributed.
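
A sketch of that definition (assuming NumPy and SciPy; k = 5 degrees of freedom is an arbitrary illustration):

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(3)
    k = 5
    sums_of_squares = (rng.standard_normal((100_000, k)) ** 2).sum(axis=1)

    print(sums_of_squares.mean(), chi2.mean(k))       # both close to k
    print(np.median(sums_of_squares), chi2.median(k))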

Gamma and beta

At this point, if you're already talking about something chi-square, the conversation is getting serious. You are probably talking to real statisticians, and it may be worth bowing out already, since things like the gamma distribution might come up. It is a generalization of both the exponential and the chi-square distributions. Like the exponential distribution, it is used for sophisticated waiting-time models: for example, the gamma distribution appears when the time until the next n events is modeled. It also appears in machine learning as the "conjugate prior" of a couple of other distributions.
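
A sketch of the waiting-time reading of the gamma distribution (assuming NumPy; λ and n are illustrative): the time until the n-th event of a Poisson process, i.e. a sum of n exponential gaps, is gamma-distributed.

    import numpy as np

    rng = np.random.default_rng(4)
    lam, n = 2.0, 5
    waits = rng.exponential(scale=1 / lam, size=(100_000, n)).sum(axis=1)
    gamma_draws = rng.gamma(shape=n, scale=1 / lam, size=100_000)

    print(waits.mean(), gamma_draws.mean())   # both close to n / lam = 2.5
    print(waits.var(), gamma_draws.var())     # both close to n / lam**2 = 1.25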

Don't get drawn into a conversation about these conjugate distributions, but if you do, don't forget to mention the beta distribution, because it is the conjugate prior of most of the distributions mentioned here. Data scientists are sure that this is exactly what it was made for. Mention this casually and head for the door.

The Beginning of Wisdom

Probability distributions are something you can't know too much about. The truly interested can refer to this super-detailed map of all probability distributions.

A random event is any fact that may or may not occur as the result of a trial; a random event is an outcome of a trial. A trial is an experiment: the fulfilment of a specified set of conditions under which a particular phenomenon is observed and a particular result is recorded.

Events are indicated by capital letters of the Latin alphabet A, B, C.

A numerical measure of the objective possibility of an event occurring is called the probability of the random event.

Classical definition of the probability of event A:

The probability of event A is the ratio of the number of cases favorable to event A (m) to the total number of cases (n): P(A) = m/n.

Statistical definition of probability

The relative frequency of an event is the proportion of actually conducted trials in which event A appeared: W = P*(A) = m/n. This is an experimental characteristic, where m is the number of experiments in which event A appeared and n is the total number of experiments performed.

The probability of an event is the number around which the values of its relative frequency are grouped over different series of a large number of trials: P(A) = lim (m/n) as n → ∞.

Events are called incompatible if the occurrence of one of them excludes the occurrence of the other. Otherwise, the events are called joint.

The sum of two events is an event in which at least one of these events (A or B) occurs.

If A and B are joint events, then their sum A + B denotes the occurrence of event A, or event B, or both events together.

If A and B are incompatible events, then the sum A + B means the occurrence of either event A or event B.

2. The concept of dependent and independent events. Conditional probability, law (theorem) of multiplication of probabilities. Bayes formula.

Event B is called independent of event A if the occurrence of event A does not change the probability of occurrence of event B. The probability of the joint occurrence of several independent events equals the product of their probabilities:

P(AB) = P(A)*P(B)

For dependent events:

P(AB) = P(A)*P(B/A).

The probability of the product of two events equals the product of the probability of one of them and the conditional probability of the other, found under the assumption that the first event has occurred.

The conditional probability of event B is the probability of event B found under the condition that event A has occurred. It is denoted P(B/A).

The product of two events is an event consisting in the joint occurrence of both events (A and B).

Bayes' formula is used to re-evaluate the probabilities of hypotheses once a random event has been observed:

P(H/A) = (P(H)*P(A/H))/P(A)

P(H) is the prior probability of hypothesis H

P(H/A) is the posterior probability of hypothesis H, given that event A has already happened

P(A/H) is the probability of event A under hypothesis H (in applications, often supplied as an expert estimate)

P(A) is the total probability of event A
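
A worked sketch of Bayes' formula in plain Python; the numbers (a 1% prior, a 95% hit rate and a 10% false-alarm rate) are made up purely for illustration.

    p_H = 0.01                 # prior probability P(H)
    p_A_given_H = 0.95         # P(A/H)
    p_A_given_not_H = 0.10     # P(A/not H)

    # total probability of A over both hypotheses
    p_A = p_A_given_H * p_H + p_A_given_not_H * (1 - p_H)
    # posterior probability P(H/A)
    p_H_given_A = p_A_given_H * p_H / p_A
    print(round(p_H_given_A, 3))   # about 0.088: the event re-evaluates H upward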

3. Distribution of discrete and continuous random variables and their characteristics: mathematical expectation, variance, standard deviation. Normal law of distribution of continuous random variables.

A random variable is a quantity that, as a result of a trial, takes one of its possible values, depending on chance.

A discrete random variable is a random variable that takes values from a separate, isolated, countable set.

A continuous random variable is a random variable that can take any value from a certain interval. The concept of a continuous random variable arises in measurements.

For a discrete random variable, the distribution law can be given in the form of a table, analytically (as a formula), or graphically.

A table is the simplest way to specify the distribution law: the possible values x_i are listed together with their probabilities p_i.

Requirement for a discrete random variable: the listed probabilities must sum to 1, Σ p_i = 1.

Analytically:

1) F(x) = P(X < x) — the distribution function (cumulative distribution function); defined for both discrete and continuous random variables.

2) f(x) = F'(x) — the probability density (differential distribution function); defined for continuous random variables only.

Graphically:

Properties of F(x): 1) 0 ≤ F(x) ≤ 1; 2) F(x) is non-decreasing.

Properties of f(x) (for continuous random variables): 1) f(x) ≥ 0, and P(a < x < b) = ∫ f(x) dx over (a, b); 2) the total area under the curve is S = 1.

Characteristics:

1. Mathematical expectation: the mean value of the random variable.

For discrete random variables: M(x) = Σ x_i · p_i

For continuous random variables: M(x) = ∫ x · f(x) dx

2. Dispersion (variance): the scatter of the random variable around its mathematical expectation.

For discrete random variables:

D(x) = Σ (x_i − M(x))² · p_i

For continuous random variables:

D(x) = ∫ (x − M(x))² · f(x) dx

3. Standard deviation:

σ(x) = √(D(x))

σ, the standard deviation (or standard) of the random variable x, is the arithmetic (non-negative) value of the square root of its variance.
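
A sketch computing these three characteristics directly from the definitions above, for an illustrative discrete distribution (the values and probabilities are made up):

    values = [0, 1, 2, 3]
    probs = [0.1, 0.4, 0.3, 0.2]

    m = sum(x * p for x, p in zip(values, probs))               # M(x)
    d = sum((x - m) ** 2 * p for x, p in zip(values, probs))    # D(x)
    sigma = d ** 0.5                                            # sigma(x)
    print(round(m, 4), round(d, 4), round(sigma, 4))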

Normal distribution law - the Gaussian law

The normal distribution law is the probability distribution of a continuous random variable described by the density (differential distribution) function f(x) = (1 / (σ√(2π))) · exp(−(x − a)² / (2σ²)).

Section 6. Typical distribution laws and numerical characteristics of random variables

The form of the functions F(x), p(x), or the enumeration p(x i) is called the distribution law of the random variable. While one can imagine an infinite variety of random variables, there are far fewer laws of distribution. First, different random variables can have exactly the same distribution laws. For example: let y take only 2 values ​​1 and -1 with probabilities 0.5; the value z = -y has exactly the same distribution law.
Secondly, very often random variables have similar distribution laws, i.e., for example, p(x) for them is expressed by formulas of the same form, differing only in one or more constants. These constants are called distribution parameters.

Although in principle a wide variety of laws of distribution are possible, a few of the most typical laws will be considered here. It is important to pay attention to the conditions under which they arise, the parameters and properties of these distributions.

1. Uniform distribution
This is the name of the distribution of a random variable that can take any value in the interval (a, b), where the probability of falling into any segment inside (a, b) is proportional to the length of the segment and does not depend on its position, and the probability of values outside (a, b) is 0.


Fig 6.1 Function and density of uniform distribution

Distribution parameters: a, b

2. Normal distribution
A distribution with density described by the formula

p(x) = (1 / (σ√(2π))) · exp(−(x − a)² / (2σ²))   (6.1)

is called normal.
Distribution parameters: a, σ


Figure 6.2 Typical view of density and normal distribution function

3. Bernoulli distribution
If a series of independent trials is made, in each of which event A can appear with the same probability p, then the number of occurrences of the event is a random variable distributed according to the Bernoulli law, or the binomial law (another name for the same distribution):

P_n(m) = C(n, m) · p^m · q^(n−m)

Here n is the number of trials in the series, m is the random variable (the number of occurrences of event A), P_n(m) is the probability that A happens exactly m times, and q = 1 − p (the probability that A does not appear in a single trial).

Example 1: A die is rolled 5 times, what is the probability that a 6 will be rolled twice?
n=5, m=2, p=1/6, q=5/6

Distribution parameters: n, p
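
Example 1 worked out two ways (assuming SciPy is available), directly from the formula and via the library:

    from math import comb
    from scipy.stats import binom

    n, m, p, q = 5, 2, 1 / 6, 5 / 6
    print(comb(n, m) * p**m * q**(n - m))   # about 0.1608
    print(binom.pmf(m, n, p))               # the same value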

4. Poisson distribution
The Poisson distribution is obtained as the limiting case of the Bernoulli distribution if p tends to zero and n tends to infinity in such a way that their product remains constant: np = a. Formally, such a passage to the limit leads to the formula

P(m) = (a^m / m!) · e^(−a)

Distribution parameter: a

Many random variables encountered in science and in everyday practice follow the Poisson distribution.

Example 2: Number of calls received at the ambulance station in an hour.
Let us divide the time interval T (1 hour) into small intervals dt, such that the probability of receiving two or more calls during dt is negligible, and the probability of one call p is proportional to dt: p = μdt ;
we will consider the observation during the moments dt as independent trials, the number of such trials during the time T: n = T / dt;
if we assume that the probabilities of receiving calls do not change during an hour, then the total number of calls obeys the Bernoulli law with parameters: n = T / dt, p = μdt. Letting dt tend to zero, we get that n tends to infinity, and the product n × p remains constant: a = n × p = μT.

Example 3: number of ideal gas molecules in some fixed volume V.
Let us divide the volume V into small volumes dV such that the probability of finding two or more molecules in dV is negligible, and the probability of finding one molecule is proportional to dV: p = μdV; we will consider the observation of each volume dV as an independent trial, the number of such trials being n = V/dV; if we assume that the probabilities of finding a molecule anywhere inside V are the same, the total number of molecules in volume V obeys Bernoulli's law with parameters n = V/dV, p = μdV. Letting dV tend to zero, we get that n tends to infinity while the product n × p remains constant: a = n × p = μV.

Numerical characteristics of random variables

1. Mathematical expectation (average value)

Definition:
For a discrete random variable, the mathematical expectation is

Mx = Σ x_i · p(x_i);   (6.4)

The sum is taken over all values that the random variable takes. The series must be absolutely convergent (otherwise, the random variable is said to have no mathematical expectation).

For a continuous random variable:

Mx = ∫ x · p(x) dx;   (6.5)

The integral must be absolutely convergent (otherwise the random variable is said to have no mathematical expectation).


Properties of mathematical expectation:

a. If C is a constant value, then MC = C
b. M(Cx) = C·Mx
c. The mathematical expectation of the sum of random variables is always equal to the sum of their mathematical expectations: M(x + y) = Mx + My
d. The concept of conditional mathematical expectation is introduced. If a random variable takes its values x_i with probabilities p(x_i/H_j) under different conditions H_j, then the conditional expectation is determined as

M(x/H_j) = Σ x_i · p(x_i/H_j) or, for a continuous variable, M(x/H_j) = ∫ x · p(x/H_j) dx;   (6.6)

If the probabilities of the events H_j are known, the full expected value is

Mx = Σ_j M(x/H_j) · p(H_j);   (6.7)

Example 4: How many times, on average, do you need to toss a coin until the first head appears? This problem can be solved head-on:

x_i:    1    2    3    ...  k     ...
p(x_i): 1/2  1/4  1/8  ...  1/2^k ...

but this sum still has to be calculated. It is easier to use the concepts of conditional and full mathematical expectation. Consider the hypotheses H_1 (a head comes up on the first toss) and H_2 (it does not come up on the first toss). Obviously, p(H_1) = p(H_2) = 1/2 and M(x/H_1) = 1;
M(x/H_2) is 1 more than the desired full expectation, because after the first toss the situation has not changed, but one toss has already been spent. Using the formula of full mathematical expectation, we have Mx = M(x/H_1)·p(H_1) + M(x/H_2)·p(H_2) = 1·0.5 + (Mx + 1)·0.5; solving this equation for Mx, we immediately obtain Mx = 2.
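
A simulation sketch of Example 4 (assuming NumPy): the average number of tosses until the first head is indeed about 2.

    import numpy as np

    rng = np.random.default_rng(5)
    tosses = rng.geometric(p=0.5, size=100_000)   # tosses including the first head
    print(tosses.mean())                          # close to 2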

e. If f(x) is a function of a random variable x, then the concept of the mathematical expectation of a function of a random variable is defined:

For a discrete random variable: M[f(x)] = Σ f(x_i) · p(x_i);   (6.8)

The sum is taken over all values that the random variable takes. The series must be absolutely convergent.

For a continuous random variable: M[f(x)] = ∫ f(x) · p(x) dx;   (6.9)

The integral must be absolutely convergent.

2. Variance of a random variable
Definition:
The dispersion (variance) of a random variable x is the mathematical expectation of the squared deviation of the value of the quantity from its mathematical expectation: Dx = M(x − Mx)²

For a discrete random variable: Dx = Σ (x_i − Mx)² · p(x_i);   (6.10)

The sum is taken over all values that the random variable takes. The series must be convergent (otherwise the random variable is said to have no variance).

For a continuous random variable: Dx = ∫ (x − Mx)² · p(x) dx;   (6.11)

The integral must converge (otherwise the random variable is said to have no variance).

Dispersion properties:
a. If C is a constant value, then DC = 0
b. D(Cx) = C²·Dx
c. The variance of the sum of random variables is equal to the sum of their variances only if these variables are independent (see the definition of independent variables)
d. To calculate the variance, it is convenient to use the formula:

Dx = M(x²) − (Mx)²   (6.12)
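
A quick numerical check of formula (6.12) on an illustrative discrete distribution (the values and probabilities are made up):

    values = [1, 2, 3]
    probs = [0.2, 0.5, 0.3]

    mx = sum(x * p for x, p in zip(values, probs))                    # Mx
    mx2 = sum(x * x * p for x, p in zip(values, probs))               # M(x^2)
    d_direct = sum((x - mx) ** 2 * p for x, p in zip(values, probs))  # definition
    d_shortcut = mx2 - mx ** 2                                        # formula (6.12)
    print(round(d_direct, 6), round(d_shortcut, 6))                   # both 0.49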

Relationship of numerical characteristics and parameters of typical distributions

Distribution   Parameters   Mx          Dx
uniform        a, b         (b + a)/2   (b − a)²/12
normal         a, σ         a           σ²
Bernoulli      n, p         np          npq
Poisson        a            a           a