Coefficients of the least squares method (LSM). Least squares in Excel

The method of least squares (OLS, from the English "Ordinary Least Squares") is a mathematical method used to solve various problems, based on minimizing the sum of squared deviations of certain functions from the desired variables. It can be used to "solve" overdetermined systems of equations (when the number of equations exceeds the number of unknowns), to find a solution for ordinary (not overdetermined) nonlinear systems of equations, and to approximate point values by some function. OLS is one of the basic methods of regression analysis for estimating the unknown parameters of regression models from sample data.

The essence of the method of least squares

Let $x$ be a set of $n$ unknown variables (parameters) and $f_i(x)$, $i = 1, \ldots, m$, with $m > n$, a set of functions of these variables. The task is to choose values of $x$ such that the values of these functions are as close as possible to certain values $y_i$. In essence, we are talking about "solving" the overdetermined system of equations $f_i(x) = y_i$, $i = 1, \ldots, m$, in the indicated sense of maximal closeness of the left- and right-hand sides of the system. The essence of the LSM is to choose as the "measure of proximity" the sum of squared deviations of the left- and right-hand sides $|f_i(x) - y_i|$. Thus, the essence of the LSM can be expressed as follows:

$$\sum_i e_i^2 = \sum_i (y_i - f_i(x))^2 \rightarrow \min_x.$$

If the system of equations has a solution, then the minimum of the sum of squares is zero, and exact solutions of the system can be found analytically or, for example, by various numerical optimization methods. If the system is overdetermined, that is, loosely speaking, the number of independent equations is greater than the number of unknown variables, then the system has no exact solution and the least squares method allows us to find some "optimal" vector $x$ in the sense of maximal proximity of the vectors $y$ and $f(x)$, or maximal proximity of the deviation vector $e$ to zero (proximity understood in the sense of Euclidean distance).

Example - system of linear equations

In particular, the least squares method can be used to "solve" the system of linear equations

$$Ax = b,$$

where $A$ is not square but rectangular, of size $m \times n$ with $m > n$ (that is, the number of rows of the matrix $A$ is greater than the number of unknown variables).

Such a system of equations generally has no solution. Therefore, this system can be "solved" only in the sense of choosing a vector $x$ that minimizes the "distance" between the vectors $Ax$ and $b$. To do this, one can apply the criterion of minimizing the sum of squared differences of the left- and right-hand sides of the system, that is, $(Ax - b)^T (Ax - b) \rightarrow \min$. It is easy to show that solving this minimization problem leads to the following system of equations:

$$A^T A x = A^T b.$$

Using the pseudoinverse operator, the solution can be rewritten as:

$$x = A^{+} b,$$

where $A^{+}$ is the pseudoinverse matrix of $A$.
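
As an illustration (not part of the original text), here is a minimal C++ sketch of "solving" an overdetermined system in the least-squares sense. It assumes the Eigen library is available; the 3x2 matrix and the right-hand side are made-up example values.

    #include <iostream>
    #include <Eigen/Dense>   // assumes the Eigen headers are installed

    int main() {
        // Overdetermined system A x = b: 3 equations, 2 unknowns (example values).
        Eigen::MatrixXd A(3, 2);
        A << 1, 1,
             2, 1,
             3, 1;
        Eigen::VectorXd b(3);
        b << 1.1, 1.9, 3.2;

        // Normal equations: (A^T A) x = A^T b.
        Eigen::VectorXd x_ne = (A.transpose() * A).ldlt().solve(A.transpose() * b);

        // The same least-squares solution via a QR decomposition (numerically safer).
        Eigen::VectorXd x_qr = A.colPivHouseholderQr().solve(b);

        std::cout << "normal equations: " << x_ne.transpose() << "\n";
        std::cout << "QR solve:         " << x_qr.transpose() << "\n";
        return 0;
    }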

This problem can also be “solved” using the so-called weighted LSM (see below), when different equations of the system receive different weights from theoretical considerations.

Strict substantiation and determination of the limits of the meaningful applicability of the method were given by A. A. Markov and A. N. Kolmogorov.

OLS in regression analysis (data approximation)

Let there be $n$ values of some variable $y$ (these may be results of observations, experiments, etc.) and corresponding variables $x$. The task is to approximate the relationship between $y$ and $x$ by some function $f(x, b)$ known up to some unknown parameters $b$, that is, in fact, to find the values of the parameters $b$ that bring the values $f(x, b)$ as close as possible to the actual values $y$. In effect, this reduces to "solving" an overdetermined system of equations with respect to $b$:

$$f(x_t, b) = y_t, \quad t = 1, \ldots, n.$$

In regression analysis, and in particular in econometrics, probabilistic models of the relationship between variables are used:

$$y_t = f(x_t, b) + \varepsilon_t,$$

where $\varepsilon_t$ are the so-called random errors of the model.

Accordingly, deviations of the observed values $y$ from the model values $f(x, b)$ are already assumed in the model itself. The essence of LSM (ordinary, classical) is to find parameters $b$ for which the sum of squared deviations (errors; for regression models they are often called regression residuals) $e_t$ is minimal:

$$\hat{b}_{OLS} = \arg\min_b RSS(b),$$

where $RSS$ (Residual Sum of Squares) is defined as:

$$RSS(b) = e^T e = \sum_{t=1}^n e_t^2 = \sum_{t=1}^n (y_t - f(x_t, b))^2.$$

In the general case, this problem can be solved by numerical optimization (minimization) methods. In this case, one speaks of nonlinear least squares (NLS or NLLS, Non-Linear Least Squares). In many cases an analytical solution can be obtained. To solve the minimization problem, it is necessary to find the stationary points of the function $RSS(b)$ by differentiating it with respect to the unknown parameters $b$, equating the derivatives to zero and solving the resulting system of equations:

$$\sum_{t=1}^n (y_t - f(x_t, b))\,\frac{\partial f(x_t, b)}{\partial b} = 0.$$

OLS in the case of linear regression

Let the regression dependence be linear:

$$y_t = \sum_{j=1}^k b_j x_{tj} + \varepsilon_t = x_t^T b + \varepsilon_t.$$

Let $y$ be the column vector of observations of the explained variable and $X$ the $(n \times k)$ matrix of observations of the factors (the rows of the matrix are the vectors of factor values in a given observation, the columns are the vector of values of a given factor in all observations). The matrix representation of the linear model has the form:

$$y = Xb + \varepsilon.$$

Then the vector of estimates of the explained variable and the vector of regression residuals will be

$$\hat{y} = Xb, \qquad e = y - \hat{y} = y - Xb;$$

accordingly, the sum of squares of the regression residuals will be

$$RSS = e^T e = (y - Xb)^T (y - Xb).$$

Differentiating this function with respect to the parameter vector $b$ and equating the derivatives to zero, we obtain a system of equations (in matrix form):

$$(X^T X)\, b = X^T y.$$
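
For readers who want the intermediate differentiation step (standard matrix calculus, not spelled out in the source):

$$RSS(b) = (y - Xb)^T (y - Xb) = y^T y - 2\,b^T X^T y + b^T X^T X b, \qquad \frac{\partial RSS}{\partial b} = -2 X^T y + 2 X^T X b = 0 \;\Rightarrow\; (X^T X)\, b = X^T y.$$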

In expanded matrix form, this system of equations looks like this:

$$\begin{pmatrix}
\sum x_{t1}^2 & \sum x_{t1}x_{t2} & \ldots & \sum x_{t1}x_{tk} \\
\sum x_{t2}x_{t1} & \sum x_{t2}^2 & \ldots & \sum x_{t2}x_{tk} \\
\vdots & \vdots & \ddots & \vdots \\
\sum x_{tk}x_{t1} & \sum x_{tk}x_{t2} & \ldots & \sum x_{tk}^2
\end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{pmatrix}
=
\begin{pmatrix} \sum x_{t1}y_t \\ \sum x_{t2}y_t \\ \vdots \\ \sum x_{tk}y_t \end{pmatrix},$$

where all sums are taken over all admissible values of $t$.

If a constant is included in the model (as is usual), then $x_{t1} = 1$ for all $t$; therefore, the upper left element of the matrix of the system of equations is the number of observations $n$, the remaining elements of the first row and first column are simply the sums of the values of the variables, $\sum x_{tj}$, and the first element of the right-hand side of the system is $\sum y_t$.

The solution of this system of equations gives the general formula for the least squares estimates for the linear model:

$$\hat{b}_{OLS} = (X^T X)^{-1} X^T y = \left(\frac{1}{n} X^T X\right)^{-1} \frac{1}{n} X^T y = V_x^{-1} C_{xy}.$$

For analytical purposes, the last representation of this formula turns out to be useful (in the system of equations, when divided by $n$, arithmetic means appear instead of sums). If the data in the regression model are centered, then in this representation the first matrix has the meaning of the sample covariance matrix of the factors and the second is the vector of covariances of the factors with the dependent variable. If, in addition, the data are also normalized by the standard deviation (that is, ultimately standardized), then the first matrix has the meaning of the sample correlation matrix of the factors and the second vector is the vector of sample correlations of the factors with the dependent variable.
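
A minimal C++ sketch of the two representations (assuming the Eigen library is available; the data are the same small sample used in Example 1 further below): the slope obtained from the covariance form coincides with the one from the general formula, and the intercept is recovered as the mean of $y$ minus the slope times the mean of $x$.

    #include <iostream>
    #include <Eigen/Dense>

    int main() {
        // Example data: n = 5 observations, one factor plus a constant.
        Eigen::VectorXd x(5), y(5);
        x << 0, 1, 2, 4, 5;
        y << 2.1, 2.4, 2.6, 2.8, 3.0;

        // Design matrix X = [1, x].
        Eigen::MatrixXd X(5, 2);
        X.col(0).setOnes();
        X.col(1) = x;

        // General formula: b_hat = (X^T X)^{-1} X^T y.
        Eigen::VectorXd b_hat = (X.transpose() * X).ldlt().solve(X.transpose() * y);

        // Covariance representation for the slope: Cov(x,y) / Var(x); intercept = y_bar - slope * x_bar.
        double xbar = x.mean(), ybar = y.mean();
        double cov_xy = ((x.array() - xbar) * (y.array() - ybar)).mean();
        double var_x  = (x.array() - xbar).square().mean();
        double slope  = cov_xy / var_x;
        double intercept = ybar - slope * xbar;

        std::cout << "general formula: " << b_hat.transpose() << "\n";
        std::cout << "via covariances: " << intercept << " " << slope << "\n";
        return 0;
    }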

An important property of LSM estimates for models with a constant is that the constructed regression line passes through the center of gravity of the sample data, that is, the following equality holds:

$$\bar{y} = \hat{b}_1 + \sum_{j=2}^k \hat{b}_j \bar{x}_j.$$

In particular, in the extreme case when the only regressor is a constant, we find that the OLS estimate of the single parameter (the constant itself) is equal to the mean value of the explained variable. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also a least squares estimate: it satisfies the criterion of the minimum sum of squared deviations from it.
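
As a one-line check of this statement (minimizing over a single constant $a$):

$$\frac{d}{da}\sum_{i=1}^n (y_i - a)^2 = -2\sum_{i=1}^n (y_i - a) = 0 \;\Rightarrow\; a = \frac{1}{n}\sum_{i=1}^n y_i = \bar{y}.$$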

The simplest special cases

In the case of paired linear regression $y_t = a + b x_t + \varepsilon_t$, when the linear dependence of one variable on another is estimated, the calculation formulas simplify (one can do without matrix algebra). The system of equations has the form:

$$\begin{pmatrix} 1 & \bar{x} \\ \bar{x} & \overline{x^2} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \bar{y} \\ \overline{xy} \end{pmatrix}.$$

From here it is easy to find estimates of the coefficients:

$$\hat{b} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}.$$

Although models with a constant are generally preferable, in some cases it is known from theoretical considerations that the constant $a$ should be zero. For example, in physics the relationship between voltage and current has the form $U = I \cdot R$; measuring voltage and current, one needs to estimate the resistance. In this case we are dealing with the model $y = bx$. Then, instead of a system of equations, we have the single equation

$$\left(\sum x_t^2\right) b = \sum x_t y_t.$$

Therefore, the formula for estimating the single coefficient has the form

$$\hat{b} = \frac{\sum_{t=1}^n x_t y_t}{\sum_{t=1}^n x_t^2} = \frac{\overline{xy}}{\overline{x^2}}.$$
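
A small dependency-free C++ sketch of this special case, in the spirit of the physics example above; the current/voltage measurements are invented for illustration:

    #include <cstdio>
    #include <vector>

    int main() {
        // Hypothetical measurements: current I (the factor x) and voltage U (the response y), model U = R * I.
        std::vector<double> I = {0.5, 1.0, 1.5, 2.0, 2.5};
        std::vector<double> U = {1.1, 2.0, 3.2, 3.9, 5.1};

        // R_hat = sum(I_t * U_t) / sum(I_t^2): least squares without a constant.
        double num = 0, den = 0;
        for (std::size_t t = 0; t < I.size(); ++t) {
            num += I[t] * U[t];
            den += I[t] * I[t];
        }
        std::printf("estimated resistance R_hat = %.3f\n", num / den);
        return 0;
    }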

Statistical properties of OLS estimates

First of all, we note that for linear models the least squares estimates are linear estimates, as follows from the formula above. For the OLS estimates to be unbiased, it is necessary and sufficient that the most important condition of regression analysis be fulfilled: the mathematical expectation of the random error conditional on the factors must be equal to zero. This condition is satisfied, in particular, if the mathematical expectation of the random errors is zero and the factors and the random errors are independent random variables.

The first condition can be considered always satisfied for models with a constant, since the constant absorbs a possibly nonzero mathematical expectation of the errors (which is why models with a constant are generally preferable).

The second condition, the condition of exogeneity of the factors, is fundamental. If this property is not satisfied, then we can assume that almost any estimates will be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not allow obtaining quality estimates in this case). In the classical case, a stronger assumption of determinism of the factors is made (as opposed to a random error), which automatically means that the exogeneity condition is satisfied. In the general case, for the consistency of the estimates it is sufficient that the exogeneity condition holds together with the convergence of the matrix $V_x = \frac{1}{n} X^T X$ to some non-singular matrix as the sample size increases to infinity.

In order for, in addition to consistency and unbiasedness, the (ordinary) least squares estimates to be also efficient (the best in the class of linear unbiased estimates), additional properties of a random error must be satisfied:

Constant (identical) variance of the random errors in all observations (no heteroscedasticity): $V(\varepsilon_t) = \sigma^2$ for all $t$;

Absence of correlation (autocorrelation) of the random errors in different observations: $\operatorname{cov}(\varepsilon_t, \varepsilon_s) = 0$ for $t \ne s$.

These assumptions can be formulated jointly for the covariance matrix of the random error vector: $V(\varepsilon) = \sigma^2 I$.

A linear model satisfying these conditions is called classical. OLS estimates for classical linear regression are unbiased, consistent and the most efficient estimates in the class of all linear unbiased estimates (in the English-language literature the abbreviation BLUE, Best Linear Unbiased Estimator, is sometimes used; in the Russian-language literature the Gauss-Markov theorem is more often cited). As it is easy to show, the covariance matrix of the vector of coefficient estimates will be equal to:

$$V(\hat{b}_{OLS}) = \sigma^2 (X^T X)^{-1}.$$

Efficiency means that this covariance matrix is "minimal" (any linear combination of the coefficients, and in particular the coefficients themselves, have minimal variance), that is, in the class of linear unbiased estimates the OLS estimates are the best. The diagonal elements of this matrix, the variances of the coefficient estimates, are important parameters of the quality of the obtained estimates. However, the covariance matrix cannot be computed directly, because the variance of the random errors is unknown. It can be proved that an unbiased and consistent (for the classical linear model) estimate of the variance of the random errors is the quantity

$$s^2 = RSS/(n - k).$$

Substituting this value into the formula for the covariance matrix, we obtain an estimate of the covariance matrix. The resulting estimates are also unbiased and consistent. It is also important that the estimate of the error variance (and hence of the variances of the coefficients) and the estimates of the model parameters are independent random variables, which makes it possible to obtain test statistics for testing hypotheses about the model coefficients. Note also that if the random errors are normally distributed, have identical variance and are not correlated with each other, the least squares parameter estimates coincide with the maximum likelihood (ML) estimates.
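
A hedged C++/Eigen sketch of these computations (assuming Eigen is available; the data are invented, and $k$ counts all estimated parameters, including the constant):

    #include <iostream>
    #include <Eigen/Dense>

    int main() {
        // Example data: one factor plus a constant (made-up values).
        Eigen::VectorXd x(5), y(5);
        x << 0, 1, 2, 4, 5;
        y << 2.1, 2.4, 2.6, 2.8, 3.0;
        const int n = 5, k = 2;

        Eigen::MatrixXd X(n, k);
        X.col(0).setOnes();
        X.col(1) = x;

        // OLS estimates and residuals.
        Eigen::MatrixXd XtX = X.transpose() * X;
        Eigen::VectorXd b   = XtX.ldlt().solve(X.transpose() * y);
        Eigen::VectorXd e   = y - X * b;

        // s^2 = RSS / (n - k); estimated covariance matrix s^2 * (X^T X)^{-1}; standard errors.
        double s2 = e.squaredNorm() / (n - k);
        Eigen::MatrixXd cov = s2 * XtX.inverse();
        Eigen::VectorXd se  = cov.diagonal().cwiseSqrt();

        std::cout << "b_hat:       " << b.transpose() << "\n";
        std::cout << "s^2:         " << s2 << "\n";
        std::cout << "std. errors: " << se.transpose() << "\n";
        return 0;
    }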

It should be noted that if the classical assumptions are not met, the least squares parameter estimates are no longer the most efficient (while remaining unbiased and consistent). However, the situation with the estimate of the covariance matrix is even worse: it becomes biased and inconsistent. This means that statistical conclusions about the quality of the constructed model can in this case be extremely unreliable. One way to solve the latter problem is to use special estimates of the covariance matrix that are consistent under violations of the classical assumptions (White standard errors and Newey-West standard errors). Another approach is to use the so-called generalized least squares.

Generalized least squares

Main article: Generalized least squares

The method of least squares allows for a wide generalization. Instead of minimizing the sum of squares of the residuals, one can minimize some positive definite quadratic form of the residual vector, $e^T W e$, where $W$ is some symmetric positive definite weight matrix. Ordinary least squares is the special case of this approach in which the weight matrix is proportional to the identity matrix. As is known from the theory of symmetric matrices (operators), such matrices admit a decomposition $W = P^T P$. Therefore, this functional can be represented as

$$e^T W e = e^T P^T P e = (Pe)^T (Pe) = e_*^T e_*,$$

that is, this functional can be represented as the sum of the squares of some transformed "residuals". Thus, we can distinguish a class of least squares methods - LS-methods (Least Squares).

It is proved (Aitken's theorem) that for a generalized linear regression model (in which no restrictions are imposed on the covariance matrix of the random errors) the most efficient estimates (in the class of linear unbiased estimates) are those of so-called generalized least squares (GLS, Generalized Least Squares): the LS-method with a weight matrix equal to the inverse of the covariance matrix of the random errors, $W = V_\varepsilon^{-1}$.

It can be shown that the formula for the GLS estimates of the parameters of the linear model has the form

$$\hat{b}_{GLS} = (X^T V^{-1} X)^{-1} X^T V^{-1} y.$$

The covariance matrix of these estimates, respectively, will be equal to

$$V(\hat{b}_{GLS}) = (X^T V^{-1} X)^{-1}.$$

In fact, the essence of GLS lies in a certain (linear) transformation ($P$) of the original data and the application of ordinary least squares to the transformed data. The purpose of this transformation is that for the transformed data the random errors already satisfy the classical assumptions.

Weighted OLS

In the case of a diagonal weight matrix (and hence a diagonal covariance matrix of the random errors), we have the so-called weighted least squares (WLS, Weighted Least Squares). In this case, the weighted sum of squares of the residuals of the model is minimized, that is, each observation receives a "weight" inversely proportional to the variance of the random error in that observation:

$$e^T W e = \sum_{t=1}^n \frac{e_t^2}{\sigma_t^2}.$$

In fact, the data is transformed by weighting the observations (dividing by an amount proportional to the assumed standard deviation of the random errors), and normal least squares is applied to the weighted data.
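
A dependency-free C++ sketch of weighted least squares for the paired model $y \approx a + bx$, assuming the per-observation error standard deviations $\sigma_t$ are known; all numerical values here are invented:

    #include <cstdio>
    #include <vector>

    int main() {
        // Hypothetical data with assumed known error standard deviations sigma_t.
        std::vector<double> x = {0, 1, 2, 4, 5};
        std::vector<double> y = {2.1, 2.4, 2.6, 2.8, 3.0};
        std::vector<double> sigma = {0.1, 0.1, 0.2, 0.2, 0.4};

        // Weighted normal equations for y = a + b*x with weights w_t = 1 / sigma_t^2:
        //   a*Sw  + b*Swx  = Swy
        //   a*Swx + b*Swxx = Swxy
        double Sw = 0, Swx = 0, Swy = 0, Swxx = 0, Swxy = 0;
        for (std::size_t t = 0; t < x.size(); ++t) {
            const double w = 1.0 / (sigma[t] * sigma[t]);
            Sw += w; Swx += w * x[t]; Swy += w * y[t];
            Swxx += w * x[t] * x[t]; Swxy += w * x[t] * y[t];
        }
        const double b = (Sw * Swxy - Swx * Swy) / (Sw * Swxx - Swx * Swx);
        const double a = (Swy - b * Swx) / Sw;
        std::printf("weighted LS: a = %.3f, b = %.3f\n", a, b);
        return 0;
    }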

Suppose we are given experimental data for the variables x and y (see the table in Example 1 below). After smoothing, a function of the form $g(x) = \sqrt[3]{x + 1} + 1$ was obtained.

We can also approximate these data by the linear relationship $y = ax + b$ by computing the appropriate parameters. To do this we will apply the so-called least squares method. We will also make a plot to check which of the two functions better aligns the experimental data.


What exactly is OLS (least squares method)

The main thing we need to do is to find coefficients of the linear dependence for which the value of the function of two variables $F(a, b) = \sum_{i=1}^n (y_i - (ax_i + b))^2$ is smallest. In other words, for certain values of $a$ and $b$ the sum of squared deviations of the presented data from the resulting straight line will have a minimum value. This is the meaning of the least squares method. So, to solve the example, all we have to do is find the extremum of a function of two variables.

How to derive formulas for calculating coefficients

In order to derive formulas for calculating the coefficients, we compose and solve a system of equations in two variables. To do this, we calculate the partial derivatives of $F(a, b) = \sum_{i=1}^n (y_i - (ax_i + b))^2$ with respect to $a$ and $b$ and set them equal to 0.

$$\begin{cases} \dfrac{\partial F(a,b)}{\partial a} = 0 \\[4pt] \dfrac{\partial F(a,b)}{\partial b} = 0 \end{cases}
\;\Leftrightarrow\;
\begin{cases} -2\sum\limits_{i=1}^n (y_i - (ax_i + b))\,x_i = 0 \\[4pt] -2\sum\limits_{i=1}^n (y_i - (ax_i + b)) = 0 \end{cases}
\;\Leftrightarrow\;
\begin{cases} a\sum\limits_{i=1}^n x_i^2 + b\sum\limits_{i=1}^n x_i = \sum\limits_{i=1}^n x_i y_i \\[4pt] a\sum\limits_{i=1}^n x_i + nb = \sum\limits_{i=1}^n y_i \end{cases}$$

To solve a system of equations, you can use any methods, such as substitution or Cramer's method. As a result, we should get formulas that calculate the coefficients using the least squares method.

$$a = \frac{n\sum\limits_{i=1}^n x_i y_i - \sum\limits_{i=1}^n x_i \sum\limits_{i=1}^n y_i}{n\sum\limits_{i=1}^n x_i^2 - \left(\sum\limits_{i=1}^n x_i\right)^2}, \qquad b = \frac{\sum\limits_{i=1}^n y_i - a\sum\limits_{i=1}^n x_i}{n}$$

We have calculated the values of the parameters for which the function $F(a, b) = \sum_{i=1}^n (y_i - (ax_i + b))^2$ takes its minimum value. We will prove why this is so in the proof section below.

This is the application of the least squares method in practice. The formula used to find the parameter $a$ includes $\sum_{i=1}^n x_i$, $\sum_{i=1}^n y_i$, $\sum_{i=1}^n x_i y_i$, $\sum_{i=1}^n x_i^2$, and the parameter $n$, which denotes the number of experimental data points. We advise calculating each sum separately. The value of the coefficient $b$ is calculated immediately after $a$.

Let's go back to the original example.

Example 1

Here we have $n$ equal to five. To make it more convenient to calculate the required sums that enter the coefficient formulas, we fill out a table.

i            1       2       3       4       5       sum (i = 1..5)
x_i          0       1       2       4       5       12
y_i          2.1     2.4     2.6     2.8     3       12.9
x_i * y_i    0       2.4     5.2     11.2    15      33.8
x_i^2        0       1       4       16      25      46

Solution

The fourth row contains the values obtained by multiplying the entries of the second row by those of the third row for each $i$. The fifth row contains the entries of the second row squared. The last column shows the row-wise sums.

Let's use the least squares method to calculate the coefficients $a$ and $b$ we need. To do this, we substitute the required sums from the last column into the formulas:

$$a = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{5 \cdot 33.8 - 12 \cdot 12.9}{5 \cdot 46 - 12^2} \approx 0.165, \qquad b = \frac{\sum y_i - a \sum x_i}{n} = \frac{12.9 - 0.165 \cdot 12}{5} \approx 2.184$$

We find that the desired approximating straight line is $y = 0.165x + 2.184$. Now we need to determine which function approximates the data better: $g(x) = \sqrt[3]{x+1} + 1$ or $y = 0.165x + 2.184$. Let's make this assessment using the least squares criterion.

To quantify the error, we find the sums of squared deviations of the data from each curve, $\sigma_1 = \sum_{i=1}^n (y_i - (ax_i + b))^2$ and $\sigma_2 = \sum_{i=1}^n (y_i - g(x_i))^2$; the smaller value corresponds to the better-fitting curve.

$$\sigma_1 = \sum_{i=1}^5 \bigl(y_i - (0.165\,x_i + 2.184)\bigr)^2 \approx 0.019, \qquad \sigma_2 = \sum_{i=1}^5 \bigl(y_i - (\sqrt[3]{x_i + 1} + 1)\bigr)^2 \approx 0.096$$
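
The arithmetic above can be checked with a short dependency-free C++ program (a sketch that simply reproduces the sums from the table and the two error measures):

    #include <cmath>
    #include <cstdio>

    int main() {
        // Data from the table in Example 1.
        const double x[] = {0, 1, 2, 4, 5};
        const double y[] = {2.1, 2.4, 2.6, 2.8, 3.0};
        const int n = 5;

        double sx = 0, sy = 0, sxy = 0, sxx = 0;
        for (int i = 0; i < n; ++i) {
            sx += x[i]; sy += y[i]; sxy += x[i] * y[i]; sxx += x[i] * x[i];
        }

        // Least squares coefficients of y = a*x + b.
        const double a = (n * sxy - sx * sy) / (n * sxx - sx * sx);   // ~0.165
        const double b = (sy - a * sx) / n;                           // ~2.184

        // Sums of squared deviations for the fitted line and for g(x) = cbrt(x + 1) + 1.
        double sigma1 = 0, sigma2 = 0;
        for (int i = 0; i < n; ++i) {
            const double d1 = y[i] - (a * x[i] + b);
            const double d2 = y[i] - (std::cbrt(x[i] + 1.0) + 1.0);
            sigma1 += d1 * d1;
            sigma2 += d2 * d2;
        }
        std::printf("a = %.3f, b = %.3f, sigma1 = %.3f, sigma2 = %.3f\n", a, b, sigma1, sigma2);
        return 0;
    }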

Answer: since $\sigma_1 < \sigma_2$, the line that best approximates the original data is
$y = 0.165x + 2.184$.

The least squares method is clearly shown in the graphic illustration. The red line marks the curve $g(x) = \sqrt[3]{x+1} + 1$, the blue line marks the straight line $y = 0.165x + 2.184$, and the raw data are marked with pink dots.

Let us explain why exactly approximations of this type are needed.

They can be used in problems that require data smoothing, as well as in those where the data needs to be interpolated or extrapolated. For example, in the problem discussed above, one could find the value of the observed quantity y at x = 3 or at x = 6 . We have devoted a separate article to such examples.

Proof of the LSM method

For the function to take its minimum value at the computed $a$ and $b$, it suffices that at this point the matrix of the quadratic form of the second differential of $F(a, b) = \sum_{i=1}^n (y_i - (ax_i + b))^2$ be positive definite. Let's show what it looks like.

Example 2

We have a second-order differential of the following form:

$$d^2 F(a;b) = \frac{\partial^2 F(a;b)}{\partial a^2}\,da^2 + 2\frac{\partial^2 F(a;b)}{\partial a\,\partial b}\,da\,db + \frac{\partial^2 F(a;b)}{\partial b^2}\,db^2$$

Solution

$$\frac{\partial^2 F(a;b)}{\partial a^2} = \frac{\partial}{\partial a}\left(\frac{\partial F(a;b)}{\partial a}\right) = \frac{\partial}{\partial a}\left(-2\sum_{i=1}^n (y_i - (ax_i + b))\,x_i\right) = 2\sum_{i=1}^n x_i^2$$

$$\frac{\partial^2 F(a;b)}{\partial a\,\partial b} = \frac{\partial}{\partial b}\left(\frac{\partial F(a;b)}{\partial a}\right) = \frac{\partial}{\partial b}\left(-2\sum_{i=1}^n (y_i - (ax_i + b))\,x_i\right) = 2\sum_{i=1}^n x_i$$

$$\frac{\partial^2 F(a;b)}{\partial b^2} = \frac{\partial}{\partial b}\left(\frac{\partial F(a;b)}{\partial b}\right) = \frac{\partial}{\partial b}\left(-2\sum_{i=1}^n (y_i - (ax_i + b))\right) = 2\sum_{i=1}^n 1 = 2n$$

In other words, it can be written as follows: $d^2 F(a;b) = 2\sum_{i=1}^n x_i^2\,da^2 + 2\cdot 2\sum_{i=1}^n x_i\,da\,db + 2n\,db^2$.

We have obtained the matrix of the quadratic form $M = \begin{pmatrix} 2\sum_{i=1}^n x_i^2 & 2\sum_{i=1}^n x_i \\ 2\sum_{i=1}^n x_i & 2n \end{pmatrix}$.

In this case, the values of the individual elements do not depend on $a$ and $b$. Is this matrix positive definite? To answer this question, let's check whether its leading principal minors are positive.

Calculate the first-order leading minor: $2\sum_{i=1}^n x_i^2 > 0$. Since the points $x_i$ do not all coincide, the inequality is strict. We will keep this in mind in further calculations.

We calculate the second-order angular minor:

$$\det(M) = \begin{vmatrix} 2\sum\limits_{i=1}^n x_i^2 & 2\sum\limits_{i=1}^n x_i \\ 2\sum\limits_{i=1}^n x_i & 2n \end{vmatrix} = 4\left(n\sum\limits_{i=1}^n x_i^2 - \left(\sum\limits_{i=1}^n x_i\right)^2\right)$$

After that, we prove the inequality $n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2 > 0$ by mathematical induction.

  1. Base case: let's check that the inequality holds for $n = 2$:

$$2\sum_{i=1}^2 x_i^2 - \left(\sum_{i=1}^2 x_i\right)^2 = 2(x_1^2 + x_2^2) - (x_1 + x_2)^2 = x_1^2 - 2x_1x_2 + x_2^2 = (x_1 - x_2)^2 > 0$$

We obtain a valid inequality (provided that the values $x_1$ and $x_2$ do not coincide).

  2. Inductive hypothesis: assume the inequality holds for $n$, i.e. $n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2 > 0$.
  3. Inductive step: let's prove the statement for $n + 1$, i.e. that $(n+1)\sum_{i=1}^{n+1} x_i^2 - \left(\sum_{i=1}^{n+1} x_i\right)^2 > 0$ whenever $n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2 > 0$.

We calculate:

$$\begin{aligned}
(n+1)\sum_{i=1}^{n+1} x_i^2 - \left(\sum_{i=1}^{n+1} x_i\right)^2
&= (n+1)\left(\sum_{i=1}^{n} x_i^2 + x_{n+1}^2\right) - \left(\sum_{i=1}^{n} x_i + x_{n+1}\right)^2 \\
&= n\sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} x_i^2 + n\,x_{n+1}^2 + x_{n+1}^2 - \left(\sum_{i=1}^{n} x_i\right)^2 - 2x_{n+1}\sum_{i=1}^{n} x_i - x_{n+1}^2 \\
&= \left[\,n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right] + \sum_{i=1}^{n}\left(x_{n+1}^2 - 2x_{n+1}x_i + x_i^2\right) \\
&= \left[\,n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right] + (x_{n+1} - x_1)^2 + (x_{n+1} - x_2)^2 + \ldots + (x_{n+1} - x_n)^2 > 0
\end{aligned}$$

The expression in square brackets is greater than 0 by the inductive hypothesis (step 2), and the remaining terms are squares and therefore non-negative (at least one of them is positive, since the points $x_i$ do not all coincide). The inequality is proved.

Answer: the found $a$ and $b$ correspond to the smallest value of the function $F(a, b) = \sum_{i=1}^n (y_i - (ax_i + b))^2$, which means that they are the desired parameters of the least squares method (LSM).



History

Until the beginning of the 19th century, scientists had no definite rules for solving a system of equations in which the number of unknowns is less than the number of equations; until that time, particular methods were used that depended on the type of equations and on the ingenuity of the calculators, so different calculators, starting from the same observational data, arrived at different conclusions. Gauss (1795) is credited with the first application of the method, and Legendre (1805) independently discovered and published it under its modern name (French: Méthode des moindres quarrés). Laplace connected the method with probability theory, and the American mathematician Adrain (1808) considered its probabilistic applications. The method was spread and improved by further research of Encke, Bessel, Hansen and others.


The case of a polynomial model

If the data are fitted by a polynomial regression function of one variable, $f(x) = b_0 + \sum_{i=1}^k b_i x^i$, then, treating the powers $x^i$ as separate factors for each $i$, it is possible to estimate the parameters of the model using the general formula for estimating the parameters of a linear model. To do this, it suffices to note that with such an interpretation $x_{ti} x_{tj} = x_t^i x_t^j = x_t^{i+j}$ and $x_{tj} y_t = x_t^j y_t$. Therefore, the matrix equations in this case take the form:

$$\begin{pmatrix}
n & \sum\limits_n x_t & \ldots & \sum\limits_n x_t^k \\
\sum\limits_n x_t & \sum\limits_n x_t^2 & \ldots & \sum\limits_n x_t^{k+1} \\
\vdots & \vdots & \ddots & \vdots \\
\sum\limits_n x_t^k & \sum\limits_n x_t^{k+1} & \ldots & \sum\limits_n x_t^{2k}
\end{pmatrix}
\begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{pmatrix}
=
\begin{pmatrix} \sum\limits_n y_t \\ \sum\limits_n x_t y_t \\ \vdots \\ \sum\limits_n x_t^k y_t \end{pmatrix}.$$
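
A hedged C++/Eigen sketch of this case: build the design matrix whose columns are the powers of $x$ and reuse the general linear-model formula (assuming Eigen is available; the data and the degree $k = 2$ are invented for illustration):

    #include <cmath>
    #include <iostream>
    #include <Eigen/Dense>

    int main() {
        // Example data (invented) and polynomial degree.
        Eigen::VectorXd x(6), y(6);
        x << 0, 1, 2, 3, 4, 5;
        y << 1.0, 1.8, 4.1, 9.2, 15.8, 26.1;
        const int k = 2;                       // fit f(x) = b0 + b1*x + b2*x^2

        // Design matrix with columns 1, x, x^2, ..., x^k (powers treated as separate factors).
        Eigen::MatrixXd X(x.size(), k + 1);
        for (int t = 0; t < x.size(); ++t)
            for (int i = 0; i <= k; ++i)
                X(t, i) = std::pow(x(t), i);

        // Least squares estimates via the general formula (solved by QR for numerical stability).
        Eigen::VectorXd b = X.colPivHouseholderQr().solve(y);
        std::cout << "coefficients b0..bk: " << b.transpose() << "\n";
        return 0;
    }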



    Introduction

    I am a computer programmer. I made the biggest leap in my career when I learned to say: "I do not understand anything!" Now I am not ashamed to tell a luminary of science who is lecturing me that I do not understand what that luminary is talking about. And it is very difficult. Yes, it is hard and embarrassing to admit that you don't know. Who likes to admit that he does not know the basics of something? By virtue of my profession, I have to attend a large number of presentations and lectures where, I confess, in the vast majority of cases I feel sleepy because I do not understand anything. And I don't understand because the huge problem of the current situation in science lies in mathematics: it assumes that all students are familiar with absolutely all areas of mathematics (which is absurd). To admit that you do not know what a derivative is (what that is, we will discuss a little later) is a shame.

    But I've learned to say that I don't know what multiplication is. Yes, I don't know what a subalgebra over a Lie algebra is. Yes, I do not know why quadratic equations are needed in life. By the way, if you are sure that you know, then we have something to talk about! Mathematics is a series of tricks. Mathematicians try to confuse and intimidate the public; where there is no confusion, no reputation, no authority. Yes, it is prestigious to speak in the most abstract language possible, which is complete nonsense in itself.

    Do you know what a derivative is? Most likely you will tell me about the limit of the difference quotient. In my first year of mathematics at St. Petersburg State University, Viktor Petrovich Khavin defined the derivative for me as the coefficient of the first term of the Taylor series of a function at a point (it was a separate gymnastics to define the Taylor series without derivatives). I laughed at this definition for a long time until I finally understood what it was about. The derivative is nothing more than a measure of how much the function we are differentiating resembles the functions y=x, y=x^2, y=x^3.

    I now have the honor of lecturing students who fear mathematics. If you are afraid of mathematics - we are on the way. As soon as you try to read some text and it seems to you that it is overly complicated, then know that it is badly written. I argue that there is not a single area of ​​mathematics that cannot be spoken about "on fingers" without losing accuracy.

    The challenge for the near future: I instructed my students to understand what a linear-quadratic controller is. Don't be shy, waste three minutes of your life, follow the link. If you do not understand anything, then we are on the way. I (a professional mathematician-programmer) also did not understand anything. And I assure you, this can be sorted out "on the fingers." At the moment I do not know what it is, but I assure you that we will be able to figure it out.

    So, the first lecture that I am going to give to my students, after they come running to me in horror saying that a linear-quadratic controller is a terrible thing you will never master in your life, is on the least squares method. Can you solve linear equations? If you are reading this text, then most likely you cannot.

    So, given two points (x0, y0), (x1, y1), for example, (1,1) and (3,2), the task is to find the equation of a straight line passing through these two points:

    illustration

    This straight line should have an equation of the following form:

    $$\alpha x + \beta = y.$$

    Here alpha and beta are unknown to us, but two points of this line are known:

    $$\alpha x_0 + \beta = y_0, \qquad \alpha x_1 + \beta = y_1.$$

    You can write this system in matrix form:

    $$\begin{pmatrix} x_0 & 1 \\ x_1 & 1 \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} y_0 \\ y_1 \end{pmatrix}.$$

    Here we should make a lyrical digression: what is a matrix? A matrix is nothing but a two-dimensional array. It is a way of storing data, and no additional meaning should be attached to it. It is up to us how exactly to interpret a certain matrix. Periodically I will interpret it as a linear mapping, periodically as a quadratic form, and sometimes simply as a set of vectors. This will all be clarified in context.

    Let's replace the specific matrices with their symbolic representation:

    $$A\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = b.$$

    Then (alpha, beta) can be easily found:

    $$\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = A^{-1} b.$$

    More specifically, for our previous data:

    $$\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 3 & 1 \end{pmatrix}^{-1}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix}.$$

    Which leads to the following equation of the straight line passing through the points (1,1) and (3,2):

    $$y = \frac{1}{2}x + \frac{1}{2}.$$

    Okay, everything is clear here. Now let's find the equation of a straight line passing through three points: (x0,y0), (x1,y1) and (x2,y2):

    $$\alpha x_0 + \beta = y_0, \qquad \alpha x_1 + \beta = y_1, \qquad \alpha x_2 + \beta = y_2.$$

    Oh-oh-oh, but we have three equations for two unknowns! A standard mathematician will say that there is no solution. What will the programmer say? He will first rewrite the previous system of equations in the following form:

    $$\alpha\begin{pmatrix} x_0 \\ x_1 \\ x_2 \end{pmatrix} + \beta\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} y_0 \\ y_1 \\ y_2 \end{pmatrix}, \qquad \text{i.e. } \alpha\,i + \beta\,j = b.$$

    In our case, the vectors i, j, b are three-dimensional, therefore (in the general case) there is no solution to this system. Any vector (alpha\*i + beta\*j) lies in the plane spanned by the vectors (i, j). If b does not belong to this plane, then there is no solution (equality in the equation cannot be achieved). What to do? Let's look for a compromise. Let's denote by e(alpha, beta) exactly how far we are from equality:

    $$e(\alpha, \beta) = \alpha\,i + \beta\,j - b.$$

    And we will try to minimize this error:

    $$\|e(\alpha, \beta)\|^2 \rightarrow \min_{\alpha,\beta}.$$

    Why a square?

    We are looking not just for the minimum of the norm, but for the minimum of the square of the norm. Why? The minimum point itself coincides, and the square gives a smooth function (a quadratic function of the arguments (alpha,beta)), while just the length gives a function in the form of a cone, non-differentiable at the minimum point. Brr. Square is more convenient.

    Obviously, the error is minimized when the vector e is orthogonal to the plane spanned by the vectors i and j.

    Illustration

    In other words: we are looking for a line such that the sum of the squared lengths of the distances from all points to this line is minimal:

    UPDATE: here I made a mistake: the distance to the line should be measured vertically rather than as an orthogonal projection. This commenter is right.

    Illustration

    In completely different words (carefully, poorly formalized, but it should be clear on the fingers): we take all possible lines between all pairs of points and look for the average line between all:

    Illustration

    Another explanation on the fingers: we attach a spring between all data points (here we have three) and the line that we are looking for, and the line of the equilibrium state is exactly what we are looking for.

    Quadratic form minimum

    So, given the vector b and the plane spanned by the column vectors of the matrix A (in this case (x0,x1,x2) and (1,1,1)), we are looking for a vector e with minimum squared length. Obviously, the minimum is achievable only for a vector e that is orthogonal to the plane spanned by the column vectors of the matrix A:

    $$A^T e = A^T(Ax - b) = 0.$$

    In other words, we are looking for a vector x=(alpha, beta) such that:

    $$A^T A\,x = A^T b.$$

    I remind you that this vector x=(alpha, beta) is the minimum of the quadratic function ||e(alpha, beta)||^2:

    Here it is useful to remember that a matrix can also be interpreted as a quadratic form; for example, the identity matrix ((1,0),(0,1)) can be interpreted as the function x^2 + y^2:

    quadratic form

    All this gymnastics is known as linear regression.
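
    To make this "gymnastics" concrete, here is a minimal C++ sketch (assuming the Eigen library is available, which the original post does not use) that fits a line through three points by exactly this recipe; the points (1,1) and (3,2) are from the text above, the third point (2, 2.5) is invented:

        #include <iostream>
        #include <Eigen/Dense>

        int main() {
            // Columns of A are the vectors i = (x0, x1, x2) and j = (1, 1, 1); b holds the y values.
            Eigen::MatrixXd A(3, 2);
            A << 1, 1,
                 3, 1,
                 2, 1;
            Eigen::VectorXd b(3);
            b << 1, 2, 2.5;

            // Least squares solution of A * (alpha, beta)^T = b, i.e. A^T A x = A^T b.
            Eigen::VectorXd x = (A.transpose() * A).ldlt().solve(A.transpose() * b);
            std::cout << "alpha = " << x(0) << ", beta = " << x(1) << "\n";
            return 0;
        }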

    Laplace equation with Dirichlet boundary condition

    Now the simplest real problem: there is a certain triangulated surface, it is necessary to smooth it. For example, let's load my face model:

    The original commit is available. To minimize external dependencies, I took the code of my software renderer, already on Habré. To solve the linear system, I use OpenNL , it's a great solver, but it's very difficult to install: you need to copy two files (.h + .c) to your project folder. All smoothing is done by the following code:

    for (int d=0; d<3; d++) {
        nlNewContext();
        nlSolverParameteri(NL_NB_VARIABLES, verts.size());
        nlSolverParameteri(NL_LEAST_SQUARES, NL_TRUE);
        nlBegin(NL_SYSTEM);
        nlBegin(NL_MATRIX);
        // each vertex is tied to its original position
        for (int i=0; i<(int)verts.size(); i++) {
            nlBegin(NL_ROW);
            nlCoefficient(i, 1);
            nlRightHandSide(verts[i][d]);
            nlEnd(NL_ROW);
        }
        // a "spring" on every edge of every triangle
        // (the loop bound and the declared type of `face` were lost in the original listing
        //  and are restored here by assumption)
        for (unsigned int i=0; i<faces.size(); i++) {
            std::vector<int> &face = faces[i];
            for (int j=0; j<3; j++) {
                nlBegin(NL_ROW);
                nlCoefficient(face[ j      ],  1);
                nlCoefficient(face[(j+1)%3], -1);
                nlEnd(NL_ROW);
            }
        }
        nlEnd(NL_MATRIX);
        nlEnd(NL_SYSTEM);
        nlSolve();
        for (int i=0; i<(int)verts.size(); i++) {
            verts[i][d] = nlGetVariable(i);
        }
    }

    X, Y and Z coordinates are separable, I smooth them separately. That is, I solve three systems of linear equations, each with the same number of variables as the number of vertices in my model. The first n rows of matrix A have only one 1 per row, and the first n rows of vector b have original model coordinates. That is, I spring-tie between the new vertex position and the old vertex position - the new ones shouldn't be too far away from the old ones.

    All subsequent rows of matrix A (faces.size()*3 = the number of edges of all triangles in the mesh) have one occurrence of 1 and one occurrence of -1, and the corresponding component of the vector b is zero. This means I put a spring on each edge of our triangular mesh: all edges try to make their starting and ending vertices coincide.

    Once again: all vertices are variables, and they cannot deviate far from their original position, but at the same time they try to become similar to each other.

    Here is the result:

    Everything would be fine, the model is really smoothed, but it moved away from its original edge. Let's change the code a little:

    for (int i=0; i<(int)verts.size(); i++) {
        float scale = border[i] ? 1000 : 1;   // boundary vertices get a much stiffer spring
        nlBegin(NL_ROW);
        nlCoefficient(i, scale);
        nlRightHandSide(scale*verts[i][d]);
        nlEnd(NL_ROW);
    }

    In our matrix A, for the vertices that are on the border, I add not a row of the form v_i = verts[i][d], but 1000*v_i = 1000*verts[i][d]. What does it change? It changes our quadratic form of the error. Now a unit deviation of a boundary vertex will cost not one unit, as before, but 1000*1000 units. That is, we hung a stronger spring on the boundary vertices, and the solution prefers to stretch the others more strongly. Here is the result:

    Let's double the strength of the springs between the vertices:
    nlCoefficient(face[ j ], 2); nlCoefficient(face[(j+1)%3], -2);

    It is logical that the surface has become smoother:

    And now even a hundred times stronger:

    What is it? Imagine that we have dipped a wire ring into soapy water. The resulting soap film will try to have as little curvature as possible while touching the border - our wire ring. This is exactly what we got by fixing the border and asking for a smooth surface inside. Congratulations, we have just solved the Laplace equation with Dirichlet boundary conditions. Sounds cool? But in fact, it is just one system of linear equations to solve.

    Poisson equation

    Let's have another cool name.

    Let's say I have an image like this:

    Everything is fine, but I don't like the chair.

    I cut the picture in half:



    And I will select a chair with my hands:

    Then I will drag everything that is white in the mask to the left side of the picture, and at the same time I will say throughout the whole picture that the difference between two neighboring pixels should be equal to the difference between two neighboring pixels of the right image:

    For (int i=0; i

    Here is the result:

    Code and pictures are available

    Least square method

    Least square method ( MNK, OLS, Ordinary Least Squares) - one of the basic methods of regression analysis for estimating unknown parameters of regression models from sample data. The method is based on minimizing the sum of squares of regression residuals.

    Strictly speaking, the term "least squares method" can describe a way of solving a problem in any area, as long as the solution is defined by, or satisfies, some criterion of minimizing the sum of squares of certain functions of the unknown variables. Therefore, the least squares method can also be used for an approximate representation (approximation) of a given function by other (simpler) functions, or for finding a set of quantities that satisfy equations or constraints whose number exceeds the number of those quantities, and so on.

    The essence of LSM

    Let there be some (parametric) model of a probabilistic (regression) dependence between the (explained) variable y and a set of factors (explanatory variables) x:

    y = f(x, b) + ε,

    where b is the vector of unknown model parameters and ε is the random model error.

    Let there also be n sample observations of the values of these variables. Let t be the observation number (t = 1, ..., n). Then y_t and x_t are the values of the variables in the t-th observation. For given values of the parameters b, one can compute the theoretical (model) values of the explained variable y:

    ŷ_t = f(x_t, b).

    The values of the residuals e_t = y_t - ŷ_t = y_t - f(x_t, b) depend on the values of the parameters b.

    The essence of LSM (ordinary, classical) is to find parameters b for which the sum of squared residuals (RSS, Residual Sum of Squares) is minimal:

    RSS(b) = Σ_t e_t^2 = Σ_t (y_t - f(x_t, b))^2 → min over b.

    In the general case, this problem can be solved by numerical optimization (minimization) methods. In that case one speaks of nonlinear least squares (NLS or NLLS, Non-Linear Least Squares). In many cases an analytical solution can be obtained. To solve the minimization problem, one finds the stationary points of RSS(b) by differentiating it with respect to the unknown parameters b, equating the derivatives to zero and solving the resulting system of equations:

    ∂RSS/∂b = -2 Σ_t (y_t - f(x_t, b)) · ∂f(x_t, b)/∂b = 0.
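    As a toy illustration of the numerical route, here is a small sketch (my own example, with made-up data) that fits a one-parameter nonlinear model y ≈ exp(b·x) by simply scanning candidate values of b and keeping the one with the smallest residual sum of squares:

    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        // hypothetical sample (x_t, y_t), roughly following y = exp(0.5*x)
        std::vector<double> x = {0, 1, 2, 3, 4};
        std::vector<double> y = {1.0, 1.6, 2.8, 4.4, 7.4};

        // brute-force scan of the parameter b: keep the value minimizing RSS(b)
        double best_b = 0, best_rss = 1e100;
        for (double b = -2; b <= 2; b += 1e-4) {
            double rss = 0;
            for (size_t t = 0; t < x.size(); t++) {
                double e = y[t] - std::exp(b * x[t]);   // residual of observation t
                rss += e * e;
            }
            if (rss < best_rss) { best_rss = rss; best_b = b; }
        }
        std::printf("b = %.4f, RSS = %.4f\n", best_b, best_rss);
        return 0;
    }

    Real nonlinear least squares solvers replace the brute-force scan with gradient-based methods, but the quantity being minimized is exactly the same.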

    If the random errors of the model are normally distributed, have the same variance, and are not correlated with each other, the least squares parameter estimates are the same as the maximum likelihood method (MLM) estimates.

    LSM in the case of a linear model

    Let the regression dependence be linear:

    y = b_1 x_1 + ... + b_k x_k + ε = x^T b + ε.

    Let y be the (n×1) column vector of observations of the explained variable and X the (n×k) matrix of factor observations (the rows of the matrix are the vectors of factor values in a given observation, the columns are the values of a given factor across all observations). The matrix form of the linear model is:

    y = Xb + ε.

    Then the vector of estimates of the explained variable and the vector of regression residuals will be

    ŷ = Xb,   e = y - ŷ = y - Xb,

    and, accordingly, the sum of squared regression residuals will be

    RSS(b) = e^T e = (y - Xb)^T (y - Xb).

    Differentiating this function with respect to the parameter vector b and equating the derivatives to zero, we obtain a system of equations (in matrix form):

    X^T X b = X^T y.

    The solution of this system of equations gives the general formula for the least squares estimates for the linear model:

    b̂ = (X^T X)^(-1) X^T y = (X^T X / n)^(-1) (X^T y / n).

    For analytical purposes, the last representation of this formula turns out to be useful. If the data in the regression model are centered, then the first matrix in this representation is the sample covariance matrix of the factors, and the second factor is the vector of covariances of the factors with the dependent variable. If, in addition, the data are also normalized by the standard deviation (that is, ultimately standardized), then the first matrix is the sample correlation matrix of the factors and the second vector is the vector of sample correlations of the factors with the dependent variable.

    An important property of OLS estimates for models with a constant: the fitted regression line passes through the center of gravity of the sample data, that is, the equality

    ȳ = x̄^T b̂

    holds.

    In particular, in the extreme case when the only regressor is a constant, we find that the OLS estimate of the single parameter (the constant itself) is equal to the mean value of the explained variable. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also a least squares estimate: it minimizes the sum of squared deviations from it.

    Example: simple (pairwise) regression

    In the case of paired linear regression y = a + b·x + ε, the calculation formulas simplify (matrix algebra is not needed):

    b̂ = Cov(x, y) / Var(x) = (Σ_t (x_t - x̄)(y_t - ȳ)) / (Σ_t (x_t - x̄)^2),   â = ȳ - b̂·x̄.
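    As a quick sanity check of these two formulas, here is a small self-contained example (my own, with made-up numbers used purely for illustration) that computes the paired-regression estimates directly:

    #include <cstdio>
    #include <vector>

    int main() {
        // hypothetical observations (x_t, y_t)
        std::vector<double> x = {1, 2, 3, 4, 5};
        std::vector<double> y = {2.1, 3.9, 6.2, 8.1, 9.8};
        size_t n = x.size();

        // sample means
        double xm = 0, ym = 0;
        for (size_t t = 0; t < n; t++) { xm += x[t]; ym += y[t]; }
        xm /= n; ym /= n;

        // slope = sample covariance / sample variance, intercept from the means
        double sxy = 0, sxx = 0;
        for (size_t t = 0; t < n; t++) {
            sxy += (x[t] - xm) * (y[t] - ym);
            sxx += (x[t] - xm) * (x[t] - xm);
        }
        double b = sxy / sxx;
        double a = ym - b * xm;
        std::printf("y = %.3f + %.3f*x\n", a, b);
        return 0;
    }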

    Properties of OLS estimates

    First of all, we note that for linear models, the least squares estimates are linear estimates, as follows from the above formula. For unbiased OLS estimates, it is necessary and sufficient to fulfill the most important condition of regression analysis: conditional on the factors, the mathematical expectation of a random error must be equal to zero. This condition is satisfied, in particular, if

    1. the mathematical expectation of random errors is zero, and
    2. factors and random errors are independent random variables.

    The second condition, the condition of exogeneity of the factors, is fundamental. If this property is not satisfied, then almost any estimates can be expected to be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not give good estimates in this case). In the classical case, a stronger assumption is made, namely that the factors are deterministic in contrast to the random error, which automatically implies that the exogeneity condition holds. In the general case, for the estimates to be consistent, it is sufficient that the exogeneity condition hold and that the matrix X^T X / n converge to some non-singular matrix as the sample size increases to infinity.

    In order for the (ordinary) least squares estimates to be, in addition to consistent and unbiased, also efficient (the best in the class of linear unbiased estimates), additional properties of the random error must hold:

    1. homoscedasticity: the random errors have the same (constant) variance in all observations, V(ε_t) = σ^2;
    2. no autocorrelation: the random errors of different observations are uncorrelated, cov(ε_t, ε_s) = 0 for t ≠ s.

    These assumptions can be stated compactly in terms of the covariance matrix of the random error vector: V(ε) = σ^2 · I_n.

    A linear model satisfying these conditions is called classical. OLS estimates for classical linear regression are unbiased, consistent and the most efficient estimates in the class of all linear unbiased estimates (in the English-language literature the abbreviation BLUE, Best Linear Unbiased Estimator, is sometimes used; in the Russian-language literature the Gauss-Markov theorem is cited more often). As is easy to show, the covariance matrix of the vector of coefficient estimates equals

    V(b̂) = σ^2 (X^T X)^(-1).

    Generalized least squares

    The method of least squares admits a broad generalization. Instead of minimizing the sum of squares of the residuals, one can minimize some positive definite quadratic form of the residual vector, e^T W e, where W is some symmetric positive definite weight matrix. Ordinary least squares is the special case of this approach in which the weight matrix is proportional to the identity matrix. As is known from the theory of symmetric matrices (operators), such matrices admit a decomposition W = P^T P. Therefore, the functional can be represented as e^T W e = (Pe)^T (Pe), that is, as the sum of the squares of some transformed "residuals". Thus, one can distinguish a whole class of least squares methods, the LS-methods (Least Squares).

    It is proved (Aitken's theorem) that for a generalized linear regression model (in which no restrictions are imposed on the covariance matrix V of the random errors), the most efficient estimates (in the class of linear unbiased estimates) are those of so-called generalized least squares (GLS, Generalized Least Squares): the LS-method with a weight matrix equal to the inverse covariance matrix of the random errors, W = V^(-1).

    It can be shown that the formula for the GLS estimates of the parameters of the linear model has the form

    b̂_GLS = (X^T V^(-1) X)^(-1) X^T V^(-1) y.

    The covariance matrix of these estimates, accordingly, equals

    V(b̂_GLS) = (X^T V^(-1) X)^(-1).

    In fact, the essence of GLS is a certain (linear) transformation (P) of the original data followed by the application of ordinary least squares to the transformed data. The point of this transformation is that for the transformed data the random errors already satisfy the classical assumptions.

    Weighted least squares

    In the case of a diagonal weight matrix (and hence a diagonal covariance matrix of the random errors) we have so-called weighted least squares (WLS, Weighted Least Squares). Here the weighted sum of squared residuals of the model is minimized, that is, each observation receives a "weight" inversely proportional to the variance of the random error in that observation: Σ_t e_t^2 / σ_t^2. In fact, the data are transformed by weighting the observations (dividing by an amount proportional to the assumed standard deviation of the random errors), and ordinary least squares is applied to the weighted data.
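    A tiny sketch of that transformation idea (my own example; the data and the per-observation standard deviations are hypothetical): fit y ≈ b·x by dividing every observation by its assumed error standard deviation and then applying the ordinary least squares formula to the weighted data.

    #include <cstdio>
    #include <vector>

    int main() {
        // hypothetical data y ≈ b*x, with an assumed standard deviation per observation
        std::vector<double> x     = {1,   2,   3,   4};
        std::vector<double> y     = {2.1, 3.8, 6.5, 7.6};
        std::vector<double> sigma = {0.1, 0.1, 0.5, 0.5};   // noisier observations get larger sigma

        // weighted least squares: minimize sum_t ((y_t - b*x_t)/sigma_t)^2,
        // i.e. divide each observation by sigma_t and apply the ordinary formula
        double num = 0, den = 0;
        for (size_t t = 0; t < x.size(); t++) {
            double xs = x[t] / sigma[t];
            double ys = y[t] / sigma[t];
            num += xs * ys;
            den += xs * xs;
        }
        std::printf("b = %.4f\n", num / den);
        return 0;
    }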

    Some special cases of application of LSM in practice

    Linear Approximation

    Consider the case when, as a result of studying the dependence of a certain scalar quantity y on a certain scalar quantity x (this can be, for example, the dependence of voltage on current: U = I·R, where R is a constant, the resistance of the conductor), these quantities were measured n times, yielding the values x_1, ..., x_n and their corresponding values y_1, ..., y_n. The measurement data are recorded in a table.

    Table. Measurement results (measurements No. 1 through 6, each with its measured values x_i and y_i).

    The question is this: what value of the coefficient a best describes the dependence y = a·x? According to least squares, this value should be chosen so that the sum of squared deviations of the values y_i from the values a·x_i,

    S(a) = Σ_i (y_i - a·x_i)^2,

    is minimal.

    The sum of squared deviations has a single extremum, a minimum, which allows us to use the condition dS/da = 0. Let us find the value of the coefficient a from this condition. To do this, we transform its left side as follows:

    dS/da = -2 Σ_i x_i (y_i - a·x_i) = -2 (Σ_i x_i y_i - a Σ_i x_i^2) = 0,

    whence

    a = (Σ_i x_i y_i) / (Σ_i x_i^2).

    The last formula allows us to find the value of the coefficient a, which is what the problem required.
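    A minimal sketch of this calculation in code (the six measurement values below are made up for illustration, since the numbers from the original table are not reproduced above):

    #include <cstdio>

    int main() {
        // hypothetical measurements x_i (e.g. current) and y_i (e.g. voltage)
        const double x[] = {1, 2, 3, 4, 5, 6};
        const double y[] = {2.1, 4.3, 5.9, 8.2, 9.9, 12.2};
        const int n = 6;

        // a = (sum of x_i*y_i) / (sum of x_i^2), the least-squares fit of y = a*x
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
        }
        std::printf("a = %.4f\n", sxy / sxx);
        return 0;
    }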

    History

    Until the beginning of the 19th century, scientists had no definite rules for solving a system of equations in which the number of unknowns is less than the number of equations; until then, ad hoc methods were used that depended on the type of equations and on the ingenuity of the calculators, so different calculators, starting from the same observational data, arrived at different conclusions. Gauss (1795) is credited with the first application of the method, and Legendre (1805) independently discovered and published it under its modern name (fr. Méthode des moindres quarrés). Laplace connected the method with probability theory, and the American mathematician Adrain (1808) considered its probabilistic applications. The method was spread and improved by the further research of Encke, Bessel, Hansen and others.

    Alternative uses of LSM

    The idea of ​​the least squares method can also be used in other cases not directly related to regression analysis. The fact is that the sum of squares is one of the most common proximity measures for vectors (the Euclidean metric in finite-dimensional spaces).

    One application is "solving" systems of linear equations in which the number of equations is greater than the number of variables:

    Ax = b,

    where the matrix A is not square but rectangular, with more rows than columns.

    Such a system of equations has, in the general case, no solution (if the rank of A is actually greater than the number of variables). Therefore, the system can be "solved" only in the sense of choosing a vector x so as to minimize the "distance" between the vectors Ax and b. To do this, one can apply the criterion of minimizing the sum of squared differences of the left and right sides of the equations of the system, that is, |Ax - b|^2 → min. It is easy to show that solving this minimization problem leads to solving the following system of equations:

    A^T A x = A^T b.
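    As a small illustration (my own example, with made-up numbers), here is how the normal equations can be solved directly for a tiny overdetermined system with two unknowns, inverting the 2x2 matrix A^T A by Cramer's rule:

    #include <cstdio>

    int main() {
        // hypothetical overdetermined system: 4 equations, 2 unknowns (A is 4x2)
        const double A[4][2] = {{1, 1}, {1, 2}, {1, 3}, {1, 4}};
        const double b[4]    = {2.1, 2.9, 4.2, 4.8};

        // form the normal equations: (A^T A) x = A^T b
        double ata[2][2] = {{0, 0}, {0, 0}};
        double atb[2]    = {0, 0};
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 2; j++) {
                atb[j] += A[i][j] * b[i];
                for (int k = 0; k < 2; k++) ata[j][k] += A[i][j] * A[i][k];
            }

        // solve the 2x2 system by Cramer's rule
        double det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0];
        double x0 = (atb[0] * ata[1][1] - ata[0][1] * atb[1]) / det;
        double x1 = (ata[0][0] * atb[1] - atb[0] * ata[1][0]) / det;
        std::printf("x = (%.4f, %.4f)\n", x0, x1);
        return 0;
    }

    For larger systems one would of course use a linear algebra library or a factorization (QR, Cholesky) instead of Cramer's rule, but the normal equations being solved are the same.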