The least squares method is used for. Linear pairwise regression analysis

We approximate the function by a polynomial of the 2nd degree. To do this, we calculate the coefficients of the normal system of equations:

, ,

Let us compose a normal system of least squares, which has the form:

The solution of the system is easy to find:, , .

Thus, the polynomial of the 2nd degree is found: .

Theoretical reference

Back to page<Введение в вычислительную математику. Примеры>

Example 2. Finding the optimal degree of a polynomial.

Back to page<Введение в вычислительную математику. Примеры>

Example 3. Derivation of a normal system of equations for finding the parameters of an empirical dependence.

Let us derive a system of equations for determining the coefficients and functions , which performs the root-mean-square approximation of the given function with respect to points. Compose a function and write the necessary extremum condition for it:

Then the normal system will take the form:

We have obtained a linear system of equations for unknown parameters and, which is easily solved.

Theoretical reference

Back to page<Введение в вычислительную математику. Примеры>

Example.

Experimental data on the values ​​of variables X and at are given in the table.

As a result of their alignment, the function

Using least square method, approximate these data with a linear dependence y=ax+b(find options a and b). Find out which of the two lines is better (in the sense of the least squares method) aligns the experimental data. Make a drawing.

The essence of the method of least squares (LSM).

The problem is to find the linear dependence coefficients for which the function of two variables a and btakes the smallest value. That is, given the data a and b the sum of the squared deviations of the experimental data from the found straight line will be the smallest. This is the whole point of the least squares method.

Thus, the solution of the example is reduced to finding the extremum of a function of two variables.

Derivation of formulas for finding coefficients.

A system of two equations with two unknowns is compiled and solved. Finding partial derivatives of functions by variables a and b, we equate these derivatives to zero.

We solve the resulting system of equations by any method (for example substitution method or Cramer's method) and obtain formulas for finding coefficients using the least squares method (LSM).

With data a and b function takes the smallest value. The proof of this fact is given below in the text at the end of the page.

That's the whole method of least squares. Formula for finding the parameter a contains the sums , , , and the parameter n is the amount of experimental data. The values ​​of these sums are recommended to be calculated separately.

Coefficient b found after calculation a.

It's time to remember the original example.

Decision.

In our example n=5. We fill in the table for the convenience of calculating the amounts that are included in the formulas of the required coefficients.

The values ​​in the fourth row of the table are obtained by multiplying the values ​​of the 2nd row by the values ​​of the 3rd row for each number i.

The values ​​in the fifth row of the table are obtained by squaring the values ​​of the 2nd row for each number i.

The values ​​of the last column of the table are the sums of the values ​​across the rows.

We use the formulas of the least squares method to find the coefficients a and b. We substitute in them the corresponding values ​​from the last column of the table:

Hence, y=0.165x+2.184 is the desired approximating straight line.

It remains to find out which of the lines y=0.165x+2.184 or better approximates the original data, i.e. to make an estimate using the least squares method.

Estimation of the error of the method of least squares.

To do this, you need to calculate the sums of squared deviations of the original data from these lines and , a smaller value corresponds to a line that better approximates the original data in terms of the least squares method.

Since , then the line y=0.165x+2.184 approximates the original data better.

Graphic illustration of the least squares method (LSM).

Everything looks great on the charts. The red line is the found line y=0.165x+2.184, the blue line is , the pink dots are the original data.

What is it for, what are all these approximations for?

I personally use to solve data smoothing problems, interpolation and extrapolation problems (in the original example, you could be asked to find the value of the observed value y at x=3 or when x=6 according to the MNC method). But we will talk more about this later in another section of the site.

Top of page

Proof.

So that when found a and b function takes the smallest value, it is necessary that at this point the matrix of the quadratic form of the second-order differential for the function was positive definite. Let's show it.

The second order differential has the form:

I.e

Therefore, the matrix of the quadratic form has the form

and the values ​​of the elements do not depend on a and b.

Let us show that the matrix is ​​positive definite. This requires that the angle minors be positive.

Angular minor of the first order . The inequality is strict, since the points do not coincide. This will be implied in what follows.

Angular minor of the second order

Let's prove that method of mathematical induction.

Conclusion: found values a and b correspond to the smallest value of the function , therefore, are the desired parameters for the least squares method.

Ever understand?
Order a Solution

Top of page

Development of a forecast using the least squares method. Problem solution example

Extrapolation - this is a method of scientific research, which is based on the dissemination of past and present trends, patterns, relationships to the future development of the object of forecasting. Extrapolation methods include moving average method, exponential smoothing method, least squares method.

Essence least squares method consists in minimizing the sum of square deviations between the observed and calculated values. The calculated values ​​are found according to the selected equation - the regression equation. The smaller the distance between the actual values ​​and the calculated ones, the more accurate the forecast based on the regression equation.

The theoretical analysis of the essence of the phenomenon under study, the change in which is displayed by a time series, serves as the basis for choosing a curve. Considerations about the nature of the growth of the levels of the series are sometimes taken into account. So, if the growth of output is expected in an arithmetic progression, then smoothing is performed in a straight line. If it turns out that the growth is exponential, then smoothing should be done according to the exponential function.

The working formula of the method of least squares : Y t+1 = a*X + b, where t + 1 is the forecast period; Уt+1 – predicted indicator; a and b are coefficients; X is a symbol of time.

Coefficients a and b are calculated according to the following formulas:

where, Uf - the actual values ​​of the series of dynamics; n is the number of levels in the time series;

The smoothing of time series by the least squares method serves to reflect the patterns of development of the phenomenon under study. In the analytic expression of a trend, time is considered as an independent variable, and the levels of the series act as a function of this independent variable.

The development of a phenomenon does not depend on how many years have passed since the starting point, but on what factors influenced its development, in what direction and with what intensity. From this it is clear that the development of a phenomenon in time appears as a result of the action of these factors.

Correctly setting the type of curve, the type of analytical dependence on time is one of the most difficult tasks of pre-predictive analysis. .

The choice of the type of function that describes the trend, the parameters of which are determined by the least squares method, is in most cases empirical, by constructing a number of functions and comparing them with each other in terms of the value of the root-mean-square error, calculated by the formula:

where Uf - the actual values ​​of the series of dynamics; Ur – calculated (smoothed) values ​​of the time series; n is the number of levels in the time series; p is the number of parameters defined in the formulas describing the trend (development trend).

Disadvantages of the least squares method :

  • when trying to describe the economic phenomenon under study using a mathematical equation, the forecast will be accurate for a short period of time and the regression equation should be recalculated as new information becomes available;
  • the complexity of the selection of the regression equation, which is solvable using standard computer programs.

An example of using the least squares method to develop a forecast

Task . There are data characterizing the level of unemployment in the region, %

  • Build a forecast of the unemployment rate in the region for the months of November, December, January, using the methods: moving average, exponential smoothing, least squares.
  • Calculate the errors in the resulting forecasts using each method.
  • Compare the results obtained, draw conclusions.

Least squares solution

For the solution, we will compile a table in which we will make the necessary calculations:

ε = 28.63/10 = 2.86% forecast accuracy high.

Conclusion : Comparing the results obtained in the calculations moving average method , exponential smoothing and the least squares method, we can say that the average relative error in calculations by the exponential smoothing method falls within 20-50%. This means that the prediction accuracy in this case is only satisfactory.

In the first and third cases, the forecast accuracy is high, since the average relative error is less than 10%. But the moving average method made it possible to obtain more reliable results (forecast for November - 1.52%, forecast for December - 1.53%, forecast for January - 1.49%), since the average relative error when using this method is the smallest - 1 ,thirteen%.

Least square method

Other related articles:

List of sources used

  1. Scientific and methodological recommendations on the issues of diagnosing social risks and forecasting challenges, threats and social consequences. Russian State Social University. Moscow. 2010;
  2. Vladimirova L.P. Forecasting and planning in market conditions: Proc. allowance. M .: Publishing House "Dashkov and Co", 2001;
  3. Novikova N.V., Pozdeeva O.G. Forecasting the National Economy: Educational and Methodological Guide. Yekaterinburg: Publishing House Ural. state economy university, 2007;
  4. Slutskin L.N. MBA course in business forecasting. Moscow: Alpina Business Books, 2006.

MNE Program

Enter data

Data and Approximation y = a + b x

i- number of the experimental point;
x i- the value of the fixed parameter at the point i;
y i- the value of the measured parameter at the point i;
ω i- measurement weight at point i;
y i, calc.- the difference between the measured value and the value calculated from the regression y at the point i;
S x i (x i)- error estimate x i when measuring y at the point i.

Data and Approximation y = k x

i x i y i ω i y i, calc. Δy i S x i (x i)

Click on the chart

User manual for the MNC online program.

In the data field, enter on each separate line the values ​​of `x` and `y` at one experimental point. Values ​​must be separated by whitespace (space or tab).

The third value can be the point weight of `w`. If the point weight is not specified, then it is equal to one. In the overwhelming majority of cases, the weights of the experimental points are unknown or not calculated; all experimental data are considered equivalent. Sometimes the weights in the studied range of values ​​are definitely not equivalent and can even be calculated theoretically. For example, in spectrophotometry, weights can be calculated using simple formulas, although basically everyone neglects this to reduce labor costs.

Data can be pasted through the clipboard from an office suite spreadsheet, such as Excel from Microsoft Office or Calc from Open Office. To do this, in the spreadsheet, select the range of data to copy, copy to the clipboard, and paste the data into the data field on this page.

To calculate by the least squares method, at least two points are required to determine two coefficients `b` - the tangent of the angle of inclination of the straight line and `a` - the value cut off by the straight line on the `y` axis.

To estimate the error of the calculated regression coefficients, it is necessary to set the number of experimental points to more than two.

Least squares method (LSM).

The greater the number of experimental points, the more accurate the statistical estimate of the coefficients (due to the decrease in the Student's coefficient) and the closer the estimate to the estimate of the general sample.

Obtaining values ​​at each experimental point is often associated with significant labor costs, therefore, a compromise number of experiments is often carried out, which gives a digestible estimate and does not lead to excessive labor costs. As a rule, the number of experimental points for a linear least squares dependence with two coefficients is chosen in the region of 5-7 points.

A Brief Theory of Least Squares for Linear Dependence

Suppose we have a set of experimental data in the form of pairs of values ​​[`y_i`, `x_i`], where `i` is the number of one experimental measurement from 1 to `n`; `y_i` - the value of the measured value at the point `i`; `x_i` - the value of the parameter we set at the point `i`.

An example is the operation of Ohm's law. By changing the voltage (potential difference) between sections of the electrical circuit, we measure the amount of current passing through this section. Physics gives us the dependence found experimentally:

`I=U/R`,
where `I` - current strength; `R` - resistance; `U` - voltage.

In this case, `y_i` is the measured current value, and `x_i` is the voltage value.

As another example, consider the absorption of light by a solution of a substance in solution. Chemistry gives us the formula:

`A = εl C`,
where `A` is the optical density of the solution; `ε` - solute transmittance; `l` - path length when light passes through a cuvette with a solution; `C` is the concentration of the solute.

In this case, `y_i` is the measured optical density `A`, and `x_i` is the concentration of the substance that we set.

We will consider the case when the relative error in setting `x_i` is much less than the relative error in measuring `y_i`. We will also assume that all measured values ​​of `y_i` are random and normally distributed, i.e. obey the normal distribution law.

In the case of a linear dependence of `y` on `x`, we can write the theoretical dependence:
`y = a + bx`.

From a geometric point of view, the coefficient `b` denotes the tangent of the angle of inclination of the line to the `x` axis, and the coefficient `a` - the value of `y` at the point of intersection of the line with the `y` axis (for `x = 0`).

Finding the parameters of the regression line.

In an experiment, the measured values ​​of `y_i` cannot lie exactly on the theoretical line due to measurement errors, which are always inherent in real life. Therefore, a linear equation must be represented by a system of equations:
`y_i = a + b x_i + ε_i` (1),
where `ε_i` is the unknown measurement error of `y` in the `i`th experiment.

Dependence (1) is also called regression, i.e. the dependence of the two quantities on each other with statistical significance.

The task of restoring the dependence is to find the coefficients `a` and `b` from the experimental points [`y_i`, `x_i`].

To find the coefficients `a` and `b` is usually used least square method(MNK). It is a special case of the maximum likelihood principle.

Let's rewrite (1) as `ε_i = y_i - a - b x_i`.

Then the sum of squared errors will be
`Φ = sum_(i=1)^(n) ε_i^2 = sum_(i=1)^(n) (y_i - a - b x_i)^2`. (2)

The principle of the least squares method is to minimize the sum (2) with respect to the parameters `a` and `b`.

The minimum is reached when the partial derivatives of the sum (2) with respect to the coefficients `a` and `b` are equal to zero:
`frac(partial Φ)(partial a) = frac(partial sum_(i=1)^(n) (y_i - a - b x_i)^2)(partial a) = 0`
`frac(partial Φ)(partial b) = frac(partial sum_(i=1)^(n) (y_i - a - b x_i)^2)(partial b) = 0`

Expanding the derivatives, we obtain a system of two equations with two unknowns:
`sum_(i=1)^(n) (2a + 2bx_i - 2y_i) = sum_(i=1)^(n) (a + bx_i - y_i) = 0`
`sum_(i=1)^(n) (2bx_i^2 + 2ax_i - 2x_iy_i) = sum_(i=1)^(n) (bx_i^2 + ax_i - x_iy_i) = 0`

We open the brackets and transfer the sums independent of the desired coefficients to the other half, we get a system of linear equations:
`sum_(i=1)^(n) y_i = a n + b sum_(i=1)^(n) bx_i`
`sum_(i=1)^(n) x_iy_i = a sum_(i=1)^(n) x_i + b sum_(i=1)^(n) x_i^2`

Solving the resulting system, we find formulas for the coefficients `a` and `b`:

`a = frac(sum_(i=1)^(n) y_i sum_(i=1)^(n) x_i^2 - sum_(i=1)^(n) x_i sum_(i=1)^(n ) x_iy_i) (n sum_(i=1)^(n) x_i^2 — (sum_(i=1)^(n) x_i)^2)` (3.1)

`b = frac(n sum_(i=1)^(n) x_iy_i - sum_(i=1)^(n) x_i sum_(i=1)^(n) y_i) (n sum_(i=1)^ (n) x_i^2 - (sum_(i=1)^(n) x_i)^2)` (3.2)

These formulas have solutions when `n > 1` (the line can be drawn using at least 2 points) and when the determinant `D = n sum_(i=1)^(n) x_i^2 — (sum_(i= 1)^(n) x_i)^2 != 0`, i.e. when the `x_i` points in the experiment are different (i.e. when the line is not vertical).

Estimation of errors in the coefficients of the regression line

For a more accurate estimate of the error in calculating the coefficients `a` and `b`, a large number of experimental points is desirable. When `n = 2`, it is impossible to estimate the error of the coefficients, because the approximating line will uniquely pass through two points.

The error of the random variable `V` is determined error accumulation law
`S_V^2 = sum_(i=1)^p (frac(partial f)(partial z_i))^2 S_(z_i)^2`,
where `p` is the number of `z_i` parameters with `S_(z_i)` error that affect the `S_V` error;
`f` is a dependency function of `V` on `z_i`.

Let's write the law of accumulation of errors for the error of the coefficients `a` and `b`
`S_a^2 = sum_(i=1)^(n)(frac(partial a)(partial y_i))^2 S_(y_i)^2 + sum_(i=1)^(n)(frac(partial a )(partial x_i))^2 S_(x_i)^2 = S_y^2 sum_(i=1)^(n)(frac(partial a)(partial y_i))^2 `,
`S_b^2 = sum_(i=1)^(n)(frac(partial b)(partial y_i))^2 S_(y_i)^2 + sum_(i=1)^(n)(frac(partial b )(partial x_i))^2 S_(x_i)^2 = S_y^2 sum_(i=1)^(n)(frac(partial b)(partial y_i))^2 `,
because `S_(x_i)^2 = 0` (we previously made a reservation that the error of `x` is negligible).

`S_y^2 = S_(y_i)^2` - the error (variance, squared standard deviation) in the `y` dimension, assuming that the error is uniform for all `y` values.

Substituting formulas for calculating `a` and `b` into the resulting expressions, we get

`S_a^2 = S_y^2 frac(sum_(i=1)^(n) (sum_(i=1)^(n) x_i^2 - x_i sum_(i=1)^(n) x_i)^2 ) (D^2) = S_y^2 frac((n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2) sum_(i=1) ^(n) x_i^2) (D^2) = S_y^2 frac(sum_(i=1)^(n) x_i^2) (D)` (4.1)

`S_b^2 = S_y^2 frac(sum_(i=1)^(n) (n x_i - sum_(i=1)^(n) x_i)^2) (D^2) = S_y^2 frac( n (n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2)) (D^2) = S_y^2 frac(n) (D) ` (4.2)

In most real experiments, the value of `Sy` is not measured. To do this, it is necessary to carry out several parallel measurements (experiments) at one or several points of the plan, which increases the time (and possibly cost) of the experiment. Therefore, it is usually assumed that the deviation of `y` from the regression line can be considered random. The variance estimate `y` in this case is calculated by the formula.

`S_y^2 = S_(y, rest)^2 = frac(sum_(i=1)^n (y_i - a - b x_i)^2) (n-2)`.

The divisor `n-2` appears because we have reduced the number of degrees of freedom due to the calculation of two coefficients for the same sample of experimental data.

This estimate is also called the residual variance relative to the regression line `S_(y, rest)^2`.

The assessment of the significance of the coefficients is carried out according to the Student's criterion

`t_a = frac(|a|) (S_a)`, `t_b = frac(|b|) (S_b)`

If the calculated criteria `t_a`, `t_b` are less than the table criteria `t(P, n-2)`, then it is considered that the corresponding coefficient is not significantly different from zero with a given probability `P`.

To assess the quality of the description of a linear relationship, you can compare `S_(y, rest)^2` and `S_(bar y)` relative to the mean using the Fisher criterion.

`S_(bar y) = frac(sum_(i=1)^n (y_i - bar y)^2) (n-1) = frac(sum_(i=1)^n (y_i - (sum_(i= 1)^n y_i) /n)^2) (n-1)` - sample estimate of the variance of `y` relative to the mean.

To evaluate the effectiveness of the regression equation for describing the dependence, the Fisher coefficient is calculated
`F = S_(bar y) / S_(y, rest)^2`,
which is compared with the tabular Fisher coefficient `F(p, n-1, n-2)`.

If `F > F(P, n-1, n-2)`, the difference between the description of the dependence `y = f(x)` using the regression equation and the description using the mean is considered statistically significant with probability `P`. Those. the regression describes the dependence better than the spread of `y` around the mean.

Click on the chart
to add values ​​to the table

Least square method. The method of least squares means the determination of unknown parameters a, b, c, the accepted functional dependence

The method of least squares means the determination of unknown parameters a, b, c,… accepted functional dependence

y = f(x,a,b,c,…),

which would provide a minimum of the mean square (variance) of the error

, (24)

where x i , y i - set of pairs of numbers obtained from the experiment.

Since the condition for the extremum of a function of several variables is the condition that its partial derivatives are equal to zero, then the parameters a, b, c,… are determined from the system of equations:

; ; ; … (25)

It must be remembered that the least squares method is used to select parameters after the form of the function y = f(x) defined.

If from theoretical considerations it is impossible to draw any conclusions about what the empirical formula should be, then one has to be guided by visual representations, primarily a graphical representation of the observed data.

In practice, most often limited to the following types of functions:

1) linear ;

2) quadratic a .

Least square method

In the final lesson of the topic, we will get acquainted with the most famous application FNP, which finds the widest application in various fields of science and practice. It can be physics, chemistry, biology, economics, sociology, psychology and so on and so forth. By the will of fate, I often have to deal with the economy, and therefore today I will arrange for you a ticket to an amazing country called Econometrics=) … How do you not want that?! It's very good there - you just have to decide! …But what you probably definitely want is to learn how to solve problems least squares. And especially diligent readers will learn to solve them not only accurately, but also VERY FAST ;-) But first general statement of the problem+ related example:

Let indicators be studied in some subject area that have a quantitative expression. At the same time, there is every reason to believe that the indicator depends on the indicator. This assumption can be both a scientific hypothesis and based on elementary common sense. Let's leave science aside, however, and explore more appetizing areas - namely, grocery stores. Denote by:

– retail space of a grocery store, sq.m.,
- annual turnover of a grocery store, million rubles.

It is quite clear that the larger the area of ​​the store, the greater its turnover in most cases.

Suppose that after conducting observations / experiments / calculations / dancing with a tambourine, we have at our disposal numerical data:

With grocery stores, I think everything is clear: - this is the area of ​​the 1st store, - its annual turnover, - the area of ​​the 2nd store, - its annual turnover, etc. By the way, it is not at all necessary to have access to classified materials - a fairly accurate assessment of the turnover can be obtained using mathematical statistics. However, do not be distracted, the course of commercial espionage is already paid =)

Tabular data can also be written in the form of points and depicted in the usual way for us. Cartesian system .

Let's answer an important question: how many points are needed for a qualitative study?

The bigger, the better. The minimum admissible set consists of 5-6 points. In addition, with a small amount of data, “abnormal” results should not be included in the sample. So, for example, a small elite store can help out orders of magnitude more than “their colleagues”, thereby distorting the general pattern that needs to be found!



If it’s quite simple, we need to choose a function , schedule which passes as close as possible to the points . Such a function is called approximating (approximation - approximation) or theoretical function . Generally speaking, here immediately appears an obvious "pretender" - a polynomial of high degree, the graph of which passes through ALL points. But this option is complicated, and often simply incorrect. (because the chart will “wind” all the time and poorly reflect the main trend).

Thus, the desired function must be sufficiently simple and at the same time reflect the dependence adequately. As you might guess, one of the methods for finding such functions is called least squares. First, let's analyze its essence in a general way. Let some function approximate the experimental data:


How to evaluate the accuracy of this approximation? Let us also calculate the differences (deviations) between the experimental and functional values (we study the drawing). The first thought that comes to mind is to estimate how big the sum is, but the problem is that the differences can be negative. (For example, ) and deviations as a result of such summation will cancel each other out. Therefore, as an estimate of the accuracy of the approximation, it suggests itself to take the sum modules deviations:

or in folded form: (for those who don't know: is the sum icon, and - auxiliary variable - "counter", which takes values ​​from 1 to ) .

Approximating the experimental points with different functions, we will get different values, and it is obvious where this sum is less - that function is more accurate.

Such a method exists and is called least modulus method. However, in practice it has become much more widespread. least square method, in which possible negative values ​​are eliminated not by the modulus, but by squaring the deviations:



, after which efforts are directed to the selection of such a function that the sum of the squared deviations was as small as possible. Actually, hence the name of the method.

And now we return to another important point: as noted above, the selected function should be quite simple - but there are also many such functions: linear , hyperbolic , exponential , logarithmic , quadratic etc. And, of course, here I would immediately like to "reduce the field of activity." What class of functions to choose for research? Primitive but effective technique:

- The easiest way to draw points on the drawing and analyze their location. If they tend to be in a straight line, then you should look for straight line equation with optimal values ​​and . In other words, the task is to find SUCH coefficients - so that the sum of the squared deviations is the smallest.

If the points are located, for example, along hyperbole, then it is clear that the linear function will give a poor approximation. In this case, we are looking for the most “favorable” coefficients for the hyperbola equation - those that give the minimum sum of squares .

Now notice that in both cases we are talking about functions of two variables, whose arguments are searched dependency options:

And in essence, we need to solve a standard problem - to find minimum of a function of two variables.

Recall our example: suppose that the "shop" points tend to be located in a straight line and there is every reason to believe the presence linear dependence turnover from the trading area. Let's find SUCH coefficients "a" and "be" so that the sum of squared deviations was the smallest. Everything as usual - first partial derivatives of the 1st order. According to linearity rule you can differentiate right under the sum icon:

If you want to use this information for an essay or coursework, I will be very grateful for the link in the list of sources, you will not find such detailed calculations anywhere:

Let's make a standard system:

We reduce each equation by a “two” and, in addition, “break apart” the sums:

Note : independently analyze why "a" and "be" can be taken out of the sum icon. By the way, formally this can be done with the sum

Let's rewrite the system in an "applied" form:

after which the algorithm for solving our problem begins to be drawn:

Do we know the coordinates of the points? We know. Sums can we find? Easily. We compose the simplest system of two linear equations with two unknowns("a" and "beh"). We solve the system, for example, Cramer's method, resulting in a stationary point . Checking sufficient condition for an extremum, we can verify that at this point the function reaches precisely minimum. Verification is associated with additional calculations and therefore we will leave it behind the scenes. (if necessary, the missing frame can be viewedhere ) . We draw the final conclusion:

Function the best way (at least compared to any other linear function) brings experimental points closer . Roughly speaking, its graph passes as close as possible to these points. In tradition econometrics the resulting approximating function is also called paired linear regression equation .

The problem under consideration is of great practical importance. In the situation with our example, the equation allows you to predict what kind of turnover ("yig") will be at the store with one or another value of the selling area (one or another meaning of "x"). Yes, the resulting forecast will be only a forecast, but in many cases it will turn out to be quite accurate.

I will analyze just one problem with "real" numbers, since there are no difficulties in it - all calculations are at the level of the school curriculum in grades 7-8. In 95 percent of cases, you will be asked to find just a linear function, but at the very end of the article I will show that it is no more difficult to find the equations for the optimal hyperbola, exponent, and some other functions.

In fact, it remains to distribute the promised goodies - so that you learn how to solve such examples not only accurately, but also quickly. We carefully study the standard:

Task

As a result of studying the relationship between two indicators, the following pairs of numbers were obtained:

Using the least squares method, find the linear function that best approximates the empirical (experienced) data. Make a drawing on which, in a Cartesian rectangular coordinate system, plot experimental points and a graph of the approximating function . Find the sum of squared deviations between empirical and theoretical values. Find out if the function is better (in terms of the least squares method) approximate experimental points.

Note that "x" values ​​are natural values, and this has a characteristic meaningful meaning, which I will talk about a little later; but they, of course, can be fractional. In addition, depending on the content of a particular task, both "X" and "G" values ​​can be fully or partially negative. Well, we have been given a “faceless” task, and we start it decision:

We find the coefficients of the optimal function as a solution to the system:

For the purposes of a more compact notation, the “counter” variable can be omitted, since it is already clear that the summation is carried out from 1 to .

It is more convenient to calculate the required amounts in a tabular form:


Calculations can be carried out on a microcalculator, but it is much better to use Excel - both faster and without errors; watch a short video:

Thus, we get the following system:

Here you can multiply the second equation by 3 and subtract the 2nd from the 1st equation term by term. But this is luck - in practice, systems are often not gifted, and in such cases it saves Cramer's method:
, so the system has a unique solution.

Let's do a check. I understand that I don’t want to, but why skip mistakes where you can absolutely not miss them? Substitute the found solution into the left side of each equation of the system:

The right parts of the corresponding equations are obtained, which means that the system is solved correctly.

Thus, the desired approximating function: – from all linear functions experimental data is best approximated by it.

Unlike straight dependence of the store's turnover on its area, the found dependence is reverse (principle "the more - the less"), and this fact is immediately revealed by the negative angular coefficient. Function informs us that with an increase in a certain indicator by 1 unit, the value of the dependent indicator decreases average by 0.65 units. As they say, the higher the price of buckwheat, the less sold.

To plot the approximating function, we find two of its values:

and execute the drawing:

The constructed line is called trend line (namely, a linear trend line, i.e. in the general case, a trend is not necessarily a straight line). Everyone is familiar with the expression "to be in trend", and I think that this term does not need additional comments.

Calculate the sum of squared deviations between empirical and theoretical values. Geometrically, this is the sum of the squares of the lengths of the "crimson" segments (two of which are so small you can't even see them).

Let's summarize the calculations in a table:


They can again be carried out manually, just in case I will give an example for the 1st point:

but it is much more efficient to do the already known way:

Let's repeat: what is the meaning of the result? From all linear functions function the exponent is the smallest, that is, it is the best approximation in its family. And here, by the way, the final question of the problem is not accidental: what if the proposed exponential function will it be better to approximate the experimental points?

Let's find the corresponding sum of squared deviations - to distinguish them, I will designate them with the letter "epsilon". The technique is exactly the same:


And again for every fire calculation for the 1st point:

In Excel, we use the standard function EXP (Syntax can be found in Excel Help).

Conclusion: , so the exponential function approximates the experimental points worse than the straight line .

But it should be noted here that "worse" is doesn't mean yet, what is wrong. Now I built a graph of this exponential function - and it also passes close to the points - so much so that without an analytical study it is difficult to say which function is more accurate.

This completes the solution, and I return to the question of the natural values ​​of the argument. In various studies, as a rule, economic or sociological, months, years or other equal time intervals are numbered with natural "X". Consider, for example, the following problem:

We have the following data on the store's retail turnover for the first half of the year:

Using straight line analytical alignment, find the sales volume for July.

Yes, no problem: we number the months 1, 2, 3, 4, 5, 6 and use the usual algorithm, as a result of which we get an equation - the only thing when it comes to time is usually the letter “te” (although it's not critical). The resulting equation shows that in the first half of the year, the turnover increased by an average of CU 27.74. per month. Get a forecast for July (month #7): e.u.

And similar tasks - the darkness is dark. Those who wish can use an additional service, namely my Excel calculator (demo version), which solves the problem almost instantly! The working version of the program is available in exchange or for symbolic payment.

At the end of the lesson, a brief information about finding dependencies of some other types. Actually, there is nothing special to tell, since the fundamental approach and solution algorithm remain the same.

Let us assume that the location of the experimental points resembles a hyperbola. Then, in order to find the coefficients of the best hyperbola, you need to find the minimum of the function - those who wish can carry out detailed calculations and come to a similar system:

From a formal technical point of view, it is obtained from the "linear" system (let's mark it with an asterisk) replacing "x" with . Well, the amounts calculate, after which to the optimal coefficients "a" and "be" at hand.

If there is every reason to believe that the points are arranged along a logarithmic curve, then to search for the optimal values ​​and find the minimum of the function . Formally, in the system (*) should be replaced by:

When calculating in Excel, use the function LN. I confess that it will not be difficult for me to create calculators for each of the cases under consideration, but it will still be better if you "program" the calculations yourself. Video tutorials to help.

With exponential dependence, the situation is slightly more complicated. To reduce the matter to the linear case, we take the logarithm of the function and use properties of the logarithm:

Now, comparing the obtained function with the linear function , we come to the conclusion that in the system (*) must be replaced by , and - by . For convenience, we denote:

Please note that the system is resolved with respect to and , and therefore, after finding the roots, you must not forget to find the coefficient itself.

To approximate experimental points optimal parabola , should be found minimum of a function of three variables . After performing standard actions, we get the following "working" system:

Yes, of course, there are more amounts here, but there are no difficulties at all when using your favorite application. And finally, I’ll tell you how to quickly check using Excel and build the desired trend line: create a scatter chart, select any of the points with the mouse and right click select option "Add trendline". Next, select the type of chart and on the tab "Options" activate the option "Show equation on chart". OK

As always, I want to end the article with some beautiful phrase, and I almost typed “Be in trend!”. But in time he changed his mind. And not because it's formulaic. I don’t know how anyone, but I don’t want to follow the promoted American and especially European trend at all =) Therefore, I wish each of you to stick to your own line!

http://www.grandars.ru/student/vysshaya-matematika/metod-naimenshih-kvadratov.html

The least squares method is one of the most common and most developed due to its simplicity and efficiency of methods for estimating the parameters of linear econometric models. At the same time, some caution should be observed when using it, since the models built using it may not meet a number of requirements for the quality of their parameters and, as a result, not “well” reflect the patterns of process development.

Let us consider the procedure for estimating the parameters of a linear econometric model using the least squares method in more detail. Such a model in general form can be represented by equation (1.2):

y t = a 0 + a 1 x 1t +...+ a n x nt + ε t .

The initial data when estimating the parameters a 0 , a 1 ,..., a n is the vector of values ​​of the dependent variable y= (y 1 , y 2 , ... , y T)" and the matrix of values ​​of independent variables

in which the first column, consisting of ones, corresponds to the coefficient of the model .

The method of least squares got its name based on the basic principle that the parameter estimates obtained on its basis must satisfy: the sum of squares of the model error should be minimal.

Examples of solving problems by the least squares method

Example 2.1. The trading enterprise has a network consisting of 12 stores, information on the activities of which is presented in Table. 2.1.

The company's management would like to know how the size of the annual turnover depends on the retail space of the store.

Table 2.1

Shop number Annual turnover, million rubles Trade area, thousand m 2
19,76 0,24
38,09 0,31
40,95 0,55
41,08 0,48
56,29 0,78
68,51 0,98
75,01 0,94
89,05 1,21
91,13 1,29
91,26 1,12
99,84 1,29
108,55 1,49

Least squares solution. Let us designate - the annual turnover of the -th store, million rubles; - selling area of ​​the th store, thousand m 2.

Fig.2.1. Scatterplot for Example 2.1

To determine the form of the functional relationship between the variables and construct a scatterplot (Fig. 2.1).

Based on the scatter diagram, we can conclude that the annual turnover is positively dependent on the selling area (i.e., y will increase with the growth of ). The most appropriate form of functional connection is linear.

Information for further calculations is presented in Table. 2.2. Using the least squares method, we estimate the parameters of the linear one-factor econometric model

Table 2.2

t y t x 1t y t 2 x1t2 x 1t y t
19,76 0,24 390,4576 0,0576 4,7424
38,09 0,31 1450,8481 0,0961 11,8079
40,95 0,55 1676,9025 0,3025 22,5225
41,08 0,48 1687,5664 0,2304 19,7184
56,29 0,78 3168,5641 0,6084 43,9062
68,51 0,98 4693,6201 0,9604 67,1398
75,01 0,94 5626,5001 0,8836 70,5094
89,05 1,21 7929,9025 1,4641 107,7505
91,13 1,29 8304,6769 1,6641 117,5577
91,26 1,12 8328,3876 1,2544 102,2112
99,84 1,29 9968,0256 1,6641 128,7936
108,55 1,49 11783,1025 2,2201 161,7395
S 819,52 10,68 65008,554 11,4058 858,3991
The average 68,29 0,89

Thus,

Therefore, with an increase in the trading area by 1 thousand m 2, other things being equal, the average annual turnover increases by 67.8871 million rubles.

Example 2.2. The management of the enterprise noticed that the annual turnover depends not only on the sales area of ​​the store (see example 2.1), but also on the average number of visitors. The relevant information is presented in table. 2.3.

Table 2.3

Decision. Denote - the average number of visitors to the th store per day, thousand people.

To determine the form of the functional relationship between the variables and construct a scatterplot (Fig. 2.2).

Based on the scatter diagram, we can conclude that the annual turnover is positively related to the average number of visitors per day (i.e., y will increase with the growth of ). The form of functional dependence is linear.

Rice. 2.2. Scatterplot for example 2.2

Table 2.4

t x 2t x 2t 2 yt x 2t x 1t x 2t
8,25 68,0625 163,02 1,98
10,24 104,8575 390,0416 3,1744
9,31 86,6761 381,2445 5,1205
11,01 121,2201 452,2908 5,2848
8,54 72,9316 480,7166 6,6612
7,51 56,4001 514,5101 7,3598
12,36 152,7696 927,1236 11,6184
10,81 116,8561 962,6305 13,0801
9,89 97,8121 901,2757 12,7581
13,72 188,2384 1252,0872 15,3664
12,27 150,5529 1225,0368 15,8283
13,92 193,7664 1511,016 20,7408
S 127,83 1410,44 9160,9934 118,9728
Average 10,65

In general, it is necessary to determine the parameters of the two-factor econometric model

y t \u003d a 0 + a 1 x 1t + a 2 x 2t + ε t

The information required for further calculations is presented in Table. 2.4.

Let us estimate the parameters of a linear two-factor econometric model using the least squares method.

Thus,

Evaluation of the coefficient = 61.6583 shows that, all other things being equal, with an increase in sales area by 1 thousand m 2, the annual turnover will increase by an average of 61.6583 million rubles.

The estimate of the coefficient = 2.2748 shows that, other things being equal, with an increase in the average number of visitors per 1 thousand people. per day, the annual turnover will increase by an average of 2.2748 million rubles.

Example 2.3. Using the information presented in table. 2.2 and 2.4, estimate the parameter of a single-factor econometric model

where is the centered value of the annual turnover of the -th store, million rubles; - centered value of the average daily number of visitors to the t-th store, thousand people. (see examples 2.1-2.2).

Decision. Additional information required for calculations is presented in Table. 2.5.

Table 2.5

-48,53 -2,40 5,7720 116,6013
-30,20 -0,41 0,1702 12,4589
-27,34 -1,34 1,8023 36,7084
-27,21 0,36 0,1278 -9,7288
-12,00 -2,11 4,4627 25,3570
0,22 -3,14 9,8753 -0,6809
6,72 1,71 2,9156 11,4687
20,76 0,16 0,0348 3,2992
22,84 -0,76 0,5814 -17,413
22,97 3,07 9,4096 70,4503
31,55 1,62 2,6163 51,0267
40,26 3,27 10,6766 131,5387
Sum 48,4344 431,0566

Using formula (2.35), we obtain

Thus,

http://www.cleverstudents.ru/articles/mnk.html

Example.

Experimental data on the values ​​of variables X and at are given in the table.

As a result of their alignment, the function

Using least square method, approximate these data with a linear dependence y=ax+b(find options a and b). Find out which of the two lines is better (in the sense of the least squares method) aligns the experimental data. Make a drawing.

Decision.

In our example n=5. We fill in the table for the convenience of calculating the amounts that are included in the formulas of the required coefficients.

The values ​​in the fourth row of the table are obtained by multiplying the values ​​of the 2nd row by the values ​​of the 3rd row for each number i.

The values ​​in the fifth row of the table are obtained by squaring the values ​​of the 2nd row for each number i.

The values ​​of the last column of the table are the sums of the values ​​across the rows.

We use the formulas of the least squares method to find the coefficients a and b. We substitute in them the corresponding values ​​from the last column of the table:

Hence, y=0.165x+2.184 is the desired approximating straight line.

It remains to find out which of the lines y=0.165x+2.184 or better approximates the original data, i.e. to make an estimate using the least squares method.

Proof.

So that when found a and b function takes the smallest value, it is necessary that at this point the matrix of the quadratic form of the second-order differential for the function was positive definite. Let's show it.

The second order differential has the form:

I.e

Therefore, the matrix of the quadratic form has the form

and the values ​​of the elements do not depend on a and b.

Let us show that the matrix is ​​positive definite. This requires that the angle minors be positive.

Angular minor of the first order . The inequality is strict, since the points

  • introductory lesson for free;
  • A large number of experienced teachers (native and Russian-speaking);
  • Courses NOT for a specific period (month, six months, year), but for a specific number of lessons (5, 10, 20, 50);
  • Over 10,000 satisfied customers.
  • The cost of one lesson with a Russian-speaking teacher - from 600 rubles, with a native speaker - from 1500 rubles

The essence of the least squares method is in finding the parameters of the trend model that best describes the development trend of any random phenomenon in time or space (a trend is a line that characterizes the trend of this development). The task of the least squares method (OLS) is to find not just some trend model, but to find the best or optimal model. This model will be optimal if the sum of the squared deviations between the observed actual values ​​and the corresponding calculated trend values ​​is minimal (smallest):

where is the standard deviation between the observed actual value

and the corresponding calculated trend value,

The actual (observed) value of the phenomenon under study,

Estimated value of the trend model,

The number of observations of the phenomenon under study.

MNC is rarely used on its own. As a rule, most often it is used only as a necessary technique in correlation studies. It should be remembered that the information basis of the LSM can only be a reliable statistical series, and the number of observations should not be less than 4, otherwise, the smoothing procedures of the LSM may lose their common sense.

The OLS toolkit is reduced to the following procedures:

First procedure. It turns out whether there is any tendency at all to change the resulting attribute when the selected factor-argument changes, or in other words, whether there is a connection between " at " and " X ».

Second procedure. It is determined which line (trajectory) is best able to describe or characterize this trend.

Third procedure.

Example. Suppose we have information on the average sunflower yield for the farm under study (Table 9.1).

Table 9.1

Observation number

Productivity, c/ha

Since the level of technology in the production of sunflower in our country has not changed much over the past 10 years, it means that, most likely, the fluctuations in yield in the analyzed period depended very much on fluctuations in weather and climate conditions. Is it true?

First MNC procedure. The hypothesis about the existence of a trend in the change in sunflower yield depending on changes in weather and climate conditions over the analyzed 10 years is being tested.

In this example, for " y » it is advisable to take the yield of sunflower, and for « x » is the number of the observed year in the analyzed period. Testing the hypothesis about the existence of any relationship between " x " and " y » can be done in two ways: manually and with the help of computer programs. Of course, with the availability of computer technology, this problem is solved by itself. But, in order to better understand the OLS toolkit, it is advisable to test the hypothesis about the existence of a relationship between " x " and " y » manually, when only a pen and an ordinary calculator are at hand. In such cases, the hypothesis of the existence of a trend is best checked visually by the location of the graphic image of the analyzed time series - the correlation field:

The correlation field in our example is located around a slowly increasing line. This in itself indicates the existence of a certain trend in the change in sunflower yield. It is impossible to speak about the presence of any trend only when the correlation field looks like a circle, a circle, a strictly vertical or strictly horizontal cloud, or consists of randomly scattered points. In all other cases, it is necessary to confirm the hypothesis of the existence of a relationship between " x " and " y and continue research.

Second MNC procedure. It is determined which line (trajectory) is best able to describe or characterize the trend in sunflower yield changes for the analyzed period.

With the availability of computer technology, the selection of the optimal trend occurs automatically. With "manual" processing, the choice of the optimal function is carried out, as a rule, in a visual way - by the location of the correlation field. That is, according to the type of chart, the equation of the line is selected, which is best suited to the empirical trend (to the actual trajectory).

As you know, in nature there is a huge variety of functional dependencies, so it is extremely difficult to visually analyze even a small part of them. Fortunately, in real economic practice, most relationships can be accurately described either by a parabola, or a hyperbola, or a straight line. In this regard, with the "manual" option for selecting the best function, you can limit yourself to only these three models.

Hyperbola:

Parabola of the second order: :

It is easy to see that in our example, the trend in sunflower yield changes over the analyzed 10 years is best characterized by a straight line, so the regression equation will be a straight line equation.

Third procedure. The parameters of the regression equation that characterizes this line are calculated, or in other words, an analytical formula is determined that describes the best trend model.

Finding the values ​​of the parameters of the regression equation, in our case, the parameters and , is the core of the LSM. This process is reduced to solving a system of normal equations.

(9.2)

This system of equations is quite easily solved by the Gauss method. Recall that as a result of the solution, in our example, the values ​​of the parameters and are found. Thus, the found regression equation will have the following form:

Choosing the type of regression function, i.e. the type of the considered model of the dependence of Y on X (or X on Y), for example, a linear model y x = a + bx, it is necessary to determine the specific values ​​of the coefficients of the model.

For different values ​​of a and b, it is possible to build an infinite number of dependencies of the form y x = a + bx, i.e., there are an infinite number of lines on the coordinate plane, but we need such a dependence that corresponds to the observed values ​​in the best way. Thus, the problem is reduced to the selection of the best coefficients.

We are looking for a linear function a + bx, based only on a certain number of available observations. To find the function with the best fit to the observed values, we use the least squares method.

Denote: Y i - the value calculated by the equation Y i =a+bx i . y i - measured value, ε i =y i -Y i - difference between the measured and calculated values, ε i =y i -a-bx i .

The method of least squares requires that ε i , the difference between the measured y i and the values ​​of Y i calculated from the equation, be minimal. Therefore, we find the coefficients a and b so that the sum of the squared deviations of the observed values ​​from the values ​​on the straight regression line is the smallest:

Investigating this function of arguments a and with the help of derivatives to an extremum, we can prove that the function takes on a minimum value if the coefficients a and b are solutions of the system:

(2)

If we divide both sides of the normal equations by n, we get:

Given that (3)

Get , from here, substituting the value of a in the first equation, we get:

In this case, b is called the regression coefficient; a is called the free member of the regression equation and is calculated by the formula:

The resulting straight line is an estimate for the theoretical regression line. We have:

So, is a linear regression equation.

Regression can be direct (b>0) and inverse (b Example 1. The results of measuring the X and Y values ​​are given in the table:

x i -2 0 1 2 4
y i 0.5 1 1.5 2 3

Assuming that there is a linear relationship between X and Y y=a+bx, determine the coefficients a and b using the least squares method.

Decision. Here n=5
x i =-2+0+1+2+4=5;
x i 2 =4+0+1+4+16=25
x i y i =-2 0.5+0 1+1 1.5+2 2+4 3=16.5
y i =0.5+1+1.5+2+3=8

and normal system (2) has the form

Solving this system, we get: b=0.425, a=1.175. Therefore y=1.175+0.425x.

Example 2. There is a sample of 10 observations of economic indicators (X) and (Y).

x i 180 172 173 169 175 170 179 170 167 174
y i 186 180 176 171 182 166 182 172 169 177

It is required to find a sample regression equation Y on X. Construct a sample regression line Y on X.

Decision. 1. Let's sort the data by values ​​x i and y i . We get a new table:

x i 167 169 170 170 172 173 174 175 179 180
y i 169 171 166 172 180 176 177 182 182 186

To simplify the calculations, we will compile a calculation table in which we will enter the necessary numerical values.

x i y i x i 2 x i y i
167 169 27889 28223
169 171 28561 28899
170 166 28900 28220
170 172 28900 29240
172 180 29584 30960
173 176 29929 30448
174 177 30276 30798
175 182 30625 31850
179 182 32041 32578
180 186 32400 33480
∑x i =1729 ∑y i =1761 ∑x i 2 299105 ∑x i y i =304696
x=172.9 y=176.1 x i 2 =29910.5 xy=30469.6

According to formula (4), we calculate the regression coefficient

and by formula (5)

Thus, the sample regression equation looks like y=-59.34+1.3804x.
Let's plot the points (x i ; y i) on the coordinate plane and mark the regression line.


Fig 4

Figure 4 shows how the observed values ​​are located relative to the regression line. To numerically estimate the deviations of y i from Y i , where y i are observed values, and Y i are values ​​determined by regression, we will make a table:

x i y i Y i Y i -y i
167 169 168.055 -0.945
169 171 170.778 -0.222
170 166 172.140 6.140
170 172 172.140 0.140
172 180 174.863 -5.137
173 176 176.225 0.225
174 177 177.587 0.587
175 182 178.949 -3.051
179 182 184.395 2.395
180 186 185.757 -0.243

Y i values ​​are calculated according to the regression equation.

The noticeable deviation of some observed values ​​from the regression line is explained by the small number of observations. When studying the degree of linear dependence of Y on X, the number of observations is taken into account. The strength of the dependence is determined by the value of the correlation coefficient.

Example.

Experimental data on the values ​​of variables X and at are given in the table.

As a result of their alignment, the function

Using least square method, approximate these data with a linear dependence y=ax+b(find options a and b). Find out which of the two lines is better (in the sense of the least squares method) aligns the experimental data. Make a drawing.

The essence of the method of least squares (LSM).

The problem is to find the linear dependence coefficients for which the function of two variables a and b takes the smallest value. That is, given the data a and b the sum of the squared deviations of the experimental data from the found straight line will be the smallest. This is the whole point of the least squares method.

Thus, the solution of the example is reduced to finding the extremum of a function of two variables.

Derivation of formulas for finding coefficients.

A system of two equations with two unknowns is compiled and solved. Finding partial derivatives of a function with respect to variables a and b, we equate these derivatives to zero.

We solve the resulting system of equations by any method (for example substitution method or ) and obtain formulas for finding coefficients using the least squares method (LSM).

With data a and b function takes the smallest value. The proof of this fact is given.

That's the whole method of least squares. Formula for finding the parameter a contains the sums , , , and the parameter n- amount of experimental data. The values ​​of these sums are recommended to be calculated separately. Coefficient b found after calculation a.

It's time to remember the original example.

Decision.

In our example n=5. We fill in the table for the convenience of calculating the amounts that are included in the formulas of the required coefficients.

The values ​​in the fourth row of the table are obtained by multiplying the values ​​of the 2nd row by the values ​​of the 3rd row for each number i.

The values ​​in the fifth row of the table are obtained by squaring the values ​​of the 2nd row for each number i.

The values ​​of the last column of the table are the sums of the values ​​across the rows.

We use the formulas of the least squares method to find the coefficients a and b. We substitute in them the corresponding values ​​from the last column of the table:

Hence, y=0.165x+2.184 is the desired approximating straight line.

It remains to find out which of the lines y=0.165x+2.184 or better approximates the original data, i.e. to make an estimate using the least squares method.

Estimation of the error of the method of least squares.

To do this, you need to calculate the sums of squared deviations of the original data from these lines and , a smaller value corresponds to a line that better approximates the original data in terms of the least squares method.

Since , then the line y=0.165x+2.184 approximates the original data better.

Graphic illustration of the least squares method (LSM).

Everything looks great on the charts. The red line is the found line y=0.165x+2.184, the blue line is , the pink dots are the original data.

What is it for, what are all these approximations for?

I personally use to solve data smoothing problems, interpolation and extrapolation problems (in the original example, you could be asked to find the value of the observed value y at x=3 or when x=6 according to the MNC method). But we will talk more about this later in another section of the site.

Proof.

So that when found a and b function takes the smallest value, it is necessary that at this point the matrix of the quadratic form of the second-order differential for the function was positive definite. Let's show it.

The second order differential has the form:

I.e

Therefore, the matrix of the quadratic form has the form

and the values ​​of the elements do not depend on a and b.

Let us show that the matrix is ​​positive definite. This requires that the angle minors be positive.

Angular minor of the first order . The inequality is strict, since the points do not coincide. This will be implied in what follows.

Angular minor of the second order

Let's prove that by the method of mathematical induction .

Conclusion: found values a and b correspond to the smallest value of the function , therefore, are the desired parameters for the least squares method.