Matrix of correlation coefficients in econometrics. Given a matrix of paired correlation coefficients

Collinear factors are...

Solution:

It is assumed that two variables are clearly collinear, i.e. linearly related to each other, if the pair correlation coefficient between them exceeds 0.7 in absolute value. In our model, only the pair linear correlation coefficient between the factors … and … is greater than 0.7; hence those factors are collinear.

4. In the multiple regression model, the determinant of the matrix of paired correlation coefficients between the factors …, … and … is close to zero. This means that the factors …, … and … are ...

multicollinear

independent

quantifiable

Solution:

To assess the multicollinearity of factors, the determinant of the matrix of pairwise correlation coefficients between the factors can be used. If the factors were not correlated with each other, this matrix would be the identity matrix, since all its off-diagonal elements would equal zero; its determinant would then equal one.
If, on the contrary, there is a complete linear dependence between the factors and all pairwise correlation coefficients equal one, then the determinant of such a matrix equals zero.


The closer the determinant of the interfactor correlation matrix is to zero, the stronger the multicollinearity of the factors and the less reliable the results of the multiple regression. Conversely, the closer the determinant of the interfactor correlation matrix is to one, the weaker the multicollinearity of the factors.
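As a numerical illustration of this rule (with hypothetical correlation values, since the matrices in the text are not reproduced), the determinant check can be sketched in Python:

```python
import numpy as np

# Hypothetical interfactor correlation matrices (illustrative values only)
R_uncorrelated = np.eye(3)                   # factors pairwise uncorrelated
R_collinear = np.array([[1.0, 0.9, 0.8],
                        [0.9, 1.0, 0.85],
                        [0.8, 0.85, 1.0]])   # strongly intercorrelated factors

det_id = np.linalg.det(R_uncorrelated)       # equals 1: no multicollinearity
det_col = np.linalg.det(R_collinear)         # close to 0: strong multicollinearity
```

Here the determinant of the identity matrix is exactly one, while the determinant of the strongly intercorrelated matrix is close to zero, signalling multicollinearity.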

5. For an econometric model in the form of a linear multiple regression equation, a matrix of pairwise linear correlation coefficients is given (y is the dependent variable; x(1), x(2), x(3), x(4) are the independent variables):


The pair of independent (explanatory) variables that is NOT collinear (closely related) is ...

x(2) and x(3)

x(1) and x(3)

x(1) and x(4)

x(2) and x(4)

Solution:

When building a multiple regression model, a close linear relationship between independent (explanatory) variables must be excluded, since it leads to the problem of multicollinearity. To this end, the linear correlation coefficient is checked for each pair of independent (explanatory) variables; these values form the matrix of pairwise linear correlation coefficients. It is considered that a pair correlation coefficient between explanatory variables exceeding 0.7 in absolute value reflects a close relationship between those variables (the closeness of the relationship with the variable y is not considered in this case); such independent variables are called collinear. If the pair correlation coefficient between explanatory variables does not exceed 0.7 in absolute value, those explanatory variables are not collinear. Consider the values of the pairwise interfactor correlation coefficients: between x(1) and x(2) the value is 0.45; between x(1) and x(3) it is 0.82; between x(1) and x(4) it is 0.94; between x(2) and x(3) it is 0.3; between x(2) and x(4) it is 0.7; between x(3) and x(4) it is 0.12. Thus, the values 0.45, 0.3 and 0.12 do not exceed 0.7, so the factors x(1) and x(2), x(2) and x(3), x(3) and x(4) are not collinear. Of these pairs, only x(2) and x(3) appears among the answer options, and it is the correct answer. For the other pairs, x(1) and x(3), x(1) and x(4), x(2) and x(4), the pairwise interfactor correlation coefficients reach or exceed 0.7, and those factors are collinear.

Topic 3: Dummy variables

1. Given a table of initial data for building an econometric regression model:

dummy variables are not

work experience

labor productivity

the level of education

employee skill level

Solution:

When building a regression model, a situation may arise in which, in addition to quantitative variables, it is necessary to include in the equation variables that reflect certain attributive features (gender, education, region, etc.). Such qualitative variables are called "dummy" variables. To build the model specified in the task, the dummy variables used are the level of education and the skill level of the employee. Of the proposed options, work experience and labor productivity are not dummy variables.

2. When studying the dependence of meat consumption on the level of income and gender of the consumer, we can recommend ...

use a dummy variable - the gender of the consumer

divide the population into two: for female consumers and for male consumers

use a dummy variable - income level

exclude from consideration the gender of the consumer, since this factor cannot be measured quantitatively

Solution:

When building a regression model, a situation may arise in which, in addition to quantitative variables, it is necessary to include in the equation variables that reflect certain attributive features (gender, education, region, etc.). Such qualitative variables are called "dummy" variables. They reflect the heterogeneity of the statistical population under study and are used for better modeling of dependencies over such heterogeneous observations. When modeling individual dependencies on heterogeneous data, one can also divide the entire heterogeneous population into several separate subpopulations, their number equal to the number of states of the dummy variable. Thus, the correct answers are: "use a dummy variable - the gender of the consumer" and "divide the population into two: for female consumers and for male consumers."

3. We study the dependence of the apartment price (y) on its living area (x) and the type of house. The model includes dummy variables reflecting the considered house types: monolithic, panel, brick. The regression equation obtained is: ,
where ,
Particular regression equations for brick and monolithic are ...

for house type brick

for house type monolithic

for house type brick

for house type monolithic

Decision:

It is required to find the particular regression equations for brick and monolithic houses. For a brick house, the dummy variables take the values … , … . The equation then takes the form … for a brick house.
For a monolithic house, the dummy variables take the values … , … . The equation then takes the form … for a monolithic house.
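A minimal sketch of how such dummy coding generates the particular equations; the 0/1 assignment of the dummies and the coefficients a, b, c1, c2 are hypothetical, since the fitted equation is not reproduced in the text:

```python
# Dummy coding for three house types with two 0/1 variables
# (the third type, panel, serves as the reference category).
# The assignment of z1, z2 and the coefficients a, b, c1, c2 below
# are hypothetical: the fitted equation is not reproduced in the text.
def dummies(house_type: str) -> tuple:
    z1 = int(house_type == "monolithic")
    z2 = int(house_type == "brick")
    return z1, z2

def price(x: float, house_type: str,
          a: float = 10.0, b: float = 2.0,
          c1: float = 3.0, c2: float = 5.0) -> float:
    # price = a + b*x + c1*z1 + c2*z2
    z1, z2 = dummies(house_type)
    return a + b * x + c1 * z1 + c2 * z2
```

Substituting z1 = 0, z2 = 1 yields the particular equation for a brick house (intercept a + c2), while z1 = 1, z2 = 0 yields the one for a monolithic house (intercept a + c1).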

The correlation coefficient reflects the degree of relationship between two indicators. It always takes a value between -1 and 1. If the coefficient is close to 0, the variables are said to be unrelated.

If the value is close to one (from about 0.9 upward), there is a strong direct relationship between the observed quantities. If the coefficient is close to the other end of the range (-1), there is a strong inverse relationship between the variables. When the value lies somewhere in between (between 0 and 1, or between 0 and -1), the relationship is weak (direct or inverse, respectively). Such a relationship is usually not taken into account: it is treated as absent.

Calculation of the correlation coefficient in Excel

Consider, for example, methods for calculating the correlation coefficient, features of the direct and inverse relationship between variables.

Values ​​of indicators x and y:

Y is the independent variable, x is the dependent variable. It is necessary to determine the strength (strong/weak) and the direction (direct/inverse) of the relationship between them. The formula for the correlation coefficient is:

r = Σ(x - x̄)(y - ȳ) / √( Σ(x - x̄)² · Σ(y - ȳ)² )

To simplify its understanding, we will break it down into several simple elements.

There is a strong direct relationship between the variables.

The built-in CORREL function avoids the cumbersome manual calculation. Let's calculate the pair correlation coefficient in Excel using it: open the Function Wizard, find CORREL, and supply as arguments the array of y values and the array of x values:
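Outside Excel, the same coefficient can be computed with NumPy (illustrative data, since the worksheet values are not reproduced here):

```python
import numpy as np

# Illustrative data (the worksheet values from the screenshot are not reproduced)
y = np.array([3.0, 5.0, 6.0, 8.0, 10.0, 12.0])
x = np.array([10.0, 14.0, 17.0, 22.0, 26.0, 31.0])

# Same value that Excel's CORREL(y_range, x_range) would return
r = np.corrcoef(y, x)[0, 1]
```

For this nearly linear data, r comes out close to 1, i.e. a strong direct relationship.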

Let's show the values ​​of the variables on the chart:


There is a strong relationship between y and x, because the lines run almost parallel to each other. The relationship is direct: as y increases, x increases; as y decreases, x decreases.



Matrix of Pairwise Correlation Coefficients in Excel

The correlation matrix is a table at whose row-column intersections the correlation coefficients between the corresponding values are placed. It makes sense to build it for several variables.

The matrix of correlation coefficients in Excel is built using the "Correlation" tool from the "Data Analysis" package.


A strong direct relationship was found between the values of y and x1. There is a strong inverse relationship between x1 and x2. There is practically no relationship with the values in the x3 column.

y x (1) x (2) x (3) x (4) x (5)
y 1.00 0.43 0.37 0.40 0.58 0.33
x (1) 0.43 1.00 0.85 0.98 0.11 0.34
x (2) 0.37 0.85 1.00 0.88 0.03 0.46
x (3) 0.40 0.98 0.88 1.00 0.03 0.28
x (4) 0.58 0.11 0.03 0.03 1.00 0.57
x (5) 0.33 0.34 0.46 0.28 0.57 1.00

An analysis of the matrix of paired correlation coefficients shows that the performance indicator is most closely related to the indicator x(4), the amount of fertilizers used per 1 ha (r(y, x(4)) = 0.58).

At the same time, the relationship between the explanatory variables is quite close. Thus, there is practically a functional relationship between the number of wheeled tractors (x(1)) and the number of surface tillage implements (x(3)): r = 0.98.

The presence of multicollinearity is also evidenced by the correlation coefficients r(x(1), x(2)) = 0.85 and r(x(2), x(3)) = 0.88. Given the close relationship of the indicators x(1), x(2) and x(3), only one of them can enter the yield regression model.
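The same analysis can be reproduced programmatically from the table above (a sketch; x(j) is written xj for brevity):

```python
import numpy as np

# Pairwise correlation matrix transcribed from the table above
names = ["y", "x1", "x2", "x3", "x4", "x5"]
R = np.array([
    [1.00, 0.43, 0.37, 0.40, 0.58, 0.33],
    [0.43, 1.00, 0.85, 0.98, 0.11, 0.34],
    [0.37, 0.85, 1.00, 0.88, 0.03, 0.46],
    [0.40, 0.98, 0.88, 1.00, 0.03, 0.28],
    [0.58, 0.11, 0.03, 0.03, 1.00, 0.57],
    [0.33, 0.34, 0.46, 0.28, 0.57, 1.00],
])

# Factor most closely related to the performance indicator y
best = names[1 + int(np.argmax(R[0, 1:]))]

# Collinear pairs among the factors: |r| > 0.7
pairs = [(names[i], names[j])
         for i in range(1, 6) for j in range(i + 1, 6)
         if abs(R[i, j]) > 0.7]
```

The screening picks out x4 as the factor most closely tied to y and flags the pairs (x1, x2), (x1, x3), (x2, x3) as collinear, in line with the conclusions above.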

To demonstrate the negative impact of multicollinearity, consider a yield regression model that includes all the inputs:

(In parentheses are the corrected estimates of the standard deviations of the estimates of the coefficients of the equation.)

Under the regression equation, the following adequacy characteristics are given: the multiple coefficient of determination, the corrected estimate of the residual variance, the average relative approximation error, and the calculated value of the F-criterion, F_obs = 121.

The regression equation is significant, because F_obs = 121 > F_cr = 2.85, found from the table of the F-distribution at α = 0.05, ν1 = 6 and ν2 = 14.

It follows that θ ≠ 0, i.e. at least one of the coefficients θj (j = 0, 1, 2, ..., 5) of the equation is nonzero.

To test the hypothesis of the significance of an individual regression coefficient, H0: θj = 0, j = 1, 2, 3, 4, 5, we compare the critical value t_cr = 2.14, found from the table of the t-distribution at significance level α = 2Q = 0.05 and ν = 14 degrees of freedom, with the calculated value of the t-statistic. It follows from the equation that only the regression coefficient at x(4) is statistically significant, since |t4| = 2.90 > t_cr = 2.14.
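The tabulated critical values quoted here can be reproduced with SciPy (a sketch; the text itself uses printed tables):

```python
from scipy import stats

alpha = 0.05

# Critical value of the F-distribution for the overall significance test
F_crit = stats.f.ppf(1 - alpha, dfn=6, dfd=14)   # ≈ 2.85

# Two-sided critical value of the t-distribution for individual coefficients
t_crit = stats.t.ppf(1 - alpha / 2, df=14)       # ≈ 2.14
```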



The regression coefficients at x(1) and x(5) have negative signs. From these negative values it would follow that increasing the saturation of agriculture with wheeled tractors (x(1)) and plant protection products (x(5)) lowers the yield. Thus, the resulting regression equation is unacceptable.

To obtain a regression equation with significant coefficients, we use a stepwise regression analysis algorithm. First, we apply the stepwise algorithm with elimination of variables.

We exclude from the model the variable x(1), which corresponds to the minimum absolute value |t1| = 0.01. For the remaining variables we again construct the regression equation:

The resulting equation is significant, because F_obs = 155 > F_cr = 2.90, found at significance level α = 0.05 and degrees of freedom ν1 = 5 and ν2 = 15 from the table of the F-distribution, i.e. the vector θ ≠ 0. However, only the regression coefficient at x(4) is significant in this equation: the calculated values |tj| for the other coefficients are less than t_cr = 2.131, found from the table of the t-distribution at α = 2Q = 0.05 and ν = 15.

We exclude from the model the variable x(3), which corresponds to the minimum value t3 = 0.35, and obtain the regression equation:

(2.9)

In the resulting equation the coefficient at x(5) is not statistically significant and cannot be economically interpreted. Excluding x(5), we obtain the regression equation:

(2.10)

We have obtained a significant regression equation with significant and interpretable coefficients.
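The elimination procedure described above can be sketched as follows (a simplified illustration, not the textbook's exact computation; the helper name backward_eliminate is assumed):

```python
import numpy as np

def backward_eliminate(X, y, names, t_crit):
    """Drop the regressor with the smallest |t| until all remaining ones
    are significant (a sketch of stepwise elimination of variables)."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while names:
        A = np.column_stack([np.ones(len(y)), X])       # add intercept column
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        dof = len(y) - A.shape[1]
        s2 = resid @ resid / dof                        # residual variance
        se = np.sqrt(np.diag(s2 * np.linalg.inv(A.T @ A)))
        t = beta[1:] / se[1:]                           # t-ratios (no intercept)
        j = int(np.argmin(np.abs(t)))
        if abs(t[j]) >= t_crit:                         # everything significant
            break
        X = np.delete(X, j, axis=1)
        names.pop(j)
    return names

# Illustrative use: y depends on x1; x2 is a redundant regressor
x1 = np.arange(1.0, 7.0)
x2 = x1 ** 2
e = 0.1 * np.array([1.0, -3.0, 2.0, 2.0, -3.0, 1.0])   # small disturbances
y = 2.0 + 3.0 * x1 + e
kept = backward_eliminate(np.column_stack([x1, x2]), y, ["x1", "x2"], t_crit=2.1)
```

On this toy data the insignificant regressor x2 is eliminated and only x1 is kept.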

However, the resulting equation is not the only “good” or “best” yield model in our example.

Let us show that under multicollinearity the stepwise algorithm with inclusion of variables is more efficient. At the first step the variable x(4) enters the yield model y, since it has the highest correlation coefficient with the explained variable: r(y, x(4)) = 0.58. At the second step, including in the equation, along with x(4), the variable x(1) or x(3), we obtain models that are superior to (2.10) both for economic reasons and in statistical characteristics:

(2.11)

(2.12)

The inclusion of any of the three remaining variables in the equation worsens its properties. See, for example, equation (2.9).

Thus, we have three “good” yield models, from which one must be chosen for economic and statistical reasons.

According to statistical criteria, model (2.11) is the most adequate: it corresponds to the minimum values of the residual variance (2.26) and the average relative approximation error, and the largest values of the coefficient of determination and F_obs = 273.

Model (2.12) has somewhat worse adequacy indicators, followed by model (2.10).

We will now choose the best of models (2.11) and (2.12). These models differ in the variables x(1) and x(3). However, in yield models the variable x(1) (number of wheeled tractors per 100 ha) is preferable to the variable x(3) (number of surface tillage implements per 100 ha), which is somewhat secondary (derived from x(1)).

For this reason, on economic grounds, preference should be given to model (2.12). Thus, after applying the stepwise regression analysis algorithm with inclusion of variables, and taking into account that only one of the three closely related variables (x(1), x(2) or x(3)) should enter the equation, we choose the final regression equation:

The equation is significant at α = 0.05, because F_obs = 266 > F_cr = 3.20, found from the table of the F-distribution at α = Q = 0.05, ν1 = 3 and ν2 = 17. All the regression coefficients in the equation are also significant, with |tj| > t_cr(α = 2Q = 0.05; ν = 17) = 2.11. The regression coefficient θ1 should be recognized as significant (θ1 ≠ 0) for economic reasons, although t1 = 2.09 is only slightly less than t_cr = 2.11.

It follows from the regression equation that an increase of one in the number of tractors per 100 ha of arable land (with the value of x(4) fixed) leads to an increase in grain yield by an average of 0.345 centners/ha.

An approximate calculation of the elasticity coefficients, e1 ≈ 0.068 and e2 ≈ 0.161, shows that when the indicators x(1) and x(4) increase by 1%, the grain yield increases on average by 0.068% and 0.161%, respectively.
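The elasticity formula used here is the standard e_j = b_j · x̄_j / ȳ; below is a sketch with the slope 0.345 from the equation above and hypothetical sample means, since the text does not reproduce x̄ and ȳ:

```python
# Approximate elasticity: e_j = b_j * mean(x_j) / mean(y)
def elasticity(b_j: float, x_mean: float, y_mean: float) -> float:
    return b_j * x_mean / y_mean

# Hypothetical illustration: b1 = 0.345 (from the equation above),
# with assumed sample means x1_mean = 2.0 and y_mean = 10.0
e1 = elasticity(0.345, 2.0, 10.0)   # 0.069: a 1% rise in x1 -> ~0.07% rise in y
```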

The multiple coefficient of determination indicates that only 46.9% of the yield variation is explained by the indicators included in the model (x(1) and x(4)), that is, by the saturation of crop production with tractors and fertilizers. The rest of the variation is due to unaccounted factors (x(2), x(3), x(5), weather conditions, etc.). The average relative approximation error, like the residual variance, characterizes the adequacy of the model. When interpreting the regression equation, the values of the relative approximation errors di are of interest. Recall that the model value of the effective indicator characterizes the average yield for the set of districts considered, provided that the explanatory variables x(1) and x(4) are fixed at the same level, namely x(1) = xi(1) and x(4) = xi(4). The yields can then be compared by the values of di: districts with di > 0 have above-average yields, and those with di < 0 below-average yields.

In our example, crop production is most efficient in the district corresponding to d7 = 28%, where the yield is 28% higher than the average for the region, and least efficient in the district with d20 = -27.3%.


Tasks and exercises

2.1. From the general population (y, x(1), ..., x(p)), where y has a normal distribution with conditional mathematical expectation and variance σ², a random sample of size n is drawn; let (yi, xi(1), ..., xi(p)) be the result of the i-th observation (i = 1, 2, ..., n). Determine: a) the mathematical expectation of the least-squares estimate of the vector θ; b) the covariance matrix of the least-squares estimate of the vector θ; c) the mathematical expectation of the estimate.

2.2. Under the conditions of problem 2.1, find the mathematical expectation of the sum of squared deviations due to regression, i.e. E(Q_R), where

.

2.3. Under the conditions of problem 2.1, determine the mathematical expectation of the sum of squared deviations due to the residual variation about the regression line, i.e. E(Q_res), where

2.4. Prove that under the hypothesis H0: θ = 0 the statistic

has an F-distribution with ν1 = p + 1 and ν2 = n - p - 1 degrees of freedom.

2.5. Prove that when the hypothesis H0: θj = 0 holds, the statistic has a t-distribution with ν = n - p - 1 degrees of freedom.

2.6. Based on the data (Table 2.3) on the dependence of fodder bread shrinkage (y) on the duration of storage (x), find a point estimate of the conditional mathematical expectation under the assumption that the general regression equation is linear.

Table 2.3.

It is required: a) to find the estimates and the residual variance s² under the assumption that the general regression equation has the form …; b) to check at α = 0.05 the significance of the regression equation, i.e. the hypothesis H0: θ = 0; c) with confidence γ = 0.9 to determine interval estimates of the parameters θ0, θ1; d) with confidence γ = 0.95 to determine the interval estimate of the conditional expectation at x0 = 6; e) to determine at γ = 0.95 the prediction confidence interval at the point x = 12.

2.7. Based on the data on the dynamics of the growth rate of the share price over 5 months, given in Table 2.4,

Table 2.4.

months ( x)
y (%)

and the assumption that the general regression equation has the form …, it is required: a) to determine the estimates of the parameters of the regression equation and the residual variance s²; b) to check at α = 0.01 the significance of the regression coefficient, i.e. the hypothesis H0: θ1 = 0; c) with confidence γ = 0.95 to find interval estimates of the parameters θ0 and θ1; d) with confidence γ = 0.9 to establish an interval estimate of the conditional mathematical expectation at x0 = 4; e) to determine at γ = 0.9 the prediction confidence interval at the point x = 5.

2.8. The results of the study of the dynamics of weight gain in young animals are given in Table 2.5.

Table 2.5.

Assuming that the general regression equation is linear, it is required: a) to determine the estimates of the parameters of the regression equation and the residual variance s²; b) to check at α = 0.05 the significance of the regression equation, i.e. the hypothesis H0: θ = 0; c) with confidence γ = 0.8 to find interval estimates of the parameters θ0 and θ1; d) with confidence γ = 0.98 to determine and compare the interval estimates of the conditional mathematical expectation at x0 = 3 and x1 = 6; e) to determine at γ = 0.98 the prediction confidence interval at the point x = 8.

2.9. The cost (y) of one copy of a book, depending on the print run (x) (thousand copies), is characterized by data collected by the publisher (Table 2.6). Determine the least-squares estimates of the parameters of the hyperbolic regression equation, and with confidence γ = 0.9 construct confidence intervals for the parameters θ0 and θ1, as well as for the conditional expectation at x = 10.

Table 2.6.

2.10. Determine the estimates of the parameters of a regression equation of the given type at x = 20.

2.11. Table 2.8 gives the growth rates (%) of the following macroeconomic indicators for n = 10 developed countries of the world for 1992: GNP - x(1), industrial production - x(2), price index - x(3).

Table 2.8.

It is required: a) to determine the estimates of the parameters of the regression equation and the residual variance; b) to check at α = 0.05 the significance of the regression coefficient, i.e. H0: θ1 = 0; c) with confidence γ = 0.9 to find interval estimates of θ0 and θ1; d) to find at γ = 0.95 the confidence interval at the point x0 = xi, where i = 5; e) to compare the statistical characteristics of regression equations 1, 2 and 3.

2.12. Solve problem 2.11, taking the indicator x(1) as the explained variable (y) and the variable x(3) as the explanatory variable (x).



APPENDICES


Appendix 1. Options for tasks for independent computer research.

1. Calculate the matrix of paired correlation coefficients; analyze the closeness and direction of the relationship between the resulting feature Y and each of the factors X; evaluate the statistical significance of the correlation coefficients r(Y, Xi); choose the most informative factor.

2. Build a paired regression model with the most informative factor; give an economic interpretation of the regression coefficient.

3. Evaluate the quality of the model using the average relative approximation error, the coefficient of determination and Fisher's F-test (take the significance level α = 0.05).

4. With confidence probability γ = 80%, predict the average value of the indicator Y (forecast values of the factors are given in Appendix 6). Present graphically the actual and model values of Y and the prediction results.

5. Using the inclusion method, build two-factor models, keeping the most informative factor in them; build a three-factor model with a complete list of factors.

6. Choose the best of the built multiple models. Give an economic interpretation of its coefficients.

7. Check the significance of the multiple regression coefficients using Student's t-test (take significance level α = 0.05). Has the quality of the multiple model improved compared to the pair model?

8. Assess the influence of factors on the result using elasticity coefficients, beta and delta coefficients.
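Item 8's beta coefficient can be sketched as follows (the standard formula β_j = b_j · s_xj / s_y; the sample data and the slope below are hypothetical):

```python
import statistics as st

def beta_coefficient(b_j, x_sample, y_sample):
    # Standardized (beta) coefficient: how many standard deviations of y
    # correspond to a one-standard-deviation change in x_j
    return b_j * st.stdev(x_sample) / st.stdev(y_sample)

# Hypothetical data for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 5.9, 8.0, 10.1]
b = 1.97                      # assumed pair-regression slope for these data
beta = beta_coefficient(b, x, y)   # close to 1: x is practically the sole driver
```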

Task 2. Modeling a one-dimensional time series

Appendix 7 shows time series Y(t) of socio-economic indicators for the Altai Territory for the period from 2000 to 2011. It is required to study the dynamics of the indicator corresponding to the task variant.

Option Designation, name, unit of measurement of the indicator
Y1 Average consumer spending per capita (per month), rub.
Y2 Emissions of pollutants into the atmospheric air, thousand tons
Y3 Average prices in the secondary housing market (at the end of the year, per square meter of total area), rub
Y4 Volume of paid services per capita, rub
Y5 Average annual number of people employed in the economy, thousand people
Y6 Number of own cars per 1000 people (at the end of the year), units
Y7 Average per capita cash income (per month), rub
Y8 Consumer price index (December to December of the previous year), %
Y9 Investments in fixed assets (in actual prices), million rubles
Y10 Retail trade turnover per capita (in actual prices), rub


Work order

1. Build a linear model of the time series, whose parameters are estimated by least squares. Explain the meaning of the regression coefficient.

2. Assess the adequacy of the constructed model using the properties of randomness, independence, and compliance of the residual component with the normal distribution law.

3. Evaluate the accuracy of the model based on the use of the average relative approximation error.

4. Forecast the indicator under consideration for a year ahead (calculate the forecast interval with a confidence level of 70%).

5. Present graphically the actual values ​​of the indicator, the results of modeling and forecasting.

6. Calculate the parameters of the logarithmic, polynomial (polynomial of the 2nd degree), power, exponential and hyperbolic trends. Based on the graphic image and the value of the determination index, select the most appropriate type of trend.

7. With the help of the best non-linear model, carry out point forecasting of the considered indicator for the year ahead. Compare the result obtained with the predictive confidence interval built using the linear model.
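Steps 1 and 6 of the work order can be sketched with NumPy (an illustrative noise-free series, not the Appendix 7 data):

```python
import numpy as np

t = np.arange(1, 13, dtype=float)        # 12 yearly observations
y = 2.0 + 3.0 * t + 0.5 * t ** 2         # illustrative series (no noise)

# Linear trend by least squares
lin = np.polyfit(t, y, 1)
y_lin = np.polyval(lin, t)

# 2nd-degree polynomial trend
quad = np.polyfit(t, y, 2)
y_quad = np.polyval(quad, t)

def r_squared(y, fitted):
    # determination index used to compare trend specifications
    ss_res = ((y - fitted) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

r2_lin, r2_quad = r_squared(y, y_lin), r_squared(y, y_quad)

# Point forecast one year ahead with the better trend
y_next = np.polyval(quad, 13.0)
```

For this quadratic series the determination index favors the 2nd-degree polynomial trend over the linear one, which is how step 6 selects the most appropriate trend type.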

EXAMPLE

Performing control work

Task 1

The company sells used cars. The names of indicators and initial data for econometric modeling are presented in the table:

Selling price, thousand c.u. (Y)   Price of a new car, thousand c.u. (X1)   Service life, years (X2)   Left-hand drive = 1, right-hand drive = 0 (X3)
8.33   13.99   3.8
10.40   19.05   2.4
10.60   17.36   4.5
16.58   25.00   3.5
20.94   25.45   3.0
19.13   31.81   3.5
13.88   22.53   3.0
8.80   16.24   5.0
13.89   16.54   2.0
11.03   19.04   4.5
14.88   22.61   4.6
20.43   27.56   4.0
14.80   22.51   3.3
26.05   31.75   2.3

Required:

1. Calculate the matrix of paired correlation coefficients; analyze the tightness and direction of the relationship of the resulting feature Y with each of the factors X; evaluate the statistical significance of the correlation coefficients r(Y, X i); choose the most informative factor.

Using Excel (Data / Data Analysis / CORRELATION):

Let's get a matrix of pair correlation coefficients between all available variables:

      Y          X1         X2         X3
Y     1
X1    0.910987   1
X2    -0.4156    -0.2603    1
X3    0.190785   0.221927   -0.30308   1

Let's analyze the correlation coefficients between the resulting feature Y and each of the factors Xj:

r(Y, X1) = 0.911 > 0; therefore, there is a direct correlation between the variables Y and X1: the higher the price of a new car, the higher the selling price.

|r(Y, X1)| = 0.911 > 0.7: this dependence is close.

r(Y, X2) = -0.416 < 0, which means that an inverse correlation is observed between the variables Y and X2: the selling price is lower for automobiles with a long service life.

|r(Y, X2)| = 0.416: this dependence is moderate, closer to weak.

r(Y, X3) = 0.191 > 0, so a direct correlation is observed between the variables Y and X3: the selling price is higher for left-hand-drive cars.

|r(Y, X3)| = 0.191 < 0.4: this dependence is weak.

To check the significance of the found correlation coefficients, we use Student's test.

For each correlation coefficient we compute the t-statistic by the formula t = |r|·√(n - 2) / √(1 - r²) and enter the results in an additional column of the correlation table:

      Y          X1         X2         X3      t-statistic
Y     1
X1    0.910987   1                             7.651524603
X2    -0.4156    -0.2603    1                  1.582847988
X3    0.190785   0.221927   -0.30308   1       0.673265587
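The t-statistics in the table above follow the standard formula t = |r|·√(n - 2)/√(1 - r²) with n = 14; a sketch reproducing them:

```python
import math

n = 14  # number of cars in the sample

def t_stat(r: float) -> float:
    # Significance statistic for a correlation coefficient:
    # t = |r| * sqrt(n - 2) / sqrt(1 - r^2)
    return abs(r) * math.sqrt(n - 2) / math.sqrt(1 - r * r)

t1 = t_stat(0.910987)   # ≈ 7.65  (Y vs X1)
t2 = t_stat(-0.4156)    # ≈ 1.58  (Y vs X2)
t3 = t_stat(0.190785)   # ≈ 0.67  (Y vs X3)
```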

From the table of critical points of Student's distribution, at the chosen significance level (α = 0.05) and ν = n - 2 = 12 degrees of freedom, we determine the critical value t_cr ≈ 2.18 (Appendix 1, or the STUDRASP function). Since |t1| = 7.65 > t_cr, the coefficient r(Y, X1) is significant: the dependence between the selling price Y and the price of a new car X1 is reliable. |t2| = 1.58 < t_cr, so the coefficient r(Y, X2) is not significant: based on the sample data, there are no grounds to assert that the dependence between the selling price Y and the service life X2 is reliable.

|t3| = 0.67 < t_cr, so the coefficient r(Y, X3) is not significant either: there are no grounds to assert that the dependence between the selling price Y and the steering-wheel position X3 is reliable.

Thus, the closest and most significant relationship is observed between the selling price Y and the price of a new car X1; the factor X1 is the most informative.

Multiple regression is not the result of a transformation of the equation:

-
;

-
.

Linearization implies a procedure...

- reducing the multiple regression equation to a pair regression equation;

+ reducing a nonlinear equation to a linear form;

- reducing a linear equation to a nonlinear form;

- reducing an equation that is nonlinear in the parameters to an equation that is linear in the result.

The residuals do not change;

The number of observations decreases

In a standardized multiple regression equation, the variables are:

Initial variables;

Standardized parameters;

Mean values ​​of initial variables;

standardized variables.

One method for assigning numeric values ​​to dummy variables is. . .

+ ranking;

arranging numerical values in ascending order;

arranging numerical values in descending order;

finding the mean.

The matrix of paired correlation coefficients displays the values ​​of the pairwise linear correlation coefficients between. . . .

Variables;

parameters;

Parameters and variables;

Variable and random factors.

The method for estimating the parameters of models with heteroscedastic residuals is called the ____________ least squares method:

Ordinary;

Indirect;

generalized;

Minimum.

The regression equation is given. Define the model specification.

Polynomial Pair Regression Equation;

Linear simple regression equation;

Polynomial equation of multiple regression;

Linear multiple regression equation.

In a standardized equation, the free term is ….

Equals 1;

Equal to the coefficient of multiple determination;

Equal to the multiple correlation coefficient;

Is absent.

Factors included in the multiple regression model as dummy variables are those ...

Having probabilistic values;

Having quantitative values;

Not having qualitative values;

Not having quantitative values.

The factors of an econometric model are collinear if the coefficient of ...

correlation between them exceeds 0.7 in absolute value;

determination between them exceeds 0.7 in absolute value;

determination between them is less than 0.7 in absolute value;

The generalized least squares method differs from ordinary least squares in that, when using GLS, ...

the original levels of the variables are transformed;

the residuals do not change;

the residuals are equal to zero;

the number of observations decreases.

The sample size is determined by ...

the number of values of the variables selected in the sample;

the volume of the general population;

the number of parameters for the independent variables;

the number of result variables.

11. Multiple regression is not the result of a transformation of the equation:

+
;

-
;

-
.

The initial values of dummy variables are ...

qualitative;

quantitatively measurable;

identical;

numerical values.

The generalized least squares method implies ...

transformation of the variables;

transition from multiple regression to pair regression;

linearization of the regression equation;

two-stage application of the least squares method.

The linear multiple regression equation has the form … . Determine which factor, … or …, has the stronger influence:

+ …, since 3.7 > 2.5;

they have the same effect;

- …, since 2.5 > -3.7;

from this equation it is impossible to answer the question posed, since the regression coefficients are not comparable with one another.

The inclusion of a factor in the model is advisable if the regression coefficient for this factor is ...

zero;

insignificant;

significant;

inessential.

What is transformed when applying the generalized least squares method?

Standardized regression coefficients;

Variance of the resulting variable;

Initial levels of the variables;

Variance of the factor variable.

The dependence of an enterprise employee's output on a number of factors is being studied. An example of a dummy variable in this model would be the employee's ______.

Age;

The level of education;

Wage.

The transition from point estimation to interval estimation is possible if the estimates are:

Efficient and inconsistent;

Inefficient and consistent;

Efficient and unbiased;

Consistent and biased.

A matrix of pairwise correlation coefficients is built to identify collinear and multicollinear …

Parameters;

Random factors;

Significant factors;

Results.

Based on the transformation of variables using the generalized least squares method, we obtain a new regression equation, which is:

Weighted regression in which the variables are taken with weights ;

;

Nonlinear regression in which variables are taken with weights
;

Weighted regression in which variables are taken with weights .
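The weighting idea behind GLS can be sketched numerically. Everything below is hypothetical (the names `x`, `y`, `sigma` are mine): the residual standard deviation is assumed known and proportional to the factor, and dividing every variable, including the intercept column, by sigma_i turns GLS into ordinary least squares on the transformed data.

```python
import numpy as np

# Hypothetical data: residual standard deviation assumed proportional to x
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(0, x)   # heteroskedastic noise, sd = x

# GLS here = weighted least squares: divide every column of the design
# matrix (including the intercept) and y by sigma_i, then apply OLS
sigma = x                              # assumed known error scale
X = np.column_stack([np.ones_like(x), x])
Xw = X / sigma[:, None]
yw = y / sigma
b_gls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
print(b_gls)                           # estimates of intercept and slope
```

The transformed equation is the "weighted regression" the question refers to: observations with small residual variance receive large weights.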

If the calculated value of the Fisher criterion is less than the tabular value, then the hypothesis of the statistical insignificance of the equation ...

Rejected;

Insignificant;

Accepted;

Not essential.
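The decision rule can be illustrated with a short sketch. The sample size, number of regressors, and R-squared below are hypothetical; the F statistic is computed from R-squared in the standard way and compared with the tabular (critical) value.

```python
from scipy.stats import f

# Hypothetical multiple regression: n observations, k regressors, given R^2
n, k, r2 = 20, 2, 0.75
F_calc = (r2 / (1 - r2)) * (n - k - 1) / k       # observed F statistic
F_table = f.ppf(0.95, dfn=k, dfd=n - k - 1)      # critical value, alpha = 0.05

# If F_calc < F_table, the hypothesis that the equation is statistically
# insignificant is accepted; otherwise it is rejected
decision = "accepted" if F_calc < F_table else "rejected"
print(F_calc, F_table, decision)
```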

If the factors are included in the model as a product, then the model is called:

Total;

Derivative;

Additive;

Multiplicative.
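A multiplicative (power) model is the typical case where factors enter as a product; taking logarithms makes it linear in the parameters, so least squares applies. A minimal sketch with hypothetical, noise-free data:

```python
import numpy as np

# Multiplicative model y = a * x1^b1 * x2^b2 becomes linear in logs:
# ln y = ln a + b1*ln x1 + b2*ln x2
rng = np.random.default_rng(1)
x1 = rng.uniform(1, 5, 100)
x2 = rng.uniform(1, 5, 100)
y = 2.0 * x1**1.5 * x2**0.5            # no noise, so recovery is exact

X = np.column_stack([np.ones(100), np.log(x1), np.log(x2)])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
a, b1, b2 = np.exp(coef[0]), coef[1], coef[2]
print(a, b1, b2)                       # ≈ 2.0, 1.5, 0.5
```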

The regression equation that relates the resulting feature to one of the factors with the value of other variables fixed at the average level is called:

Multiple;

Essential;

Partial;

Insignificant.

According to the number of factors included in the regression equation, one distinguishes ...

Linear and non-linear regression;

Direct and indirect regression;

Simple and multiple regression;

Multiple and multivariate regression.

The requirement for regression equations whose parameters can be found using the least squares method is:

Equality to zero of the values of the factor variable;

Nonlinearity in the parameters;

Equality to zero of the average values of the resulting variable;

Linearity in the parameters.

The least squares method is not applicable for ...

Linear equations of pair regression;

Polynomial multiple regression equations;

Equations that are non-linear in terms of the estimated parameters;

Linear equations of multiple regression.

When dummy variables are included in the model, they are assigned ...

Null values;

Numeric labels;

Same values;

Quality labels.

If there is a non-linear relationship between economic indicators, then ...

It is not practical to use the specification of a non-linear regression equation;

It is advisable to use the specification of a non-linear regression equation;

It is advisable to use the specification of a linear paired regression equation;

It is necessary to include other factors in the model and use a linear multiple regression equation.

The result of the linearization of polynomial equations is ...

Nonlinear Pair Regression Equations;

Linear equations of pair regression;

Nonlinear multiple regression equations;

Linear equations of multiple regression.

In the standardized multiple regression equation the coefficients are 0.3 and -2.1. Determine which factor, or , has a stronger effect on :

+- , since 2.1>0.3;

According to this equation, it is impossible to answer the question posed, since the values of the "pure" regression coefficients are unknown;

- , since 0.3>-2.1;

According to this equation, it is impossible to answer the question posed, since the standardized coefficients are not comparable with each other.

The factor variables of a multiple regression equation converted from qualitative to quantitative are called ...

Anomalous;

Multiple;

Paired;

Fictitious.

Estimates of the parameters of the linear equation of multiple regression can be found using the method:

Medium squares;

The largest squares;

Normal squares;

Least squares.

The main requirement for the factors included in the multiple regression model is:

Lack of relationship between result and factor;

Lack of relationship between factors;

Lack of linear relationship between factors;

The presence of a close relationship between factors.

Dummy variables are included in the multiple regression equation to take into account the effect of features on the result ...

Of a qualitative nature;

Of a quantitative nature;

Of a nonessential nature;

Of a random nature.

From a pair of collinear factors, the econometric model includes the factor

Which, with a fairly close connection with the result, has the greatest connection with other factors;

Which, in the absence of connection with the result, has the maximum connection with other factors;

Which, in the absence of a connection with the result, has the least connection with other factors;

Which, with a fairly close relationship with the result, has a weaker relationship with the other factors.

Heteroskedasticity refers to...

The constancy of the variance of the residuals, regardless of the value of the factor;

The dependence of the mathematical expectation of the residuals on the value of the factor;

Dependence of the variance of residuals on the value of the factor;

Independence of the mathematical expectation of the residuals from the value of the factor.
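The definition above (residual variance depending on the value of the factor) can be made concrete with a simulated example; all values here are hypothetical:

```python
import numpy as np

# Simulated illustration: the residual variance grows with the factor x,
# which is the defining feature of heteroskedasticity
rng = np.random.default_rng(2)
x = np.linspace(1, 10, 2000)
resid = rng.normal(0, 0.5 * x)         # sd of the residual depends on x

var_low = resid[x < 4].var()           # residual variance where x is small
var_high = resid[x > 7].var()          # residual variance where x is large
print(var_low, var_high)               # var_high is markedly larger
```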

The value of the residual variance when a significant factor is included in the model:

Will not change;

Will increase;

Will be zero;

Will decrease.

If the specification of the model displays a nonlinear form of dependence between economic indicators, then a nonlinear equation of ______ is used.

Regression;

Determination;

Correlation;

Approximation.

A dependence characterized by a linear multiple regression equation is investigated. For this equation, the tightness of the relationship between the resulting variable and the set of factors has been calculated. The indicator used was the multiple coefficient of ...

Correlation;

Elasticity;

Regression;

Determination.

A model of the dependence of demand on a number of factors is being built. A dummy variable in this multiple regression equation is not the consumer's _________.

Family status;

The level of education;

For a significant parameter, the calculated value of Student's t criterion is ...

Greater than the tabular value of the criterion;

Equal to zero;

Not greater than the tabular value of Student's criterion;

Less than the tabular value of the criterion.

The system of normal equations constructed to estimate the parameters of a linear multiple regression equation can be solved by ...

The moving average method;

The method of determinants;

The method of first differences;

The simplex method.
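The method of determinants is Cramer's rule applied to the normal equations X'Xb = X'y. A sketch for a hypothetical two-factor model with noise-free data:

```python
import numpy as np

# Hypothetical data for y = b0 + b1*x1 + b2*x2
rng = np.random.default_rng(3)
x1 = rng.uniform(0, 1, 30)
x2 = rng.uniform(0, 1, 30)
y = 1.0 + 2.0 * x1 - 1.0 * x2

X = np.column_stack([np.ones(30), x1, x2])
A = X.T @ X                  # matrix of the normal equations X'X b = X'y
c = X.T @ y                  # right-hand side

# Method of determinants (Cramer's rule): b_j = det(A_j) / det(A),
# where A_j is A with column j replaced by the right-hand side
detA = np.linalg.det(A)
b = np.empty(3)
for j in range(3):
    Aj = A.copy()
    Aj[:, j] = c
    b[j] = np.linalg.det(Aj) / detA
print(b)                     # ≈ [1.0, 2.0, -1.0]
```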

An indicator characterizing by how many sigmas the result will change, on average, when the corresponding factor changes by one sigma with the level of the other factors unchanged, is called the ____________ regression coefficient.

standardized;

Normalized;

Aligned;

Centered.
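The standardized coefficient described above is beta_j = b_j * sd(x_j) / sd(y). A sketch on simulated (hypothetical) data:

```python
import numpy as np

# Standardized coefficients show by how many sigmas y changes when
# x_j changes by one sigma: beta_j = b_j * sd(x_j) / sd(y)
rng = np.random.default_rng(4)
x1 = rng.normal(0, 2, 200)
x2 = rng.normal(0, 5, 200)
y = 3.0 * x1 + 0.4 * x2 + rng.normal(0, 1, 200)

X = np.column_stack([np.ones(200), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]   # "pure" regression coefficients
beta1 = b[1] * x1.std() / y.std()
beta2 = b[2] * x2.std() / y.std()
print(beta1, beta2)
```

Although b2 = 0.4 looks small next to b1 = 3.0, the standardized coefficients make the two factors directly comparable.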

The multicollinearity of the factors of the econometric model implies…

The presence of a non-linear relationship between the two factors;

The presence of a linear relationship between more than two factors;

Lack of dependence between factors;

The presence of a linear relationship between the two factors.

Generalized least squares is not used for models with _______ residuals.

Autocorrelated and heteroscedastic;

homoscedastic;

heteroskedastic;

Autocorrelated.

The method for assigning numeric values to dummy variables is not:

Ranking;

Assignment of digital labels;

Finding the average value;

Assignment of quantitative values.

Normally distributed residuals;

Homoscedastic residuals;

Autocorrelated residuals;

Autocorrelation of the resulting variable.

The selection of factors in a multiple regression model using the inclusion method is based on a comparison of the values of ...

The total variance before and after including the factor in the model;

Residual variance before and after including random factors in the model;

Variances before and after inclusion of the result in the model;

Residual variance before and after including the factor in the model.

The generalized least squares method is used to correct...

Parameters of the nonlinear regression equation;

The accuracy of determining the coefficient of multiple correlation;

Autocorrelation between independent variables;

Heteroskedasticity of residuals in the regression equation.

After applying the generalized least squares method, it is possible to avoid _________ residuals

heteroskedasticity;

Normal distribution;

A sum equal to zero;

Random character.

Dummy variables are included in ____________ regression equations.

Random;

Paired;

Indirect;

Multiple.

The interaction of the factors of the econometric model means that…

The influence of one factor on the resulting variable depends on the values of another, non-collinear factor;

The influence of the factors on the resulting variable increases, starting from a certain level of the factor values;

Factors duplicate each other's influence on the result;

The influence of one of the factors on the resulting variable does not depend on the values of the other factor.

Topic: Multiple Regression (Problems)

The regression equation, built on 15 observations, has the form:

The missing values, as well as the confidence interval for

with a probability of 0.99 are:

The regression equation, built on 20 observations, has the form:

with a probability of 0.9 are:

The regression equation, built on 16 observations, has the form:

The missing values, as well as the confidence interval for with a probability of 0.99, are:

The regression equation in a standardized form is:

The partial elasticity coefficients are equal to:

The standardized regression equation is:

The partial elasticity coefficients are equal to:

The standardized regression equation is:

The partial elasticity coefficients are equal to:

The standardized regression equation is:

The partial elasticity coefficients are equal to:

The standardized regression equation is:

The partial elasticity coefficients are equal to:

Based on 18 observations, the following data were obtained:

;
;
;
;

The values of the adjusted coefficient of determination, the partial coefficients of elasticity, and the parameter are equal to:

Based on 17 observations, the following data were obtained:

;
;
;
;

The values of the adjusted coefficient of determination, the partial coefficients of elasticity, and the parameter are equal to:

Based on 22 observations, the following data were obtained:

;
;
;
;

The values of the adjusted coefficient of determination, the partial coefficients of elasticity, and the parameter are equal to:

Based on 25 observations, the following data were obtained:

;
;
;
;

The values of the adjusted coefficient of determination, the partial coefficients of elasticity, and the parameter are equal to:

Based on 24 observations, the following data were obtained:

;
;
;
;

The values of the adjusted coefficient of determination, the partial coefficients of elasticity, and the parameter are equal to:

Based on 28 observations, the following data were obtained:

;
;
;
;

The values of the adjusted coefficient of determination, the partial coefficients of elasticity, and the parameter are equal to:

Based on 26 observations, the following data were obtained:

;
;
;
;

The values of the adjusted coefficient of determination, the partial coefficients of elasticity, and the parameter are equal to:

In the regression equation:

Restore the missing characteristics and construct a confidence interval for with a probability of 0.95, given n = 12.
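Since the numerical characteristics of this equation are missing from the source, the interval construction can only be sketched with hypothetical values for the coefficient and its standard error; the degrees of freedom assume two regressors.

```python
from scipy.stats import t

# Hypothetical values (the actual ones are missing from the source):
b, se = 2.5, 0.8          # estimated coefficient and its standard error
n, k = 12, 2              # 12 observations, 2 regressors -> df = n - k - 1
df = n - k - 1
t_crit = t.ppf(0.975, df)               # two-sided 95% critical value
ci = (b - t_crit * se, b + t_crit * se)
print(ci)                               # roughly (0.69, 4.31)
```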