Abstract: Econometrics cheat sheet. Multiple regression model specification. Model specification: choosing the regression equation

Depending on the number of factors included in the regression equation, paired and multiple regressions are distinguished.

An equation relating two variables, y and x, is called pairwise (paired) regression, and the dependence of y on several explanatory variables $x = (x_1, x_2, \dots, x_n)$ is called multiple regression.

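The pairwise regression equation is:

$$y = f(x) + \varepsilon,$$

where x is the independent variable affecting y and ε is the random error. In the linear case, $y = a + bx + \varepsilon$, where a and b are the model coefficients.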

As already noted, at the first stage of econometric research the form of the relationship between the variables is chosen, i.e. the regression equation is specified. To do so, the factors that influence the resulting variable y most significantly are singled out from the whole range of influencing factors. Pairwise regression is considered sufficient if a single dominant factor can be isolated and used as the explanatory (independent) variable. The magnitude of the random errors depends on whether the model is specified correctly: the closer the actual values of y lie to the values calculated from the constructed equation, the smaller the errors.

Model specification errors include not only an incorrect choice of the mathematical function f describing the relationship between y and x, but also the omission of a significant factor from the regression equation, i.e. the use of pairwise regression where multiple regression is required.

In pairwise regression, the choice of mathematical function can be done graphically, analytically and experimentally.

The graphical method, based on plotting the correlation field, is most often used to select the type of paired regression equation. The main types of curves used to assess relationships between variables are presented in Figure 1:

[Figure 1: main types of curves, panels a), b), c)]

The analytical method of choosing the type of regression equation consists in studying the material nature of the relationship between the factors under study and taking into account the degree of their influence on each other.

With the experimental method, equations of various types are constructed, and the best of them is then selected by the magnitude of the error (residual) variance:

$$\sigma^2_{\text{resid}} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2.$$

The smaller the error variance, the better the constructed regression equation fits the original data.
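The experimental method can be illustrated with a short Python sketch. It assumes just two candidate forms, linear and exponential (the latter fitted through its logarithmic linearization), and uses made-up data rather than the project's data set; the function names are hypothetical. Each candidate is fitted, its error variance $\frac{1}{n}\sum(y_i - \hat{y}_i)^2$ is computed on the original y scale, and the form with the smallest variance is kept.

```python
import math


def fit_linear(x, y):
    """OLS fit of y = a + b*x; returns the fitted function."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    a = mean_y - b * mean_x
    return lambda t: a + b * t


def fit_exponential(x, y):
    """Fit y = a * b**x by linearizing: ln y = ln a + x * ln b (requires y > 0)."""
    log_line = fit_linear(x, [math.log(yi) for yi in y])
    return lambda t: math.exp(log_line(t))


def residual_variance(model, x, y):
    """Error variance (1/n) * sum((y - y_hat)^2), used to compare specifications."""
    return sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


def choose_specification(x, y):
    """Fit every candidate form and return the one with the smallest error variance."""
    candidates = {"linear": fit_linear(x, y), "exponential": fit_exponential(x, y)}
    variances = {name: residual_variance(f, x, y) for name, f in candidates.items()}
    best = min(variances, key=variances.get)
    return best, variances


# Hypothetical data that happen to follow y = 2 * 3**x exactly.
best, variances = choose_specification([0, 1, 2, 3], [2, 6, 18, 54])
print(best)  # exponential
```

Because the exponential form is fitted on ln y, it minimizes squared errors of the logarithms rather than of y itself; measuring both variances on the original y scale keeps the candidates comparable.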

The construction of a multiple regression equation begins with a decision on the specification of the model. It covers two groups of issues: the selection of factors and the choice of the type of regression equation.

The inclusion of a particular set of factors in a multiple regression equation is primarily related to the researcher’s understanding of the nature of the relationship between the modeled indicator and other economic phenomena. Factors included in multiple regression must meet the following requirements.

    They must be quantifiable. If a qualitative factor that has no quantitative measurement has to be included in the model, it must be given a quantitative form.

    Factors should not be intercorrelated, much less be in an exact functional connection.

The selection of factors is based on qualitative theoretical and economic analysis. However, theoretical analysis often cannot unambiguously answer the question of the quantitative relationship between the characteristics under consideration and whether it is advisable to include a factor in the model. Therefore, factors are usually selected in two stages: in the first, factors are chosen based on the essence of the problem; in the second, t-statistics for the regression parameters are determined on the basis of the matrix of correlation coefficients.

Intercorrelation coefficients (i.e., correlations between explanatory variables) allow redundant factors to be excluded from the model. Two variables are considered explicitly collinear, i.e. linearly related to each other, if $r_{x_i x_j} \ge 0.7$. If factors are explicitly collinear, they duplicate each other, and it is recommended to exclude one of them from the regression. In this case, preference is given not to the factor that is more closely related to the result, but to the factor that, while sufficiently closely related to the result, is least closely related to the other factors. This requirement reflects the specific nature of multiple regression as a method for studying the combined impact of factors under the condition that they are independent of each other.
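The check for explicit collinearity can be sketched as follows. The sketch assumes the conventional threshold of 0.7, applied to the absolute value of r so that strong negative correlations are flagged too, and uses hypothetical factor values; `pearson` and `collinear_pairs` are illustrative names.

```python
import math
from itertools import combinations


def pearson(u, v):
    """Pair correlation coefficient r between two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def collinear_pairs(factors, threshold=0.7):
    """Return (i, j, r) for every pair of factors with |r| >= threshold."""
    flagged = []
    for (i, xi), (j, xj) in combinations(enumerate(factors), 2):
        r = pearson(xi, xj)
        if abs(r) >= threshold:
            flagged.append((i, j, r))
    return flagged


x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 10.5]   # almost a multiple of x1
x3 = [5, 1, 4, 2, 3]
pairs = collinear_pairs([x1, x2, x3])
print([(i, j, round(r, 3)) for i, j, r in pairs])  # [(0, 1, 0.999)]
```

Only the pair (x1, x2) is flagged; following the rule above, the one kept should be the factor less closely related to the remaining factors.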

The magnitude of pairwise correlation coefficients reveals only explicit collinearity of factors. The greatest difficulties in applying multiple regression arise in the presence of multicollinearity, when more than two factors are linked by a linear relationship, i.e. the factors exert a combined influence on each other.

To assess multicollinearity of factors, the determinant of the matrix of paired correlation coefficients between factors can be used.

The closer the determinant of the interfactor correlation matrix is to zero, the stronger the multicollinearity of the factors and the less reliable the results of multiple regression. Conversely, the closer this determinant is to one, the weaker the multicollinearity.
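A minimal sketch of the determinant criterion, assuming hypothetical factor data: the determinant is computed by Gaussian elimination with partial pivoting, so no external library is needed. Two weakly correlated factors give a determinant near one, while adding a factor that almost duplicates x1 drives it close to zero.

```python
import math


def pearson(u, v):
    """Pair correlation coefficient between two factors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def correlation_matrix(factors):
    """Matrix of pairwise correlation coefficients between factors."""
    return [[pearson(a, b) for b in factors] for a in factors]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 10.5]   # nearly collinear with x1
x3 = [5, 1, 4, 2, 3]
print(round(determinant(correlation_matrix([x1, x3])), 3))      # 0.91
print(round(determinant(correlation_matrix([x1, x2, x3])), 3))  # 0.002
```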

There are a number of approaches to overcome strong interfactor correlation. The simplest way to eliminate multicollinearity is to exclude one or more factors from the model. Another approach involves transforming factors, which reduces the correlation between them.

When selecting factors, the following rule is also recommended: the number of factors included should be 6–7 times smaller than the size of the sample on which the regression is built. If this ratio is violated, the number of degrees of freedom of the residual variance becomes very small. As a result, the parameters of the regression equation turn out to be statistically insignificant, and the F-criterion is less than its tabular value.



MINISTRY OF EDUCATION AND SCIENCE OF RUSSIA

Federal State Budgetary Educational Institution

higher professional education

"Tver State Technical University"

(TvSTU)

Institute of Additional Professional Education

Department of Accounting, Analysis and Audit

Course project

Discipline: Econometrics

On the topic: “Comparative analysis of econometric regression models”

COMPLETED: 3rd year student

Institute of Further Education and Training

Group RBAiA-37-12

Zamyatin

Kristina Dmitrievna

(Student's full name)

CHECKED:

Konovalova A. S.

(Teacher's full name)

Rzhev 2015

INTRODUCTION

CHAPTER 1. ANALYTICAL PART

1.1. Fundamentals of econometric research of regression models.

1.2. Technology of econometric research of regression models.

CHAPTER 2. DESIGN PART

2.1 Information and methodological support for econometric research

Paired and multiple regression.

CONCLUSION

LIST OF SOURCES USED

INTRODUCTION

Econometrics is a science whose subject of study is quantitative patterns and interdependencies in the economy based on the methods of mathematical statistics. The basis of econometrics is the construction of an econometric model and the determination of the possibilities of using this model to describe, analyze and forecast real economic processes.

By creating the ability to make informed economic decisions, econometric analysis is the basis of economic analysis and forecasting.

In any field of economics, the activity of a specialist requires the use of modern working methods based on econometric models, concepts and techniques.

The number of people arriving in EU countries for permanent residence was chosen as the subject of econometric research in this course project. Migration processes are an extremely important factor in assessing the prospects for the development of society, so the relevance of the research topic is determined by the growing social significance of these processes in the modern world.

Economic research of migration processes is a significant factor in increasing the efficiency of countries' development. The history of human development is inextricably linked with changes in population dynamics. In Europe, rapid population growth is primarily due to socio-economic changes, i.e. follows economic growth and social changes.

The goals of the course project are the development of design solutions for information and methodological support for research in the field of econometric modeling, as well as gaining practical skills in constructing and analyzing econometric models.

The objective of the course project is to apply in practice the knowledge and skills of constructing and analyzing econometric models in econometric data analysis.

The ultimate applied goal of econometric modeling of real socio-economic processes in this course project is to forecast the economic and socio-economic indicators characterizing the state and development of the analyzed system, that is, to determine the trends in migration processes in the EU countries and their dependence on the factors taken into account when constructing the econometric models.

CHAPTER 1. ANALYTICAL PART

1.1. Fundamentals of econometric research of regression models.

Econometrics is the economic discipline concerned with developing and applying statistical methods to measure relationships among economic variables; it combines economic theory, statistics, and mathematics.

Econometric data are not the results of a controlled experiment. Econometrics deals with specific economic data and is concerned with the quantitative description of specific relationships, that is, it replaces coefficients presented in a general form with specific numerical values. In econometrics, special analysis methods are developed to reduce the impact of measurement errors on the results obtained.

The main tool of econometrics is the econometric model, i.e. a formalized description of the quantitative relationships between variables. The modeling methodology has great potential for self-development, because modeling is a cyclical process: each cycle can be followed by another, in which knowledge of the object under study is expanded and refined and the original model is gradually improved. Deficiencies discovered after a modeling cycle, caused by poor knowledge of the object or by errors in model construction, can be corrected in subsequent cycles.

Three classes of econometric models can be distinguished:

Time series models;

Single-equation regression models;

Systems of simultaneous equations.

Classification of problems solved using an econometric model: 1) according to final applied goals:

Forecast of economic and socio-economic indicators characterizing the state and development of the analyzed system;

Imitation of possible scenarios for the socio-economic development of the system.

2) by hierarchy level:

Macro-level tasks (country as a whole);

Meso-level tasks (regions, industries, corporations);

Micro-level tasks (family, enterprise, firm).

3) according to the profile of the econometric system, aimed at studying:

Market;

Investment, financial or social policy;

Pricing;

Distribution relations;

Demand and consumption;

A set of problems.

Main stages of econometric modeling:

Stage 1 - problem statement. Determination of the final goals of the model, the set of factors and indicators involved in it, and their roles. The main objectives of the research are: analysis of the state and behavior of an economic object, forecasting its economic indicators, simulating the development of the object, and developing management decisions.

Stage 2 - a priori. Analysis of the essence of the object under study, formation and formalization of information known before the start of modeling.

Stage 3 - parameterization. Selecting the general form of the model, the composition and form of the connections included in it. The main task of this stage is to select the function f(X).

Stage 4 - informational. Collection of necessary statistical information.

Stage 5 - model identification. Statistical analysis of the model and estimation of its parameters. This stage makes up the bulk of econometric research.

Stage 6 - model verification. Checking the adequacy of the model and assessing the accuracy of model data. This stage establishes how successfully the problems of specification and identification have been solved and how accurate the calculations based on the model are, i.e. how well the constructed model corresponds to the real economic object or process being simulated.

When modeling economic processes in econometric models, the following is used:

1. Spatial data - a set of information on different objects taken over the same period of time.

2. Time series data - a set of information characterizing the same object over different periods of time.

A data set consists of features that characterize the object of study. A feature can play one of two roles: that of the resulting (effective) feature or that of a factor feature.

Variables are divided into:

Exogenous, the values of which are set from outside;

Endogenous, the values of which are determined within the model;

Lagged - endogenous or exogenous variables of the econometric model, dated to previous points in time and located in the equation with current variables;

Predetermined - exogenous variables tied to past, current and future points in time and lagged endogenous variables already known at a given point in time.

Econometrics primarily looks at model specification errors by assuming that measurement errors are kept to a minimum.

Model specification - selection of the type of functional dependence (regression equations). The magnitude of random errors will not be the same across model specifications, and minimizing the residual term allows the best specification to be selected.

In addition to choosing the model specification, it is also important to describe the model structure correctly. The value of the resulting attribute may depend not on the actual value of the explanatory variable, but on the value that was expected in the previous period.

The simplest regression model with only two variables is part of the class of single-equation regression models, in which one explained variable is represented as a function of several independent (explanatory) variables and parameters. This class includes multiple regression models.

Simpler still are time series models, which explain the behavior of a time series based only on its own previous values. These include:

Trend,

Seasonality,

Adaptive forecast,

Moving average, etc.

More general are systems of simultaneous equations in which, in addition to explanatory variables, the right-hand sides may also contain explained variables from other equations, i.e. different from the explained variable on the left side of this equation.

When using separate regression equations, it is assumed that factors can be changed independently of each other, although in reality their changes are not independent, and a change in one variable most often entails changes in the entire system of characteristics, because they are interconnected. It is necessary to be able to describe the structure of relationships between variables using a system of simultaneous (structural) equations.

Statistical and mathematical models of economic phenomena and processes are determined by the specifics of a particular area of economic research. The theory and practice of expert assessments is an important section of econometrics, since expert assessments are used to solve a number of economic problems.

Better known in theoretical and educational publications are econometric models designed to predict macroeconomic indicators. These are usually models aimed at forecasting a multivariate time series. They represent a system of linear dependencies between past and present values of the variables. In such problems, both the structure of the model is estimated, i.e. the type of dependence between the values of the vector coordinates known at previous times and their values at the predicted moment, and the coefficients included in this dependence. The structure of such a model is an object of non-numerical nature. Each area of economic research has its own econometric models.

1.2. Technology of econometric research of regression models.

Research and quantitative assessment of objectively existing relationships and dependencies between economic phenomena is the main task of econometrics.

A cause-and-effect relationship is a relationship between phenomena in which a change in one of them, called the cause, leads to a change in the other, called the effect. Therefore, the cause always precedes the effect.

Cause-and-effect relationships between phenomena are of the greatest interest to the researcher, since they make it possible to identify the factors that have a major influence on the variation of the phenomena and processes being studied.

Cause-and-effect relationships in socio-economic phenomena have the following features:

1. cause X and effect Y do not interact directly, but through intermediate factors, which are omitted in the analysis.

2. socio-economic phenomena develop and are formed as a result of the simultaneous influence of a large number of factors. One of the main problems in studying these phenomena is the task of identifying the main causes and abstracting from the secondary ones.

According to the direction of change, connections are divided into:

1. direct (changes in the resultant and factor characteristics occur in the same direction),

2. inverse (changes in the resultant and factor characteristics occur in opposite directions).

Based on the nature of the manifestation, they are distinguished:

1. functional connection - a connection in which a certain value of a factor characteristic corresponds to one and only one value of the resultant characteristic, manifests itself in all cases of observation and for each specific unit of the population under study, and is studied mainly in the natural sciences.

2. stochastic dependence - a causal dependence that manifests itself not in each individual case but in general, over a large number of observations: the same values of the factor characteristics, as a rule, correspond to different values of the resulting characteristic, but across the entire set of observations a certain relationship between the values of the characteristics can be noted. A special case of a stochastic relationship is a correlation relationship, in which a change in the average value of the resulting characteristic is caused by a change in the factor characteristics.

According to the analytical expression, connections are distinguished:

1. linear: the change in the resulting characteristic is directly proportional to the change in the factor characteristics.

2. nonlinear.

Analytically, a linear stochastic relationship between phenomena can be represented by an equation of a straight line on a plane, or an equation of a hyperplane in n-dimensional space (if there are n factor variables).

Building an econometric model is the basis of econometric research. The degree of reliability of the analysis results and their applicability depends on how well the resulting model describes the studied patterns between economic processes.

The construction of an econometric model begins with the specification of the model, which consists in obtaining an answer to two questions:

1) what economic indicators should be included in the model;

2) what type of analytical relationship between the selected characteristics is.

In studies devoted to the development of methods for forecasting such financial indicators as exchange rates, securities, and indices, models are widely used based on the assumption that the dynamics of these processes are completely determined by internal conditions.

After identifying the set of variables under consideration, the next step is to determine the specific type of model that best matches the phenomenon being studied.

Based on the nature of the relationships between factors and variables, models are divided into linear and nonlinear. Based on the properties of their parameters, models are divided into models with a constant and with a variable structure.

A special type of models consists of systems of interconnected econometric equations.

If, on the basis of a preliminary qualitative analysis of the phenomenon under consideration, it is not possible to unambiguously select the most suitable type of model, then several alternative models are considered, among which, during the research process, the one that best corresponds to the phenomenon under study is selected.

In general, the procedure for constructing an econometric model can be represented in the following steps:

1. Model specification, i.e., selection of a class of models that are most suitable for describing the phenomena and processes being studied.

This stage involves solving two problems:

a) selection of significant factors for their subsequent inclusion in the model;

b) choosing the type of model, i.e. choosing the type of analytical relationship connecting the variables included in the model.

2. Estimation of model parameters, i.e. obtaining numerical values of the model constants. In this case, a previously obtained array of source data is used.

3. Checking the quality of the constructed model and justifying the possibility of its further use. The most complex and time-consuming part of econometric research is the stage of estimating model parameters, where methods of probability theory and mathematical statistics are used.

When solving the problem of choosing the type of analytical dependence, various considerations can be used:

Conclusions from analytical studies on the qualitative nature of the dependence,

Description of the properties of various analytical dependencies,

Goals of building the model.

The choice of the type of econometric model is based, first of all, on the results of preliminary qualitative or substantive analysis carried out using the methods of economic theory. The nature of the expected dependence is justified based on theoretical assumptions about the nature of the pattern of development of the phenomenon or process being studied.

Another approach is based on the analysis of an array of initial data, which allows us to identify some characteristics of the expected dependencies and, on this basis, formulate, as a rule, several assumptions about the type of analytical connection. The constructed model is used to formulate assumptions about the nature of the pattern in the development of the phenomenon being studied, which are tested during further research.

Linear models are most widely used in econometrics.

This is due to several reasons:

There are effective methods for constructing such models.

In a small range of values of the factor characteristics, linear models can approximate real nonlinear dependencies with sufficient accuracy.

The model parameters have a clear economic interpretation.

Forecasts based on linear models are characterized by a lower risk of significant forecast error.

An important component of the process of constructing an econometric model is the selection of factors that significantly influence the indicator being studied and are to be included in the model being developed. The optimal set of factors is determined based on qualitative and quantitative analysis.

At the stage of problem formulation and meaningful economic analysis of the economic model, factors are selected whose influence should be taken into account when constructing the model. In some cases, a set of factors is determined unambiguously or with a high degree of confidence. In more complex cases, the next stage uses formal statistical methods to check the feasibility of including each factor in the model. First of all, the factors are checked for the presence of a close linear correlation between them, the existence of which leads to unreliable estimates of the model parameters.

To overcome strong interfactor correlation, the following are used:

exclusion of one or more factors from the model. Of the two correlating factors, the one that is more correlated with the other factors is eliminated;

transformation of factors, which reduces the correlation between them.

One of the criteria for including factors in the model is the degree of their isolated influence on the resulting trait.

Two methods for determining the optimal set of factors:

1. inclusion method. A regression equation is constructed with the single most influential factor; then the remaining factors are introduced into it one by one and the most influential pair of factors is determined; then one more factor is added to the first two and the best three factors are determined, and so on. At each step, a regression model is built and the significance of its factors is tested. Only significant factors are included in the model. To test the significance of a factor, either Student's t-test or Fisher's partial F-test can be used. The process ends when there are no more factors to include in the model.

2. method of exclusion. A regression equation is constructed with a full set of factors, from which insignificant or least significant factors are then sequentially excluded. At each step, only one factor is excluded, since after eliminating a factor, another factor, which was previously insignificant, may become significant. The process ends when there are no more factors to exclude.
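The exclusion method can be sketched in Python as backward elimination on Student's t-statistics. This is a minimal illustration under stated assumptions: OLS is computed through the normal equations, the critical value `t_crit` is supplied by the user (2.0 below is an arbitrary illustrative choice, not a tabular value), and the data are constructed so that x2 has no influence on y. All names are hypothetical.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_t_stats(columns, y):
    """OLS with an intercept; returns (coefficients, t-statistics)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    coefs = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(c * v for c, v in zip(coefs, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance per degree of freedom
    t_stats = [coefs[i] / math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return coefs, t_stats


def backward_elimination(factors, y, t_crit):
    """Drop the least significant factor, one at a time, while its |t| < t_crit."""
    active = list(range(len(factors)))
    while active:
        _, t_stats = ols_t_stats([factors[i] for i in active], y)
        factor_t = [abs(t) for t in t_stats[1:]]  # skip the intercept
        weakest = min(range(len(active)), key=lambda j: factor_t[j])
        if factor_t[weakest] >= t_crit:
            break
        del active[weakest]  # exclude only one factor per step, then refit
    return active


x1 = [1, 2, 3, 4, 5, 6]
x2 = [5, 6, 4, 5, 4, 6]
y = [4, 4, 6, 10, 11, 13]
print(backward_elimination([x1, x2], y, t_crit=2.0))  # [0]
```

Refitting after every single exclusion mirrors the rule above: once a factor is removed, the t-statistics of the others change, so a previously insignificant factor may become significant.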

Inclusion and exclusion methods do not guarantee the determination of the optimal set of factors, but in most cases they provide results that are either optimal or close to them. It is not recommended to include a very large number of factors in the model, as this may make it difficult to identify qualitative patterns and increases the risk of including unimportant random factors in the model. To obtain reliable parameter estimates, it is desirable that the number of observations exceeds the number of parameters to be determined by at least 6-7 times.

After the factors are selected and the type of analytical dependence is chosen, the model parameters are estimated. When estimating model parameters, a previously prepared array of observations is used as the initial data. The quality of estimates is determined by the presence of properties such as unbiasedness, consistency and efficiency. A parameter estimate is called unbiased if its mathematical expectation is equal to the estimated parameter. An estimate of a parameter is called consistent if it converges in probability to the estimated parameter as the number of observations increases. A parameter estimate is said to be efficient if it has the smallest variance among possible unbiased parameter estimates calculated from samples of the same size n.

CHAPTER 2. DESIGN PART

2.1 Information and methodological support for econometric research.

The econometric research methodology includes the following stages: specification, parameterization, verification, and additional research.

1. Specification of paired and multiple regression equation models includes an analysis of the correlation dependence of the dependent variable on each explanatory variable. Based on the results of the analysis, a conclusion is made about the regression equation model. As a result of this stage, the regression equation model is determined.

2. Parameterization of a pairwise regression equation involves estimating the regression parameters and giving them a socio-economic interpretation. For parameterization, it is recommended to use the “Regression” tool of the MS Excel “Data Analysis” add-in. The regression parameters are determined from the results of the automated regression analysis, and their interpretation is given.

Thus, an econometric study of paired regression includes calculating the parameters of regression equations, assessing error variances and variances of model parameters, assessing the strength of the relationship between a factor and the result using the elasticity coefficient, assessing the closeness of the relationship, assessing the quality of the equation using the average error of approximation, assessing the statistical reliability of regression equations using Fisher's F test.

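To construct and analyze the paired regression, data for the twenty largest countries of the European Union were taken from the statistical yearbook: the number of people arriving in the country for permanent residence (the result y) and the nominal annual wages of employees (the factor x).

The correlation coefficient is calculated using the formula:

$$r_{xy} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sigma_x \sigma_y},$$

where $\sigma_x = \sqrt{\overline{x^2} - \bar{x}^2}$ and $\sigma_y = \sqrt{\overline{y^2} - \bar{y}^2}$.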

The correlation coefficient measures the closeness of the linear relationship between the phenomena being studied.
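The formula for $r_{xy}$ through the means of $x$, $y$, $xy$, $x^2$ and $y^2$ can be checked numerically with a short Python sketch on hypothetical data (five observations, not the project's EU data); `correlation_coefficient` is an illustrative name.

```python
import math


def correlation_coefficient(x, y):
    """r = (mean(xy) - mean(x) * mean(y)) / (sigma_x * sigma_y)."""
    n = len(x)

    def mean(values):
        return sum(values) / n

    mean_x, mean_y = mean(x), mean(y)
    mean_xy = mean([a * b for a, b in zip(x, y)])
    sigma_x = math.sqrt(mean([a * a for a in x]) - mean_x ** 2)
    sigma_y = math.sqrt(mean([b * b for b in y]) - mean_y ** 2)
    return (mean_xy - mean_x * mean_y) / (sigma_x * sigma_y)


print(round(correlation_coefficient([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
```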

To construct a paired regression equation, it is necessary to consider possible regression equations:

  1. linear dependence
  2. exponential relationship
  3. quadratic dependence
  4. cubic dependence

To estimate the regression parameters, we apply the least squares method (OLS) to all these models.

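The idea of the method is to obtain the best approximation of a set of observations $(x_i, y_i)$, $i = 1, \dots, n$, by a linear function in the sense of minimizing the functional:

$$S(a, b) = \sum_{i=1}^{n}(y_i - a - b x_i)^2 \to \min.$$

To calculate the parameters a and b of the linear regression, the following system of normal equations is solved with respect to a and b:

$$\begin{cases} n a + b \sum x_i = \sum y_i, \\ a \sum x_i + b \sum x_i^2 = \sum x_i y_i, \end{cases}$$

from which the parameter estimates are determined:

$$b = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \qquad a = \bar{y} - b\,\bar{x}.$$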
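Solving the system of normal equations by Cramer's rule gives the closed-form estimates of a and b directly. A minimal sketch on hypothetical data; `linear_ols` is an illustrative name.

```python
def linear_ols(x, y):
    """Solve n*a + b*Σx = Σy and a*Σx + b*Σx² = Σxy by Cramer's rule."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    det = n * sxx - sx * sx          # zero only if all x values coincide
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


a, b = linear_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 6), round(b, 6))  # 2.2 0.6
```

The result agrees with the mean-based formulas: $b = (13.2 - 3 \cdot 4)/(11 - 9) = 0.6$ and $a = 4 - 0.6 \cdot 3 = 2.2$.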

Student's t-test.

The null hypothesis $H_0$ states that the indicator is random in nature, i.e. does not differ significantly from zero: $H_0: r_{xy} = 0$.
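The t-statistic for the correlation coefficient is $t = r_{xy}\sqrt{n-2}\,/\sqrt{1-r_{xy}^2}$ with n - 2 degrees of freedom; $H_0$ is rejected when |t| exceeds the tabular value. A sketch, assuming $r_{xy} = \sqrt{0.6}$ from a hypothetical five-point sample and the two-sided tabular value 3.182 for α = 0.05 and 3 degrees of freedom:

```python
import math


def t_statistic_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2); compare with Student's table at n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = math.sqrt(0.6), 5      # r_xy from the hypothetical five-point example
t_actual = t_statistic_r(r, n)
t_table = 3.182               # two-sided, alpha = 0.05, df = 3
print(round(t_actual, 4), abs(t_actual) > t_table)  # 2.1213 False
```

Here $H_0$ cannot be rejected: with only five observations, even r ≈ 0.77 is not significant at the 5% level.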

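The construction of the exponential curve equation $y = a\,b^{x}$ is preceded by the linearization of variables, taking the logarithm of both sides of the equation:

$$\ln y = \ln a + x \ln b.$$

With the substitutions $Y = \ln y$, $A = \ln a$, $B = \ln b$, a linear equation is obtained:

$$Y = A + Bx.$$

Its parameters are found using the following formulas:

$$B = \frac{\overline{xY} - \bar{x}\,\bar{Y}}{\overline{x^2} - \bar{x}^2}, \qquad A = \bar{Y} - B\,\bar{x}, \qquad a = e^{A}, \quad b = e^{B}.$$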

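Substituting the actual values of x into the fitted equation gives the theoretical values of the result $\hat{y}$. From them, the correlation index, an indicator of the closeness of the relationship, is calculated:

$$\rho_{xy} = \sqrt{1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}}.$$

This coefficient is checked for significance using Student's t-test.

The estimates of the error variance and of the variances of the model parameters are calculated for the linearized equation ($Y = \ln y$, k = 2 estimated parameters) using the following formulas:

$$S^2 = \frac{1}{n-k}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2, \qquad S^2_{\hat\beta_j} = S^2\left[(X^\top X)^{-1}\right]_{jj},$$

where X is the matrix of regressor observations with a column of ones for the intercept.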
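A sketch of the exponential fit through its linearized form, followed by the correlation index, assuming the form $y = a\,b^{x}$ and hypothetical data that follow $y = 2 \cdot 3^{x}$ exactly (so the index equals one). The names are illustrative.

```python
import math


def fit_exponential(x, y):
    """Fit y = a * b**x through the linearized form Y = A + B*x, where Y = ln y."""
    n = len(x)
    big_y = [math.log(v) for v in y]          # requires y > 0
    mean_x = sum(x) / n
    mean_big_y = sum(big_y) / n
    mean_x_big_y = sum(u * v for u, v in zip(x, big_y)) / n
    mean_x2 = sum(u * u for u in x) / n
    big_b = (mean_x_big_y - mean_x * mean_big_y) / (mean_x2 - mean_x ** 2)
    big_a = mean_big_y - big_b * mean_x
    return math.exp(big_a), math.exp(big_b)   # back-transform: a = e^A, b = e^B


def correlation_index(y, y_hat):
    """rho = sqrt(1 - sum((y - y_hat)^2) / sum((y - mean(y))^2))."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return math.sqrt(max(0.0, 1 - sse / sst))


x = [0, 1, 2, 3]
y = [2, 6, 18, 54]           # hypothetical data following y = 2 * 3**x
a, b = fit_exponential(x, y)
rho = correlation_index(y, [a * b ** xi for xi in x])
print(round(a, 6), round(b, 6), round(rho, 6))  # 2.0 3.0 1.0
```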

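The equation of a quadratic curve $y = a + bx + cx^2$ is constructed by making the replacement $x_1 = x$, $x_2 = x^2$, which gives the linear equation $y = a + b x_1 + c x_2$.

Substituting the actual values of x into this equation gives the theoretical values $\hat{y}$, from which the correlation index $\rho_{xy}$ is calculated.

This coefficient is checked for significance using Student's t-test.

The estimates of the error variance and of the variances of the model parameters (k = 3 estimated parameters) are calculated using the following formulas:

$$S^2 = \frac{1}{n-k}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \qquad S^2_{\hat\beta_j} = S^2\left[(X^\top X)^{-1}\right]_{jj},$$

where X is the matrix with columns $1, x_1, x_2$.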

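The equation of a cubic curve $y = a + bx + cx^2 + dx^3$ is constructed by making the replacement $x_1 = x$, $x_2 = x^2$, $x_3 = x^3$.

This results in a linear equation:

$$y = a + b x_1 + c x_2 + d x_3.$$

Substituting the actual values of x into this equation gives the theoretical values $\hat{y}$. From them, we calculate the correlation index, an indicator of the closeness of the relationship.

This coefficient is checked for significance using Student's t-test.

The estimates of the error variance and of the variances of the model parameters (k = 4 estimated parameters) are calculated using the following formulas:

$$S^2 = \frac{1}{n-k}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \qquad S^2_{\hat\beta_j} = S^2\left[(X^\top X)^{-1}\right]_{jj},$$

where X is the matrix with columns $1, x_1, x_2, x_3$.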
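Because the substitutions make the quadratic and cubic curves linear in their parameters, one routine can fit either of them. The sketch below (hypothetical names and data) builds the columns 1, x, x², x³ as needed, solves the normal equations by Gauss-Jordan inversion, and returns the coefficients, the correlation index, the error variance $S^2$ with n - k degrees of freedom, and the parameter variances $S^2\,[(X^\top X)^{-1}]_{jj}$.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_polynomial(x, y, degree):
    """Fit y = b0 + b1*x + ... + bk*x**k by substituting x_j = x**j, then OLS."""
    n, k = len(x), degree + 1
    X = [[xi ** j for j in range(k)] for xi in x]   # columns 1, x1 = x, x2 = x^2, ...
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    coefs = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    y_hat = [sum(c * v for c, v in zip(coefs, row)) for row in X]
    mean_y = sum(y) / n
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    rho = math.sqrt(max(0.0, 1 - sse / sst))        # correlation index
    s2 = sse / (n - k)                              # error variance, n - k degrees of freedom
    param_var = [s2 * inv[i][i] for i in range(k)]  # variances of the parameter estimates
    return coefs, rho, s2, param_var


x = [0, 1, 2, 3, 4]
y = [1, 6, 17, 34, 57]      # hypothetical data following y = 1 + 2x + 3x^2
coefs, rho, s2, param_var = fit_polynomial(x, y, 2)
print([round(c, 6) for c in coefs], round(rho, 6))  # [1.0, 2.0, 3.0] 1.0
```

Calling `fit_polynomial(x, y, 3)` on the same data fits the cubic curve; its coefficient at x³ then comes out close to zero.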

The average elasticity coefficient shows by how many percent, on average, the result y changes from its average value when the factor x changes by 1% from its average value. For the linear model it equals:

$$\bar{E} = b\,\frac{\bar{x}}{\bar{y}}.$$
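For the linear model the average elasticity can be computed directly from the data. A sketch on hypothetical data, where b = 0.6, $\bar{x} = 3$ and $\bar{y} = 4$; the function name is illustrative.

```python
def average_elasticity_linear(x, y):
    """E = b * mean(x) / mean(y) for the linear model y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x)
    return b * mean_x / mean_y


print(round(average_elasticity_linear([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.45
```

An elasticity of 0.45 would mean that a 1% rise in x from its mean is associated with a rise of about 0.45% in y from its mean.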

The coefficient of determination provides an assessment of the quality of the constructed model. The coefficient of determination characterizes the proportion of the variance of the resulting characteristic y, explained by regression, in the total variance of the resulting characteristic.

The coefficient of determination is equal to the square of the correlation index. The closer it is to unity, the better the fit, i.e. the more accurately the regression approximates y.

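The average approximation error is the average relative deviation of the calculated values from the actual ones:

$$\bar{A} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \cdot 100\%.$$

Values of no more than 8-10% are considered acceptable.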
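A minimal sketch of the average approximation error follows (the function name is hypothetical). Each term is divided by y_i, so observations with small y_i can dominate the average. This is worth checking whenever Ā comes out in the hundreds of percent.

```python
def mean_approximation_error(y, y_hat):
    """A = (1/n) * sum(|(y - y_hat) / y|) * 100%; undefined if any y is zero."""
    if any(yi == 0 for yi in y):
        raise ValueError("actual values must be non-zero")
    n = len(y)
    return 100.0 * sum(abs((yi - fi) / yi) for yi, fi in zip(y, y_hat)) / n


print(round(mean_approximation_error([10, 20], [11, 18]), 6))  # 10.0
```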

The significance of the regression equation is assessed using Fisher's F-test. The null hypothesis states that the factor and residual variances are equal, and therefore the factor x has no effect on y, i.e.

H0: D_fact = D_rest

To test it, the actual value F_fact is compared with the critical (tabular) value F_table. F_fact is determined from the ratio of the factor and residual variances, each calculated per degree of freedom:

$$F_{fact} = \frac{D_{fact}}{D_{rest}} = \frac{R^2}{1 - R^2}\,(n - 2).$$

F_table is the maximum value of the criterion that can arise under the influence of random factors alone, for the given degrees of freedom and significance level α. The significance level is the probability of rejecting the hypothesis given that it is true.

If F_table < F_fact, H0 is rejected and the regression equation is recognized as statistically significant and reliable; otherwise H0 is accepted and the regression equation is concluded to be insignificant.
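The F statistic can be computed directly from R². The sketch below is illustrative and the function name is hypothetical. The parameter m is the number of factors: m = 1 gives the (n - 2) form for pair regression, and the same function covers the multiple-regression case. The critical value is taken from tables, as in the text. With the linear model's r_xy from Section 2.2 it reproduces F_fact ≈ 6.1505.

```python
def f_statistic(r2, n, m=1):
    """F = R^2/(1 - R^2) * (n - m - 1)/m, where m is the number of factors (m = 1 for pair regression)."""
    if not 0 <= r2 < 1:
        raise ValueError("R^2 must lie in [0, 1)")
    return r2 / (1 - r2) * (n - m - 1) / m


F = f_statistic(0.504652547 ** 2, n=20)   # linear pair model from Section 2.2
print(round(F, 4), F > 4.413873)          # 6.1505 True
```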

3. Parameterization of the multiple regression equation involves estimating the regression parameters and giving them a socio-economic interpretation. For parameterization, it is recommended to use the “Regression” tool of the MS Excel “Data Analysis” add-in. The regression parameters are determined from the results of this automated regression analysis and then interpreted.

The regression equation is verified based on the results of automated regression analysis.

Thus, an econometric study of multiple regression includes the construction of a multiple regression equation, the calculation of elasticity coefficients for each factor and a comparative assessment of the strength of the relationship of each factor with the result, the economic interpretation of the constructed model, the construction of a correlation matrix, the calculation of the multiple correlation coefficient, the calculation of estimates of model error variances and estimates of model parameters, constructing confidence intervals for model coefficients with a selected significance level, checking the significance of each coefficient, assessing the closeness of the relationship, assessing the statistical reliability of the regression equation using Fisher's F test.

To construct and analyze the multiple regression, additional factors influencing the number of people arriving in the country for permanent residence are introduced into the model, namely the number of unemployed and the country's GDP.

The multiple regression equation relating the result to several explanatory variables has the form:

$$y = f(x_1, x_2, \ldots, x_m),$$

where y is the dependent variable (resultant characteristic);

x_1, x_2, …, x_m are the independent variables (factors).

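To construct the multiple regression equation, a linear function written in matrix form is used:

$$Y = X b + \varepsilon,$$

where b = (b_0, b_1, …, b_m)^T is the vector of parameters and ε is the vector of random errors.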

To estimate the parameters of the multiple regression equation, the least squares method is used:

$$S = \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_{i1} - \ldots - b_m x_{im}\right)^2 \to \min.$$

The following system of equations is constructed, the solution of which allows us to obtain estimates of the regression parameters:

Its explicit solution is usually written in matrix form, otherwise it becomes too cumbersome.

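Estimates of the model parameters in matrix form are determined by the expression:

$$\hat{b} = (X^{T} X)^{-1} X^{T} Y,$$

where X is the matrix of values of the explanatory variables (with a first column of ones for the intercept b_0);

Y is the vector of values of the dependent variable.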
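As a cross-check for the Excel output, the matrix estimate can be reproduced in a few lines. The minimal sketch below uses pure Python with hypothetical function names. It solves the normal equations (XᵀX)b = XᵀY by Gaussian elimination instead of forming the inverse explicitly. The result is mathematically the same as the formula above and numerically more stable. A singular XᵀX signals perfect multicollinearity.

```python
def solve_linear_system(A, rhs):
    """Solve A b = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:   # simple absolute tolerance
            raise ValueError("X^T X is (near) singular: perfect multicollinearity?")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    b = [0.0] * n
    for i in range(n - 1, -1, -1):   # back substitution
        b[i] = (M[i][n] - sum(M[i][j] * b[j] for j in range(i + 1, n))) / M[i][i]
    return b


def ols_matrix(factor_rows, y):
    """b = (X^T X)^{-1} X^T Y, computed by solving the normal equations (X^T X) b = X^T Y.

    factor_rows holds one tuple (x_1, ..., x_m) per observation; a column of ones
    is prepended so that b[0] is the intercept b_0.
    """
    X = [[1.0] + [float(v) for v in row] for row in factor_rows]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    XtY = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(XtX, XtY)


rows = [(1, 0), (0, 1), (1, 1), (2, 1), (3, 5)]   # hypothetical x_1, x_2 values
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]      # exact y = 1 + 2 x_1 - 3 x_2
print([round(v, 6) for v in ols_matrix(rows, y)])  # [1.0, 2.0, -3.0]
```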

To identify the dependence of the number of people arriving for permanent residence on the nominal annual salary of hired workers, the number of unemployed and the level of GDP, we construct a multiple regression equation of the form:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3.$$

To characterize the relative strength of the influence of the factors on y, we calculate the average elasticity coefficients. For linear regression they are calculated using the formula:

$$\bar{E}_j = b_j \frac{\bar{x}_j}{\bar{y}}, \quad j = 1, 2, 3.$$
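The average elasticity coefficients follow directly from the estimated coefficients and the sample means. This is a minimal sketch with a hypothetical function name and made-up numbers. A negative Ē_j indicates an inverse relationship, as with the number of unemployed in the study.

```python
def mean_elasticities(b, factors, y):
    """E_j = b_j * mean(x_j) / mean(y); b[0] is the intercept and has no elasticity."""
    y_mean = sum(y) / len(y)
    return [bj * (sum(xj) / len(xj)) / y_mean for bj, xj in zip(b[1:], factors)]


# Hypothetical coefficients and data, not the study's Table 8 values
print(mean_elasticities([0.0, 2.0, -1.0], [[1, 3], [2, 2]], [4, 6]))  # [0.8, -0.4]
```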

With a linear dependence, the multiple correlation coefficient can be determined through the matrix of paired correlation coefficients:

$$R_{y x_1 x_2 x_3} = \sqrt{1 - \frac{\Delta r}{\Delta r_{11}}},$$

where Δr is the determinant of the matrix of paired correlation coefficients;

Δr_11 is the determinant of the interfactor correlation matrix.

Matrix of pair correlation coefficients:

Interfactor correlation matrix:
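The determinant formula can be checked on data with the sketch below (hypothetical names, pure Python). The full matrix puts y first. The interfactor matrix is obtained by deleting y's row and column. With a single factor the formula reduces to R = |r_xy|, which serves as a quick sanity check. The interfactor determinant is also useful on its own: a value near 1 means the factors are nearly uncorrelated, and a value near 0 signals strong multicollinearity.

```python
import math


def pearson_r(u, v):
    n = len(u)
    u_mean, v_mean = sum(u) / n, sum(v) / n
    suv = sum((a - u_mean) * (b - v_mean) for a, b in zip(u, v))
    suu = sum((a - u_mean) ** 2 for a in u)
    svv = sum((b - v_mean) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def det(A):
    """Determinant via Gaussian elimination with partial pivoting."""
    M = [list(map(float, row)) for row in A]
    n = len(M)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result   # a row swap flips the sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return result


def multiple_correlation(y, factors):
    """R = sqrt(1 - det(R_full) / det(R_factors)); y is placed first in R_full."""
    cols = [y] + list(factors)
    k = len(cols)
    r_full = [[pearson_r(cols[i], cols[j]) for j in range(k)] for i in range(k)]
    r_factors = [row[1:] for row in r_full[1:]]   # drop y's row and column
    return math.sqrt(max(0.0, 1.0 - det(r_full) / det(r_factors)))


y = [2, 4, 5, 4, 5]
x1 = [1, 2, 3, 4, 5]
print(abs(multiple_correlation(y, [x1]) - abs(pearson_r(y, x1))) < 1e-9)  # True
```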

The error variance and the variances of the model parameters are estimated using the following formulas:

To assess the statistical significance of the regression coefficients, we calculate Student's t-statistics and confidence intervals for each parameter. The hypothesis is put forward that the coefficients are random, i.e. that they do not differ significantly from zero. This gives the set of hypotheses:

H0: b_0 = 0; b_1 = 0; b_2 = 0; b_3 = 0

The actual values t_{b_j} = b_j / s_{b_j}, where s_{b_j} is the standard error of b_j, are compared with the tabular value t_table. The tabular value is calculated as a quantile of Student's distribution at the significance level α, i.e. the probability of rejecting the hypothesis given that it is true.

Confidence intervals are calculated using the formula:

$$b_j - t_{table}\, s_{b_j} \le \beta_j \le b_j + t_{table}\, s_{b_j},$$

where β_j is the true value of the parameter.
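Given the estimates, their standard errors and the tabular t-value, the coefficient tests and confidence intervals are simple arithmetic. In this minimal sketch the function name and the numbers are hypothetical.

```python
def coefficient_tests(b, se, t_table):
    """For each b_j: t = b_j / s_bj, significance flag, and CI b_j +/- t_table*s_bj."""
    results = []
    for j, (bj, sj) in enumerate(zip(b, se)):
        t = bj / sj
        ci = (bj - t_table * sj, bj + t_table * sj)
        results.append((f"b_{j}", t, abs(t) > t_table, ci))
    return results


# Hypothetical numbers, not the study's estimates
for name, t, significant, ci in coefficient_tests([2.0, 0.5], [0.5, 0.5], t_table=2.12):
    print(name, round(t, 2), significant, tuple(round(v, 2) for v in ci))
# b_0 4.0 True (0.94, 3.06)
# b_1 1.0 False (-0.56, 1.56)
```

A coefficient whose interval contains zero is, equivalently, one for which |t| ≤ t_table.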

The quality of the constructed model as a whole is assessed by the coefficient of determination. The coefficient of multiple determination is calculated as the square of the multiple correlation index: R² = R²_{y x_1 x_2 x_3}.

The adjusted index of multiple determination contains a correction for the number of degrees of freedom and is calculated using the formula:

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - m - 1},$$

where n is the number of observations;

m is the number of factors.
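The penalty for additional factors is easy to see numerically. In this sketch (hypothetical function name and R² value) R² is held fixed. The adjusted coefficient falls as m grows, which is the effect discussed later for the constructed model.

```python
def adjusted_r2(r2, n, m):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - m - 1), m = number of factors."""
    if n - m - 1 <= 0:
        raise ValueError("need n > m + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)


# Same R^2, more factors -> lower adjusted R^2 (hypothetical R^2 = 0.5, n = 20)
for m in (1, 2, 3):
    print(m, round(adjusted_r2(0.5, n=20, m=m), 3))
# 1 0.472
# 2 0.441
# 3 0.406
```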

The significance of the multiple regression equation as a whole, as in paired regression, is assessed using Fisher's F-test:

$$F = \frac{R^2}{1 - R^2} \cdot \frac{n - m - 1}{m}.$$

In this case, the hypothesis is put forward that the regression equation is insignificant: H0: b_1 = b_2 = b_3 = 0.

Finally, a judgment is formed about the quality of the regression equation.

4. A comparative analysis of regression models is carried out.

2.2. An example of an econometric study.

Based on statistical data, an econometric study is carried out in accordance with the methodology of Section 2.1.

All necessary calculations are carried out manually in MS Excel, and the results are checked using the “Regression” tool of the Data Analysis package.

The linear pair correlation coefficient is:

r_xy = 0.504652547

The correlation coefficient is positive and indicates a moderate direct relationship between the indicator y and the factor x: as the average annual salary of a country's workers increases, the number of people arriving in the country increases.

2. Paired regression is constructed and analyzed. Initial data are presented in Table 1.

Table 1. Initial data for constructing and analyzing paired regression

y - number of people who arrived in the country for permanent residence, thousand people;

x - nominal annual wages of employees, thousand euros.

As a result of the analysis, it is necessary to establish how much the wages of hired workers in the country influence the number of people who arrived in the country for permanent residence.

Estimation of the parameters a and b.

Regression equation:

The regression coefficient b = 4.279 shows the average change in the result when the factor changes by one unit: with an increase in the annual salary of hired workers by 1 thousand euros, the number of arrivals for permanent residence increases by an average of 4.279 thousand people. The positive value of the regression coefficient indicates a direct relationship.

The linear pair correlation coefficient is:

r_xy = 0.504652547

The connection is direct and moderate.

t_fact = 2.47; t_table(0.05; 18) = 2.101.

Since t_fact > t_table, the coefficient is significant.

The error variance and the variances of the model parameters are estimated; interim calculations are presented in Table 2.

s² = 10765.218; s²_a = 1477.566815; s²_b = 2.976774696

Construction of the exponential curve equation.

The estimated regression parameter values are:

0.068027; 1.68049

The resulting linear equation is:

After potentiation (taking antilogarithms):

Correlation index.

This coefficient is checked for significance.

t_fact = 2.15; t_table(0.05; 18) = 2.101.

Since t_fact > t_table, the coefficient is significant.

The error variance and the variances of the model parameters are estimated; interim calculations are presented in Table 3.

As a result, the following values were obtained:

s² = 11483.75; s²_a = 452.87517; s²_b = 3.1754617

Table 2. Calculation of values for the linear model

Table 3. Calculation of values for the exponential model

The equation of a quadratic curve is constructed.

Equation parameters:

Correlation index.

This coefficient is checked for significance.

t_fact = 3.41; t_table(0.05; 18) = 2.101.

Since t_fact > t_table, the coefficient is significant.

The error variance and the variances of the model parameters are estimated; interim calculations are presented in Table 4.

As a result, the following values were obtained:

s² = 8760.35808; s²_a = 743.283328; s²_b = 0.00123901

The equation of a cubic curve is constructed.

Equation parameters:

The regression equation takes the form:

Correlation index.

This coefficient is checked for significance.

t_fact = 4.38; t_table(0.05; 18) = 2.101.

Since t_fact > t_table, the coefficient is significant.

The error variance and the variances of the model parameters are estimated; interim calculations are presented in Table 5.

As a result, the following values were obtained:

s² = 6978.45007; s²_a = 514.7649432; s²_b = 5.9851E-07

The relationship between the variables is closest in the cubic model, since its correlation index is nearest to unity, and weakest in the exponential model. The error variance is also smallest in the cubic model.

Table 4. Calculation of values for the quadratic model

Table 5. Calculation of values for the cubic model

The average elasticity coefficient is found.

Linear dependence

Ē = 1.250028395%.

Exponential dependence

Ē = 1.2083965%

With an increase in the annual wages of hired workers by 1%, the number of people arriving in the country for permanent residence increases by 1.2083965%.

Quadratic dependence

With an increase in the annual wages of hired workers by 1%, the number of people arriving in the country for permanent residence increases by 1.24843054%.

Cubic dependence

Ē = 0.938829224%

With an increase in the annual wages of hired workers by 1%, the number of people arriving in the country for permanent residence increases by 0.938829224%.

The elasticity coefficients are shown in Table 6.

All constructed models confirm that the wages of hired workers are a factor increasing the number of people arriving in the country for permanent residence. The elasticity coefficients show that annual wages have a greater influence on the number of arrivals in the linear and quadratic models; in the cubic model this influence is weaker.

The coefficient of determination is found.

Linear dependence

The regression equation explains 25% of the variance of the resultant characteristic; the remaining factors account for 75% of it.

The linear dependence model does not approximate the original data well.

Exponential dependence

The relationship between the indicators is weak, as in the linear model: only 20% of the variation in y is explained by the variation in x, and the remaining factors account for 80%. The relationship in this model is the weakest; therefore, the quality of the model is unsatisfactory.

Quadratic dependence

The relationship between the indicators is slightly better than in the exponential and linear models. The variation in y is only 40% explained by the variation in x. It is also not advisable to use this model for forecasting.

Cubic dependence

The relationship between the indicators is better than in previous models. 52% of the variation in y is explained by the variation in x.

The values of the coefficients of determination are presented in Table 6.

Table 6. Calculation of parameters and characteristics of models.

The quality of the constructed models is low. The cubic model has the highest quality, since its share of explained variation is 52%.

The average approximation error, i.e. the average relative deviation of the calculated values from the actual ones, is determined:

Linear model: Ā = 1153.261%

On average, the calculated values deviate from the actual ones by 1153.261%, which indicates a very large approximation error.

Exponential dependence: Ā = 396.93259%

The approximation error is lower than that of the other models, but it is also unacceptable.

Quadratic dependence: Ā = 656.415018%

A high approximation error is observed, which indicates a poor fit of the equation.

Cubic dependence: Ā = 409.3804652%

The approximation error also significantly exceeds the acceptable values. In all the models considered, the average approximation error significantly exceeds the permissible values, and the fit of the models to the original data is very poor.

3. Multiple regression construction and analysis is carried out.

The initial data for constructing multiple regression are shown in Table 7.

Table 7. Initial data for constructing multiple regression.

y - number of people who arrived in the country for permanent residence, thousand people;

x 1 - nominal annual wages of employees, thousand euros.

x 2 - number of unemployed, thousand people.

x 3 - GDP, billion euros.

Estimates of regression equation parameters:

Multiple regression equation:

Average elasticity coefficients.

Ē_1 = 0.12026241; Ē_2 = -0.06319176; Ē_3 = 0.86930458

The calculation of these values is given in Table 8.

With an increase in the annual wages of hired workers by 1% of the average level, with the other factors unchanged, the number of people arriving for permanent residence increases by 0.12%.

With an increase in the number of unemployed by 1% of the average, with the other factors unchanged, the number of people arriving for permanent residence decreases by 0.06%.

With an increase in GDP by 1% of the average, with the other factors unchanged, the number of arrivals for permanent residence increases by 0.87%.

The change in the number of people arriving in the country for permanent residence is directly dependent on the annual wages of hired workers and the level of the country's GDP and inversely dependent on the number of unemployed, which does not contradict logical assumptions. Elasticity coefficients, as indicators of the strength of the connection, show that the largest change in the number of arrivals to the country is caused by the value of GDP, and the smallest by the number of unemployed.

The multiple correlation coefficient is calculated:

The multiple correlation index value ranges from 0 to 1.

The average approximation error is calculated:

Ā = 372.353247%

The value of the average error of approximation indicates a poor fit of the model to the original data.

Table 8. Calculation of the values of the characteristics of the multiple regression model

The combined influence of all factors on the number of people arriving in the country for permanent residence is quite large. The relationship between the indicator under consideration and the factors influencing it has strengthened compared with paired regression (r_yx = 0.505). The relationship is fairly strong.

It should be noted that the model shows some multicollinearity, which may indicate its instability, since the determinant of the interfactor correlation matrix is quite far from 1. The largest pair correlation coefficient is observed between factors x_1 and x_3 (r_x1x3 = 0.595). This is understandable, because the average annual salary in a country should depend directly on the country's GDP.

Calculation of estimates of error variances and variances of model parameters:

n = 20 is the number of observations, m = 4 is the number of parameters.

For the constructed model, the error variance estimate was:

s² = 6674.02207

Estimates of variances of model parameters:

Standard errors of model parameters:

Interim calculations of the obtained data are presented in Appendix 8.

The significance of the regression coefficients is assessed using Student's t-test.

For b_0, b_1 and b_2, |t| < t_table, which means that these coefficients are statistically insignificant and differ from 0 only by chance.

For b_3, |t| > t_table; therefore it is statistically significant.

For the constructed model, the confidence intervals of the regression coefficients are:

All obtained regression coefficients except b_3 (the coefficient of GDP) are statistically insignificant. Their confidence intervals are quite wide, which may indicate insufficient quality of the model.

Coefficient of multiple determination for the constructed model

This coefficient of determination shows that the quality of the model is satisfactory.

The coefficient of determination does not decrease, and usually increases, when another variable is added. To avoid exaggerating the closeness of the relationship, the adjusted coefficient of determination is used. For a given number of observations and all other things being equal, the adjusted coefficient of multiple determination decreases as the number of independent variables (parameters) increases. For the constructed model, the adjusted and unadjusted coefficients of determination do not differ significantly. However, the adjusted coefficient is slightly lower, which suggests that adding a new variable increases the share of variation explained by the regression only insignificantly, and that adding it is not advisable.

The significance of the regression equations is assessed using Fisher's F-test.

For the paired models (n = 20, m = 2 parameters): F_table(0.05; m - 1; n - m) = F_table(0.05; 1; 18) = 4.413873

Linear model: F_fact = 6.150512218

Exponential dependence: F_fact = 4.6394274

Quadratic dependence: F_fact = 11.6775003

Cubic dependence: F_fact = 19.25548322

In all the models considered, F_table < F_fact, so the hypothesis H0 is rejected.

The significance of the multiple regression equation as a whole is assessed using Fisher's F-test:

Since F_table < F_fact, the hypothesis H0 is rejected.

4. As a result of the study, the following conclusion can be drawn: all obtained regression equations are significant. Judging by the F-test, the coefficients of determination and the average approximation errors, none of the paired regression models considered is of good enough quality to be used for forecasting. However, the best model describing the relationship between the annual salary of a country's hired workers and the number of people arriving in the country for permanent residence is the cubic model. It is significant, its coefficient of determination is the largest, and its average approximation error is relatively small compared with the linear and quadratic models, although it still exceeds the acceptable value.

All four paired regression models are statistically significant. However, the rather small values of the coefficient of determination and the large average approximation errors indicate the poor quality of these models.

Having compared the parameters and characteristics of these equations, we conclude that the cubic model has the greatest reliability and accuracy. This is evidenced by three results:

- the highest value of the correlation index and, accordingly, of the coefficient of determination, which is closest to 1 and confirms that this model approximates the data best;
- the F-test, which recognized the model as significant;
- the average approximation error, which is smaller than that of the linear and quadratic models.

The standard errors of the regression parameters and the standard error of the forecast for this model are also smaller.

The multiple regression equation is significant, i.e. the hypothesis about the random nature of the assessed characteristics is rejected. The resulting model is statistically reliable.

CONCLUSION

As a result of econometric research and data analysis, four paired regression equations were considered, establishing the relationship between the average annual wage of hired workers in the country and the number of people who arrived in the country for permanent residence. These are the linear model, the exponential model, and the models with quadratic and cubic dependence. All constructed models confirm that an increase in the wages of hired workers is a factor in the increase in the number of people arriving in the country for permanent residence.

The relationship between the variables is closest in the cubic model, because its coefficient of determination is the largest, which indicates the greatest reliability of the regression equation found. A cubic relationship best describes the dependence of the number of people arriving in the country for permanent residence on the annual wages of hired workers. In all the models considered, the average approximation error significantly exceeds the acceptable values, which indicates a poor fit. However, the cubic model is the best in terms of approximating the data and assessing the closeness of the relationship, since it has the largest share of explained variation compared with the other models, 52% (the coefficient of determination is closest to 1).

For all the parameters considered, the regression equation with a cubic dependence is the best of those considered. But it is not optimal for practical use and forecasting, which is explained by the large scatter of data, as well as the fact that the number of immigrants depends on many factors that cannot be taken into account in paired regression.

The insufficiently good characteristics of the models may be caused by units in the source data with anomalous values of the characteristics under study. In the UK, the number of arrivals for permanent residence is significantly higher than in the other countries. This country could perhaps be excluded from the sample to obtain a more accurate and reliable result.

As a result of constructing multiple regression, the influence on the number of people arriving in the country for permanent residence of such factors as the country's GDP, the number of unemployed and the average annual wage of hired workers was investigated.

The change in the number of people arriving in the country for permanent residence is directly dependent on the annual wages of hired workers and the level of the country's GDP and inversely related to the number of unemployed. The largest change in the number of arrivals to the country is caused by the value of GDP, and the smallest by the number of unemployed.

The combined influence of all factors on the number of people arriving in the country for permanent residence is quite large, since the multiple correlation index takes a high value. However, this may be due to the presence of multicollinearity.

All coefficients of the multiple regression equation except the coefficient of the GDP factor are statistically insignificant, and their confidence intervals are quite wide.

Despite this, the coefficient of determination shows that the quality of the model is satisfactory. The multiple regression equation is significant, i.e. the hypothesis about the random nature of the assessed characteristics is rejected.

However, the model may exhibit heteroscedasticity, in which case it may need to be corrected.

These results can be explained by several causes. The sample is rather small, especially given the global nature of the study, and it contains an anomalous value of the studied characteristic. Some significant factors were not taken into account. Finally, the number of immigrants to the country depends on many non-quantitative personal factors and individual preferences.

Despite the lack of an exact result and a qualitative regression equation suitable for forecasting and further research, the study revealed that the wages of hired workers in the country, the unemployment rate and GDP have an important impact on the number of people arriving in the country for permanent residence.



The main goal of multiple regression is to build a model with a large number of factors and determine the influence of each factor separately on the result, as well as determine the total impact of the factors on the modeled indicator.

The specification of a multiple regression model includes the selection of factors and the choice of the type of mathematical function (the type of regression equation). Factors included in multiple regression should be quantitatively measurable. They should not be intercorrelated, much less be in an exact functional relationship; that is, they should influence each other as little as possible and influence the resultant characteristic as much as possible.

The factors included in a multiple regression must explain the variation of the dependent variable. For example, if a model is built with a set of $p$ factors, its coefficient of determination $R^2$ is computed. This coefficient gives the share of the variation of the resulting variable that is explained by these $p$ factors.

The influence of the other factors that the model does not account for is estimated as $1 - R^2$, together with the corresponding residual variance.

When an additional factor is included in the model, the coefficient of determination should increase noticeably and the residual variance should decrease. If this does not happen, the additional factor does not improve the model and is practically superfluous. Including such a factor can also make the regression parameters statistically insignificant according to Student's t-test.
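The check described above can be sketched in Python, assuming NumPy is available. The data are simulated and hypothetical. The sketch fits the model with and without an extra factor and compares $R^2$, the residual variance and the adjusted $R^2$. Plain $R^2$ can never fall when a factor is added, so the adjusted version, which penalizes the number of factors $m$, is the more informative of the two.

```python
import numpy as np

def fit_r2(y, X):
    """OLS fit of y on X plus an intercept; returns (R^2, residual variance RSS/n)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    rss = resid @ resid
    tss = ((y - y.mean()) ** 2).sum()
    return 1.0 - rss / tss, rss / n

def adjusted_r2(r2, n, m):
    """Adjusted R^2 for n observations and m factors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)

rng = np.random.default_rng(0)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                           # hypothetical factor with no effect on y
y = 2.0 + 1.5 * x1 + rng.normal(scale=0.5, size=n)

r2_one, s2_one = fit_r2(y, x1)
r2_two, s2_two = fit_r2(y, np.column_stack([x1, x2]))
print(r2_one, r2_two)                             # R^2 does not decrease
print(s2_one, s2_two)                             # residual variance does not increase
print(adjusted_r2(r2_one, n, 1), adjusted_r2(r2_two, n, 2))
```

If the adjusted $R^2$ of the larger model is not higher, the extra factor is a candidate for removal.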

The selection of factors for multiple regression is carried out in two stages:

1. Factors are selected based on the essence of the problem.

2. Based on the matrix of correlation coefficients, t-statistics for the regression parameters are computed.

Correlation coefficients between explanatory variables, also called intercorrelation coefficients, make it possible to exclude duplicate factors from the model.

Two factors are said to be clearly collinear if the correlation coefficient between them satisfies $|r_{x_i x_j}| \ge 0.7$.

If the variables are clearly collinear, then they are in a strong linear relationship.



When two factors are clearly collinear, preference is given not to the factor more closely related to the result, but to the one that, while still reasonably closely related to the result, has the weakest relationship with the other factors.
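The selection rule above can be sketched in Python, assuming NumPy. The helper names and the four-factor correlation matrix are hypothetical. With real data the matrix would come from `np.corrcoef(X, rowvar=False)`. For each clearly collinear pair ($|r| \ge 0.7$) the sketch proposes dropping the factor that is more tightly linked to the remaining factors. Whether the kept factor is still reasonably related to the result has to be checked separately.

```python
import numpy as np

def clearly_collinear_pairs(R, threshold=0.7):
    """Index pairs (i, j), i < j, whose intercorrelation satisfies |r_ij| >= threshold."""
    m = R.shape[0]
    return [(i, j) for i in range(m) for j in range(i + 1, m)
            if abs(R[i, j]) >= threshold]

def factor_to_drop(R, i, j):
    """From a collinear pair, pick the factor more closely tied to the remaining factors."""
    others = [k for k in range(R.shape[0]) if k not in (i, j)]
    load_i = np.abs(R[i, others]).sum()
    load_j = np.abs(R[j, others]).sum()
    return i if load_i > load_j else j

# Hypothetical correlation matrix of four factors x1..x4 (zero-based indices 0..3)
R = np.array([[1.0, 0.8, 0.1, 0.2],
              [0.8, 1.0, 0.5, 0.4],
              [0.1, 0.5, 1.0, 0.1],
              [0.2, 0.4, 0.1, 1.0]])
pairs = clearly_collinear_pairs(R)                  # [(0, 1)]
drop = [factor_to_drop(R, i, j) for i, j in pairs]  # [1]: drop x2, keep x1
print(pairs, drop)
```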

Based on the magnitude of pairwise correlation coefficients, only obvious collinearity of factors is revealed.

Multiple regression can suffer from multicollinearity of factors, i.e., more than two factors are linked by a linear relationship. In such cases the OLS estimates of the influence of individual factors become less reliable, so the multiple regression parameters are hard to interpret as characteristics of the action of each factor in its pure form. The parameters of linear regression lose their economic meaning and the parameter estimates are unreliable. Their standard errors are large and can change as the number of observations changes, so the model becomes unsuitable for analyzing and forecasting the economic situation. The following methods are used to assess multicollinearity of factors:

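1. Computing the determinant of the matrix of pairwise correlation coefficients between the factors. For example, if a linear multiple regression model $y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \varepsilon$ is specified, the determinant of the matrix of pairwise correlation coefficients takes the form

$$\operatorname{Det}|R| = \begin{vmatrix} r_{x_1x_1} & r_{x_1x_2} & r_{x_1x_3} \\ r_{x_2x_1} & r_{x_2x_2} & r_{x_2x_3} \\ r_{x_3x_1} & r_{x_3x_2} & r_{x_3x_3} \end{vmatrix}.$$

If the factors are not correlated with each other at all, the matrix is the identity matrix and

$$\operatorname{Det}|R| = \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} = 1,$$

i.e., the factors are non-collinear.

If there is a complete linear relationship between the factors, all pairwise correlation coefficients are equal to 1, and

$$\operatorname{Det}|R| = \begin{vmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{vmatrix} = 0.$$

The closer the determinant is to zero, the stronger the multicollinearity of the factors.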
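A short NumPy sketch on simulated, hypothetical data shows how the determinant behaves. It stays near 1 for independent factors and approaches 0 when one factor is almost a linear combination of the others.

```python
import numpy as np

def corr_determinant(X):
    """Det|R| for the pairwise correlation matrix of the factors (columns of X)."""
    R = np.corrcoef(X, rowvar=False)
    return np.linalg.det(R)

rng = np.random.default_rng(42)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
det_ind = corr_determinant(np.column_stack([x1, x2, x3]))
det_near = corr_determinant(np.column_stack([x1, x2, x1 + x2 + 0.01 * x3]))
print(det_ind)    # close to 1: factors practically uncorrelated
print(det_near)   # close to 0: third factor is almost x1 + x2
```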

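2. Testing the hypothesis that the variables are independent, $H_0\colon \operatorname{Det}|R| = 1$. It has been shown that, with $n$ observations and $m$ factors, the quantity

$$\chi^2 = -\left[n - 1 - \frac{1}{6}(2m + 5)\right]\ln \operatorname{Det}|R|$$

has an approximate $\chi^2$ distribution with $\frac{1}{2}m(m-1)$ degrees of freedom.

If $\chi^2_{\text{fact}} > \chi^2_{\text{table}}$, the null hypothesis is rejected, i.e., the factors are considered multicollinear.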
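The test statistic above, known as the Farrar-Glauber test, can be computed as follows. This is a sketch that assumes NumPy and uses simulated data. The critical value 7.815 is the tabulated 5% point of $\chi^2$ with 3 degrees of freedom, which applies here because the example has $m = 3$ factors.

```python
import numpy as np

def multicollinearity_chi2(X):
    """Statistic -[n - 1 - (2m + 5)/6] * ln Det|R| and its m(m - 1)/2 degrees of freedom."""
    n, m = X.shape
    R = np.corrcoef(X, rowvar=False)
    stat = -(n - 1 - (2 * m + 5) / 6) * np.log(np.linalg.det(R))
    return stat, m * (m - 1) // 2

rng = np.random.default_rng(42)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
X_near = np.column_stack([x1, x2, x1 + x2 + 0.01 * x3])   # third factor almost x1 + x2

stat, df = multicollinearity_chi2(X_near)
CHI2_CRIT_5PCT_DF3 = 7.815                                # table value, alpha = 0.05, df = 3
print(stat, df, stat > CHI2_CRIT_5PCT_DF3)                # H0 rejected: factors multicollinear
```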

The factors responsible for multicollinearity can be identified by computing and comparing the coefficients of multiple determination $R^2_{x_1|x_2x_3\ldots x_m}$, $R^2_{x_2|x_1x_3\ldots x_m}$, and so on, using each factor in turn as the dependent variable. The factor with the largest value of $R^2_{x_i|\ldots}$ is the main source of multicollinearity.
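This procedure can be sketched with NumPy on simulated data. The data are hypothetical: $x_3$ is built as nearly $x_1 + x_2$. Each factor is regressed on the others. The sketch also reports the variance inflation factor $1/(1 - R^2_j)$, a common companion measure that the text itself does not mention.

```python
import numpy as np

def auxiliary_r2(X):
    """R^2 of each factor regressed (with intercept) on all the other factors."""
    n, m = X.shape
    out = []
    for j in range(m):
        target = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        out.append(1.0 - resid @ resid / ((target - target.mean()) ** 2).sum())
    return np.array(out)

rng = np.random.default_rng(7)
n = 200
x1, x2, x4 = rng.normal(size=(3, n))
x3 = x1 + x2 + 0.1 * rng.normal(size=n)    # hypothetical near-linear combination
X = np.column_stack([x1, x2, x3, x4])

r2 = auxiliary_r2(X)
vif = 1.0 / (1.0 - r2)                     # variance inflation factors
print(np.round(r2, 3), int(np.argmax(r2)))
```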

There are the following ways to overcome strong interfactor correlation:

1) excluding one or more factors from the model;

2) transforming the factors to reduce the correlation between them;

3) building combined regression equations that reflect not only the factors but also their interactions;

4) switching to reduced-form equations, etc.

When constructing a multiple regression equation, one of the most important steps is selecting the factors included in the model. Different approaches to factor selection based on correlation indicators lead to different methods. The most widely used are:

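1) the exclusion method: factors are eliminated one by one from the full set;

2) the inclusion method: factors are added to the model one at a time;

3) stepwise regression analysis: factors are added one at a time, and a previously included factor is removed if it becomes insignificant.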
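The inclusion method can be sketched as a forward search. This simplified illustration uses a stopping rule that is an assumption, not something the text specifies: stop when the best addition no longer raises adjusted $R^2$. In practice t- or F-tests are also commonly used. The data are simulated, and only columns 0 and 2 (zero-based) actually drive $y$.

```python
import numpy as np

def r2_of(y, A):
    """R^2 of an OLS fit of y on design matrix A (A already contains the intercept)."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / ((y - y.mean()) ** 2).sum()

def forward_inclusion(y, X):
    """Add factors one at a time while adjusted R^2 keeps rising; return chosen column indices."""
    n, m = X.shape
    chosen, best_adj = [], 0.0        # intercept-only model has R^2 = 0
    while len(chosen) < m:
        trials = []
        for j in range(m):
            if j in chosen:
                continue
            cols = chosen + [j]
            A = np.column_stack([np.ones(n), X[:, cols]])
            k = len(cols)
            adj = 1.0 - (1.0 - r2_of(y, A)) * (n - 1) / (n - k - 1)
            trials.append((adj, j))
        adj, j = max(trials)          # best candidate at this step
        if adj <= best_adj:
            break                     # no remaining factor improves the model
        chosen.append(j)
        best_adj = adj
    return chosen

rng = np.random.default_rng(2)
n = 100
X = rng.normal(size=(n, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n)
chosen = forward_inclusion(y, X)
print(chosen)
```

The exclusion method is the mirror image of this loop: it starts from all factors and removes the weakest one at each step.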

When selecting factors, the following rule is used: the number of observations on which the model is built should usually be 6-7 times larger than the number of factors included.

In the power-law (nonlinear) multiple regression model $y = a\,x_1^{b_1} x_2^{b_2}\cdots x_m^{b_m}\,\varepsilon$, the parameter $a$ is not subject to economic interpretation. The coefficients $b_1, b_2, \ldots, b_m$ are elasticity coefficients. Each one shows by how many percent, on average, the result changes when the corresponding factor changes by 1%, with the other factors held constant.
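Taking logarithms turns the power model into $\ln y = \ln a + b_1 \ln x_1 + \dots + b_m \ln x_m + \ln\varepsilon$. The elasticities can therefore be estimated by ordinary OLS on logged data. The NumPy sketch below uses two factors. The data and the true values $a = 2$, $b_1 = 0.6$, $b_2 = -0.3$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)
# Hypothetical data from y = a * x1^b1 * x2^b2 * eps with a = 2, b1 = 0.6, b2 = -0.3
y = 2.0 * x1**0.6 * x2**-0.3 * np.exp(rng.normal(scale=0.05, size=n))

A = np.column_stack([np.ones(n), np.log(x1), np.log(x2)])
(ln_a, b1, b2), *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
print(np.exp(ln_a), b1, b2)   # b1, b2: % change in y per 1% change in x1, x2
```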

One of the basic prerequisites for building a high-quality model is correct specification of the regression equation. A regression equation is correctly specified if it correctly reflects, on the whole, the relationship between the variable of interest and the explanatory factors included in the model. This is a necessary condition for the subsequent quality assessment of the regression model.

An incorrect choice of the functional form or of the set of explanatory variables is called a specification error. Its main types are the following.

  • 1. Dropping a significant variable. The following example illustrates the essence of this error and its consequences. Let the theoretical model describing the economic dependence under consideration have the form

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon.$$

This model corresponds to the empirical regression equation

$$y = b_0 + b_1 x_1 + b_2 x_2 + e.$$

For some reason (lack of information, superficial knowledge of the subject, etc.), the researcher believes that $Y$ is affected only by $X_1$ and restricts himself to the model

$$Y = \gamma_0 + \gamma_1 X_1 + \varepsilon. \tag{9.28}$$

He therefore does not treat $X_2$ as an explanatory variable and makes the error of dropping an essential variable.

Let the empirical regression equation corresponding to the theoretical equation (9.28) have the form

$$y = g_0 + g_1 x_1 + e. \tag{9.29}$$

The consequences of this error are quite serious. The OLS estimates obtained from equation (9.29) are in general biased ($M[g_0] \ne \beta_0$, $M[g_1] \ne \beta_1$) and inconsistent, so the error does not disappear even with an infinitely large number of observations. Consequently, interval estimates and the results of testing the corresponding hypotheses will be unreliable.
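The following simulation sketch shows this bias. It assumes NumPy, and all numbers are hypothetical. With two correlated factors, the slope of the short regression converges to $\beta_1 + \beta_2\,\operatorname{cov}(X_1, X_2)/\operatorname{var}(X_1)$ rather than to $\beta_1$, so even a very large sample does not remove the error. With $\beta_1 = 2$, $\beta_2 = 3$ and $X_2 = 0.5X_1 + u$, the limit is $2 + 3 \cdot 0.5 = 3.5$.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients of y on an intercept plus the given regressors."""
    A = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

rng = np.random.default_rng(5)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)                  # X2 correlated with X1 (hypothetical design)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)  # beta0 = 1, beta1 = 2, beta2 = 3

b = ols(y, x1, x2)    # correct specification
g = ols(y, x1)        # X2 wrongly dropped
print(b[1], g[1])     # about 2.0 versus about 3.5
```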

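  • 2. Adding an insignificant variable. Suppose, conversely, that the correct model is $Y = \beta_0 + \beta_1 X_1 + \varepsilon$, with OLS estimates $b_0, b_1$, but the researcher estimates the model

$$Y = \gamma_0 + \gamma_1 X_1 + \gamma_2 X_2 + \varepsilon, \tag{9.30}$$

in which $X_2$ has no real effect on $Y$. The consequences of this error are less serious than in the previous case. The estimates $g_0, g_1$ of the coefficients of model (9.30) remain, as a rule, unbiased ($M[g_0] = \beta_0$, $M[g_1] = \beta_1$) and consistent. However, their precision decreases and their standard errors increase. The estimates therefore become inefficient, which affects their reliability. This follows from the variances of the estimates of the coefficient of $X_1$ in the two equations:

$$D(b_1) = \frac{\sigma^2}{\sum_i (x_{i1} - \bar{x}_1)^2}, \qquad D(g_1) = \frac{\sigma^2}{\sum_i (x_{i1} - \bar{x}_1)^2} \cdot \frac{1}{1 - r^2_{X_1X_2}},$$

where $\sigma^2$ is the variance of the random error.

Here $r_{X_1X_2}$ is the correlation coefficient between the explanatory variables $X_1$ and $X_2$.

Therefore $D(g_1) \ge D(b_1)$, and equality is possible only when $r_{X_1X_2} = 0$.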

An increase in the variance of the estimates can lead to wrong conclusions when testing hypotheses about the values of the regression coefficients and to wider interval estimates.
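The variance comparison can be checked numerically. In OLS, the variance of a coefficient equals $\sigma^2$ times the corresponding diagonal element of $(A^\top A)^{-1}$. For a fixed $\sigma^2$, the ratio $D(g_1)/D(b_1)$ therefore depends only on the regressors. The NumPy sketch below uses simulated, hypothetical $x_1$ and $x_2$ and confirms that this ratio equals $1/(1 - r^2_{X_1X_2})$.

```python
import numpy as np

def var_factor(*cols):
    """[(A'A)^{-1}]_{11}: variance of the X1 coefficient divided by sigma^2."""
    A = np.column_stack([np.ones(len(cols[0])), *cols])
    return np.linalg.inv(A.T @ A)[1, 1]

rng = np.random.default_rng(11)
n = 50
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # hypothetical X2: correlated with X1, no effect on Y

ratio = var_factor(x1, x2) / var_factor(x1)   # D(g1) / D(b1)
r = np.corrcoef(x1, x2)[0, 1]
print(ratio, 1.0 / (1.0 - r ** 2))            # the two values coincide
```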

3. Choosing the wrong functional form. We illustrate the essence of the error with the following example. Let the correct regression model have the form

Any other dependence with the same variables, but having a different functional form, leads to a distortion of the true dependence. For example, in the following equations

a mistake was made in choosing the wrong functional form of the regression equation. The consequences of this error will be very serious. Typically, such an error leads either to biased estimates or to deterioration of the statistical properties of estimates of regression coefficients and other indicators of equation quality. This is primarily caused by the violation of the Gauss-Markov conditions for deviations. The predictive qualities of the model in this case are very low.
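A small deterministic illustration, assuming NumPy, uses hypothetical data. Suppose the true dependence is $y = 1 + 0.5x^2$ but a straight line is fitted. The residuals are then not random. They follow a systematic U-shaped pattern, and this kind of regularity violates the assumptions on the deviations.

```python
import numpy as np

x = np.arange(11, dtype=float)                 # 0, 1, ..., 10
y = 1.0 + 0.5 * x**2                           # hypothetical true model: quadratic in x
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # wrongly fit a straight line
resid = y - A @ coef
print(np.round(resid, 2))
```

For $x = 0, 1, \ldots, 10$ the residuals work out to $0.5\,[(x - 5)^2 - 10]$. They are positive at both ends (7.5) and negative in the middle (-5).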

When constructing regression equations, especially at the initial stages, specification errors are made quite often due to superficial knowledge about the economic processes under study, or due to an insufficiently developed theory, or due to errors in the collection and processing of statistical data when constructing an empirical regression equation. It is important to be able to detect and correct these errors. The complexity of the detection procedure is determined by the type of error and our knowledge of the object under study.

If a regression equation contains one insignificant variable, it shows up as a low t-statistic, and this variable is then excluded from further consideration.

If the equation contains several statistically insignificant explanatory variables, another regression equation should be built without them. The coefficients of determination of the original ($R^2$) and the reduced ($R_1^2$) regression equations are then compared using the F-statistic

$$F = \frac{(R^2 - R_1^2)/k}{(1 - R^2)/(n - m - 1)},$$

where $n$ is the number of observations;

$m$ is the number of explanatory variables in the original equation;

$k$ is the number of explanatory variables discarded from the original equation.

Under the hypothesis that the discarded variables are jointly insignificant, this statistic has an F-distribution with $k$ and $n - m - 1$ degrees of freedom.
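The F-statistic can be computed with a few lines of plain Python. The values of $R^2$, $R_1^2$, $n$, $m$ and $k$ below are hypothetical. The computed value is compared with the tabulated $F(k, n - m - 1)$ critical value, which is roughly 3.2 for $F(2, 44)$ at the 5% level.

```python
def f_drop_test(r2_full, r2_reduced, n, m, k):
    """F statistic for discarding k of the m explanatory variables; df = (k, n - m - 1)."""
    return ((r2_full - r2_reduced) / k) / ((1.0 - r2_full) / (n - m - 1))

F = f_drop_test(r2_full=0.80, r2_reduced=0.78, n=50, m=5, k=2)
print(round(F, 3))   # 2.2
F_CRIT = 3.2         # approximate 5% table value of F(2, 44)
print("discarded variables jointly insignificant:", F < F_CRIT)
```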

Possible reasoning and conclusions for this situation are given in paragraph 6.7.2.

However, these checks make sense only if the type (functional form) of the regression equation has been chosen correctly, which is possible when it is consistent with theory. For example, the Phillips curve, which relates the rate of change of wages $Y$ to the unemployment rate $X$, describes an inverse relationship. The following models are possible:
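The exact list of candidate models is not preserved here. One specification commonly used for the Phillips curve is the equilateral hyperbola $Y = \beta_0 + \beta_1 / X$ with $\beta_1 > 0$. It becomes linear after the substitution $Z = 1/X$. The NumPy sketch below uses hypothetical noise-free data generated with $\beta_0 = -1$ and $\beta_1 = 10$.

```python
import numpy as np

# Hypothetical noise-free data generated from Y = -1 + 10 / X
x = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0])   # unemployment rate, %
y = -1.0 + 10.0 / x                                    # rate of wage change, %

A = np.column_stack([np.ones_like(x), 1.0 / x])        # regress Y on Z = 1/X
(b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)
print(b0, b1)                                          # recovers -1 and 10
```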

Note that the choice of model is not always unambiguous. The model subsequently has to be checked against both theory and empirical data and refined. Recall that the following characteristics are usually analyzed when assessing the quality of a model:

  • a) adjusted coefficient of determination $\bar{R}^2$;
  • b) t-statistics;
  • c) Durbin-Watson DW statistics;
  • d) consistency of the signs of the coefficients with the theory;
  • e) predictive qualities (errors) of the model.
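The Durbin-Watson statistic is perhaps the least self-explanatory of the criteria listed. The following minimal sketch computes it from a residual series $e_t$ as $DW = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. Values near 2 suggest no first-order autocorrelation. Values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. The residual series below are hypothetical.

```python
import numpy as np

def durbin_watson(e):
    """DW = sum (e_t - e_{t-1})^2 / sum e_t^2 for a residual series e."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

print(durbin_watson([1, -1, 1, -1]))   # 3.0: alternating signs, negative autocorrelation
print(durbin_watson([1, 2, 3, 4]))     # 0.1: smooth residuals, positive autocorrelation
```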

If all these indicators are satisfactory, the model can be proposed to describe the real process under study. If any of them is unsatisfactory, there is reason to doubt the quality of the model. The functional form of the equation may have been chosen incorrectly, an important explanatory variable may have been left out, or the model may contain an explanatory variable that has no significant effect on the dependent variable.

  • Adding a non-significant variable. In some cases, too many explanatory variables are included in a regression equation, and not always with justification. For example, let the theoretical model have the form $Y = \beta_0 + \beta_1 X_1 + \varepsilon$. Suppose the researcher replaces it with the more complex model $Y = \gamma_0 + \gamma_1 X_1 + \gamma_2 X_2 + \varepsilon$, adding an explanatory variable $X_2$ that has no real effect on $Y$. In this case the error of adding an insignificant variable is committed.