Data Analysis: Statistical Research Methods


  • 1. Absolute and relative values
  • 2. Averages and their application in legal statistics
  • 3. Series of dynamics
  • 4. Statistical methods of studying interrelationships

1. Absolute and relative values

As a result of summary and grouping, the researcher holds the most diverse information about the phenomena and processes under study. However, stopping at these results would be a big mistake: even grouped according to given criteria and presented in tabular or graphical form, these data are still only a kind of illustration, an intermediate result that must be analyzed - in this case, statistically. Statistical analysis is the representation of the object under study as a differentiated system, i.e. as a complex of elements and connections that, interacting, form an organic whole.

As a result of such an analysis, a model of the object under study should be built, and, since we are talking about statistics, statistically significant elements and relationships should be used when building the model.

Actually, statistical analysis is aimed at identifying such significant elements and relationships.

Absolute indicators (values) are total values calculated or taken from summary statistical reports without any transformation. Absolute indicators are always nominal and are expressed in the units of measurement set when the statistical observation program was compiled (the number of criminal cases initiated, the number of crimes committed, the number of divorces, etc.).

Absolute indicators are basic for any further statistical operations, but by themselves they are of little use for analysis. From absolute figures, for example, it is difficult to judge the level of crime in different cities or regions, and it is practically impossible to answer the question of where crime is higher and where it is lower, since cities and regions can differ significantly in population, territory and other important parameters.

Relative quantities in statistics are generalizing indicators that express in numerical form the ratio of two compared statistical values. When calculating relative values, two absolute values are most often compared, but average and relative values can also be compared, yielding new relative indicators. The simplest example of calculating a relative value is the answer to the question: how many times is one number greater than another?

When considering relative values, the following must be taken into account. In principle, anything can be compared with anything, even the linear dimensions of a sheet of A4 paper with the number of products manufactured by the Lomonosov Porcelain Factory. However, such a comparison will give us nothing. The most important conditions for a fruitful calculation of relative quantities can be formulated as follows:

1. The units of measurement of the compared quantities must be the same or quite comparable. The numbers of crimes, criminal cases and convicts are correlated indicators, i.e. related, but not comparable in terms of units of measurement: one criminal case may cover several crimes and a group of convicted persons; several convicts can commit one crime and, conversely, one convict can commit many acts. The numbers of crimes, cases and convictions are comparable with the population, the personnel strength of the criminal justice system, the standard of living of the people and other data of the same year. Moreover, within one year the indicators in question are quite comparable with each other.

2. Comparable data must necessarily correspond to each other in terms of time or territory of their receipt, or both.

The absolute value with which other quantities are compared is called the basis or base of comparison, and the compared indicator is called the comparison value. For example, when calculating the dynamics of crime in Russia in 2000-2010, the data for 2000 serve as the base. The base can be taken as one (the relative value is then expressed as a coefficient) or as 100 (as a percentage). Depending on the dimension of the compared values, the most convenient, indicative and visual form of expressing the relative value is chosen.

If the value being compared is much larger than the base, the resulting ratio is best expressed as a coefficient. For example, if crime over a certain period (in years) increased 2.6 times, the expression "in times" is more indicative than a percentage. Relative values are expressed as percentages when the compared value does not differ much from the base.

Relative values used in statistics, including legal statistics, are of different types. The following types of relative values are used in legal statistics:

1. relations characterizing the structure of the population, or distribution relations;

2. the relationship of the part to the whole, or the relationship of intensity;

3. relations that characterize the dynamics;

4. relations of degree and comparison.

The relative magnitude of distribution is a relative value, expressed as a percentage, of individual parts of the aggregate of studied phenomena (crimes, criminals, civil cases, lawsuits, causes, preventive measures, etc.) to their total, taken as 100%. This is the most common (and simplest) kind of relative data used in statistics. Examples are the structure of crime (by types of crimes) and the structure of convictions (by types of crimes, by age of convicts).


The relation of intensity (the part-to-whole ratio) is a generalizing relative value that reflects the prevalence of a particular feature in the observed aggregate.

The most common indicator of intensity used in legal statistics is the intensity of crime. Crime intensity is usually reflected by the crime rate, i.e. the number of crimes per 100,000 or 10,000 inhabitants:

KP = (P × 100,000) / N

where P is the absolute number of registered crimes and N is the absolute number of the population.
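For illustration, this calculation can be sketched in Python (the figures are hypothetical, used only to show the arithmetic):

```python
def crime_rate(crimes: int, population: int, per: int = 100_000) -> float:
    """Crime rate: number of registered crimes per `per` inhabitants."""
    return crimes * per / population

# Hypothetical example: 12,500 registered crimes in a region of 2.1 million people
print(round(crime_rate(12_500, 2_100_000), 1))  # ~595.2 crimes per 100,000
```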

A prerequisite determining the very possibility of calculating such indicators, as mentioned above, is that all the absolute indicators used are taken for the same territory and the same period of time.

Relations characterizing dynamics are generalizing relative quantities that show the change in time of certain indicators of legal statistics. The time interval is usually taken as a year.

The base, taken as 1 or 100%, is the information about the studied feature for a certain year that was in some way characteristic of the phenomenon under study. The data of the base year act as a fixed base, against which the indicators of subsequent years are expressed as percentages.

Statistical analysis tasks often require year-to-year (or other period) comparisons, when the data of each previous year (month or other period) are taken as the base. Such a base is called moving. It is usually used in the analysis of time series (series of dynamics).

Relations of degree and comparison make it possible to compare different indicators in order to identify which value is larger, to what extent one phenomenon differs from or resembles another, and what is common and different in the observed statistical processes.

An index is a specially constructed relative indicator of comparison (in time, in space, against a forecast, etc.) showing how many times the level of the phenomenon under study in some conditions differs from the level of the same phenomenon in other conditions. Indices are most common in economic statistics, although they also play a certain role in the analysis of legal phenomena.

Indices are indispensable when it is necessary to compare disparate indicators whose simple summation is impossible. Therefore, indices are usually defined as indicator numbers for measuring the average dynamics of an aggregate of heterogeneous elements.

In statistics, indices are usually denoted by the letter I (i); whether the letter is uppercase or lowercase depends on whether we are talking about an individual (private) index or a general one.

Individual indices (i) reflect the ratio of an indicator of the current period to the corresponding indicator of the compared period.

Consolidated (aggregate) indices are used in the analysis of complex socio-economic phenomena and consist of two parts: the indexed value itself and the co-measure ("weight").
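As an illustration, here is a sketch of an individual index and an aggregate index of the Laspeyres type, where the indexed value (price) is weighted by base-period quantities; all data below are invented:

```python
# Hypothetical prices and quantities for two periods
p0, p1 = [10.0, 4.0], [12.0, 5.0]   # base and current prices
q0 = [100, 250]                      # base-period quantities (the "weights")

# Individual index for the first item: i = p1 / p0
i_first = p1[0] / p0[0]              # 1.2, i.e. a 20% increase

# Aggregate (Laspeyres-type) price index: I = sum(p1*q0) / sum(p0*q0)
I = sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0))
print(i_first, round(I, 3))          # 1.2 1.225
```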

2. Averages and their application in legal statistics

The result of processing absolute and relative indicators is the construction of distribution series. A distribution series is an ordered distribution of the units of an aggregate by a qualitative or quantitative feature. The analysis of such series is the basis of any statistical analysis, no matter how complex it later turns out to be.

A distribution series can be built on a qualitative or a quantitative basis. In the first case it is called attributive, in the second variational. The variability of the quantitative trait is called variation, and its individual values are called variants. It is with variational series that legal statistics most often has to deal.

A variational series always consists of two columns. One lists the values of the quantitative attribute in ascending order - these are the variants, denoted x. The other lists the number of units corresponding to each variant - the frequencies, denoted by the Latin letter f.

Table 2.1

Variant x | Frequency f

The frequency with which a particular trait manifests itself is very important when calculating other significant statistical indicators, namely averages and indicators of variation.

Variational series, in turn, can be discrete or interval. Discrete series, as the name implies, are built on discretely varying features, and interval series on continuous variation. For example, the distribution of offenders by age can be either discrete (18, 19, 20 years old, etc.) or continuous (up to 18 years old, 18-25 years old, 25-30 years old, etc.). Moreover, interval series themselves can be built according to either the discrete or the continuous principle. In the first case, the boundaries of adjacent intervals do not repeat; in our example the intervals will look like this: up to 18, 18-25, 26-30, 31-35, etc. Such a series is called an interval series with discrete variation. An interval series with continuous variation assumes that the upper limit of each interval coincides with the lower limit of the next.

The very first indicators describing a variational series are the average quantities. They play an important role in legal statistics, since only with their help can populations be characterized by a quantitative variable attribute and compared by it. With the help of average values, it is possible to compare sets of legally significant phenomena according to certain quantitative characteristics and draw the necessary conclusions from these comparisons.

Average quantities reflect the most general trend (regularity) inherent in the entire mass of phenomena studied. It manifests itself in a typical quantitative characteristic, i.e. in the average value of all available (variable) indicators.

Statistics has developed many types of averages: arithmetic, geometric, cubic, harmonic, etc. However, most of them are practically not used in legal statistics, so we will consider only two types: the arithmetic mean and the geometric mean.

The most common and well-known average is the arithmetic mean. To calculate it, the indicators are summed and the sum is divided by their total number. For example, a family of four consists of parents aged 38 and 40 and two children aged 7 and 10. We sum the ages: 38 + 40 + 7 + 10, and divide the resulting 95 by 4; the average age of the family is 23.75 years. Or let us calculate the average monthly workload of investigators if a department of 8 people handles 25 cases per month: dividing 25 by 8 gives 3.125 cases per month per investigator.

In legal statistics, the arithmetic mean is used when calculating the workload of employees (investigators, prosecutors, judges, etc.), calculating the absolute increase in crime, calculating the sample, etc.

However, in the above example the average monthly workload per investigator was calculated incorrectly: the simple arithmetic mean does not take into account the frequency of the studied trait. In our example, the average monthly workload per investigator is about as correct and informative as the "average temperature in a hospital" from the well-known joke, which, as you know, is room temperature. To take the frequency of the studied trait into account when calculating the arithmetic mean, the weighted arithmetic mean, or the average for a discrete variational series, is used. (A discrete variational series is a sequence of changes of an attribute by discrete, i.e. discontinuous, values.)

The weighted arithmetic mean has no fundamental differences from the simple arithmetic mean: the summation of identical values is replaced by multiplying each value by its frequency, i.e. each value (variant) is weighted by its frequency of occurrence.

So, to calculate the average workload of investigators, we must multiply each number of cases by the number of investigators who handled exactly that number of cases. It is usually convenient to present such calculations in tabular form:

Table 2.2

Number of cases (variant x) | Number of investigators (frequency f) | Product of variant and frequency (xf)

The weighted average is then calculated by the formula:

x̄ = Σ(x · f) / Σf

where x is the number of criminal cases and f is the number of investigators.
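A minimal Python sketch of this calculation; the caseload distribution below is hypothetical, chosen so that the weighted mean equals the 4.375 quoted next:

```python
# Hypothetical distribution: 1 investigator with 1 case, 3 with 4, 2 with 5, 2 with 6
x = [1, 4, 5, 6]   # variants: number of cases per investigator
f = [1, 3, 2, 2]   # frequencies: number of investigators with that caseload

weighted_mean = sum(xi * fi for xi, fi in zip(x, f)) / sum(f)
print(weighted_mean)  # 4.375
```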

Thus, the weighted average is not 3.125 but 4.375. On reflection, this is as it should be: the load on each individual investigator increases because one investigator in our hypothetical department turned out to be an idler - or, on the contrary, was investigating a particularly important and complex case. The question of interpreting the results of a statistical study will be considered in the next topic. In some cases, namely with grouped frequencies of a discrete distribution, the calculation of the average is at first glance not obvious. Suppose we need to calculate the arithmetic mean for the distribution of persons convicted of hooliganism by age. The distribution looks like this:

Table 2.3

Age interval (variant x) | Number of convicts (frequency f) | Interval midpoint | Product of midpoint and frequency (xf)

The midpoint of each interval is half the sum of its boundaries; for example, for the interval 18-21:

(21 − 18) / 2 + 18 = 19.5

The average is then calculated according to the general rule and equals 23.6 years for this series. In the case of so-called open series, i.e. when the extreme intervals are defined as "less than x" or "more than x", the width of the extreme intervals is taken to be the same as that of the adjacent intervals.
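The same midpoint technique is easy to express in code; the intervals and frequencies below are invented and merely illustrate the procedure:

```python
# Hypothetical age intervals of convicts and their frequencies
intervals = [(18, 21), (21, 25), (25, 30), (30, 40)]
f = [40, 30, 20, 10]

# Midpoint of each interval: lower + (upper - lower) / 2
mids = [lo + (hi - lo) / 2 for lo, hi in intervals]  # 19.5, 23.0, 27.5, 35.0

mean_age = sum(m * fi for m, fi in zip(mids, f)) / sum(f)
print(round(mean_age, 1))  # 23.7 for these invented data
```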

3. Series of dynamics

The social phenomena studied by statistics are in constant development and change. Socio-legal indicators can be presented not only in a static form, reflecting a certain phenomenon, but also as a process taking place in time and space, as well as in the form of interaction of the characteristics under study. In other words, time series show the development of a trait, i.e. its change in time, space or depending on environmental conditions.

A series of dynamics is a sequence of values of an indicator (often average values) over specified periods of time (for example, for each calendar year).

For a deeper study of social phenomena and their analysis, a simple comparison of the levels of a series of dynamics is not enough; it is necessary to calculate the derived indicators of the series: absolute growth, growth rate, rate of increase, average growth rates and rates of increase, and the absolute content of one percent of increase.

The calculation of indicators of the series of dynamics is carried out on the basis of a comparison of their levels. In this case, there are two ways to compare the levels of the dynamic series:

basic indicators, when all subsequent levels are compared with some initial level taken as the base;

chain indicators, when each subsequent level of a series of dynamics is compared with the previous one.

Absolute growth shows by how many units the level of the current period is greater or less than the level of the base or previous period.

Absolute growth (P) is calculated as the difference between the compared levels.

Base absolute growth:

Pb = yi − ybase (f.1)

Chain absolute growth:

Pc = yi − yi−1 (f.2)

The growth rate (Tr) shows how many times (or what percentage) the level of the current period constitutes of the level of the base or previous period:

Base growth rate:

Tr(b) = (yi / ybase) × 100% (f.3)

Chain growth rate:

Tr(c) = (yi / yi−1) × 100% (f.4)

The rate of increase (Tpr) shows by what percentage the level of the current period is greater or less than the level of the base or previous period taken as the base of comparison, and is calculated as the ratio of absolute growth to the level taken as the base.

The rate of increase can also be calculated by subtracting 100% from the growth rate.

Base rate of increase:

Tpr(b) = (Pb / ybase) × 100%, or Tpr(b) = Tr(b) − 100% (f.5)

Chain rate of increase:

Tpr(c) = (Pc / yi−1) × 100%, or Tpr(c) = Tr(c) − 100% (f.6)

The average growth rate is calculated as the geometric mean of the chain growth rates of the series:

T̄r = (T1 × T2 × … × Tn)^(1/n) (f.7)

where T̄r is the average growth rate, T1, …, Tn are the growth rates for individual periods, and n is the number of growth rates.

Such problems with a root exponent greater than three are, as a rule, solved using logarithms. It is known from algebra that the logarithm of a root equals the logarithm of the radicand divided by the root exponent, and that the logarithm of a product of several factors equals the sum of the logarithms of these factors.

Thus, the average growth rate is calculated by taking the root of degree n of the product of n chain growth rates. The average rate of increase is the difference between the average growth rate and one, or 100% when the growth rates are expressed as percentages:

T̄pr = T̄r − 1, or T̄pr = T̄r − 100%

If the intermediate levels of the dynamic series are absent, the average growth rate and rate of increase are determined by the following formula:

T̄r = (yn / y1)^(1/(n−1)) (f.8)

where yn is the final level of the dynamic series, y1 is the initial level, and n is the number of levels (dates).

Obviously, the average growth rates and rates of increase calculated by formulas (f.7) and (f.8) have the same numerical values.

The absolute content of 1% of increase shows what absolute value corresponds to 1% of increase and is calculated as the ratio of absolute growth to the rate of increase.

Absolute content of a 1% increase:

basic: A(b) = Pb / Tpr(b) (f.9)

chain: A(c) = Pc / Tpr(c) (f.10)

Calculating and analyzing the absolute value of each percentage of increase contributes to a deeper understanding of the nature of the development of the phenomenon under study. Typically, despite fluctuations of growth rates and rates of increase in individual years, the basic indicators of the absolute content of 1% of increase remain unchanged (they depend only on the base level), while the chain indicators, which characterize the change in the absolute value of one percent of increase in each subsequent year compared to the previous one, change as the levels of the series change.
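A compact Python sketch of the chain indicators (f.2, f.4, f.6, f.10) and the average growth rate (f.8) on an invented series:

```python
# Hypothetical yearly levels of a series of dynamics (e.g. registered crimes)
y = [2000, 2200, 2100, 2520]

for i in range(1, len(y)):
    p_chain = y[i] - y[i - 1]                # chain absolute growth (f.2)
    tr_chain = y[i] / y[i - 1] * 100         # chain growth rate, % (f.4)
    tpr_chain = tr_chain - 100               # chain rate of increase, % (f.6)
    # absolute content of 1% of increase (f.10); equals 0.01 * previous level
    a_1pct = p_chain / tpr_chain if tpr_chain else float("nan")
    print(i, p_chain, round(tr_chain, 1), round(tpr_chain, 1), round(a_1pct, 1))

# Average yearly growth rate over the whole series (f.8)
avg_tr = (y[-1] / y[0]) ** (1 / (len(y) - 1)) * 100
print(round(avg_tr, 1))  # about 108.0%
```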

When constructing, processing and analyzing time series, it is often necessary to determine the average levels of the studied phenomena for certain periods of time. For an interval series, the average level is calculated with equal intervals by the simple arithmetic mean, and with unequal intervals by the weighted arithmetic mean:

ȳ = Σy / n, or ȳ = Σ(y · t) / Σt

where ȳ is the average level of the interval series, y are the levels of the series, n is the number of levels, and t is the length of each interval.

For a moment series of dynamics with equal time intervals between the dates, the average level is calculated by the chronological mean formula:

ȳ = (y1/2 + y2 + … + yn−1 + yn/2) / (n − 1) (f.11)

where ȳ is the chronological mean, y1, …, yn are the absolute levels of the series, and n is the number of absolute levels of the series of dynamics.

The chronological mean of the levels of a moment series equals the sum of the indicators of the series divided by the number of indicators less one; the initial and final levels are taken at half value, since the number of dates (moments) is usually one more than the number of periods.
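A sketch of formula (f.11) on an invented moment series:

```python
# Hypothetical levels of a moment series at equally spaced dates
# (e.g. stocks recorded on January 1 of each year)
y = [100, 120, 110, 130]

n = len(y)
# Chronological mean: half of the first and last levels, the rest in full
avg = (y[0] / 2 + sum(y[1:-1]) + y[-1] / 2) / (n - 1)
print(avg)  # 115.0
```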

Depending on the content and form of presentation of the initial data (interval or moment series of dynamics, equal or unequal time intervals), the appropriate analytical expressions are used to calculate various social indicators, for example, the average annual number of crimes and offenses (by type), the average size of working capital balances, the average number of offenders, etc.

4. Statistical methods of studying interrelationships

In the previous questions we considered, if one may say so, the analysis of "one-dimensional" distributions - variational series. This is a very important, but far from the only, type of statistical analysis. The analysis of variational series is the basis for more "advanced" types of statistical analysis, primarily for the study of interrelationships. Such a study reveals cause-and-effect relationships between phenomena, making it possible to determine which changes in attributes affect the variation of the studied phenomena and processes. The attributes that cause changes in others are called factor attributes (factors), and the attributes that change under their influence are called resultant (effect) attributes.

In statistical science, there are two types of relationship between various attributes: the functional connection (rigidly determined) and the statistical (stochastic) one.

A functional connection is characterized by full correspondence between a change in the factor attribute and a change in the resultant value. Such a relationship manifests itself equally in all units of any population. The simplest example: a rise in temperature is reflected in the volume of mercury in a thermometer. Here the ambient temperature acts as the factor, and the volume of mercury as the resultant attribute.

Functional relationships are typical of phenomena studied by sciences such as chemistry, physics and mechanics, where "pure" experiments can be set up that eliminate the influence of extraneous factors. A functional connection between two values is possible only if the second value (the resultant attribute) depends solely and exclusively on the first. In social phenomena this is extremely rare.

Socio-legal processes, which result from the simultaneous impact of a large number of factors, are described by statistical relationships, that is, stochastically (randomly) determined relationships, in which different values of one variable correspond to different values of another variable.

The most important (and common) case of stochastic dependence is correlation dependence. With such a dependence, the cause determines the effect not unambiguously but only with a certain probability. A separate type of statistical analysis - correlation analysis - is devoted to identifying such relationships.

The main task of correlation analysis is to establish, on the basis of strictly mathematical methods, a quantitative expression of the relationship existing between the studied attributes. There are several approaches to calculating correlation and, accordingly, several types of correlation coefficients: the contingency coefficient of A.A. Chuprov (for measuring the relationship between qualitative attributes), the association coefficient of K. Pearson, and the rank correlation coefficients of Spearman and Kendall. In general, such coefficients show the probability with which the studied relationships manifest themselves: the higher the coefficient, the more pronounced the relationship between the attributes.
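For illustration, the Pearson, Spearman and Kendall coefficients can be computed with SciPy; the paired observations below are invented:

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations: factor attribute x and resultant attribute y
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2.1, 2.9, 3.4, 4.2, 4.8, 5.9, 6.3, 7.1])

r, p_r = stats.pearsonr(x, y)        # linear (Pearson) correlation
rho, p_rho = stats.spearmanr(x, y)   # rank (Spearman) correlation
tau, p_tau = stats.kendalltau(x, y)  # rank (Kendall) correlation
print(round(r, 3), round(rho, 3), round(tau, 3))
```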

Both direct and inverse correlation can exist between the studied factors. Direct correlation dependence is observed when a change in the values of the factor corresponds to the same kind of change in the value of the resultant attribute: as the value of the factor attribute increases, the value of the resultant attribute also increases, and vice versa. For example, there is a direct correlation between criminogenic factors and crime (a "+" connection). If an increase in the values of one attribute causes the opposite change in the values of the other, the relationship is called inverse. For example, the higher the social control in a society, the lower the crime rate (a "-" connection).

Both direct and inverse relationships can be rectilinear or curvilinear.

Rectilinear (linear) relationships appear when an increase in the values of the factor attribute produces a uniform increase (direct) or decrease (inverse) in the value of the resultant attribute. Mathematically, such a relationship is expressed by the regression equation y = a + bX, where y is the resultant attribute, a and b are the regression coefficients, and X is the factor attribute.
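A least-squares fit of the regression equation y = a + bX can be sketched with NumPy (the data are invented):

```python
import numpy as np

# Hypothetical observations of factor X and resultant attribute y
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

# polyfit with deg=1 returns the slope b first, then the intercept a
b, a = np.polyfit(X, y, deg=1)
print(round(a, 2), round(b, 2))  # fitted coefficients of y = a + b*X
```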

Curvilinear connections behave differently: an increase in the value of the factor attribute has an uneven effect on the value of the resultant attribute. Initially the relationship may be direct and then become inverse. A well-known example is the relationship between crime and the age of offenders. At first, criminal activity grows roughly in direct proportion to age (up to about 30 years), and then decreases with increasing age. Moreover, the peak of the distribution curve of offenders by age is shifted from the average to the left (toward younger ages), and the curve is asymmetric.

Correlation connections can be one-factor, when the relationship between one factor attribute and one resultant attribute is investigated (pair correlation), or multifactor, when the influence of many interacting factor attributes on the resultant attribute is studied (multiple correlation).

But whichever correlation coefficient is used and whatever correlation is investigated, it is impossible to establish a relationship between attributes on the basis of statistical indicators alone. The initial analysis of indicators is always a qualitative analysis, in which the socio-legal nature of the phenomenon is studied and understood, using the scientific methods and approaches characteristic of the branch of science that studies this phenomenon (sociology, law, psychology, etc.). Then the analysis of groupings and averages makes it possible to put forward hypotheses, build models, and determine the type of connection and dependence. Only after this is the quantitative characteristic of the dependence determined - the correlation coefficient itself.



Statistical processing methods are described in sufficient detail in the domestic literature; in the practice of Russian enterprises, meanwhile, only some of them are used. Let us consider some methods of statistical processing.

General information

In the practice of domestic enterprises, statistical control methods are predominantly common, while statistical regulation of the technological process is encountered extremely rarely. The application of statistical methods presupposes that a group of appropriately qualified specialists is formed at the enterprise.

Meaning

According to the ISO 9000 series, the supplier needs to determine the need for statistical methods applied in the development, regulation and verification of the capabilities of the production process and of product characteristics. The methods used are based on probability theory and mathematical statistics. Statistical methods of data analysis can be applied at any stage of the product life cycle. They provide an assessment and accounting of the degree of heterogeneity of products, of the variability of their properties relative to the established nominal or required values, and of the variability of the process of their creation. Statistical methods are methods by which the state of the phenomena under study can be judged with a given accuracy and reliability. They make it possible to predict certain problems and develop optimal decisions on the basis of the studied factual information, trends and patterns.

Directions of use

The main areas in which statistical methods are widely used are:


Practice of developed countries

Statistical methods are a base that ensures the creation of products with high consumer characteristics. These techniques are widely used in industrialized countries; statistical methods are, in effect, a guarantee that consumers receive products meeting the established requirements. The effect of their use has been proven by the practice of industrial enterprises in Japan, where they contributed to the achievement of the highest production level in that country. The long-term experience of foreign countries shows how effective these techniques are. In particular, it is known that Hewlett-Packard, using statistical methods, was able in one case to reduce the number of defective units per month from 9,000 to 45.

Difficulties of implementation

In domestic practice, there are a number of obstacles that prevent the use of statistical methods for studying indicators. Difficulties arise due to:


Program development

It must be said that determining the need for particular statistical methods in the field of quality, and choosing and mastering specific techniques, is a rather complicated and lengthy job for any domestic enterprise. For its effective implementation, it is advisable to develop a special long-term program. It should provide for the formation of a service whose tasks include the organization and methodological guidance of the application of statistical methods. Within the framework of the program, it is necessary to provide for equipping with appropriate technical means, training specialists, and determining the range of production tasks to be solved with the selected methods. It is recommended to start mastering with the simplest, well-known elementary approaches and only subsequently move on to other methods, such as analysis of variance, sample-based processing of information, statistical regulation of processes, design of factorial studies and experiments, etc.

Classification

Statistical methods of analysis include a great many different techniques. However, a leading expert in the field of quality management in Japan, K. Ishikawa, recommends using seven basic methods:

  1. Pareto charts.
  2. Grouping information according to common features.
  3. Control charts.
  4. Cause and effect diagrams.
  5. Histograms.
  6. Control sheets.
  7. Scatter charts.

Based on his own experience in management, Ishikawa claims that 95% of all issues and problems in an enterprise can be solved using these seven approaches.

Pareto chart

This diagram is based on a certain ratio, called the "Pareto principle": 20% of the causes give rise to 80% of the consequences. The diagram shows, in a clear and understandable way, the relative influence of each circumstance on the common problem in descending order. This influence can be investigated through the number of losses or defects provoked by each cause. The relative influence is illustrated by bars, and the cumulative influence of the factors by a cumulative line.
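A sketch of a Pareto chart with matplotlib; the causes and defect counts below are invented:

```python
import matplotlib.pyplot as plt

# Hypothetical defect counts by cause, sorted in descending order
causes = ["setup", "material", "operator", "tooling", "other"]
counts = [120, 45, 20, 10, 5]

# Cumulative influence of the factors, in percent
cum = [sum(counts[: i + 1]) / sum(counts) * 100 for i in range(len(counts))]

fig, ax1 = plt.subplots()
ax1.bar(causes, counts)                       # relative influence of each cause
ax2 = ax1.twinx()
ax2.plot(causes, cum, marker="o", color="k")  # cumulative line
ax2.set_ylim(0, 110)
plt.title("Pareto chart (hypothetical data)")
plt.show()
```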

Cause-and-effect diagram

On it, the problem under study is conventionally depicted as a horizontal straight arrow, and the conditions and factors that directly or indirectly affect it as oblique arrows. When building the diagram, even seemingly insignificant circumstances should be taken into account, since in practice there are quite frequent cases in which a problem is solved by eliminating several apparently insignificant factors. The causes that influence the main circumstances (of the first and subsequent orders) are depicted on the diagram by short horizontal arrows. The detailed diagram takes the form of a fish skeleton.

Grouping information

This economic-statistical method is used to organize a set of indicators obtained by evaluating or measuring one or more parameters of an object. As a rule, such information is presented as an unordered sequence of values: the linear dimensions of a workpiece, the melting point, the hardness of a material, the number of defects, and so on. It is difficult to draw conclusions about the properties of a product or the process of its creation from such a set. Ordering is done using line graphs, which clearly show changes in the observed parameters over a certain period.

Control sheet

As a rule, it is presented as a table of the frequency distribution of the measured values of the object's parameters over the corresponding intervals. Control sheets are compiled depending on the purpose of the study. The range of indicator values is divided into equal intervals, whose number is usually chosen equal to the square root of the number of measurements taken. The form should be simple, so as to eliminate problems when filling it out, reading it and checking it.

Histogram

It is presented in the form of a stepped polygon and clearly illustrates the distribution of the measured values. The range of the measured values is divided into equal intervals laid along the x-axis. A rectangle is built over each interval, its height equal to the frequency of occurrence of values in that interval.

Scatterplots

They are used to test hypotheses about the relationship between two variables. The model is built as follows: the value of one parameter is plotted on the abscissa axis and the value of the other on the ordinate axis, producing a dot on the graph; these actions are repeated for all values of the variables. If there is a relationship, the correlation field is elongated and its direction does not coincide with the direction of either axis. If there is no relationship, the field is parallel to one of the axes or has the shape of a circle.
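A minimal scatter diagram sketch with invented, weakly related variables:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.8 * x + rng.normal(scale=0.5, size=100)  # hypothetical related variable

plt.scatter(x, y, s=10)
plt.xlabel("parameter 1")
plt.ylabel("parameter 2")
plt.title("Scatter diagram: an elongated field suggests a relationship")
plt.show()
```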

Control charts

They are used when evaluating a process over a specific period. The construction of control charts is based on the following provisions (a simplified sketch in Python follows the list):

  1. All processes deviate from the set parameters over time.
  2. An unstable process does not deviate by chance; deviations that go beyond the expected limits are non-random.
  3. Individual changes can be predicted.
  4. A stable process may deviate randomly within the expected limits.
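Here is a simplified sketch of an X̄ (subgroup mean) chart: means are plotted against 3-sigma limits estimated from the data themselves. A production chart would normally use the standard control-chart coefficients; the process data here are simulated:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Hypothetical process: 25 subgroups of 5 measurements each
subgroups = rng.normal(loc=10.0, scale=0.2, size=(25, 5))

means = subgroups.mean(axis=1)
center = means.mean()
sigma = means.std(ddof=1)
ucl, lcl = center + 3 * sigma, center - 3 * sigma  # 3-sigma control limits

plt.plot(means, marker="o")
for level in (center, ucl, lcl):
    plt.axhline(level, linestyle="--")
plt.title("X-bar control chart (simplified sketch)")
plt.show()
```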

Use in the practice of Russian enterprises

Domestic and foreign experience shows that the most effective statistical method for assessing the stability and accuracy of equipment and technological processes is the compilation of control charts. This method is also used in the regulation of production capacities. When constructing the charts, the parameter under study must be chosen correctly. Preference should be given to indicators that are directly related to the intended use of the product, that can be measured easily, and that can be influenced by process control. If such a choice is difficult or unjustified, one can evaluate values correlated (interrelated) with the controlled parameter.

Nuances

If measuring indicators with the accuracy required for charting by a quantitative criterion is not economically or technically feasible, an alternative (attribute) criterion is used. Associated with it are terms such as "reject" and "defect". A defect is each separate non-conformity of the product with the established requirements; a reject is a product whose delivery to consumers is not allowed because of the defects present in it.

Peculiarities

Each type of chart has its own specifics, which must be taken into account when choosing one for a particular case. Charts based on a quantitative criterion are considered more sensitive to process changes than those using an alternative criterion, but they are also more labor-intensive. They are used for:

  1. Process debugging.
  2. Assessing the feasibility of introducing a technology.
  3. Checking the accuracy of the equipment.
  4. Determining tolerances.
  5. Comparing several acceptable ways of creating a product.

Additionally

If a process disorder manifests itself as a shift of the controlled parameter, X̄ charts should be used; if it manifests itself as an increase in the dispersion of values, R or S charts should be chosen. A number of features must, however, be taken into account. In particular, S charts make it possible to detect a process disorder more accurately and quickly than R charts with the same subgroup sizes, although constructing the latter does not require complex calculations.

Conclusion

In economics, the factors revealed in the course of qualitative assessment can be explored in space and in dynamics and used for predictive calculations. However, the statistical methods of analysis considered here do not include methods for assessing the cause-and-effect relationships of economic processes and events or for identifying promising and untapped reserves for improving performance; in other words, factorial techniques are not among the approaches considered.

statistics"biostatistics".

1. nominal;
2. ordinal;
3. interval;

samples

representative

sample frame simple random sample interval sampling

stratified sampling

cluster and sampling quota

null hypothesis

alternative hypothesis power

confidence level».


Fundamentals of statistical data analysis

After the completion of any scientific research, fundamental or experimental, a statistical analysis of the obtained data is carried out. For the statistical analysis to be successful and to solve its tasks, the study must be properly planned. Therefore, without understanding the basics of statistics, it is impossible to plan and process the results of a scientific experiment. However, medical education provides neither knowledge of statistics nor even the basics of higher mathematics. Hence one very often encounters the opinion that only a statistician should deal with statistical processing in biomedical research, while the medical researcher should focus on the medical questions of his own scientific work. Such a division of labor, implying assistance in data analysis, is fully justified. However, an understanding of the principles of statistics is necessary at least in order to avoid setting the task incorrectly for the specialist, communication with whom before the start of the study is as important as at the data-processing stage.

Before talking about the basics of statistical analysis, it is necessary to clarify the meaning of the term "statistics". There are many definitions, but the most complete and concise, in our opinion, defines statistics as "the science of collecting, presenting and analyzing data". In turn, the application of statistics to the living world is called "biometrics" or "biostatistics".

It should be noted that statistics is very often reduced to the processing of experimental data alone, with no attention paid to the stage of obtaining them. However, statistical knowledge is necessary already during the planning of the experiment, so that the indicators obtained can give the researcher reliable information. Therefore, one can say that the statistical analysis of the results of an experiment begins even before the start of the study.

Already at the stage of developing the plan, the researcher should clearly understand what type of variables will appear in the work. All variables can be divided into two classes: qualitative and quantitative. What range a variable can take depends on the scale of measurement. There are four main scales:

1. nominal;
2. ordinal;
3. interval;
4. rational (scale of relations).

The nominal scale (the scale of "names") contains only symbols for describing certain classes of objects, for example "sex" or "profession of the patient". The nominal scale implies that the variable takes values between which no quantitative relationships can be determined; thus, it is impossible to establish a mathematical relationship between the male and female sexes. Conventional numerical designations (women - 0, men - 1, or vice versa) are assigned absolutely arbitrarily and are intended only for computer processing. The nominal scale is qualitative in its purest form; individual categories in this scale are expressed by frequencies (the number or proportion of observations, percentages).

The ordinal scale provides that its individual categories can be arranged in ascending or descending order. In medical statistics, a classic example of an ordinal scale is the gradation of disease severity. We can arrange the severities in ascending order but still cannot specify quantitative relationships: the distance between values measured on an ordinal scale is unknown or does not matter. It is easy to establish the order of the values of the "severity" variable, but impossible to determine how many times a severe condition differs from a moderate one.

The ordinal scale refers to semi-quantitative data types, and its gradations can be described both by frequencies (as in a qualitative scale) and by measures of central tendency, which we will discuss below.

Interval and rational scales are purely quantitative data types. On an interval scale we can already determine by how much one value of a variable differs from another: an increase in body temperature by 1 degree Celsius always means an increase in released heat by a fixed number of units. However, an interval scale has both positive and negative values (it has no absolute zero). In this regard, it is impossible to say that 20 degrees Celsius is twice as warm as 10; we can only state that 20 degrees is warmer than 10 by as much as 30 is warmer than 20.

The rational scale (ratio scale) has a single reference point and only positive values. In medicine, most rational scales are concentrations: for example, a glucose level of 10 mmol/L is twice the concentration of 5 mmol/L. For temperature, the rational scale is the Kelvin scale, which has an absolute zero (absence of heat).

It should be added that any quantitative variable can be continuous, as in the case of measuring body temperature (a continuous interval scale), or discrete, as when counting the number of blood cells or the offspring of laboratory animals (a discrete rational scale).

These differences are of decisive importance in choosing the methods of statistical analysis of experimental results. Thus, for nominal data the chi-square test is applicable, while the well-known Student's t-test requires that the variable (interval or rational) be continuous.

Once the question of the variable type has been resolved, it is necessary to start forming the samples. A sample is a small group of objects of a certain class (in medicine, of a population). To obtain absolutely accurate data, one would have to study all objects of the given class; however, for practical (often financial) reasons, only part of the population - the sample - is studied. Statistical analysis then allows the researcher to extend the obtained patterns to the entire population with a certain degree of accuracy. In fact, all biomedical statistics is aimed at obtaining the most accurate results from the smallest possible number of observations, because in human research the ethical issue also matters: we cannot afford to put more patients at risk than necessary.

The creation of a sample is governed by a number of mandatory requirements, violation of which can lead to erroneous conclusions from the results of the study. First, sample size matters: the accuracy of estimating the studied parameters depends on it. The word "accuracy" deserves attention here: the larger the studied groups, the more accurate (but not necessarily correct) the results the scientist obtains. For the results of sampling studies to be transferable to the entire population, the sample must be representative. Representativeness implies that the sample reflects all the essential properties of the population: persons of different sex, age, profession, social status, etc. occur in the studied groups with the same frequency as in the entire population.

However, before starting the selection of the study group, one should decide whether a particular population needs to be studied at all. A population may be, for example, all patients with a certain nosology, or people of working age. Thus, results obtained for a population of young men of military age can hardly be extrapolated to postmenopausal women. The set of characteristics of the study group determines the "generalizability" of the study data.

Samples can be formed in various ways. The simplest is to select the required number of objects from the population or sampling frame with a random number generator; this method is called a "simple random sample". If a starting point in the sampling frame is chosen at random and then every second, fifth or tenth object is taken (depending on the required group size), an interval sample is obtained. Interval sampling is not random, since periodic repetitions of data within the sampling frame can never be ruled out.

It is also possible to create a so-called "stratified sample", which assumes that the population consists of several different groups whose structure should be reproduced in the experimental group. For example, if the ratio of men to women in the population is 30:70, then the ratio in a stratified sample should be the same. With this approach, it is critically important not to balance the sample excessively, that is, not to make its characteristics homogeneous, otherwise the researcher may miss the chance of finding differences or relationships in the data.

In addition to the described methods of forming groups, there are also cluster and quota sampling. The former is used when obtaining complete information about the sampling frame is difficult due to its size: the sample is then formed from several of the groups making up the population. The latter - quota sampling - is similar to a stratified sample, except that the distribution of objects need not correspond to that in the population.
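A sketch of the basic selection schemes on a hypothetical numbered sampling frame (the frame, group sizes and 30:70 strata are all invented for illustration):

```python
import random

population = list(range(1000))   # hypothetical sampling frame of object IDs

# Simple random sample of 50 objects
simple = random.sample(population, 50)

# Interval (systematic) sample: random start, then every 20th object
start = random.randrange(20)
systematic = population[start::20]

# Stratified sample preserving a hypothetical 30:70 ratio of two strata
stratum_a = population[:300]
stratum_b = population[300:]
stratified = random.sample(stratum_a, 15) + random.sample(stratum_b, 35)

print(len(simple), len(systematic), len(stratified))
```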

Returning to sample size, it should be said that it is closely related to the probability of statistical errors of the first and second kind. Statistical errors can arise because the study examines not the entire population but part of it. A type I error is the erroneous rejection of the null hypothesis. The null hypothesis is the assumption that all the studied groups come from the same general population, so that the differences or relationships between them are random. By analogy with diagnostic tests, a type I error is a false positive result.

A type II error is the erroneous rejection of the alternative hypothesis, whose meaning is that the differences or relationships between the groups are due not to random coincidence but to the influence of the studied factors. Continuing the diagnostic analogy, a type II error is a false negative result. Related to this error is the notion of power, which describes how effective a given statistical method is under given conditions, i.e. its sensitivity. Power is calculated as 1 − β, where β is the probability of a type II error. Power depends mainly on sample size: the larger the groups, the lower the probability of a type II error and the higher the power of the statistical tests. This dependence is at least quadratic, so halving the sample size reduces power at least fourfold. The minimum acceptable power is considered to be 80%, and the maximum acceptable type I error level is 5%. It should always be remembered, though, that these boundaries are conventional and may change depending on the nature and objectives of the study; an arbitrary change of power is usually accepted by the scientific community, but in the overwhelming majority of cases the type I error level may not exceed 5%.
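For illustration, the required group size for a given power can be estimated with statsmodels; the standardized effect size of 0.5 below is an assumed, hypothetical value:

```python
from statsmodels.stats.power import TTestIndPower

# Sample size per group for a two-sample t-test, assuming a medium
# standardized effect size of 0.5, alpha = 5% and power = 80%
n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n))  # about 64 subjects per group

# Halving the group size sharply reduces power
power_half = TTestIndPower().power(effect_size=0.5, nobs1=32, alpha=0.05)
print(round(power_half, 2))  # roughly 0.5, well below the 0.8 target
```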

All of the above relates directly to the planning stage of research. However, many researchers mistakenly treat statistical processing as a manipulation performed only after the main part of the work is completed. Often, after the end of an unplanned experiment, there is an irresistible desire to order the statistical analysis elsewhere; but even a statistician will find it very difficult to extract the result the researcher expects from a "heap of garbage". Therefore, with insufficient knowledge of biostatistics, help with statistical analysis should be sought before the experiment begins.

Turning to the analysis procedure itself, two main types of statistical techniques should be distinguished: descriptive and inferential (analytical). Descriptive techniques present data in a compact and easy-to-understand form: tables, graphs, frequencies (absolute and relative), measures of central tendency (mean, median, mode) and measures of data scatter (variance, standard deviation, interquartile range, etc.). In other words, descriptive methods characterize the studied samples.

The most popular (though often misleading) way of describing available quantitative data is to report the following indicators (a minimal computational sketch follows the list):

  • the number of observations in the sample or its size;
  • average value (arithmetic mean);
  • the standard deviation, a measure of how widely the values of the variable are scattered.
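
These indicators are straightforward to compute. A sketch with numpy on hypothetical data:

```python
import numpy as np

# Hypothetical measurements (e.g., a laboratory indicator in one group)
values = np.array([4.1, 5.0, 4.7, 5.3, 4.9, 5.1, 4.5, 6.2, 4.8, 5.0])

print("n    =", values.size)         # number of observations (sample size)
print("mean =", values.mean())       # arithmetic mean
print("sd   =", values.std(ddof=1))  # sample standard deviation
```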

It is important to remember that the arithmetic mean and standard deviation adequately describe central tendency and scatter only for a fairly narrow class of samples: those in which the values of most objects deviate from the mean with equal likelihood in either direction, so that their distribution forms a symmetrical "bell" (the Gaussian, or Gauss-Laplace, curve). Such a distribution is called "normal", but in the practice of medical experiments it occurs in only about 30% of cases. If the values of the variable are distributed asymmetrically about the center, the groups are better described using the median and quantiles (percentiles, quartiles, deciles).
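
Which pair of indicators is appropriate can be checked with a normality test. A sketch using scipy; the data here are deliberately generated asymmetric, and Shapiro-Wilk is just one of several suitable tests.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
skewed = rng.exponential(scale=2.0, size=100)  # deliberately asymmetric data

stat, p = stats.shapiro(skewed)
if p < 0.05:
    # Normality rejected: describe the group with the median and quartiles
    q1, med, q3 = np.percentile(skewed, [25, 50, 75])
    print(f"median {med:.2f}, IQR {q1:.2f}-{q3:.2f}")
else:
    # Normality not rejected: the mean and SD are adequate
    print(f"mean {skewed.mean():.2f}, sd {skewed.std(ddof=1):.2f}")
```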

Having described the groups, it is necessary to answer the question of the relationships between them and of whether the results of the study can be generalized to the entire population. For this, the evidential methods of biostatistics are used. These are what researchers usually remember first when statistical data processing is mentioned. This stage of the work is usually called "testing statistical hypotheses".

The tasks of hypothesis testing can be divided into two large groups. The first answers the question of whether there are differences between groups in the level of some indicator, for example, differences in the level of hepatic transaminases between patients with hepatitis and healthy people. The second allows one to prove the existence of a relationship between two or more indicators, for example, between liver function and the immune system.

In practical terms, tasks from the first group can be divided into two subtypes:

  • comparison of the indicator in only two groups (healthy and sick, men and women);
  • comparison of three or more groups (study of different doses of the drug).

It should be taken into account that statistical methods differ significantly for qualitative and quantitative data.

In a situation where the variable being studied is qualitative and only two groups are compared, the chi-square test can be used. This is a fairly powerful and widely known criterion; however, it is not effective enough when the number of observations is small. Several remedies exist for this problem, such as the Yates continuity correction and Fisher's exact test.
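
A sketch of comparing a qualitative outcome between two groups with scipy; the 2x2 table below is hypothetical.

```python
from scipy import stats

# Hypothetical 2x2 table: rows = group A / group B, columns = outcome yes / no
table = [[12, 38],
         [ 5, 45]]

# Chi-square test; correction=True applies the Yates continuity correction
chi2, p, dof, expected = stats.chi2_contingency(table, correction=True)
print(f"chi-square p = {p:.3f}")

# With few observations, Fisher's exact test is preferable
odds_ratio, p_exact = stats.fisher_exact(table)
print(f"Fisher exact p = {p_exact:.3f}")
```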

If the variable under study is quantitative, one of two types of statistical tests can be used. Tests of the first type are based on a specific type of distribution of the general population and operate with the parameters of that population. Such tests are called "parametric", and they usually rest on the assumption that the values are normally distributed. Nonparametric tests are not based on an assumption about the type of distribution of the general population and do not use its parameters. Such tests are sometimes called "distribution-free", which is somewhat inaccurate, since any nonparametric test assumes that the distributions in all compared groups are the same; otherwise false positive results may be obtained.

Two parametric tests are most often applied to data drawn from a normally distributed population: Student's t-test for comparing two groups and Fisher's F-test, which underlies analysis of variance (ANOVA) for comparing several groups. There are far more nonparametric criteria, and different tests differ in the assumptions on which they are based, the complexity of calculation, statistical power, and so on. The most widely used, however, are the Wilcoxon test (for related groups) and the Mann-Whitney test, also known as the Wilcoxon test for independent samples. These tests are convenient in that they require no assumptions about the nature of the data distribution; yet if the samples do turn out to come from a normally distributed population, the statistical power of these tests is not much lower than that of Student's test.
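
A sketch contrasting the parametric and nonparametric route for two independent groups, using scipy; the data are hypothetical.

```python
import numpy as np
from scipy import stats

group_a = np.array([5.1, 4.9, 5.6, 5.2, 4.8, 5.4, 5.0])
group_b = np.array([5.9, 6.1, 5.7, 6.4, 6.0, 5.8, 6.2])

# Parametric: Student's t-test (assumes normally distributed populations)
t, p_t = stats.ttest_ind(group_a, group_b)

# Nonparametric: Mann-Whitney U, i.e. the Wilcoxon test for independent samples
u, p_u = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"t-test p = {p_t:.4f}, Mann-Whitney p = {p_u:.4f}")

# For related (paired) groups the nonparametric analogue is stats.wilcoxon
```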

A full description of statistical methods can be found in the specialized literature; the key point is that each statistical test has its own set of rules (assumptions) and conditions of use, and mechanically running through several methods in search of the "desired" result is absolutely unacceptable from a scientific point of view. In this sense, statistical tests are like drugs: each has indications and contraindications, side effects, and a probability of failure. And uncontrolled use of statistical tests is just as dangerous, because hypotheses and conclusions are built on them.

For a fuller understanding of the accuracy of statistical analysis, the concept of "confidence level" must be defined and examined. The confidence probability is a value taken as the boundary between probable and improbable events; it is traditionally denoted by the letter "p". For many researchers, the sole purpose of statistical analysis is to compute the coveted p value, which, like the comma in the famous phrase "execute not pardon", seems to decide everything. The maximum acceptable level is 0.05. It should be remembered that the confidence level is not the probability of some event but a matter of trust. By fixing the confidence level before starting the analysis, we thereby set the degree of trust in the results of our research. And, as is well known, excessive credulity and excessive suspicion are equally detrimental to the results of any work.

The confidence level indicates the maximum probability of a type I error that the researcher considers acceptable. Lowering the confidence level, in other words tightening the conditions for testing hypotheses, increases the probability of type II errors. Therefore the confidence level must be chosen with an eye to the possible damage from errors of the first and second kind. For example, the strict limits adopted in biomedical statistics, which cap the proportion of false positive results at 5%, are a stern necessity: new treatments are introduced or rejected on the basis of medical research, and this is a matter of life for many thousands of people.

It must be borne in mind that the p value by itself is not very informative for a physician, since it speaks only of the probability of erroneously rejecting the null hypothesis. It says nothing, for example, about the size of the therapeutic effect of the study drug in the general population. For this reason there is a view that, instead of the confidence level, it would be better to evaluate study results by the confidence interval. A confidence interval is the range of values within which the true population value (of a mean, median, or frequency) lies with a given probability. In practice it is convenient to have both values, which makes it possible to judge more confidently whether the results apply to the population as a whole.
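
A sketch of a 95% confidence interval for a mean, built from the t-distribution with scipy; the data are hypothetical.

```python
import numpy as np
from scipy import stats

values = np.array([4.1, 5.0, 4.7, 5.3, 4.9, 5.1, 4.5, 6.2, 4.8, 5.0])

mean = values.mean()
sem = stats.sem(values)  # standard error of the mean

# 95% CI: the range that covers the true population mean with 0.95 probability
low, high = stats.t.interval(0.95, df=values.size - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI: {low:.2f} to {high:.2f}")
```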

In conclusion, a few words should be said about the tools used by a statistician or by a researcher analyzing data independently. Manual calculations are long gone. Today's statistical computer programs make it possible to carry out statistical analysis without serious mathematical training. Powerful systems such as SPSS, SAS, R, etc. let the researcher apply complex and powerful statistical methods. This is not always a good thing, however: without knowing how applicable the chosen statistical tests are to the specific experimental data, the researcher can run the calculations and even get some numbers at the output, but the result will be highly doubtful. A good knowledge of the mathematical foundations of statistics is therefore a prerequisite for statistical processing of experimental results.


Statistical methods are methods for the analysis of statistical data. A distinction is made between methods of applied statistics, which can be applied in all areas of scientific research and in any sector of the national economy, and other statistical methods whose applicability is limited to a particular area. Examples of the latter are statistical acceptance control, statistical control of technological processes, reliability and testing, and the design of experiments.

Statistical methods of data analysis are used in almost all areas of human activity. They are used whenever it is necessary to obtain and substantiate judgments about a group (of objects or subjects) with some internal heterogeneity. Three types of scientific and applied activity in the field of statistical data analysis can be distinguished (by the degree to which the methods are tied to specific problems):

a) development and research of general purpose methods, without taking into account the specifics of the field of application;

b) development and research of statistical models of real phenomena and processes in accordance with the needs of a particular area of activity;

c) application of statistical methods and models for statistical analysis of specific data.

Dispersion analysis. Analysis of variance (from the Latin dispersio, "scattering"; in English, Analysis Of Variance, ANOVA) is used to study the influence of one or more qualitative variables (factors) on one dependent quantitative variable (the response). It rests on the assumption that some variables can be considered causes (factors, independent variables) and others consequences (dependent variables). The independent variables are sometimes called adjustable factors, precisely because the experimenter can vary them and analyze the resulting outcome.

The main goal of analysis of variance (ANOVA) is to study the significance of differences between means by comparing (analyzing) variances. Dividing the total variance into several sources allows one to compare the variance due to between-group differences with the variance due to within-group variability. If the null hypothesis is true (the means are equal in the several groups of observations drawn from the general population), the estimate of variance associated with within-group variability should be close to the estimate of between-group variance. If only the means of two samples are compared, analysis of variance gives the same result as an ordinary t-test for independent samples (when two independent groups of objects or observations are compared) or a t-test for dependent samples (when two variables are compared on the same set of objects or observations).

The essence of analysis of variance is to divide the total variance of the studied trait into components attributable to the influence of specific factors, and to test hypotheses about the significance of the influence of those factors on the trait. Comparing the variance components with each other by means of Fisher's F-test, one can determine what proportion of the total variability of the resulting trait is due to the action of the adjustable factors.

The starting material for analysis of variance is data from the study of three or more samples, which may be equal or unequal in size, related or independent. By the number of adjustable factors identified, analysis of variance may be one-way (the influence of one factor on the results of the experiment is studied), two-way (the influence of two factors is studied), or multifactorial (allowing one to evaluate not only the influence of each factor separately but also their interaction).

Analysis of variance belongs to the group of parametric methods, and it should therefore be applied only when it has been shown that the distribution is normal.

Analysis of variance is used when the dependent variable is measured on a ratio, interval, or ordinal scale, while the influencing variables are non-numerical (nominal scale).

Task examples. In the tasks solved by analysis of variance there is a response of a numerical nature that is affected by several variables of a nominal nature: for example, several types of livestock fattening rations or two ways of keeping the animals, etc.

Example 1: During the week, several pharmacy kiosks operated in three different places; in the future only one can be kept. It is necessary to determine whether there is a statistically significant difference between the kiosks' drug sales volumes. If there is, we will keep the kiosk with the highest average daily sales volume. If the difference in sales turns out to be statistically insignificant, the choice of kiosk should rest on other indicators.
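
Example 1 can be sketched as a one-way ANOVA in scipy; the daily sales figures below are invented purely for illustration.

```python
from scipy import stats

# Hypothetical daily sales volumes for one week at three kiosks
kiosk_1 = [120, 135, 128, 140, 132, 125, 138]
kiosk_2 = [118, 122, 130, 127, 121, 124, 126]
kiosk_3 = [145, 150, 142, 155, 148, 151, 147]

# One-way ANOVA: H0 is that all three kiosks have the same mean sales
f_stat, p = stats.f_oneway(kiosk_1, kiosk_2, kiosk_3)

if p < 0.05:
    print(f"p = {p:.4f}: the difference is significant, keep the best kiosk")
else:
    print(f"p = {p:.4f}: no significant difference, use other criteria")
```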

Example 2: Comparison of contrasts of group means. Seven political affiliations are ordered from extremely liberal to extremely conservative, and a linear contrast is used to test whether there is a non-zero upward trend in the group means, that is, whether mean age increases significantly when the groups are ordered from liberal to conservative.

Example 3: Two-way analysis of variance. Product sales, in addition to store size, are often affected by the placement of the shelves with the product. This example contains weekly sales figures for four shelf layouts and three store sizes. The analysis shows that both factors, shelf placement and store size, affect sales, but their interaction is not significant.
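
A two-way ANOVA like the one in Example 3 can be sketched with statsmodels; the column names, data layout, and generated sales figures here are all hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)

# Hypothetical design: 4 shelf layouts x 3 store sizes, 2 replicate weeks per cell
shelves = np.repeat(list("ABCD"), 6)                      # 4 layouts x 6 rows each
sizes = np.tile(np.repeat(["small", "medium", "large"], 2), 4)
sales = rng.normal(loc=100, scale=5, size=24) + (shelves == "A") * 10

data = pd.DataFrame({"shelf": shelves, "size": sizes, "sales": sales})

# Two-way ANOVA: main effects of shelf and size plus the shelf:size interaction
model = ols("sales ~ C(shelf) * C(size)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))
```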

Example 4: Univariate ANOVA: a randomized complete block design with two treatments. The influence of all possible combinations of three fats and three dough raising agents on bread baking is investigated. Four flour samples taken from four different sources served as the block factor. The significance of the fat x raising-agent interaction must be identified; after that, contrasts can be chosen to find out which combinations of factor levels differ.

Example 5: A hierarchical (nested) design with mixed effects. The influence of four randomly selected heads mounted in a machine tool on the deformation of manufactured glass cathode holders is studied. (The heads are built into the machine, so the same head cannot be used on different machines.) The head effect is treated as a random factor. The ANOVA statistics show no significant differences between machines, but there are indications that the heads may differ. The difference between all the machines is not significant, but for two of them the difference between the types of heads is significant.

Example 6: Univariate repeated-measures analysis using a split-plot design. The experiment was conducted to determine the effect of an individual's anxiety rating on exam performance over four consecutive attempts. The data are organized so that they can be considered groups of subsets of the entire data set (the "whole plot"). The effect of anxiety was not significant, while the effect of the attempt was significant.

Covariance analysis. Analysis of covariance is a set of methods of mathematical statistics for analyzing models of the dependence of the mean value of some random variable simultaneously on a set of (main) qualitative factors and (concomitant) quantitative factors. The factors F define the combinations of conditions under which the observations X, Y were obtained and are described by indicator variables; among the concomitant and indicator variables there may be both random and non-random ones (controlled in the experiment).

If the random variable Y is a vector, then one speaks of a multivariate analysis of covariance.

Analysis of covariance is often used before analysis of variance to check the homogeneity (representativeness) of the sample of observations X, Y with respect to all concomitant factors.
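
A minimal analysis-of-covariance sketch with statsmodels: one qualitative factor plus one quantitative covariate in a single linear model. All names and data here are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(2)

# Hypothetical data: response y depends on a treatment group and a covariate (age)
n = 30
group = np.repeat(["control", "treated"], n // 2)
age = rng.uniform(20, 60, size=n)
y = 0.5 * age + (group == "treated") * 4 + rng.normal(0, 2, size=n)

data = pd.DataFrame({"y": y, "group": group, "age": age})

# ANCOVA: the group effect is tested after adjusting for the covariate
model = ols("y ~ C(group) + age", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))
```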