Selective observation: concept, types, sampling errors, evaluation of results. Examples of problem solving

In selective observation, units must be selected at random: each unit must have the same chance of being selected as every other unit. This is the basis of random sampling.

Proper random sampling is the selection of units from the entire general population (without first dividing it into groups), mainly by drawing lots or by a similar method, for example, using a table of random numbers. Random selection is not haphazard selection. The principle of randomness means that nothing other than chance may influence whether a unit is included in or excluded from the sample. A lottery draw is an example of proper random selection: from the total number of tickets issued, the winning numbers are drawn at random, and every number has an equal chance of being drawn. The number of units selected into the sample is usually determined by the accepted sampling fraction.
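A minimal Python sketch of proper random selection without replacement, assuming the units are identified by numbers held in a list. The helper name `draw_simple_random_sample` is hypothetical; a seeded `random.Random` generator stands in for a table of random numbers, and the sample size is derived from the sampling fraction.

```python
import random

def draw_simple_random_sample(population, fraction, seed=None):
    """Proper random selection without replacement (hypothetical helper).

    Every unit has the same chance of selection; the sample size is the
    sampling fraction times the population size, rounded to a whole unit.
    """
    rng = random.Random(seed)            # seeded generator plays the role of a random-number table
    n = round(fraction * len(population))
    return rng.sample(population, n)     # each unit can be drawn at most once

# A batch of 1,000 parts numbered 1..1000, 5% sample
parts = list(range(1, 1001))
sample = draw_simple_random_sample(parts, 0.05, seed=42)
print(len(sample))  # 50
```

Because `rng.sample` never returns the same unit twice, this corresponds to non-repetitive selection; repeated selection would instead draw with replacement (for example with `rng.choices`).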

The sampling fraction is the ratio of the number of units in the sample population to the number of units in the general population:

$$\frac{n}{N}.$$

Thus, a 5% sample from a batch of 1,000 parts has a sample size n of 50 units, a 10% sample has 100 units, and so on. When sampling is properly organized on a scientific basis, representativeness errors can be reduced to a minimum, and selective observation becomes sufficiently accurate.

Proper random selection "in its pure form" is rarely used in practice, but it is the starting point for all other types of selection: it contains and implements the basic principles of selective observation.

Let us consider some points of the theory of the sampling method and the error formulas for a simple random sample.

When the sampling method is applied in statistics, two main types of summary indicators are usually used: the mean value of a quantitative trait and the relative value of an alternative feature (the share of units in the statistical population that differ from all other units of this population only by the presence of the trait being studied).

The sample share (w), or frequency, is the ratio of the number of sample units that have the characteristic under study (m) to the total number of sample units (n):

$$w = \frac{m}{n}.$$

For example, if 95 of 100 sampled parts (n = 100) turn out to be standard (m = 95), the sample share is

$$w = \frac{95}{100} = 0.95.$$

The reliability of sample indicators is characterized by the mean and the marginal sampling errors.

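The sampling error Δ, or representativeness error, is the difference between the corresponding sample and general characteristics:

$$\Delta_{\tilde{x}} = |\tilde{x} - \bar{x}|, \qquad \text{(form. 1)}$$

$$\Delta_w = |w - p|, \qquad \text{(form. 2)}$$

where $\tilde{x}$ and $w$ are the sample mean and share, and $\bar{x}$ and $p$ are the general mean and share.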

Sampling error occurs only in selective observation. The larger this error, the more the sample indicators differ from the corresponding general indicators.

The sample mean and the sample share are random variables by nature: they can take different values depending on which units of the population were included in the sample. Sampling errors are therefore also random variables and can take different values. For this reason, the average of the possible errors, the mean sampling error, is determined.

What does the mean sampling error depend on? If the principle of random selection is observed, the mean sampling error is determined primarily by the sample size: other things being equal, the larger the sample, the smaller the mean sampling error. As the sample survey covers more and more units of the general population, it characterizes the entire population more and more accurately.

The mean sampling error also depends on the degree of variation of the trait studied. As is known, the degree of variation is characterized by the variance $\sigma^2$, or by $w(1-w)$ for an alternative feature. The smaller the variation of the feature, and hence its variance, the smaller the mean sampling error, and vice versa. With zero variance (the attribute does not vary), the mean sampling error is zero, i.e., any unit of the general population characterizes the entire population accurately with respect to this attribute.

The dependence of the mean sampling error on the sample size and on the degree of variation of the attribute is reflected in the formulas used to calculate it in sample observation, when the general characteristics ($\bar{x}$, $p$) are unknown and the actual sampling error therefore cannot be found directly from (form. 1) and (form. 2).

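With proper random repeated selection, the mean errors are calculated theoretically by the following formulas:

* for the mean of a quantitative trait

$$\mu_{\tilde{x}} = \sqrt{\frac{\sigma^2}{n}}; \qquad \text{(form. 3)}$$

* for the share (alternative feature)

$$\mu_w = \sqrt{\frac{p(1-p)}{n}}. \qquad \text{(form. 4)}$$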

In practice, the variance $\sigma^2$ of the attribute in the general population is not known exactly, so the variance $S^2$ calculated for the sample population is used instead. This relies on the law of large numbers, according to which a sample of sufficiently large size reproduces the characteristics of the general population accurately.

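Thus, the working formulas for the mean sampling errors of random repeated selection are:

* for the mean of a quantitative trait

$$\mu_{\tilde{x}} = \sqrt{\frac{S^2}{n}}; \qquad \text{(form. 5)}$$

* for the share (alternative feature)

$$\mu_w = \sqrt{\frac{w(1-w)}{n}}. \qquad \text{(form. 6)}$$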

However, the variance of the sample population is not equal to the variance of the general population, so the mean sampling errors calculated by (form. 5) and (form. 6) are approximate. Probability theory proves that the general variance is expressed through the sample variance by the relation

$$\sigma^2 = S^2 \frac{n}{n-1}. \qquad \text{(form. 7)}$$

Because $n/(n-1)$ is close to one for sufficiently large $n$, we can take $\sigma^2 \approx S^2$, and therefore (form. 5) and (form. 6) can be used in practical calculations. Only for a small sample (no more than 30 units) must the coefficient $n/(n-1)$ be taken into account, and the mean error of a small sample is calculated by the formula

$$\mu_{\tilde{x}} = \sqrt{\frac{S^2}{n-1}}. \qquad \text{(form. 8)}$$

With random non-repetitive selection, the expression under the root in the above formulas must be multiplied by $1 - n/N$, because the number of units remaining in the general population decreases during non-repetitive sampling. The formulas for the mean sampling error with non-repetitive selection are therefore:

* for the mean of a quantitative trait

$$\mu_{\tilde{x}} = \sqrt{\frac{S^2}{n}\left(1 - \frac{n}{N}\right)}; \qquad \text{(form. 9)}$$

* for the share (alternative feature)

$$\mu_w = \sqrt{\frac{w(1-w)}{n}\left(1 - \frac{n}{N}\right)}. \qquad \text{(form. 10)}$$

Because $n$ is always less than $N$, the additional factor $1 - n/N$ is always less than one. It follows that the mean error with non-repetitive selection is always smaller than with repeated selection. At the same time, when the sampling fraction is small, this factor is close to one (for example, 0.95 for a 5% sample and 0.98 for a 2% sample). Therefore, (form. 5) and (form. 6) are sometimes used in practice without this factor even when the sample is organized as non-repetitive. This is done when the size $N$ of the general population is unknown or unlimited, or when $n$ is very small compared with $N$: a factor so close to one then has practically no effect on the mean sampling error.
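The mean-error formulas for repeated and non-repetitive selection can be checked numerically. A minimal Python sketch, assuming the sample variance (or the sample share w) is already known; the function names are hypothetical, and N = 1,000 in the usage lines is an illustrative population size, not data from the text.

```python
import math

def mean_error_mean(variance, n, N=None):
    """Mean sampling error of the sample mean.

    Repeated selection: sqrt(S^2 / n) (form. 5).
    Non-repetitive selection (N given): the expression under the root
    is multiplied by (1 - n / N) (form. 9).
    """
    value = variance / n
    if N is not None:
        value *= 1 - n / N
    return math.sqrt(value)

def mean_error_share(w, n, N=None):
    """Mean sampling error of the sample share, with w(1 - w) as the variance (form. 6, form. 10)."""
    return mean_error_mean(w * (1 - w), n, N)

# 95 standard parts out of n = 100 (w = 0.95); assumed population of N = 1000 parts
repeated = mean_error_share(0.95, 100)
non_repeated = mean_error_share(0.95, 100, N=1000)
print(round(repeated, 4), round(non_repeated, 4))   # 0.0218 0.0207
```

Implementing the share case through the mean case makes explicit that the only difference is the variance used: $S^2$ for a quantitative trait and $w(1-w)$ for an alternative feature.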

In mechanical sampling, the general population is divided by a neutral criterion into equal intervals (groups), and exactly one unit is selected into the sample from each group. To avoid systematic error, the unit in the middle of each group should be selected.

When mechanical selection is organized, the units of the population are first arranged (usually in a list) in some order (for example, alphabetically, by location, or in ascending or descending order of an indicator that is not related to the property under study), after which the required number of units is selected mechanically at a fixed interval. The interval is equal to the reciprocal of the sampling fraction. Thus, with a 2% sample every 50th unit (1 : 0.02) is selected and checked, and with a 5% sample every 20th unit (1 : 0.05), for example, every 20th part coming off a machine.
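A minimal Python sketch of mechanical selection, assuming the units are already arranged in a neutral order in a list. The helper `mechanical_sample` is hypothetical; following the text, it takes one unit from the middle of each interval, whose length is the reciprocal of the sampling fraction.

```python
def mechanical_sample(ordered_units, fraction):
    """Mechanical selection (hypothetical helper): one unit from each interval.

    The interval length is the reciprocal of the sampling fraction
    (1 / 0.02 = 50); the unit in the middle of each interval is taken.
    """
    k = round(1 / fraction)        # interval length
    start = k // 2                 # middle position of the first interval (0-based index)
    return ordered_units[start::k]

units = list(range(1, 1001))       # units already arranged in order, numbered 1..1000
picked = mechanical_sample(units, 0.02)
print(len(picked), picked[:3])     # 20 [26, 76, 126]
```

Taking the middle of each interval rather than its first unit is the text's guard against systematic error when the list is ordered by some indicator.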

With a sufficiently large population, mechanical selection is close to proper random selection in the accuracy of its results. Therefore, the mean error of a mechanical sample is determined with the formulas for proper random non-repetitive sampling, (form. 9) and (form. 10).

A so-called typical sample is used to select units from a heterogeneous population when all units of the general population can be divided into several qualitatively homogeneous groups according to characteristics that affect the indicators studied.

When enterprises are surveyed, such groups may be, for example, industries and sub-sectors or forms of ownership. Units are then selected into the sample from each typical group individually, by random or mechanical selection.

A typical sample is usually used to study complex statistical populations, for example, in a sample survey of the family budgets of workers and employees in particular sectors of the economy, or of the labor productivity of an enterprise's workers divided into skill groups.

A typical sample gives more accurate results than the other methods of selecting units into a sample. Typification of the general population ensures that the sample is representative and that every typological group is represented in it, which excludes the influence of the between-group variance on the mean sampling error.

When the mean error of a typical sample is determined, the measure of variation is the average of the within-group variances.

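The mean sampling error is found by the formulas:

* for the mean of a quantitative trait

$$\mu_{\tilde{x}} = \sqrt{\frac{\bar{\sigma}^2}{n}} \quad \text{(repeated selection)}; \qquad \text{(form. 11)}$$

$$\mu_{\tilde{x}} = \sqrt{\frac{\bar{\sigma}^2}{n}\left(1 - \frac{n}{N}\right)} \quad \text{(non-repetitive selection)}; \qquad \text{(form. 12)}$$

* for the share (alternative feature)

$$\mu_w = \sqrt{\frac{\overline{w(1-w)}}{n}} \quad \text{(repeated selection)}; \qquad \text{(form. 13)}$$

$$\mu_w = \sqrt{\frac{\overline{w(1-w)}}{n}\left(1 - \frac{n}{N}\right)} \quad \text{(non-repetitive selection)}, \qquad \text{(form. 14)}$$

where $\bar{\sigma}^2$ is the average of the within-group variances for the sample population;

$\overline{w(1-w)}$ is the average of the within-group variances of the share (alternative feature) in the sample population.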
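The typical-sample error, with the within-group variances averaged by group size, can be sketched in Python as follows. The function name and the three groups in the usage lines are hypothetical; each group is given as its sample size n_i and its within-group sample variance, and for a share one would pass w_i(1 - w_i) as the variance.

```python
import math

def typical_sample_error(groups, N=None):
    """Mean error of the mean for a typical sample (form. 11; form. 12 if N is given).

    groups: list of (n_i, variance_i) pairs, one per typical group in the sample.
    The measure of variation is the average of the within-group variances,
    weighted by group size: sum(variance_i * n_i) / sum(n_i).
    """
    n = sum(n_i for n_i, _ in groups)
    avg_within = sum(n_i * var_i for n_i, var_i in groups) / n
    value = avg_within / n
    if N is not None:              # non-repetitive selection
        value *= 1 - n / N
    return math.sqrt(value)

# Hypothetical groups: (sample size, within-group variance)
groups = [(100, 16.0), (300, 9.0), (100, 25.0)]
print(round(typical_sample_error(groups), 4))           # 0.1649
print(round(typical_sample_error(groups, N=5000), 4))   # 0.1565
```

Only the within-group variances enter the calculation, which is why the between-group variation of a well-typified population does not inflate the error.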

Serial sampling involves the random selection from the general population not of individual units but of equal groups of them (nests, series), all units of which are then observed without exception.

Serial sampling is used because many goods are packed in packs, boxes, etc. for transportation, storage, and sale. When the quality of packaged goods is controlled, it is therefore more efficient to check several packages (series) in full than to select the required number of items from all packages.

Since all units within the selected groups (series) are examined, the mean sampling error (when equal series are selected) depends only on the between-group (between-series) variance.

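The mean sampling error of the mean in serial selection is found by the formulas:

$$\mu_{\tilde{x}} = \sqrt{\frac{\delta^2}{r}} \quad \text{(repeated selection)}; \qquad \text{(form. 15)}$$

$$\mu_{\tilde{x}} = \sqrt{\frac{\delta^2}{r}\left(1 - \frac{r}{R}\right)} \quad \text{(non-repetitive selection)}, \qquad \text{(form. 16)}$$

where $r$ is the number of selected series and $R$ is the total number of series.

The between-group variance of the serial sample is calculated as follows:

$$\delta^2 = \frac{\sum_{i=1}^{r} (\tilde{x}_i - \tilde{x})^2}{r},$$

where $\tilde{x}_i$ is the mean of the $i$-th series and $\tilde{x}$ is the overall mean for the entire sample population.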

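The mean sampling error of the share (alternative feature) in serial selection is:

$$\mu_w = \sqrt{\frac{\delta_w^2}{r}} \quad \text{(repeated selection)}; \qquad \text{(form. 17)}$$

$$\mu_w = \sqrt{\frac{\delta_w^2}{r}\left(1 - \frac{r}{R}\right)} \quad \text{(non-repetitive selection)}. \qquad \text{(form. 18)}$$

The between-group (between-series) variance of the share in a serial sample is determined by the formula

$$\delta_w^2 = \frac{\sum_{i=1}^{r} (w_i - w)^2}{r}, \qquad \text{(form. 19)}$$

where $w_i$ is the share of the feature in the $i$-th series and $w$ is the overall share of the feature in the entire sample.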
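A minimal Python sketch of the serial-sample error for equal series. The function name and the five box-level shares in the usage lines are hypothetical. The non-repetitive branch applies the factor 1 - r/R, mirroring the rule used elsewhere in this section for non-repetitive selection; some textbooks write (R - r)/(R - 1) instead, which is nearly the same when R is large.

```python
import math

def serial_sample_error(series_values, R=None):
    """Mean sampling error for serial selection of equal series.

    series_values: the mean (form. 15-16) or share (form. 17-19) in each selected series.
    The between-series variance is sum((x_i - x)^2) / r, where x is the overall
    sample value (for equal series, the mean of the series values).
    """
    r = len(series_values)
    overall = sum(series_values) / r
    between = sum((x - overall) ** 2 for x in series_values) / r
    value = between / r
    if R is not None:              # non-repetitive selection of r series out of R
        value *= 1 - r / R
    return math.sqrt(value)

# Hypothetical shares of standard items in r = 5 fully inspected boxes out of R = 100
shares = [0.90, 0.95, 0.85, 0.92, 0.88]
print(round(serial_sample_error(shares), 4))          # 0.0152
print(round(serial_sample_error(shares, R=100), 4))   # 0.0148
```

Note that the divisor is the number of series r, not the number of items inspected: every item inside a box is observed, so only the variation between boxes contributes to the error.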

In the practice of statistical surveys, the selection methods considered above are also combined (combined selection).

The concept and calculation of sampling error.

The task of selective observation is to give a correct picture of the summary indicators of the entire population from the part of it that is observed. The possible deviation of the sample share and the sample mean from the share and the mean in the general population is called the sampling error, or representativeness error. The larger this error, the more the indicators of sample observation differ from those of the general population.

Two kinds of errors are distinguished:

Sampling errors;

Registration errors.

Registration errors occur when a fact is established incorrectly during observation. They occur in both continuous and selective observation, but they are smaller in selective observation.

By their nature, errors are:

Tendentious (biased): deliberate, i.e., either the best or the worst units of the population were selected. In this case, the observation loses its meaning;

Random: the main organizational principle of selective observation is to prevent deliberate selection, i.e., to ensure strict adherence to the principle of random selection.

The general rule of random selection is that every unit of the general population must have exactly the same conditions and chance of being included in the sample. This makes the sample result independent of the will of the observer; the will of the observer is what produces tendentious errors. With random selection, the sampling error is random. It characterizes the size of the deviations of the general characteristics from the sample ones.

Because the characteristics vary within the studied population, the composition of the sample may not match the composition of the entire population. This means that $p$ and $\bar{x}$ do not coincide with $w$ and $\tilde{x}$. The possible discrepancy between these characteristics is measured by the sampling error, which is determined by the formula

$$\mu = \sqrt{\frac{\sigma^2}{n}},$$

where $\sigma^2$ is the general variance;

$$\sigma^2 = S^2 \frac{n}{n-1},$$

where $S^2$ is the sample variance.

This shows that the general variance differs from the sample variance by a factor of $n/(n-1)$.

There are repeated and non-repeated selection. In repeated selection, each unit drawn into the sample is returned to the general population after observation and can be selected again. With repeated selection, the mean sampling error is calculated as

$$\mu_{\tilde{x}} = \sqrt{\frac{S^2}{n}}.$$

For the share of an alternative attribute, the sample variance is determined by the formula

$$S^2 = w(1-w).$$

In practice, repeated selection is rarely used. With non-repetitive selection, the size of the general population $N$ decreases during sampling, and the formula for the mean sampling error of a quantitative attribute is

$$\mu_{\tilde{x}} = \sqrt{\frac{S^2}{n}\left(1 - \frac{n}{N}\right)}.$$

The possible limits within which the share of the studied trait in the general population lies are

$$p = w \pm \Delta_w,$$

where $\Delta_w$ is the sampling error of the alternative feature.

Example.

In a sample survey of 10% of the products in a batch of finished products, carried out by non-repetitive selection, the following data on the moisture content of the samples were obtained.

Determine the average moisture content (%), the variance and the standard deviation; with a probability of 0.954, the possible limits of the average moisture content (%) of all finished products; and with a probability of 0.987, the possible limits of the share of standard products, given that products with a moisture content below 13% or above 19% are non-standard.

It can be asserted only with a certain probability that the general share deviates from the sample share, and the general mean from the sample mean, by no more than $t$ times the mean sampling error.

In statistics, these deviations are called marginal sampling errors and are denoted by $\Delta$.

The probability of such statements can be raised or lowered by changing $t$: a probability of 0.683 corresponds to $t = 1$, 0.954 to $t = 2$, and 0.987 to $t \approx 2.5$. The indicators of the general population are then determined from the sample indicators as

$$\bar{x} = \tilde{x} \pm \Delta_{\tilde{x}}, \qquad p = w \pm \Delta_w.$$
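Assuming the normal law for sampling errors, the probability attached to a confidence factor t is 2Φ(t) - 1, which Python's `math.erf` gives as erf(t/√2). The sketch below (function names hypothetical) reproduces the probabilities 0.683 for t = 1 and 0.954 for t = 2, shows that 0.987 corresponds to t ≈ 2.5, inverts the relation by bisection, and builds the limits x̃ ± tμ.

```python
import math

def confidence_probability(t):
    """P(|estimate - general value| <= t * mu) under the normal law, i.e. 2*Phi(t) - 1."""
    return math.erf(t / math.sqrt(2))

def confidence_factor(probability, lo=0.0, hi=10.0):
    """Find t for a given probability by bisection (the probability grows with t)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if confidence_probability(mid) < probability:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def general_limits(estimate, mu, t):
    """Limits estimate - t*mu and estimate + t*mu for the general mean or share."""
    delta = t * mu   # marginal error
    return estimate - delta, estimate + delta

for t in (1, 2, 2.5, 3):
    print(t, round(confidence_probability(t), 4))
# prints one pair per line: 1 0.6827, 2 0.9545, 2.5 0.9876, 3 0.9973
```

The bisection helper is only a convenience for the reverse lookup that a printed table of the probability integral normally provides.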

The mean sampling error is always present in sample studies; it arises because only part of the statistical population is surveyed, not all of its units.

The mean sampling error becomes the marginal error Δ when multiplied by the confidence factor t, which is set in advance according to the required accuracy of observation. The marginal error makes it possible to judge, with a certain probability, the "true" size of the parameter in the general population.

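For typical and serial selection, the sampling error is calculated using, instead of the total variance ($\sigma^2$), the mean of the within-group variances and the between-group variance, respectively:

$$\bar{\sigma}^2 = \frac{\sum_i \sigma_i^2 n_i}{\sum_i n_i},$$

where $\sigma_i^2$ is the variance of group $i$ and $n_i$ is the size of group $i$.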

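Formulas for the marginal error of a random sample in estimating the mean

For repeated selection:

$$\Delta_{\tilde{x}} = t\sqrt{\frac{\sigma^2}{n}}$$

For non-repetitive selection:

$$\Delta_{\tilde{x}} = t\sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)}$$

Formulas for the marginal error of a random sample in estimating the share

For repeated selection:

$$\Delta_w = t\sqrt{\frac{w(1-w)}{n}}$$

For non-repetitive selection:

$$\Delta_w = t\sqrt{\frac{w(1-w)}{n}\left(1 - \frac{n}{N}\right)}$$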

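Formulas for the size of a random sample in estimating the mean

$$n = \frac{t^2\sigma^2}{\Delta^2} \quad \text{(repeated selection)}, \qquad n = \frac{t^2\sigma^2 N}{N\Delta^2 + t^2\sigma^2} \quad \text{(non-repetitive selection)}$$

Formulas for the size of a random sample in estimating the share of the studied trait

$$n = \frac{t^2 w(1-w)}{\Delta^2} \quad \text{(repeated selection)}, \qquad n = \frac{t^2 w(1-w) N}{N\Delta^2 + t^2 w(1-w)} \quad \text{(non-repetitive selection)}$$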
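The standard sample-size formulas, n = t²σ²/Δ² for repeated selection and n = t²σ²N/(NΔ² + t²σ²) for non-repetitive selection (with w(1 - w) in place of σ² for a share), can be sketched in Python. The function names are hypothetical, and the result is rounded up, as the text recommends for random sampling. The inputs in the usage lines (the Task 1 variance 12371 with an assumed marginal error of 10 thousand rubles, and a share of 0.12 with Δ = 0.03) are illustrative only.

```python
import math

def sample_size_mean(t, variance, delta, N=None):
    """Required sample size for estimating a mean with marginal error `delta`.

    Repeated selection:       n = t^2 * variance / delta^2
    Non-repetitive selection: n = t^2 * variance * N / (N * delta^2 + t^2 * variance)
    The result is rounded up to a whole unit.
    """
    if N is None:
        n = t ** 2 * variance / delta ** 2
    else:
        n = t ** 2 * variance * N / (N * delta ** 2 + t ** 2 * variance)
    return math.ceil(n)

def sample_size_share(t, w, delta, N=None):
    """Same formulas with the variance of an alternative feature, w * (1 - w)."""
    return sample_size_mean(t, w * (1 - w), delta, N)

print(sample_size_mean(2, 12371, 10))            # 495
print(sample_size_mean(2, 12371, 10, N=4000))    # 441
print(sample_size_share(2, 0.12, 0.03))          # 470
```

The non-repetitive formula is obtained by solving Δ² = t²σ²/n · (1 - n/N) for n, so it always gives a smaller sample than the repeated one for the same N.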

The maximum difference between the general and the sample mean corresponds to the marginal error.

The probability values and the corresponding values of t are given in distribution tables:

  • normal distribution (Laplace probability integral);

  • Student's distribution (for a small sample).

Random sampling formulas are also suitable for mechanical sampling.

If rounding is necessary, round up for random sampling and round down for mechanical sampling.

Small sample

If the sample size is not more than 30 units, the mean error of a small sample in estimating the mean is calculated by the formula

$$\mu_{\tilde{x}} = \sqrt{\frac{S^2}{n-1}}.$$

To calculate the error of a small sample, the refined variance formula is used:

$$\sigma^2 = \frac{\sum (x_i - \tilde{x})^2}{n-1}.$$
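The small-sample mean error can be written either as sqrt(S²/(n - 1)), where S² is the variance computed with divisor n, or as the square root of the refined variance (divisor n - 1) divided by n; the two are identical. A minimal Python sketch checks this with `statistics.pvariance` (divisor n) and `statistics.variance` (divisor n - 1). The function name and the eight measurements are hypothetical; turning μ into a marginal error would also need Student's t from a table, which is not computed here.

```python
import math
import statistics

def small_sample_error(values):
    """Mean error of a small sample: sqrt(S^2 / (n - 1)), with S^2 the divisor-n sample variance."""
    n = len(values)
    s2 = statistics.pvariance(values)      # S^2 = sum((x - mean)^2) / n
    return math.sqrt(s2 / (n - 1))

data = [12.0, 14.5, 13.2, 15.1, 14.0, 13.6, 12.8, 14.9]   # hypothetical measurements, n = 8
mu = small_sample_error(data)
refined = statistics.variance(data)        # sum((x - mean)^2) / (n - 1)
print(math.isclose(mu, math.sqrt(refined / len(data))))    # True
```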

Types of sampling tasks

    definition of sampling error,

    determination of the sample size n,

    determination of the probability that the sample mean (or share) deviates from the general one by no more than a given amount t=Δ/μ,

    assessment of the randomness of discrepancies in the indicators of sample observations,

    transfer of sample characteristics to the general population.

Mean and Proportion Hypothesis Testing

Estimation of the randomness of discrepancies in the indicators of sample observations


Methods for transferring sample data to the general population

    weighting method;

    reweighting method;

    method of filling in by random selection within replacement classes.

Marginal error is the maximum possible discrepancy between the sample and general means, i.e., the maximum error at a given probability of its occurrence.

1. The marginal sampling error of the mean with repeated selection is calculated by the formula

$$\Delta_{\tilde{x}} = t\mu_{\tilde{x}} = t\sqrt{\frac{\sigma^2}{n}},$$

where $t$ is the normalized deviation, or "confidence factor", which depends on the probability with which the marginal sampling error is guaranteed;

$\mu_{\tilde{x}}$ is the mean sampling error.

2. The marginal sampling error of the share with repeated selection is determined by the formula

$$\Delta_w = t\sqrt{\frac{w(1-w)}{n}}.$$

3. The marginal sampling error of the mean with non-repetitive selection is

$$\Delta_{\tilde{x}} = t\sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)}.$$

The relative marginal sampling error is the marginal sampling error expressed as a percentage of the corresponding characteristic of the sample population:

$$\Delta_{\%} = \frac{\Delta_{\tilde{x}}}{\tilde{x}} \cdot 100\%.$$
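As a quick check of the relative marginal error, the sketch below (function name hypothetical) applies it to the figures obtained in Task 1 later in this section: a marginal error of 11.12 thousand rubles around a sample mean of 277 thousand rubles.

```python
def relative_marginal_error(delta, sample_value):
    """Relative marginal sampling error, in percent of the sample characteristic."""
    return delta / sample_value * 100

print(round(relative_marginal_error(11.12, 277), 2))   # 4.01
```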

Small sample

The theory of small samples was developed at the beginning of the 20th century by the English statistician Student (the pseudonym of W. S. Gosset). In 1908 he discovered a special distribution that relates t to the confidence probability F(t) even for small samples. For n greater than 100, Student's distribution tables give the same results as the tables of the Laplace probability integral, and for 30 < n < 100 the differences are insignificant. Therefore, in practice, samples of fewer than 30 units are considered small.

As is known, statistics distinguishes two ways of observing mass phenomena, depending on how completely the object is covered: continuous and non-continuous observation. Selective observation is a type of non-continuous observation.

Selective observation is non-continuous observation in which randomly selected units of the studied population are subjected to statistical examination (observation).

The task of selective observation is to characterize the entire population of units from the examined part, provided that all the rules and principles of statistical observation are followed and the selection of units is organized on a scientific basis.

The set of units selected for the survey is usually called the sample population, and the set of units from which the selection is made is called the general population. The main characteristics of the general and sample populations are presented in Table 1.

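Table 1 - The main characteristics of the general and sample populations

| Indicator | General population | Sample population |
|---|---|---|
| Number of units | $N$ | $n$ |
| Number of units that have the feature | $M$ | $m$ |
| Share of units that have the feature | $p = M/N$ | $w = m/n$ |
| Share of units that do not have the feature | $q = 1 - p$ | $1 - w$ |
| Mean value of the feature | $\bar{x} = \sum x_i / N$ | $\tilde{x} = \sum x_i / n$ |
| Variance of the feature | $\sigma^2 = \sum (x_i - \bar{x})^2 / N$ | $S^2 = \sum (x_i - \tilde{x})^2 / n$ |
| Variance of an alternative feature (variance of a share) | $pq$ | $w(1 - w)$ |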

Selective observation is subject to systematic and random errors. Systematic errors arise when the rules for selecting units into the sample are violated; such errors can be eliminated by correcting the selection rules.

Random errors arise because the survey is not continuous. They are also called representativeness errors. Random errors are divided into mean and marginal sampling errors, which are determined both for the mean of a feature and for a share.

The mean and marginal errors are related by $\Delta = t\mu$, where $\Delta$ is the marginal sampling error, $\mu$ is the mean sampling error, and $t$ is the confidence factor, determined by the chosen probability level. Table 2 shows some values of $t$ taken from probability theory.

The mean sampling error is calculated differently depending on the selection method and the sampling procedure. The main formulas for calculating sampling errors are presented in Table 3.

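Table 3 - Basic formulas for calculating sampling errors in repeated and non-repeated selection

| Indicator | General population | Sample population |
|---|---|---|
| Mean error of a feature, random repeated selection | $\mu = \sqrt{\frac{\sigma^2}{n}}$ | $\mu = \sqrt{\frac{S^2}{n}}$ |
| Mean error of a share, random repeated selection | $\mu = \sqrt{\frac{pq}{n}}$ | $\mu = \sqrt{\frac{w(1-w)}{n}}$ |
| Marginal error of a feature, random repeated selection | $\Delta = t\sqrt{\frac{\sigma^2}{n}}$ | $\Delta = t\sqrt{\frac{S^2}{n}}$ |
| Marginal error of a share, random repeated selection | $\Delta = t\sqrt{\frac{pq}{n}}$ | $\Delta = t\sqrt{\frac{w(1-w)}{n}}$ |
| Mean error of a feature, random non-repetitive selection | $\mu = \sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)}$ | $\mu = \sqrt{\frac{S^2}{n}\left(1 - \frac{n}{N}\right)}$ |
| Mean error of a share, random non-repetitive selection | $\mu = \sqrt{\frac{pq}{n}\left(1 - \frac{n}{N}\right)}$ | $\mu = \sqrt{\frac{w(1-w)}{n}\left(1 - \frac{n}{N}\right)}$ |
| Marginal error of a feature, random non-repetitive selection | $\Delta = t\sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)}$ | $\Delta = t\sqrt{\frac{S^2}{n}\left(1 - \frac{n}{N}\right)}$ |
| Marginal error of a share, random non-repetitive selection | $\Delta = t\sqrt{\frac{pq}{n}\left(1 - \frac{n}{N}\right)}$ | $\Delta = t\sqrt{\frac{w(1-w)}{n}\left(1 - \frac{n}{N}\right)}$ |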

Calculating the mean and marginal sampling errors makes it possible to determine the possible limits within which the characteristics of the general population lie.

For example, for the general mean these limits are set from the sample mean by the relation

$$\tilde{x} - \Delta_{\tilde{x}} \le \bar{x} \le \tilde{x} + \Delta_{\tilde{x}}.$$

The limits of the share $p$ of the trait in the general population are

$$w - \Delta_w \le p \le w + \Delta_w.$$

Examples of solving problems on the topic "Sampling observation in statistics"

Task 1. The following data on the output of products (works, services) were obtained from a 10% sample survey of enterprises in the region:

Determine: 1) for the enterprises included in the sample: a) the average output per enterprise; b) the variance of output; c) the share of enterprises with output of more than 400 thousand rubles; 2) for the region as a whole, with a probability of 0.954, the limits within which one can expect: a) the average output per enterprise; b) the share of enterprises with output of more than 400 thousand rubles; 3) the total output of the region.

Solution

To solve the problem, we expand the proposed table.

1) For the enterprises included in the sample, the average output per enterprise is

$$\tilde{x} = \frac{110800}{400} = 277 \text{ thousand rubles}.$$

We calculate the variance of output by the simplified method (the mean of the squares minus the square of the mean):

$$\sigma^2 = \frac{35640000}{400} - 277^2 = 89100 - 76729 = 12371.$$

The number of enterprises with output above 400 thousand rubles is 36 + 12 = 48, and their share is $w = 48/400 = 0.12$, or 12%.

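2) Probability theory shows that the probability P = 0.954 corresponds to the confidence factor t = 2. The marginal sampling error of the mean is

$$\Delta_{\tilde{x}} = t\sqrt{\frac{\sigma^2}{n}} = 2\sqrt{\frac{12371}{400}} \approx 11.12 \text{ thousand rubles}.$$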

The limits of the general mean are: $277 - 11.12 \le \bar{x} \le 277 + 11.12$, i.e., $265.88 \le \bar{x} \le 288.12$ thousand rubles.

The marginal sampling error of the share of enterprises is

$$\Delta_w = 2\sqrt{\frac{0.12 \times 0.88}{400}} \approx 0.03.$$

The limits of the general share are: $0.12 - 0.03 \le p \le 0.12 + 0.03$, i.e., $0.09 \le p \le 0.15$.

3) Since the sample covers 10% of the enterprises in the region, the region as a whole has 400 / 0.1 = 4,000 enterprises. The total output of the region therefore lies within $265.88 \times 4000 \le Q \le 288.12 \times 4000$, i.e., $1063520 \le Q \le 1152480$ thousand rubles.
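The arithmetic of Task 1 can be re-checked in Python from the totals used in the solution (the expanded table itself is not reproduced in this section). Like the solution, the sketch uses the repeated-selection formula for the marginal error and rounds Δ to two decimals before computing the output limits; the variable names are illustrative.

```python
import math

n = 400                  # enterprises in the sample
sum_x = 110_800          # total sample output, thousand rubles
sum_x2 = 35_640_000      # total of the squared values used in the solution
m = 36 + 12              # enterprises with output above 400 thousand rubles
t = 2                    # confidence factor for P = 0.954
N = round(n / 0.10)      # 10% sample, so 4000 enterprises in the region

mean = sum_x / n                                    # 277.0
variance = sum_x2 / n - mean ** 2                   # simplified method: 89100 - 76729
delta_mean = round(t * math.sqrt(variance / n), 2)  # rounded as in the solution
w = m / n
delta_w = t * math.sqrt(w * (1 - w) / n)

q_low = round((mean - delta_mean) * N)
q_high = round((mean + delta_mean) * N)
print(mean, variance, delta_mean)     # 277.0 12371.0 11.12
print(w, round(delta_w, 3))           # 0.12 0.032
print(q_low, q_high)                  # 1063520 1152480
```

Since the survey is a 10% sample, applying the non-repetitive factor 1 - n/N = 0.9 under the root would narrow these limits somewhat.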

Task 2. A control audit of 400 business structures by the tax authorities found that 140 of them did not fully report taxable income in their tax returns. With a probability of 0.954, determine the share of business structures in the general population (the entire region) that concealed part of their taxable income.

Solution

By the conditions of the problem, the sample size is n = 400, the number of units with the feature considered is m = 140, and the probability is P = 0.954.

Probability theory shows that the probability P = 0.954 corresponds to the confidence factor t = 2.

The share of units with this attribute in the general population is determined by the formula $p = w \pm \Delta_w$, where $w = m/n = 140/400 = 0.35 = 35\%$,
and the marginal error $\Delta_w$ is obtained from $\Delta_w = t\sqrt{w(1-w)/n} = 2\sqrt{0.35 \times 0.65 / 400} \approx 0.048 \approx 5\%$.

Then p = 35±5%.

Answer: with a probability of 0.954, the share of business structures that concealed part of their taxable income is 35 ± 5%.
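A short Python check of Task 2 (variable names illustrative). It shows that the marginal error is about 0.048, which the solution rounds to 5 percentage points.

```python
import math

n, m, t = 400, 140, 2                      # sample size, units with the feature, t for P = 0.954
w = m / n                                  # sample share
delta = t * math.sqrt(w * (1 - w) / n)     # marginal error, repeated-selection formula
print(w, round(delta, 3))                  # 0.35 0.048
print(f"{w - delta:.3f} <= p <= {w + delta:.3f}")   # 0.302 <= p <= 0.398
```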