Variation Series

Distribution series built on a quantitative attribute are called variational series.

A distribution series consists of variants (the values of the attribute) and frequencies (the number of units with each value). Frequencies expressed as relative values (shares, percentages) are called relative frequencies. The sum of all frequencies is called the volume of the distribution series.

By type, distribution series are divided into discrete (built on discontinuous values of the attribute) and interval (built on continuous values of the attribute).

A variation series is presented as two columns (or rows). One gives the individual values of the varying attribute, called variants and denoted by X; the other gives absolute numbers showing how many times (how often) each variant occurs. The entries of the second column are called frequencies and are conventionally denoted by f. The second column can also hold relative indicators that give the share of each variant's frequency in the total sum of frequencies. These relative indicators are called relative frequencies and are conventionally denoted by ω; their sum is equal to one. Relative frequencies can also be expressed as percentages, in which case their sum is 100%.

If the variants of the variational series are expressed as discrete values, then such a variational series is called discrete.

For continuous attributes, variation series are constructed as interval series, that is, the values of the attribute are expressed as "from ... to ...". The minimum value of the attribute in such an interval is called the lower limit of the interval, and the maximum value is called the upper limit.

Interval variation series are also built for discrete attributes that vary over a wide range. An interval series can have equal or unequal intervals.

Consider how the width of equal intervals is determined. Let us introduce the following notation:

i is the interval width;

$x_{\max}$ is the maximum value of the attribute among the units of the population;

$x_{\min}$ is the minimum value of the attribute among the units of the population;

n is the number of groups.

Then, if n is known,

$$i = \frac{x_{\max} - x_{\min}}{n}.$$

If the number of groups is difficult to determine in advance, the formula proposed by Sturges in 1926 can be recommended for calculating the optimal number of groups when the population is sufficiently large:

n = 1 + 3.322 lg N, where N is the number of units in the population.
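As a minimal sketch, the two formulas above can be combined as follows. The function names and the sample figures (100 units ranging from 10 to 80) are hypothetical, and rounding the Sturges result to the nearest integer is an assumption about how the formula is applied in practice.

```python
import math

def sturges_groups(population_size: int) -> int:
    """Number of groups by the Sturges formula, rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(population_size))

def equal_interval_width(x_min: float, x_max: float, groups: int) -> float:
    """Width of an equal interval: range of the attribute divided by the number of groups."""
    return (x_max - x_min) / groups

# Hypothetical population: 100 units with attribute values from 10 to 80.
n_groups = sturges_groups(100)                  # 1 + 3.322 * 2 = 7.644, rounded to 8
width = equal_interval_width(10, 80, n_groups)  # (80 - 10) / 8 = 8.75
print(n_groups, width)
```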

The width of unequal intervals is determined in each individual case, taking into account the characteristics of the object of study.

The statistical distribution of a sample is the list of variants and their corresponding frequencies (or relative frequencies).

The statistical distribution of a sample can be given as a table whose first column lists the variants and whose second column lists the corresponding frequencies $n_i$ or relative frequencies $P_i$.

Statistical distribution of the sample

Interval series are variation series in which the values of the attributes underlying their formation are expressed within certain limits (intervals). In this case the frequencies refer not to individual values of the attribute but to the entire interval.

Interval distribution series are constructed for continuous quantitative attributes, as well as for discrete attributes that vary within a wide range.

An interval series can be represented by a statistical distribution of the sample that lists the intervals and their corresponding frequencies. The frequency of an interval is the sum of the frequencies of the variants that fall into it.

When grouping by quantitative continuous features, it is important to determine the size of the interval.

In addition to the sample mean and sample variance, other characteristics of the variation series are also used.

The mode is the variant that has the highest frequency.


The observed values $x_1, x_2, \dots, x_k$ of a random variable X are called variants.

The frequency of the variant $x_i$ is the number $n_i$ (i = 1, …, k) showing how many times this variant occurs in the sample.

The relative frequency (share) of the variant $x_i$ (i = 1, …, k) is the ratio of its frequency $n_i$ to the sample size n.

Frequencies and relative frequencies are called weights.

The accumulated frequency is the number of variants whose values are less than a given x:

$$n_x^{\text{acc}} = \sum_{x_i < x} n_i.$$

The accumulated relative frequency is the ratio of the accumulated frequency to the sample size:

$$w_x^{\text{acc}} = \frac{n_x^{\text{acc}}}{n}.$$
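The definitions above can be illustrated with a short sketch. The function name `frequency_table` and the sample are hypothetical; the accumulated columns count observations strictly less than each variant, following the definition given here.

```python
from collections import Counter

def frequency_table(sample):
    """Rows of (variant, frequency, relative frequency,
    accumulated frequency below the variant, accumulated relative frequency)."""
    n = len(sample)
    counts = Counter(sample)
    rows = []
    below = 0  # number of observations strictly less than the current variant
    for x in sorted(counts):
        rows.append((x, counts[x], counts[x] / n, below, below / n))
        below += counts[x]
    return rows

sample = [2, 1, 3, 2, 2, 1, 3, 2]  # hypothetical data
for row in frequency_table(sample):
    print(row)
```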

A variation series (statistical series) is a sequence of variants written in ascending order together with their corresponding weights.

A variation series may be discrete (a sample of values of a discrete random variable) or continuous, i.e. interval (a sample of values of a continuous random variable).

The discrete variational series has the form:

When the number of variants is large or the attribute is continuous (the random variable can take any value in a certain interval), an interval variation series is built.

To build an interval variation series, the variants are grouped, that is, divided into separate intervals:

The number of intervals is sometimes determined using the Sturges formula:

$$k = 1 + 3.322 \lg n.$$

Then the number of variants that fall into each interval is counted: the frequency $n_i$ (or the relative frequency $n_i/n$). If a variant lies on the border of an interval, it is assigned to the right-hand interval.
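A sketch of this counting rule, under stated assumptions: the function name, the boundaries and the sample are hypothetical, and keeping the overall maximum in the last interval (rather than opening a new one) is an assumption, since the rule above does not say what happens at the final boundary.

```python
from bisect import bisect_right

def group_into_intervals(sample, boundaries):
    """Count observations per interval [b[i], b[i+1]); border values go to the right interval."""
    counts = [0] * (len(boundaries) - 1)
    last = len(counts) - 1
    for x in sample:
        if not boundaries[0] <= x <= boundaries[-1]:
            raise ValueError(f"{x} lies outside the grouping range")
        # index of the last boundary that does not exceed x
        i = bisect_right(boundaries, x) - 1
        # assumption: a value equal to the final boundary stays in the last interval
        counts[min(i, last)] += 1
    return counts

sample = [1, 2, 2, 3, 4, 4, 5, 6, 7]  # hypothetical data
print(group_into_intervals(sample, [1, 3, 5, 7]))
```

Here the values 3 and 5 sit on interval borders and are counted in the intervals that start at 3 and 5 respectively.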

The interval variational series has the form:

Options
Frequencies

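The empirical (statistical) distribution function is the function whose value at a point x equals the relative frequency with which a variant takes a value less than x (the accumulated relative frequency for x):

$$F^*(x) = \frac{n_x^{\text{acc}}}{n},$$

where $n_x^{\text{acc}}$ is the number of variants less than x and n is the sample size.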
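One possible way to evaluate the empirical distribution function is sketched below. The name `empirical_cdf` and the sample are hypothetical; the strict inequality (values less than x) follows the definition given here.

```python
from bisect import bisect_left

def empirical_cdf(sample):
    """Return F*(x): the share of observations strictly less than x."""
    ordered = sorted(sample)
    n = len(ordered)

    def f(x):
        # bisect_left gives the number of elements strictly less than x
        return bisect_left(ordered, x) / n

    return f

F = empirical_cdf([2, 1, 3, 2, 2, 1, 3, 2])  # hypothetical data
print(F(1), F(2), F(2.5), F(4))
```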

A frequency polygon is a polyline whose segments connect the points $(x_1; n_1), (x_2; n_2), \dots, (x_k; n_k)$. The frequency polygon is the statistical analogue of the distribution polygon.

For a continuous variation series, a polygon can be built if the midpoints of the intervals are taken as the values $x_1, x_2, \dots, x_k$.

An interval variation series is usually depicted graphically by a histogram.

A histogram is a stepped figure consisting of rectangles whose bases are the partial intervals of length $h = x_{i+1} - x_i$, i = 0, …, k-1, and whose heights are equal to the frequencies $n_i$ (or relative frequencies $w_i$) of the intervals.

The cumulate (cumulative curve) is the curve of accumulated frequencies (or accumulated relative frequencies). For a discrete series, the cumulate is a broken line connecting the points $(x_i; n_{x_i}^{\text{acc}})$ or $(x_i; w_{x_i}^{\text{acc}})$, i = 1, …, k. For an interval series, the cumulate starts from the point whose abscissa is the beginning of the first interval and whose ordinate, the accumulated frequency (relative frequency), is zero. The other points of this broken line correspond to the ends of the intervals.


  • (definition of a variational series; components of a variational series; three forms of a variational series; expediency of constructing an interval series; conclusions that can be drawn from the constructed series)

    A variation series is a sequence of all elements of a sample arranged in non-decreasing order. Identical elements are repeated.

    Variational series are distribution series built on a quantitative attribute.

    Variation series of distribution consist of two elements: variants and frequencies.

    Variants are the numerical values of a quantitative attribute in the variation series. They can be positive or negative, absolute or relative. For example, when enterprises are grouped by the results of their economic activity, positive variants represent profit and negative variants represent loss.

    Frequencies are the counts of individual variants or of each group of the variation series, i.e. numbers showing how often particular variants occur in a distribution series. The sum of all frequencies is called the volume of the population and equals the number of elements in the entire population.

    Relative frequencies are frequencies expressed as relative values (fractions of a unit or percentages). The sum of the relative frequencies is equal to one or 100%. Replacing frequencies with relative frequencies makes it possible to compare variation series with different numbers of observations.

    There are three forms of variation series: ranked series, discrete series and interval series.

    A ranked series is the distribution of individual units of the population in ascending or descending order of the attribute under study. Ranking makes it easy to divide quantitative data into groups, to see at once the smallest and largest values of the attribute, and to pick out the values that are repeated most often.

    The other forms of the variation series are group tables compiled according to the nature of the variation in the values of the attribute under study. By the nature of the variation, discrete (discontinuous) and continuous attributes are distinguished.

    A discrete series is a variation series built on attributes that change discontinuously (discrete attributes). These include the tariff category, the number of children in a family, the number of employees in an enterprise, etc. Such attributes can take only a finite number of definite values.

    A discrete variation series is a table that consists of two columns. The first column gives the specific value of the attribute, and the second gives the number of population units with that value of the attribute.

    If an attribute changes continuously (the amount of income, work experience, the cost of an enterprise's fixed assets, etc., which can take any value within certain limits), then an interval variation series must be built for this attribute.



    The group table here also has two columns. The first indicates the value of the feature in the interval "from - to" (options), the second - the number of units included in the interval (frequency).

    Frequency (repetition frequency) is the number of repetitions of a particular variant of the attribute values; it is denoted $f_i$. The sum of the frequencies is equal to the volume of the population under study and is denoted

    $$n = \sum_{i=1}^{k} f_i,$$

    where k is the number of variants of the attribute.

    Very often the table is supplemented with a column of accumulated frequencies S, which show how many units of the population have an attribute value no greater than a given value.

    A discrete variational distribution series is a series in which groups are composed according to a trait that varies discretely and takes only integer values.

    The interval variation series of distribution is a series in which the grouping attribute, which forms the basis of the grouping, can take any values ​​in a certain interval, including fractional ones.

    An interval variational series is an ordered set of intervals of variation of the values ​​of a random variable with the corresponding frequencies or frequencies of the values ​​of the quantity falling into each of them.

    It is expedient to build an interval distribution series, first of all, with a continuous variation of a trait, and also if a discrete variation manifests itself over a wide range, i.e. the number of options for a discrete feature is quite large.

    Several conclusions can already be drawn from such a series. For example, the middle element of a variation series (the median) can serve as an estimate of the most probable result of a measurement. The first and last elements of the variation series (i.e. the minimum and maximum elements of the sample) show the spread of the sample. Sometimes, if the first or last element differs greatly from the rest of the sample, it is excluded from the measurement results on the grounds that it was obtained through a gross error, for example an equipment failure.
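A minimal sketch of these first conclusions, using only the Python standard library. The function name `summarize` and the sample are hypothetical.

```python
import statistics

def summarize(sample):
    """Build the variation series and report its minimum, maximum, range and median."""
    series = sorted(sample)  # the variation series: non-decreasing order
    return {
        "series": series,
        "min": series[0],
        "max": series[-1],
        "range": series[-1] - series[0],
        "median": statistics.median(series),
    }

print(summarize([5, 3, 8, 3, 6]))  # hypothetical measurements
```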

    Practice 1

    VARIATIONAL SERIES OF DISTRIBUTION

    A variation series, or distribution series, is the ordered distribution of population units by increasing (more often) or decreasing (less often) values of the attribute, together with a count of the number of units with each value of the attribute.

    There are three kinds of distribution series:

    1) a ranked series is a list of the individual units of the population in ascending order of the attribute under study; if the number of population units is large, the ranked series becomes cumbersome, and in such cases the distribution series is constructed by grouping the population units by the values of the attribute (if the attribute takes a small number of values, a discrete series is constructed; otherwise, an interval series);

    2) a discrete series is a table consisting of two columns (rows): the specific values $x_i$ of the varying attribute and the number of population units with the given value, i.e. the frequencies $f_i$; the number of groups in a discrete series is determined by the number of values of the attribute that actually occur;

    3) an interval series is a table consisting of two columns (rows): the intervals $x_i$ of the varying attribute and the number of population units falling within each interval (frequencies), or the share of this number in the total size of the population (relative frequencies).

    Numbers showing how many times individual variants occur in a given population are called the frequencies, or weights, of the variants and are denoted by the lowercase Latin letter f. The total sum of the frequencies of a variation series is equal to the volume of the population, i.e.

    $$\sum_{i=1}^{k} f_i = n,$$

    where k is the number of groups and n is the total number of observations, or the population size.

    Frequencies (weights) are expressed not only in absolute but also in relative numbers: in fractions of a unit or as a percentage of the total number of variants that make up the population. In such cases the weights are called relative frequencies. The total sum of the relative frequencies is equal to one,

    $$\sum_{i=1}^{k} \frac{f_i}{n} = 1,$$

    or

    $$\sum_{i=1}^{k} \frac{f_i}{n} \cdot 100\% = 100\%,$$

    if the relative frequencies are expressed as a percentage of the total number of observations n. Replacing frequencies with relative frequencies is not obligatory, but it is useful, and sometimes necessary, when comparing variation series that differ greatly in volume.

    Depending on how the attribute varies (discretely or continuously, over a wide or a narrow range), the statistical population is distributed into non-interval or interval variation series. In the first case, the frequencies refer directly to the ranked values of the attribute, which take the position of individual groups or classes of the variation series. In the second case, the frequencies are counted for the individual intervals (from ... to ...) into which the total variation of the attribute, from the minimum to the maximum variant of the population, is divided. These class intervals may or may not be equal in width; accordingly, equal-interval and unequal-interval variation series are distinguished. In unequal-interval series, the nature of the frequency distribution changes as the width of the class intervals changes. Unequal-interval grouping is used relatively rarely in biology. As a rule, biometric data are distributed into equal-interval series, which not only reveals the pattern of variation but also simplifies the calculation of the summary numerical characteristics of the variation series and the comparison of distribution series with one another.

    When starting to construct an equal-interval variational series, it is important to correctly outline the width of the class interval. The fact is that a rough grouping (when very wide class intervals are set) distorts the typical features of variation and leads to a decrease in the accuracy of the numerical characteristics of the series. When choosing excessively narrow intervals, the accuracy of the generalizing numerical characteristics increases, but the series turns out to be too extended and does not give a clear picture of the variation.

    To obtain a well-defined variation series and to ensure sufficient accuracy of the numerical characteristics calculated from it, the variation of the attribute (in the range from the minimum to the maximum variant) must be divided into a number of groups or classes that satisfies both requirements. This problem is solved by dividing the range of variation of the attribute by the number of groups or classes planned for the variation series:

    $$h = \frac{x_{\max} - x_{\min}}{k},$$

    where h is the interval width; $x_{\max}$ and $x_{\min}$ are the maximum and minimum values in the population; k is the number of groups.

    When constructing an interval distribution series, it is necessary to choose the optimal number of groups (attribute intervals) and to set the length (width) of the interval. Since the analysis of a distribution series compares the frequencies in different intervals, the length of the intervals should be constant. If an interval distribution series has unequal intervals, then for comparability the frequency or relative frequency must be reduced to a unit of interval width; the resulting value is called the density ρ, i.e.

    $$\rho = \frac{f}{h},$$

    where f is the frequency (or relative frequency) of the interval and h is its width.
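The effect of dividing by the interval width can be seen in a short sketch. The function name and the three intervals are hypothetical; the frequencies differ only because the middle interval is twice as wide, so the densities come out equal.

```python
def interval_density(lower, upper, frequency):
    """Density of an interval: frequency per unit of interval width."""
    return frequency / (upper - lower)

# Hypothetical unequal-interval series: (lower limit, upper limit, frequency)
groups = [(0, 10, 5), (10, 30, 10), (30, 40, 5)]
densities = [interval_density(*g) for g in groups]
print(densities)
```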

    The optimal number of groups is chosen so that the variety of values ​​of the attribute in the aggregate is reflected to a sufficient extent and, at the same time, the regularity of the distribution, its shape is not distorted by random frequency fluctuations. If there are too few groups, there will be no pattern of variation; if there are too many groups, random frequency jumps will distort the shape of the distribution.

    Most often, the number of groups in a distribution series is determined by the Sturges formula:

    $$k = 1 + 3.322 \lg n,$$

    where n is the size of the population.

    A graphical representation provides essential assistance in the analysis of a distribution series and its properties. The interval series is represented by a bar chart, in which the bases of the bars, located along the abscissa axis, are the intervals of values ​​of the varying attribute, and the heights of the bars are the frequencies corresponding to the scale along the ordinate axis. This type of diagram is called histogram.

    If the distribution series is discrete, or the midpoints of the intervals are used, the graphical representation of the series is called a polygon; it is obtained by connecting the points with coordinates $x_i$ and $f_i$ with straight lines.

    If the class values are plotted along the abscissa axis and the accumulated frequencies along the ordinate axis, and the points are then connected with straight lines, the resulting graph is called a cumulate. The accumulated frequencies are found by successive summation, or cumulation, of the frequencies from the first class to the end of the variation series.

    Example. There are data on the egg production of 50 laying hens for 1 year kept on a poultry farm (Table 1.1).

    Table 1.1. Egg production of laying hens

    Columns (five repeated pairs): No. of laying hen; egg production, pcs.

    It is required to build an interval distribution series and display it graphically in the form of a histogram, polygon and cumulate.

    It can be seen that the trait varies from 212 to 245 eggs obtained from a laying hen in 1 year.

    In our example, using the Sturgess formula, we determine the number of groups:

    k = 1 + 3.322 lg 50 = 6.644 ≈ 7.

    Calculate the length (width) of the interval using the formula:

    $$h = \frac{x_{\max} - x_{\min}}{k} = \frac{245 - 212}{7} \approx 4.7 \approx 5.$$

    Let us build an interval series with 7 groups and an interval width of 5 eggs (Table 1.2). To build the graphs, we also calculate the midpoints of the intervals and the accumulated frequencies in the table.

    Table 1.2. Interval series of distribution of egg production

    Columns: group of laying hens by egg production, $x_i$; number of laying hens, $f_i$; interval midpoint, $x_i'$; accumulated frequency.

    Let's build a histogram of the distribution of egg production (Fig. 1.1).

    Fig. 1.1. Histogram of egg production distribution

    This histogram shows a form of distribution characteristic of many attributes: values in the middle intervals occur more often, and extreme (small and large) values occur less often. The form of this distribution is close to the normal distribution law, which arises when a variable is influenced by a large number of factors, none of which is predominant.

    The polygon and cumulate of the distribution of egg production are shown in Fig. 1.2 and 1.3.

    Fig. 1.2. Egg production distribution polygon

    Fig. 1.3. Cumulate of egg production distribution

    The procedure for solving the problem in the Microsoft Excel spreadsheet application is as follows.

    1. Enter the initial data in accordance with Fig. 1.4.

    2. Rank the series.

    2.1. Select cells A2:A51.

    2.2. Left-click the <Sort Ascending> button on the toolbar.

    3. Determine the size of the interval for constructing the interval series of the distribution.

    3.1. Copy cell A2 to cell E53.

    3.2. Copy cell A51 to cell E54.

    3.3. Calculate the range of variation. To do this, enter the formula in cell E55 =E54-E53.

    3.4. Calculate the number of variation groups. To do this, enter the formula in cell E56 =1+3.322*LOG10(50).

    3.5. Enter in cell E57 the rounded number of groups.

    3.6. Calculate the length of the interval. To do this, enter the formula in cell E58 =E55/E57.

    3.7. Enter in cell E59 the rounded length of the interval.

    4. Build an interval series.

    4.1. Copy cell E53 to cell B64.

    4.2. Enter the formula in cell B65 =B64+$E$59.

    4.3. Copy cell B65 to cells B66:B70.

    4.4. Enter the formula in cell C64 =B65.

    4.5. Enter the formula in cell C65 =C64+$E$59.

    4.6. Copy cell C65 to cells C66:C70.

    The results of the solution are displayed on the display screen in the following form (Fig. 1.5).

    5. Calculate the interval frequency.

    5.1. Run the command Tools, Data Analysis by left-clicking each item in turn.

    5.2. In the Data Analysis dialog box, left-click to select Analysis Tools <Histogram> (Fig. 1.6).

    5.3. Left-click the <OK> button.

    5.4. In the Histogram dialog box, set the parameters according to Fig. 1.7.

    5.5. Left-click the <OK> button.

    The results of the solution are displayed on the display screen in the following form (Fig. 1.8).

    6. Fill in the table "Interval series of distribution".

    6.1. Copy cells B74:B80 to cells D64:D70.

    6.2. Calculate the sum of the frequencies. To do this, select cells D64:D70 and left-click the <AutoSum> button on the toolbar.

    6.3. Calculate the middle of the intervals. To do this, enter the formula in cell E64 =(B64+C64)/2 and copy to cells E65:E70.

    6.4. Calculate the accumulated frequencies. To do this, copy cell D64 to cell F64. In cell F65, enter the formula =F64+D65 and copy it to cells F66:F70.

    The results of the solution are displayed on the display screen in the following form (Fig. 1.9).

    7. Edit the histogram.

    7.1. Right-click the name "Bin" on the chart and, in the menu that appears, click <Clear>.

    7.2. Right-click the chart and, in the menu that appears, click <Source Data>.

    7.3. In the Source Data dialog box, change the x-axis labels. To do this, select cells B64:C70 (Fig. 1.10).

    7.5. Press key .

    The results are displayed on the screen in the following form (Fig. 1.11).

    8. Build an egg production distribution polygon.

    8.1. Left-click the <Chart Wizard> button on the toolbar.

    8.2. In the Chart Wizard (Step 1 of 4) dialog box, left-click to select: Standard <Line> (Fig. 1.12).

    8.3. Left-click the <Next> button.

    8.4. In the Chart Wizard (Step 2 of 4) dialog box, set the parameters according to Fig. 1.13.

    8.5. Left-click the <Next> button.

    8.6. In the Chart Wizard (Step 3 of 4) dialog box, enter the titles of the chart and of the Y axis (Fig. 1.14).

    8.7. Left-click the <Next> button.

    8.8. In the Chart Wizard (Step 4 of 4) dialog box, set the parameters according to Fig. 1.15.

    8.9. Left-click the <Finish> button.

    The results are displayed on the display screen in the following form (Fig. 1.16).

    9. Insert data labels on the chart.

    9.1. Right-click the chart and, in the menu that appears, click <Source Data>.

    9.2. In the Source Data dialog box, change the x-axis labels. To do this, select cells E64:E70 (Fig. 1.17).

    9.3. Press key .

    The results are displayed on the display screen in the following form (Fig. 1.18).

    The distribution cumulate is constructed similarly to the distribution polygon based on the accumulated frequencies.
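For readers working outside Excel, the table-building part of the procedure (steps 2 to 6) can be sketched in Python. This is an illustration under several assumptions, not a reproduction of the spreadsheet: the function name and the sample are hypothetical, the attribute is taken to be integer-valued, the interval width is rounded up so that the groups cover the whole range, and each interval includes its upper bound, with the minimum placed in the first interval.

```python
import math

def interval_series(sample):
    """Rows of (lower, upper, frequency, midpoint, accumulated frequency)."""
    data = sorted(sample)                          # rank the series
    x_min, x_max = data[0], data[-1]
    k = round(1 + 3.322 * math.log10(len(data)))   # number of groups (Sturges)
    h = math.ceil((x_max - x_min) / k)             # assumption: width rounded up
    rows = []
    accumulated = 0
    for g in range(k):
        lower = x_min + g * h
        upper = lower + h
        if g == 0:
            freq = sum(lower <= x <= upper for x in data)
        else:
            freq = sum(lower < x <= upper for x in data)
        accumulated += freq
        rows.append((lower, upper, freq, (lower + upper) / 2, accumulated))
    return rows

sample = [3, 7, 8, 12, 13, 14, 15, 18, 21, 23]  # hypothetical data
for row in interval_series(sample):
    print(row)
```

The last column holds the values needed for the cumulate, and the midpoint column holds the x-coordinates for the polygon.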

    They are presented in the form of distribution series and are formatted as .

    A distribution series is one type of grouping.

    A distribution series is an ordered distribution of the units of the population under study into groups according to a certain varying attribute.

    Depending on the attribute underlying the formation of a distribution series, attributive and variational distribution series are distinguished:

    • Attributive distribution series are built on qualitative attributes.
    • Variational distribution series are built in ascending or descending order of the values of a quantitative attribute.
    A variation series of distribution consists of two columns:

    The first column contains the quantitative values of the varying attribute, which are called variants and are denoted by x. A discrete variant is expressed as an integer. An interval variant is given as a range "from ... to ...". Depending on the type of variants, either a discrete or an interval variation series can be constructed.
    The second column contains the number of occurrences of each variant, expressed as frequencies or relative frequencies:

    Frequencies are absolute numbers showing how many times a given attribute value occurs in the population; they are denoted by f. The sum of all frequencies must equal the number of units in the entire population.

    Relative frequencies (w) are frequencies expressed as a percentage or a fraction of the total. Their sum must equal 100% when expressed as percentages, or one when expressed as fractions.

    Graphical representation of distribution series

    The distribution series are visualized using graphic images.

    The distribution series are displayed as:
    • Polygon
    • Histograms
    • Cumulates
    • ogives

    Polygon

    When constructing a polygon, the values of the varying attribute are plotted on the horizontal axis (abscissa), and the frequencies or relative frequencies on the vertical axis (ordinate).

    The polygon in Fig. 6.1 was built from the 1994 micro-census of the population of Russia.

    Fig. 6.1. Distribution of households by size

    Condition: Data are given on the distribution of 25 employees of an enterprise by tariff category:
    4; 2; 4; 6; 5; 6; 4; 1; 3; 1; 2; 5; 2; 6; 3; 1; 2; 3; 4; 5; 4; 6; 2; 3; 4
    Task: Build a discrete variation series and depict it graphically as a distribution polygon.
    Solution:
    In this example, the variants are the employees' tariff categories. To determine the frequencies, count the number of employees in each tariff category.
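The counting step can be sketched in Python with the 25 values listed in the condition. The variable names are hypothetical, and `collections.Counter` is simply one convenient way to tally the variants.

```python
from collections import Counter

categories = [4, 2, 4, 6, 5, 6, 4, 1, 3, 1, 2, 5, 2,
              6, 3, 1, 2, 3, 4, 5, 4, 6, 2, 3, 4]

counts = Counter(categories)
# Discrete variation series: variants in ascending order with their frequencies
discrete_series = [(x, counts[x]) for x in sorted(counts)]
# Relative frequencies for the same variants
relative = [(x, f / len(categories)) for x, f in discrete_series]

for (x, f), (_, w) in zip(discrete_series, relative):
    print(x, f, w)
```

The pairs in `discrete_series` are the points that the distribution polygon connects.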

    The polygon is used for discrete variation series.

    To build a distribution polygon (Fig. 1), the quantitative values of the varying attribute (the variants) are plotted along the abscissa (X), and the frequencies or relative frequencies along the ordinate.

    If the attribute values are expressed as intervals, the series is called an interval series.
    Interval distribution series are depicted graphically as a histogram, cumulate or ogive.

    Statistical table

    Condition: Data are given on the size of the deposits of 20 individuals in one bank (thousand rubles): 60; 25; 12; 10; 68; 35; 2; 17; 51; 9; 3; 130; 24; 85; 100; 152; 6; 18; 7; 42.
    Task: Build an interval variation series with equal intervals.
    Solution:

    1. The initial population consists of 20 units (N = 20).
    2. Using the Sturges formula, we determine the required number of groups: n = 1 + 3.322 · lg 20 ≈ 5.32 ≈ 5.
    3. We calculate the width of the equal interval: i = (152 - 2) / 5 = 30 thousand rubles.
    4. We divide the initial population into 5 groups with an interval of 30 thousand rubles.
    5. The grouping results are presented in the table:

    When a continuous attribute is recorded in this way, the same value appears twice (as the upper limit of one interval and the lower limit of the next); such a value is assigned to the group in which it is the upper limit.
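The grouping in steps 1 to 5 can be checked with a short sketch. It reads the spelled-out entries in the data as 10, 9 and 18, applies the upper-limit rule stated above, and keeps the minimum value in the first group; the variable names are hypothetical.

```python
import math

deposits = [60, 25, 12, 10, 68, 35, 2, 17, 51, 9,
            3, 130, 24, 85, 100, 152, 6, 18, 7, 42]

n_groups = round(1 + 3.322 * math.log10(len(deposits)))  # 5.32 rounded to 5
width = (max(deposits) - min(deposits)) / n_groups       # (152 - 2) / 5 = 30.0

table = []
for g in range(n_groups):
    lower = min(deposits) + g * width
    upper = lower + width
    # a boundary value belongs to the group where it is the upper limit;
    # the minimum itself is kept in the first group
    freq = sum(1 for x in deposits
               if lower < x <= upper or (g == 0 and x == lower))
    table.append((lower, upper, freq))

for row in table:
    print(row)
```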

    Histogram

    To build a histogram, the boundaries of the intervals are marked along the abscissa and rectangles are constructed on them whose heights are proportional to the frequencies (or relative frequencies).

    Fig. 6.2 shows the histogram of the distribution of the population of Russia in 1997 by age group.

    Fig. 6.2. Distribution of the population of Russia by age groups

    Condition: The distribution of 30 employees of a company by monthly salary is given.

    Task: Depict the interval variation series graphically as a histogram and a cumulate.
    Solution:

    1. The unknown boundary of the open (first) interval is determined from the width of the second interval: 7000 - 5000 = 2000 rubles. Using the same width, we find the lower bound of the first interval: 5000 - 2000 = 3000 rubles.
    2. To construct a histogram in a rectangular coordinate system, we mark off along the abscissa axis segments whose lengths correspond to the intervals of the variation series.
      These segments serve as the lower bases, and the corresponding frequencies (relative frequencies) serve as the heights of the rectangles formed.
    3. Let us build the histogram:

    To construct the cumulate, it is necessary to calculate the accumulated frequencies (frequencies). They are determined by successive summation of the frequencies (frequencies) of the previous intervals and are denoted by S. The accumulated frequencies show how many units of the population have a feature value no greater than the one under consideration.

    Cumulate

    The distribution of a trait in a variational series according to the accumulated frequencies (frequencies) is depicted using the cumulate.

    The cumulate, or cumulative curve, in contrast to the polygon, is built on the accumulated frequencies or accumulated relative frequencies. The values of the attribute are placed on the abscissa axis, and the accumulated frequencies or relative frequencies on the ordinate axis (Fig. 6.3).

    Fig. 6.3. Cumulate of the distribution of households by size

    4. Calculate the accumulated frequencies:
    The accumulated frequency of the first interval is calculated as 0 + 4 = 4; for the second, 4 + 12 = 16; for the third, 4 + 12 + 8 = 24; and so on.

    When constructing the cumulate, the accumulated frequency (frequency) of the corresponding interval is assigned to its upper bound:
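The running sums from step 4 and the upper-bound rule can be sketched as follows. Only the first three frequencies (4, 12, 8) and the bounds 3000, 5000 and 7000 come from the example; the third upper bound of 9000 is an assumption that the intervals keep the same width of 2000 rubles, and the function name is hypothetical.

```python
from itertools import accumulate

def cumulate_points(first_lower, upper_bounds, frequencies):
    """Points of the cumulate: it starts at zero at the beginning of the first
    interval; each accumulated frequency is assigned to the interval's upper bound."""
    accumulated = list(accumulate(frequencies))
    return [(first_lower, 0)] + list(zip(upper_bounds, accumulated))

# 9000 is an assumed upper bound for the third interval (equal width of 2000)
points = cumulate_points(3000, [5000, 7000, 9000], [4, 12, 8])
print(points)
```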

    Ogive

    An ogive is constructed in the same way as the cumulate, the only difference being that the accumulated frequencies are placed on the abscissa axis and the attribute values on the ordinate axis.

    A variation of the cumulate is the concentration curve, or Lorenz curve. To plot the concentration curve, a percentage scale from 0 to 100 is applied to both axes of a rectangular coordinate system. The abscissas show the accumulated relative frequencies, and the ordinates show the accumulated shares (in percent) of the total volume of the attribute.

    A uniform distribution of the attribute corresponds to the diagonal of the square on the graph (Fig. 6.4). With an uneven distribution, the graph is a concave curve whose shape depends on the level of concentration of the attribute.
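The points of a concentration curve can be computed as in the sketch below. The function name and the figures are hypothetical, and the groups are assumed to be listed in ascending order of the attribute value per unit.

```python
from itertools import accumulate

def lorenz_points(frequencies, volumes):
    """Points (accumulated % of units, accumulated % of attribute volume)."""
    total_f = sum(frequencies)
    total_v = sum(volumes)
    xs = [100 * c / total_f for c in accumulate(frequencies)]
    ys = [100 * c / total_v for c in accumulate(volumes)]
    return [(0.0, 0.0)] + list(zip(xs, ys))

# Hypothetical groups: 50, 30 and 20 units holding 10, 30 and 60 of the volume
uneven = lorenz_points([50, 30, 20], [10, 30, 60])
# Equal shares in every group: the points lie on the diagonal
uniform = lorenz_points([1, 1], [5, 5])
print(uneven)
print(uniform)
```

In the uneven case every intermediate point lies below the diagonal, which gives the concave shape described above.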

    Fig. 6.4. Concentration curve