Frequency of letters in Russian: which letters are most often found in words

It is known that the letter layout on the keyboard of a typewriter or PC is not arbitrary but follows certain rules. The most frequently used letters sit in the central part of the keyboard, and the less common ones at the edges. It is also known that vowels are used more often than consonants. This information was obtained using a special formula applied to the National Corpus of the Russian Language.

The most common vowels

Oddly enough, the letter “о” leads in the number of uses in written speech among all letters, vowels and consonants alike. It is followed by “а” and “и”, and only then do the consonants begin. According to experts, the frequency of “о” is about one tenth (roughly 10% of all letters), while the frequency of the other leading vowels ranges from seven to eight hundredths (7-8%).

Most popular consonants

The most commonly used consonant is “н” (n). The largest number of words in the Russian language, however, begin with the letter “п” (p). Among vowels, “о” is the leader in this respect.

The rarest consonant in Russian speech is the letter “ф” (f), which is used in words borrowed from foreign languages and in onomatopoeic words such as “фыркать” (“to snort”).

Such statistics can be useful when compiling tautograms. The point of this word game is to create a coherent story, each word in which must begin with the same letter.

Frequency of use of letters in Russian

Did you know that some letters of the alphabet occur in words more often than others? Moreover, vowels are used more frequently than consonants.

Which letters of the Russian alphabet are most often or least often found in words used to write text?

Statistics is the study of general patterns. It lets you answer the question above: take excerpts from the works of various authors and count how many times each letter of the Russian alphabet occurs in the words used. Anyone can do this on their own, out of curiosity or simply to pass the time. I will refer to the statistics of a study that has already been conducted...

The Russian alphabet is Cyrillic. It has gone through several reforms over its history, which produced the modern Russian alphabet of 33 letters.
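The count described above is easy to automate. Here is a minimal Python sketch; the function name `letter_frequencies` and the sample sentence are illustrative assumptions, not part of the study referred to here.

```python
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Return each letter's share of all letters in `text`, in percent."""
    # Keep alphabetic characters only, lowercased, so "О" and "о" count together.
    letters = [ch.lower() for ch in text if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # most_common() orders the letters from most to least frequent.
    return {ch: 100 * n / total for ch, n in counts.most_common()}


# An arbitrary sample sentence; any excerpt of Russian text will do.
sample = "Мороз и солнце; день чудесный!"
for letter, pct in letter_frequencies(sample).items():
    print(f"{letter} - {pct:.2f}%")
```

A single sentence gives very noisy numbers; the percentages only settle down on excerpts of many thousands of letters.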

о - 9.28%
а - 8.66%
е - 8.10%
и - 7.45%
н - 6.35%
т - 6.30%
р - 5.53%
с - 5.45%
л - 4.32%
в - 4.19%
к - 3.47%
п - 3.35%
м - 3.29%
у - 2.90%
д - 2.56%
я - 2.22%
ы - 2.11%
ь - 1.90%
з - 1.81%
б - 1.51%
г - 1.41%
й - 1.31%
ч - 1.27%
ю - 1.03%
х - 0.92%
ж - 0.78%
ш - 0.77%
ц - 0.52%
щ - 0.49%
ф - 0.40%
э - 0.17%
ъ - 0.04%

The most frequently used Russian letter is the vowel “О”, as has already been rightly suggested here. There are typical examples such as “ОБОРОНОСПОСОБНОСТЬ” (“defense capability”: 7 of them in one word, and nothing exotic or surprising; this is very common in Russian). The popularity of “О” is largely explained by the phenomenon of pleophony (полногласие, “full vowel”): “холод” (cold) instead of “хлад”, and “мороз” (frost) instead of “мраз”.

At the very beginning of words, the consonant “П” occurs most often. This lead is also confident and unconditional. The most likely explanation is the large number of prefixes starting with “П”: пере-, пре-, при-, про- and others.

The frequency of use of letters is the basis of cryptanalysis.
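To illustrate why frequencies matter for cryptanalysis, here is a small Python sketch that breaks a simple shift (Caesar) cipher by assuming the most frequent ciphertext letter stands for “о”. The cipher, the 32-letter alphabet without “ё”, and the function names are all assumptions chosen for the example; real ciphers need far more than this.

```python
from collections import Counter

# 32-letter alphabet; "ё" is left out here for simplicity (an assumption).
ALPHABET = "абвгдежзийклмнопрстуфхцчшщъыьэюя"


def shift_text(text: str, shift: int) -> str:
    """Shift every Russian letter by `shift` positions (output is lowercase)."""
    out = []
    for ch in text:
        low = ch.lower()
        if low in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(low) + shift) % len(ALPHABET)])
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(out)


def guess_shift(ciphertext: str) -> int:
    """Guess the shift by mapping the most frequent letter back to 'о'."""
    counts = Counter(ch for ch in ciphertext.lower() if ch in ALPHABET)
    if not counts:
        raise ValueError("no Russian letters in the ciphertext")
    top_letter = counts.most_common(1)[0][0]
    return (ALPHABET.index(top_letter) - ALPHABET.index("о")) % len(ALPHABET)


secret = shift_text("молоко около окна", 5)
shift = guess_shift(secret)
print(shift, shift_text(secret, -shift))
```

The guess works only when the text is long enough, or rich enough in “о”, for that letter to actually come out on top.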

I wrote a fun PHP script and ran all the texts on the Spectator through it to check the language. In total, 39,110 different word forms are used in the texts. Exactly how many different words there are is quite difficult to determine. To get at least somewhat closer to that figure, I took only the first 5 letters of each word and compared those. The result was 14,373 such combinations. It would be a stretch to call this the “Spectator” vocabulary.

Then I took the words and examined how often each letter occurs in them. Ideally, for a complete picture, you would take some kind of dictionary. You cannot simply run texts through the count; you need only the unique words, because in a text some words are repeated more often than others. The following results were obtained:
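The original script was written in PHP and is not shown; the following Python sketch only illustrates the idea of counting unique word forms and then collapsing them to their first 5 letters. The function name and the tokenizing regular expression are assumptions.

```python
import re


def vocabulary_estimate(text: str, prefix_len: int = 5) -> tuple[int, int]:
    """Return (number of unique word forms, number of unique prefixes)."""
    # [^\W\d_]+ matches runs of letters, including Cyrillic ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    forms = set(words)
    # Crude stand-in for lemmatization: forms sharing a prefix count as one word.
    prefixes = {word[:prefix_len] for word in forms}
    return len(forms), len(prefixes)


print(vocabulary_estimate("работа работы работать дом дома"))
```

The prefix trick merges “работа”, “работы” and “работать”, but it keeps “дом” and “дома” apart and would merge unrelated words that happen to share five letters, which is why the resulting figure is only a rough approximation.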

о - 9.28%
а - 8.66%
е - 8.10%
и - 7.45%
н - 6.35%
т - 6.30%
р - 5.53%
с - 5.45%
л - 4.32%
в - 4.19%
к - 3.47%
п - 3.35%
м - 3.29%
у - 2.90%
д - 2.56%
я - 2.22%
ы - 2.11%
ь - 1.90%
з - 1.81%
б - 1.51%
г - 1.41%
й - 1.31%
ч - 1.27%
ю - 1.03%
х - 0.92%
ж - 0.78%
ш - 0.77%
ц - 0.52%
щ - 0.49%
ф - 0.40%
э - 0.17%
ъ - 0.04%

I advise those going on “Field of Miracles” to memorize this table and to name letters in that order. For example, the seemingly “familiar” letter “б” turns out to be used less often than the “rare” letter “ы”. Remember also that a word has more than one vowel, and that once you have guessed one vowel you should move on to the consonants. Besides, a word is recognized precisely by its consonants. Compare “**а**и*е” and “ср*вн*т*”. In both cases the word is “сравните” (“compare”).

And one more consideration. How did you learn English? Remember? “A pen, a pencil, a table.” Whatever I see, that is what I sing about. What is the point? How often do you say the word “pencil” in everyday life? If the goal is to teach someone to speak as quickly and efficiently as possible, then the teaching should be built accordingly: analyze the language, pick out the most commonly used words, and start with them. To speak English more or less, only one and a half thousand words are enough.

Another bit of fun: forming words from letters at random, but weighted by frequency of occurrence, so that they look like normal words. Among the first ten “random” four-letter words, “donkey” popped up; in the next fifty, the words “rushing” and “NATO”. But, alas, there are many dissonant combinations, such as “bltt” or “nrro”.
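A frequency-weighted letter draw of this kind can be sketched in a few lines of Python. The weights are the percentages from the table above; the Cyrillic letters assigned to them are my reading of that table, and the function name `random_word` is an assumption.

```python
import random

# Percentages from the table above; letter assignments are an assumed reading.
FREQ = {
    "о": 9.28, "а": 8.66, "е": 8.10, "и": 7.45, "н": 6.35, "т": 6.30,
    "р": 5.53, "с": 5.45, "л": 4.32, "в": 4.19, "к": 3.47, "п": 3.35,
    "м": 3.29, "у": 2.90, "д": 2.56, "я": 2.22, "ы": 2.11, "ь": 1.90,
    "з": 1.81, "б": 1.51, "г": 1.41, "й": 1.31, "ч": 1.27, "ю": 1.03,
    "х": 0.92, "ж": 0.78, "ш": 0.77, "ц": 0.52, "щ": 0.49, "ф": 0.40,
    "э": 0.17, "ъ": 0.04,
}


def random_word(length: int, rng: random.Random) -> str:
    """Draw `length` letters independently, weighted by their frequency."""
    letters = list(FREQ)
    weights = list(FREQ.values())
    return "".join(rng.choices(letters, weights=weights, k=length))


rng = random.Random()  # pass a seed here for reproducible output
print([random_word(4, rng) for _ in range(10)])
```

Because every letter is drawn independently of its neighbours, nothing prevents runs of consonants, which is exactly where the dissonant combinations come from.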

Hence the next step. I split all the words into two-letter combinations and began to combine those at random (again weighted by how often each occurs). Words resembling “normal” ones began to appear in large numbers. For example: “koivdiot”, “voabma”, “apy”, “depoid”, “debyako”, “orfa”, “poesnavy”, “ozza”, “chenya”, “rhetoria”, “urdeed”, “utoichi”, “stikh”, “sapot”, “gravda”, “ababap”, “obarto”, “eleuet”, “lyarezy”, “myni”, “bromomer” and even “todebyst”.

Where to apply this... there are options. For example, write a generator of pretty, playful brand names. For yoghurts, say: “memoliso” or “utororerto”. Or a generator of futurist poems, "Burliuk-php": "opeldiy miaton, linoaz okmiaya... deesopen odesson."
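The two-letter approach might look roughly like this in Python. The text does not say whether the pairs overlapped, so this sketch assumes non-overlapping pairs and drops a trailing single letter; both choices, like the function names, are assumptions.

```python
import random
from collections import Counter


def bigram_counts(words: list[str]) -> Counter:
    """Count non-overlapping two-letter chunks across all words."""
    counts = Counter()
    for word in words:
        # "молоко" -> "мо", "ло", "ко"; the last letter of an odd-length
        # word is dropped.
        for i in range(0, len(word) - 1, 2):
            counts[word[i:i + 2]] += 1
    return counts


def make_word(counts: Counter, n_pairs: int, rng: random.Random) -> str:
    """Glue together `n_pairs` chunks drawn in proportion to their counts."""
    pairs = list(counts)
    weights = list(counts.values())
    return "".join(rng.choices(pairs, weights=weights, k=n_pairs))


counts = bigram_counts(["молоко", "мода", "дорога", "город"])
print([make_word(counts, 3, random.Random()) for _ in range(5)])
```

Since each chunk already comes from a real word, consonant and vowel tend to alternate inside it, which is why the output sounds more word-like than independent letters do.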

And there is one more option. Need to try...

Some statistics on the use of Russian words:

  • The average word length is 5.28 characters.
  • The average sentence length is 10.38 words.
  • The 1000 most frequent lemmas cover 64.0708% of the text.
  • The 2000 most frequent lemmas cover 71.9521% of the text.
  • The 3000 most frequent lemmas cover 76.5104% of the text.
  • The 5000 most frequent lemmas cover 82.0604% of the text.
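Coverage figures like these can be computed as follows. This is a Python sketch that assumes the text has already been tokenized and lemmatized; lemmatization itself needs a separate tool and is not shown. The function name `coverage` and the toy token list are illustrative.

```python
from collections import Counter


def coverage(tokens: list[str], top_n: int) -> float:
    """Percentage of all tokens accounted for by the `top_n` most frequent."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(n for _, n in counts.most_common(top_n))
    return 100 * covered / len(tokens)


tokens = ["и", "и", "и", "в", "в", "не"]  # toy data, not a real corpus
for n in (1, 2, 3):
    print(n, round(coverage(tokens, n), 2))
```

On real text the curve flattens quickly, which matches the figures above: each additional thousand lemmas adds less coverage than the previous one.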

After the note I received this letter:


Hello Dmitry!

After analyzing the article “Language will bring you to Kyiv” and the part where you describe your program, an idea arose.
The script you wrote seems to me to be meant not so much for “Field of Miracles” as for something else.
The first and most sensible use of its results is determining the order of letters when programming the buttons of mobile devices. Yes, yes: it is in mobile phones that all of this is needed.

I distributed it in waves ()

The following is the distribution by buttons:
1. All letters from the first wave go to 4 buttons in the first row
2. All letters from the second wave also go to the remaining 4 buttons in the same first row
3. All letters from the third wave go to the remaining two buttons
4. Waves 4, 5 and 6 go to the second row
5. Waves 7, 8 and 9 go to the third row, and the 9th wave goes entirely (despite the seemingly large number of letters) to the 9th button of the third row, so that the 10th button is left for punctuation marks (period, comma, etc.).

I think everything is clear as it is, without detailed explanations. But still, could you process with your script (including punctuation marks) the following texts:

And then post the statistics? It seemed to me that these texts reflect our modern speech as closely as possible, and after all, we both speak and write SMS.

Thank you very much in advance.

So, there are two ways to analyze the frequency of repetition of letters. Method 1. Take a text, find unique (non-repeating) word forms in it and analyze them. The method is good for building statistics based on words in the Russian language, and not on texts. Method 2. Do not look for unique words in the text, but go straight to counting the frequency of repetition of letters. We get the frequency of letters in Russian text, and not in Russian words. To create keyboards and other things, you need to use exactly this method: texts are typed on the keyboard.
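The difference between the two methods is easy to see in code. The following Python sketch computes both on the same text; the function names and the tokenizing regular expression are assumptions.

```python
import re
from collections import Counter


def letter_shares(words) -> dict[str, float]:
    """Percentage share of each letter across the given words."""
    counts = Counter("".join(words))
    total = sum(counts.values())
    return {ch: 100 * n / total for ch, n in counts.most_common()}


def compare_methods(text: str) -> tuple[dict[str, float], dict[str, float]]:
    words = re.findall(r"[^\W\d_]+", text.lower())
    by_unique_forms = letter_shares(set(words))  # Method 1: each form once
    by_running_text = letter_shares(words)       # Method 2: every occurrence
    return by_unique_forms, by_running_text


method1, method2 = compare_methods("и кот и пёс и ёж")
print(round(method1["и"], 2), round(method2["и"], 2))
```

A short, very frequent word such as “и” barely affects Method 1 but dominates Method 2, which is why the second method is the one to use for keyboards.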

Keyboards should take into account not only the frequency of letters but also the most frequent words (word forms). It is not hard to guess which words are used most: first, function words, whose role is to serve always and everywhere, and pronouns, whose role is no less important: to stand in for any thing or person in speech (this, he, she). And then the main verbs (to be, to say). Analyzing the texts listed above, I obtained the following “popular” words: “and, not, in, that, he, I, on, with, she, how, but, his, this, to, a, all, her, was, so, then, said, for, you, oh, at, him, me, only, for, me, yes, you, from, was, when, from, for, still, now, they, said, already, him, no, was, her, to be, well, nor, if, very, nothing, here, herself, so that, to herself, this, maybe, that, before, we, them, whether, were, is, than, or, her” and so on.

Returning to keyboards, it is obvious that the letters of frequent combinations such as “не” (not), “что” (what), “он” (he), “на” (on) and others should be as close to each other as possible on the keyboard, or, if not close, then arranged in some optimal way. Research is needed into exactly how the fingers move across the keyboard, to find the most “comfortable” positions and place the most commonly used letters in them, without forgetting, however, about letter combinations.

The problem, as always, is one: even if it is possible to create a Unique Keyboard, what will happen to the millions of people who are already accustomed to QWERTY/ЙЦУКЕН?

As for mobile devices... Probably it makes sense. At the very least, the letters "o", "a", "e" and "i" must be exactly on the same key. Punctuation marks in order of frequency of use: , . - ? ! " ; : ) (

Look at the "F" and "J" keys on your keyboard and you will see little clues. This is our guide to the world of touch typing.

Having started to study touch typing, I was faced with the feeling that something was wrong in our layout. The point was a discrepancy between the frequency of occurrence of letters in the Russian language and their location on the keyboard.

What do you think is the most common letter in the Russian language? And if you were at the “Field of Miracles”, what letter would you name first? The most common letter is "O", and the least common is "F". There is not a single native Russian word beginning with the letter "F".

Here is a table of the probability distribution of letters in Russian texts (only its four “Probability” column headings survive; the values are not recoverable here).

The letter "F" occurs 45 times less often than "O", yet occupies a place just as convenient as that of "O". Who adopted this standard? You will find the answer in the article The Tragedy of the Comma: "... just think, the comma occurs much more often than the period, and yet the comma is placed in the upper register. This is not found in any language in the world except Russian...".

Having looked at the table, you can see the following: to type blind, you need not learn the location of all the letters, only of about 20 of them, which account for more than 90% of cases. I don't believe that a person who types frequently cannot remember where the keys are and work without looking at them. It's all a matter of habit. Please note: in any office where paperwork is filled out, operators look at the keyboard, although they type very quickly.

But I realized that probability was taken into account when the layout was drawn up. Only it was designed for those... who type while looking at the keyboard!!!
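The “20 letters, 90%” claim can be checked with a cumulative sum. The probability table it refers to is not available here, so this Python sketch uses the word-form percentages listed earlier in this article as a stand-in; that substitution, and the function name, are assumptions.

```python
from itertools import accumulate

# Percentages from the word-form table earlier in the article (a stand-in
# for the missing probability table).
PERCENTS = [
    9.28, 8.66, 8.10, 7.45, 6.35, 6.30, 5.53, 5.45, 4.32, 4.19, 3.47,
    3.35, 3.29, 2.90, 2.56, 2.22, 2.11, 1.90, 1.81, 1.51, 1.41, 1.31,
    1.27, 1.03, 0.92, 0.78, 0.77, 0.52, 0.49, 0.40, 0.17, 0.04,
]


def letters_needed(percents: list[float], target: float):
    """Smallest number of top letters whose shares add up to `target` percent."""
    ordered = sorted(percents, reverse=True)
    for k, covered in enumerate(accumulate(ordered), start=1):
        if covered >= target:
            return k
    return None  # the target is never reached


print(letters_needed(PERCENTS, 90.0))  # 20
```

With these figures the 19 most frequent letters add up to 89.24% and the 20th brings the total to 90.75%, so the claim holds for this table as well.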

It is easy to notice that all the most frequently encountered letters are located in the line of sight, and the less frequently encountered ones are placed on the periphery.

With the English layout the situation is a little worse.

Programs for touch typing. There are many of them; you can look at reviews at http://www.urikor.net. I chose Solo and Stamina and decided to start with Solo. It turned out to be paid, but a demo was available. To complete just one (!) exercise, typing 2 characters, I had to read more than 10 pages: a kind of “simulator” for speed reading.

And they won’t let you through further until you read everything and fulfill the standard. I was almost about to delete the program when I received a letter from the Solo website, where they were interested in my progress. The letter was long, and I thought: “Well done, they learned how to type quickly and write long letters to everyone.”

But after studying the letter carefully, I realized that it had been written by an auto-responder, although it was signed by a person. Now I understand why the questionnaire asked me so much about my interests and hair color. I deleted Solo.

I myself worked with the Stamina program. It is made with soul! You don’t even have to work with the program; download it just for its help file. It is the funniest help file there is!


How I remembered the keys.
You will learn “ФЫВА” (fyva) and “ОЛДЖ” (oldzh) quickly: one letter for each finger. That is already 8! I learned them not in Stamina but in a program from the site http://www.urikor.net. After that I memorized the movements themselves. For example, many people have difficulty with the letter “и” (i) when learning touch typing. With my fingers resting on “ФЫВА” and “ОЛДЖ”, to press the “и” key I need to make a full turn of my right index finger.

With this rotation I can hit only the “и” key. For each finger I memorized movements like these: “p” - left index finger to the left, “k” - up, “e” - up and to the right, etc.

Problems: since the layout is not optimized for touch typing, it turns out that similar letters are mirror images of each other: the keys “a” and “o”, “k” and “g”. And what is more interesting, the exercises for the two index fingers are given simultaneously (!), i.e. they teach “a” and “o”, “e” and “n”, “p” and “r” at the same time.

In my opinion this is wrong: confusion arises in the brain. At least I get confused sometimes. When you learn touch typing, think about the movements from the start, because relearning later is difficult. By the way, some women have a problem working on the keyboard: because of their long nails, they press the wrong keys.

And when I had learned everything and decided that I would type blind, the next stage arrived: “laziness”. Every day I had to type a lot, and since typing with peeking was faster, I peeked all the time. After a couple of months I overcame myself and covered all the keys with stickers from video cassettes.

Attention: If you don't tape your keys, habit will defeat you. When I work on keyboards where the letters are visible, I'm tempted to peek. Now there is no turning back and this is the first article written completely blindly.

Why do I need it? So far I feel deep satisfaction. The speed is still a little lower than with peeking and there are still errors, but even while typing this article I noticed the speed increasing; sometimes I lose myself in thought, then look up, and the text is already typed. It is as if consciousness were removing its blocks.

It’s interesting to watch yourself learn, because you won’t have such an experience again. Now I plan to learn to play the piano. I even think I know how to play (!), I just need to remember.

P.S.
A year has passed. I now type only by touch, and at high speed. If you work at a computer, be sure to learn to touch-type. It's easier than you think.
Here is a short note from Inna Igolkina about how she learned to touch-type.
