Mathematical Methods in Comparative Linguistics

The Formation of Structural Linguistics at the Turn of the 19th – 20th Centuries. Statistical methods in language learning. Application of mathematical methods in linguistics in the second half of the twentieth century. Learning the language by methods of formal logic. Features of machine translation.

INTRODUCTION

Chapter 1. The history of the application of mathematical methods in linguistics

1.1. The Formation of Structural Linguistics at the Turn of the 19th - 20th Centuries

1.2. Application of mathematical methods in linguistics in the second half of the twentieth century

Chapter 2. Selected examples of the use of mathematics in linguistics

2.1. Machine translation

2.2. Statistical methods in language learning

2.3. Learning a language by methods of formal logic

2.4. Prospects for the application of mathematical methods in linguistics

Conclusion

Literature

Appendix 1. Ronald Schleifer. Ferdinand de Saussure

Appendix 2. Ferdinand de Saussure (translation)

INTRODUCTION

The 20th century saw a continuing trend towards the interaction and interpenetration of various fields of knowledge. The boundaries between individual sciences are gradually being blurred; more and more branches of intellectual activity lie "at the junction" of the humanities, technology and the natural sciences.

Another obvious feature of modernity is the desire to study structures and their constituent elements. Therefore, an increasing place, both in scientific theory and in practice, is given to mathematics. Coming into contact, on the one hand, with logic and philosophy and, on the other hand, with statistics (and, consequently, with the social sciences), mathematics penetrates ever deeper into those areas that were long considered purely "humanities" fields, expanding their heuristic potential (the answer to the question "how much" often helps answer the questions "what" and "how"). Linguistics was no exception.

The purpose of my course work is to briefly highlight the connection between mathematics and linguistics. Since the 1950s, mathematics has been used in linguistics to create a theoretical apparatus for describing the structure of languages (both natural and artificial). At the same time, it did not immediately find such a practical application. Initially, mathematical methods in linguistics were used to clarify the basic concepts of the discipline; with the development of computer technology, however, this theoretical premise began to find application in practice. The solution of such tasks as machine translation, machine information retrieval and automatic text processing required a fundamentally new approach to language. A question arose before linguists: how can linguistic patterns be represented in a form in which they can be applied directly in technology? The term "mathematical linguistics", popular in our time, refers to any linguistic research that uses exact methods (and the concept of exact methods in science is always closely related to mathematics). Some scientists of past years believed that the expression itself cannot be elevated to the rank of a term, since it does not denote any special "linguistics", but only a new direction focused on improving and increasing the accuracy and reliability of methods of language research. Linguistics uses both quantitative (algebraic) and non-quantitative methods, which brings it closer to mathematical logic and, consequently, to philosophy and even to psychology. Schlegel already noted the interaction of language and consciousness, and Ferdinand de Saussure, a prominent linguist of the early twentieth century (I will speak later about his influence on the development of mathematical methods in linguistics), connected the structure of language with its belonging to a people. The modern researcher L. Perlovsky goes further, identifying the quantitative characteristics of a language (for example, the number of genders and cases) with peculiarities of the national mentality (more on this in Section 2.2, "Statistical methods in language learning").

The interaction of mathematics and linguistics is a multifaceted topic, and in my work I will dwell not on all of its aspects but, first of all, on the applied ones.

Chapter 1. The history of the application of mathematical methods in linguistics

1.1 The Formation of Structural Linguistics at the Turn of the 19th - 20th Centuries

The mathematical description of language is based on the idea of ​​language as a mechanism, which goes back to the famous Swiss linguist of the early twentieth century, Ferdinand de Saussure.

The initial link of his conception is the theory of language as a system consisting of three parts (language as a system itself - langue, speech - parole, and speech activity - langage), in which each word (each member of the system) is considered not in itself but in connection with the other members. As another prominent linguist, the Dane Louis Hjelmslev, later noted, Saussure "was the first to demand a structural approach to language, that is, a scientific description of the language by recording the relationships between units."

Understanding language as a hierarchical structure, Saussure was the first to pose the problem of the value and significance of language units. Separate phenomena and events (say, the history of the origin of individual Indo-European words) should be studied not by themselves, but in a system in which they are correlated with similar components.

Saussure considered the word, the "sign" in which sound and meaning are combined, to be the structural unit of language. Neither of these elements exists without the other: hence a native speaker understands the various shades of meaning of a polysemantic word as a separate element in the structural whole, in the language.

Thus, in the theory of F. de Saussure one can see the interaction of linguistics, on the one hand, with sociology and social psychology (it should be noted that at the same time Husserl's phenomenology, Freud's psychoanalysis and Einstein's theory of relativity were developing, and experiments on form and content were taking place in literature, music and the fine arts), and, on the other hand, with mathematics (the notion of systematicity corresponds to the algebraic concept of language). Such a conception changed the notion of linguistic interpretation as such: phenomena began to be interpreted not in relation to the causes of their occurrence, but in relation to the present and the future. Interpretation ceased to be independent of a person's intentions (despite the fact that intentions may be impersonal, "unconscious" in the Freudian sense of the word).

The functioning of the linguistic mechanism is manifested through the speech activity of native speakers. The result of speech is the so-called "correct texts" - sequences of speech units that obey certain patterns, many of which admit a mathematical description. The theory of ways of describing syntactic structure deals with methods for the mathematical description of correct texts (primarily sentences). In such a structure, linguistic units are defined not by their inherent qualities but by means of system ("structural") relations.

In the West, Saussure's ideas were developed by the younger contemporaries of the great Swiss linguist: in Denmark by L. Hjelmslev, already mentioned above, who originated the algebraic theory of language in his work "Fundamentals of Linguistic Theory"; in the USA by E. Sapir, L. Bloomfield and Z. Harris; in Czechoslovakia by the Russian émigré scholar N. Trubetskoy.

Statistical regularities in the study of language came to be dealt with by none other than the founder of genetics, Gregor Mendel. It was only in 1968 that philologists discovered that in the last years of his life he had been fascinated by the study of linguistic phenomena using the methods of mathematics. Mendel brought this method into linguistics from biology; even in the 1990s only the most daring linguists and biologists claimed the feasibility of such an analysis. In the archives of the monastery of St. Thomas in Brno, whose abbot Mendel was, sheets were found with columns of surnames ending in "mann", "bauer" and "mayer", and with some fractions and calculations. In an effort to discover the formal laws of the origin of family names, Mendel made complex calculations, in which he took into account the number of vowels and consonants in the German language, the total number of words considered, the number of surnames, and so on.

In our country, structural linguistics began to develop at about the same time as in the West - at the turn of the 19th-20th centuries. Simultaneously with F. de Saussure, the concept of language as a system was developed in the works of F.F. Fortunatov and I.A. Baudouin de Courtenay (the latter a professor at Kazan University). Baudouin de Courtenay corresponded with de Saussure for a long time, and the Geneva and Kazan schools of linguistics collaborated with each other. If Saussure can be called the ideologist of "exact" methods in linguistics, then Baudouin de Courtenay laid the practical foundations for their application. He was the first to separate linguistics (as an exact science using statistical methods and functional dependence) from philology (a community of humanities disciplines that study spiritual culture through language and speech). The scientist himself believed that "linguistics can be useful in the near future only when freed from the mandatory union with philology and literary history". Phonology became the "testing ground" for the introduction of mathematical methods into linguistics: sounds as the "atoms" of the language system, having a limited number of easily measurable properties, were the most convenient material for formal, rigorous methods of description. Phonology denies the existence of meaning in sound, so the "human" factor was eliminated from these studies. In this sense, phonemes are like physical or biological objects.

Phonemes, as the smallest language elements accessible to perception, represent a separate sphere, a separate "phenomenological reality". For example, in English the sound "t" can be pronounced in different ways, but in all cases a person who speaks English will perceive it as "t". The main thing is that the phoneme performs its main, meaning-distinguishing, function. Moreover, the differences between languages are such that varieties of one sound in one language can correspond to different phonemes in another; for example, "l" and "r" in English are different phonemes, while in other languages they are varieties of the same phoneme (like the English "t", pronounced with or without aspiration). The vast vocabulary of any natural language is a set of combinations of a much smaller number of phonemes. In English, for example, only 40 phonemes are used to pronounce and write about a million words.

The sounds of a language are a systematically organized set of features. In the 1920s-1930s, following Saussure, R. Jakobson and N.S. Trubetskoy singled out the "distinctive features" of phonemes. These features are based on the structure of the speech organs - the tongue, the teeth, the vocal cords. For example, in English the difference between "t" and "d" is the presence or absence of "voice" (the tension of the vocal cords), and this voicing distinguishes one phoneme from the other. Thus, phonology can be considered an example of the general rule of language described by Saussure: "In language there are only differences". Even more important: a difference usually implies positive terms between which it holds, but in language there are only differences without positive terms. Whether we consider the signified or the signifier, the language contains neither concepts nor sounds that existed before the development of the language system.

Thus, in Saussurean linguistics the studied phenomenon is understood as a set of comparisons and oppositions of language. Language is both an expression of the meaning of words and a means of communication, and these two functions never coincide. We can notice the alternation of form and content: linguistic contrasts define the structural units of language, and these units interact to create a certain meaningful content. Since the elements of language are arbitrary, neither contrast nor combination alone can be the basis. This means that in a language distinctive features form a phonetic contrast at a different level of apprehension, phonemes are combined into morphemes, morphemes into words, words into sentences, and so on. In any case, an entire phoneme, word, sentence, etc. is more than just the sum of its parts.

Saussure proposed the idea of a new twentieth-century science, separate from linguistics, that would study the role of signs in society. Saussure called this science semiology (from the Greek "semeion", a sign). The "science" of semiotics, which developed in Eastern Europe in the 1920s-1930s and in Paris in the 1950s-1960s, extended the study of language and linguistic structures to literary works composed (or articulated) with the help of these structures. In addition, in the twilight of his career, in parallel with his course in general linguistics, Saussure engaged in a "semiotic" analysis of late Roman poetry, trying to discover deliberately composed anagrams of proper names. This method was in many ways the opposite of the rationalism of his linguistic analysis: it was an attempt to study in a system the problem of "probability" in language. Such research helps to focus on the "real side" of probability; the "key word" for which Saussure looks for an anagram is, as Jean Starobinski argues, "a tool for the poet, not the source of the life of the poem". The poem serves to re-use the sounds of the key word. According to Starobinski, in this analysis "Saussure does not delve into the search for hidden meanings". On the contrary, in his works a desire to avoid questions related to consciousness is noticeable: "since poetry is expressed not only in words, but also in what these words give rise to, it goes beyond the control of consciousness and depends only on the laws of language" (see Appendix 1).

Saussure's attempt to study proper names in late Roman poetry emphasizes one of the components of his linguistic analysis - the arbitrary nature of signs - as well as the formal essence of Saussure's linguistics, which excludes the possibility of analyzing meaning. Todorov concludes that today Saussure's works appear remarkably consistent in their reluctance to study symbolic phenomena that have a clearly defined meaning [Appendix 1]. In exploring anagrams, Saussure pays attention only to repetition, not to previous options. ... Studying the Nibelungenlied, he defines the symbols only in order to assign them to erroneous readings: since they are unintentional, the symbols do not exist. Finally, in his writings on general linguistics he makes the assumption of the existence of a semiology that would describe not only linguistic signs; but this assumption is limited by the fact that semiology can describe only random, arbitrary signs.

If this is really so, it is only because he could not imagine "intention" without an object; he could not completely bridge the gap between form and content - in his writings this turned into a question. Instead, he appealed to "linguistic legitimacy". Standing between, on the one hand, nineteenth-century concepts based on history and subjective conjecture, and the methods of accidental interpretation based on those concepts, and, on the other hand, structuralist concepts that erase the opposition between form and content (subject and object), meaning and origin in structuralism, psychoanalysis and even quantum mechanics, the writings of Ferdinand de Saussure on linguistics and semiotics mark a turning point in the study of meaning in language and culture.

Russian scientists were also represented at the First International Congress of Linguists in The Hague in 1928. S. Kartsevsky, R. Jakobson and N. Trubetskoy presented a paper examining the hierarchical structure of language - in the spirit of the most modern ideas of the beginning of the last century. Jakobson in his writings developed Saussure's idea that the basic elements of a language should be studied, first of all, in connection with their functions, and not with the causes of their occurrence.

Unfortunately, after Stalin's rise to power in the 1920s, Russian linguistics, like many other sciences, was thrown back. Many talented scientists were forced to emigrate, were expelled from the country or perished in the camps. Only from the mid-1950s did a certain pluralism of theories become possible again - more on this in Section 1.2.

1.2 Application of mathematical methods in linguistics in the second half of the twentieth century

By the middle of the twentieth century, four world linguistic schools had formed, each of which turned out to be the progenitor of a certain "exact" method. The Leningrad Phonological School (its founder was Baudouin de Courtenay's student L.V. Shcherba) used a psycholinguistic experiment based on the analysis of the speech of native speakers as the main criterion for generalizing a sound in the form of a phoneme.

The scientists of the Prague Linguistic Circle, in particular one of its leading figures, N.S. Trubetskoy, who had emigrated from Russia, developed the theory of oppositions: the semantic structure of the language was described by them as a set of oppositionally constructed semantic units, semes. This theory was applied in the study not only of language but also of artistic culture.

The ideologists of American descriptivism were the linguists L. Bloomfield and E. Sapir. For the descriptivists, language was a set of speech utterances, which were the main object of their study. Their focus was on the rules for the scientific description (hence the name) of texts: the study of the organization, arrangement and classification of their elements. The formalization of analytical procedures in the fields of phonology and morphology (the development of principles for studying language at different levels, distributional analysis, the method of immediate constituents, etc.) led to the formulation of general questions of linguistic modeling. Inattention to the content plane of language, as well as to its paradigmatic side, did not allow the descriptivists to interpret language as a system fully enough.

In the 1950s and 1960s, the theory of formal grammars developed, mainly thanks to the work of the American philosopher and linguist N. Chomsky. He is rightfully considered one of the most famous modern scientists and public figures; many articles, monographs and even a full-length documentary film have been devoted to him. After the fundamentally new way of describing syntactic structure invented by Chomsky - generative grammar - the corresponding trend in linguistics was called generativism.

Chomsky, a descendant of immigrants from Russia, studied linguistics, mathematics and philosophy at the University of Pennsylvania from 1945, being strongly influenced by his teacher Zellig Harris; like Harris, Chomsky considered and still considers his political views close to anarchism (he is still known as a critic of the existing US political system and as one of the spiritual leaders of anti-globalism).

Chomsky's first major scientific work, the master's thesis "Morphophonemics of Modern Hebrew" (1951), remained unpublished for a long time. Chomsky received his doctorate from the University of Pennsylvania in 1955, but much of the research underlying his dissertation (published in full only in 1975 under the title "The Logical Structure of Linguistic Theory") and his first monograph, "Syntactic Structures" (1957, Russian translation 1962), was carried out at Harvard University in 1951-1955. In the same year, 1955, the scientist moved to the Massachusetts Institute of Technology, where he became a professor in 1962.

Chomsky's theory has gone through several stages in its development.

In his first monograph, "Syntactic Structures", the scientist presented language as a mechanism for generating an infinite set of sentences using a finite set of grammatical means. To describe linguistic properties he proposed the concepts of deep (hidden from direct perception and generated by a system of recursive, i.e. repeatedly applicable, rules) and surface (directly perceived) grammatical structures, as well as transformations that describe the transition from deep structures to surface ones. Several surface structures can correspond to one deep structure (for example, the passive construction "The decree is signed by the President" is derived from the same deep structure as the active construction "The President signs the decree"), and vice versa (thus the ambiguity of the sentence "mother loves daughter" - which is genuinely ambiguous in the Russian original, "мать любит дочь" - is described as the result of a coincidence of surface structures that go back to two different deep ones, in one of which the mother is the one who loves the daughter, and in the other the one who is loved by the daughter).
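To make the idea of "a finite set of means generating an infinite set of sentences" concrete, here is a minimal illustrative sketch in Python (the toy rewrite rules are my own and do not reproduce Chomsky's formalism): a handful of phrase-structure rules that, thanks to a single recursive rule, can generate an unbounded number of distinct sentences.

    import random

    # A minimal illustrative sketch (the toy rules are my own, not Chomsky's
    # formalism): a finite set of rewrite rules that, thanks to one recursive
    # rule, can generate an unbounded number of distinct sentences.
    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["the", "president"], ["the", "decree"],
               ["the", "president", "who", "VP"]],     # recursion
        "VP": [["signs", "NP"], ["sees", "NP"]],
    }

    def generate(symbol="S"):
        """Expand a symbol by randomly chosen rules until only words remain."""
        if symbol not in GRAMMAR:                      # terminal word
            return [symbol]
        words = []
        for part in random.choice(GRAMMAR[symbol]):
            words.extend(generate(part))
        return words

    for _ in range(3):
        print(" ".join(generate()))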

Chomsky's so-called standard theory is the "Aspects" model set forth in his book "Aspects of the Theory of Syntax". In this model, rules of semantic interpretation attributing meaning to deep structures were introduced into the formal theory for the first time. In "Aspects", linguistic competence is opposed to the use of language (performance), the so-called Katz-Postal hypothesis about the preservation of meaning under transformation is adopted (in connection with which the concept of optional transformation is excluded), and an apparatus of syntactic features describing lexical compatibility is introduced.

In the 1970s and 1980s, Chomsky worked on the theory of government and binding (GB theory, from the words "government" and "binding"), which is more general than the previous one. In it the scientist abandoned the specific rules describing the syntactic structures of particular languages. All transformations were replaced with one universal transformation, Move. Within the framework of GB theory there are also separate modules, each of which is responsible for its own part of the grammar.

More recently, in 1995, Chomsky put forward the Minimalist Program, in which human language is described like machine language. It is only a program, not a model or a theory. In it Chomsky identifies two main subsystems of the human language faculty, the lexicon and the computational system, as well as two interfaces, phonetic and logical.

Chomsky's formal grammars have become classic for describing not only natural but also artificial languages ​​- in particular, programming languages. The development of structural linguistics in the second half of the 20th century can rightfully be considered a "Chomskian revolution".

The Moscow Phonological School, whose representatives were A.A. Reformatsky, V.N. Sidorov, P.S. Kuznetsov, A.M. Sukhotin and R.I. Avanesov, used a similar theory to study phonetics. Gradually, "exact" methods began to be applied not only to phonetics but also to syntax. Both linguists and mathematicians, both here and abroad, began to study the structure of language. In the 1950s and 60s, a new stage in the interaction between mathematics and linguistics began in the USSR, associated with the development of machine translation systems.

The impetus for the start of this work in our country was the first developments in the field of machine translation in the United States (although the first mechanized translation device, by P.P. Smirnov-Troyansky, had been invented in the USSR back in 1933; being primitive, it did not become widespread). In 1947, A. Booth and D. Britten devised a code for word-by-word translation using a computer; a year later R. Richens proposed a rule for splitting words into stems and endings in machine translation. Computers in those years were quite different from today's: they were very large and expensive machines that took up entire rooms and required a large staff of engineers, operators and programmers to maintain them. These computers were used mainly to carry out mathematical calculations for the needs of military institutions; what was new in mathematics, physics and technology served, first of all, military affairs. In the early stages, the development of MT was actively supported by the military; at the same time (in the conditions of the Cold War), the Russian-English direction developed in the USA and the English-Russian direction in the USSR.

In January 1954, the "Georgetown experiment" took place: the first public demonstration of translation from Russian into English on the IBM-701 machine, held at IBM in New York and prepared jointly with Georgetown University. An abstract reporting the successful experiment, prepared by D.Yu. Panov, appeared in the abstract journal (RZh) "Mathematics", 1954, No. 10: "Translation from one language to another using a machine: a report on the first successful test."

D.Yu. Panov (at that time director of the Institute of Scientific Information, INI, later VINITI) brought I.K. Belskaya, who later headed the machine translation group at the Institute of Precision Mechanics and Computer Engineering of the USSR Academy of Sciences, into the work on machine translation. The first experiment in translating from English into Russian with the help of the BESM machine dates back to the end of 1955. The programs for BESM were compiled by N.P. Trifonov and L.N. Korolev, whose Ph.D. thesis was devoted to methods for constructing dictionaries for machine translation.

In parallel, work on machine translation was carried out at the Department of Applied Mathematics of the Mathematical Institute of the USSR Academy of Sciences (now the M.V. Keldysh Institute of Applied Mathematics of the Russian Academy of Sciences) at the initiative of the mathematician A.A. Lyapunov. He involved O.S. Kulagina and her students T.D. Wentzel and N.N. Ricco in the work. The ideas of Lyapunov and Kulagina about the possibility of using technology to translate from one language into another were published in the journal Priroda (Nature), 1955, No. 8. From the end of 1955 they were joined by T.N. Moloshnaya, who then began independent work on the English-Russian translation algorithm.

R. Frumkina, who at that time was working on an algorithm for translation from Spanish, recalls that at that stage of the work it was difficult to take any consistent steps; much more often one had to rely on heuristic experience, one's own or that of colleagues.

At the same time, the first generation of machine translation systems was very imperfect. All of them were based on sequential "word by word", "phrase by phrase" translation algorithms; semantic connections between words and sentences were not taken into account in any way. Consider, for example, the sentences: "John was looking for his toy box. Finally he found it. The box was in the pen. John was very happy." Here "pen" is not a writing instrument but a playpen (play-pen). Knowledge of synonyms, antonyms and figurative meanings is difficult to enter into a computer. A promising direction turned out to be the development of machine translation systems designed to be used by a human translator.

Over time, direct translation systems were replaced by T-systems (from the English word "transfer" - transformation), in which translation was carried out at the level of syntactic structures. The algorithms of T-systems used a mechanism that made it possible to build a syntactic structure according to the grammar rules of the language of the input sentence (similar to how a foreign language is taught in high school), and then synthesize the output sentence by transforming the syntactic structure and substituting the necessary words from the dictionary.

Lyapunov spoke of translation by extracting the meaning of the translated text and presenting it in another language. The approach to building machine translation systems based on obtaining a semantic representation of the input sentence through semantic analysis and then synthesizing the output sentence from the obtained semantic representation is still considered the most advanced. Such systems are called I-systems (from the word "interlingua"). At the same time, the task of creating them, posed back in the late 1950s and early 1960s, has not been fully solved so far, despite the efforts of the International Federation for Information Processing (IFIP), the world community of scientists in the field of information processing.

Scientists thought about how to formalize and build algorithms for working with texts, what dictionaries should be entered into the machine, what linguistic patterns should be used in machine translation. Traditional linguistics did not have such ideas - not only in terms of semantics, but also in terms of syntax. At that time, there were no lists of syntactic constructions for any language, the conditions for their compatibility and interchangeability were not studied, the rules for constructing large units of syntactic structure from smaller constituent elements were not developed.

The need to create the theoretical foundations of machine translation led to the formation and development of mathematical linguistics. The leading role in this in the USSR was played by the mathematicians A.A. Lyapunov, O.S. Kulagina and V.A. Uspensky and the linguists V.Yu. Rosenzweig, P.S. Kuznetsov, R.M. Frumkina, A.A. Reformatsky, I.A. Melchuk and V.V. Ivanov. Kulagina's dissertation was devoted to the formal theory of grammars (simultaneously with N. Chomsky in the USA); Kuznetsov put forward the task of the axiomatization of linguistics, which goes back to the works of F.F. Fortunatov.

On May 6, 1960, the Decree of the Presidium of the USSR Academy of Sciences "On the development of structural and mathematical methods for the study of language" was adopted, and corresponding divisions were created at the Institute of Linguistics and the Institute of the Russian Language. Since 1960, the country's leading humanities universities - the Faculty of Philology of Moscow State University, Leningrad and Novosibirsk Universities, and the Moscow State Institute of Foreign Languages - have been training personnel in the field of automatic text processing.

At the same time, works on machine translation of this period, called "classical", are of more theoretical than practical interest. Cost-effective machine translation systems began to be created only in the eighties of the last century. I will talk about this later in Section 2.1, Machine Translation.

The 1960s - 70s include deep theoretical developments using the methods of set theory and mathematical logic, such as field theory and fuzzy set theory.

The author of field theory in linguistics was the Soviet poet, translator and linguist V.G. Admoni. He initially developed his theory on the basis of the German language. For Admoni, the concept of "field" denotes an arbitrary non-empty set of linguistic elements (for example, "lexical field", "semantic field").

The structure of a field is heterogeneous: it consists of a core, whose elements possess the complete set of features defining the set, and a periphery, whose elements may have some (not all) features of the given set as well as features of neighboring sets. An example illustrating this statement: in English, the field of compound words ("day-dream") is difficult to separate from the field of phrases ("tear gas").

The theory of fuzzy sets mentioned above is closely related to field theory. In the USSR it was developed by the linguists V.G. Admoni, I.P. Ivanova and G.G. Pocheptsov; its originator, however, was the American mathematician L. Zadeh, who in 1965 published the article "Fuzzy Sets". Giving a mathematical justification for the theory of fuzzy sets, Zadeh considered them on the basis of linguistic material.

In this theory, we are talking not so much about the belonging of an element to a given set (a ∈ A) as about the degree of this membership (μA(a)), since peripheral elements can belong to several fields to one degree or another. Zadeh (Lotfi Zadeh) was a native of Azerbaijan; until the age of 12 he communicated in four languages - Azerbaijani, Russian, English and Persian - and used three different alphabets: Cyrillic, Latin and Arabic. When the scientist is asked what fuzzy set theory and linguistics have in common, he does not deny the connection, but clarifies: "I am not sure that the study of these languages has had a big impact on my thinking. If this was the case, then only subconsciously." In his youth, Zadeh studied at a Presbyterian school in Tehran, and after World War II he emigrated to the United States. "The question is not whether I am an American, Russian, Azerbaijani or someone else," he said in one of his conversations. "I am shaped by all these cultures and peoples and feel quite comfortable among each of them." In these words there is something akin to what characterizes the theory of fuzzy sets: a departure from unambiguous definitions and sharp categories.
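As an illustration only (the words and membership degrees below are invented, not taken from Zadeh or Admoni), the degree-of-membership idea can be sketched in a few lines of Python: instead of a yes/no answer, each element of a lexical field receives a value in [0, 1].

    # An illustration only: the words and membership degrees are invented,
    # not taken from Zadeh or Admoni. Instead of a yes/no answer, each
    # element of a lexical field receives a degree of membership in [0, 1].
    motion_verbs = {
        "run":   1.0,   # core of the field
        "walk":  0.9,
        "crawl": 0.7,
        "drift": 0.4,   # periphery: only some features of the field
        "think": 0.0,   # outside the field
    }

    def membership(word, field):
        """Degree to which the word belongs to the field (0 = not at all)."""
        return field.get(word, 0.0)

    print(membership("drift", motion_verbs))   # 0.4: a peripheral member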

In our country, in the 1970s, the works of twentieth-century Western linguists were translated and studied. I.A. Melchuk translated the works of N. Chomsky into Russian. N.A. Slyusareva, in her book "The Theory of F. de Saussure in the Light of Modern Linguistics", connects the postulates of Saussure's teaching with the actual problems of the linguistics of the 1970s. There is a tendency towards the further mathematization of linguistics. The leading domestic universities train personnel in the specialty "Mathematical (theoretical, applied) linguistics". At the same time, in the West there is a sharp leap in the development of computer technology, requiring ever newer linguistic foundations.

In the 1980s, Yu.K. Lekomtsev, a professor at the Institute of Oriental Studies of the Academy of Sciences, analyzing the language of linguistics through the analysis of the schemes, tables and other kinds of notation used in linguistic descriptions, considered mathematical systems suitable for these purposes (mainly systems of matrix algebra).

Thus, throughout the twentieth century there was a convergence of the exact sciences and the humanities. The interaction of mathematics with linguistics increasingly found practical applications. More on this in the next chapter.

Chapter 2. Selected examples of the use of mathematics in linguistics

2.1 Machine translation

The idea of ​​translating from one language into another with the help of a universal mechanism arose several centuries before the first developments in this area began - back in 1649, Rene Descartes proposed the idea of ​​a language in which the equivalent ideas of different languages ​​would be expressed by one symbol. The first attempts to implement this idea in the 1930s-40s, the beginning of theoretical developments in the middle of the century, the improvement of translation systems with the help of technology in the 1970s-80s, the rapid development of translation technology in the last decade - these are the stages in the development of machine translation as an industry. It is from the works on machine translation that computer linguistics as a science has grown.

With the development of computer technology in the late 1970s and early 1980s, researchers set themselves more realistic and cost-effective goals: the machine became not a competitor (as had previously been assumed) but an assistant to the human translator. Machine translation ceased to serve exclusively military tasks (all Soviet and American inventions and research, focused primarily on Russian and English, contributed to the Cold War in one way or another). In 1978, natural-language words were transmitted over the ARPANET, and six years later the first microcomputer translation programs appeared in the United States.

The revival of machine translation in the 1970s-80s is attested by the following facts: the Commission of the European Communities (CEC) bought the English-French version of Systran, as well as the system for translation from Russian into English (the latter developed after the ALPAC report and subsequently used by the US Air Force and NASA); in addition, the CEC ordered the development of the French-English and Italian-English versions. This is how the foundations of the EUROTRA project were laid. At the same time, machine translation activities were expanding rapidly in Japan; in the USA, the Pan American Health Organization (PAHO) ordered the development of a Spanish-English direction (the SPANAM system); the US Air Force funded the development of a machine translation system at the Linguistic Research Center at the University of Texas at Austin; and the TAUM group in Canada made notable progress in developing its METEO system for translating weather reports. A number of projects begun in the 1970s and 1980s subsequently developed into full-fledged commercial systems.

During the period 1978-93, 20 million dollars were spent on research in the field of machine translation in the USA, 70 million in Europe, and 200 million in Japan.

One of the new developments is TM (translation memory) technology, which works on the principle of accumulation: during the translation process, the original segment (sentence) and its translation are saved, resulting in a linguistic database; if an identical or similar segment is found in a newly translated text, it is displayed together with its translation and an indication of the percentage match. The translator then makes a decision (to edit, reject or accept the translation), the result of which is stored by the system, so there is no need to translate the same sentence twice. A well-known commercial system based on TM technology is TRADOS (the company was founded in 1984).
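A minimal sketch of the accumulation principle described above, assuming nothing about the internals of real TM products such as TRADOS: stored segment-translation pairs are compared with a new segment using a standard string-similarity measure, and the best match is returned together with its percentage. The stored segments are invented for illustration.

    import difflib

    # A minimal sketch of the translation-memory principle: stored
    # segment-translation pairs are compared with a new segment; the best
    # match above a threshold is returned with its percentage. Data invented.
    memory = {
        "The system is ready for operation.": "Система готова к работе.",
        "Press the start button.": "Нажмите кнопку пуска.",
    }

    def lookup(segment, threshold=0.75):
        best_pair, best_ratio = None, 0.0
        for source, target in memory.items():
            ratio = difflib.SequenceMatcher(None, segment, source).ratio()
            if ratio > best_ratio:
                best_pair, best_ratio = (source, target), ratio
        if best_pair and best_ratio >= threshold:
            return best_pair, round(best_ratio * 100)   # percentage match
        return None, 0

    print(lookup("The system is now ready for operation."))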

Currently, several dozen companies are developing commercial machine translation systems, including: Systran, IBM, L&H (Lernout & Hauspie), Transparent Language, Cross Language, Trident Software, Atril, Trados, Caterpillar Co., LingoWare; Ata Software; Linguistica b.v. and others. Now you can use the services of automatic translators directly on the Web: alphaWorks; PROMT's Online Translator; LogoMedia.net; AltaVista's Babel Fish Translation Service; InfiniT.com; Translating the Internet.

Commercially effective translation systems appeared in the second half of the 80s in our country as well. The very concept of machine translation has expanded (it began to include “the creation of a number of automatic and automated systems and devices that automatically or semi-automatically perform the entire translation cycle or individual tasks in a dialogue with a person”), and government appropriations for the development of this industry have increased.

Russian, English, German, French and Japanese became the main languages of domestic translation systems. The All-Union Translation Center (VTsP) developed the ANRAP system for translating from English and German into Russian on the ES-1035 computer. It consisted of three dictionaries - input English and German and output Russian - under a single software shell. There were several interchangeable specialized dictionaries - for computer technology, programming, radio electronics, mechanical engineering, agriculture and metallurgy. The system could work in two modes, automatic and interactive, in which the screen displayed the source text and the translation phrase by phrase, which a person could edit. The speed of translation in ANRAP (from the beginning of typing to the end of printing) was approximately 100 pages per hour.

In 1989, a family of commercial translators of the SPRINT type was created, working with Russian, English, German and Japanese. Their main advantage was their compatibility with the IBM PC; thus, domestic machine translation systems reached the international level of quality. At the same time, a system of machine translation from French into Russian, FRAP, was being developed, which included four stages of text analysis: graphematic, morphological, syntactic and semantic. At the Herzen Leningrad State Pedagogical Institute (LGPI), work was under way on the four-language (English, French, Spanish, Russian) SILOD-MP system (the English-Russian and French-Russian dictionaries were used in industrial mode).

For specialized translation of texts on electrical engineering, the ETAP-2 system existed. The analysis of the input text in it was carried out at two levels - morphological and syntactic. The ETAP-2 dictionary contained about 4 thousand entries; the stage of text transformation - about 1000 rules (96 general, 342 private, the rest are dictionary). All this ensured a satisfactory quality of translation (say, the title of the patent "Optical phase grid arrangement and coupling device having such an arrangement" was translated as "An optical phase grid device and a connecting device with such a device" - despite the tautology, the meaning is preserved).

At the Minsk Pedagogical Institute of Foreign Languages, on the basis of the English-Russian dictionary of word forms and phrases, a system for machine translation of titles was invented, at the Institute of Oriental Studies of the Academy of Sciences - a system for translating from Japanese into Russian. The first automatic vocabulary and terminology service (SLOTHERM) for computing and programming, created at the Moscow Research Institute of Automation Systems, contained approximately 20,000 terms in an explanatory dictionary and special dictionaries for linguistic research.

Machine translation systems gradually began to be used not only for their intended purpose, but also as an important component of automatic learning systems (for teaching translation, checking spelling and grammatical knowledge).

The 1990s brought with them the rapid development of the PC market (from desktop to pocket computers) and of information technology, and the widespread use of the Internet (which is becoming ever more international and multilingual). All this made the further development of automated translation systems much in demand. Since the early 1990s, domestic developers have also been entering the PC systems market.

In July 1990, the first commercial machine translation system in Russia, called PROMT (PROgrammer's Machine Translation), was presented at the PC Forum in Moscow. In 1991, ZAO "Proekt MT" was established, and already in 1992 the PROMT company won the NASA tender for the supply of MT systems (PROMT was the only non-American company in that competition). Soon a whole family of systems was released under the new name STYLUS for translating from English, German, French, Italian and Spanish into Russian and from Russian into English, and in 1993, on the basis of STYLUS, the world's first machine translation system for Windows was created. Later, STYLUS 2.0 for Windows 3.X/95/NT was released, and in 1995-1996 the third generation of machine translation systems, the fully 32-bit STYLUS 3.0 for Windows 95/NT, was introduced; at the same time, completely new, world-first Russian-German and Russian-French machine translation systems were developed.

In 1997, an agreement was signed with the French company Softissimo on the creation of translation systems from French into German and English and vice versa, and in December of that year the world's first German-French translation system was released. In the same year, the PROMT company released a system implemented using the Giant technology, which supports several language directions in one shell, as well as a special translator for working on the Internet, WebTranSite.

In 1998, a whole constellation of programs was released under the new name PROMT 98. A year later, PROMT released two new products: a unique software package for working on the Internet - PROMT Internet, and a translator for corporate mail systems - PROMT Mail Translator. In November 1999, PROMT was recognized as the best machine translation system tested by the French magazine PC Expert, outperforming its competitors by 30 percent. Special server solutions have also been developed for corporate clients - the corporate translation server PROMT Translation Server (PTS) and the Internet solution PROMT Internet Translation Server (PITS). In 2000, PROMT updated its entire line of software products by releasing a new generation of MT systems: PROMT Translation Office 2000, PROMT Internet 2000 and Magic Gooddy 2000.

Online translation with the support of the PROMT system is used on a number of domestic and foreign sites: PROMT's Online Translator, InfiniT.com, Translate.Ru, Lycos, etc., as well as in institutions of various profiles for translating business documents, articles and letters (there are translation systems built directly into Outlook Express and other email clients).

Nowadays, new machine translation technologies are emerging based on the use of artificial intelligence systems and statistical methods. About the latter - in the next section.

2.2 Statistical methods in language learning

Considerable attention in modern linguistics is given to the study of linguistic phenomena using the methods of quantitative mathematics. Quantitative data often help to more deeply comprehend the phenomena under study, their place and role in the system of related phenomena. The answer to the question "how much" helps to answer the questions "what", "how", "why" - such is the heuristic potential of a quantitative characteristic.

Statistical methods play a significant role in the development of machine translation systems (see Section 2.1). In the statistical approach, the translation problem is considered in terms of a noisy channel. Imagine that we need to translate a sentence from English into Russian. The noisy-channel principle offers the following explanation of the relationship between an English and a Russian sentence: the English sentence is nothing but a Russian sentence distorted by some kind of noise. In order to recover the original Russian sentence, we need to know what people usually say in Russian and how Russian phrases are distorted into English. The translation is carried out by searching for the Russian sentence that maximizes the product of the unconditional probability of the Russian sentence and the probability of the English (original) sentence given that Russian sentence. According to Bayes' theorem, this Russian sentence is the most likely translation of the English one:

e* = argmax_e P(e) P(f | e),

where e is the translation sentence and f is the original sentence.

So we need a source model and a channel model, or a language model and a translation model. The language model must assign a probability score to any sentence in the target language (in our case, Russian), and the translation model must assign a probability to the original sentence given a sentence in the target language (see Table 1).

In general, a machine translation system operates in two modes:

1. System training: a training corpus of parallel texts is taken, and, using linear programming, such values of the translation correspondence tables are sought that maximize the probability of (for example) the Russian part of the corpus given the available English part, according to the chosen translation model. A model of the Russian language is built on the Russian part of the same corpus.

2. Operation: on the basis of the data obtained, for an unfamiliar English sentence a Russian sentence is sought that maximizes the product of the probabilities assigned by the language model and the translation model. The program used for such a search is called a decoder (a minimal illustration of this scoring is sketched below).
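The following toy sketch illustrates the scoring performed by a decoder under the noisy-channel formulation given above; the candidate sentences and probability values are invented for illustration and are not taken from any real system.

    # A toy illustration of decoder scoring: choose the target sentence e
    # maximizing P(e) * P(f | e). All values are invented for illustration.
    language_model = {                 # P(e)
        "белый дом": 0.40,
        "дом белый": 0.10,
    }
    translation_model = {              # P(f | e)
        ("white house", "белый дом"): 0.50,
        ("white house", "дом белый"): 0.45,
    }

    def decode(f, candidates):
        return max(candidates,
                   key=lambda e: language_model.get(e, 0.0)
                                 * translation_model.get((f, e), 0.0))

    print(decode("white house", ["белый дом", "дом белый"]))   # "белый дом"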

The simplest statistical translation model is the literal (word-for-word) translation model. In this model, it is assumed that to translate a sentence from one language into another it is enough to translate all the words (to create a "bag of words"), and the model will provide their placement in the correct order. To reduce P(a, f | e) to P(a | e, f), i.e. to the probability of a given alignment for a given pair of sentences, each probability P(a, f | e) is normalized by the sum of the probabilities of all alignments of the given pair of sentences:

P(a | e, f) = P(a, f | e) / Σ_a' P(a', f | e).

The implementation of the Viterbi algorithm used to train Model 1 is as follows (a minimal code sketch follows the numbered steps):

1. The entire table of translation correspondence probabilities is filled with the same values.

2. For all possible variants of pairwise connections of words, the probability P(a, f | e) is calculated:

3. The values ​​of P(a, f | e) are normalized to obtain the values ​​of P(a | e, f).

4. The frequency of each translation pair is calculated, weighted by the probability of each alignment option.

5. The resulting weighted frequencies are normalized and form a new table of translation correspondence probabilities.

6. The algorithm is repeated from step 2.
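A compact Python sketch of the iterative re-estimation described in steps 1-6 is given below. The toy corpus of two sentence pairs (a variant of the "white house" example) is purely illustrative; this is a sketch of the general scheme, not production code.

    from collections import defaultdict

    # A compact sketch of the re-estimation loop in steps 1-6 above:
    # EM-style training of a Model-1-style word-translation table.
    # The toy corpus (a variant of the "white house" example) is illustrative.
    corpus = [
        (["the", "house"], ["дом"]),
        (["the", "white", "house"], ["белый", "дом"]),
    ]

    # Step 1: the whole table starts out with equal values.
    t = defaultdict(lambda: 1.0)

    for iteration in range(10):                        # step 6: repeat
        count = defaultdict(float)                     # weighted pair counts
        total = defaultdict(float)
        for src, tgt in corpus:
            for w_t in tgt:
                # Steps 2-3: probabilities of the pairwise connections of
                # w_t, normalized over all its alignments in this pair.
                norm = sum(t[(w_s, w_t)] for w_s in src)
                for w_s in src:
                    weight = t[(w_s, w_t)] / norm
                    count[(w_s, w_t)] += weight        # step 4: weighted frequency
                    total[w_s] += weight
        # Step 5: normalized weighted frequencies become the new table.
        for (w_s, w_t) in count:
            t[(w_s, w_t)] = count[(w_s, w_t)] / total[w_s]

    for pair, prob in sorted(t.items(), key=lambda kv: -kv[1]):
        print(pair, round(prob, 3))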

Consider, as an example, the training of a similar model on a corpus of two pairs of sentences (Fig. 2):

(Fig. 2: a toy training corpus of two sentence pairs involving the phrase "white house".)

After a large number of iterations, we will get a table (Table 2), which shows that the translation is carried out with high accuracy.

Also, statistical methods are widely used in the study of vocabulary, morphology, syntax, and style. Scientists from Perm State University conducted a study based on the assertion that stereotypical phrases are an important "building material" of the text. These phrases consist of "nuclear" repeated words and dependent words-specifiers and have a pronounced stylistic coloring.

In the scientific style, the "nuclear" words include: research, study, task, problem, question, phenomenon, fact, observation, analysis, etc. In journalism, other words are "nuclear", ones that have an increased value specifically for the newspaper text: time, person, power, business, action, law, life, history, place, etc. (29 in total).

Of particular interest to linguists is also the professional differentiation of the national language, the specific use of vocabulary and grammar depending on the type of occupation. It is known that in professional speech drivers use the stress шОфер instead of the standard шофЁр, and physicians say кОклюш instead of коклЮш; many such examples can be given. The task of statistics is to track the variability of pronunciation and changes in the language norm.

Professional differences lead not only to grammatical but also to lexical differences. At Yakut State University named after M.K. Ammosov, 50 questionnaires with the most common reactions to certain stimulus words among physicians and builders were analyzed (Table 3).

Stimulus word | Physicians | Builders
human | patient (10), personality (5) | man (5)
good | help (8), help (7) | evil (16)
life | death (10) | lovely (5)
death | corpse (8) | life (6)
fire | heat (8), burn (6) | fire (7)
finger | hand (14), panaritium (5) | large (7), index (6)
eyes | vision (6), pupil, ophthalmologist (5 each) | brown (10), large (6)
head | mind (14), brains (5) | big (9), smart (8), smart (6)
lose | consciousness, life (4 each) | money (5), find (4)

It can be noted that physicians more often than builders give associations related to their professional activities, since the stimulus words given in the questionnaire have more to do with their profession than with the profession of a builder.

Statistical regularities in a language are used to create frequency dictionaries - dictionaries that give numerical characteristics of the frequency of the words (word forms, phrases) of a particular language: the language of a writer, of an individual work, etc. Usually, the number of occurrences of a word in a text of a certain length is used as its frequency characteristic.
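As a minimal illustration of how such a frequency dictionary of word forms can be built by machine (the sample text is invented), one can simply count every word form occurring in a corpus:

    import re
    from collections import Counter

    # A minimal sketch of building a frequency dictionary of word forms
    # from a text; the sample sentence is invented for illustration.
    def word_form_frequencies(text):
        """Count every word form (not lemma) occurring in the text."""
        word_forms = re.findall(r"[а-яёa-z]+", text.lower())
        return Counter(word_forms)

    freq = word_form_frequencies("Дом стоит у дороги. Дома стоят у дорог.")
    print(freq.most_common(3))
    # Note: "дом" and "дома" are counted as different word forms;
    # reducing them to one lexeme would be a separate lemmatization step.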

A model of speech perception is impossible without a dictionary as its essential component. In the perception of speech, the basic operational unit is the word. From this it follows, in particular, that each word of the perceived text must be identified with the corresponding unit of the listener's (or reader's) internal vocabulary. It is natural to assume that from the very beginning the search is limited to certain subdomains of the dictionary. According to most modern theories of speech perception, the actual phonetic analysis of the sounding text in the typical case provides only partial information about the possible phonological appearance of the word, and this kind of information corresponds not to one word but to a certain SET of words in the dictionary; therefore, two problems arise:

(a) select the appropriate set according to certain parameters;

(b) within the bounds of the outlined set (if it has been delimited adequately), to "weed out" all words except the one that best corresponds to the given word of the recognized text. One of the "weeding-out" strategies is to exclude low-frequency words. It follows that the vocabulary for speech perception is a frequency dictionary. The creation of a computer version of a frequency dictionary of the Russian language is the initial task of the project presented here. (A minimal sketch of this two-step selection is given below.)
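A minimal sketch of the two-step selection (a)-(b), with invented word forms and frequencies; the "partial phonetic information" is simplified here to a shared initial segment of the word form.

    # A minimal sketch of the two-step selection (a)-(b): narrow the internal
    # vocabulary to word forms compatible with the partial phonetic
    # information (simplified here to a shared initial segment), then
    # "weed out" the low-frequency candidates. Words and counts are invented.
    frequency_dictionary = {
        "почта": 310, "почему": 1200, "почва": 150, "почка": 45, "почин": 8,
    }

    def candidates(onset, min_frequency=50):
        # (a) select the subset of the dictionary compatible with the onset
        subset = {w: f for w, f in frequency_dictionary.items()
                  if w.startswith(onset)}
        # (b) discard low-frequency word forms within that subset
        return sorted((w for w, f in subset.items() if f >= min_frequency),
                      key=lambda w: -subset[w])

    print(candidates("поч"))   # ['почему', 'почта', 'почва']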

For the Russian language there are five frequency dictionaries (not counting specialized ones). Let us note only some general shortcomings of the existing dictionaries.

All the known frequency dictionaries of Russian are based on the processing of arrays of written (printed) texts. Partly for this reason, when the identity of a word is based largely on formal, graphic coincidence, semantics is not sufficiently taken into account. As a result, the frequency characteristics are shifted and distorted; for example, if the compiler of a frequency dictionary includes occurrences of the word "друг" ("friend") taken from the combination "друг друга" ("each other") in the general statistics for that word, this is hardly justified: taking semantics into account, we must admit that these are already different words, or rather, that the combination as a whole is an independent dictionary unit.

Also, in all the existing dictionaries words are listed only in their basic forms: nouns in the nominative singular, verbs in the infinitive, etc. Some of the dictionaries give information about the frequency of word forms, but usually not consistently and not exhaustively. The frequencies of different word forms of the same word obviously do not coincide. The developer of a speech perception model must take into account that in the real perceptual process it is a specific word form "immersed" in the text that is subject to recognition: on the basis of the analysis of the initial section of the exponent of the word form, a set of words with an identical beginning is formed, and the initial section of the word form is not necessarily identical to the initial section of its dictionary form. It is the word form that has a specific rhythmic structure, which is also an extremely important parameter for the perceptual selection of words. Finally, in the final representation of the recognized utterance the words are again represented by the corresponding word forms.

There are many works that demonstrate the importance of frequency in the process of speech perception. But we are not aware of works where the frequency of word forms would be used - on the contrary, all authors practically ignore the frequency of individual word forms, referring exclusively to lexemes. If the results obtained by them are not considered artifacts, one has to assume that the native speaker somehow has access to information about the ratio of the frequencies of word forms and dictionary forms, i.e., in fact, lexemes. Moreover, such a transition from a word form to a lexeme, of course, cannot be explained by natural knowledge of the corresponding paradigm, since frequency information must be used before the final identification of the word, otherwise it simply loses its meaning.

According to the primary statistical characteristics, it is possible to determine with a given relative error that part of the vocabulary, which includes words with a high frequency of occurrence, regardless of the type of text. It is also possible, by introducing stepwise ordering into the dictionary, to obtain a series of dictionaries covering the first 100, 1000, 5000, etc. of frequent words. The statistical characteristics of the dictionary are of interest in connection with the semantic analysis of vocabulary. The study of subject-ideological groups and semantic fields shows that lexical associations are supported by semantic links that are concentrated around lexemes with the most common meaning. The description of meanings within the lexico-semantic field can be carried out by identifying words with the most abstract lexemes in meaning. Apparently, "empty" (from the point of view of nominative potencies) dictionary units constitute a statistically homogeneous layer.

Vocabularies for individual genres are no less valuable. Studying the measure of their similarity and the nature of statistical distributions will provide interesting information about the qualitative stratification of vocabulary depending on the sphere of speech use.

Compilation of large frequency dictionaries requires the use of computer technology. Introducing partial mechanization and automation into the process of working on a dictionary is of interest as an experiment in the machine processing of dictionaries for different texts. Such a dictionary requires a more rigorous system for processing and accumulating vocabulary material. In miniature, it is an information retrieval system able to provide information about various aspects of the text and the vocabulary. Some basic queries to this system are planned from the very beginning: the total number of inventory words, the statistical characteristics of a single word and of entire dictionaries, the ordering of the frequent and rare zones of the dictionary, and so on. The machine card file makes it possible to build reverse dictionaries automatically for individual genres and sources. Much other useful statistical information about the language can be extracted from the accumulated array of information. The computer frequency dictionary creates an experimental basis for the transition to a more extensive automation of lexicographic work.
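
The basic queries listed above could be served by a very small program; the following Python sketch of such a "machine card file" is purely illustrative (the class and its methods are my own assumptions, not the system actually built):

    from collections import Counter

    class CardFile:
        """A miniature information-retrieval system over a frequency dictionary."""

        def __init__(self, tokens):
            self.freq = Counter(tokens)

        def inventory_size(self):
            # total number of running words and of distinct dictionary units
            return sum(self.freq.values()), len(self.freq)

        def word_statistics(self, word):
            # absolute and relative frequency of a single word
            total = sum(self.freq.values())
            return self.freq[word], self.freq[word] / total

        def frequency_zone(self, n):
            # the n most frequent words (the "frequent zone" of the dictionary)
            return [w for w, _ in self.freq.most_common(n)]

        def reverse_dictionary(self):
            # words ordered by their endings, as in a reverse (inverse) dictionary
            return sorted(self.freq, key=lambda w: w[::-1])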

The statistical data of frequency dictionaries can also be widely used in solving other linguistic problems: for example, in analyzing and determining the productive means of word formation in modern Russian; in resolving questions of improving graphics and spelling, which involve taking into account statistical information about the vocabulary (here it is important to consider the probabilistic characteristics of grapheme combinations and the types of letter combinations realized in words); and in practical transcription and transliteration. The statistical parameters of the dictionary will also be useful in solving problems of automating typing and of the recognition and automatic reading of printed text.

Modern explanatory dictionaries and grammars of the Russian language are built mainly on the basis of literary and artistic texts. There are frequency dictionaries of the language of A. S. Pushkin, A. S. Griboyedov, F. M. Dostoevsky, V. S. Vysotsky and many other authors. The Department of History and Theory of Literature of Smolensk State Pedagogical University has for a number of years been compiling frequency dictionaries of poetic and prose texts. For the present study the following were selected: frequency dictionaries of all of Pushkin's lyric poetry and of two more poets of the Golden Age ("Woe from Wit" by Griboyedov and all of Lermontov's poetry); of Pasternak and five other poets of the Silver Age (Balmont 1894-1903, "Poems about the Beautiful Lady" by Blok, "Stone" by Mandelstam, "Pillar of Fire" by Gumilyov, "Anno Domini MCMXXI" by Akhmatova and "My Sister, Life" by Pasternak); and of four more poets of the Iron Age ("Poems of Yuri Zhivago" and "When It Clears Up" by Pasternak, the entire corpus of lyrics by M. Petrovykh, "The Road Is Far Away", "Windscreen", "Farewell to the Snow" and "Horseshoes" by Mezhirov, "Antiworlds" by Voznesensky and "Snowballs" by Rylenkov).

It should be noted that these dictionaries are different in nature: some represent the vocabulary of one dramatic work, others - books of lyrics, or several books, or the entire corpus of the poet's poems. The results of the analysis presented in this paper should be taken with caution, they cannot be taken as an absolute. At the same time, with the help of special measures, the difference in the ontological nature of texts can be reduced to a certain extent.

In recent years, the opposition between colloquial and book speech has become more and more clearly realized. This issue is especially sharply discussed among methodologists who demand a turn in teaching towards the spoken language. At the same time, the specificity of colloquial speech still remains unexplained.

Dictionaries were processed by creating a user application in the environment of the EXCEL97 office program. The application includes four worksheets of the EXCEL book - "Title Sheet", "Dictionaries" sheet with initial data, "Proximities" and "Distances" with results, as well as a set of macros.

The initial information is entered on the "Dictionaries" sheet. Dictionaries of the studied texts are written into EXCEL cells, the last column S is formed from the results obtained and is equal to the number of words found in other dictionaries. The tables "Proximity" and "Distances" contain calculated measures of proximity M, correlation R and distance D.

The application macros are event-driven procedures written in Visual Basic for Applications (VBA). The procedures are based on VBA library objects and their methods. Thus, for operations with the application's worksheets the key object Worksheet and the corresponding sheet-activation method Activate are used. Setting the range of the analyzed source data on the Dictionaries sheet is performed by the Select method of the Range object, and the transfer of words as values to variables is performed via the Value property of the same Range object.
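
Since the exact formulas for the measures M, R and D are not reproduced here, the following Python sketch shows only one plausible reading of the correlation measure R - Spearman rank correlation computed over the words that two dictionaries share - and should not be taken as the actual VBA procedure described above:

    def spearman_rank_correlation(dict_a, dict_b):
        """Rank correlation between two frequency dictionaries, computed over
        the words they share (one plausible reading of the measure R)."""
        common = [w for w in dict_a if w in dict_b]
        n = len(common)
        if n < 2:
            return 0.0
        # rank the shared words by descending frequency in each dictionary
        rank_a = {w: r for r, w in enumerate(sorted(common, key=lambda w: -dict_a[w]))}
        rank_b = {w: r for r, w in enumerate(sorted(common, key=lambda w: -dict_b[w]))}
        d2 = sum((rank_a[w] - rank_b[w]) ** 2 for w in common)
        return 1 - 6 * d2 / (n * (n ** 2 - 1))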

Although rank correlation analysis makes us cautious about asserting a dependence of themes between different texts, most of the most frequent words in each text have matches in one or more other texts. Column S shows the number of such words among the 15 most frequent words of each author. Words in bold type appear in the table of only one poet. Blok, Akhmatova and Petrovykh have no highlighted words at all: for them S = 15. These three poets have the same 15 most frequent words, differing only in their place in the list. Even Pushkin, whose vocabulary is the most original, has S = 8, with 7 highlighted words.

The results show that there is a certain layer of vocabulary that concentrates the main themes of poetry. As a rule, these words are short: of the total number of word usages (225), 88 are monosyllabic, 127 disyllabic and 10 trisyllabic. Often these words represent the main mythologemes and form pairs: night - day, earth - sky (sun), God - man (people), life - death, body - soul, Rome - world (in Mandelstam); they can also be combined into mythologemes of a higher level: sky, star, sun, earth. Of the human being, as a rule, the body, heart, blood, arm, leg, cheek, eyes are singled out. Of human states, preference is given to sleep and love. The house and the cities - Moscow, Rome, Paris - belong to the human world. Creativity is represented by the lexemes word and song.

Griboyedov and Lermontov have almost no words denoting nature among their most frequent words. They have three times as many words denoting man, the parts of his body and the elements of his spiritual world. In Pushkin and the poets of the twentieth century the designations of man and nature are approximately equal in number. In this important aspect of theme it can be said that the twentieth century followed Pushkin.

The minimal theme case is found among the most frequent words only in Griboyedov and Pushkin. In Lermontov and the poets of the twentieth century it gives way to the minimal theme word. The word does not exclude deeds (the biblical interpretation of the theme: in the New Testament all the teaching of Jesus Christ is regarded as the word of God or the word of Jesus, and the apostles sometimes call themselves ministers of the Word). The sacred meaning of the lexeme word is convincingly manifested, for example, in Pasternak's line "And the image of the world, revealed in the Word". The sacred meaning of the lexeme word, in conjunction and in contrast with human deeds, is convincingly manifested in Gumilyov's poem of the same name.

Lexemes found in only one text characterize the originality of the given book or collection of books. For example, the word "mind" is the most frequent in Griboyedov's comedy "Woe from Wit", but it does not occur among the frequent words of the other texts. The theme of the mind is by far the most significant in the comedy. This lexeme accompanies the image of Chatsky, and the name Chatsky is the most frequent in the comedy. Thus the work organically combines the most frequent common noun with the most frequent proper name.

The highest correlation coefficient connects the themes of the tragic books "The Pillar of Fire" by Gumilyov and "Anno Domini MCMXXI" by Akhmatova. Among the 15 most frequent nouns, there are 10 common ones, including blood, heart, soul, love, word, sky. Recall that Akhmatova's book included a miniature "You will not be alive ...", written between the arrest of Gumilyov and his execution.

The themes of the candle and the crowd are found in the studied material only in the "Poems of Yuri Zhivago". The theme of the candle in the poems from the novel has many contextual meanings: it is associated with the image of Jesus Christ and with the themes of faith, immortality, creativity and the lovers' meeting. The candle is the most important source of light in the central scenes of the novel. The theme of the crowd develops in connection with the main idea of the novel, in which the private life of a person with its unshakable values is opposed to the immorality of the new state, built on the principle of pleasing the crowd.

The work also includes a third stage, likewise reflected in the program: the calculation of the difference in the ordinal numbers of words common to two dictionaries and of the average distance between the same words of the two dictionaries. This stage makes it possible to move from the general trends in the interaction of dictionaries identified with the help of statistics to a level approaching the text itself. For example, the books of Gumilyov and Akhmatova correlate statistically significantly. We look at which words turned out to be common to their dictionaries and, first of all, choose those whose ordinal numbers differ minimally or are equal. It is these words that have the same rank number, and consequently it is these minimal themes that are equally important in the minds of the two poets. Next one should move to the level of texts and contexts.
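
This third stage can be illustrated by the following sketch (Python; the ranking and averaging scheme is a plausible reconstruction of the described calculation, not the program itself):

    def common_word_distances(dict_a, dict_b):
        """For the words common to two frequency dictionaries, return the
        difference of their ordinal numbers (ranks) and the average of these
        differences; words with a zero difference hold the same rank in both."""
        order_a = [w for w, _ in sorted(dict_a.items(), key=lambda p: -p[1])]
        order_b = [w for w, _ in sorted(dict_b.items(), key=lambda p: -p[1])]
        pos_a = {w: i for i, w in enumerate(order_a)}
        pos_b = {w: i for i, w in enumerate(order_b)}
        common = [w for w in order_a if w in pos_b]
        diffs = {w: abs(pos_a[w] - pos_b[w]) for w in common}
        average = sum(diffs.values()) / len(diffs) if diffs else 0.0
        same_rank = [w for w, d in diffs.items() if d == 0]
        return diffs, average, same_rank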

Quantitative methods also help to study the characteristics of the peoples who speak the languages. Say, there are 6 cases in Russian, there are no cases in English, and in some languages of the peoples of Dagestan the number of cases reaches 40. L. Perlovsky, in his article "Consciousness, Language and Culture", correlates these characteristics with the tendency of peoples towards individualism or collectivism, with the perception of things and phenomena separately or in connection with others. After all, it was in the English-speaking world (where there are no cases and the thing is perceived "by itself") that such concepts as individual freedom, liberalism and democracy appeared (I note that I use these concepts only in connection with the language, without any evaluative characteristics). Even though such guesses still remain only at the level of bold scientific hypotheses, they help to look at already familiar phenomena in a new way.

As we can see, quantitative characteristics can be applied in completely different areas of linguistics, which increasingly blurs the boundaries between "exact" and "humanitarian" methods. Linguistics is increasingly resorting to the help of not only mathematics, but also computer technology to solve its problems.

2.3 Learning a language by methods of formal logic

With non-quantitative methods of mathematics, in particular, with logic, modern theoretical linguistics interacts no less fruitfully than with quantitative ones. The rapid development of computer technologies and the growth of their role in the modern world required a revision of the approach to the interaction of language and logic in general.

The methods of logic are widely used in the development of formalized languages, in particular programming languages, whose elements are certain symbols (akin to mathematical ones), chosen (or constructed from previously chosen symbols) and interpreted in a definite way that is not tied to any "traditional" uses, understandings or functions of the same symbols in other contexts. A programmer constantly deals with logic in his work. The point of programming is precisely to teach the computer to reason (in the broadest sense of the word). At the same time, the methods of "reasoning" turn out to be very different. Every programmer spends a certain amount of time looking for bugs in his own and other people's programs, that is, searching for errors in reasoning, in logic. And this leaves its mark: after such practice it becomes much easier to detect logical errors in ordinary speech as well. The relative simplicity of the languages studied by logicians allows them to elucidate the structures of these languages more clearly than linguists can when analyzing exceedingly complex natural languages. Since the languages studied by logicians use relations copied from natural languages, logicians are able to make significant contributions to the general theory of language. The situation here is similar to that in physics: the physicist, too, formulates theorems for ideally simplified cases that do not occur in nature at all - he formulates laws for ideal gases and ideal liquids, speaks about motion in the absence of friction, and so on. For these idealized cases simple laws can be established that greatly contribute to the understanding of what really happens and of what would probably remain unknown to physics if it tried to consider reality directly, in all its complexity.

In the study of natural languages, logical methods are used so that language learners do not mindlessly "cram" as many words as possible but come to understand the structure of the language better. L. V. Shcherba used in his lectures an example of a sentence built according to the laws of the Russian language out of non-existent words - "Glokaya kuzdra shteko budlanula bokra i kurdyachit bokryonka" - and then asked his students what it meant. Although the meanings of the words remained unclear (they simply do not exist in Russian), it was possible to answer definitely: "kuzdra" is the subject, a feminine noun in the nominative singular; "bokr" is animate; and so on. A translation of the phrase comes out roughly as: "Something feminine did something in one go to some male creature and is now doing something long and gradual to its cub." A similar example of an (artistic) text made of non-existent words built entirely according to the laws of the language is Lewis Carroll's "Jabberwocky": in "Through the Looking-Glass" Carroll, through the mouth of his character Humpty Dumpty, explains the invented words (cited here in their Russian rendering): "varkalos" - eight o'clock in the evening, when it is time to cook dinner; "khlivkie" - flimsy and dexterous; "shoryok" - a cross between a ferret, a badger and a corkscrew; "pyryatsya" - to jump, dive and spin; "nava" - the grass under the sundial (it extends a little to the right, a little to the left and a little back); "khryukotat" - to grunt and laugh; "zelyuk" - a green turkey; "myumzik" - a bird whose feathers are dishevelled and stick out in all directions like a broom; "mova" - far from home.
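
The same point - that grammatical information can be read off the form of a word even when its stem means nothing - can be shown in a toy program; the suffix rules below are deliberately crude, invented for the example, and cover only Shcherba's sentence:

    import re

    # Crude, purely illustrative rules that cover only Shcherba's sentence.
    RULES = [
        (r"ая$",   "adjective, feminine, nominative singular"),
        (r"о$",    "adverb"),
        (r"нула$", "verb, past tense, feminine, perfective"),
        (r"ит$",   "verb, present tense, 3rd person singular"),
        (r"ёнка$", "noun, diminutive, animate, accusative singular"),
        (r"а$",    "noun: nominative feminine OR accusative of an animate masculine"),
    ]

    def guess_morphology(word):
        # Every matching rule is returned: the ambiguity (kuzdra vs. bokra) is
        # resolved by agreement and word order, which suffixes alone cannot see.
        return [gloss for pattern, gloss in RULES if re.search(pattern, word)] or ["unknown"]

    for w in "глокая куздра штеко будланула бокра курдячит бокрёнка".split():
        print(w, "->", "; ".join(guess_morphology(w)))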

One of the main concepts of modern logic and theoretical linguistics, used in the study of the languages of various logico-mathematical calculi and of natural languages, in describing the relations between languages of different "levels" and in characterizing the relation between the languages under consideration and the subject domains described with their help, is the concept of a metalanguage. A metalanguage is a language used to express judgments about another language, the object language. With the help of a metalanguage one studies the structure of the symbol combinations (expressions) of the object language, proves theorems about its expressive properties and its relation to other languages, and so on. The language being studied is also called the subject language in relation to this metalanguage. Both the subject language and the metalanguage can be ordinary (natural) languages. The metalanguage may differ from the object language (for example, in an English textbook for Russians, Russian is the metalanguage and English the object language), but it may also coincide with it or differ only partially, for example in special terminology (Russian linguistic terminology is an element of the metalanguage for describing the Russian language; the so-called semantic factors are part of the metalanguage for describing the semantics of natural languages).

The concept of "metalinguage" has become very fruitful in connection with the study of formalized languages ​​that are built within the framework of mathematical logic. Unlike formalized subject languages, in this case the metalanguage, by means of which the metatheory is formulated (studying the properties of the subject theory formulated in the subject language), is, as a rule, an ordinary natural language, in some special way a limited fragment of a natural language that does not contain any kind of ambiguity. , metaphors, "metaphysical" concepts, etc. elements of ordinary language that prevent its use as a tool for accurate scientific research. At the same time, the metalanguage itself can be formalized and (regardless of this) become the subject of research carried out by means of the metametalanguage, and such a series can be “thought” as growing indefinitely.

Logic teaches us a fruitful distinction between the language-object and the metalanguage. The language-object is the very subject of logical research, and the metalanguage is that inevitably artificial language in which such research is conducted. Logical thinking just consists in formulating the relations and structure of a real language (object language) in the language of symbols (metalanguage).

The metalanguage must in any case be "no poorer" than its object language (that is, for every expression of the latter there must be a name, a "translation", in the metalanguage); otherwise, if this requirement is not met (as is certainly the case in natural languages, unless special agreements provide otherwise), semantic paradoxes (antinomies) arise.

As more and more new programming languages appeared, in connection with the problem of writing translators for them there arose an urgent need to create metalanguages. At present the Backus-Naur form metalanguage (abbreviated BNF) is most commonly used for describing the syntax of programming languages. It is a compact notation in the form of formulas similar to mathematical ones. For each concept of the language there is a unique metaformula (normal formula). It consists of a left and a right part. The left part specifies the concept being defined, and the right part specifies the set of admissible language constructs that are combined into this concept. The formula uses special metacharacters in the form of angle brackets, which enclose the defined concept (in the left part of the formula) or a previously defined concept (in its right part), and the separation of the left and right parts is indicated by the metacharacter "::=", whose meaning is equivalent to the words "is by definition". Metalinguistic formulas are embedded in translators in some form; with their help the constructs written by the programmer are checked for formal compliance with the constructs that are syntactically valid in the language. There are also separate metalanguages of the various sciences, so that knowledge exists in the form of various metalanguages.
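
As an illustration (the rules are invented for the example and are not taken from any particular language standard), a pair of BNF metaformulas and the check that a translator would perform against them might look like this:

    # Metaformulas in Backus-Naur form for a tiny fragment of syntax:
    #
    #   <digit>            ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    #   <unsigned integer> ::= <digit> | <unsigned integer> <digit>
    #
    # A translator checks a programmer's construct against such formulas;
    # for this pair of rules the check reduces to the following:

    def is_unsigned_integer(text: str) -> bool:
        """True if `text` conforms to the metaformula <unsigned integer>."""
        return len(text) > 0 and all(ch in "0123456789" for ch in text)

    assert is_unsigned_integer("1905")
    assert not is_unsigned_integer("19a5")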

Logical methods also served as the basis for the creation of artificial intelligence systems based on the concept of connectionism. Connectionism is a special trend in philosophical science, the subject of which is questions of knowledge. Within the framework of this trend, attempts are being made to explain the intellectual abilities of a person using artificial neural networks. Composed of a large number of structural units similar to neurons, with a weight assigned to each element that determines the strength of the connection with other elements, neural networks are simplified models of the human brain. Experiments with neural networks of this kind have demonstrated their ability to learn to perform tasks such as pattern recognition, reading, and identifying simple grammatical structures.
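
The connectionist idea of weighted connections adjusted by learning can be conveyed by a deliberately tiny sketch: a single artificial "neuron" trained by the classical perceptron rule on a trivial pattern. This is an illustration of the principle only, not a model of any real system:

    # A minimal artificial "neuron" with weighted connections, trained by the
    # perceptron rule to recognize a simple pattern (logical AND).

    def train_perceptron(samples, epochs=20, lr=0.1):
        weights = [0.0, 0.0]
        bias = 0.0
        for _ in range(epochs):
            for (x1, x2), target in samples:
                output = 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
                error = target - output
                # the strength of each connection is adjusted after every error
                weights[0] += lr * error * x1
                weights[1] += lr * error * x2
                bias += lr * error
        return weights, bias

    AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w, b = train_perceptron(AND)
    # after training, the network classifies all four patterns correctly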

Philosophers began to take an interest in connectionism, as the connectionist approach promised to provide an alternative to the classical theory of the mind and the idea widely held within this theory that the workings of the mind are similar to the processing of symbolic language by a digital computer. This concept is very controversial, but in recent years it has found more and more supporters.

The logical study of language continues Saussure's concept of language as a system. The fact that it is constantly continuing confirms once again the boldness of scientific conjectures of the beginning of the last century. I will devote the last section of my work to the prospects for the development of mathematical methods in linguistics today.

2.4 Prospects for the application of mathematical methods in linguistics

In the era of computer technology, the methods of mathematical linguistics have received a new development perspective. The search for solutions to the problems of linguistic analysis is now increasingly being implemented at the level of information systems. At the same time, automation of the process of processing linguistic material, providing the researcher with significant opportunities and advantages, inevitably puts forward new requirements and tasks for him.

The combination of "exact" and "humanitarian" knowledge has become fertile ground for new discoveries in the field of linguistics, computer science and philosophy.

Machine translation from one language to another remains a rapidly growing branch of information technology. Despite the fact that computer-assisted translation can never be compared in quality to human translation (especially for literary texts), the machine has become an indispensable assistant to a person in translating large volumes of text. It is believed that in the near future more advanced translation systems will be created, based primarily on the semantic analysis of the text.

An equally promising direction is the interaction of linguistics and logic, which serves as a philosophical foundation for understanding information technology and the so-called "virtual reality". In the near future, work will continue on the creation of artificial intelligence systems - although, again, it will never be equal to the human in its capabilities. Such competition is meaningless: in our time, the machine should become (and becomes) not a rival, but an assistant to man, not something from the realm of fantasy, but part of the real world.

The study of the language by statistical methods continues, which makes it possible to more accurately determine its qualitative properties. It is important that the most daring hypotheses about language find their mathematical, and therefore logical, proof.

Most significantly, the various branches of the application of mathematics in linguistics, previously quite isolated from one another, have in recent years been correlated with each other, joining into a coherent system - by analogy with the language system discovered a century ago by Ferdinand de Saussure and Ivan Baudouin de Courtenay. Such is the continuity of scientific knowledge.

Linguistics in the modern world has become the foundation for the development of information technology. As long as computer science remains a rapidly developing branch of human activity, the union of mathematics and linguistics will continue to play its role in the development of science.

Conclusion

Over the 20th century, computer technologies have come a long way - from military to peaceful use, from a narrow range of goals to penetration into all branches of human life. Mathematics as a science found ever new practical significance with the development of computer technology. This process continues today.

The previously unthinkable "tandem" of "physicists" and "lyricists" has become a reality. For the full interaction of mathematics and computer science with the humanities, qualified specialists were required from both sides. While computer scientists are increasingly in need of systematic humanitarian knowledge (linguistic, cultural, philosophical) in order to comprehend changes in the reality around them, in the interaction of man and technology, to develop more and more new linguistic and mental concepts, to write programs, then any "Humanities" in our time for their professional growth must master at least the basics of working with a computer.

Mathematics, being closely interconnected with informatics, continues to develop and interact with natural sciences and the humanities. In the new century, the trend towards the mathematization of science is not weakening, but, on the contrary, is increasing. On the basis of quantitative data, the laws of the development of the language, its historical and philosophical characteristics are comprehended.

Mathematical formalism is most suitable for describing patterns in linguistics (as, indeed, in other sciences, both the humanities and the natural sciences). Sometimes the situation in science develops in such a way that without the use of an appropriate mathematical language it is impossible to understand the nature of a physical, chemical or other process. Creating the planetary model of the atom, the famous British physicist of the twentieth century E. Rutherford encountered mathematical difficulties. At first his theory was not accepted: it did not sound convincing, and the reason for this was Rutherford's ignorance of probability theory, through whose mechanism alone it was possible to understand the model representation of atomic interactions. Realizing this, the already outstanding scientist, a Nobel Prize winner, enrolled in the seminar of the mathematician Professor Lamb and for two years, together with the students, attended the course and worked through a practicum on probability theory. On its basis Rutherford was able to describe the behavior of the electron, giving his structural model convincing accuracy and gaining recognition. The same holds for linguistics.

This raises the question: what is so mathematical in objective phenomena, thanks to which they can be described in the language of mathematics, in the language of quantitative characteristics? The answer is homogeneous units of matter distributed in space and time. Those sciences that have gone further than others towards isolating such homogeneity turn out to be better suited to the use of mathematics.

The Internet, which rapidly developed in the 1990s, brought together representatives of various countries, peoples and cultures. Despite the fact that English continues to be the main language of international communication, the Internet has become multilingual in our time. This led to the development of commercially successful machine translation systems that are widely used in various fields of human activity.

Computer networks have become an object of philosophical reflection - more and more new linguistic, logical, worldview concepts have been created that help to understand "virtual reality". In many works of art, scenarios were created - more often pessimistic ones - about the dominance of machines over a person, and virtual reality - over the outside world. Far from always such forecasts turned out to be meaningless. Information technology is not only a promising industry for investing human knowledge, it is also a way to control information, and, consequently, over human thought.

This phenomenon has both a negative and a positive side. It is negative because control over information is contrary to the inalienable human right of free access to it. It is positive because the lack of such control can lead to catastrophic consequences for humanity. Suffice it to recall one of the wisest films of the last decade, Wim Wenders's "Until the End of the World", whose characters are completely immersed in the "virtual reality" of their own dreams recorded on a computer. At the same time, not a single scientist and not a single artist can give an unambiguous answer to the question of what awaits science and technology in the future.

Focusing on the "future", sometimes seeming fantastic, was a distinctive feature of science in the mid-twentieth century, when inventors sought to create perfect models of technology that could work without human intervention. Time has shown the utopian nature of such research. At the same time, it would be superfluous to condemn scientists for this - without their enthusiasm in the 1950s - 60s, information technology would not have made such a powerful leap in the 90s, and we would not have what we have now.

The last decades of the twentieth century changed the priorities of science - research, inventive pathos gave way to commercial interest. Again, this is neither good nor bad. This is a reality in which science is increasingly integrated into everyday life.

The 21st century has continued this trend, and in our time behind inventions are not only fame and recognition, but, first of all, money. This is also why it is important to ensure that the latest achievements of science and technology do not fall into the hands of terrorist groups or dictatorial regimes. The task is difficult to the point of impossibility; to realize it as much as possible is the task of the entire world community.

Information is a weapon, and weapons are no less dangerous than nuclear or chemical weapons - only it does not act physically, but rather psychologically. Humanity needs to think about what is more important for it in this case - freedom or control.

The latest philosophical concepts related to the development of information technologies and an attempt to comprehend them have shown the limitations of both natural-science materialism, which dominated during the 19th and early 20th centuries, and extreme idealism, which denies the significance of the material world. It is important for modern thought, especially the thought of the West, to overcome this dualism in thinking, when the surrounding world is clearly divided into material and ideal. The path to this is a dialogue of cultures, a comparison of different points of view on the surrounding phenomena.

Paradoxically, information technology can play an important role in this process. Computer networks, and especially the Internet, are not only a resource for entertainment and vigorous commercial activity, they are also a means of meaningful, controversial communication between representatives of various civilizations in the modern world, as well as for a dialogue between the past and the present. We can say that the Internet pushes the spatial and temporal boundaries.

And in the dialogue of cultures through information technology, the role of language as the oldest universal means of communication is still important. That is why linguistics, in interaction with mathematics, philosophy and computer science, has experienced its second birth and continues to develop today. The trend of the present will continue in the future - "until the end of the world", as 15 years ago, the same V. Wenders predicted. True, it is not known when this end will occur - but is it important now, because the future will sooner or later become the present anyway.

Appendix 1

Ferdinand de Saussure

The Swiss linguist Ferdinand de Saussure (1857-1913) is widely considered to be the founder of modern linguistics in its attempts to describe the structure of language rather than the history of particular languages and language forms. In fact, the method of Structuralism in linguistics and literary studies and a significant branch of Semiotics find their major starting point in his work at the turn of the twentieth century. It has even been argued that the complex of strategies and conceptions that has come to be called "poststructuralism" - the work of Jacques Derrida, Michel Foucault, Jacques Lacan, Julia Kristeva, Roland Barthes, and others - is suggested by Saussure's work in linguistics and anagrammatic readings of late Latin poetry. In this way, Saussure's work in linguistics and interpretation participates in a wide range of intellectual disciplines, from physics to literary modernism to psychoanalysis and philosophy in the early twentieth century. As Algirdas Julien Greimas and Joseph Courtes argue in Semiotics and Language: An Analytic Dictionary, under the heading "Interpretation," a new mode of interpretation arose in the early twentieth century which they identify with Saussurean linguistics, Husserlian Phenomenology, and Freudian psychoanalysis. In this mode, "interpretation is no longer a matter of attributing a given content to a form which would otherwise lack one; rather, it is a paraphrase which formulates in another fashion the equivalent content of a signifying element within a given semiotic system" (159). In this understanding of "interpretation," form and content are not distinct; rather, every "form" is, alternatively, a semantic "content" as well, a "signifying form," so that interpretation offers an analogical paraphrase of something that already signifies within some other system of signification.

Such a reinterpretation of form and understanding - which Claude Levi-Strauss describes in one of his most programmatic articulations of the concept of structuralism, in "Structure and Form: Reflections on a Work by Vladimir Propp" - is implicit in Saussure's posthumous Course in General Linguistics (1916, trans. 1959, 1983). In his lifetime, Saussure published relatively little, and his major work, the Course, was the transcription by his students of several courses in general linguistics he offered in 1907-11. In the Course Saussure called for the "scientific" study of language as opposed to the work in historical linguistics that had been done in the nineteenth century. That work is one of the great achievements of Western intellect: taking particular words as the building blocks of language, historical (or "diachronic") linguistics traced the origin and development of Western languages from a putative common language source, first an "Indo-European" language and then an earlier "proto-Indo-European" language.

It is precisely this study of the unique occurrences of words, with the concomitant assumption that the basic "unit" of language is, in fact, the positive existence of these "word-elements," that Saussure questioned. His work was an attempt to reduce the mass of facts about language, studied so minutely by historical linguistics, to a manageable number of propositions. The "comparative school" of nineteenth-century Philology, Saussure says in the Course, "did not succeed in setting up the true science of linguistics" because "it failed to seek out the nature of its object of study" ( 3). That "nature," he argues, is to be found not simply in the "elemental" words that a language comprises - the seeming "positive" facts (or "substances") of language - but in the formal relationships that give rise to those "substances."

Saussure"s systematic reexamination of language is based upon three assumptions. The first is that the scientific study of language needs to develop and study the system rather than the history of linguistic phenomena. For this reason, he distinguishes between the particular occurrences of language - its particular "speech-events," which he designates as parole - and the proper object of linguistics, the system (or "code") governing those events, which he designates as langue. Such a systematic study, moreover, calls for a " synchronic" conception of the relationship among the elements of language at a particular instant rather than the "diachronic" study of the development of language through history.

This assumption gave rise to what Roman Jakobson in 1929 came to designate as "structuralism," in which "any set of phenomena examined by contemporary science is treated not as a mechanical agglomeration but as a structural whole . . . the mechanical conception of processes yields to the question of their function" ("Romantic" 711). In this passage Jakobson is articulating Saussure's intention to define linguistics as a scientific system as opposed to a simple, "mechanical" accounting of historical accidents. Along with this, moreover, Jakobson is also describing the second foundational assumption in Saussurean - we can now call it "structural" - linguistics: that the basic elements of language can only be studied in relation to their functions rather than in relation to their causes. Rather than being studied in isolation, particular events and entities (say, the histories of individual Indo-European "words") have to be situated within a systemic framework in which they are related to other so-called events and entities. This is a radical reorientation in conceiving of experience and phenomena, one whose importance the philosopher Ernst Cassirer has compared to "the new science of Galileo which in the seventeenth century changed our whole concept of the physical world" (cited in Culler, Pursuit 24). This change, as Greimas and Courtes note, reconceives "interpretation" and thus reconceives explanation and understanding themselves. Instead of explanation's being in terms of a phenomenon's causes, so that, as an "effect," it is in some ways subordinate to its causes, explanation here consists in subordinating a phenomenon to its future-oriented "function" or "purpose." Explanation is no longer independent of human intentions or purposes (even though those intentions can be impersonal, communal, or, in Freudian terms, "unconscious").

In his linguistics Saussure accomplishes this transformation specifically in the redefinition of the linguistic "word," which he describes as the linguistic "sign" and defines in functionalist terms. The sign, he argues, is the union of "a concept and a sound image," which he called "signified and signifier" (66-67; Roy Harris's 1983 translation offers the terms "signification" and "signal"). The nature of their "combination" is "functional" in that neither the signified nor the signifier is the "cause" of the other; rather, "each derives its values from the other" (8). In this way Saussure defines the basic element of language, the sign, relationally and makes the basic assumption of historical linguistics, namely the identity of the elemental units of language and signification (i.e., "words"), subject to rigorous analysis. The reason we can recognize different occurrences of the word "tree" as the "same" word is not because the word is defined by inherent qualities - it is not a "mechanical agglomeration" of such qualities - but because it is defined as an element in a system, the "structural whole," of language.

Such a relational (or "diacritical") definition of an entity governs the conception of all the elements of language in structural linguistics. This is clearest in the most impressive achievement of Saussurean linguistics, the development of the concepts of the "phonemes" and "distinctive features" of language. Phonemes are the smallest articulated and signifying units of a language. They are not the sounds that occur in language but the "sound images" Saussure mentions, which are apprehended by speakers - phenomenally apprehended - as conveying meaning. (Thus, Elmar Holenstein describes Jakobson's linguistics, which follows Saussure in important ways, as "phenomenological structuralism.") It is for this reason that the leading spokesperson for Prague School Structuralism, Jan Mukarovsky, noted in 1937 that "structure . . . is a phenomenological and not an empirical reality; it is not the work itself, but a set of functional relationships which are located in the consciousness of a collective (generation, milieu, etc.)" (cited in Galan 35). Similarly, Levi-Strauss, the leading spokesperson for French structuralism , noted in 1960 that "structure has no distinct content; it is content itself, and the logical organization in which it is arrested is conceived as a property of the real" (167; see also Jakobson, Fundamentals 27-28).

Phonemes, then, the smallest perceptible elements of language, are not positive objects but a "phenomenological reality." In English, for instance, the phoneme /t/ can be pronounced in many different ways, but in all cases an English speaker will recognize it as functioning as a /t/. An aspirated t (i.e., a t pronounced with an h-like breath after it), a high-pitched or low-pitched t sound, an extended t sound, and so on, will all function in the same manner in distinguishing the meaning of "to" and "do" in English. Moreover, the differences between languages ​​are such that phonological variations in one language can constitute distinct phonemes in another; thus, English distinguishes between /l/ and /r/, whereas other languages ​​are so structured that these articulations are considered variations of the same phoneme (like the aspirated and unaspirated t in English). In every natural language, the vast number of possible words is a combination of a small number of phonemes. English, for instance, possesses less than 40 phonemes that combine to form over a million different words.

The phonemes of language are themselves systematically organized structures of features. In the 1920s and 1930s, following Saussure's lead, Jakobson and N. S. Trubetzkoy isolated the "distinctive features" of phonemes. These features are based upon the physiological structure of the speech organs - tongue, teeth, vocal cords, and so on - that Saussure mentions in the Course and that Harris describes as "physiological phonetics" (39; Baskin's earlier translation uses the term "phonology" [(1959) 38]) - and they combine in "bundles" of binary oppositions to form phonemes. For instance, in English the difference between /t/ and /d/ is the presence or absence of "voice" (the engagement of the vocal cords), and on the level of voicing these phonemes reciprocally define one another. In this way, phonology is a specific example of a general rule of language described by Saussure: In language there are only differences. Even more important: a difference generally implies positive terms between which the difference is set up; but in language there are only differences without positive terms. Whether we take the signified or the signifier, the language has neither ideas nor sounds that existed before the linguistic system. (120)

In this framework, linguistic identities are determined not by inherent qualities but by systemic ("structural") relationships.

I have said that phonology "followed the lead" of Saussure, because even though his analysis of the physiology of language production "would nowadays," as Harris says, "be called 'physical,' as opposed to either 'psychological' or 'functional'" (Reading 49), nevertheless in the Course he articulated the direction and outlines of a functional analysis of language. Similarly, his only extended published work, Memoire sur le systeme primitif des voyelles dans les langues indo-europeennes (Memoir on the Primitive System of Vowels in the Indo-European Languages), which appeared in 1878, was fully situated within the project of nineteenth-century historical linguistics. Nevertheless, within this work, as Jonathan Culler has argued, Saussure demonstrated "the fecundity of thinking of language as a system of purely relational items, even when working at the task of historical reconstruction" (Saussure 66). By analyzing the systematic structural relationships among phonemes to account for patterns of vowel alternation in existing Indo-European languages, Saussure suggested that in addition to several different phonemes /a/, there must have been another phoneme that could be described formally. "What makes Saussure's work so very impressive," Culler concludes, "is the fact that nearly fifty years later, when cuneiform Hittite was discovered and deciphered, it was found to contain a phoneme, written h, which behaved as Saussure had predicted. He had discovered, by a purely formal analysis, what are now known as the laryngeals of Indo-European" (66).

This conception of the relational or diacritical determination of the elements of signification, which is both implicit and explicit in the Course, suggests a third assumption governing structural linguistics, what Saussure calls "the arbitrary nature of the sign." By this he means that the relationship between the signifier and signified in language is never necessary (or "motivated"): one could just as easily find the sound signifier arbre as the signifier tree to unite with the concept "tree". But more than this, it means that the signified is arbitrary as well: one could as easily define the concept "tree" by its woody quality (which would exclude palm trees) as by its size (which excludes the "low woody plants" we call bushes). This should make clear that the numbering of assumptions I have been presenting does not represent an order of priority: each assumption - the systemic nature of signification (best apprehended by studying language "synchronously"), the relational or "diacritical" nature of the elements of signification, the arbitrary nature of signs - derives its value from the others.

That is, Saussurean linguistics understands the phenomena it studies in overarching relationships of combination and contrast in language. In this conception, language is both the process of articulating meaning (signification) and its product (communication), and these two functions of language are neither identical nor fully congruent (see Schleifer, "Deconstruction"). Here we can see the alternation between form and content that Greimas and Courtes describe in modernist interpretation: language presents contrasts that formally define its units, and these units combine on succeeding levels to create the signifying content. Since the elements of language are arbitrary, moreover, neither contrast nor combination can be said to be basic. Thus, in language distinctive features combine to form contrasting phonemes on another level of apprehension, phonemes combine to form contrasting morphemes, morphemes combine to form words, words combine to form sentences, and so on. In each instance, the whole phoneme, or word, or sentence, and so on, is greater than the sum of its parts (just as water, H2O, in Saussure's example [(1959) 103] is more than the mechanical agglomeration of hydrogen and oxygen).

The three assumptions of the Course in General Linguistics led Saussure to call for a new science of the twentieth century that would go beyond linguistic science to study "the life of signs within society." Saussure named this science "semiology (from Greek semeion "sign")" (16). The "science" of semiotics, as it came to be practiced in Eastern Europe in the 1920s and 1930s and Paris in the 1950s and 1960s, widened the study of language and linguistic structures to literary artifacts constituted (or articulated) by those structures. Throughout the late part of his career, moreover, even while he was offering the courses in general linguistics, Saussure pursued his own "semiotic" analysis of late Latin poetry in an attempt to discover deliberately concealed anagrams of proper names. The method of study was in many ways the opposite of the functional rationalism of his linguistic analyses: it attempted, as Saussure mentions in one of the 99 notebooks in which he pursued this study, to examine systematically the problem of "chance," which " becomes the inevitable foundation of everything" (cited in Starobinski 101). Such a study, as Saussure himself says, focuses on "the material fact" of chance and meaning (cited 101), so that the "theme-word" whose anagram Saussure is seeking, as Jean Starobinski argues, "is, for the poet , an instrument, and not a vital germ of the poem. The poem is required to re-employ the phonic materials of the theme-word" (45). In this analysis, Starobinski says, "Saussure did not lose himself in a search for hidden meanings." Instead, his work seems to demonstrate a desire to evade all the problems arising from consciousness: "Since poetry is not only realized in words but is something born from words, it escapes the arbitrary control of consciousness to depend solely on a kind of linguistic legality "(121).

That is, Saussure"s attempt to discover proper names in late Latin poetry - what Tzvetan Todorov calls the reduction of a "word . . . to its signifier" (266) - emphasizes one of the elements that governed his linguistic analysis, the arbitrary nature of the sign. (It also emphasizes the formal nature of Saussurean linguistics - "Language," he asserts, "is a form and not a substance" - which eliminate effectivelys semantics as a major object of analysis.) As Todorov concludes, Saussure"s work appears remarkably homogeneous today in its refusal to accept symbolic phenomena . . . . In his research on anagrams, he pays attention only to the phenomena of repetition, not to those of evocation. . . . In his studies of the Nibelungen, he recognizes symbols only in order to attribute them to mistaken readings: since they are not intentional, symbols do not exist. Finally in his courses on general linguistics, he contemplates the existence of semiology, and thus of signs other than linguistic ones; but this affirmation is at once limited by the fact that semiology is devoted to a single type of sign: those which are arbitrary. (269-70)

If this is true, it is because Saussure could not conceive of "intention" without a subject; he could not quite escape the opposition between form and content his work did so much to call into question. Instead, he resorted to "linguistic legality." Located between, on the one hand, nineteenth-century conceptions of history, subjectivity, and the mode of causal interpretation governed by these conceptions and, on the other hand, twentieth-century "structuralist" conceptions of what Levi-Strauss called "Kantianism without a transcendental subject" (cited in Connerton 23) - concepts that erase the opposition between form and content (or subject and object) and the hierarchy of foreground and background in full-blown structuralism, psychoanalysis, and even quantum mechanics - the work of Ferdinand de Saussure in linguistics and semiotics circumscribes a signal moment in the study of meaning and culture.

Ronald Schleifer

Appendix 2

Ferdinand de Saussure (translation)

The Swiss linguist Ferdinand de Saussure (1857-1913) is considered the founder of modern linguistics thanks to his attempts to describe the structure of language rather than the history of individual languages and word forms. By and large, the foundations of structural methods in linguistics and literary criticism and, to a considerable extent, of semiotics were laid in his works at the very beginning of the twentieth century. It has been argued that the methods and concepts of so-called "post-structuralism", developed in the works of Jacques Derrida, Michel Foucault, Jacques Lacan, Julia Kristeva, Roland Barthes and others, go back to Saussure's linguistic works and to his anagrammatic readings of late Roman poetry. It should be noted that Saussure's work on linguistics and linguistic interpretation helps to connect a wide range of intellectual disciplines - from physics to literary innovations, psychoanalysis and the philosophy of the early twentieth century. A. J. Greimas and J. Courtes write in "Semiotics and Language: An Analytical Dictionary", under the heading "Interpretation", that a new kind of interpretation appeared at the beginning of the twentieth century together with the linguistics of Saussure, the phenomenology of Husserl and the psychoanalysis of Freud. In this case, "interpretation is not the attribution of a given content to a form that would otherwise lack one; rather, it is a paraphrase which formulates in another way the same content of a significant element within a given semiotic system" (159). In this understanding of "interpretation", form and content are inseparable; on the contrary, each form is also filled with semantic meaning ("a signifying form"), so that interpretation offers a new, analogous retelling of something meaningful in another sign system.

A similar understanding of form and content, presented by Claude Lévi-Strauss in one of the programmatic works of structuralism, ("Structure and Form: Reflections on the Works of Vladimir Propp"), can be seen in Saussure's posthumously published book A Course in General Linguistics (1916, trans., 1959, 1983). During his lifetime, Saussure published little, "Course" - his main work - was collected from the notes of students who attended his lectures on general linguistics in 1907-11. In the Course, Saussure called for a "scientific" study of language, contrasting it with nineteenth-century comparative-historical linguistics. This work can be considered one of the greatest achievements of Western thought: taking individual words as the structural elements of language as a basis, historical (or “diachronic”) linguistics proved the origin and development of Western European languages ​​​​from a common, Indo-European language - and an earlier Proto-Indo-European.

It is precisely this study of the unique occurrences of words, with the concomitant assumption that the basic "unit" of language is, in fact, the positive existence of these "word elements" that Saussure questioned. His work was an attempt to reduce the many facts about language casually studied by comparative linguistics to a small number of theorems. The comparative philological school of the 19th century, writes Saussure, "did not succeed in creating a real school of linguistics" because "it did not understand the essence of the object of study" (3). This "essence", he argues, lies not only in individual words - the "positive substances" of language - but also in the formal connections that help these substances to exist.

Saussure's "test" of language is based on three assumptions. First, the scientific understanding of language is based not on a historical, but on a structural phenomenon. Therefore, he distinguished between individual phenomena of the language - "events of speech", which he defines as "parole" - and the proper, in his opinion, object of study of linguistics, the system (code, structure) that controls these events ("langue"). Such a systematic study, moreover, requires a "synchronous" conception of the relationship between the elements of language at a given moment, rather than a "diachronic" study of the development of a language through its history.

This hypothesis was the forerunner of what Roman Jakobson in 1929 would call "structuralism" - a theory where "any set of phenomena investigated by modern science is considered not as a mechanical accumulation, but as a structural whole in which the constructive component is correlated with the function" ("Romantic "711). In this passage, Jakobson formulated Saussure's idea of ​​defining language as a structure, as opposed to the "mechanical" enumeration of historical events. In addition, Jakobson develops another Saussurean assumption, which became the forerunner of structural linguistics: the basic elements of language should be studied in connection not so much with their causes, but with their functions. Separate phenomena and events (say, the history of the origin of individual Indo-European words) should be studied not by themselves, but in a system in which they are correlated with similar components. This was a radical turn in the comparison of phenomena with the surrounding reality, the significance of which was compared by the philosopher Ernst Cassirer with "the science of Galileo, which turned the ideas about the material world in the seventeenth century." Such a turn, as Greimas and Kurthe note, changes the idea of ​​"interpretation", consequently, the explanations themselves. Phenomena began to be interpreted not in relation to the causes of their occurrence, but in relation to the effect that they can have in the present and future. Interpretation ceased to be independent of a person’s intentions (despite the fact that intentions can be impersonal, “unconscious” in the Freudian sense of the word).

In his linguistics, Saussure especially shows this turn in the change in the concept of the word in linguistics, which he defines as a sign and describes in terms of its functions. A sign for him is a combination of sound and meaning, "signified and designation" (66-67; in the English translation of 1983 by Roy Harris - "signification" and "signal"). The nature of this compound is "functional" (neither one nor the other element can exist without each other); moreover, "one borrows qualities from the other" (8). Thus, Saussure defines the main structural element of language - the sign - and makes the basis of historical linguistics the identity of signs to words, which requires a particularly rigorous analysis. Therefore, we can understand different meanings of, say, the same word "tree" - not because the word is only a set of certain qualities, but because it is defined as an element in the sign system, in the "structural whole", in the language.

Such a relative ("diacritical") concept of unity underlies the concept of all elements of the language in structural linguistics. This is especially clear in the most original discovery of Saussurean linguistics, in the development of the concept of "phonemes" and "distinctive features" of language. Phonemes are the smallest of the spoken and meaningful language units. They are not only sounds that occur in the language, but "sound images", notes Saussure, which are perceived by native speakers as having meaning. (It should be noted that Elmar Holenstein calls Jakobson's linguistics, which continues the ideas and concepts of Saussure in its main provisions, "phenomenological structuralism"). That is why the leading speaker of the Prague School of Structuralism, Jan Mukarowski, observed in 1937 that “structure. . . not an empirical, but a phenomenological concept; it is not the result itself, but a set of significant relations of the collective consciousness (generation, others, etc.)”. A similar thought was expressed in 1960 by Lévi-Strauss, the leader of French structuralism: “The structure has no definite content; it is meaningful in itself, and the logical construction in which it is enclosed is the imprint of reality.

In turn, phonemes, as the smallest linguistic elements accessible to perception, represent a separate, integral "phenomenological reality". For example, in English the sound "t" can be pronounced differently, but in all cases a speaker of English will perceive it as "t". Aspirated, raised or lowered, a long "t", and the like will equally distinguish the meaning of the words "to" and "do". Moreover, the differences between languages are such that varieties of one sound in one language can correspond to different phonemes in another; for example, "l" and "r" in English are distinct phonemes, while in other languages they are varieties of the same phoneme (like the English "t", pronounced with and without aspiration). The vast vocabulary of any natural language is a set of combinations of a much smaller number of phonemes. In English, for example, only 40 phonemes are used to pronounce and write about a million words.

The sounds of a language are a systematically organized set of features. In the 1920s-1930s, following Saussure, Jakobson and N. S. Trubetskoy singled out the "distinctive features" of phonemes. These features are based on the structure of the speech organs - tongue, teeth, vocal cords; Saussure notes this in the "Course of General Linguistics", and Harris calls it "physiological phonetics" (in Baskin's earlier translation the term "phonology" is used) - and they are combined in "bundles" opposed to one another to produce sounds. For example, in English the difference between "t" and "d" is the presence or absence of "voice" (the tension of the vocal cords), and it is this voicing that distinguishes the one phoneme from the other. Thus phonology can be considered an example of the general rule of language described by Saussure: "In language there are only differences." Even more important: a difference usually implies positive terms between which it obtains; but in language there are only differences without positive terms. Whether we consider the "signifier" or the "signified", in language there are neither concepts nor sounds that would have existed before the development of the language system.

In such a structure, linguistic analogies are defined not by their inherent qualities but by systemic ("structural") relations.

I have already mentioned that phonology in its development relied on Saussure's ideas. Although his analysis of linguistic physiology in modern times, Harris says, "would be called 'physical', as opposed to 'psychological' or 'functional'", in the Course he clearly articulated the direction and basic principles of the functional analysis of language. His only work published during his lifetime, Mémoire sur le système primitif des voyelles dans les langues indo-européennes (Memoir on the original vowel system in the Indo-European languages), published in 1878, was completely in line with the comparative-historical linguistics of the 19th century. Nevertheless, in this work, says Jonathan Culler, Saussure showed "the fruitfulness of the idea of language as a system of interconnected phenomena, even for its historical reconstruction." Analyzing the relations between phonemes and explaining the alternation of vowels in the modern languages of the Indo-European group, Saussure suggested that in addition to several different sounds "a" there must be other phonemes, described purely formally. "What makes Saussure's work particularly impressive," Culler concludes, "is that almost 50 years later, when Hittite cuneiform was discovered and deciphered, a phoneme was found, denoted in writing by 'h', which behaved as Saussure had predicted. Through formal analysis he had discovered what is now known as the laryngeal sound of the Indo-European languages."

In the concept of the relative (diacritical) definition of signs, both explicit and implied in the Course, lies a third key assumption of structural linguistics, which Saussure called the "arbitrary nature of the sign". By this he means that the relation between sound and meaning in language is not motivated by anything: one can just as easily connect the word "arbre" as the word "tree" with the concept "tree". Moreover, it means that the meaning is also arbitrary: the concept of "tree" could just as well be defined by the presence of bark (which would exclude palm trees) as by size (which would exclude the "low woody plants" we call shrubs). From this it should be clear that the assumptions I have presented are not divided into more and less important ones: each of them - the systemic nature of signs (most intelligible in the "synchronic" study of language), their relative (diacritical) essence, and the arbitrary nature of signs - follows from the rest.

Thus, in Saussurean linguistics the phenomenon under study is understood as a set of comparisons and oppositions within language. Language is both an expression of the meaning of words (designation) and their result (communication), and these two functions never coincide (see Schleifer's "Deconstruction of Language"). We can see the alternation of form and content that Greimas and Courtés describe in the latest version of interpretation: linguistic contrasts define its structural units, and these units interact on successive levels to create a certain meaningful content. Since the elements of language are arbitrary, neither contrast nor combination can be basic. This means that in a language distinctive features form a phonetic contrast at a different level of apprehension, phonemes are combined into contrastive morphemes, morphemes into words, words into sentences, and so on. In any case, a whole phoneme, word, sentence, etc. is more than the sum of its parts (just as water, in Saussure's example, is more than the combination of hydrogen and oxygen).

Three assumptions of the "Course of General Linguistics" led Saussure to the idea of ​​a new science of the twentieth century, separate from linguistics, studying "the life of signs in society." Saussure called this science semiology (from the Greek "semeion" - a sign). The "science" of semiotics, which developed in Eastern Europe in the 1920s and 1930s and in Paris in the 1950s and 1960s, extended the study of language and linguistic structures into literary finds composed (or formulated) in terms of these structures. In addition, in the twilight of his career, in parallel to his course in general linguistics, Saussure engaged in a "semiotic" analysis of late Roman poetry, trying to discover deliberately composed anagrams of proper names. This method was in many ways the opposite of rationalism in its linguistic analysis: it was an attempt, as Saussure writes in one of the 99 notebooks, to study in the system the problem of "probability", which "becomes the basis of everything." Such an investigation, Saussure himself claims, helps to focus on the "real side" of probability; The “key word” for which Saussure is looking for an anagram is, according to Jean Starobinsky, “a tool for the poet, and not the source of life for the poem. The poem serves to reverse the sounds of the key word. According to Starobinsky, in this analysis, "Saussure does not delve into the search for hidden meanings." On the contrary, in his works, a desire to avoid questions related to consciousness is noticeable: “since poetry is expressed not only in words, but also in what these words give rise to, it goes beyond the control of consciousness and depends only on the laws of language.”

Saussure's attempt to study proper names in late Roman poetry (Tzvetan Todorov called this the reduction of "a word ... only before it is written") emphasizes one of the components of his linguistic analysis - the arbitrary nature of signs - as well as the formal essence of Saussurean linguistics ("Language," he claims, "is a form and not a substance"), which excludes the possibility of analyzing meaning. Todorov concludes that today Saussure's writings seem remarkably consistent in their reluctance to study symbols [phenomena that have a well-defined meaning]. . . . Exploring anagrams, Saussure pays attention only to repetition, but not to previous options. . . . Studying the Nibelungenlied, he defines the symbols only to assign them to erroneous readings: if they are unintentional, the symbols do not exist. Finally, in his writings on general linguistics, he makes the assumption of the existence of a semiology that would describe more than just linguistic signs; but this assumption is limited by the fact that semiology can describe only random, arbitrary signs.

If this is really so, it is only because he could not imagine "intention" without an object; he could not completely bridge the gap between form and content - in his writings this turned into a question. Instead, he turned to "linguistic legitimacy". Standing between, on the one hand, nineteenth-century concepts based on history and subjective conjecture, and the methods of accidental interpretation based on those concepts, and, on the other hand, structuralist concepts, which Lévi-Strauss called "Kantianism without a transcendental subject" - concepts that erase the opposition between form and content (subject and object), meaning and origin in structuralism, psychoanalysis and even quantum mechanics - Ferdinand de Saussure's writings on linguistics and semiotics mark a turning point in the study of meaning in language and culture.

Ronald Schleifer

Literature

1. Admoni V.G. Fundamentals of the theory of grammar / V.G. Admoni; USSR Academy of Sciences. - M.: Nauka, 1964. - 104 p.

3. Arapov, M.V., Herts, M.M. Mathematical methods in linguistics. M., 1974.

4. Arnold I.V. The semantic structure of the word in modern English and the methodology for its study. /I.V. Arnold-L .: Education, 1966. - 187 p.

6. Bashlykov A.M. Automatic translation system / A.M. Bashlykov, A.A. Sokolov. - M.: LLC "FIMA", 1997. - 20 p.

7. Baudouin de Courtenay: Theoretical heritage and modernity: Abstracts of the reports of the international scientific conference / Ed. I.G. Kondratiev. - Kazan: KGU, 1995. - 224 p.

8. Gladkiy A.V. Elements of Mathematical Linguistics / A.V. Gladkiy, I.A. Melchuk. - M., 1969. - 198 p.

9. Golovin, B.N. Language and statistics. /B.N. Golovin - M., 1971. - 210 p.

10. Zvegintsev, V.A. Theoretical and applied linguistics. / V.A. Zvegintsev - M., 1969. - 143 p.

11. Kasevich, V.B. Semantics. Syntax. Morphology. // V.B. Kasevich - M., 1988. - 292 p.

12. Lekomtsev Yu.K. Introduction to the formal language of linguistics / Yu.K. Lekomtsev. - M.: Nauka, 1983. - 204 p., ill.

13. Linguistic legacy of Baudouin de Courtenay at the end of the twentieth century: Abstracts of the reports of the international scientific and practical conference March 15-18, 2000. - Krasnoyarsk, 2000. - 125 p.

Matveeva G.G. Hidden grammatical meanings and identification of the social person ("portrait") of the speaker / G.G. Matveeva. - Rostov, 1999. - 174 p.

14. Melchuk, I.A. Experience in building linguistic models "Meaning<-->Text". / I.A. Melchuk. - M., 1974. - 145 p.

15. Nelyubin L.L. Translation and applied linguistics / L.L. Nelyubin. - M.: Higher School, 1983. - 207 p.

16. On the exact methods of language research: on the so-called "mathematical linguistics" / O.S. Akhmanova, I.A. Melchuk, E.V. Paducheva and others - M., 1961. - 162 p.

17. Piotrovsky R.G. Mathematical Linguistics: Textbook / R.G. Piotrovsky, K.B. Bektaev, A.A. Piotrovskaya. - M.: Higher School, 1977. - 160 p.

18. Idem. Text, machine, person. - L., 1975. - 213 p.

19. Idem. Applied Linguistics / Ed. by A.S. Gerd. - L., 1986. - 176 p.

20. Revzin I.I. Language models. - M., 1963; Revzin I.I. Modern structural linguistics: Problems and methods. - M., 1977. - 239 p.

21. Revzin, I.I., Rozentsveig, V.Yu. Fundamentals of general and machine translation / Revzin I.I., Rozentsveig, V.Yu. - M., 1964. - 401 p.

22. Slyusareva N.A. The theory of F. de Saussure in the light of modern linguistics / N.A. Slyusareva. - M.: Nauka, 1975. - 156 p.

23. Sova L.Z. Analytical linguistics / L.Z. Sova. - M., 1970. - 192 p.

24. Saussure F. de. Notes on General Linguistics / F. de Saussure; Per. from fr. - M.: Progress, 2000. - 187 p.

25. Idem. Course of General Linguistics / Per. from fr. - Yekaterinburg, 1999. - 426 p.

26. Speech statistics and automatic text analysis / Ed. ed. R.G. Piotrovsky. L., 1980. - 223 p.

27. Stoll R. Sets. Logic. Axiomatic theories / R. Stoll; Per. from English. - M., 1968. - 180 p.

28. Tesnière L. Fundamentals of structural syntax. - M., 1988.

29. Ubin I.I. Automation of translation activities in the USSR / I.I. Ubin, L.Yu. Korostelev, B.D. Tikhomirov. - M., 1989. - 28 p.

30. Faure, R., Kofman, A., Denis-Papin, M. Modern Mathematics. M., 1966.

31. Shenk, R. Processing of conceptual information. M., 1980.

32. Shikhanovich Yu.A. Introduction to modern mathematics (initial concepts). - M., 1965.

33. Shcherba L.V. Russian vowels in qualitative and quantitative terms / L.V. Shcherba - L.: Nauka, 1983. - 159 p.

34. Abdullah-zade F. Citizen of the world // Ogonyok. - 1996. - No. 5. - p. 13.

35. Uspensky V.A. A preface for the readers of the "New Literary Review" to the semiotic messages of Andrei Nikolaevich Kolmogorov // New Literary Review. - 1997. - No. 24. - pp. 18-23.

36. Perlovsky L. Consciousness, language and culture // Knowledge is Power. - 2000. - No. 4. - pp. 20-33.

Introduction? Lecture Translation Theory

During the last century, linguistics was constantly cited as an example of a science that developed rapidly and very quickly reached methodological maturity. Already in the middle of the last century the young science confidently took its place in the circle of sciences that had a thousand-year tradition, and one of its most prominent representatives - A. Schleicher - had the boldness to believe that with his works he was already drawing the final line under its development. The history of linguistics, however, has shown that such an opinion was too hasty and unjustified. At the end of the century, linguistics underwent its first great shock, associated with the criticism of neogrammarian principles, and others followed. It should be noted that all the crises that we can find in the history of the science of language, as a rule, did not shake its foundations but, on the contrary, contributed to its strengthening and ultimately brought with them a refinement and improvement of the methods of linguistic research, expanding, along with them, its themes and scientific problems.

But alongside linguistics other sciences lived and developed as well, among them a large number of new ones. The physical, chemical and technical (the so-called "exact") sciences have developed especially rapidly in our time, and their theoretical basis, mathematics, has reigned over them all. The exact sciences have not only greatly crowded all the humanities, but at present are striving to "bring them into their faith", to subject them to their customs, to impose their own research methods on them. In the current situation, to use a Japanese expression, one might say that linguists-philologists now huddle at the very edge of the mat, on which the exact sciences, headed by mathematics, have settled triumphantly and at their ease.

Would it not be more expedient, from the point of view of general scientific interests, to capitulate to mathematics, to surrender entirely to the power of its methods, to which some voices openly call 59, and thereby, perhaps, gain new strength? To answer these questions, we must first see what mathematics claims in this case, in which areas of linguistics mathematical methods find their application, to what extent they are consistent with the specifics of linguistic material, and whether they are able to give, or even merely suggest, answers to the questions posed by the science of language.

From the very beginning it should be noted that among the enthusiasts of the new, mathematical trend in linguistics there is no unanimity of opinion regarding its goals and objectives in statistical research. Acad. A. A. Markov, who was the first to apply mathematical methods to language, Boldrini, Yule and Mariotti treat language elements as suitable illustrative material for constructing quantitative methods, or statistical theorems, without asking at all whether the results of such a study are of interest to linguists 60. Ross believes that probability theory and mathematical statistics provide a tool or, as they now prefer to say, a mathematical model for testing and confirming those linguistic conclusions that admit of a numerical interpretation. Thus, mathematical methods are conceived only as auxiliary means of linguistic research 61. Much more is claimed by Herdan, who in his book not only summed up and systematized all attempts at the mathematical study of language problems, but also tried to give them a clear orientation with respect to further work. He orients the presentation of the entire material of his book toward "understanding literary statistics (as he calls the study of texts by the methods of mathematical statistics. - V.Z.) as an integral part of linguistics" 62, and formulates the essence and tasks of this new section of linguistics in the following words: "Literary statistics as a quantitative philosophy of language is applicable to all branches of linguistics. In our opinion, literary statistics is structural linguistics raised to the level of a quantitative science or a quantitative philosophy. Thus, it is equally wrong to define its results as lying outside the scope of linguistics or to treat it as an auxiliary tool for research" 63.

It is hardly advisable to theorize about whether it is legitimate in this case to speak of the emergence of a new branch of linguistics, or to settle the question of its claims, without first considering what has actually been done in this area and clarifying in what direction the application of the new methods is proceeding 64. This will help us understand the differences of opinion.

The use of mathematical (or, more precisely, statistical) criteria for solving linguistic problems is by no means new to the science of language and has long been used, to one degree or another, by linguists. After all, such traditional concepts of linguistics as the phonetic law (and the exception to the law connected with it), the productivity of grammatical elements (for example, derivational suffixes), or even the criteria of kinship between languages are, to a certain extent, based on relative statistical features. The sharper and clearer the statistical opposition of the observed cases, the more reason we have to speak of productive and unproductive suffixes, of the phonetic law and exceptions to it, of the presence or absence of kinship between languages. But if in such cases the statistical principle was used more or less spontaneously, in later work it began to be applied consciously and with a definite goal in mind. Thus, in our time, so-called frequency dictionaries of the vocabulary and expressions of individual languages 65, or even of the meanings of words in several languages with a "general focus on reality" 66, have become very widespread. The data of these dictionaries are used to compile foreign-language textbooks (whose texts are built on the most commonly used vocabulary) and minimum dictionaries. Statistical calculation found a special linguistic use in the method of lexicostatistics, or glottochronology, of M. Swadesh, where, on the basis of statistical formulas that take into account the disappearance of words of the basic stock from languages, it becomes possible to establish an absolute chronology of the splitting of language families 67.
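For illustration, here is a minimal sketch of the kind of lexicostatistic calculation just mentioned. The retention rate of 0.86 per millennium and the 70% cognate share are assumed values chosen only for the sake of the example; they are not data from the studies discussed here.

```python
import math

def glottochronology_split_date(shared_cognates_fraction, retention_rate=0.86):
    """Estimate, in millennia, how long ago two related languages diverged.

    Swadesh-style formula: t = ln(c) / (2 * ln(r)), where c is the share of
    basic-vocabulary items still cognate in both languages and r is the
    assumed retention rate of basic vocabulary per millennium (0.86 is a
    commonly cited textbook value, used here only for illustration).
    """
    c = shared_cognates_fraction
    r = retention_rate
    return math.log(c) / (2 * math.log(r))

# Example: two languages sharing 70% cognates in the basic word list
print(round(glottochronology_split_date(0.70), 2), "millennia since the split")
```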

In recent years, cases of applying mathematical methods to linguistic material have increased significantly, and in the mass of such attempts more or less definite directions have been outlined. Let us turn to their consideration one by one, without going into details.

Let us start with the direction that has been given the name of stylostatistics. In this case we are talking about defining and characterizing the stylistic features of individual works or authors through the quantitative relations of the linguistic elements used. The statistical approach to the study of stylistic phenomena is based on an understanding of literary style as an individual way of using the means of language. At the same time, the researcher abstracts entirely from the question of the qualitative significance of the linguistic elements being counted, focusing all attention only on the quantitative side; the semantic side of the language units studied, their emotional and expressive load, as well as their share in the fabric of a work of art - all this remains out of account, relegated to so-called redundant phenomena. Thus a work of art appears in the form of a mechanical aggregate, the specificity of whose construction finds expression only in the numerical relations of its elements. The representatives of stylostatistics do not turn a blind eye to all the circumstances just noted; to the methods of traditional stylistics, which undoubtedly include elements of subjectivity, they oppose one single quality of the mathematical method, which in their opinion compensates for all its shortcomings - the objectivity of the results achieved. "We strive," writes V. Fuchs, for example, "... to characterize the style of linguistic expression by mathematical means. For this purpose, methods should be created whose results should be as objective as the results of the exact sciences... This suggests that, at least initially, we will deal only with formal structural qualities, and not with the semantic content of linguistic expressions. In this way we will obtain a system of ordinal relations which in its totality will be the basis and starting point of the mathematical theory of style" 68.

The simplest type of statistical approach to the study of the language of writers or of individual works is the counting of the words used, since the richness of the dictionary, it would seem, should in some way characterize the author himself. However, the results of such calculations prove somewhat unexpected in this regard and do not contribute in any way to the aesthetic understanding and evaluation of a literary work, which is, not least, one of the tasks of stylistics. Here are some data on the total number of words used in a number of works:

Bible (Latin): 5,649 words
Bible (Hebrew): 5,642 words
Demosthenes (speeches): 4,972 words
Sallust: 3,394 words
Horace: 6,084 words
Dante (Divine Comedy): 5,860 words (including 1,615 proper names and geographical names)
Tasso (Orlando Furioso): 8,474 words
Milton: about 8,000 words
Shakespeare: about 15,000 words (according to other sources, 20,000 words)

O. Jespersen points out that the vocabulary of Zola, Kipling and Jack London significantly exceeds that of Milton, i.e. exceeds the figure of 8,000 69. A count of the vocabulary of the speeches of US President W. Wilson found that it is richer than that of Shakespeare. To this should be added the data of psychologists. Thus Terman, on the basis of observations of a large number of cases, found that the vocabulary of an average child is about 3,600 words, and at the age of 14 already 9,000. The average adult uses 11,700 words, and a person of "heightened intelligence" up to 13,500 70. Such numerical data in themselves thus provide no grounds for identifying the stylistic qualities of works and only "objectively" state the use of a different number of words by different authors, which, as the above figures show, is not related to the relative artistic value of their works.
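As a minimal sketch of what such a count involves, the following fragment counts distinct word forms in a text. The tokenization rule is a simplifying assumption: real counts of this kind must decide how to treat inflected forms, proper names, homographs and the like.

```python
import re

def vocabulary_size(text):
    """Count distinct word forms in a text (the crudest measure of 'richness').

    Tokenization here is a simplifying assumption: lowercase alphabetic
    strings (with apostrophes) only.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)), len(tokens)

sample = "To be, or not to be: that is the question."
distinct, total = vocabulary_size(sample)
print(distinct, "distinct words out of", total, "occurrences")
```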

Calculations of the relative frequency of the use of words by individual authors are constructed somewhat differently. In this case not only the total number of words is taken into account, but also the frequency of use of individual words. Statistical processing of the material obtained in this way consists in grouping words with equal frequency of use into classes (or ranks), which leads to establishing the frequency distribution of all the words used by a given author. A special case of this kind of calculation is the determination of the relative frequency of special words (for example, of Romance vocabulary in Chaucer's works, as was done by Mersand 71). The relative frequency of the words used by authors contains the same objective information about the style of individual authors as the total counts above, with the only difference that the result is more precise numerical data. But it is also used to date individual works of the same author on the basis of a preliminary calculation of the relative frequency of his use of words in different periods of his life (according to works dated by the author himself). Another use of the data from such calculations is to establish the authenticity of the authorship of works for which this question seems doubtful 72. In this last case everything is based on a comparison of the statistical patterns of frequency of use in genuine and disputed works. There is no need to dwell on the very great relativity and approximateness of the results obtained by such methods. After all, the relative frequency of use varies not only with the age of the author, but also depending on the genre, the plot and the historical setting of the work (cf., for example, "Bread" and "Peter I" by A. Tolstoy).
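A minimal sketch of this kind of relative-frequency comparison is given below; the two toy "texts" and the choice of the five most common words are purely illustrative assumptions, not a reconstruction of any of the studies cited.

```python
import re
from collections import Counter

def relative_frequencies(text):
    """Relative frequency of each word form (occurrences / total word count)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {w: n / total for w, n in counts.items()}, counts

def compare_common_words(text_a, text_b, top=5):
    """Compare the relative frequencies of the words most common in text_a
    with their frequencies in text_b - the core of frequency-based
    comparisons of authors or of differently dated works."""
    freq_a, counts_a = relative_frequencies(text_a)
    freq_b, _ = relative_frequencies(text_b)
    for word, _ in counts_a.most_common(top):
        print(word, round(freq_a[word], 3), round(freq_b.get(word, 0.0), 3))

compare_common_words(
    "the sea was calm and the ship moved on and on across the sea",
    "a storm broke over the sea and the ship fought the waves",
)
```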

Deepening the method just described, stylostatistics began to resort, as a characteristic of style, to the criterion of stability of the relative frequency of the most commonly used words. The method used in this case can be illustrated by the statistical processing of Pushkin's story "The Captain's Daughter" carried out by Esselson and Epstein at the Institute of Slavic Languages of the University of Detroit (USA) 73. The entire text of the story (about 30,000 occurrences of words) was examined, and then passages containing about 10,000 and 5,000 occurrences. Further, in order to determine the stability of the relative frequency of the use of words, the relative frequency of the 102 most common words (with frequencies ranging from 1,160 down to 35), calculated from the sample passages, was compared with the actual frequency. For example, the conjunction "and" was used 1,160 times throughout the story. In a passage containing 5,000 occurrences of all words, this conjunction should be expected to occur 5,000 x 1,160 : 30,000, or roughly 193 times, and in a passage containing 10,000 occurrences of all words it is expected to occur 10,000 x 1,160 : 30,000, or roughly 386 times. Comparison of the figures obtained by calculations of this kind with the actual data shows a very slight deviation (within 5%). On the basis of such calculations it was found that in this story of Pushkin's the preposition "k" is used twice as often as "u", and the pronoun "you" three times more often than "them", and so on. Thus, despite all the vicissitudes of the plot, both throughout the story and in its individual parts there is a stability of the relative frequency of the use of words. What is observed with respect to some (the most common) words is presumably applicable to all the words used in the work. It follows that an author's style can be characterized by a certain ratio between the variability of a word's frequency of use in his texts and the frequency of its use that is average for the given language. This ratio is regarded as an objective quantitative characteristic of the author's style.
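The expected figures in the passage above follow from simple proportionality; a minimal sketch is given here (the numbers for the conjunction are those quoted above, while the helper names are of course only illustrative).

```python
def expected_count(total_occurrences, text_size, sample_size):
    """Expected occurrences of a word in a sample of the text, assuming its
    relative frequency is stable across the whole text (the assumption tested)."""
    return total_occurrences * sample_size / text_size

# Figures for the conjunction "and" from the "Captain's Daughter" example above:
print(expected_count(1160, 30000, 5000))    # 193.33..., given as 193 in the passage
print(expected_count(1160, 30000, 10000))   # 386.66..., given as 386 in the passage

def deviation_percent(observed, expected):
    """Relative deviation of an observed count from the expected one
    (the passage above reports deviations within 5%)."""
    return abs(observed - expected) / expected * 100
```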

Other formal elements of language structure are studied in a similar way. Thus, for example, V. Fuchs subjected the metrical features of the works of Goethe, Rilke, Caesar, Sallust and others to a comparative statistical examination 74.

The criterion of the stability of the relative frequency of the use of words, while refining the technique of the quantitative characterization of style, does not introduce anything fundamentally new in comparison with the more primitive methods analyzed above. All the methods of stylostatistics ultimately produce equally dispassionate "objective" results, gliding over the surface of the language and catching hold only of purely external signs. Quantitative methods, apparently, are not able to focus on the qualitative differences in the material under study and in fact level out all the objects studied.

Where maximum concreteness is needed, the most generalized criteria are offered; qualitative characteristics are expressed in the language of quantity. This is not only a logical contradiction, but also a disagreement with the nature of things. Indeed, what happens if we try to obtain a comparative stylistic (that is, qualitative) characterization of the works of Alexander Gerasimov and Rembrandt on the basis of the quantitative ratio of red and black paint on their canvases? It would seem to be utter nonsense. To what extent can completely "objective" quantitative information about a person's physical data give us an idea of everything that characterizes a person and constitutes his true essence? Obviously, to no extent at all. It can serve only as an individual mark distinguishing one person from another, like the imprint of the ridges on a thumb. The situation is similar with the quantitative characteristics of literary style. If one looks closely, they provide just as meager data for judging the actual stylistic qualities of the author's language as a description of the ridges on a finger does for the study of human psychology.

To all that has been said, it should be added that in the past, in the so-called formal school of literary criticism, an attempt was already made to conduct a quantitative study of the style of writers, when epithets, metaphors, and rhythmic-melodic elements of verse were counted. However, this attempt was not further developed.

Another area of application of mathematical methods to the study of linguistic phenomena can be grouped under the name of linguistic statistics. It seeks to intrude into the fundamental questions of the theory of language and thereby to gain recognition in the realm of linguistics proper. To become acquainted with this direction, it is best to turn to the already mentioned work of Herdan, in the words of one of its many reviewers "a monstrously pretentious book" 75, which nevertheless received a wide response among linguists 76. In view of the fact that Herdan (as already mentioned above) sought to collect in his book everything most significant in the field of the application of mathematical methods to linguistic problems, in his book we are actually dealing not so much with Herdan as with a whole trend. As the very title of the book shows - "Language as Choice and Chance" - its main attention is directed at clarifying what in language is left to the free choice of the speaker and what is due to the immanent structure of the language, as well as at determining the quantitative ratio of the elements of the first and the second order. Herdan's book provides almost exhaustive information about all the work in this area carried out by representatives of various specialties (philosophers, linguists, mathematicians, engineers), but it is not limited to this and includes many original observations, considerations and conclusions of the author himself. As a summarizing work, it gives a good idea of the quantitative methods used and of the results achieved with their help. The questions that we conditionally combine into the section of linguistic statistics are treated in the second and fourth parts of the book.

Of the many cases of the application of the methods of mathematical statistics to the study of linguistic questions, we shall dwell on the most general ones, which can also be regarded as the most typical. Using data from other authors - Boldrini 77, Mathesius 78, Mariotti 79, Zipf 80, Dewey 81 and others - as well as citing his own studies determining the relative frequency of the distribution of phonemes, letters, word length (measured by the number of letters and syllables), grammatical forms and metrical elements in the Latin and Greek hexameter, Herdan establishes the fact of the stability of the relative frequency of linguistic elements as a general characteristic of all linguistic structures. He derives the following rule: "The proportions of linguistic elements belonging to one or another level or sphere of linguistic coding - phonology, grammar, metrics - remain more or less constant for a given language, in a given period of its development and within the limits of sufficiently extensive and impartially conducted observations" 82. This rule, which Herdan calls the basic law of language, he seeks to interpret and extend in a certain way. "It," Herdan writes about this law, "is an expression of the fact that even here, where human will and freedom of choice are granted the broadest framework, where conscious choice and carefree play alternate with each other, there is considerable stability on the whole... in grammar, but also in relation to the frequency of use of specific phonemes, lexical units (words) and grammatical forms and constructions; in other words, the similarity lies not only in what is used, but also in how often it is used" 83. This situation is due to understandable reasons, but it gives rise to new conclusions. When different texts or segments of a given language are examined, for example, it is found that the relative frequencies of use of a given particular phoneme (or other speech elements) by different people remain basically the same. This leads to the interpretation of individual forms of speech as certain fluctuations of the constant probability of the use of the phoneme in question in the given language. Thus it turns out that in his speech activity a person is subject to certain laws of probability with respect to the number of linguistic elements used. And then, when we observe a huge number of linguistic elements in a large set of texts or speech segments, we get the impression of a causal dependence, in the sense that in this case there is also a determination with respect to the use of certain linguistic elements. In other words, it proves admissible to assert that what seems a causal relation from the intuitive point of view is, quantitatively, a probability 84. It is clear that the larger the total volume of the texts or speech segments examined, the more clearly the stability of the relative frequency of the use of linguistic elements will manifest itself in individual usage as well (the law of large numbers). From this a new general conclusion is drawn: that language is a mass phenomenon and should be treated as such.
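The point about the law of large numbers can be illustrated by a small simulation; the probability value of 0.04 and the use of a pseudo-random generator are assumptions made only for the sake of the demonstration.

```python
import random

def observed_relative_frequency(p, sample_size, rng):
    """Simulate sample_size word occurrences; each one is the element of
    interest with probability p (an idealization of a 'constant probability
    of use')."""
    hits = sum(1 for _ in range(sample_size) if rng.random() < p)
    return hits / sample_size

rng = random.Random(1)
p = 0.04   # assumed underlying probability of some phoneme or word
for n in (100, 1000, 10000, 100000):
    print(n, round(observed_relative_frequency(p, n, rng), 4))
# The observed frequency settles ever closer to 0.04 as n grows
# (the law of large numbers referred to above).
```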

These conclusions, reached on the basis of frequency calculations of the phonetic elements, words and grammatical forms which together constitute a language, are then applied to the "statistical interpretation" of Saussure's division into "language" (la langue) and "speech" (la parole). According to Saussure, "language" is the set of linguistic habits that make communication possible between the members of a given linguistic community. It is a social reality, a "mass phenomenon", obligatory for all people who speak the language. Herdan, as indicated, argues that the members of a single language community are similar to each other not only in that they use the same phonemes, lexical units and grammatical forms, but also in that all these elements are used with the same frequency. Thus his statistical definition of "language" takes the following form: "language" (la langue) is the totality of common linguistic elements plus the relative probability of their use.

This definition of "language" is also the starting point for the corresponding statistical interpretation of "speech", which, according to Saussure, is an individual utterance. Contrasting “language” as a social phenomenon of “speech” as an individual phenomenon, Saussure wrote: “Speech is an individual act of will and understanding, in which it is necessary to distinguish: 1. combinations with which the speaking subject uses the language code in order to express his personal thought; 2. a psychophysical mechanism that allows him to objectify these combinations” 8 5 . Since "language" in linguistic statistics is considered as a set of elements with a certain relative<126>certain probability of their use, insofar as it includes the statistical totality or ensemble (population) as the most essential characteristic and can be considered in this aspect. In accordance with this, "speech" turns into a separate sample taken from "language" as a statistical aggregate. The probability in this case is determined by the ratio of "speech" to "language" (in their "quantitative" understanding), and the distribution of the relative frequency of the use of different elements of the language is interpreted as the result of a collective "choice" (choice) in a certain chronological period of the existence of the language. Understanding that such an interpretation of the differences between “language” and “speech” is nevertheless built on completely different grounds than Saussure’s, Herdan writes in this regard: “This apparently minor modification of Saussure’s concept has the important consequence that “language” ( lalangue) now acquires an essential characteristic in the form of a statistical aggregate (population). This population is characterized by certain relative frequencies or fluctuation probabilities, meaning that each linguistic element belongs to a certain linguistic level. In this case, "speech" (laparole), in accordance with its meaning, turns out to be a term for defining statistical samples taken from "language" as a statistical population. It becomes obvious that the choice (choice) appears here in the form of the ratio of "speech" to "language", being the ratio of a sample taken at random to a statistical aggregate (population). The very order of frequency distribution, as a deposit of the speech activity of a linguistic community over the centuries, is an element of choice (choice), but not of individual choice, as in style, but of collective choice. Using a metaphor, we can talk here about the choice made by the spirit of the language, if we understand by this the principles of linguistic communication, which are in accordance with the complex of mental data of the members of a particular linguistic community. The stability of series is the result of probability (chance)» 8 6 .

A special case of the application of the principle just stated is the delimitation in language of normative phenomena from "exceptions" (deviations). Linguistic statistics asserts that the statistical method makes it possible to eliminate the vagueness that exists in this question and to establish clear criteria for distinguishing between these phenomena. If the norm is understood as a statistical population (in the sense indicated above), and the exception (or error) as a deviation from the frequencies shown by the statistical population, then a quantitative solution of the question suggests itself. It all comes down to a statistical relation between the "population" and the "outlier". If the frequencies observed in an individual sample deviate from the probabilities determined by the statistical population by more than is warranted by a series of sample counts, then we have grounds to conclude that the boundary between "the same" (the norm) and "not the same" (the exception) has been crossed.
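A sketch, under simplifying assumptions, of the kind of criterion just described is given below. The normal approximation to the binomial distribution and the 1.96 threshold are standard statistical conventions chosen for the illustration, not Herdan's own procedure; the numerical figures merely echo the Pushkin example above.

```python
import math

def within_sampling_variation(observed_count, sample_size, population_prob, z=1.96):
    """Return True if the observed frequency can still be treated as the 'norm',
    i.e. it lies within z standard errors of the population probability
    (normal approximation to the binomial; z = 1.96 is roughly a 95% band)."""
    expected = sample_size * population_prob
    std_err = math.sqrt(sample_size * population_prob * (1 - population_prob))
    return abs(observed_count - expected) <= z * std_err

# Illustrative figures: a conjunction expected with probability 1160/30000
p = 1160 / 30000
print(within_sampling_variation(205, 5000, p))   # close to the expected ~193: the norm
print(within_sampling_variation(260, 5000, p))   # far outside the band: a deviation
```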

Quantitative distinctions between "language" and "speech" are also used to distinguish two types of linguistic elements: grammatical and lexical. The starting point for the solution of this problem, which often presents great difficulties from a linguistic point of view, is the assumption that the degree of frequency of grammatical elements differs from that of lexical units. This is allegedly connected with the greater "generality" of grammatical elements, in which they differ from the concepts fixed by lexical units. In addition, grammatical elements are supposedly, as a rule, much smaller in volume: as independent words (these include pronouns, prepositions, conjunctions and auxiliary words) they usually consist of a small number of phonemes, and in the form of "bound forms" of one or two phonemes 87. The smaller the linguistic element, the less its "length" (the quantitative moment) can serve as a defining characteristic, and the more important the "quality" of the phonemes becomes for this purpose. What methods are proposed for solving the problem under consideration? It is solved by appeal to the purely quantitative concept of grammatical load. "Suppose," Herdan writes in this connection, "that we are interested in comparing two languages in this respect. How do we determine with a certain degree of objectivity the 'grammatical load' that a language carries? It is clear that this load will depend on the position of the demarcation line separating grammar from vocabulary. The first consideration that may come to mind is to determine how 'complex' the grammar of the given language is. After all, 'complexity' is a qualitative characteristic, while the concept of 'grammatical load' is a quantitative one. True, the load depends to a certain extent on complexity, but not entirely. A language may be endowed with an extremely complex grammar, yet only a comparatively small part of it may be used in the activity of the language. We define 'grammatical load' as the totality of grammar that a language carries when it is in action, which immediately brings our problem into the realm of structural linguistics in the sense in which the discipline was defined by Saussure. In the following presentation, quantitative methods are used to determine the difference between languages depending on where the boundary separating grammar from vocabulary lies" 88. In other words, differences between languages should in this case be reduced to differences in the numerical relations between grammatical and lexical elements.

The materials at our disposal paint the following picture. In English (only "grammatical words" were taken into account: pronouns or, as they are also called, "substitutes", prepositions, conjunctions and auxiliary verbs), in a segment comprising 78,633 occurrences of all words (1,027 different words), there were 53,102 occurrences of grammatical elements, or more precisely of "grammatical words" (149 different words), which amounts to 67.53% of occurrences with 15.8% of the different words. Such are the data of Dewey 89. Other data show a different percentage ratio: 57.1% with 5.4% of different words 90. This significant discrepancy is explained by the difference between written and spoken language. Written forms of the language (the first set of data) supposedly use more grammatical elements than oral ones (the second case). In Dante's Divine Comedy (from the Italian original) Mariotti established 54.4% of occurrences of "grammatical words".
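A minimal sketch of such a "grammatical load" count follows; the tiny list of English "grammatical words" is an illustrative assumption and not the inventory used by Dewey or Herdan.

```python
import re

# A deliberately small, illustrative list of English "grammatical words";
# a real count would require the full inventory used in the studies cited above.
GRAMMATICAL_WORDS = {
    "i", "you", "he", "she", "it", "we", "they", "this", "that",
    "a", "an", "the", "of", "in", "on", "to", "with", "by", "for",
    "and", "or", "but", "if", "is", "are", "was", "were", "be", "have", "has", "had",
}

def grammatical_load(text):
    """Percentage of word occurrences (and of distinct words) that are grammatical."""
    tokens = re.findall(r"[a-z']+", text.lower())
    gram_tokens = [t for t in tokens if t in GRAMMATICAL_WORDS]
    token_share = 100 * len(gram_tokens) / len(tokens)
    type_share = 100 * len(set(gram_tokens)) / len(set(tokens))
    return round(token_share, 2), round(type_share, 2)

print(grammatical_load("The cat sat on the mat and the dog was in the garden."))
```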

Another and apparently more refined way of determining the grammatical load of a language is to count the phonemes that enter into the grammatical elements. In this case not only independent grammatical words are taken into account, but also bound forms. Various options are possible here: for example, determining the relative frequency of use of individual consonant phonemes in grammatical elements and comparing it with the frequency of the total use of these same phonemes (the final figure of such a ratio for English gives a proportion of 99.9% in grammatical elements to 100.0% in total use); or a similar comparison of consonants by separate classificatory groups (labial, palatal, velar and other phonemes), where the final ratio takes the form of a proportion of 56.47% (in grammatical elements) to 60.25% (in total use); or the same comparison of initial consonant phonemes (in this case the ratio was 100.2% in grammatical words to 99.95% in total use). Other, more complex statistical operations are also possible, which, however, result in similar quantitative expressions of the problem under study.

The quantitative data given above serve as the basis for a general conclusion. It comes down to this: the distribution of phonemes in grammatical elements determines the character of the distribution (in numerical terms, of course) of phonemes in the language as a whole. And this, in turn, allows the conclusion that the use of grammatical elements depends least of all on individual choice and constitutes that part of linguistic expression which is controlled by probability. This speculative conclusion is confirmed by the count of grammatical forms in the Russian language made by Esselson 91. 46,896 words taken from 11 sources (works by Griboyedov, Dostoevsky, Goncharov, Saltykov-Shchedrin, Garshin, Belinsky, Amfiteatrov, Gusev-Orenburgsky, Ehrenburg, Simonov and N. Ostrovsky) were subjected to study. They were divided into colloquial words (17,756 words, or 37.9%) and non-colloquial words (29,140 words, or 62.1%). Then the entire set of words was divided into four groups according to their grammatical character: the 1st group included nouns, adjectives, adjectives in the function of nouns, pronouns and declined numerals; the 2nd group, verbs; the 3rd group, verbal participles, participles in the function of adjectives and nouns, and gerunds; the 4th group, invariable forms: adverbs, prepositions, conjunctions and particles. The summary results (tables with data for individual authors are also given) give the following ratio:

(Table: the shares of the 1st, 2nd, 3rd and 4th grammatical groups among the colloquial and the non-colloquial words.)

Herdan characterizes the consideration of the quantitative data thus obtained in the following words: "They justify the conclusion that grammatical elements should be regarded as a factor determining the probability of linguistic expression. Such a conclusion avoids the burdensome qualification of each word used. It is clear that, since grammar and vocabulary are not kept in watertight compartments, neither is there pure 'choice' or pure 'chance'. Both grammar and vocabulary contain both elements, though in significantly different proportions" 92.

A large section of Herdan's book is devoted to the study of duality in language, and the very concept of duality is based on mathematical characteristics.

Thus, theorems in projective geometry can be arranged in two series, so that each theorem of one series can be obtained from some theorem of the other series by interchanging the words point and line. For example, if the statement "any two different points belong to one and only one line" is given, then from it we can derive the corresponding statement "any two different lines belong to one and only one point". Another method for establishing duality is to plot different planes of the phenomenon under study along the abscissa and the ordinate. Thus, as Yule 93 does, for example, the different frequencies of use are plotted along the abscissa, and along the ordinate the number of lexical units whose frequency is being determined, and so on; it is in this fashion that duality is made to serve linguistic research.

Under the concept of duality thus defined, which in all cases actually has the character of a binary code and which is also considered the most essential feature of linguistic structure, phenomena of extremely different quality are brought together, so long as they admit of opposition along two planes: the distribution of word usage according to the character of the lexical units and the distribution of lexical units according to the frequency of word usage; written and spoken forms of speech; lexical and grammatical elements; synonyms and antonyms; the phoneme and its graphic representation; the defined and the defining (Saussure's signifié and signifiant), and so on.

After a quantitative study of the duality of some particular linguistic phenomenon or limited "text", a conclusion is drawn, as a rule, to which the qualities of linguistic universality are attributed. The nature of such conclusions and the manner in which they are justified can be seen from the example of a study of the duality of the word and the concept (in fact what is at issue is the relation between the length of a word and the scope of a concept; it must be borne in mind that the extremely free use of linguistic and other terms in works of this kind often makes understanding very difficult). It is important to note that the source of observations on this type of linguistic duality was the international nomenclature of diseases (about 1,000 names) and the general register of diseases in England and Wales for 1949. In this case the following general conclusion is drawn: "Every concept denoting a general idea has what may be called a 'sphere' or 'scope'. It allows one, through its medium, to think about many objects or other concepts that lie within its 'sphere'. On the other hand, all the items needed to define a concept constitute what is called its 'content'. Scope and content are mutually correlated: the smaller the content and, accordingly, the more abstract the concept, the larger its sphere or scope, i.e. the more objects are brought under it. This can be seen as an analogy (in the conceptual sphere) to the principles of coding, according to which the length of a symbol and its frequency of use are interdependent" 94.
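The coding analogy invoked here (frequent items receive short symbols) can be illustrated with a toy calculation of average code length; the usage counts and code lengths below are invented purely for the example.

```python
def average_code_length(frequencies, code_lengths):
    """Average number of symbols per word when each word has a fixed-length code,
    weighted by how often the word is used."""
    total = sum(frequencies.values())
    return sum(frequencies[w] * code_lengths[w] for w in frequencies) / total

freq = {"alpha": 60, "beta": 25, "gamma": 10, "delta": 5}   # invented usage counts

short_codes_for_frequent = {"alpha": 1, "beta": 2, "gamma": 3, "delta": 4}
long_codes_for_frequent = {"alpha": 4, "beta": 3, "gamma": 2, "delta": 1}

print(average_code_length(freq, short_codes_for_frequent))  # 1.6: frequent items kept short
print(average_code_length(freq, long_codes_for_frequent))   # 3.4: the same codes misassigned
```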

The principle of duality is applied to particular problems as well, for example to establishing the equivalence of the meanings of words in two different languages. As a result of studying the Muret-Sanders English-German dictionary by the mathematical method of iteration, it is concluded that the probability of an English word being used with one or more meanings in German translation remains constant for each initial letter throughout the entire dictionary 95. Consideration of the arrangement of words in Chinese dictionaries leads to the conclusion that it is of a taxonomic character, since the number of strokes in a character indicates its place (as an independent radical or as a certain subclass subordinated to a radical). Taxonomy is a subordinating principle of classification used in zoology and botany. Herdan maintains that the foundations of Chinese lexicography are also built on the principles of taxonomy 96, and so on.

In making a general assessment of this area of application of mathematical methods to the study of linguistic problems (i.e. of linguistic statistics), one must apparently proceed from the position formulated by Ettinger: "Mathematics can be effectively used in the service of linguistics only when linguists are clear about the real limits of its application, as well as about the capabilities of the mathematical models used" 97. In other words, we can speak of mathematical linguistics when mathematical methods prove their suitability for solving those linguistic problems proper which in their totality constitute the science of language. If this is not the case, then, although new aspects of scientific research may thereby be opened up, we can speak of anything at all, but not of linguistics; here we mean not the various kinds of applied linguistics (of which more will be said below), but scientific, or theoretical, linguistics. Proceeding from this position, it should be noted that, from the point of view of a linguist, much in linguistic statistics is doubtful and even bewildering.

Let us turn to the analysis of only two examples (so as not to clutter the presentation), stipulating that very significant objections can be made to each of them. Take the quantitative distinction between grammatical and lexical units. It turns out that in order to make such a distinction it is necessary to know in advance what belongs to the field of grammar and what to vocabulary, since the "grammatical load" of a language (i.e. the totality of grammatical elements used in speech), as indicated in the passage quoted above, "depends on the position of the demarcation line separating grammar from vocabulary". Without knowing where this line lies, it is impossible to draw the indicated distinction. What, then, is the meaning of the quantitative method of distinguishing the lexical from the grammatical? As for Herdan, however, he does not particularly ponder this question and boldly classifies linguistic elements, assigning to grammatical elements "bound forms", by which, judging from the presentation, external inflection should be understood, and "grammatical words", which include prepositions, conjunctions, auxiliary verbs and pronouns - the latter by virtue of the fact that they are "substitutes". But if we speak only of this quality of pronouns and on this basis assign them to grammatical elements, then obviously such words as "the aforementioned", "the named", "the given", etc. should also be assigned to them, since they too act as substitutes. In connection with the method of separating out grammatical elements used in linguistic statistics, the question naturally arises of how to deal in this case with such "non-formal" grammatical phenomena as word order, tones, zero morphemes and paradigmatic relations (some of which, by the way, are reflected in the very languages studied by mathematical methods). How is the distinction to be drawn in languages with rich internal inflection (as, for example, in the Semitic languages), where inflection not only grammatically modifies the root (the radical) but also gives it lexical existence, since the root without such modifications has no real existence in the language? What should be understood by the grammatical complexity of a language, and by what criterion is it determined? If by the quantitative criterion, which in this case is emphasized in every possible way, then one of the most grammatically complex languages will be English, with such constructions as "I shall have been calling" or "He would have been calling". In these sentences only "call" can be classified as lexical, and everything else must therefore be considered grammatical. What grounds are there for linking the frequency of use of grammatical elements with the generality or abstractness of the meanings of grammatical words? After all, it is quite obvious that the relatively high frequency of use of grammatical elements is determined by their function in the construction of sentences; and as for abstractness of meaning, it is very easy to find a large number of lexical elements that can easily compete with grammatical elements in this respect while being far inferior to them in frequency (for example, being, existence, extension, space, substance, etc.).

An absurdity of a similar kind confronts us in the case of the definition of the duality of the word and the concept. One must have an extremely peculiar understanding of the structural essence of language in order to study it on the basis of a nomenclature of diseases and a hospital register of diseases, which, as indicated above, served as the source material for very important linguistic conclusions. Without dwelling on the completely obscure use of terms that have no linguistic existence, such as the sphere, scope and content of a concept (incidentally, the lexical meaning of a word and the concept denoted by a scientific term are grossly confused here), let us turn to the conclusion drawn in this case. As stated above, we are dealing with the assertion that "scope and content are mutually correlated". The entire course of reasoning that leads to such a conclusion, as well as the manner of mathematical operation with linguistic facts, clearly shows that in this case one very essential quality of language is completely ignored, a quality that upsets all the calculations being carried out: the ability to express the same "content" by linguistic units of different "volume", which, moreover, undoubtedly have different relative frequencies of use. Thus we can designate the same person as Petrov, my acquaintance, he, a Muscovite, a young man, a university employee, my wife's brother, the man we met on the bridge, and so on. In the light of such facts, doubt is cast not only on the particular conclusions which, as was pointed out, are given universal significance, but also on the very expediency of applying quantitative methods to such linguistic problems.

But sometimes linguists are offered conclusions whose validity is not in doubt. Such is the "basic law of language", which consists in the fact that in a language there is a certain stability of its elements and of the relative frequency of their use. The trouble with this kind of discovery, however, is that it has long been known to linguists. It is quite obvious that if the language did not have a certain stability, and each member of a given linguistic community freely varied the elements of the language, then mutual communication would not be possible and the very existence of the language would become meaningless. As for the distribution of the relative frequency of use of individual elements of the language, it has found expression in linguistics in the form of the categories of passive and active vocabulary and grammar, to which L. V. Shcherba paid so much attention. In this case, statistical methods can only help linguists in distributing specific linguistic elements according to the categories of the relative frequency of their use, but they have no grounds to claim the discovery of any new patterns of value for theoretical linguistics.

On the other hand, linguostatistics offers a number of truly "original" conclusions, which are extremely indicative of the nature of the scientific thinking of its adherents. Thus, the "political vocabulary" in the works of Churchill, Benes, Halifax, Stresemann and others is studied with complex statistical methods, and for the non-English-speaking authors translations of their works into English are used in the calculations. The results of the calculations are presented in the form of numerous tables, mathematical formulas and equations. The linguistic interpretation of the quantitative data in this case boils down to the statement that Churchill's use of "political vocabulary" is the most typical (?) for this group of authors, and that Churchill's use of words in cases where he deals with political questions is typical of the English speech group 98.

In another case, after appropriate statistical manipulations, it is concluded that Hitler violated the duality between "language" and "speech", in the quantitative sense of these terms, in the usage of Nazi Germany. A special case of the destruction of this duality is the literal understanding of metaphorical turns of phrase (for example, "pour salt into open wounds"). Nazi Germany branded itself with so many inhuman acts that there is hardly any need to convict it of this linguistic atrocity as well 99. According to Herdan, Marx's definition of language as the immediate reality of thought also leads to a violation of linguistic duality, and the dialectical law of the transition of a phenomenon into its opposite is, in his opinion, the misunderstood linguistic law of the duality of language. Such interpretations speak for themselves.

Finally, a common shortcoming inherent in all the above cases of the quantitative method of studying linguistic material, and thus acquiring a methodological character, is the approach to linguistic elements as a mechanical set of facts absolutely independent of each other; accordingly, if any patterns are established, they concern only the numerical relations of the distribution of autonomous facts, outside their systemic dependencies. True, J. Watmow tries in every possible way to assure us that it is mathematics, better than any kind of linguistic structural analysis, that is capable of revealing the structural features of a language. "Modern mathematics," he writes, "does not concern itself with measurement and calculation, the accuracy of which is limited by their very nature, but primarily with structure. That is why mathematics is highly conducive to the accuracy of language study - to a degree that a separate description, even more limited in nature, is not capable of... Just as in physics mathematical elements are used to describe the physical world, since they are assumed to correspond to elements of the physical world, so in mathematical linguistics the mathematical elements are assumed to correspond to the elements of the world of speech" 101. But such a formulation of the question by no means saves the situation, since at best it can give an analysis of language either as a physical structure, which is still far from sufficient for language and in the final analysis is still of the same mechanistic character, or as a logico-mathematical structure, which transfers language to a different plane, in many respects alien to it. It is not superfluous to note that Watmow foresees the successes of mathematical linguistics only in the future, and as for its real results, he evaluates them in the following words: "...almost all the work done to date by Herdan, Zipf, Yule, Guiraud and others is by no means beyond the reach of criticism from the side of both linguistics and mathematics; it smacks to a considerable degree of amateurishness" 103. Thus, if we do not try to predict the future of mathematical methods in linguistic research, but try to appreciate what we have today, then of necessity we shall have to admit that, in fact, mathematics has so far been limited in the field of linguistics to "measurement and calculation" alone, and has not been able to give a qualitative analysis of language, penetrating into its structure.

Let us try to be as objective as possible. To a certain extent quantitative data, apparently, can be used by linguistics, but only as auxiliary data and mainly in problems that have a practical orientation. With regard to most of the quantitative methods of studying individual linguistic phenomena, the general conclusion of R. Brown is undoubtedly justified: "They can be considered as Herdan considers them, but what is the meaning of all this?" 104. Let us imagine that we ask the question: "What kind of trees are in this garden?" and in response we receive: "There are a hundred trees in this garden." Is this an answer to our question, and does it really make sense? Yet with regard to many linguistic questions, mathematical methods give just such answers.

However, there is a wide field of research activity which uses mainly mathematical methods and at the same time orients them toward linguistic material, and where the expediency of such a combination is beyond doubt. The "meaning" of this research activity, its significance, is determined by the goals to which it aspires; it has already been tested in practice. In this case we are talking about the problems associated with the creation of information machines, of devices for the machine translation of written scientific texts, of the automation of translation of oral speech from one language to another, and with the whole range of tasks that are combined in the linguistic questions of cybernetics. The whole set of such problems is usually given the general name of applied linguistics. It is thus distinguished from so-called mathematical linguistics, which includes those areas of work designated above as stylostatistics and linguistic statistics, although applied linguistics by no means avoids the statistical processing of linguistic material. Perhaps the most important feature of applied linguistics, separating it from mathematical linguistics as outlined above, is that it has the opposite orientation: not mathematics for linguistics, but linguistics (formalized by mathematical methods) for a wide range of practical problems.

There is no need to disclose the content of individual problems that are now included in the extremely wide area of ​​applied linguistics. In contrast to mathematical linguistics, these problems are actively discussed in Soviet linguistic literature and rightly begin to occupy an increasingly prominent place in the scientific problems of research institutes 105 . Thus, they are already well known to our linguistic community. This circumstance, however, does not relieve us of the need to subject them to reflection, in particular, from the point of view of the principles of the science of language. This will undoubtedly help to eliminate the misunderstandings that more and more often arise between representatives of sciences that are very distant from each other and take part in the work on the problems of applied linguistics, and will outline ways for their convergence, on the one hand, and delimitation of areas of research, on the other hand. It goes without saying that the following considerations will represent the point of view of the linguist, and it is necessary that mathematicians not only try to assimilate it, but, in connection with the questions raised, give them their interpretation.

The theoretical linguist cannot in any way be satisfied with the fact that in all cases where language is studied for the purposes set by applied linguistics, the basis is a mathematical model. In accordance with this, observations on the phenomena of language and the results obtained are expressed in the terms and concepts of mathematics, i.e., through mathematical equations and formulas. Let us look at an example for clarity. Condon 106 and Zipf 107 established that the logarithms of the frequency (f) of occurrence of words in a large text lie almost on a straight line when they are plotted against the logarithms of the rank (r) of these words. The equation f = c/r, where c is a constant, reflects this relationship in the limited sense that c/r, for a given value of r, reproduces the observed frequency to a good approximation. The relationship between f and r, expressed by a mathematical formula, is a model of the relationship between the observed values of the frequency of use and the rank of words. This is one of the cases of mathematical modeling.
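Purely as an illustration of the kind of calculation behind this model, here is a minimal sketch in Python; the toy text and the check that the product of rank and frequency stays roughly constant are my own illustrative choices, not material from the studies of Condon or Zipf cited above.

from collections import Counter

# Toy text, purely illustrative; any large text could be substituted here.
text = """the cat sat on the mat the dog sat on the log
the cat saw the dog and the dog saw the cat"""

# Count word frequencies and sort them in descending order.
freqs = Counter(text.split())
ranked = sorted(freqs.values(), reverse=True)

# Zipf's relation f = c / r predicts that r * f is roughly constant.
for r, f in enumerate(ranked, start=1):
    print(f"rank {r:2d}  freq {f:2d}  r*f = {r * f}")

On a text of a few words the constancy is only approximate, which is exactly the limited sense in which the formula "models" the observed frequencies.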

The entire theory of information is based on the mathematical model of the communication process developed by C. Shannon 108. It is defined as "a mathematical discipline devoted to methods of calculating and estimating the amount of information contained in any data and to the study of the processes of storing and transmitting information" (TSB, vol. 51, p. 128). Accordingly, the basic concepts of information theory receive a mathematical expression. Information is measured in binits, or binary units (a code, to which language is likened, with two conventional equally probable signals transmits one binary unit of information with each character transmitted). Redundancy is defined as the difference between the theoretically possible transmitting capacity of any code and the average amount of information actually transmitted, and "redundancy is expressed as a percentage of the total transmitting capacity of the code" 109, etc. In the same way, machine translation requires the algorithmic development of the mapping of elements of one language onto another, and so on 110. These are further cases of modeling.
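As a rough sketch of how such quantities can be estimated (under the strong simplifying assumption of a zero-order model in which characters are treated as independent, which real language of course violates; the sample string is invented), one can compute the entropy per character of a text, compare it with the maximum possible for the observed alphabet, and express the shortfall as redundancy.

import math
from collections import Counter

text = "to be or not to be that is the question"  # invented sample

counts = Counter(text)
total = len(text)

# Empirical entropy per character, assuming characters are independent.
entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())

# Maximum entropy for an alphabet of this size (all symbols equally probable).
max_entropy = math.log2(len(counts))

# Redundancy: the share of the code's capacity not used for information.
redundancy = 1 - entropy / max_entropy
print(f"entropy: {entropy:.2f} bits/char, "
      f"maximum: {max_entropy:.2f} bits/char, "
      f"redundancy: {redundancy:.0%}")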

The use of models can, without any doubt, be of very significant help, in particular, in all likelihood, in solving the problems that applied linguistics sets itself. However, for theoretical linguistics it is very important that an abstract model, as a rule, does not reproduce all the features of a real phenomenon, all its functional qualities. Thus, an architect, before building a house, can create a model of it which reproduces the house being designed down to the smallest detail, and this helps him solve a number of practical questions connected with the construction of the house itself. But such a model of a house, no matter how exact it may be, is devoid of that "function" and that purpose for which all houses are built in general - it is not capable of providing a person with housing. The situation is similar with language, where a model is not always able to reproduce all its qualities. In this case the matter is further complicated by the fact that not linguistic but mathematical criteria are used to build the model. "Mathematical models," writes A. Ettinger, "play an extremely important role in all areas of technology, but since they are a tool of synthesis, their significance for linguistics, which is primarily a historical and descriptive discipline, is naturally limited" 111.

Mathematical modeling of a language is in fact applicable only to its static state, which for a linguist is conventional and in fact stands in direct contradiction to the basic quality of language, whose very form of existence is development. It goes without saying that the static study of a language is by no means excluded from linguistics; it is the basis for compiling normative grammars and dictionaries, descriptive grammars, and practical grammars and dictionaries that serve as guides for the practical study of foreign languages, etc. However, in all such works, which are predominantly applied in nature, linguists consciously limit the field of research and by no means turn a blind eye to other aspects of language 112. In a static examination of language, in particular, such qualities of language connected with its dynamic nature as productivity, dependence on forms of thinking, and broad interaction with cultural, social, political, historical and other factors disappear entirely from the researcher's field of view. Only on the synchronic plane can language be considered as a system of conventional signs or codes, and even this turns out to be completely unjustified as soon as we adopt the dynamic point of view, which is more appropriate to language. It is in the processes of development that such qualities of language are manifested as the motivation of the sign, the polysemy of words, which have no stable boundaries, the non-autonomy of the meaning of a word and its sound shell, and the creative potential of a word connected with its context; and all this stands in sharp contradiction to the main characteristics of a code or sign 113. Obviously, in applied linguistics one can abstract from all these qualities of language and, for practical purposes, be content with, so to speak, a "snapshot" of the language, which is still capable of giving a fairly approximate idea of the mechanism of its functioning. However, each such "snapshot", if it is considered as a fact of language and not as a fact of a system of conventional codes, must be included in the endless process of movement in which language always exists 114. It cannot be studied outside the specific conditions that characterize this movement, which leaves its mark on the given state of the language and determines the potential of its further development. Here there is the same difference as between a snapshot of a person and his portrait painted by the brush of a true artist. In the artist's work we have before us a generalizing image of a person in all the originality not only of his physical appearance but also of his inner spiritual content. From an artistic portrait we can also read the past of the person depicted and determine what he is capable of in his actions. A snapshot, although it can give a more accurate image of the appearance of the original, is devoid of these qualities; it often captures an accidental pimple on the nose and a completely uncharacteristic pose or expression, which ultimately leads to a distortion of the original.

It should be noted that the method of "snapshots" can, of course, also be applied to the facts of language development. But in this case we shall in fact be dealing only with separate states of the language, which in their quantitative characterization turn out to be connected no more closely than a comparative quantitative characterization of different languages. Such quantitative "dynamics" will contain nothing organic, and the connection between the individual states of the language will rest only on a comparison of numerical relations. If here too we resort to an analogy, we may refer to the growth of a child. His development can, of course, be represented in the form of the dynamics of numerical data about his weight, his height, and the changing proportions of the parts of his body, but all these data are absolutely detached from everything that primarily constitutes the individual essence of a person - his character, inclinations, habits, tastes, etc.

Another negative side of the mathematical "modeling" of language is the fact that it cannot serve as the general principle on the basis of which a comprehensive and systematic description of the language can be carried out. A mathematical approach to the phenomena of language alone will not make it possible to answer even such fundamental questions (without which the very existence of the science of language is unthinkable) as: what is language, what phenomena should be classed as properly linguistic, how is a word or a sentence defined, what are the basic concepts and categories of language, etc. Before turning to mathematical methods of studying language, it is necessary to have answers (even if only in the form of a working hypothesis) to all these questions in advance. There is no need to close our eyes to the fact that in all the cases known to us of the study of linguistic phenomena by mathematical methods, all these concepts and categories inevitably had to be accepted as they were defined by traditional or, relatively speaking, qualitative methods.

This feature of mathematical methods in their linguistic application was noted by Spang-Hanssen when he wrote: "It should be borne in mind that observed facts that receive a quantitative expression... have no value if they do not form part of a description, and for linguistic purposes this should be a systematic description, closely related to a qualitative linguistic description and theory" 115. In another paper by Spang-Hanssen we find a clarification of this idea: "Until the possibility of constructing a quantitative system is proved, and as long as there exists a generally accepted qualitative system for a given field of study, frequency counts and other numerical characteristics make no sense from the linguistic point of view" 116. Similar ideas are expressed by Uldall, who connects them, somewhat unexpectedly, with the development of the general theoretical foundations of glossematics: "When a linguist counts or measures, whatever he counts and measures is not in itself determined quantitatively; words, for example, when they are counted, are defined, if they are defined at all, in quite different terms."

Thus it turns out that both in theoretical terms and in their practical application, mathematical methods are directly dependent on linguistic concepts and categories defined by traditional, philological or, as was said above, qualitative methods. For applied linguistics it is important to realize this dependence and, consequently, to become acquainted with the totality of the main categories of traditional linguistics.

True, there is no reason to reproach the representatives of the exact sciences working in the field of applied linguistics with not using the data of modern linguistics. This does not correspond to the actual state of affairs. They not only know perfectly well but also widely use in their work the systems of distinctive features established by linguists for different languages, the distribution and arrangement of linguistic elements within specific language systems, the achievements of acoustic phonetics, etc. But here a very significant reservation is necessary. In fact, the representatives of the exact sciences use the data of only one direction in linguistics - so-called descriptive linguistics, which deliberately set itself apart from the traditional problems of theoretical linguistics, is far from covering the entire field of linguistic research, has, from a properly linguistic point of view, significant methodological shortcomings that led it to the recently revealed crisis 118, and, in addition, has a purely practical orientation corresponding to the interests of applied linguistics. All the reservations and reproaches that were made above against the static consideration of language apply to descriptive linguistics. Such a one-sided approach of descriptive linguistics can, however, be justified only by the tasks that applied linguistics sets itself, but it far from exhausts the entire content of the science of language.

In the process of working out the questions of applied linguistics, new theoretical problems may arise, and in fact have already arisen. Some of these problems are closely connected with the specific tasks of applied linguistics and are aimed at overcoming the difficulties that arise in solving those tasks. Other problems bear directly on theoretical linguistics, permitting a new perspective on traditional ideas or opening up new areas of linguistic research, new concepts and theories. Among the latter, for example, is the problem of creating a "machine" language (or intermediary language), which is most closely connected with a complex set of such cardinal questions of theoretical linguistics as the relationship of concepts and lexical meanings, logic and grammar, diachrony and synchrony, the sign nature of language, the essence of linguistic meaning, the principles of constructing artificial languages, etc. 119. In this case it is especially important to establish mutual understanding and cooperation in the joint work of the representatives of the linguistic disciplines and of the exact sciences. As for the linguistic side, we should apparently not be talking here about limiting in advance the efforts of, for example, the designers of translation machines, and trying to establish the working capabilities of such machines on N. Gribachev's poems or V. Kochetov's prose 120. The machine itself will find the limits of its capabilities, and profitability the limits of its use. But linguists, as their contribution to the common cause, must bring their knowledge of the features of the structure of language, of its versatility, of the intersecting internal relations of its elements, as well as of the broad and multilateral connections of language with physical, physiological, mental and logical phenomena, and of the specific patterns of the functioning and development of language. The totality of this knowledge is necessary for the designers of the respective machines so that they do not wander in wrong directions but make their search purposeful and clearly oriented. Even the very brief review of the cases of applying mathematical methods to linguistic problems that has been made in this essay convinces us that such knowledge will by no means be superfluous for the representatives of the exact sciences.

On the basis of all the above considerations, one can obviously come to some general conclusions.

So, mathematical linguistics? If this means the use of mathematical methods as a universal master key for solving all linguistic problems, then such claims must be recognized as absolutely unjustified. Everything that has been done in this direction has so far contributed very little, or nothing at all, to the solution of the traditional problems of the science of language. At worst, the application of mathematical methods is accompanied by obvious absurdities or is, from the linguistic point of view, absolutely meaningless. At best, mathematical methods can be used as auxiliary methods of linguistic research, placed at the service of specific and limited linguistic tasks. There can be no question here of any "quantitative philosophy of language". At one time physics, psychology, physiology, logic, sociology and ethnology encroached on the independence of the science of language, but they could not subjugate linguistics. The opposite happened: linguistics took advantage of the achievements of these sciences and, to the extent necessary for itself, began to use their help, thereby enriching the arsenal of its research techniques. Now, apparently, it is the turn of mathematics. It is to be hoped that this new association will also contribute to the strengthening of the science of language, to the improvement of its working methods and to the increase of their diversity. It is therefore just as legitimate to speak of mathematical linguistics as of physical linguistics, physiological linguistics, logical linguistics, psychological linguistics, and so on. There are no such separate linguistics; there is only one linguistics, which profitably uses the data of other sciences as auxiliary research tools. Thus, there is no reason to retreat before the onslaught of the new science and readily cede to it the positions it has won. Here it is very appropriate to recall the words of A. Martinet: "Perhaps it is tempting to attach oneself to one or another major movement of thought by using a few well-chosen terms, or to proclaim by some mathematical formula the rigor of one's reasoning. However, the time has come for linguists to realize the independence of their science and to free themselves from that inferiority complex which makes them associate any of their actions with one or another general scientific principle, as a result of which the contours of reality always become only more vague instead of becoming clearer" 121.

Thus: mathematics by itself, and linguistics by itself. This by no means precludes their mutual assistance or a friendly meeting in joint work on common problems. The place of application of the concerted efforts of the two sciences is the whole wide range of problems that make up applied linguistics and are of great national economic importance. One can only wish that in their joint work both sciences show maximum mutual understanding, which will undoubtedly contribute to the maximum fruitfulness of their cooperation.


Interaction of mathematics and linguistics



The interaction of mathematics and linguistics is a multifaceted topic, and in my work I will dwell not on all of its aspects, but first of all on the applied ones.

Chapter I. History of the Application of Mathematical Methods in Linguistics


1.1 The formation of structural linguistics at the turn of the XIX - XX centuries


The mathematical description of language is based on the idea of ​​language as a mechanism, which goes back to the famous Swiss linguist of the early twentieth century, Ferdinand de Saussure.

The initial link of his concept is the theory of language as a system consisting of three parts (language itself - langue, speech - parole, and speech activity - langage), in which each word (each member of the system) is considered not in itself but in connection with the other members. As another prominent linguist, the Dane Louis Hjelmslev, later noted, Saussure "was the first to demand a structural approach to language, that is, a scientific description of the language through the recording of the relationships between units."

Understanding language as a hierarchical structure, Saussure was the first to pose the problem of the value and significance of language units. Separate phenomena and events (say, the history of the origin of individual Indo-European words) should be studied not by themselves, but in a system in which they are correlated with similar components.

Saussure considered the structural unit of language to be the word, the "sign", in which sound and meaning are combined. Neither of these elements exists without the other: therefore the native speaker understands the various shades of meaning of a polysemantic word as a separate element in the structural whole, in the language.

Thus, in the theory of F. de Saussure one can see the interaction of linguistics, on the one hand, with sociology and social psychology (it should be noted that at the same time Husserl's phenomenology, Freud's psychoanalysis and Einstein's theory of relativity were developing, and experiments on form and content were taking place in literature, music and the fine arts), and, on the other hand, with mathematics (the concept of systematicity corresponds to an algebraic conception of language). Such a conception changed the notion of linguistic interpretation as such: phenomena began to be interpreted not in relation to the causes of their occurrence, but in relation to the present and the future. Interpretation ceased to be independent of a person's intentions (even though intentions may be impersonal, "unconscious" in the Freudian sense of the word).

The functioning of the linguistic mechanism is manifested through the speech activity of native speakers. The result of speech is the so-called "correct texts" - sequences of speech units that obey certain patterns, many of which allow mathematical description. The theory of ways to describe the syntactic structure deals with the study of methods for mathematical description of correct texts (primarily sentences). In such a structure, linguistic analogies are defined not with the help of their inherent qualities, but with the help of system (“structural”) relations.

Saussure's ideas were developed in the West by the younger contemporaries of the great Swiss linguist: in Denmark by L. Hjelmslev, already mentioned above, who gave rise to the algebraic theory of language in his work "Fundamentals of Linguistic Theory"; in the USA by E. Sapir, L. Bloomfield and Z. Harris; and in Czechoslovakia by the Russian emigre scholar N. Trubetskoy.

Statistical regularities in the study of language came to be dealt with by none other than the founder of genetics, Gregor Mendel. Only in 1968 did philologists discover that in the last years of his life he had been fascinated by the study of linguistic phenomena using the methods of mathematics. Mendel brought this method to linguistics from biology; in the 1990s only the most daring linguists and biologists claimed the feasibility of such an analysis. In the archive of the monastery of St. Thomas in Brno, whose abbot Mendel was, sheets were found with columns of surnames ending in "mann", "bauer" and "mayer", and with some fractions and calculations. In an effort to discover the formal laws of the origin of family names, Mendel carried out complex calculations in which he took into account the number of vowels and consonants in the German language, the total number of words considered, the number of surnames, etc.

In our country structural linguistics began to develop at about the same time as in the West - at the turn of the 19th-20th centuries. Simultaneously with F. de Saussure, the concept of language as a system was developed in the works of F.F. Fortunatov (Moscow University) and I.A. Baudouin de Courtenay (Kazan University). The latter corresponded with de Saussure for a long time, and the Geneva and Kazan schools of linguistics accordingly collaborated with each other. If Saussure can be called the ideologist of "exact" methods in linguistics, then Baudouin de Courtenay laid the practical foundations for their application. He was the first to separate linguistics (as an exact science using statistical methods and functional dependence) from philology (a community of humanitarian disciplines that study spiritual culture through language and speech). The scientist himself believed that "linguistics can be useful in the near future only when freed from the obligatory union with philology and literary history". Phonology became the "testing ground" for the introduction of mathematical methods into linguistics: sounds as "atoms" of the language system, having a limited number of easily measurable properties, were the most convenient material for formal, rigorous methods of description. Phonology denies the existence of meaning in sound, so the "human" factor was eliminated from these studies. In this sense phonemes are like physical or biological objects.

Phonemes, as the smallest linguistic elements accessible to perception, represent a separate sphere, a separate "phenomenological reality". For example, in English the sound "t" can be pronounced in different ways, but in all cases a person who speaks English will perceive it as "t". The main thing is that the phoneme performs its main - meaning-distinguishing - function. Moreover, the differences between languages are such that varieties of one sound in one language may correspond to different phonemes in another; for example, "l" and "r" in English are different phonemes, while in other languages they are varieties of the same phoneme (like the English "t", pronounced with or without aspiration). The vast vocabulary of any natural language is a set of combinations of a much smaller number of phonemes. In English, for example, only 40 phonemes are used to pronounce and write about a million words.

The sounds of a language are a systematically organized set of features. In the 1920s-1930s, following Saussure, Jakobson and N.S. Trubetskoy singled out the "distinctive features" of phonemes. These features are based on the structure of the organs of speech - the tongue, the teeth, the vocal cords. For example, in English the difference between "t" and "d" is the presence or absence of "voice" (the tension of the vocal cords), and it is this feature that distinguishes one phoneme from the other. Thus, phonology can be considered an example of the general rule of language described by Saussure: "In language there are only differences." Even more important: a difference usually presupposes positive terms between which it holds; but in language there are only differences without positive terms. Whether we take the signifier or the signified, in language there are neither concepts nor sounds that would have existed before the development of the language system.
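A toy illustration of this feature-based view (the feature inventory below is a deliberately reduced invention for the example, not Jakobson's or Trubetskoy's actual system): each phoneme is treated as a bundle of binary features, and two phonemes contrast as soon as they differ in at least one feature.

# Reduced, illustrative feature bundles for a few English consonants.
features = {
    "t": {"voiced": False, "alveolar": True, "stop": True},
    "d": {"voiced": True,  "alveolar": True, "stop": True},
    "s": {"voiced": False, "alveolar": True, "stop": False},
}

def contrast(a, b):
    """Return the set of features on which two phonemes differ."""
    return {f for f in features[a] if features[a][f] != features[b][f]}

print(contrast("t", "d"))  # {'voiced'} - the minimal opposition described above
print(contrast("t", "s"))  # {'stop'}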

Thus, in Saussurean linguistics the phenomenon under study is understood as a set of comparisons and oppositions of language. Language is both an expression of the meaning of words and a means of communication, and these two functions never coincide. We may notice the alternation of form and content: linguistic contrasts define its structural units, and these units interact to create a certain meaningful content. Since the elements of language are arbitrary, neither contrast nor combination alone can be its basis. This means that in a language distinctive features form a phonetic contrast at one level of analysis, phonemes are combined into morphemes, morphemes into words, words into sentences, etc. In any case, an entire phoneme, word, sentence, etc. is more than just the sum of its parts.

Saussure proposed the idea of a new science of the twentieth century, separate from linguistics, that would study the role of signs in society. Saussure called this science semiology (from the Greek "semeion" - sign). The "science" of semiotics, which developed in Eastern Europe in the 1920s-1930s and in Paris in the 1950s-1960s, extended the study of language and linguistic structures to literary works composed (or formulated) with the help of these structures. In addition, at the end of his career, in parallel with his course in general linguistics, Saussure engaged in a "semiotic" analysis of late Roman poetry, trying to discover deliberately composed anagrams of proper names. This method was in many ways the opposite of rationalism in its linguistic analysis: it was an attempt to study in a system the problem of "probability" in language. Such research helps to focus attention on the "real side" of probability; the "key word" for which Saussure looks for an anagram is, as Jean Starobinsky argues, "a tool for the poet, not the source of life of the poem." The poem serves to swap around the sounds of the key word. According to Starobinsky, in this analysis "Saussure does not delve into the search for hidden meanings." On the contrary, in his works a desire to avoid questions connected with consciousness is noticeable: "since poetry is expressed not only in words but also in what these words give rise to, it goes beyond the control of consciousness and depends only on the laws of language" (see Appendix 1).

Saussure's attempt to study proper names in late Roman poetry emphasizes one of the components of his linguistic analysis - the arbitrary nature of signs - as well as the formal essence of Saussure's linguistics, which excludes the possibility of analyzing meaning. Todorov concludes that today Saussure's works look remarkably consistent in their reluctance to study symbols of a phenomenon that have a clearly defined meaning [Appendix 1]. Exploring anagrams, Saussure pays attention only to repetition, not to earlier variants. Studying the Nibelungenlied, he defines the symbols only in order to assign them to erroneous readings: if they are unintentional, the symbols do not exist. After all, in his writings on general linguistics he makes the assumption of the existence of a semiology describing not only linguistic signs; but this assumption is limited by the fact that semiology can describe only random, arbitrary signs.

If this is really so, it is only because he could not imagine "intention" without an object; he could not completely bridge the gap between form and content - in his writings this turned into a question. Instead, he turned to "linguistic legitimacy". Between, on the one hand, nineteenth-century concepts based on history and subjective conjectures, and methods of random interpretation based on these concepts, and, on the other hand, structuralist concepts that erase the opposition between form and content (subject and object), meaning and origins in structuralism, psychoanalysis, and even quantum mechanics, Ferdinand de Saussure's writings on linguistics and semiotics mark a turning point in the study of meanings in language and culture.

Russian scientists were also represented at the First International Congress of Linguists in The Hague in 1928. S. Kartsevsky, R. Yakobson and N. Trubetskoy made a report that examined the hierarchical structure of the language - in the spirit of the most modern ideas for the beginning of the last century. Jakobson in his writings developed Saussure's ideas that the basic elements of a language should be studied, first of all, in connection with their functions, and not with the reasons for their occurrence.

Unfortunately, after Stalin came to power in 1924, Russian linguistics, like many other sciences, was thrown back. Many talented scientists were forced to emigrate, were expelled from the country or died in the camps. It was not until the mid-1950s that a certain pluralism of theories became possible - more on that in Section 1.2.


1.2 Application of mathematical methods in linguistics in the second half of the twentieth century


By the middle of the twentieth century four world linguistic schools had formed, each of which gave rise to a particular "exact" method. The Leningrad phonological school (its founder was L.V. Shcherba, a student of Baudouin de Courtenay) used a psycholinguistic experiment based on the analysis of the speech of native speakers as the main criterion for generalizing a sound as a phoneme.

The scholars of the Prague Linguistic Circle, in particular its founder N.S. Trubetskoy, who had emigrated from Russia, developed the theory of oppositions: the semantic structure of the language was described by them as a set of oppositionally constructed semantic units - semes. This theory was applied in the study not only of language but also of artistic culture.

The ideologists of American descriptivism were the linguists L. Bloomfield and E. Sapir. For the descriptivists, language was a set of speech utterances, which were the main object of their study. Their focus was on the rules of scientific description (hence the name) of texts: the study of the organization, arrangement and classification of their elements. The formalization of analytical procedures in the field of phonology and morphology (the development of principles for studying language at different levels, distributional analysis, the method of immediate constituents, etc.) led to the formulation of general questions of linguistic modeling. Inattention to the content plane of language, as well as to the paradigmatic side of language, did not allow the descriptivists to interpret language fully enough as a system.

In the 1960s the theory of formal grammars developed, mainly owing to the work of the American philosopher and linguist N. Chomsky. He is rightfully considered one of the most famous modern scientists and public figures; many articles, monographs and even a full-length documentary film are devoted to him. After the fundamentally new way of describing syntactic structure invented by Chomsky - generative grammar - the corresponding trend in linguistics came to be called generativism.

Chomsky, a descendant of immigrants from Russia, studied linguistics, mathematics and philosophy at the University of Pennsylvania from 1945, being strongly influenced by his teacher Zellig Harris; like Harris, Chomsky considered and still considers his political views close to anarchism (he is still known as a critic of the existing US political system and as one of the spiritual leaders of anti-globalism).

Chomsky's first major scientific work, his master's thesis Morphology of Modern Hebrew (1951), remained unpublished. Chomsky received his doctorate from the University of Pennsylvania in 1955, but much of the research underlying his dissertation (published in full only in 1975 under the title The Logical Structure of Linguistic Theory) and his first monograph, Syntactic Structures (1957, Rus. trans. 1962), was performed at Harvard University in 1951-1955. In the same 1955, the scientist moved to the Massachusetts Institute of Technology, where he became a professor in 1962.

Chomsky's theory has gone through several stages in its development.

In his first monograph, "Syntactic Structures", the scientist presented language as a mechanism for generating an infinite set of sentences using a finite set of grammatical means. To describe linguistic properties he proposed the concepts of deep grammatical structure (hidden from direct perception and generated by a system of recursive rules, i.e., rules that can be applied repeatedly) and surface grammatical structure (directly perceived), as well as transformations that describe the transition from deep structures to surface ones. Several surface structures can correspond to one deep structure (for example, the passive construction "The decree is signed by the president" is derived from the same deep structure as the active construction "The president signs the decree"), and vice versa (for example, the ambiguity of "Mother loves daughter" is described as the result of the coincidence of surface structures that go back to two different deep structures: in one the mother is the one who loves the daughter, and in the other she is the one whom the daughter loves).
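The "finite means, infinite sentences" idea, together with the active/passive example above, can be sketched very crudely as follows; the mini-grammar and the passivization rule here are invented toys for illustration, not Chomsky's actual formalism.

import random

# A tiny set of rewrite rules: finitely many rules, infinitely many derivable
# sentences (the recursive S -> S "and" S rule makes the language infinite).
grammar = {
    "S":  [["NP", "VP"], ["S", "and", "S"]],
    "NP": [["the", "president"], ["the", "decree"]],
    "VP": [["signs", "NP"], ["reads", "NP"]],
}

def generate(symbol="S", depth=0):
    """Expand a symbol into a word list using the rewrite rules."""
    if symbol not in grammar:                 # terminal word
        return [symbol]
    # Past a small depth, prefer the first (non-recursive) rule so the
    # random derivation always terminates.
    rules = grammar[symbol] if depth < 2 else grammar[symbol][:1]
    out = []
    for sym in random.choice(rules):
        out.extend(generate(sym, depth + 1))
    return out

def passivize(words):
    """Toy transformation, valid only for a simple 'X signs Y' clause."""
    i = words.index("signs")
    return words[i + 1:] + ["is", "signed", "by"] + words[:i]

print(" ".join(generate()))  # a randomly generated sentence
active = ["the", "president", "signs", "the", "decree"]
print(" ".join(passivize(active)))  # -> "the decree is signed by the president"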

Chomsky's standard theory is considered to be the "Aspects" model set forth in Chomsky's book "Aspects of the Theory of Syntax". In this model, for the first time, rules of semantic interpretation were introduced into formal theory, attributing meaning to deep structures. In Aspects, linguistic competence is opposed to the use of language (performance), the so-called Katz-Postal hypothesis about the preservation of meaning during transformation is adopted, in connection with which the concept of optional transformation is excluded, and an apparatus of syntactic features describing lexical compatibility is introduced.

In the 1970s Chomsky worked on the theory of government and binding (GB theory - from the words "government" and "binding"), which is more general than the previous one. In it the scientist abandoned specific rules describing the syntactic structures of particular languages. All transformations were replaced by one universal movement transformation. Within the framework of GB theory there are also separate modules, each of which is responsible for its own part of the grammar.

More recently, in 1995, Chomsky put forward the Minimalist Program, in which human language is described like a machine language. It is only a program, not a model or a theory. In it Chomsky identifies two main subsystems of the human language faculty: the lexicon and the computational system, as well as two interfaces - phonetic and logical.

Chomsky's formal grammars have become classic for describing not only natural but also artificial languages ​​- in particular, programming languages. The development of structural linguistics in the second half of the 20th century can rightfully be considered a "Chomskian revolution".

The Moscow Phonological School, whose representatives were A.A. Reformatsky, V.N. Sidorov, P.S. Kuznetsov, A.M. Sukhotin and R.I. Avanesov, used a similar theory to study phonetics. Gradually "exact" methods began to be applied not only to phonetics but also to syntax. Both linguists and mathematicians, here and abroad, began to study the structure of language. In the 1950s and 60s a new stage in the interaction between mathematics and linguistics began in the USSR, associated with the development of machine translation systems.

The impetus for the beginning of this work in our country was given by the first developments in the field of machine translation in the United States (although the first mechanized translation device, by P.P. Smirnov-Troyansky, had been invented in the USSR back in 1933, it was primitive and did not become widespread). In 1947 A. Booth and D. Britten devised a code for word-by-word translation using a computer; a year later R. Richens proposed a rule for splitting words into stems and endings in machine translation. The computers of those years were quite different from today's: they were very large and expensive machines that occupied entire rooms and required a large staff of engineers, operators and programmers to maintain them. These computers were used mainly to carry out mathematical calculations for the needs of military institutions - the new developments in mathematics, physics and technology served, first of all, military affairs. In the early stages the development of MT was actively supported by the military; at the same time (in the conditions of the Cold War) the Russian-English direction developed in the USA and the English-Russian direction in the USSR.

In January 1954 the Georgetown experiment, the first public demonstration of translation from Russian into English on the IBM-701 machine, was carried out jointly by Georgetown University and IBM in New York. An abstract reporting the successful experiment, prepared by D.Yu. Panov, appeared in the abstracts journal "Mathematics", 1954, No. 10: "Translation from one language to another using a machine: a report on the first successful test."

D.Yu. Panov (at that time director of the Institute of Scientific Information, INI, later VINITI) brought I.K. Belskaya, who later headed the machine translation group at the Institute of Precision Mechanics and Computer Engineering of the USSR Academy of Sciences, into the work on machine translation. The first experiment in translating from English into Russian with the help of the BESM machine dates back to the end of 1955. Programs for the BESM were written by N.P. Trifonov and L.N. Korolev, whose Ph.D. thesis was devoted to methods of constructing dictionaries for machine translation.

In parallel, work on machine translation was carried out at the Department of Applied Mathematics of the Mathematical Institute of the USSR Academy of Sciences (now the M.V. Keldysh Institute of Applied Mathematics of the Russian Academy of Sciences) at the initiative of the mathematician A.A. Lyapunov. He involved O.S. Kulagina and her students T.D. Wentzel and N.N. Ricco in this work. The ideas of Lyapunov and Kulagina about the possibility of using technology to translate from one language into another were published in the journal Priroda ("Nature"), 1955, No. 8. From the end of 1955 they were joined by T.N. Moloshnaya, who then began independent work on an English-Russian translation algorithm.

R. Frumkina, who at that time was working on an algorithm for translation from Spanish, recalls that at this stage of the work it was difficult to take any consistent steps; much more often one had to rely on heuristic experience - one's own or that of colleagues.

However, the first generation of machine translation systems was very imperfect. All of them were based on sequential "word-by-word" and "phrase-by-phrase" translation algorithms; semantic connections between words and sentences were not taken into account in any way. Consider, for example, the sentences: "John was looking for his toy box. Finally he found it. The box was in the pen. John was very happy." Here "pen" is not a writing instrument but a playpen, yet knowledge of synonyms, antonyms and figurative meanings is difficult to put into a computer. A promising direction was the development of computer systems designed for use by a human translator.
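A sketch of why such word-by-word translation fails on the example just quoted (the mini-dictionary, with Russian equivalents given in Latin transliteration, is invented for the illustration): each English word is simply replaced by a single stored equivalent, so "pen" comes out as the writing instrument no matter what the context says.

# Invented one-equivalent-per-entry dictionary; real first-generation systems
# were far larger but shared the same "one word in, one word out" limitation.
dictionary = {
    "the": "", "box": "korobka", "was": "byla", "in": "v", "pen": "ruchka",
    "john": "dzhon", "very": "ochen", "happy": "schastliv",
}

def word_by_word(sentence):
    """Replace each word by its stored equivalent, ignoring all context."""
    words = sentence.lower().strip(".").split()
    return " ".join(dictionary.get(w, w) for w in words if dictionary.get(w, w))

print(word_by_word("The box was in the pen."))
# -> "korobka byla v ruchka": 'pen' is rendered as the writing instrument
#    (and case agreement is wrong), because no semantic context is consulted.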

Over time, direct translation systems were replaced by T-systems (from the English word "transfer" - transformation), in which translation was carried out at the level of syntactic structures. The algorithms of T-systems used a mechanism that made it possible to build a syntactic structure according to the grammar rules of the language of the input sentence (similar to how a foreign language is taught in high school), and then synthesize the output sentence by transforming the syntactic structure and substituting the necessary words from the dictionary.
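The transfer idea can be sketched, again very schematically (the toy parse, the reordering rule and the mini-lexicon are my own assumptions for illustration): the input is first analyzed into a small syntactic structure, the structure rather than the raw word string is rearranged, and only then are target-language words substituted.

# Toy "transfer" for a noun phrase: English "determiner adjective noun"
# is reordered for a French-like target where the adjective follows the noun.

def parse_np(words):
    """Assume the toy pattern: determiner, adjective, noun."""
    det, adj, noun = words
    return {"det": det, "adj": adj, "noun": noun}

def transfer(np):
    """Syntactic transfer rule: move the adjective after the noun."""
    return [np["det"], np["noun"], np["adj"]]

def substitute(words, lexicon):
    """Only after restructuring are target-language words inserted."""
    return " ".join(lexicon.get(w, w) for w in words)

lexicon = {"the": "la", "red": "rouge", "box": "boite"}  # invented mini-lexicon
structure = parse_np(["the", "red", "box"])
print(substitute(transfer(structure), lexicon))  # -> "la boite rouge"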

Lyapunov spoke of translation by extracting the meaning of the text to be translated and presenting it in another language. The approach to building machine translation systems based on obtaining a semantic representation of the input sentence by semantic analysis and then synthesizing the output sentence from the semantic representation obtained is still considered the most advanced. Such systems are called I-systems (from the word "interlingua"). However, the task of creating them, set as far back as the late 1950s and early 1960s, has not been fully solved to this day, despite the efforts of the International Federation for Information Processing (IFIP), the world community of scientists in the field of information processing.

Scientists thought about how to formalize and build algorithms for working with texts, what dictionaries should be entered into the machine, what linguistic patterns should be used in machine translation. Traditional linguistics did not have such ideas - not only in terms of semantics, but also in terms of syntax. At that time, there were no lists of syntactic constructions for any language, the conditions for their compatibility and interchangeability were not studied, the rules for constructing large units of syntactic structure from smaller constituent elements were not developed.

The need to create the theoretical foundations of machine translation led to the formation and development of mathematical linguistics. The leading role in this in the USSR was played by the mathematicians A.A. Lyapunov, O.S. Kulagina and V.A. Uspensky and the linguists V.Yu. Rosenzweig, P.S. Kuznetsov, R.M. Frumkina, A.A. Reformatsky, I.A. Melchuk and V.V. Ivanov. Kulagina's dissertation was devoted to the formal theory of grammars (simultaneously with the work of N. Chomsky in the USA); Kuznetsov put forward the task of the axiomatization of linguistics, which goes back to the works of F.F. Fortunatov.

On May 6, 1960, the Decree of the Presidium of the USSR Academy of Sciences "On the development of structural and mathematical methods for the study of language" was adopted, and corresponding divisions were created at the Institute of Linguistics and the Institute of the Russian Language. Since 1960 the leading humanities universities of the country - the philological faculty of Moscow State University, Leningrad and Novosibirsk universities, the Moscow State Institute of Foreign Languages - have offered training in the field of automatic text processing.

However, machine translation works of this period, called "classical", are of theoretical rather than practical interest. Cost-effective machine translation systems began to be created only in the eighties of the last century. I will talk about this later in Section 2.1, Machine Translation.

The 1960s and 70s saw deep theoretical developments using the methods of set theory and mathematical logic, such as field theory and the theory of fuzzy sets.

The author of field theory in linguistics was the Soviet poet, translator and linguist V.G. Admoni. He initially developed his theory on the basis of the German language. For Admoni, the concept of "field" denotes an arbitrary non-empty set of linguistic elements (for example, "lexical field", "semantic field").

The structure of the field is heterogeneous: it consists of a core, whose elements have the complete set of features that define the set, and a periphery, whose elements can have both some (not all) of the features of the given set and features of neighboring sets. I will give an example illustrating this statement: in English, the field of compound words ("day-dream") is difficult to separate from the field of phrases ("tear gas").

The theory of fuzzy sets already mentioned above is closely related to field theory. In the USSR it was developed by the linguists V.G. Admoni, I.P. Ivanova and G.G. Pocheptsov; its originator, however, was the American mathematician L. Zadeh, who in 1965 published the article "Fuzzy Sets". Giving a mathematical justification for the theory of fuzzy sets, Zadeh considered them on the basis of linguistic material.

In this theory, we are talking not so much about the belonging of an element to a given set (x ∈ A) as about the degree of this membership (μA(x)), since peripheral elements can belong to several fields to one degree or another. Zadeh (Lotfi Zadeh) was a native of Azerbaijan; until the age of 12 he communicated in four languages - Azerbaijani, Russian, English and Persian - and used three different alphabets: Cyrillic, Latin and Arabic. When the scientist is asked what fuzzy set theory and linguistics have in common, he does not deny the connection, but clarifies: "I am not sure that the study of these languages had a big impact on my thinking. If this was the case, then only subconsciously." In his youth Zadeh studied at a Presbyterian school in Tehran, and after World War II he emigrated to the United States. "The question is not whether I am an American, Russian, Azerbaijani or someone else," he said in one of his conversations, "I am shaped by all these cultures and peoples and feel quite comfortable among each of them." In these words there is something akin to what characterizes the theory of fuzzy sets - a departure from unambiguous definitions and sharp categories.
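
The degree-of-membership idea is easy to illustrate with a small Python sketch; the fields and the numerical degrees below are my own illustrative assumptions, chosen only to echo the compound-word example above.

field_of_compound_words = {
    "day-dream": 0.9,   # core element: clearly a compound word
    "tear gas": 0.4,    # peripheral element: shares features with free phrases
}
field_of_phrases = {
    "tear gas": 0.6,
    "toy box": 0.9,
}

def membership(field, element):
    # Degree (from 0 to 1) to which an element belongs to a given fuzzy set.
    return field.get(element, 0.0)

print(membership(field_of_compound_words, "tear gas"))   # 0.4
print(membership(field_of_phrases, "tear gas"))          # 0.6
# "tear gas" belongs to both fields at once, only to different degrees.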

In our country, the 1970s saw the translation and study of the works of twentieth-century Western linguists. I.A. Melchuk translated the works of N. Chomsky into Russian. N.A. Slyusareva, in her book "The Theory of F. de Saussure in the Light of Modern Linguistics", connected the postulates of Saussure's teaching with the topical problems of the linguistics of the 1970s. There was a tendency towards further mathematization of linguistics: the leading domestic universities trained personnel in the specialty "Mathematical (theoretical, applied) linguistics". At the same time, the West saw a sharp leap in the development of computer technology, which required ever new linguistic foundations.

In the 1980s, Yu.K. Lekomtsev, a professor at the Institute of Oriental Studies of the Academy of Sciences, analyzing the language of linguistics through the schemes, tables and other types of notation used in linguistic descriptions, considered mathematical systems suitable for these purposes (mainly systems of matrix algebra).

Thus, throughout the twentieth century there was a convergence of the exact sciences and the humanities. The interaction of mathematics with linguistics increasingly found practical applications. More on this in the next chapter.

Chapter 2. Selected examples of the use of mathematics in linguistics

2.1 Machine translation

The idea of translating from one language into another with the help of a universal mechanism arose several centuries before the first developments in this area began: back in 1629, René Descartes proposed the idea of a language in which the equivalent ideas of different languages would be expressed by one symbol. The first attempts to implement this idea in the 1930s-40s, the beginning of theoretical developments in the middle of the century, the improvement of translation systems with the help of technology in the 1970s-80s, and the rapid development of translation technology in the last decade - these are the stages in the development of machine translation as an industry. It is from the work on machine translation that computational linguistics as a science has grown.

With the development of computer technology in the late 70s and early 80s, researchers set themselves more realistic and cost-effective goals: the machine became not a competitor (as had previously been assumed), but an assistant to a human translator. Machine translation ceased to serve exclusively military tasks (all Soviet and American inventions and research, focused primarily on Russian and English, had contributed to the Cold War in one way or another). In 1978, natural-language words were transmitted over the ARPANET, and six years later the first microcomputer translation programs appeared in the United States.

In the 1970s, the Commission of the European Communities bought the English-French version of the Systran computer translator, also ordering the French-English and Italian-English versions, and acquired the Russian-English translation system that had been used by the US Armed Forces. This is how the foundations of the EUROTRA project were laid.

The revival of machine translation in the 1970s-80s is attested by the following facts: the Commission of the European Communities (CEC) bought the English-French version of Systran, as well as the translation system from Russian into English (the latter was developed after the ALPAC report and continued to be used by the US Air Force and NASA); in addition, the CEC ordered the development of the French-English and Italian-English versions. Simultaneously, machine translation activities expanded rapidly in Japan; in the USA, the Pan American Health Organization (PAHO) ordered the development of a Spanish-English direction (the SPANAM system); the US Air Force funded the development of a machine translation system at the Linguistic Research Center at the University of Texas at Austin; the TAUM group in Canada made notable progress in developing its METEO system for translating weather reports. A number of projects begun in the 1970s and 80s subsequently developed into full-fledged commercial systems.

During the period 1978-93, 20 million dollars were spent on research in the field of machine translation in the USA, 70 million in Europe, and 200 million in Japan.

One of the newer developments is TM (translation memory) technology, which works on the principle of accumulation: during the translation process the original segment (sentence) and its translation are saved, gradually forming a linguistic database; if an identical or similar segment is found in a newly translated text, it is displayed along with its translation and an indication of the percentage match. The translator then makes a decision (to edit, reject or accept the translation), and the result is stored by the system, so there is no need to translate the same sentence twice. A well-known commercial system based on TM technology is TRADOS (founded in 1984).
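
The accumulation principle is simple enough to show in a short Python sketch. This is only an illustration of the idea: the matching method (a standard string-similarity ratio) and the threshold are my own assumptions and say nothing about the internals of TRADOS or any other product.

from difflib import SequenceMatcher

memory = []   # accumulated (source segment, translated segment) pairs

def store(source, translation):
    # Save a segment pair once the translator has accepted it.
    memory.append((source, translation))

def lookup(segment, threshold=0.7):
    # Return the closest stored pair and its match ratio, or None.
    best = None
    for src, tgt in memory:
        ratio = SequenceMatcher(None, segment.lower(), src.lower()).ratio()
        if ratio >= threshold and (best is None or ratio > best[2]):
            best = (src, tgt, ratio)
    return best

store("The box was in the pen.", "Коробка была в манеже.")
match = lookup("The box was in the pen")
if match:
    src, tgt, ratio = match
    print("{:.0%} match: {}".format(ratio, tgt))
# prints something like "98% match: Коробка была в манеже."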

Currently, several dozen companies are developing commercial machine translation systems, including Systran, IBM, L&H (Lernout & Hauspie), Transparent Language, Cross Language, Trident Software, Atril, Trados, Caterpillar Co., LingoWare, Ata Software, Linguistica b.v. and others. Automatic translators can now also be used directly on the Web: alphaWorks, PROMT's Online Translator, LogoMedia.net, AltaVista's Babel Fish Translation Service, InfiniT.com, Translating the Internet.

Commercially effective translation systems appeared in the second half of the 80s in our country as well. The very concept of machine translation expanded (it came to include "the creation of a number of automatic and automated systems and devices that automatically or semi-automatically perform the entire translation cycle or individual tasks in a dialogue with a person"), and government appropriations for the development of this field increased.

Russian, English, German, French and Japanese became the main languages of domestic translation systems. The All-Union Translation Center (VTsP) developed a system for translating from English and German into Russian on the ES-1035 computer - ANRAP. It consisted of three dictionaries - input English and German and output Russian - under a single software shell. There were several interchangeable specialized dictionaries - for computer technology, programming, radio electronics, mechanical engineering, agriculture and metallurgy. The system could work in two modes - automatic and interactive, when the screen displayed the source text and the translation phrase by phrase, which a person could edit. The speed of text translation in ANRAP (from the beginning of typing to the end of printing) was approximately 100 pages per hour.

In 1989, a family of commercial translators of the SPRINT type was created, working with Russian, English, German and Japanese. Their main advantage was compatibility with the IBM PC - thus domestic machine translation systems reached the international level of quality. At the same time, FRAP, a system of machine translation from French into Russian, was developed, which included four stages of text analysis: graphematic, morphological, syntactic and semantic. At the Herzen Leningrad State Pedagogical Institute (LGPI), work was underway on the four-language (English, French, Spanish, Russian) SILOD-MP system (the English-Russian and French-Russian dictionaries were used in industrial mode).

For specialized translation of texts on electrical engineering there was the ETAP-2 system. The analysis of the input text in it was carried out at two levels - morphological and syntactic. The ETAP-2 dictionary contained about 4,000 entries; the text transformation stage used about 1,000 rules (96 general, 342 particular, the rest dictionary rules). All this ensured a satisfactory quality of translation (say, the patent title "Optical phase grid arrangement and coupling device having such an arrangement" was translated as "An optical phase grid device and a connecting device with such a device" - despite the tautology, the meaning is preserved).

At the Minsk Pedagogical Institute of Foreign Languages, a system for machine translation of titles was developed on the basis of an English-Russian dictionary of word forms and phrases; at the Institute of Oriental Studies of the Academy of Sciences, a system for translating from Japanese into Russian was created. The first automatic vocabulary and terminology service (SLOTHERM) for computing and programming, created at the Moscow Research Institute of Automation Systems, contained approximately 20,000 terms in an explanatory dictionary, as well as special dictionaries for linguistic research.

Machine translation systems gradually began to be used not only for their intended purpose, but also as an important component of automatic learning systems (for teaching translation, checking spelling and grammatical knowledge).

The 1990s brought with them the rapid development of the PC market (from desktop to pocket) and of information technology, as well as the widespread use of the Internet (which was becoming ever more international and multilingual). All this made the further development of automated translation systems much in demand. Since the early 1990s, domestic developers have also entered the market of PC translation systems.

In July 1990, the first commercial machine translation system in Russia, called PROMT (PROgrammer's Machine Translation), was presented at the PC Forum in Moscow. PROMT won a NASA competition for the supply of machine translation systems (it was the only non-American company in that competition). In 1992, PROMT released a whole family of systems under the new name STYLUS for translation from English, German, French, Italian and Spanish into Russian and from Russian into English, and in 1993 the world's first machine translation system for Windows was created on the basis of STYLUS. In 1994, STYLUS 2.0 for Windows 3.X/95/NT was released, and in 1995-1996 the third generation of machine translation systems, the fully 32-bit STYLUS 3.0 for Windows 95/NT, appeared; at the same time the development of completely new, world-first Russian-German and Russian-French machine translation systems was successfully completed.

In 1997, an agreement was signed with the French company Softissimo on the creation of translation systems from French into German and English and vice versa, and in December of that year the world's first German-French translation system was released. In the same year, the PROMT company released a system implemented using the Giant technology, which supports several language directions in one shell, as well as WebTranSite, a special translator for working on the Internet.

In 1998, a whole constellation of programs was released under the new name PROMT 98. A year later, PROMT released two new products: a unique software package for working on the Internet - PROMT Internet, and a translator for corporate mail systems - PROMT Mail Translator. In November 1999, PROMT was recognized as the best machine translation system tested by the French magazine PC Expert, outperforming its competitors by 30 percent. Special server solutions have also been developed for corporate clients - the corporate translation server PROMT Translation Server (PTS) and the Internet solution PROMT Internet Translation Server (PITS). In 2000, PROMT updated its entire line of software products by releasing a new generation of MT systems: PROMT Translation Office 2000, PROMT Internet 2000 and Magic Gooddy 2000.

Online translation with the support of the PROMT system is used on a number of domestic and foreign sites: PROMT's Online Translator, InfiniT.com, Translate.Ru, Lycos, etc., as well as in institutions of various profiles for translating business documents, articles and letters (there are translation systems built directly into Outlook Express and other email clients).

Nowadays, new machine translation technologies are emerging based on the use of artificial intelligence systems and statistical methods. About the latter - in the next section.

2.2 Statistical methods in language learning

Considerable attention in modern linguistics is given to the study of linguistic phenomena using the methods of quantitative mathematics. Quantitative data often help to more deeply comprehend the phenomena under study, their place and role in the system of related phenomena. The answer to the question "how much" helps to answer the questions "what", "how", "why" - such is the heuristic potential of a quantitative characteristic.

Statistical methods play a significant role in the development of machine translation systems (see Section 2.1). In the statistical approach, the translation problem is considered in terms of a noisy channel. Imagine that we need to translate a sentence from English into Russian. The noisy channel principle offers the following explanation of the relationship between an English and a Russian sentence: the English sentence is nothing but a Russian sentence distorted by some kind of noise. In order to recover the original Russian sentence, we need to know what people usually say in Russian and how Russian phrases are distorted into English. The translation is carried out by searching for the Russian sentence that maximizes the product of the unconditional probability of the Russian sentence and the probability of the English (original) sentence given that Russian sentence. According to Bayes' theorem, this Russian sentence is the most likely translation of the English one:

e* = argmax_e P(e) · P(f | e), where e is the translation sentence and f is the original sentence.

So we need a source model and a channel model - or, in other words, a language model and a translation model. The language model must assign a probability score to any sentence in the target language (in our case, Russian), and the translation model must assign a probability to the original sentence given a candidate translation (see Table 1).

In general, a machine translation system operates in two modes:

1. System training: a training corpus of parallel texts is taken, and, using linear programming, those values of the translation correspondence tables are sought that maximize the probability of (for example) the Russian part of the corpus given the available English part according to the selected translation model. A model of the Russian language is built on the Russian part of the same corpus.

2. Operation: for an unfamiliar English sentence, based on the data obtained in training, a Russian sentence is sought that maximizes the product of the probabilities assigned by the language model and the translation model. The program used for such a search is called a decoder (a minimal sketch of this selection step is given below).
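
To make the noisy-channel selection step concrete, here is a minimal illustrative sketch in Python (not the code of any system described above): the toy sentences, the probabilities and the transliterated Russian word forms are invented assumptions, and both models are given as simple lookup tables.

    # Hypothetical toy models: a language model P(e) over Russian sentences
    # and a translation model P(f | e) for the English original given each candidate.
    language_model = {
        "belyi dom": 0.4,     # invented probability of the Russian sentence
        "dom belyi": 0.1,
        "belaya doma": 0.05,
    }
    translation_model = {
        ("white house", "belyi dom"): 0.5,    # P(f | e)
        ("white house", "dom belyi"): 0.5,
        ("white house", "belaya doma"): 0.2,
    }

    def decode(f, candidates):
        """Return the candidate e maximizing P(e) * P(f | e), as in the noisy-channel model."""
        return max(
            candidates,
            key=lambda e: language_model.get(e, 0.0) * translation_model.get((f, e), 0.0),
        )

    print(decode("white house", list(language_model)))   # -> "belyi dom"

In a real system, of course, both models would be estimated from large corpora, as described in the training mode above.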

The simplest statistical translation model is the literal (word-for-word) translation model. In this model it is assumed that, to translate a sentence from one language into another, it is enough to translate all the words (to create a "bag of words"), and the model will provide their placement in the correct order. To reduce P(a, f | e) to P(a | e, f), i.e. to the probability of a given alignment for a given pair of sentences, each probability P(a, f | e) is normalized by the sum of the probabilities of all alignments of the given pair of sentences: P(a | e, f) = P(a, f | e) / Σa' P(a', f | e).

The Viterbi training algorithm used for Model 1 proceeds as follows:

1. The entire table of translation correspondence probabilities is filled with the same values.

2. For all possible variants of pairwise connections of the words (alignments), the probability P(a, f | e) is calculated (in Model 1, essentially the product of the word-for-word translation probabilities of the aligned pairs).

3. The values ​​of P(a, f | e) are normalized to obtain the values ​​of P(a | e, f).

4. The frequency of each translation pair is calculated, weighted by the probability of each alignment option.

5. The resulting weighted frequencies are normalized and form a new table of translation correspondence probabilities.

6. The algorithm is repeated from step 2.

Consider, as an example, the training of a similar model on a corpus of two pairs of sentences (Fig. 2):

[Fig. 2: a toy corpus of two parallel sentence pairs - "white house" and "house" with their translations.]

After a large number of iterations, we will get a table (Table 2), which shows that the translation is carried out with high accuracy.
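
As an illustration of steps 1-6 above, the following minimal sketch trains such a model on a toy corpus of two sentence pairs. The transliterated Russian word forms and the fixed number of iterations are assumptions made for the example, not data from the experiment cited here.

    from collections import defaultdict

    # Toy parallel corpus; the transliterated Russian word forms are invented.
    corpus = [
        (["white", "house"], ["belyi", "dom"]),
        (["house"], ["dom"]),
    ]

    # Step 1: fill the translation correspondence table t(f | e) with identical values.
    e_vocab = {e for es, _ in corpus for e in es}
    f_vocab = {f for _, fs in corpus for f in fs}
    t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}

    for _ in range(10):                        # step 6: repeat from step 2
        counts = defaultdict(float)
        totals = defaultdict(float)
        for es, fs in corpus:
            for f in fs:
                # steps 2-3: probability of each alignment link, normalised over the pair
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    weighted = t[(f, e)] / z   # step 4: frequency weighted by alignment probability
                    counts[(f, e)] += weighted
                    totals[e] += weighted
        # step 5: renormalise the weighted frequencies into a new translation table
        t = {(f, e): counts[(f, e)] / totals[e] for (f, e) in counts}

    # After the iterations, "dom" strongly prefers "house" and "belyi" prefers "white".
    for pair, prob in sorted(t.items(), key=lambda kv: -kv[1]):
        print(pair, round(prob, 3))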

Also, statistical methods are widely used in the study of vocabulary, morphology, syntax, and style. Scientists from Perm State University conducted a study based on the assertion that stereotypical phrases are an important "building material" of the text. These phrases consist of "nuclear" repeated words and dependent words-specifiers and have a pronounced stylistic coloring.

In the scientific style, the "nuclear" words include: research, study, task, problem, question, phenomenon, fact, observation, analysis, etc. In journalism, other words are "nuclear" - words that have an increased value specifically for a newspaper text: time, person, power, business, action, law, life, history, place, etc. (29 in total).

Of particular interest to linguists is also the professional differentiation of the national language, the peculiarities of the use of vocabulary and grammar depending on the type of occupation. It is known that drivers in professional speech use the stress variant shófer (instead of the standard shofyór, "driver"), and physicians say kóklyush instead of koklyúsh ("whooping cough"); many more such examples could be given. The task of statistics is to track the variability of pronunciation and changes in the language norm.

Professional differences lead not only to grammatical but also to lexical differences. At Yakut State University named after M.K. Ammosov, 50 questionnaires were analyzed with the most common reactions to certain stimulus words among physicians and builders (Table 3).

Table 3

Stimulus word | Physicians                                   | Builders
human         | patient (10), personality (5)                | man (5)
good          | help (8), help (7)                           | evil (16)
life          | death (10)                                   | lovely (5)
death         | corpse (8)                                   | life (6)
fire          | heat (8), burn (6)                           | fire (7)
finger        | hand (14), panaritium (5)                    | large (7), index (6)
eyes          | vision (6), pupil, ophthalmologist (5 each)  | brown (10), large (6)
head          | mind (14), brains (5)                        | big (9), smart (8), smart (6)
lose          | consciousness, life (4 each)                 | money (5), find (4)

It can be noted that physicians more often than builders give associations related to their professional activities, since the stimulus words given in the questionnaire have more to do with their profession than with the profession of a builder.

Statistical regularities in a language are used to create frequency dictionaries - dictionaries that give numerical characteristics of the frequency of words (word forms, phrases) of some language - the language of a particular writer, of a particular work, and so on. Usually the number of occurrences of a word in a text of a certain length is used as the characteristic of its frequency.
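
Such a frequency dictionary is straightforward to compute automatically; the minimal sketch below, with an invented sample text, simply counts the occurrences of each word form.

    import re
    from collections import Counter

    text = "the house the white house and the dog"   # hypothetical sample text

    # Tokenize into word forms and count occurrences: the core of a frequency dictionary.
    word_forms = re.findall(r"[a-zа-яё]+", text.lower())
    frequency_dictionary = Counter(word_forms)

    for word, freq in frequency_dictionary.most_common():
        print(word, freq)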

The model of speech perception is impossible without a dictionary as its essential component. In the perception of speech, the basic operational unit is the word. From this it follows, in particular, that each word of the perceived text must be identified with the corresponding unit of the listener's (or reader's) internal vocabulary. It is natural to assume that from the very beginning the search is limited to certain subdomains of the dictionary. According to most modern theories of speech perception, the actual phonetic analysis of the sounding text in a typical case provides only partial information about the possible phonological appearance of the word, and this kind of information corresponds not to one word but to a certain set of dictionary words; therefore, two problems arise:

(a) select the appropriate set according to certain parameters;

(b) within the bounds of the outlined set (if it is outlined adequately), to "eliminate" all words except the one that best corresponds to the given word of the recognized text. One of the "elimination" strategies is to exclude low-frequency words (a minimal sketch of this selection is given below). It follows that the vocabulary for speech perception is a frequency dictionary. The creation of a computer version of a frequency dictionary of the Russian language is the initial task of the project presented here.
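
Here is a minimal sketch of this two-step selection, assuming that the internal vocabulary is a frequency dictionary and that phonetic analysis supplies only the initial segment of the word form; the entries, the frequencies and the frequency threshold are invented.

    # Hypothetical internal vocabulary: word form -> frequency of occurrence.
    vocabulary = {
        "doma": 350, "dorogoi": 120, "dozhd": 90, "dolgo": 60, "dotsent": 5,
    }

    def recognize(initial_segment, vocabulary, min_freq=10):
        # (a) outline the candidate set using the partial phonetic information
        candidates = {w: f for w, f in vocabulary.items() if w.startswith(initial_segment)}
        if not candidates:
            return None
        # (b) "eliminate" low-frequency words and keep the most frequent remaining candidate
        frequent = {w: f for w, f in candidates.items() if f >= min_freq}
        return max(frequent or candidates, key=lambda w: candidates[w])

    print(recognize("do", vocabulary))   # -> "doma"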

For the Russian language, there are five frequency dictionaries (not counting specialized ones). Let us note only some general shortcomings of the existing dictionaries.

All known frequency dictionaries of the Russian language are based on processing arrays of written (printed) texts. Partly for this reason, the identity of a word is largely based on formal, graphic coincidence, and semantics is not sufficiently taken into account. As a result, the frequency characteristics are shifted and distorted; for example, if the compiler of a frequency dictionary includes uses of the combination "drug druga" ("each other") in the general statistics of the word "drug" ("friend"), this is hardly justified: given the semantics, we must admit that these are already different words, or rather, that the combination as a whole is an independent dictionary unit.

Also, in all existing dictionaries words are placed only in their basic forms: nouns in the nominative singular, verbs in the infinitive, etc. Some of the dictionaries give information about the frequency of word forms, but usually not consistently or exhaustively enough. The frequencies of different word forms of the same word obviously do not coincide. The developer of a speech perception model must take into account that in a real perceptual process it is precisely a specific word form "immersed" in the text that is subject to recognition: based on the analysis of the initial section of the exponent of the word form, a set of words with an identical beginning is formed, and the initial section of the word form is not necessarily identical to the initial section of the dictionary form. It is the word form that has a specific rhythmic structure, which is also an extremely important parameter for the perceptual selection of words. Finally, in the final representation of the recognized utterance the words are again represented by the corresponding word forms.

There are many works that demonstrate the importance of frequency in the process of speech perception. But we are not aware of works where the frequency of word forms would be used - on the contrary, all authors practically ignore the frequency of individual word forms, referring exclusively to lexemes. If the results obtained by them are not considered artifacts, one has to assume that the native speaker somehow has access to information about the ratio of the frequencies of word forms and dictionary forms, i.e., in fact, lexemes. Moreover, such a transition from a word form to a lexeme, of course, cannot be explained by natural knowledge of the corresponding paradigm, since frequency information must be used before the final identification of the word, otherwise it simply loses its meaning.

According to the primary statistical characteristics, it is possible to determine with a given relative error that part of the vocabulary, which includes words with a high frequency of occurrence, regardless of the type of text. It is also possible, by introducing stepwise ordering into the dictionary, to obtain a series of dictionaries covering the first 100, 1000, 5000, etc. of frequent words. The statistical characteristics of the dictionary are of interest in connection with the semantic analysis of vocabulary. The study of subject-ideological groups and semantic fields shows that lexical associations are supported by semantic links that are concentrated around lexemes with the most common meaning. The description of meanings within the lexico-semantic field can be carried out by identifying words with the most abstract lexemes in meaning. Apparently, "empty" (from the point of view of nominative potencies) dictionary units constitute a statistically homogeneous layer.

Vocabularies for individual genres are no less valuable. Studying the measure of their similarity and the nature of statistical distributions will provide interesting information about the qualitative stratification of vocabulary depending on the sphere of speech use.

Compilation of large frequency dictionaries requires the use of computer technology. The introduction of partial mechanization and automation into the process of working on a dictionary is of interest as an experiment in the machine processing of dictionaries for different texts. Such a dictionary requires a more rigorous system for processing and accumulating vocabulary material. In miniature, this is an information retrieval system that is able to provide information about various aspects of the text and the vocabulary. Some basic queries to this system are planned from the very beginning: the total number of inventory words, the statistical characteristics of a single word and of entire dictionaries, the ordering of the frequent and rare zones of the dictionary, and so on. The machine card file makes it possible to automatically build reverse dictionaries for individual genres and sources. Much other useful statistical information about the language will be extracted from the accumulated array of data. The computer frequency dictionary creates an experimental basis for the transition to a more extensive automation of lexicographic work.

The statistical data of frequency dictionaries can also be widely used in solving other linguistic problems - for example, in analyzing and determining the active means of word formation of the modern Russian language, solving issues of improving graphics and spelling, which are related to taking into account statistical information about the vocabulary (it is important to take into account probabilistic characteristics of grapheme combinations, types of letter combinations realized in words), practical transcription and transliteration. The statistical parameters of the dictionary will also be useful in solving problems of automating typing, recognition and automatic reading of literal text.

Modern explanatory dictionaries and grammars of the Russian language are built mainly on literary and artistic texts. There are frequency dictionaries of the language of A.S. Pushkin, A.S. Griboedov, F.M. Dostoevsky, V.S. Vysotsky and many other authors. At the Department of History and Theory of Literature of Smolensk State Pedagogical University, work has been going on for a number of years to compile frequency dictionaries of poetic and prose texts. For this study the following were selected: frequency dictionaries of all of Pushkin's lyrics and of two more poets of the golden age - "Woe from Wit" by Griboyedov and all of Lermontov's poetry; of Pasternak and five other poets of the Silver Age - Balmont 1894-1903, "Poems about the Beautiful Lady" by Blok, "Stone" by Mandelstam, "Pillar of Fire" by Gumilyov, "Anno Domini MCMXXI" by Akhmatova and "My Sister, Life" by Pasternak; and of four more poets of the Iron Age - "Poems of Yuri Zhivago" and "When It Clears Up" (both by Pasternak), the entire corpus of lyrics by M. Petrovs, "The Road Is Far Away", "Windscreen", "Farewell to the Snow" and "Horseshoes" by Mezhirov, "Antiworlds" by Voznesensky and "Snowballs" by Rylenkov.

It should be noted that these dictionaries are different in nature: some represent the vocabulary of one dramatic work, others - books of lyrics, or several books, or the entire corpus of the poet's poems. The results of the analysis presented in this paper should be taken with caution, they cannot be taken as an absolute. However, with the help of special measures, the difference in the ontological nature of texts can be reduced to a certain extent.

In recent years, the opposition between colloquial and book speech has become more and more clearly realized. This issue is especially sharply discussed among methodologists who demand a turn in teaching towards the spoken language. However, the specificity of colloquial speech still remains unexplained.

Dictionaries were processed by creating a user application in the environment of the EXCEL97 office program. The application includes four worksheets of the EXCEL book - "Title Sheet", "Dictionaries" sheet with initial data, "Proximities" and "Distances" with results, as well as a set of macros.

The initial information is entered on the "Dictionaries" sheet. Dictionaries of the studied texts are written into EXCEL cells, the last column S is formed from the results obtained and is equal to the number of words found in other dictionaries. The tables "Proximity" and "Distances" contain calculated measures of proximity M, correlation R and distance D.

Application macros are event-based programming procedures written in Visual Basic for Application (VBA). Procedures are based on VBA library objects and their processing methods. So, for operations with worksheets of the application, the key object Worksheet (worksheet) and the corresponding method of activating the sheet Activate (activate) are used. Setting the range of the analyzed source data on the Dictionary sheet is performed by the Select method of the Range object (range), and the transfer of words as values ​​to variables is performed as the Value property (value) of the same Range object.

Despite the fact that rank correlation analysis makes us cautious about claiming thematic dependence between different texts, most of the most frequent words in each text have matches in one or more other texts. Column S shows the number of such words among the 15 most frequent words of each author. Words in bold type occur in only one poet's list in our table. Blok, Akhmatova and Petrovs have no highlighted words at all: their S = 15. These three poets have the same 15 most frequent words, differing only in their place in the list. But even Pushkin, whose vocabulary is the most original, has S = 8 and 7 highlighted words.
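
The kind of comparison described here - the number of shared words among the fifteen most frequent (column S) and a rank correlation between the dictionaries - can be sketched as follows. The two word lists are invented, and Spearman's coefficient is taken as one possible rank correlation measure, with the positions in the full lists used as ranks.

    # Hypothetical top-15 frequency lists (most frequent first) of two poets.
    poet_a = ["night", "day", "heart", "soul", "love", "sky", "earth", "word",
              "life", "death", "light", "star", "hand", "god", "blood"]
    poet_b = ["love", "night", "soul", "heart", "sky", "word", "life", "sun",
              "death", "earth", "eyes", "day", "song", "star", "dream"]

    common = [w for w in poet_a if w in poet_b]
    s = len(common)                      # analogue of column S

    # Spearman rank correlation over the common words; ranks are positions in each full list.
    n = len(common)
    d2 = sum((poet_a.index(w) - poet_b.index(w)) ** 2 for w in common)
    rho = 1 - 6 * d2 / (n * (n ** 2 - 1))

    print("S =", s, "rho =", round(rho, 3))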

The results show that there is a certain layer of vocabulary that concentrates the main themes of poetry. As a rule, these words are short: of the total number (225) of word usages, 88 are monosyllabic, 127 disyllabic and 10 trisyllabic. Often these words represent the main mythologemes and can fall into pairs: night - day, earth - sky (sun), God - man (people), life - death, body - soul, Rome - world (in Mandelstam); they can be combined into mythologemes of a higher level: sky, star, sun, earth. In man, as a rule, the body, heart, blood, arm, leg, cheek and eyes stand out. Of human states, preference is given to sleep and love. The house and cities - Moscow, Rome, Paris - belong to the human world. Creativity is represented by the lexemes word and song.

Griboedov and Lermontov have almost no words denoting nature among their most frequent words; instead, they have three times as many words denoting man, the parts of his body and the elements of his spiritual world. In Pushkin and the poets of the twentieth century, the designations of man and nature are approximately equal. In this important aspect of the subject matter, we can say that the twentieth century followed Pushkin.

The minimal theme deed is found among the most frequent words only in Griboyedov and Pushkin; in Lermontov and the poets of the twentieth century it gives way to the minimal theme word. The word does not exclude deeds (the biblical interpretation of the theme: in the New Testament, all the teaching of Jesus Christ is regarded as the word of God or the word of Jesus, and the apostles sometimes call themselves ministers of the Word). The sacred meaning of the lexeme word is convincingly manifested, for example, in Pasternak's line "And the image of the world, revealed in the Word." The sacred meaning of the lexeme word, in conjunction and in contrast with human deeds, is convincingly manifested in Gumilyov's poem of the same name.

Tokens that are found only in one text characterize the originality of a given book or a collection of books. For example, the word "mind" is the most frequent in Griboedov's comedy "Woe from Wit" - but it does not occur among the frequency words of other texts. The theme of the mind is by far the most significant in comedy. This lexeme accompanies the image of Chatsky, and the name of Chatsky is the most frequent in comedy. Thus, the work organically combines the most frequent common noun with the most frequent proper name.

The highest correlation coefficient connects the themes of the tragic books "The Pillar of Fire" by Gumilyov and "Anno Domini MCMXXI" by Akhmatova. Among the 15 most frequent nouns, there are 10 common ones, including blood, heart, soul, love, word, sky. Recall that Akhmatova's book included a miniature "You will not be alive ...", written between the arrest of Gumilyov and his execution.

The themes of the candle and the crowd in the studied material are found only in the "Poems of Yuri Zhivago". The theme of the candle in the verses from the novel has many contextual meanings: it is associated with the image of Jesus Christ, with the themes of faith, immortality, creativity and the lovers' meeting. The candle is the most important source of light in the central scenes of the novel. The theme of the crowd develops in connection with the main idea of the novel, in which the private life of a person with its unshakable values is opposed to the immorality of the new state, built on the principle of pleasing the crowd.

The work also involves a third stage, also reflected in the program: the calculation of the difference in the ordinal numbers of words common to two dictionaries and of the average distance between the same words of the two dictionaries (a minimal sketch of this calculation is given below). This stage makes it possible to move from the general trends in the interaction of dictionaries, identified with the help of statistics, to a level approaching the text. For example, the books of Gumilyov and Akhmatova correlate statistically significantly. We look at which words turned out to be common to their dictionaries and, first of all, choose those for which the difference of ordinal numbers is minimal or equal to zero. It is these words that have the same rank number, and, consequently, it is these minimal themes that are equally important in the minds of the two poets. Next, one should move to the level of texts and contexts.
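
A minimal sketch of this third stage, on invented data: for every word common to two frequency dictionaries, the difference of its ordinal numbers is computed, together with the average distance and the words whose ranks coincide most closely.

    # Hypothetical frequency dictionaries of two poets, most frequent word first.
    dict_a = ["night", "heart", "soul", "love", "sky", "word", "blood", "death"]
    dict_b = ["love", "soul", "heart", "night", "word", "sky", "death", "fire"]

    # For every word common to both dictionaries, the difference of its ordinal numbers.
    differences = {
        w: abs(dict_a.index(w) - dict_b.index(w))
        for w in dict_a if w in dict_b
    }
    average_distance = sum(differences.values()) / len(differences)

    # Words with zero (or minimal) difference point to minimal themes equally
    # important for both poets; they are the first candidates for contextual analysis.
    closest = sorted(differences, key=differences.get)
    print(differences, round(average_distance, 2), closest[:3])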

Quantitative methods also help to study the characteristics of peoples - native speakers. Say, there are 6 cases in Russian, there are no cases in English, and in some languages ​​of the peoples of Dagestan, the number of cases reaches 40. L. Perlovsky in his article “Consciousness, Language and Culture” correlates these characteristics with the tendency of peoples to individualism or collectivism, with perception of things and phenomena separately or in connection with others. After all, it was in the English-speaking world (there are no cases - the thing is perceived “by itself”) that such concepts as individual freedom, liberalism and democracy appeared (I note that I use these concepts only in connection with the language, without any evaluative characteristics). Despite the fact that such guesses still remain only at the level of bold scientific hypotheses, they help to look at already familiar phenomena in a new way.

As we can see, quantitative characteristics can be applied in completely different areas of linguistics, which increasingly blurs the boundaries between "exact" and "humanitarian" methods. Linguistics is increasingly resorting to the help of not only mathematics, but also computer technology to solve its problems.

2.3 Learning the language by methods of formal logic

With non-quantitative methods of mathematics, in particular, with logic, modern theoretical linguistics interacts no less fruitfully than with quantitative ones. The rapid development of computer technologies and the growth of their role in the modern world required a revision of the approach to the interaction of language and logic in general.

The methods of logic are widely used in the development of formalized languages, in particular programming languages, whose elements are symbols (akin to mathematical ones), chosen (or constructed from previously selected symbols) and interpreted in a certain way that is not tied to any "traditional" use, understanding or function of the same symbols in other contexts. A programmer constantly deals with logic in his work. The point of programming is precisely to teach the computer to reason (in the broadest sense of the word), and the methods of such "reasoning" turn out to be very different. Every programmer spends a certain amount of time looking for bugs in his own and other people's programs - that is, searching for errors in reasoning, in logic - and this, too, leaves its mark: it becomes much easier to detect logical errors in ordinary speech as well. The relative simplicity of the languages studied by logicians allows them to elucidate the structures of these languages more clearly than is achievable by linguists who analyze exclusively complex natural languages. Since the languages studied by logicians use relations copied from natural languages, logicians are able to make significant contributions to the general theory of language. The situation here is similar to that in physics: the physicist also formulates theorems for ideally simplified cases that do not occur in nature at all - laws for ideal gases and ideal liquids, motion in the absence of friction, and so on. For these idealized cases, simple laws can be established that greatly contribute to the understanding of what really happens and of what would probably remain unknown to physics if it tried to consider reality directly, in all its complexity.

In the study of natural languages, logical methods are used so that language learners do not simply "memorize" as many words as possible but better understand the structure of the language. L. Shcherba used in his lectures a sentence built according to the laws of the Russian language - "Glokaya kuzdra shteko budlanula bokra i kudryachit bokryonka" - and then asked his students what it meant. Despite the fact that the meaning of the words in the sentence remained unclear (they simply do not exist in Russian), it was possible to answer clearly: "kuzdra" is the subject, a feminine noun in the singular, nominative case; "bokr" is animate; and so on. The translation of the phrase comes out something like this: "Something feminine did something in one go to some male creature, and then began to do something long and gradual to its cub." A similar example of an (artistic) text made of non-existent words, built entirely according to the laws of the language, is Lewis Carroll's "Jabberwocky" (in "Through the Looking-Glass" Carroll, through the mouth of his character Humpty Dumpty, explains the meaning of the invented words, here cited in their Russian rendering: "varkalos" - eight o'clock in the evening, when it is already time to cook dinner; "khlivkie" - flimsy and dexterous; "shoryok" - a cross between a ferret, a badger and a corkscrew; "pyryatsya" - to jump, dive, spin; "nava" - the grass under the sundial (it extends a little to the right, a little to the left and a little back); "khryukotat" - to grunt and laugh; "zelyuk" - a green turkey; "myumzik" - a bird whose feathers are dishevelled and stick out in all directions, like a broom; "mova" - far from home).

One of the main concepts of modern logic and theoretical linguistics - used in the study of the languages of various logico-mathematical calculi and of natural languages, in describing the relations between languages of different "levels", and in characterizing the relation between the languages under consideration and the subject areas described with their help - is the concept of a metalanguage. A metalanguage is a language used to express judgments about another language, the object language. With the help of a metalanguage one studies the structure of the sign combinations (expressions) of the object language, proves theorems about its expressive properties, about its relation to other languages, and so on. The language being studied is also called the subject language in relation to this metalanguage. Both the subject language and the metalanguage can be ordinary (natural) languages. The metalanguage may differ from the object language (for example, in an English textbook for Russians, Russian is the metalanguage and English the object language), but it may also coincide with it or differ only partially, for example in special terminology (Russian linguistic terminology is an element of the metalanguage for describing the Russian language; the so-called semantic factors are part of the metalanguage for describing the semantics of natural languages).

The concept of "metalinguage" has become very fruitful in connection with the study of formalized languages ​​that are built within the framework of mathematical logic. Unlike formalized subject languages, in this case the metalanguage, by means of which the metatheory is formulated (studying the properties of the subject theory formulated in the subject language), is, as a rule, an ordinary natural language, in some special way a limited fragment of a natural language that does not contain any kind of ambiguity. , metaphors, "metaphysical" concepts, etc. elements of ordinary language that prevent its use as a tool for accurate scientific research. At the same time, the metalanguage itself can be formalized and (regardless of this) become the subject of research carried out by means of the metametalanguage, and such a series can be “thought” as growing indefinitely.

Logic teaches us a fruitful distinction between the language-object and the metalanguage. The language-object is the very subject of logical research, and the metalanguage is that inevitably artificial language in which such research is conducted. Logical thinking just consists in formulating the relations and structure of a real language (object language) in the language of symbols (metalanguage).

The metalanguage must in any case be "no poorer" than its object language (that is, for each expression of the latter there must be a name, a "translation", in the metalanguage); otherwise, if this requirement is not met (which is certainly the case in natural languages, unless special agreements provide otherwise), semantic paradoxes (antinomies) arise.

As more and more new programming languages were created, in connection with the problem of building translators for them, there arose an urgent need to create metalanguages. At present, the Backus-Naur form metalanguage (abbreviated BNF) is most commonly used to describe the syntax of programming languages. It is a compact form consisting of formulas similar to mathematical ones. For each concept of the language there is a unique metaformula (normal formula), consisting of a left and a right part. The left part specifies the concept being defined, and the right part specifies the set of admissible language constructs that are combined into this concept. The formula uses special metacharacters in the form of angle brackets, which contain the defined concept (in the left part of the formula) or a previously defined concept (in its right part); the separation of the left and right parts is indicated by the metacharacter "::=", whose meaning is equivalent to the words "is by definition". Metalinguistic formulas are embedded in translators in some form; with their help, the constructs used by the programmer are checked for formal compliance with the constructs that are syntactically valid in this language. There are also separate metalanguages of various sciences - thus, knowledge exists in the form of various metalanguages.
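
As an illustration, here is a sketch of a tiny grammar written as BNF metaformulas (in the comments) together with a naive checker of the kind a translator might embed; the grammar for unsigned integers is a textbook example, not taken from any particular language standard.

    # BNF metaformulas for a toy concept:
    #   <digit>   ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    #   <integer> ::= <digit> | <digit><integer>

    DIGITS = set("0123456789")

    def is_integer(s: str) -> bool:
        """Check formal compliance of a string with the <integer> metaformula."""
        if not s:
            return False
        head, tail = s[0], s[1:]
        if head not in DIGITS:                    # <digit> must match
            return False
        return tail == "" or is_integer(tail)     # <digit> or <digit><integer>

    print(is_integer("2025"), is_integer("20a5"))   # True False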

Logical methods also served as the basis for the creation of artificial intelligence systems built on the concept of connectionism. Connectionism is a special trend in philosophy whose subject is questions of knowledge. Within the framework of this trend, attempts are made to explain the intellectual abilities of man using artificial neural networks. Composed of a large number of structural units similar to neurons, with a weight assigned to each element that determines the strength of its connection with other elements, neural networks are simplified models of the human brain. Experiments with neural networks of this kind have demonstrated their ability to learn to perform such tasks as pattern recognition, reading, and the identification of simple grammatical structures.
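
A minimal sketch of such a unit - a single artificial "neuron" with weighted connections, trained by error correction on an invented two-feature pattern (the logical AND) - illustrates the connectionist idea of learning, without claiming to model the brain.

    # Training data for a toy pattern: output 1 only when both input features are present.
    samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

    weights = [0.0, 0.0]     # connection strengths of the two inputs
    bias = 0.0
    rate = 0.1               # learning rate

    for _ in range(20):                      # repeated presentations of the samples
        for (x1, x2), target in samples:
            output = 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
            error = target - output
            # error-correction learning: adjust the weights of the connections
            weights[0] += rate * error * x1
            weights[1] += rate * error * x2
            bias += rate * error

    print(weights, bias)
    print([(x, 1 if weights[0] * x[0] + weights[1] * x[1] + bias > 0 else 0)
           for x, _ in samples])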

Philosophers began to take an interest in connectionism, as the connectionist approach promised to provide an alternative to the classical theory of the mind and the idea widely held within this theory that the workings of the mind are similar to the processing of symbolic language by a digital computer. This concept is very controversial, but in recent years it has found more and more supporters.

The logical study of language continues Saussure's concept of language as a system. The fact that it is constantly continuing confirms once again the boldness of scientific conjectures of the beginning of the last century. I will devote the last section of my work to the prospects for the development of mathematical methods in linguistics today.

2.4 Prospects for the application of mathematical methods in linguistics

In the era of computer technology, the methods of mathematical linguistics have received a new development perspective. The search for solutions to the problems of linguistic analysis is now increasingly being implemented at the level of information systems. At the same time, automation of the process of processing linguistic material, providing the researcher with significant opportunities and advantages, inevitably puts forward new requirements and tasks for him.

The combination of "exact" and "humanitarian" knowledge has become fertile ground for new discoveries in the field of linguistics, computer science and philosophy.

Machine translation from one language to another remains a rapidly growing branch of information technology. Despite the fact that computer-assisted translation can never be compared in quality to human translation (especially for literary texts), the machine has become an indispensable assistant to a person in translating large volumes of text. It is believed that in the near future more advanced translation systems will be created, based primarily on the semantic analysis of the text.

An equally promising direction is the interaction of linguistics and logic, which serves as a philosophical foundation for understanding information technology and the so-called "virtual reality". In the near future, work will continue on the creation of artificial intelligence systems - although, again, it will never be equal to the human in its capabilities. Such competition is meaningless: in our time, the machine should become (and becomes) not a rival, but an assistant to man, not something from the realm of fantasy, but part of the real world.

The study of the language by statistical methods continues, which makes it possible to more accurately determine its qualitative properties. It is important that the most daring hypotheses about language find their mathematical, and therefore logical, proof.

The most significant thing is that the various branches of the application of mathematics in linguistics, previously quite isolated, have in recent years been correlated with one another, joining into a coherent system - by analogy with the language system discovered a century ago by Ferdinand de Saussure and Ivan Baudouin de Courtenay. Such is the continuity of scientific knowledge.

Linguistics in the modern world has become the foundation for the development of information technology. As long as computer science remains a rapidly developing branch of human activity, the union of mathematics and linguistics will continue to play its role in the development of science.

Conclusion

Over the 20th century, computer technologies have come a long way - from military to peaceful use, from a narrow range of goals to penetration into all branches of human life. Mathematics as a science found ever new practical significance with the development of computer technology. This process continues today.

The previously unthinkable "tandem" of "physicists" and "lyricists" has become a reality. The full interaction of mathematics and computer science with the humanities requires qualified specialists on both sides. Computer scientists increasingly need systematic humanitarian knowledge (linguistic, cultural, philosophical) in order to comprehend the changes in the reality around them and in the interaction of man and technology, to develop ever new linguistic and mental concepts, and to write programs; in turn, anyone working in the humanities today must, for the sake of professional growth, master at least the basics of working with a computer.

Mathematics, being closely interconnected with informatics, continues to develop and interact with natural sciences and the humanities. In the new century, the trend towards the mathematization of science is not weakening, but, on the contrary, is increasing. On the basis of quantitative data, the laws of the development of the language, its historical and philosophical characteristics are comprehended.

Mathematical formalism is most suitable for describing patterns in linguistics (as, indeed, in other sciences, both the humanities and the natural sciences). The situation in science sometimes develops in such a way that without an appropriate mathematical language it is impossible to understand the nature of a physical, chemical or other process. Creating the planetary model of the atom, the famous British physicist of the twentieth century E. Rutherford ran into mathematical difficulties. At first his theory was not accepted: it did not sound convincing, and the reason for this was Rutherford's ignorance of probability theory, on the basis of which alone it was possible to understand the model representation of atomic interactions. Realizing this, the already outstanding scientist and Nobel Prize winner enrolled in the seminar of the mathematician Professor Lamb and for two years, together with the students, attended the course and worked through a practicum on probability theory. On its basis, Rutherford was able to describe the behavior of the electron, giving his structural model convincing accuracy and gaining recognition. The same holds for linguistics.

This begs the question: what is so mathematical in objective phenomena, thanks to which they can be described in the language of mathematics, in the language of quantitative characteristics? It is homogeneous units of matter distributed in space and time. It is those sciences that have gone further than others toward isolating such homogeneity that turn out to be better suited to the use of mathematics.

The Internet, which rapidly developed in the 1990s, brought together representatives of various countries, peoples and cultures. Despite the fact that English continues to be the main language of international communication, the Internet has become multilingual in our time. This led to the development of commercially successful machine translation systems that are widely used in various fields of human activity.

Computer networks have become an object of philosophical reflection - ever new linguistic, logical and worldview concepts have been created to help make sense of "virtual reality". In many works of art, scenarios were created - more often pessimistic ones - of the domination of machines over man and of virtual reality over the outside world. Such forecasts have by no means always turned out to be meaningless. Information technology is not only a promising field for the investment of human knowledge; it is also a means of control over information and, consequently, over human thought.

This phenomenon has both a negative and a positive side. It is negative because control over information contradicts the inalienable human right of free access to it. It is positive because the absence of such control can lead to catastrophic consequences for humanity. Suffice it to recall one of the wisest films of the last decade, Wim Wenders's "Until the End of the World", whose characters are completely immersed in the "virtual reality" of their own dreams recorded on a computer. At the same time, not a single scientist and not a single artist can give an unambiguous answer to the question of what awaits science and technology in the future.

Focusing on the "future", sometimes seeming fantastic, was a distinctive feature of science in the mid-twentieth century, when inventors sought to create perfect models of technology that could work without human intervention. Time has shown the utopian nature of such research. However, it would be superfluous to condemn scientists for this - without their enthusiasm in the 1950s - 60s, information technology would not have made such a powerful leap in the 90s, and we would not have what we have now.

The last decades of the twentieth century changed the priorities of science - research, inventive pathos gave way to commercial interest. Again, this is neither good nor bad. This is a reality in which science is increasingly integrated into everyday life.

The 21st century has continued this trend, and in our time behind inventions are not only fame and recognition, but, first of all, money. This is also why it is important to ensure that the latest achievements of science and technology do not fall into the hands of terrorist groups or dictatorial regimes. The task is difficult to the point of impossibility; to realize it as much as possible is the task of the entire world community.

Information is a weapon, no less dangerous than nuclear or chemical weapons - only it acts not physically but, rather, psychologically. Humanity needs to decide what is more important to it in this case - freedom or control.

The latest philosophical concepts related to the development of information technologies and an attempt to comprehend them have shown the limitations of both natural-science materialism, which dominated during the 19th and early 20th centuries, and extreme idealism, which denies the significance of the material world. It is important for modern thought, especially the thought of the West, to overcome this dualism in thinking, when the surrounding world is clearly divided into material and ideal. The path to this is a dialogue of cultures, a comparison of different points of view on the surrounding phenomena.

Paradoxically, information technology can play an important role in this process. Computer networks, and especially the Internet, are not only a resource for entertainment and vigorous commercial activity, they are also a means of meaningful, controversial communication between representatives of various civilizations in the modern world, as well as for a dialogue between the past and the present. We can say that the Internet pushes the spatial and temporal boundaries.

And in the dialogue of cultures through information technology, the role of language as the oldest universal means of communication remains important. That is why linguistics, in interaction with mathematics, philosophy and computer science, has experienced its second birth and continues to develop today. The trend of the present will continue into the future - "until the end of the world", as the same Wim Wenders predicted 15 years ago. True, it is not known when this end will occur - but is that important now, since the future will sooner or later become the present anyway?

Appendix 1

Ferdinand de Saussure

The Swiss linguist Ferdinand de Saussure (1857-1913) is widely considered to be the founder of modern linguistics in its attempts to describe the structure of language rather than the history of particular languages and language forms. In fact, the method of Structuralism in linguistics and literary studies and a significant branch of Semiotics find their major starting point in his work at the turn of the twentieth century. It has even been argued that the complex of strategies and conceptions that has come to be called "poststructuralism" - the work of Jacques Derrida, Michel Foucault, Jacques Lacan, Julia Kristeva, Roland Barthes, and others - is suggested by Saussure's work in linguistics and anagrammatic readings of late Latin poetry, work whose influence extended from literary modernism to psychoanalysis and philosophy in the early twentieth century. As Algirdas Julien Greimas and Joseph Courtés argue in Semiotics and Language: An Analytic Dictionary, under the heading "Interpretation," a new mode of interpretation arose in the early twentieth century which they identify with Saussurean linguistics, Husserlian Phenomenology, and Freudian psychoanalysis. In this mode, "interpretation is no longer a matter of attributing a given content to a form which would otherwise lack one; rather, it is a paraphrase which formulates in another fashion the equivalent content of a signifying element within a given semiotic system" (159). In this understanding of "interpretation," form and content are not distinct; rather, every "form" is, alternatively, a semantic "content" as well, a "signifying form," so that interpretation offers an analogical paraphrase of something that already signifies within some other system of signification.

Such a reinterpretation of form and understanding - which Claude Lévi-Strauss describes in one of his most programmatic articulations of the concept of structuralism, in "Structure and Form: Reflections on a Work by Vladimir Propp" - is implicit in Saussure's posthumous Course in General Linguistics (1916, trans., 1959, 1983). In his lifetime, Saussure published relatively little, and his major work, the Course, was the transcription by his students of several courses in general linguistics he offered in 1907-11. In the Course Saussure called for the "scientific" study of language as opposed to the work in historical linguistics that had been done in the nineteenth century. That work is one of the great achievements of Western intellect: taking particular words as the building blocks of language, historical (or "diachronic") linguistics traced the origin and development of Western languages from a putative common language source, first an "Indo-European" language and then an earlier "proto-Indo-European" language.

It is precisely this study of the unique occurrences of words, with the concomitant assumption that the basic "unit" of language is, in fact, the positive existence of these "word-elements," that Saussure questioned. His work was an attempt to reduce the mass of facts about language, studied so minutely by historical linguistics, to a manageable number of propositions. The "comparative school" of nineteenth-century Philology, Saussure says in the Course, "did not succeed in setting up the true science of linguistics" because "it failed to seek out the nature of its object of study" ( 3). That "nature," he argues, is to be found not simply in the "elemental" words that a language comprises - the seeming "positive" facts (or "substances") of language - but in the formal relationships that give rise to those "substances."

Saussure"s systematic reexamination of language is based upon three assumptions. The first is that the scientific study of language needs to develop and study the system rather than the history of linguistic phenomena. For this reason, he distinguishes between the particular occurrences of language - its particular "speech-events," which he designates as parole - and the proper object of linguistics, the system (or "code") governing those events, which he designates as langue. Such a systematic study, moreover, calls for a " synchronic" conception of the relationship among the elements of language at a particular instant rather than the "diachronic" study of the development of language through history.

This assumption gave rise to what Roman Jakobson in 1929 came to designate as "structuralism," in which "any set of phenomena examined by contemporary science is treated not as a mechanical agglomeration but as a structural whole ... the mechanical conception of processes yields to the question of their function" ("Romantic" 711). In this passage Jakobson is articulating Saussure's intention to define linguistics as a scientific system as opposed to a simple, "mechanical" accounting of historical accidents. Along with this, moreover, Jakobson is also describing the second foundational assumption in Saussurean - we can now call it "structural" - linguistics: that the basic elements of language can only be studied in relation to their functions rather than in relation to their causes. Rather than being treated as self-contained positive facts (as in the study of Indo-European "words"), those events and entities have to be situated within a systemic framework in which they are related to other so-called events and entities. This is a radical reorientation in conceiving of experience and phenomena, one whose importance the philosopher Ernst Cassirer has compared to "the new science of Galileo which in the seventeenth century changed our whole concept of the physical world" (cited in Culler, Pursuit 24). This change, as Greimas and Courtés note, reconceives "interpretation" and thus reconceives explanation and understanding themselves. Instead of explanation's being in terms of a phenomenon's causes, so that, as an "effect," it is in some ways subordinate to its causes, explanation here consists in subordinating a phenomenon to its future-oriented "function" or "purpose." Explanation is no longer independent of human intentions or purposes (even though those intentions can be impersonal, communal, or, in Freudian terms, "unconscious").

In his linguistics Saussure accomplishes this transformation specifically in the redefinition of the linguistic "word," which he describes as the linguistic "sign" and defines in functionalist terms. The sign, he argues, is the union of "a concept and a sound image," which he called "signified and signifier" (66-67; Roy Harris's 1983 translation offers the terms "signification" and "signal"). The nature of their "combination" is "functional" in that neither the signified nor the signifier is the "cause" of the other; rather, "each derives its values from the other" (8). In this way Saussure defines the basic element of language, the sign, relationally and makes the basic assumption of historical linguistics, namely, the identity of the elemental units of language and signification (i.e., "words"), subject to rigorous analysis. Our recognition of the word "tree" as the "same" word is not because the word is defined by inherent qualities - it is not a "mechanical agglomeration" of such qualities - but because it is defined as an element in a system, the "structural whole," of language.

Such a relational (or "diacritical") definition of an entity governs the conception of all the elements of language in structural linguistics. This is clearest in the most impressive achievement of Saussurean linguistics, the development of the concepts of the "phonemes" and "distinctive features" of language. Phonemes are the smallest articulated and signifying units of a language. They are not the sounds that occur in language but the "sound images" Saussure mentions, which are apprehended by speakers - phenomenally apprehended - as conveying meaning. (Thus, Elmar Holenstein describes Jakobson's linguistics, which follows Saussure in important ways, as "phenomenological structuralism.") It is for this reason that the leading spokesperson for Prague School Structuralism, Jan Mukarovsky, noted in 1937 that "structure . . . is a phenomenological and not an empirical reality; it is not the work itself, but a set of functional relationships which are located in the consciousness of a collective (generation, milieu, etc.)" (cited in Galan 35). Likewise, Lévi-Strauss, the leading spokesperson for French structuralism , noted in 1960 that "structure has no distinct content; it is content itself, and the logical organization in which it is arrested is conceived as a property of the real" (167; see also Jakobson, Fundamentals 27-28).

Phonemes, then, the smallest perceptible elements of language, are not positive objects but a "phenomenological reality." In English, for instance, the phoneme /t/ can be pronounced in many different ways, but in all cases an English speaker will recognize it as functioning as a /t/. An aspirated t (i.e., a t pronounced with an h-like breath after it), a high-pitched or low-pitched t sound, an extended t sound, and so on, will all function in the same manner in distinguishing the meaning of "to" and "do" in English. Moreover, the differences between languages ​​are such that phonological variations in one language can constitute distinct phonemes in another; thus, English distinguishes between /l/ and /r/, whereas other languages ​​are so structured that these articulations are considered variations of the same phoneme (like the aspirated and unaspirated t in English). In every natural language, the vast number of possible words is a combination of a small number of phonemes. English, for instance, possesses less than 40 phonemes that combine to form over a million different words.

The phonemes of language are themselves systematically organized structures of features. In the 1920s and 1930s, following Saussure's lead, Jakobson and N. S. Trubetzkoy isolated the "distinctive features" of phonemes. These features are based upon the physiological structure of the speech organs - tongue, teeth, vocal chords, and so on - that Saussure mentions in the Course and that Harris describes as "physiological phonetics" (39; Baskin's earlier translation uses the term "phonology" [(1959) 38]) - and they combine in "bundles" of binary oppositions to form phonemes. For instance, in English the difference between /t/ and /d/ is the presence or absence of "voice" (the engagement of the vocal chords), and on the level of voicing these phonemes reciprocally define one another. In this way, phonology is a specific example of a general rule of language described by Saussure: In language there are only differences. Even more important: a difference generally implies positive terms between which the difference is set up; but in language there are only differences without positive terms. Whether we take the signified or the signifier, the language has neither ideas nor sounds that existed before the linguistic system. (120)

In this framework, linguistic identities are determined not by inherent qualities but by systemic ("structural") relationships.

I have said that phonology "followed the lead" of Saussure, because even though his analysis of the physiology of language production "would nowadays," as Harris says, "be called 'physical,' as opposed to either 'psychological' or 'functional'" (Reading 49), nevertheless in the Course he articulated the direction and outlines of a functional analysis of language. Similarly, his only extended published work, Mémoire sur le système primitif des voyelles dans les langues indo-européennes (Memoir on the primitive system of vowels in Indo-European languages), which appeared in 1878, was fully situated within the project of nineteenth-century historical linguistics. Nevertheless, within this work, as Jonathan Culler has argued, Saussure demonstrated "the fecundity of thinking of language as a system of purely relational items, even when working at the task of historical reconstruction" (Saussure 66). By analyzing the systematic structural relationships among phonemes to account for patterns of vowel alternation in existing Indo-European languages, Saussure suggested that in addition to several different phonemes /a/, there must have been another phoneme that could be described formally. "What makes Saussure's work so very impressive," Culler concludes, "is the fact that nearly fifty years later, when cuneiform Hittite was discovered and deciphered, it was found to contain a phoneme, written h, which behaved as Saussure had predicted. He had discovered, by a purely formal analysis, what are now known as the laryngeals of Indo-European" (66).

This conception of the relational or diacritical determination of the elements of signification, which is both implicit and explicit in the Course, suggests a third assumption governing structural linguistics, what Saussure calls "the arbitrary nature of the sign." By this he means that the relationship between the signifier and signified in language is never necessary (or "motivated"): one could just as easily find the sound signifier arbre as the signifier tree to unite with the concept "tree". But more than this, it means that the signified is arbitrary as well: one could as easily define the concept "tree" by its woody quality (which would exclude palm trees) as by its size (which excludes the "low woody plants" we call bushes). This should make clear that the numbering of assumptions I have been presenting does not represent an order of priority: each assumption - the systemic nature of signification (best apprehended by studying language "synchronously"), the relational or "diacritical" nature of the elements of signification, the arbitrary nature of signs - derives its value from the others.

That is, Saussurean linguistics understands the phenomena it studies in overarching relationships of combination and contrast in language. In this conception, language is both the process of articulating meaning (signification) and its product (communication), and these two functions of language are neither identical nor fully congruent (see Schleifer, "Deconstruction"). Here, we can see the alternation between form and content that Greimas and Courtés describe in modernist interpretation: language presents contrasts that formally define its units, and these units combine on succeeding levels to create the signifying content. Since the elements of language are arbitrary, moreover, neither contrast nor combination can be said to be basic. Thus, in language distinctive features combine to form contrasting phonemes on another level of apprehension, phonemes combine to form contrasting morphemes, morphemes combine to form words, words combine to form sentences, and so on. In each instance, the whole phoneme, or word, or sentence, and so on, is greater than the sum of its parts (just as water, H2O, in Saussure's example [(1959) 103] is more than the mechanical agglomeration of hydrogen and oxygen).

The three assumptions of the Course in General Linguistics led Saussure to call for a new science of the twentieth century that would go beyond linguistic science to study "the life of signs within society." Saussure named this science "semiology (from Greek semeîon 'sign')" (16). The "science" of semiotics, as it came to be practiced in Eastern Europe in the 1920s and 1930s and Paris in the 1950s and 1960s, widened the study of language and linguistic structures to literary artifacts constituted (or articulated) by those structures. Throughout the latter part of his career, moreover, even while he was offering the courses in general linguistics, Saussure pursued his own "semiotic" analysis of late Latin poetry in an attempt to discover deliberately concealed anagrams of proper names. The method of study was in many ways the opposite of the functional rationalism of his linguistic analyses: it attempted, as Saussure mentions in one of the 99 notebooks in which he pursued this study, to examine systematically the problem of "chance," which "becomes the inevitable foundation of everything" (cited in Starobinski 101). Such a study, as Saussure himself says, focuses on "the material fact" of chance and meaning (cited 101), so that the "theme-word" whose anagram Saussure is seeking, as Jean Starobinski argues, "is, for the poet, an instrument, and not a vital germ of the poem. The poem is required to re-employ the phonic materials of the theme-word" (45). In this analysis, Starobinski says, "Saussure did not lose himself in a search for hidden meanings." Instead, his work seems to demonstrate a desire to evade all the problems arising from consciousness: "Since poetry is not only realized in words but is something born from words, it escapes the arbitrary control of consciousness to depend solely on a kind of linguistic legality" (121).

That is, Saussure's attempt to discover proper names in late Latin poetry - what Tzvetan Todorov calls the reduction of a "word . . . to its signifier" (266) - emphasizes one of the elements that governed his linguistic analysis, the arbitrary nature of the sign. (It also emphasizes the formal nature of Saussurean linguistics - "Language," he asserts, "is a form and not a substance" - which effectively eliminates semantics as a major object of analysis.) As Todorov concludes, Saussure's work appears remarkably homogeneous today in its refusal to accept symbolic phenomena . . . . In his research on anagrams, he pays attention only to the phenomena of repetition, not to those of evocation. . . . In his studies of the Nibelungen, he recognizes symbols only in order to attribute them to mistaken readings: since they are not intentional, symbols do not exist. Finally, in his courses on general linguistics, he contemplates the existence of semiology, and thus of signs other than linguistic ones; but this affirmation is at once limited by the fact that semiology is devoted to a single type of sign: those which are arbitrary. (269-70)

If this is true, it is because Saussure could not conceive of "intention" without a subject; he could not quite escape the opposition between form and content that his work did so much to call into question. Instead, he resorted to "linguistic legality." Situated between, on the one hand, nineteenth-century conceptions of history, subjectivity, and the mode of causal interpretation governed by these conceptions and, on the other hand, twentieth-century "structuralist" conceptions of what Lévi-Strauss called "Kantianism without a transcendental subject" (cited in Connerton 23) - concepts that erase the opposition between form and content (or subject and object) and the hierarchy of foreground and background in full-blown structuralism, psychoanalysis, and even quantum mechanics - the work of Ferdinand de Saussure in linguistics and semiotics circumscribes a signal moment in the study of meaning and culture.

Ronald Schleifer

Appendix 2

Ferdinand de Saussure (translation)

The Swiss linguist Ferdinand de Saussure (1857-1913) is considered the founder of modern linguistics thanks to his attempts to describe the structure of language rather than the history of individual languages and word forms. By and large, the foundations of structural methods in linguistics and literary criticism and, to a large extent, of semiotics were laid in his works at the very beginning of the twentieth century. It can be argued that the methods and concepts of so-called "post-structuralism", developed in the works of Jacques Derrida, Michel Foucault, Jacques Lacan, Julia Kristeva, Roland Barthes and others, go back to Saussure's linguistic works and his anagrammatic readings of late Latin poetry. It should be noted that Saussure's work on linguistics and linguistic interpretation helps to connect a wide range of intellectual disciplines - from physics to literary innovation, psychoanalysis and the philosophy of the early twentieth century. A. J. Greimas and J. Courtés write in Semiotics and Language: An Analytical Dictionary that "interpretation" as a new kind of interpretation appeared at the beginning of the 20th century together with the linguistics of Saussure, the phenomenology of Husserl and the psychoanalysis of Freud. In this case, "interpretation is not the attribution of a given content to a form that would otherwise lack one; rather, it is a paraphrase which formulates in another way the same content of a significant element within a given semiotic system" (159). In this understanding of "interpretation", form and content are inseparable; on the contrary, each form is filled with semantic meaning ("meaningful form"), so that interpretation offers a new, analogous retelling of something meaningful in another sign system.

A similar understanding of form and content, presented by Claude Lévi-Strauss in one of the programmatic works of structuralism ("Structure and Form: Reflections on a Work by Vladimir Propp"), can be seen in Saussure's posthumously published Course in General Linguistics (1916; trans. 1959, 1983). Saussure published little during his lifetime; the Course, his main work, was compiled from the notes of students who attended his lectures on general linguistics in 1907-11. In the Course, Saussure called for a "scientific" study of language, contrasting it with nineteenth-century comparative-historical linguistics. This work can be considered one of the greatest achievements of Western thought: taking individual words as the basic structural elements of language, historical (or "diachronic") linguistics traced the origin and development of the Western European languages from a common Indo-European language - and, earlier still, from Proto-Indo-European.

It is precisely this study of the unique occurrences of words, with the concomitant assumption that the basic "unit" of language is, in fact, the positive existence of these "word elements", that Saussure questioned. His work was an attempt to reduce the many facts about language studied in passing by comparative linguistics to a small number of theorems. The comparative philological school of the 19th century, writes Saussure, "did not succeed in creating a real school of linguistics" because "it did not understand the essence of the object of study" (3). This "essence", he argues, lies not only in individual words - the "positive substances" of language - but also in the formal connections that allow these substances to exist.

Saussure's study of language rests on three assumptions. First, the scientific understanding of language is based not on historical but on structural phenomena. Therefore, he distinguished between individual phenomena of language - "speech events", which he designates as parole - and what he regards as the proper object of study of linguistics, the system (code, structure) that governs these events (langue). Such a systematic study, moreover, requires a "synchronic" conception of the relationship between the elements of language at a given moment, rather than a "diachronic" study of the development of a language through its history.

This hypothesis was the forerunner of what Roman Jakobson in 1929 would call "structuralism" - a theory in which "any set of phenomena investigated by modern science is considered not as a mechanical accumulation, but as a structural whole in which the constructive component is correlated with the function" ("Romantic" 711). In this passage, Jakobson formulated Saussure's idea of defining language as a structure, as opposed to the "mechanical" enumeration of historical events. In addition, Jakobson develops another Saussurean assumption, which became a forerunner of structural linguistics: the basic elements of language should be studied in connection not so much with their causes as with their functions. Separate phenomena and events (say, the history of the origin of individual Indo-European words) should be studied not in themselves but within a system in which they are correlated with similar components. This was a radical turn in the comparison of phenomena with the surrounding reality, whose significance the philosopher Ernst Cassirer compared with "the science of Galileo, which in the seventeenth century transformed ideas about the material world." Such a turn, as Greimas and Courtés note, changes the idea of "interpretation" and, consequently, the explanations themselves. Phenomena began to be interpreted not in relation to the causes of their occurrence, but in relation to the effect that they can have in the present and future. Interpretation ceased to be independent of a person's intentions (even though intentions can be impersonal, "unconscious" in the Freudian sense of the word).

In his linguistics, Saussure shows this turn above all in the changed concept of the word, which he defines as a sign and describes in terms of its functions. For him a sign is a combination of sound and meaning, of "signified and signifier" (66-67; in Roy Harris's English translation of 1983, "signification" and "signal"). The nature of this combination is "functional" (neither element can exist without the other); moreover, "one borrows qualities from the other" (8). Thus, Saussure defines the main structural element of language - the sign - and makes the identification of signs with words, on which historical linguistics rested, the subject of particularly rigorous analysis. Therefore, we can understand different meanings of, say, the same word "tree" not because the word is merely a set of certain qualities, but because it is defined as an element in the sign system, in the "structural whole" of the language.

Such a relational ("diacritical") conception of unity underlies the conception of all elements of language in structural linguistics. This is especially clear in the most original discovery of Saussurean linguistics, the development of the concepts of the "phonemes" and "distinctive features" of language. Phonemes are the smallest articulated and meaningful units of language. They are not merely sounds occurring in the language but "sound images", Saussure notes, which are perceived by native speakers as carrying meaning. (It should be noted that Elmar Holenstein calls Jakobson's linguistics, which in its main points continues the ideas and concepts of Saussure, "phenomenological structuralism".) That is why the leading spokesman of the Prague School of structuralism, Jan Mukarovsky, observed in 1937 that "structure . . . is a phenomenological and not an empirical concept; it is not the work itself, but a set of functional relations located in the consciousness of a collective (a generation, a milieu, etc.)". A similar thought was expressed in 1960 by Lévi-Strauss, the leader of French structuralism: "Structure has no definite content; it is meaningful in itself, and the logical construction in which it is enclosed is an imprint of reality."

In turn, phonemes, as the smallest linguistic elements accessible to perception, constitute a separate, integral "phenomenological reality". For example, in English the sound "t" can be pronounced in different ways, but in all cases a speaker of English will perceive it as "t". An aspirated "t", a raised or lowered one, a prolonged "t", and the like will equally distinguish the meaning of the words "to" and "do". Moreover, the differences between languages are such that varieties of one sound in one language can correspond to different phonemes in another; for example, "l" and "r" in English are distinct, while in other languages they are varieties of the same phoneme (like the English "t", pronounced with and without aspiration). The vast vocabulary of any natural language is a set of combinations of a much smaller number of phonemes. In English, for example, only 40 phonemes are used to pronounce and write about a million words.

The sounds of a language are a systematically organized set of features. In the 1920s-1930s, following Saussure, Jakobson and N.S. Trubetzkoy singled out the "distinctive features" of phonemes. These features are based on the structure of the organs of speech - tongue, teeth, vocal cords; Saussure notes this in the Course of General Linguistics, and Harris calls it "physiological phonetics" (in Baskin's earlier translation the term "phonology" is used) - and they are combined in "bundles", opposed to one another, to form phonemes. For example, in English the difference between "t" and "d" is the presence or absence of "voice" (the tension of the vocal cords), and at the level of voicing these phonemes define one another. Thus, phonology can be considered an example of the general rule of language described by Saussure: "There are only differences in language." Even more important: a difference usually presupposes positive terms between which it is established; but in language there are only differences without positive terms. Whether we consider the "signifier" or the "signified", in language there exist neither concepts nor sounds that would have existed before the development of the language system.

In such a structure, linguistic identities are defined not by their inherent qualities but by systemic ("structural") relations.

I have already mentioned that phonology in its development relied on the ideas of Saussure. Although his analysis of the physiology of language, Harris says, "would nowadays be called 'physical', as opposed to 'psychological' or 'functional'", in the Course he clearly articulated the direction and basic principles of the functional analysis of language. His only extended work published during his lifetime, Mémoire sur le système primitif des voyelles dans les langues indo-européennes (Memoir on the original vowel system in the Indo-European languages), published in 1878, was entirely in line with the comparative-historical linguistics of the 19th century. Nevertheless, in this work, says Jonathan Culler, Saussure showed "the fruitfulness of the idea of language as a system of interrelated phenomena, even when working on its historical reconstruction." Analyzing the relationships between phonemes in order to explain the alternation of vowels in the modern languages of the Indo-European group, Saussure suggested that in addition to several different sounds "a", there must have been another phoneme that can be described formally. "What makes Saussure's work particularly impressive," Culler concludes, "is that almost 50 years later, when Hittite cuneiform was discovered and deciphered, a phoneme was found in it, denoted in writing by 'h', which behaved as Saussure had predicted. Through purely formal analysis he had discovered what are now known as the laryngeals of the Indo-European languages."

The concept of the relational (diacritical) definition of signs, both explicit and implicit in the Course, constitutes a third key assumption of structural linguistics, which Saussure calls the "arbitrary nature of the sign." By this he means that the relation between sound and meaning in language is not motivated by anything: one could just as easily connect the word "arbre" as the word "tree" with the concept "tree". Moreover, this means that the signified is also arbitrary: one could define the concept "tree" just as easily by its woody quality (which would exclude palm trees) as by its size (which excludes the "low woody plants" we call shrubs). From this it should be clear that the assumptions I am presenting are not ranked as more and less important: each of them - the systemic nature of signs (most fully grasped in the "synchronic" study of language), their relational (diacritical) nature, the arbitrary nature of signs - derives its value from the others.

Thus, in Saussurean linguistics the phenomena studied are understood as sets of comparisons and oppositions within language. Language is both the expression of meaning (signification) and its product (communication), and these two functions never fully coincide (see Schleifer, "Deconstruction"). Here we can see the alternation of form and content that Greimas and Courtés describe in modernist interpretation: language presents contrasts that define its structural units, and these units interact on successive levels to create a certain meaningful content. Since the elements of language are arbitrary, neither contrast nor combination can be regarded as basic. This means that in a language distinctive features combine to form contrasting phonemes at another level of apprehension, phonemes combine into contrasting morphemes, morphemes into words, words into sentences, and so on. In each case, a whole phoneme, word, sentence, etc. is more than the sum of its parts (just as water, in Saussure's example, is more than the combination of hydrogen and oxygen).

The three assumptions of the Course of General Linguistics led Saussure to the idea of a new science of the twentieth century, separate from linguistics, that would study "the life of signs within society." Saussure called this science semiology (from the Greek "semeîon" - sign). The "science" of semiotics, which developed in Eastern Europe in the 1920s and 1930s and in Paris in the 1950s and 1960s, extended the study of language and linguistic structures to literary artifacts composed (or articulated) in terms of these structures. In addition, in the twilight of his career, in parallel with his course in general linguistics, Saussure engaged in a "semiotic" analysis of late Latin poetry, trying to discover deliberately concealed anagrams of proper names. This method was in many ways the opposite of the functional rationalism of his linguistic analyses: it was an attempt, as Saussure writes in one of his 99 notebooks, to study systematically the problem of "chance", which "becomes the inevitable foundation of everything." Such an investigation, Saussure himself claims, focuses on the "material fact" of chance and meaning; the "theme-word" whose anagram Saussure is seeking is, according to Jean Starobinski, "an instrument for the poet, and not the vital germ of the poem. The poem re-employs the phonic material of the theme-word." According to Starobinski, in this analysis "Saussure did not lose himself in a search for hidden meanings." On the contrary, in his work a desire to avoid all questions connected with consciousness is noticeable: "since poetry is realized not only in words but in what these words give rise to, it escapes the control of consciousness and depends only on the laws of language."

Saussure's attempt to study proper names in late Latin poetry (Tzvetan Todorov called this the reduction of "a word . . . to its signifier") emphasizes one of the components of his linguistic analysis - the arbitrary nature of the sign - as well as the formal essence of Saussurean linguistics ("Language," he asserts, "is a form and not a substance"), which effectively excludes semantics as a principal object of analysis. Todorov concludes that today Saussure's writings seem remarkably consistent in their reluctance to study symbols [phenomena that have a well-defined meaning]. . . . Exploring anagrams, Saussure pays attention only to the phenomena of repetition, not to those of evocation. . . . Studying the Nibelungenlied, he recognizes symbols only in order to attribute them to erroneous readings: since they are unintentional, symbols do not exist. Finally, in his writings on general linguistics, he admits the existence of a semiology describing signs other than linguistic ones; but this admission is limited by the fact that semiology can describe only arbitrary signs.

If this is really so, it is only because Saussure could not imagine "intention" without a subject; he could not entirely escape the opposition between form and content which his own work did so much to call into question. Instead, he turned to "linguistic legality." Standing between, on the one hand, nineteenth-century concepts of history, subjectivity, and the mode of causal interpretation based on these concepts, and, on the other hand, the structuralist concepts which Lévi-Strauss called "Kantianism without a transcendental subject" - concepts that erase the opposition between form and content (subject and object) and the hierarchy of foreground and background in structuralism, psychoanalysis and even quantum mechanics - the writings of Ferdinand de Saussure on linguistics and semiotics mark a turning point in the study of meaning in language and culture.

Ronald Schleifer

Literature

1. Admoni V.G. Fundamentals of the theory of grammar / V.G. Admoni; USSR Academy of Sciences. - M.: Nauka, 1964. - 104 p.

3. Arapov, M.V., Herts, M.M. Mathematical methods in linguistics. M., 1974.

4. Arnold I.V. The semantic structure of the word in modern English and the methodology for its study. /I.V. Arnold-L .: Education, 1966. - 187 p.

6. Bashlykov A.M. Automatic translation system / A.M. Bashlykov, A.A. Sokolov. - M.: LLC "FIMA", 1997. - 20 p.

7. Baudouin de Courtenay: Theoretical heritage and modernity: Abstracts of the reports of the international scientific conference / Ed. I.G. Kondratiev. - Kazan: KGU, 1995. - 224 p.

8. Gladkiy A.V., Melchuk I.A. Elements of Mathematical Linguistics / A.V. Gladkiy, I.A. Melchuk. - M., 1969. - 198 p.

9. Golovin, B.N. Language and statistics. /B.N. Golovin - M., 1971. - 210 p.

10. Zvegintsev, V.A. Theoretical and applied linguistics. / V.A. Zvegintsev - M., 1969. - 143 p.

11. Kasevich, V.B. Semantics. Syntax. Morphology. // V.B. Kasevich - M., 1988. - 292 p.

12. Lekomtsev Yu.K. Introduction to the formal language of linguistics / Yu.K. Lekomtsev. - M.: Nauka, 1983, 204 p., ill.

13. Linguistic legacy of Baudouin de Courtenay at the end of the twentieth century: Abstracts of the reports of the international scientific and practical conference March 15-18, 2000. - Krasnoyarsk, 2000. - 125 p.

Matveeva G.G. Hidden grammatical meanings and identification of the social person ("portrait") of the speaker / G.G. Matveeva. - Rostov, 1999. - 174 p.

14. Melchuk, I.A. Experience in building linguistic models "Meaning Text"./ I.A. Melchuk. - M., 1974. - 145 p.

15. Nelyubin L.L. Translation and applied linguistics / L.L. Nelyubin. - M.: Higher School, 1983. - 207 p.

16. On the exact methods of language research: on the so-called "mathematical linguistics" / O.S. Akhmanova, I.A. Melchuk, E.V. Paducheva and others - M., 1961. - 162 p.

17. Piotrovsky R.G. Mathematical Linguistics: Textbook / R.G. Piotrovsky, K.B. Bektaev, A.A. Piotrovskaya. - M.: Higher School, 1977. - 160 p.

18. Idem. Text, machine, person. - L., 1975. - 213 p.

19. Idem. Applied linguistics / Ed. by A.S. Gerd. - L., 1986. - 176 p.

20. Revzin I.I. Language models. - M., 1963; Revzin I.I. Modern structural linguistics: Problems and methods. - M., 1977. - 239 p.

21. Revzin, I.I., Rozentsveig, V.Yu. Fundamentals of general and machine translation / Revzin I.I., Rozentsveig, V.Yu. - M., 1964. - 401 p.

22. Slyusareva N.A. The theory of F. de Saussure in the light of modern linguistics / N.A. Slyusareva. - M.: Nauka, 1975. - 156 p.

23. Sova L.Z. Analytical linguistics / L.Z. Sova. - M., 1970. - 192 p.

24. Saussure F. de. Notes on General Linguistics / F. de Saussure; Trans. from French. - M.: Progress, 2000. - 187 p.

25. Idem. Course of General Linguistics / Trans. from French. - Yekaterinburg, 1999. - 426 p.

26. Speech statistics and automatic text analysis / Ed. by R.G. Piotrovsky. - L., 1980. - 223 p.

27. Stoll R. Sets. Logic. Axiomatic theories / R. Stoll; Trans. from English. - M., 1968. - 180 p.

28. Tesnière L. Fundamentals of structural syntax. M., 1988.

29. Ubin I.I. Automation of translation activities in the USSR / I.I. Ubin, L.Yu. Korostelev, B.D. Tikhomirov. - M., 1989. - 28 p.

30. Faure, R., Kofman, A., Denis-Papin, M. Modern Mathematics. M., 1966.

31. Schank R. Processing of conceptual information. M., 1980.

32. Shikhanovich, Yu.A. Introduction to modern mathematics (initial concepts). M., 1965

33. Shcherba L.V. Russian vowels in qualitative and quantitative terms / L.V. Shcherba - L.: Nauka, 1983. - 159 p.

34. Abdullah-zade F. Citizen of the world // Spark. - 1996. - No. 5. - p. 13.

35. Uspensky V.A. A preface for the readers of the "New Literary Review" to the semiotic messages of Andrei Nikolaevich Kolmogorov // New Literary Review. - 1997. - No. 24. - pp. 18-23.

36. Perlovsky L. Consciousness, language and culture // Knowledge is Power. - 2000. - No. 4. - pp. 20-33.

37. Frumkina R.M. About us - obliquely // Russian Journal. - 2000. - No. 1. - p. 12.

38. Fitialov, S.Ya. On Syntax Modeling in Structural Linguistics // Problems of Structural Linguistics. M., 1962.

39. Idem. On the Equivalence of Immediate-Constituent Grammars and Dependency Grammars // Problems of Structural Linguistics. M., 1967.

40. Chomsky N. Logical foundations of linguistic theory // New in Linguistics. Issue 4. M., 1965.

41. Schleifer R. Ferdinand de Saussure//press. jhu.ru

42. www.krugosvet.ru

43. www.lenta.ru

45. press. jhu.ru

46. en.wikipedia.org