Synthetic genomics: half a step away from the “element of life”. From cell populations to individual cells

Genomics - the study of the whole genome

Recent advances in sequencing and the development of technical means for processing large numbers of clones in a gene library have allowed scientists to study the entire genome of an organism at once. The complete sequences of many species have now been determined, including most of the so-called model genetic organisms, such as E. coli; the roundworm Caenorhabditis elegans; and, of course, the classic object of genetics, the fruit fly Drosophila melanogaster. In the 1990s, despite a number of difficulties and disagreements, a project was launched to study the human genome (the Human Genome Project), with funds allocated by the National Institutes of Health. In February 2001, a large group of researchers led by J. Craig Venter of the private company Celera Genomics announced a preliminary decoding of the human genome. Their results were published on February 16, 2001 in the journal Science.

Another version, submitted by the International Human Genome Sequencing Consortium, was published on February 15, 2001 in the journal Nature.

The birth of genomics can be dated to the middle of the 20th century, when geneticists mapped all the chromosomes of model organisms based on recombination frequencies (see Chapter 8). However, these maps showed only those genes for which mutant alleles were known, so they cannot be called complete. Complete DNA sequencing makes it possible to locate all the genes of an organism and to establish the sequence of bases between them.

Genomics is divided into structural and functional genomics. Structural genomics aims to find out exactly where particular genes are located in chromosomal DNA. Computer programs recognize the typical beginnings and ends of genes and select those sequences that are most likely to be genes. Such sequences are called open reading frames (ORFs). The same programs can also recognize typical introns within ORF sequences. After the introns have been removed from a potential gene, the computer uses the remaining code to determine the sequence of amino acids in the protein. These potential proteins are then compared with proteins whose functions are already known and whose sequences have already been entered into a database. Programs of this kind revealed so-called evolutionary conservatism: for most genes, similar genes exist in different organisms. From the standpoint of evolution this similarity is understandable: if a protein of one species is well adapted to its function, its gene is passed on unchanged, or with small changes, to the species descended from it. Evolutionary conservatism makes it possible to identify genes in other organisms that are related to a given gene. By comparing a newly found gene with those already known, it is often possible to infer its function, which must then be verified in subsequent experiments.
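The gene-finding step described above can be illustrated with a minimal sketch. It is not the algorithm used by any particular annotation program: it scans one strand in three reading frames for a start codon (ATG) followed by an in-frame stop codon, and translates what lies between them using the standard genetic code. Intron recognition and database comparison are left out, and the function names and the toy sequence are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_orfs(dna, min_codons=2):
    """Return (start, end, protein) for each ATG...stop run on the given strand.

    Simplifications (assumptions of this sketch): one strand only, no intron
    removal, and only the first ATG before each stop codon counts as a start.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, translate(dna[start:i])))
                start = None
    return orfs


# Hypothetical toy sequence, not taken from any real genome.
print(find_orfs("CCATGGCTTGGTAACC"))
```

Real gene finders also scan the reverse strand and use statistical models of coding sequence; the sketch only shows why a start codon, an in-frame stop codon, and a codon table are enough to propose a candidate protein.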

Once all potential genes have been identified, genetic mapping begins. The human genetic map is a rather intricate and colorful diagram, since each gene is marked with a color that depends on its function, which is inferred by comparison with other known genes. Most human genes, like the genes of eukaryotes in general, have large introns. According to rough estimates, introns make up about a quarter to a third of the published sequences. Curiously, only about 1.5% of the entire human genome (about 2.9 × 10^9 base pairs) consists of sequences (exons) that code for proteins. In addition, this DNA appears to contain only 35,000-45,000 genes, fewer than predicted. We have yet to understand how such a relatively small number of genes can code for so complex an organism.

The number of copies of repetitive DNA differs from person to person, so these repeats can be used to establish identity, including in forensic medicine.

Functional genomics is the study of gene function at the level of the entire genome. Although potential genes can be identified by their similarity to genes that perform known functions in other organisms, every such guess should be tested in the organism under study. In some model organisms, such as baker's yeast, it is possible to switch off genes systematically, one by one. A gene is switched off by replacing its functional form with a deleted form carried on a special vector. The resulting strain with the disabled gene is then obtained and its phenotype is evaluated. In an ongoing program to analyze the yeast genome, several thousand genes have been switched off one by one.

Another method of functional genomics is to study transcription at the level of the entire genome. This method is based on the assumption that most biological phenomena are complex processes involving many genes. Of particular interest to researchers are the processes associated with the development of the organism, which we mentioned in Chapter 11. If gene transcription is studied under different growth conditions, one can get an idea of the complete genetic pathways of the organism's development.

But how can transcription be studied at the genome-wide level? Here again new technologies come to the aid of scientists. The DNA of each gene in the genome, or in some part of the genome, is placed in an ordered array on the surface of small glass plates. The plates are then exposed to all the types of mRNA found in a cell of the given organism. The DNA for the plates is obtained in two ways. In the first, all mRNAs are reverse-transcribed to obtain short complementary DNA (cDNA) molecules, each corresponding to one gene. In the second, genes (or parts of genes) are synthesized one base at a time in defined areas of the plate. The synthesis is carried out by robots that expose and cover the glass surface in a set order. Plates carrying the genomes of many organisms can be purchased from chemical companies.

Genomics is usually regarded as a branch of molecular biology. Its main task is genome sequencing - the determination of the nucleotide sequences of DNA and RNA. The words genetics and genomics should not be confused. Genetics studies the mechanisms of heredity and variability, while genomics is meant to put the knowledge gained into practice.

From the history of science

Genomics took shape as a distinct field in the 1980s-1990s, along with the emergence of the first projects for sequencing (molecular analysis of) the genomes of certain species of living organisms.

Structure of genomics

In modern genomics, there are many subsections:

  • comparative, or evolutionary, genomics compares the organization and content of the genomes of various living organisms;
  • functional genomics studies in detail the functions of genes and their influence on gene activity;
  • structural genomics deals with sequencing, the molecular analysis of DNA, on the basis of which genomic maps are created and compared.

Why do we need genomics

The genomes of a large number of microorganisms (mainly pathogenic ones) have been deciphered. This makes it possible to search them for genes that can serve as drug targets and to develop new drugs.

Genomics is regarded as an integral and necessary part of general biology. It can make a significant contribution to the development of biotechnology, agriculture and healthcare.

In a hospital in Wisconsin, a three-year-old child baffled doctors for a long time. His intestines were swollen and almost completely riddled with abscesses, and by the age of three he had undergone more than a hundred surgeries. The coding regions of his DNA were fully sequenced, and the culprit of the disease was identified: the XIAP protein, which is involved in the signaling chains of programmed cell death and plays a very important role in the immune system. On the basis of this diagnosis, the physicians recommended a bone marrow transplant. The child was saved.

Another case involved an atypical cancer in a thirty-nine-year-old woman suffering from acute promyelocytic leukemia. Standard diagnostic methods failed to detect the disease. But when the genome of the cancer cells was deciphered and analyzed, it turned out that a large section of chromosome 15 had moved to chromosome 17, which caused a certain gene interaction. The patient was prescribed appropriate treatment.

First draft, 2003 - completion of the project). The development of genomics became possible not only thanks to improved biochemical methods but also thanks to the emergence of more powerful computing technology, which made it possible to work with huge amounts of data. The length of genomes in living organisms is sometimes measured in billions of base pairs. For example, the human genome is about 3 billion base pairs long. The largest known genome (as of the beginning of 2010) belongs to one of the lungfish species (approximately 110 billion base pairs).

Sections of genomics

Structural genomics

Structural genomics studies the content and organization of genomic information. It aims to study genes of known structure in order to understand their function, and to determine the spatial structure of as many "key" protein molecules as possible and its influence on their interactions.

Functional genomics

Functional genomics studies how the information recorded in the genome is realized, from gene to trait.

Comparative genomics

Comparative (evolutionary) genomics is the comparative study of the content and organization of the genomes of different organisms.

Obtaining complete genome sequences has shed light on the degree of differences between the genomes of different living organisms. The table below presents preliminary data on the similarity of the genomes of different organisms with the human genome. The similarity is given as a percentage (reflecting the proportion of base pairs that are identical in the two compared species).

Species | Similarity | Notes and sources
Human | 99.9 % | Human Genome Project
Human | 100 % | identical twins
Chimpanzee | 98.4 % | Americans for Medical Progress
Chimpanzee | 98.7 % | Richard Mural of Celera Genomics, quoted on MSNBC
Bonobo, or pygmy chimpanzee | | The same as for chimpanzees
Gorilla | 98.38 % | Based on the study of intergenic non-repetitive DNA (American Journal of Human Genetics, February 2001, 68(2), pp. 444-456)
Mouse | 98 % |
Mouse | 85 % | When comparing all protein-coding sequences, NHGRI
Dog | 95 % | Jon Entine at the San Francisco Examiner
C. elegans | 74 % | Jon Entine at the San Francisco Examiner
Banana | 50 % | Americans for Medical Progress
Narcissus | 35 % | Steven Rose in The Guardian, January 22
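The similarity measure used in the table, the proportion of identical base pairs, can be sketched as follows. This is a minimal illustration that assumes the two sequences have already been aligned to equal length, with "-" marking gaps. The function name and the toy sequences are hypothetical, and the sources quoted in the table may have computed their figures differently.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences carry the same base.

    Assumes seq_a and seq_b are already aligned (equal length, '-' for gaps).
    Positions where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical 10-base fragments that differ at a single position.
print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```

How gaps, repeats and unalignable regions are treated strongly affects the result, which is one reason why different sources in the table report different figures for the same pair of species.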

Examples of the application of genomics in medicine

In a Wisconsin hospital, a three-year-old child baffled doctors for a long time: his intestines were swollen and completely riddled with abscesses. By the age of three he had undergone more than a hundred separate surgeries. A full sequence of the coding regions of his DNA was ordered, and the results, analyzed with improvised tools, pointed to the culprit of the disease: the XIAP protein, which is involved in the signaling chains of programmed cell death. When it works normally, it plays a very important role in the immune system. On the basis of this diagnosis, the physicians recommended a bone marrow transplant in June 2010. By mid-June the child was able to eat for the first time in his life.

Another case involved an atypical cancer in a 39-year-old woman suffering from acute promyelocytic leukemia. Standard diagnostic methods, however, did not identify the disease. But when the genome of the cancer cells was deciphered and analyzed, it turned out that a large section of chromosome 15 had moved to chromosome 17, which caused a certain gene interaction. As a result, the woman received the treatment she needed.

Wikimedia Foundation. 2010.


At the end of the 20th century, molecular technologies developed so intensively that the prerequisites were created for the systematic study of the genome structure of different species of living beings, including humans. One of the most significant goals of these projects is to determine the complete nucleotide sequence of genomic DNA. Thus a new science was born: genomics.

The beginning of the new millennium was marked by the greatest discovery in the field of genomics: the structure of the human genome was deciphered. The news was so significant that it became a subject of discussion between the presidents of the world's leading countries. However, many people were not impressed by it. This is mainly due to a lack of understanding of what a genome is, how it is structured and what its decoding means. Does this news have anything to do with medicine, and can it affect each of us? What is molecular medicine, and is its development related to deciphering the structure of the genome? Moreover, some people are afraid: what will this new discovery by scientists bring to humanity? Will the data be used for military purposes? Will it be followed by general compulsory genetic examination, a kind of genetic passportization of the population? Will our genomes be analyzed, and how confidential will the information obtained be? All these issues are currently being actively discussed in the scientific community.

Of course, genomics did not begin with humans but with much more simply organized living beings. To date, the nucleotide sequence of the genomic DNA of many hundreds of species of microorganisms has been deciphered, most of them pathogenic. For prokaryotes the analysis turned out to be absolutely complete, that is, not a single nucleotide remains undeciphered! As a result, not only have all the genes of these microorganisms been identified, but the amino acid sequences of the proteins they encode have also been determined. We have repeatedly noted that knowing the amino acid sequence of a protein makes it possible to predict its structure and functions fairly accurately. It opens up the possibility of obtaining antibodies to the predicted protein, isolating it from the microorganism and analyzing it biochemically. Consider what it means for the development of fundamentally new methods of fighting infections if the doctor knows not only how the genes of the infecting microorganism are arranged but also the structure and function of all its proteins. Microbiology is now undergoing tremendous changes due to the emergence of a huge amount of new knowledge, the significance of which we do not yet fully understand. It will probably take decades to adapt this new information to the needs of mankind, primarily in medicine and agriculture.

The transition from prokaryotes to eukaryotes in deciphering genome structure involves great difficulties, not only because the DNA of higher organisms is thousands, and sometimes hundreds of thousands, of times longer, but also because its structure becomes more complex. Recall that the genomes of higher animals contain a large amount of non-coding DNA, a significant part of which consists of repetitive sequences. These repeats cause considerable confusion when already deciphered DNA fragments are joined together correctly. In addition, tandem repeats are themselves difficult to decipher. In the regions where such repeats are located, DNA can have an unusual configuration, which makes it hard to analyze. Therefore, in the genome of one species of microscopic roundworm (nematode), the first multicellular organism whose DNA nucleotide sequence was determined, a number of unclear places already remain. True, they account for less than a hundredth of a percent of the total length of the DNA, and these ambiguities do not affect genes or regulatory elements. The nucleotide sequence of all 19,099 genes of this worm, spread over 97 million base pairs, was determined completely. The work on deciphering the nematode genome should therefore be recognized as very successful.

Even greater success is associated with the deciphering of the Drosophila genome, which is about 20 times smaller than human DNA and about 2 times larger than nematode DNA. Although Drosophila had been studied genetically in great detail, about 10% of its genes were unknown until then. Most paradoxical of all is the fact that Drosophila, a much more highly organized animal than the nematode, turned out to have fewer genes than the microscopic roundworm! This is difficult to explain from the standpoint of modern biology. The decoded genome of Arabidopsis, a plant of the cruciferous family widely used by geneticists as a classic experimental object, also contains more genes than that of Drosophila.

The development of genomic projects was accompanied by the intensive development of many areas of science and technology. Bioinformatics, in particular, received a powerful impetus. A new mathematical apparatus was created for storing and processing huge amounts of information; supercomputer systems of unprecedented power were designed; thousands of programs were written that make it possible, in a matter of minutes, to compare various blocks of information, to enter into computer databases every day the new data obtained in laboratories around the world, and to reconcile new information with what had been accumulated earlier. At the same time, systems were developed for the efficient isolation of various elements of the genome and for automatic sequencing, that is, the determination of DNA nucleotide sequences. On this basis, powerful robots were designed that significantly speed up sequencing and make it less expensive.

The development of genomics, in turn, has led to the discovery of a huge number of new facts. The significance of many of them has yet to be assessed. But even now it is obvious that these discoveries will lead to a rethinking of many theoretical positions concerning the origin and evolution of various forms of life on Earth. They will contribute to a better understanding of the molecular mechanisms underlying the work of individual cells and their interactions, to the detailed deciphering of many hitherto unknown biochemical cycles, and to the analysis of their connection with fundamental physiological processes. Thus a transition is taking place from structural to functional genomics, which in turn creates the prerequisites for studying the molecular basis of the functioning of the cell and of the organism as a whole. The information already accumulated will be the subject of analysis over the next few decades. But each further step towards deciphering the genome structure of different species gives rise to new technologies that make information easier to obtain. For example, data on the structure and function of the genes of more simply organized species can significantly speed up the search for specific genes in higher ones. Even now, the computer analysis methods used to identify new genes often replace rather laborious molecular methods of gene search.

The most important consequence of deciphering the genome structure of a given species is the possibility of identifying all its genes and, accordingly, of identifying and determining the molecular nature of the transcribed RNA molecules and of all its proteins. By analogy with the genome, two further concepts were born: the transcriptome, the pool of RNA molecules formed as a result of transcription, and the proteome, the set of proteins encoded by the genes. Thus genomics creates the foundation for the intensive development of new sciences: proteomics and transcriptomics. Proteomics deals with the structure and function of each protein; the analysis of the protein composition of the cell; the molecular basis of the functioning of a single cell, which is the result of the coordinated work of many hundreds of proteins; and the formation of a phenotypic trait of an organism, which is the result of the coordinated work of billions of cells. Very important biological processes also occur at the RNA level. Their analysis is the subject of transcriptomics.

The greatest efforts of scientists in many countries working in the field of genomics have been directed at the international Human Genome Project. Significant progress in this area is associated with the implementation of an idea proposed by J. C. Venter: to search for and analyze expressed DNA sequences, which can later be used as a kind of "label" or marker for certain parts of the genome. Another independent and no less fruitful approach was taken by the group headed by F. Collins. It is based on first identifying the genes responsible for human hereditary diseases.

Deciphering the structure of the human genome led to a sensational discovery. It turned out that the human genome contains only 32,000 genes, several times fewer than the number of proteins. Of these, only 24,000 are protein-coding genes; the products of the remaining genes are RNA molecules. The DNA nucleotide sequences of different individuals, ethnic groups and races are 99.9% similar. This similarity is what makes us human - Homo sapiens! All our variability at the nucleotide level fits into a very modest figure: 0.1%. Thus genetics leaves no room for ideas of national or racial superiority.

But look at each other: we are all different. National, and still more racial, differences are even more noticeable. So how many mutations determine human variability, not in percentage terms but in absolute terms? To obtain this estimate, recall the size of the genome. The length of human DNA is 3.2 × 10^9 base pairs, and 0.1% of this is 3.2 million nucleotides. But remember that the coding part of the genome occupies less than 3% of the total length of the DNA, and mutations outside this region most often have no effect on phenotypic variability. Thus, to obtain an overall estimate of the number of mutations that affect the phenotype, we need to take 3% of 3.2 million nucleotides, which gives a figure of the order of 100,000. That is, about 100 thousand mutations shape our phenotypic variability. If we compare this figure with the total number of genes, it turns out that on average there are 3-4 mutations per gene.
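The estimate above can be retraced step by step. The short sketch below only repeats the arithmetic of the text, using integer arithmetic to avoid rounding noise. The gene counts (32,000 genes in total, 24,000 of them protein-coding) are the figures quoted earlier in this article, and the variable names are illustrative.

```python
GENOME_BP = 3_200_000_000     # length of human DNA, base pairs (from the text)
TOTAL_GENES = 32_000          # total gene count quoted earlier in the article
PROTEIN_CODING_GENES = 24_000 # protein-coding genes quoted earlier in the article

# 0.1% of the genome differs between individuals.
variable_sites = GENOME_BP // 1000

# Less than 3% of the genome is coding, so take 3% of the variable sites
# as an upper estimate of phenotype-affecting differences.
phenotypic_sites = variable_sites * 3 // 100

print(variable_sites)                           # variable nucleotides
print(phenotypic_sites)                         # of the order of 100,000
print(phenotypic_sites / TOTAL_GENES)           # per gene, all genes
print(phenotypic_sites / PROTEIN_CODING_GENES)  # per protein-coding gene
```

Dividing by the total number of genes gives 3 differences per gene and dividing by the protein-coding genes gives 4, which is where the range of 3-4 comes from.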

What are these mutations? The vast majority of them (at least 70%) determine our individual non-pathological variability: what distinguishes us from one another without making any of us worse than the others. This includes features such as eye, hair and skin color, body type, height, weight, type of behavior (which is also largely genetically determined) and much more. About 5% of mutations are associated with monogenic diseases. The remaining mutations, about a quarter, belong to the class of functional polymorphisms. They are involved in the formation of hereditary predisposition to widespread multifactorial diseases. Of course, these estimates are rather rough, but they give an idea of the structure of human hereditary variability.



This is part 1 of the history of genomics, called "Genomic Projects". In this part I will try to explain in popular terms how the first methods of reading genetic sequences appeared, what they consisted of, and how genomics moved from reading individual genes to reading complete genomes, including the complete genomes of specific people.

Soon after the discovery made by Watson and Crick (Fig. 1), the science of genomics was born. Genomics is the science that studies the genomes of organisms; it involves the intensive reading of complete DNA sequences (sequencing) and their mapping onto genetic maps. It also considers the interactions between genes and between alleles of genes, their diversity, and patterns in the evolution and structure of genomes. This field has developed so rapidly that, quite recently, text editors like Microsoft Word did not know the word "genome" and tried to correct it to "gnome".

Fig. 1. James Watson (left) and Francis Crick (right) - the scientists who discovered the DNA double helix

The very first gene to be read was the coat protein gene of bacteriophage MS2, studied in the laboratory of Walter Fiers in 1972. By 1976 the other genes of this bacteriophage were also known, including its replicase, the gene responsible for the reproduction of viral particles. Short RNA molecules could already be read relatively easily, but large DNA molecules could not yet be read properly. For example, the 24-letter sequence from the lactose operon obtained in 1973 by Walter Gilbert and Allan Maxam was regarded as a significant breakthrough in science. Here is the sequence:

5'-TGGAATTGTGAGCGGATAACAATT-3'
3'-ACCTTAACACTCGCCTATTGTTAA-5'

The first DNA reading techniques were very inefficient; they used radioactive labels for DNA and chemical methods to distinguish between nucleotides. For example, one could take reagents that cut the nucleotide sequence with different probabilities after different letters. The DNA molecule consists of 4 letters (nucleotides), A, T, G and C, which form a double antiparallel helix (the two strands run in opposite directions). Inside this helix the nucleotides face each other according to the rule of complementarity: opposite A in the other strand is T, opposite G is C, and vice versa.
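The complementarity rule is simple enough to check mechanically. The following minimal sketch (the function names are illustrative) derives the second strand of the 24-letter sequence shown above from the first.

```python
# Pairing rule: A <-> T, G <-> C.
COMPLEMENT = str.maketrans("ATGC", "TACG")


def complement(strand):
    """Return the base-by-base complement, written in the same left-to-right order."""
    return strand.upper().translate(COMPLEMENT)


def reverse_complement(strand):
    """Return the opposite strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


top_strand = "TGGAATTGTGAGCGGATAACAATT"  # upper strand, written 5' to 3'
print(complement(top_strand))            # lower strand, written 3' to 5'
print(reverse_complement(top_strand))    # lower strand, written 5' to 3'
```

Because the strands are antiparallel, the lower strand appears reversed when it is written in the conventional 5'-to-3' direction, which is why two functions are given.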

Gilbert and Maxam used 4 types of cutting reaction. The first cut after A or G, but preferentially after A (A>G); the second cut preferentially after G (G>A); the third after C; and the fourth after C or T (C+T). Each reaction was carried out in its own test tube, and the products were then loaded onto a gel. DNA is a charged molecule, and when the current is switched on it runs from minus to plus. Smaller molecules run faster, so the cut DNA molecules line up by length. By looking at the 4 lanes of the gel, one could tell the order in which the nucleotides are arranged.

A breakthrough in DNA sequencing came when the English biochemist Frederick Sanger in 1975 proposed the so-called "chain termination method" for reading DNA sequences. But before describing this method, we need to introduce the processes that take place during the synthesis of new DNA molecules. DNA synthesis requires an enzyme, DNA-dependent DNA polymerase, which can extend a single-stranded DNA molecule into a double-stranded one. To do this, the enzyme needs a "seed", or primer: a short DNA sequence that can bind to the long single-stranded molecule that we want to extend into a double-stranded one. The nucleotides themselves are also required, in the form of nucleoside triphosphates, along with certain conditions, such as a certain concentration of magnesium ions in the medium and a certain temperature. Synthesis always proceeds in one direction, from the end called 5' to the end called 3'. Of course, reading DNA requires a large amount of template, that is, copies of the DNA that is to be read.

In 1975, Sanger came up with the following. He took special (terminating) nucleotides which, once incorporated into the growing chain of the DNA molecule, prevented the attachment of further nucleotides, that is, they "broke" the chain. He then took 4 test tubes and added to each of them all 4 types of ordinary nucleotides and a small amount of one type of terminating nucleotide. Thus, in the test tube containing the terminating nucleotide "A", the synthesis of each new DNA molecule could break off at any place where "A" should stand; in the test tube with the terminating "G", at any place where "G" should stand; and so on. Next, the contents of the 4 tubes were loaded onto 4 lanes of a gel (Fig. 2). Again the shortest molecules "ran" ahead and the longest stayed near the start, and from the positions of the bands it was possible to tell which nucleotide follows which. To make the bands visible, one of the four nucleotides (A, T, G or C) was labeled with radioactive isotopes, which does not change its chemical properties.

Fig. 2. Sanger method. Three series of 4 lanes are shown.
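The logic of reading a Sanger gel can be sketched as a toy model. The sketch assumes an idealized experiment in which every possible termination point produces exactly one visible band and the primer length is ignored. The function names and the example strand are hypothetical.

```python
def sanger_lanes(new_strand):
    """For each terminating base, list the lengths of fragments ending in that base.

    In the tube with terminating 'A', synthesis can stop wherever the newly
    synthesized strand has an A, so the lane contains one band per such position.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the bands from shortest to longest across all four lanes."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_lanes("GATTACA")  # hypothetical newly synthesized strand
print(lanes)
print(read_gel(lanes))
```

Each fragment length occurs in exactly one lane, so sorting the bands by length and noting which lane each one sits in recovers the sequence of the synthesized strand letter by letter.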

Using this method, the first DNA-based genome was read: the genome of bacteriophage ϕX174, 5,386 nucleotides long (the MS2 phage genome read earlier was RNA-based and 3,569 nucleotides long).

The Sanger method was significantly improved in the laboratory of Leroy Hood, where in 1985 the radioactive label was replaced with a fluorescent one. This made it possible to create the first automatic sequencer: each DNA molecule was now colored differently depending on its last letter (the color-labeled nucleotide that terminated the chain). The fragments were separated by size on a gel, and the machine automatically read the emission spectrum of the bands as they arrived, sending the results to a computer. The result of this procedure is a chromatogram (Fig. 3), from which it is easy to establish a DNA sequence up to 1000 letters long with very few errors.



Fig. 3. An example of a chromatogram obtained on a modern sequencer using the Sanger chain termination method and a fluorescent label.

For many years, the improved Sanger method remained the main method of large-scale genome sequencing and was used in many whole-genome projects. In 1980 Sanger received his second Nobel Prize in Chemistry (he had received the first in 1958 for determining the amino acid sequence of insulin, the first protein ever sequenced). The first complete genome of a cellular organism, published in 1995, was that of Haemophilus influenzae, a bacterium that causes some forms of pneumonia and meningitis. Its genome was 1,830,137 nucleotides long. In 1998 came the first genome of a multicellular animal, the roundworm Caenorhabditis elegans (Fig. 4, right), with 98 million nucleotides, and in 2000 the first plant genome, that of Arabidopsis thaliana (Fig. 4, left), a relative of horseradish and mustard. The genome of this plant is 157 million nucleotides long. The speed and scale of sequencing grew at an astonishing rate, and the emerging nucleotide sequence databases filled up faster and faster.


Fig. 4. Arabidopsis thaliana (left) and Caenorhabditis elegans (right).

Finally, it was the turn of mammalian genomes: those of the mouse and the human. When James Watson took charge of the project to read the complete human genome at the US National Institutes of Health (NIH) in 1990, many scientists were skeptical of the idea. Such a project required a colossal investment of money and time and, given the limited capabilities of the genome-reading machines then available, seemed to many simply infeasible. On the other hand, the project promised revolutionary changes in medicine and in our understanding of how the human body works, but here too there were problems. At that time there was no accurate estimate of the number of human genes. Many believed that the complexity of the human body implied hundreds of thousands of genes, perhaps even several million, so making sense of that many genes, even if their sequences could be read, would be an impossible task. Many also assumed that a large number of genes was precisely what fundamentally distinguished humans from other animals, a view later refuted by the Human Genome Project.

The very idea of reading the human genome was born in 1986 on the initiative of the US Department of Energy, which subsequently funded the project together with the NIH. The cost was estimated at 3 billion dollars, and the project was planned to run for 15 years with the participation of several other countries: China, Germany, France, Great Britain and Japan. To read the human genome, so-called bacterial artificial chromosomes (BACs) were used. In this approach, the genome is cut into many pieces about 150,000 nucleotides long. These fragments are inserted into artificial circular chromosomes, which are in turn introduced into bacteria. As the bacteria multiply, so do these chromosomes, and scientists obtain many copies of the same DNA fragment. Each fragment is then read separately, and the sequenced pieces of 150,000 nucleotides are placed on a chromosome map. This method allows the genome to be sequenced quite accurately, but it is very time-consuming.

But the Human Genome Project was moving extremely slowly. The scientist Craig Venter and his company Celera Genomics, founded in 1998, played roughly the same role in the history of genomics as the Soviet Union played in getting the Americans to the Moon. Venter announced that his company would finish the human genome before the government project did. It would need only $300 million, a fraction of the cost of the government project, thanks to a new sequencing technology, “whole genome shotgun”: reading random short fragments of the genome. When Francis Collins, who had replaced James Watson as head of the Human Genome Project in 1993, learned of Venter's intentions, he was shocked. “We will do the human genome, and you can do the mouse,” Venter suggested. The scientific community was alarmed, and for a number of reasons. First, Venter promised to finish his project in 2001, four years ahead of the date planned for the government project. Second, Celera Genomics intended to profit from the project by creating a comprehensive database that commercial pharmaceutical companies would pay to use.
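To give a feel for what “reading random short fragments” implies, here is a toy sketch of how overlapping reads can be stitched back together. It is not Celera's assembler: the greedy merge rule, the function names, the minimum overlap of 3 letters, and the 20-letter “genome” are all assumptions made for illustration, and the approach only works here because the toy sequence has no confusing repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the longest such match is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two sequences with the longest overlap.

    Stops when one contig remains or no overlap of at least `min_len`
    letters is left. Assumes error-free reads from one strand.
    """
    contigs = list(reads)
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # remaining contigs cannot be joined
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


genome = "ATGGCGTACCTTAGGCAATC"  # toy sequence, not real data
# Four overlapping 8-letter reads, deliberately given out of order.
reads = ["CCTTAGGC", "ATGGCGTA", "AGGCAATC", "CGTACCTT"]
print(greedy_assemble(reads))
```

Real genomes contain long repeats, which is one reason many doubted that the shotgun approach would scale to large genomes.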

In 2000, Celera demonstrated the effectiveness of its sequencing method by publishing the genome of the fruit fly Drosophila, together with the laboratory of geneticist Gerald Rubin (whole genome shotgun had earlier been used to read the first bacterial genome, but few believed the method was suitable for large genomes). It was this kick from a commercial company that spurred the Human Genome Project to develop and adopt improved, more modern genome-reading methods. In 2001, preliminary versions of the genome were published by both the government-funded Human Genome Project and Celera. A preliminary estimate of the number of genes in the human genome was made at that time: 30-40 thousand. In 2004 the final version of the genome came out, almost two years ahead of schedule. That final article stated that humans probably have only 20-25 thousand genes. This number is comparable to that of other animals, in particular the worm C. elegans.

Almost no one had guessed that the number of genes that keep our body working could be so small. Other details emerged later: the human genome is about three billion nucleotides long, and most of it consists of non-coding sequences, including all kinds of repeats. Only a small part of the genome actually contains genes, that is, stretches of DNA from which functional RNA molecules are transcribed. Interestingly, as knowledge of the human genome grew, the number of putative genes only decreased: many potential genes turned out to be pseudogenes (non-functional genes), and in other cases several presumed genes turned out to be parts of one and the same gene.

After that, the pace of sequencing increased exponentially. In 2005 the chimpanzee genome was published, confirming the striking similarity between apes and humans that zoologists of the past had already noticed. By 2008, the complete genomes of 32 vertebrates had been read, including the cat, dog, horse, macaque, orangutan and elephant, along with 3 invertebrate deuterostome genomes, 15 insect genomes, 7 worm genomes, and hundreds of bacterial genomes.

Finally, in 2007, humanity came close to being able to sequence the genomes of individual people. The first person whose complete individual genome was read was Craig Venter (Fig. 5). The genome was read in such a way that the chromosomes Venter inherited from each parent could be compared. This showed that the two sets of chromosomes within one person differ at about three million single-letter nucleotide positions, not counting a huge number of larger variable regions. A year later, the complete diploid genome of James Watson was published (Fig. 5). Watson's genome contained 3.3 million single-letter substitutions compared with the annotated human genome, of which more than 10,000 led to changes in the proteins encoded by his genes. Watson's genome cost $1 million, meaning the price of reading a genome had fallen more than 3,000-fold in 10 years, and that is not the limit. Today scientists face the challenge of “1 genome - $1000 - 1 day”, and with the advent of new sequencing technologies it no longer seems impossible. The next part of this story will cover them.


Fig. 5. James Watson and Craig Venter, the first people to have their individual genomes read.

  1. Watson J, Crick F: A Structure for Deoxyribose Nucleic Acid. Nature 1953, 171(4356):737-738.
  2. Min Jou W, Haegeman G, Ysebaert M, Fiers W: Nucleotide sequence of the gene coding for the bacteriophage MS2 coat protein. Nature 1972, 237(5350):82-88.
  3. Fiers W, Contreras R, Duerinck F, Haegeman G, Iserentant D, Merregaert J, Min Jou W, Molemans F, Raeymaekers A, Van den Berghe A et al: Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene. Nature 1976, 260(5551):500-507.
  4. Gilbert W, Maxam A: The nucleotide sequence of the lac operator. Proc Natl Acad Sci U S A 1973, 70(12):3581-3584.
  5. Maxam AM, Gilbert W: A new method for DNA sequencing. Proc Natl Acad Sci U S A 1977, 74(2):560-564.
  6. Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A 1977, 74(12):5463-5467.
  7. Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C, Connell CR, Heiner C, Kent SB, Hood LE: Fluorescence detection in automated DNA sequence analysis. Nature 1986, 321(6071):674-679.
  8. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM et al: Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 1995, 269(5223):496-512.
  9. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 1998, 282(5396):2012-2018.
  10. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 2000, 408(6814):796-815.
  11. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF et al: The genome sequence of Drosophila melanogaster. Science 2000, 287(5461):2185-2195.
  12. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA et al: The sequence of the human genome. Science 2001, 291(5507):1304-1351.
  13. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W et al: Initial sequencing and analysis of the human genome. Nature 2001, 409(6822):860-921.
  14. Finishing the euchromatic sequence of the human genome. Nature 2004, 431(7011):931-945.
  15. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 2005, 437(7055):69-87.
  16. Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G et al: The diploid genome sequence of an individual human. PLoS Biol 2007, 5(10):e254.
  17. Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ, Makhijani V, Roth GT et al: The complete genome of an individual by massively parallel DNA sequencing. Nature 2008, 452(7189):872-876.