Mapping the gene frontier
READING the genetic instructions for a human being is like trying to find meaningful paragraphs in a strangely organized book. Varying amounts of what appear to be gibberish separate the relevant sentences and paragraphs. And the geneticist may not even know in which chapter specific paragraphs lie. Thus, reading the human genome, as this instruction set is called, involves more than decoding the sequence of some 3 billion ``letters'' in which the ``book'' is written. The reader also has to have maps that show the layout of the book and in which chapter - and where in that chapter - particular paragraphs lie.
Microbiologists plan to sequence the human genome by constructing these maps in sufficient detail to be accurate guides. They should finish such maps within another few years. With the maps' aid, they can work out the sequence of letters, concentrating first on the most interesting paragraphs. So far, they have sequenced only about 0.4 percent of the human genome. Getting the entire 3-billion-letter sequence could take one to two decades.
This ``book'' of organic life consists of molecules of DNA (deoxyribonucleic acid). It is written in a chemical code based on four ``letters'' called nucleotides. These are the building blocks of DNA. Each nucleotide consists of sugar, phosphate, and one of four nitrogen bases. For convenience, biochemists represent the four elements of the genetic code by the initials of these bases - A (adenine), C (cytosine), G (guanine), T (thymine).
The code is formed by taking the letters in groups of three. Four letters can form 64 different triplets. Most of these triplets stand for one or another of the 20 amino acids that are the building blocks of proteins. Several different triplets may code for the same amino acid.
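The triplet arithmetic above can be checked with a short sketch. It is only an illustration: the names BASES, STOP_CODONS, and ALANINE_CODONS are chosen here and do not come from the article. The three "stop" triplets and the four alanine triplets are taken from the standard genetic code, written in DNA letters.

```python
from itertools import product

# The four "letters" of the genetic code.
BASES = "ACGT"

# Every possible three-letter group: 4 * 4 * 4 = 64 triplets.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
print(len(codons))  # 64

# Assumption from the standard genetic code: three triplets signal
# "stop" instead of naming an amino acid.
STOP_CODONS = {"TAA", "TAG", "TGA"}
coding = [c for c in codons if c not in STOP_CODONS]
print(len(coding))  # 61

# Several different triplets may code for the same amino acid.
# In the standard code, all four of these stand for alanine.
ALANINE_CODONS = ["GCT", "GCC", "GCA", "GCG"]
print(all(c in coding for c in ALANINE_CODONS))  # True
```

With 61 coding triplets and only 20 amino acids, some amino acids must be spelled by more than one triplet, which is the redundancy the paragraph describes.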
Proteins are the main components of living organisms. They form structural elements of bone and other tissue. They act as hormones and enzymes to regulate biochemical activity. Since a series of triplets of the genetic code can represent a specific protein, the code can specify an organism's makeup in detail.
Sequencing the human genome, or any part of it, means uncovering the linear sequence of code letters (A, C, G, T) along the DNA involved. The sequence often is organized into segments called genes, 1,000 to 1 million letters long, each of which acts as a unit. Genes direct protein synthesis or perform regulatory functions in a cell. In this way, the information carried by DNA regulates the development and function of an organism.
The meaningful parts of a DNA gene sequence, called exons, are physically separated along the DNA molecule by sequences of code, called introns, that appear to have no genetic function. When a gene is activated, it is copied in such a way that the introns drop out and the exons join together to form a message that is sent to the relevant chemical machinery in the cell.
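The way introns drop out and exons join can be pictured with a toy sketch. Everything in it is hypothetical: the short sequence is invented for illustration, the function name splice is chosen here, and a real gene is far longer. It also leaves out the fact that the cell's copy is made of RNA, not DNA. Exons are written in capitals and introns in lowercase only to make them easy to tell apart.

```python
def splice(gene, exons):
    """Join the exon stretches of a gene, dropping everything between them.

    `exons` is a list of (start, end) positions, end not included.
    """
    return "".join(gene[start:end] for start, end in exons)


# A made-up gene: three exons separated by two introns.
parts = [
    ("exon", "ATGGCC"),
    ("intron", "gtaagtttag"),
    ("exon", "TTTGGA"),
    ("intron", "gtcccag"),
    ("exon", "TGGTAA"),
]
gene = "".join(seq for _, seq in parts)

# Record where each exon starts and ends along the gene.
exons = []
position = 0
for kind, seq in parts:
    if kind == "exon":
        exons.append((position, position + len(seq)))
    position += len(seq)

message = splice(gene, exons)
print(exons)    # [(0, 6), (16, 22), (29, 35)]
print(message)  # ATGGCCTTTGGATGGTAA
```

The message contains only the capital-letter stretches, in their original order, which is the sense in which the exons "join together."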
To use the book analogy, you can think of exons as parts of the paragraphs (genes) that specify various genetic functions. These paragraphs (genes) are organized into chapters, which are groups of DNA molecules called chromosomes. The book of the human genome consists of two nearly identical volumes of 23 chapters (chromosomes) each.
Microbiologists are beginning to find out which chromosome contains certain genes and how these genes are located relative to one another. They also are finding out where given genes are located physically along a specific chromosome. Their techniques include cutting up DNA molecules at known sites, studying how certain genes seem to be inherited within families, using distinctive markers (sequences of letters) along DNA molecules, and matching up sequences on the DNA molecule with templates that carry known sequences of genetic code. Within a few years, they expect to have a map with enough identified markers to readily find their way around all 23 pairs of chromosomes.
This is at the core of the effort to sequence the entire human genome. There are on the order of 100,000 genes. Only about 2 percent have been identified along the chromosomes. Finding the rest involves sorting through a mass of seemingly extraneous material. Some 90 percent of the code sequence of the genome has no presently known function. Many microbiologists call it junk DNA. Some consider it an evolutionary relic with no present function. Others warn that our ignorance is so great that this ``junk'' may hold the key to major genetic functions.
``I am absolutely convinced that when we know the sequence of the human genome we will discover things that have not heretofore even been imagined,'' says Stanford University's Paul Berg. He adds, ``My premise is that none of it is junk.''