Imagine the human genome as a string stretching out for the length of a football field, with all the genes that encode proteins clustered at the end near your feet. Take two big steps forward; all the protein information is now behind you.
The human genome has three billion base pairs in its DNA, but only about 2 percent of them encode proteins. The rest seems like pointless bloat, a profusion of sequence duplications and genomic dead ends often labeled “junk DNA.” This stunningly thriftless allocation of genetic material isn’t limited to humans: Even many bacteria seem to devote 20 percent of their genome to noncoding filler.
Many mysteries still surround the issue of what noncoding DNA is, and whether it really is worthless junk or something more. Portions of it, at least, have turned out to be vitally important biologically. But even beyond the question of its functionality (or lack of it), researchers are beginning to appreciate how noncoding DNA can be a genetic resource for cells and a nursery where new genes can evolve.
“Slowly, slowly, slowly, the terminology of ‘junk DNA’ [has] started to die,” said Cristina Sisu, a geneticist at Brunel University London.
Scientists casually referred to “junk DNA” as far back as the 1960s, but they took up the term more formally in 1972, when the geneticist and evolutionary biologist Susumu Ohno used it to argue that large genomes would inevitably harbor sequences, passively accumulated over many millennia, that did not encode any proteins. Soon thereafter, researchers acquired hard evidence of how plentiful this junk is in genomes, how varied its origins are, and how much of it is transcribed into RNA despite lacking the blueprints for proteins.
Technological advances in sequencing, particularly in the past two decades, have done much to shift how scientists think about noncoding DNA and RNA, Sisu said. Although these noncoding sequences don’t carry protein information, evolution has sometimes shaped them for other purposes. As a result, the functions of the various classes of “junk” (insofar as they have functions) are getting clearer.
Cells use some of their noncoding DNA to create a diverse menagerie of RNA molecules that regulate or assist with protein production in various ways. The catalog of these molecules keeps expanding, with small nuclear RNAs, microRNAs, small interfering RNAs and many more. Some are short segments, typically fewer than two dozen nucleotides long, while others are an order of magnitude longer. Some exist as double strands or fold back on themselves in hairpin loops. But all of them can bind selectively to a target, such as a messenger RNA transcript, to either promote or inhibit its translation into protein.
These RNAs can have substantial effects on an organism’s well-being. Experimental shutdowns of certain microRNAs in mice, for instance, have induced disorders ranging from tremors to liver dysfunction.
By far the biggest category of noncoding DNA in the genomes of humans and many other organisms consists of transposons, segments of DNA that can change their location within a genome. These “jumping genes” have a propensity to make many copies of themselves (sometimes hundreds of thousands) throughout the genome, said Seth Cheetham, a geneticist at the University of Queensland in Australia. Most prolific are the retrotransposons, which spread efficiently by making RNA copies of themselves that are converted back into DNA and inserted at another place in the genome. About half of the human genome is made up of transposons; in some maize plants, that figure climbs to about 90 percent.
Noncoding DNA also shows up within the genes of humans and other eukaryotes (organisms with complex cells) in the intron sequences that interrupt the protein-encoding exon sequences. When genes are transcribed, the exon RNA gets spliced together into mRNAs, while much of the intron RNA is discarded. But some of the intron RNA can get turned into small RNAs that are involved in protein production. Why eukaryotes have introns is an open question, but researchers suspect that introns help accelerate gene evolution by making it easier for exons to be reshuffled into new combinations.