To make better computers, researchers turn to molecular biology

Forget flash drives, hard drives, floppy disks, CDs, records, and VHS tapes. The most efficient way to store data may be all around you.

DNA double helix. (Image: National Human Genome Research Institute/NIH)

March 2, 2017

Computer engineers have created some amazingly small devices, capable of storing entire libraries of music and movies in the palm of your hand. But geneticists say Mother Nature can do even better.

DNA, where all of biology's information is stored, is incredibly dense. The whole genome of an organism fits into a cell that is invisible to the naked eye.

That's why computer scientists are turning to molecular biology to design the next best way to store humanity's ever-increasing collection of digital data.

With every new app, selfie, blog post, or cat video, the hardware to store the world's vast archive of digital information is filling up. But, theoretically, DNA could store up to 455 exabytes per gram. In other words, you could have 44 billion copies of the extended versions of all three of The Lord of the Rings movies on the tip of your finger. (For reference, watching all those movies would take more than 164 million years.)

George Church, a geneticist at Harvard University and the Massachusetts Institute of Technology, first used DNA to store digital information in 2012, a result he reported in a paper published in the journal Science. He revealed his success during an interview on "The Colbert Report" by showing Stephen Colbert a tiny piece of paper with a small spot that contained millions of copies of Dr. Church's book, "Regenesis," in the form of DNA.

Back then, Church and his colleagues were focused on proving that digital information could indeed be encoded in DNA. Since then, teams of engineers and biologists have expanded on this proof of concept and worked to squeeze more and more data into DNA, eyeing the vast storage Church had predicted was possible.

A team at the European Bioinformatics Institute (EBI) in Hinxton, Britain, reported in 2013 that they had made the largest DNA archive to date, putting 739 kilobytes' worth of computer files into DNA strands. (Church's book had required about 650 kilobytes.)

In July 2016, a team of Microsoft and University of Washington researchers announced that they had reset that record, storing 200 megabytes of data in DNA.

Now, researchers at the New York Genome Center and Columbia University have ramped up the density of data stored in DNA molecules. They were able to reach a density of 214 petabytes per gram of DNA, according to a paper published Thursday in the journal Science – which is over eight times as dense as previous work.

"This is a huge leap forward," says Church, who was not involved in the new research. Although he had calculated that this high data density was possible in his own work, Church and his team hadn't actually made it work.

"They've proven a hypothetical," he says in a phone interview with The Christian Science Monitor. 

From DVDs to DNA: How does it work?

Digital data in its simplest form is just 0s and 1s, Yaniv Erlich, lead author of the new study, explains in a phone interview with the Monitor. Any file, be it a computer program or a movie, is made up of a series of 0s and 1s. 

Similarly, DNA has its own series of letters, A, C, G, and T. Those letters represent the nucleotides – adenine, cytosine, guanine, and thymine – that are the basic structural units of DNA. 

So to convert digital data to DNA, Dr. Erlich's team and others have essentially translated 0s and 1s into As, Cs, Gs, and Ts. Then, the resulting DNA sequence is sent to a company that prints synthetic DNA, in this case San Francisco-based Twist Bioscience. What they receive back is a vial about half the size of a thumb that looks like it just has a little liquid in it. But there's actually DNA in there.

To access the data stored in it, the team sequences the DNA and translates it back into 0s and 1s. In this case, the researchers encoded and then retrieved a full computer operating system, an 1895 French film, "Arrival of a Train at La Ciotat," a $50 Amazon gift card, a computer virus, a Pioneer plaque, and a 1948 study by information theorist Claude Shannon. As one of the tests of the data, Erlich used the computer operating system to play the game Minesweeper.
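The translation in both directions can be sketched in a few lines of Python. This is only an illustration: the two-bits-per-letter table below (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions made for the example, not necessarily the exact scheme any of the teams used.

```python
# Hypothetical mapping: each pair of bits becomes one nucleotide letter.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Translate each byte into four nucleotides (2 bits per base)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(sequence: str) -> bytes:
    """Reverse the mapping: every four nucleotides become one byte again."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = bytes_to_dna(b"Hi")
print(strand)                # CAGACGGC
print(dna_to_bytes(strand))  # b'Hi'
```

At two bits per letter, a one-byte character takes four nucleotides, and reading the letters back in order restores the original bytes exactly.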

The genetic material is not extracted from any animal or plant. "DNA is just a hardware here," Erlich writes in a follow-up email to the Monitor. "It is not related to anything that is living and is not even derived from anything that was alive before. The synthesis, copying, and sequencing process are purely chemical."

A 'fountain' of information

Turning digital data into DNA may seem as simple as coming up with a code for 0s and 1s, and As, Cs, Gs, and Ts. But it's a bit more complicated than that.

First of all, Erlich says, not all DNA sequences are robust. For example, a string of all the same nucleotides, say, AAAAAAAAAAAA, is particularly fragile and difficult to read correctly. Computer files carry no such restriction: a long run of identical 0s or 1s is perfectly valid data.
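A screen for such fragile stretches might look like the Python sketch below. The cutoff of three identical letters in a row is an assumed value chosen for illustration, and the function names are hypothetical; the article does not say what limit the researchers applied.

```python
from itertools import groupby


def longest_run(sequence: str) -> int:
    """Length of the longest stretch of one repeated nucleotide."""
    return max((len(list(run)) for _, run in groupby(sequence)), default=0)


def is_robust(sequence: str, max_run: int = 3) -> bool:
    """Accept a strand only if no letter repeats more than max_run times in a row.

    max_run=3 is an illustrative threshold, not a figure from the study.
    """
    return longest_run(sequence) <= max_run


print(longest_run("AAAAAAAAAAAA"))  # 12
print(is_robust("AAAAAAAAAAAA"))    # False
print(is_robust("ACGTTGCA"))        # True
```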

In addition, not all DNA molecules will survive the sequencing and retrieval process. And the scientists can't risk losing key pieces of the code. 

To resolve these problems, Erlich used what is known in computing as a fountain code to act as a sort of gatekeeper that provides clues to the data rather than the data itself. Because DNA Fountain, as he calls the algorithm, can provide an unlimited number of clues, the team can still decode the DNA sequence even if a few clues get lost in the process.
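The idea behind a fountain code can be shown with a toy example in Python. This is a simplified sketch, not the DNA Fountain algorithm itself: the file is cut into segments, each "clue" (often called a droplet) is the XOR of a small random subset of segments chosen from a seed, and a decoder peels the segments back out. The `DEGREES` table, the function names, and the segment size are all assumptions made for the illustration; practical fountain codes choose subset sizes from a carefully tuned distribution.

```python
import random

# Toy subset sizes (assumption); real fountain codes use a soliton distribution.
DEGREES = (1, 1, 2, 2, 3, 4)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split(data: bytes, size: int) -> list[bytes]:
    """Cut data into equal-size segments, zero-padding the last one."""
    padded = data + bytes(-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def droplet_indices(seed: int, n_segments: int) -> set[int]:
    """Encoder and decoder both derive the same segment subset from the seed."""
    rng = random.Random(seed)
    degree = min(rng.choice(DEGREES), n_segments)
    return set(rng.sample(range(n_segments), degree))


def make_droplet(segments: list[bytes], seed: int) -> tuple[int, bytes]:
    """One clue: the seed plus the XOR of the segments that seed selects."""
    payload = bytes(len(segments[0]))
    for i in droplet_indices(seed, len(segments)):
        payload = xor_bytes(payload, segments[i])
    return seed, payload


def decode(droplets, n_segments):
    """Peeling decoder: returns the segments, or None if more clues are needed."""
    pending = [[droplet_indices(seed, n_segments), payload] for seed, payload in droplets]
    recovered: dict[int, bytes] = {}
    progress = True
    while progress and len(recovered) < n_segments:
        progress = False
        for entry in pending:
            indices, payload = entry
            # Remove every already-known segment from this clue.
            for i in [i for i in indices if i in recovered]:
                payload = xor_bytes(payload, recovered[i])
                indices.discard(i)
            entry[1] = payload
            if len(indices) == 1:  # only one unknown left: it is solved
                recovered[indices.pop()] = payload
                progress = True
    if len(recovered) < n_segments:
        return None
    return [recovered[i] for i in range(n_segments)]


data = b"Any file is just 0s and 1s."
segments = split(data, 4)
droplets = [make_droplet(segments, seed) for seed in range(600)]
survivors = [d for i, d in enumerate(droplets) if i % 3 != 0]  # lose a third
result = decode(survivors, len(segments))
if result is None:
    print("not enough clues survived; generate more")
else:
    print(b"".join(result)[:len(data)] == data)
```

Because any new seed yields a fresh clue, the supply is effectively endless, and the decoder does not care which particular clues went missing, only that enough of them arrive.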

In addition to this method to make the translation more robust, Erlich wanted to see if the data-filled DNA could be replicated without error.

The process of sequencing the DNA includes removing some molecules from the sample. So to preserve the data and be able to access it, scientists have to be able to make copies, Erlich explains. So he made 25 copies, and copies of the copies, and copies of the copies of the copies, and so on nine times. And even in the most copied copies, he says, "we were able to perfectly retrieve this information. It's very robust."

Are we entering the age of DNA-computers?

Despite these strides to move digital data from hard drives to DNA and back, don't expect your next computer or smartphone to contain DNA.

"This is still the early stages of DNA storage. It's basic science," Erlich says. "It's not that tomorrow you're going to go to Best Buy and get your DNA hard drive. And we don't envision that this will be in some hard drive that people will buy."

"I think the more immediate use is for archiving," Church says. The method lends itself to archiving vast amounts of data that doesn't need to be accessed regularly, like video surveillance, for example, he says.

Besides density, one reason DNA data storage would be advantageous over, say, a massive warehouse full of hard drives, Erlich says, is that it doesn't need to be kept cool. Furthermore, DNA doesn't degrade like other data storage tools. Paleoanthropologists have sequenced DNA from Neanderthals and other ancient humans, so Erlich isn't concerned about the longevity of this sort of data storage.

The Microsoft researchers see the applications of DNA data storage more broadly. "Any organization or individual who needs long-term archival storage of large amounts of data would benefit from a DNA storage option," write Karin Strauss of Microsoft and Luis Ceze of the University of Washington in an email to the Monitor.

"For example, hospitals need to store clinical information for all their patients for a long time, research institutions have massive amounts of data from research projects that need to be preserved, and the emerging virtual reality industry needs high-capacity storage solutions for very large video files. In addition, consumers could benefit from DNA storage via the cloud, especially following the advent of highly portable video cameras and the demand to store personal video online."

Currently, the cost and time required for this process are somewhat prohibitive for consumer applications. It cost $7,000 to synthesize the DNA Erlich developed and another $2,000 to read it. The synthesis process took two weeks and the sequencing took about a day.

That's not to say that DNA data storage won't touch consumers' everyday life. Church's team has worked with Technicolor to use the new data storage method to preserve the company's many old films.

During a media tour in 2016, Jean Bolot, vice-president for research and innovation at Technicolor, showed off a vial containing a million copies of the 1902 French silent film "A Trip to the Moon."

He said, "This, we believe, is what the future of movie archiving will look like."

[Editor's note: An earlier version of this article erroneously suggested that the Columbia University researchers broke the Microsoft/University of Washington 200-megabyte milestone. An earlier version of the headline of this article mistakenly conflated molecular biology with microbiology.]