Microsoft, university researchers break DNA data storage record
- 09 July, 2016 06:07
Researchers said the impressive part about reaching the 200MB milestone is not just how much data they were able to encode onto synthetic DNA and then decode, but also the space they were able to store it in.
Once encoded, the data occupied a spot in a test tube "much smaller than the tip of a pencil," Douglas Carmean, the partner architect at Microsoft overseeing the project, said.
The DNA storage also has a half-life of 500 years, even in harsh conditions. The half-life of DNA -- just as with radioactive material -- describes its rate of decay: the length of time it takes for half of its strand bonds to break.
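As a rough illustration of what a 500-year half-life implies, the following sketch assumes simple exponential decay, the same model used for radioactive material. The function name and the sample time spans are illustrative and do not come from the researchers.

```python
def fraction_intact(years: float, half_life_years: float = 500.0) -> float:
    """Fraction of strand bonds still intact after `years`,
    assuming plain exponential decay (an assumption of this sketch)."""
    return 0.5 ** (years / half_life_years)


# Sample time spans chosen for illustration only.
for years in (100, 500, 1000):
    print(f"{years:>5} years: {fraction_intact(years):.1%} intact")
```

Under this model, half of the bonds remain after 500 years and a quarter after 1,000 years.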
Overall, though, this is a huge step forward. "Think of the amount of data in a big data center compressed into a few sugar cubes. Or all the publicly accessible data on the Internet slipped into a shoebox. That is the promise of DNA storage -- once scientists are able to scale the technology and overcome a series of technical hurdles," Microsoft stated in a blog post.
The data stored in the DNA included digital versions of works of art, among them a high-definition music video by the band OK Go, as well as the Universal Declaration of Human Rights in more than 100 languages, the top 100 books of Project Gutenberg and the nonprofit Crop Trust's seed database.
DNA is needed as a storage medium because the world's data is growing exponentially and molecular-level storage is vastly more dense than hard drives, solid state drives (SSDs) or even up-and-coming technologies such as phase-change memory.
"Those systems also degrade after a few years or decades, while DNA can reliably preserve information for centuries," the University of Washington (UW) researchers stated in a news release. "DNA is best suited for archival applications, rather than instances where files need to be accessed immediately."
The UW and Microsoft researchers are one of two teams nationwide that have also demonstrated the ability to perform random access of data from a pool of molecules, which they described as a task similar to reassembling one chapter of a story from a library of torn books.
The researchers said they developed "a novel approach" to convert the long strings of ones and zeroes in digital data into the four basic building blocks of DNA sequences -- adenine, guanine, cytosine and thymine -- represented as As, Gs, Cs and Ts.
The digital data is broken down into pieces and stored by synthesizing it as a massive number of tiny DNA molecules, which can be dehydrated and preserved for long-term storage.
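The researchers' "novel approach" is not detailed here, but the general idea of turning bits into bases and cutting the result into short pieces can be sketched with the simplest possible mapping: two bits per base. The mapping table, the function names and the 20-base piece length below are assumptions made for illustration, not the team's actual scheme. Practical encodings are generally more elaborate, for example to avoid long runs of the same base.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(sequence: str) -> bytes:
    """Decode a base sequence produced by bytes_to_dna back into bytes."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def split_into_pieces(sequence: str, length: int = 20) -> list:
    """Cut a long sequence into short pieces (hypothetical piece length)."""
    return [sequence[i:i + length] for i in range(0, len(sequence), length)]


encoded = bytes_to_dna(b"Hi")
print(encoded)                # CAGACGGC
print(dna_to_bytes(encoded))  # b'Hi'
```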
While advances in DNA storage rely on techniques pioneered by the biotechnology industry, they also require lessons learned from information technology. For example, the Microsoft and UW team's encoding approach uses error correction schemes commonly used in computer memory.
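The specific error correction scheme is not described here. As a minimal sketch of the underlying idea, the example below uses simple XOR parity, one of the most basic redundancy techniques in computing, as a stand-in: a third, redundant payload lets either of two original payloads be rebuilt if it is lost. The payload contents and function name are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length payloads."""
    if len(a) != len(b):
        raise ValueError("payloads must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Two data payloads plus one redundant parity payload (illustrative values).
payload_a = b"DNA-"
payload_b = b"data"
parity = xor_bytes(payload_a, payload_b)

# Suppose payload_b cannot be read back: rebuild it from the other two.
recovered_b = xor_bytes(payload_a, parity)
print(recovered_b)  # b'data'
```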
"This is an example where we're borrowing something from nature -- DNA -- to store information. But we're using something we know from computers -- how to correct memory errors -- and applying that back to nature," said Luis Henrique Ceze, a UW associate professor of computer science and engineering and the university's principal researcher on the project.
To access the stored data, the researchers encode the equivalent of zip codes and street addresses into the DNA sequences. Polymerase Chain Reaction (PCR) techniques -- commonly used in molecular biology -- help them more easily identify the zip codes they are looking for.
Using DNA sequencing techniques, the researchers can then read the data and convert it back to a video, image or document file by using the street addresses to reorder the data.
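The zip code and street address analogy can be sketched in software terms. In the example below, each stored piece carries a key (the "zip code" identifying a file) and an index (the "street address" giving its position), and a simple filter stands in for the PCR selection step. The class, function names and chunk size are assumptions for illustration, not the researchers' design.

```python
import random
from typing import NamedTuple


class Strand(NamedTuple):
    key: str        # "zip code": which file this piece belongs to
    address: int    # "street address": position of the piece within the file
    payload: bytes  # the piece of data itself


def store(pool: list, key: str, data: bytes, chunk_size: int = 4) -> None:
    """Break data into pieces and add them to the shared pool."""
    for address, start in enumerate(range(0, len(data), chunk_size)):
        pool.append(Strand(key, address, data[start:start + chunk_size]))


def retrieve(pool: list, key: str) -> bytes:
    """Select the pieces for one key, then reorder them by address."""
    selected = [s for s in pool if s.key == key]  # stand-in for PCR selection
    selected.sort(key=lambda s: s.address)
    return b"".join(s.payload for s in selected)


pool = []
store(pool, "video", b"frames of a music video")
store(pool, "text", b"Universal Declaration")
random.shuffle(pool)  # molecules in a test tube have no inherent order

print(retrieve(pool, "text"))  # b'Universal Declaration'
```

Because every piece carries its own address, the original order can be restored no matter how the pool is mixed.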
Most of the world's data today is stored on magnetic and optical media. Tape technology has recently seen significant density improvements with tape cartridges as large as 185TB, and is the densest form of storage available commercially today, at about 10GB per cubic millimeter (mm³). Recent research reported feasibility of optical discs capable of storing 1PB, yielding a density of about 100GB/mm³. Despite this improvement, storing zettabytes of data would still take millions of units, and use significant physical space.
DNA has a theoretical limit above one exabyte per cubic millimeter, which is eight orders of magnitude denser than tape. DNA-based storage also has the benefit of eternal relevance: As long as there is DNA-based life, there will be strong reasons to read and manipulate DNA, the researchers stated in an April research paper.
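The density comparison can be checked with a few lines of arithmetic. This sketch uses only the figures quoted above (10GB for tape and 1 exabyte for DNA in the same unit of volume, 185TB cartridges, 1PB discs) and decimal units; the variable and function names are illustrative.

```python
import math

# Decimal storage units, in bytes.
GB, TB, PB, EB, ZB = 10**9, 10**12, 10**15, 10**18, 10**21

tape_density = 10 * GB  # per unit volume, as quoted
dna_density = 1 * EB    # theoretical limit, same unit of volume


def orders_of_magnitude(a: int, b: int) -> int:
    """How many powers of ten separate a from b."""
    return round(math.log10(a / b))


print(orders_of_magnitude(dna_density, tape_density))  # 8

# Units needed to hold a single zettabyte.
cartridges_per_zb = ZB / (185 * TB)
discs_per_zb = ZB / PB
print(f"{cartridges_per_zb:,.0f} tape cartridges")
print(f"{discs_per_zb:,.0f} optical discs")
```

One zettabyte works out to roughly 5.4 million of the 185TB cartridges, or one million 1PB discs, in line with the "millions of units" estimate.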
According to the ongoing "Digital Universe" study by IDC and EMC, the amount of data is forecast to grow to over 16 zettabytes (ZB) in 2017. The Internet of Things, in large part, will be responsible for doubling digital data every two years, resulting in 44 trillion gigabytes (44ZB) by 2020.
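The two forecast figures are roughly consistent with each other, as a quick calculation shows. The sketch below assumes smooth exponential growth starting from 16ZB in 2017 with a doubling period of two years; the function name and the assumption of smooth growth are illustrative, not part of the IDC and EMC study.

```python
def projected_zettabytes(year: int,
                         base_year: int = 2017,
                         base_zb: float = 16.0,
                         doubling_years: float = 2.0) -> float:
    """Projected data volume in ZB, assuming steady doubling."""
    return base_zb * 2 ** ((year - base_year) / doubling_years)


for year in (2017, 2019, 2020):
    print(f"{year}: {projected_zettabytes(year):.1f} ZB")
```

Under these assumptions the 2020 projection comes to about 45ZB, close to the 44ZB forecast.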
"A significant fraction of this data is in archival form; for example, Facebook recently built an entire data center dedicated to 1 exabyte of cold storage," the scientists stated in their research paper.
Researchers have been experimenting with DNA as a data-storage medium for more than a dozen years, and the field has progressed quickly. In 1999, DNA-based storage involved encoding and recovering just a 23-character message.
By 2013, scientists from U.K.-based EMBL-European Bioinformatics Institute claimed they'd encoded an MP3 version of Martin Luther King's "I Have a Dream" speech in DNA.
In April, Microsoft and UW researchers released their paper detailing how synthetic DNA could be used as a form of archival storage.
"DNA is an amazing information storage molecule that encodes data about how a living system works. We're repurposing that capacity to store digital data -- pictures, videos, documents," Ceze said. "This is one important example of the potential of borrowing from nature to build better computer systems."