ENTRIES TAGGED "genomics"

DNA: The perfect backup medium

DNA storage could change the way we store and archive information.

It wasn’t enough for Dr. George Church to help Walter Gilbert “discover” DNA sequencing 30 years ago, lay the foundations for genomics, create the Personal Genome Project, drive down the cost of sequencing, and start humanity down the road of synthetic biology. No, that wasn’t enough.

He and his team decided to publish an easily understood scientific paper (“Next-Generation Digital Information Storage in DNA”) that promises to change the way we store and archive information. While this technology may take years to perfect, it provides a roadmap toward an energy-efficient archival storage medium with a host of built-in advantages.

The paper demonstrates the feasibility of using DNA as a storage medium with a theoretical capacity of 455 exabytes per gram. (An exabyte is 1 million terabytes.) Now, before you throw away your massive RAID 5 cluster and purchase a series of sequencing machines, know that DNA storage appears to have very high latency. Also know that Church, Yuan Gao, and Sriram Kosuri are not yet writing 455 exabytes of data; they’ve started with the more modest goal of writing Church’s recent book on genomics to a 5.27-megabit “bitstream.” Here’s an excerpt from the paper:

We converted an html-coded draft of a book that included 53,426 words, 11 JPG images and 1 JavaScript program into a 5.27 megabit bitstream. We then encoded these bits onto 54,898 159nt oligonucleotides (oligos) each encoding a 96-bit data block (96nt), a 19-bit address specifying the location of the data block in the bit stream (19nt), and flanking 22nt common sequences for amplification and sequencing. The oligo library was synthesized by ink-jet printed, high-fidelity DNA microchips. To read the encoded book, we amplified the library by limited-cycle PCR and then sequenced on a single lane of an Illumina HiSeq.
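The numbers in that excerpt are internally consistent, and checking them shows how the layout fits together. Here’s a quick back-of-the-envelope check in Python (the constants come straight from the excerpt):

    # Sanity-check the oligo layout: one bit per nucleotide, so a 96-bit
    # data block and a 19-bit address occupy 96 nt and 19 nt, plus a
    # 22 nt common sequence flanking each end for amplification/sequencing.
    DATA_NT = 96
    ADDR_NT = 19
    FLANK_NT = 22

    print(DATA_NT + ADDR_NT + 2 * FLANK_NT)   # 159 -> the 159 nt oligo length
    print(54_898 * DATA_NT / 1e6)             # ~5.27 -> the 5.27-megabit book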

If you know anything about filesystems, that excerpt is an amazing paragraph. They’ve essentially defined something like filesystem inodes for DNA: each 96-bit data block carries a 19-bit address recording where it belongs in the bitstream. They then read the DNA bitstream back by amplifying it with the polymerase chain reaction (PCR) and sequencing the copies. This is important because it means that reading the information involves generating millions of copies of the data in a format that has proven durable. This biological “backup system” has replication built in. Not just that, but the replication process comes with billions of years of reliability data behind it.
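To make the block-and-address scheme concrete, here is a minimal sketch in Python. It is illustrative only: the fixed bit-to-base mapping below (0 → A, 1 → G) is a simplification of the paper’s scheme, which chooses between two bases per bit to avoid problematic repeats; it also omits the 22 nt flanking primer sequences, and all the names are ours, not the paper’s.

    # Illustrative sketch of the block-and-address encoding (simplified:
    # real oligos also carry 22 nt flanking primer sites, and the paper
    # maps each bit to one of two bases to avoid long single-base runs).
    DATA_BITS = 96   # payload bits per oligo
    ADDR_BITS = 19   # address locating the block within the bitstream

    def bits_to_bases(bits):
        # One bit per nucleotide: 0 -> A, 1 -> G (simplified mapping).
        return "".join("A" if b == "0" else "G" for b in bits)

    def bases_to_bits(bases):
        return "".join("0" if base == "A" else "1" for base in bases)

    def encode(bitstream):
        # Split the bitstream into 96-bit blocks, each tagged with its address.
        oligos = []
        for addr, start in enumerate(range(0, len(bitstream), DATA_BITS)):
            block = bitstream[start:start + DATA_BITS].ljust(DATA_BITS, "0")
            address = format(addr, f"0{ADDR_BITS}b")
            oligos.append(bits_to_bases(block + address))
        return oligos

    def decode(oligos):
        # Reassemble blocks by their addresses; the input order is irrelevant.
        blocks = {}
        for oligo in oligos:
            bits = bases_to_bits(oligo)
            blocks[int(bits[DATA_BITS:], 2)] = bits[:DATA_BITS]
        return "".join(blocks[addr] for addr in sorted(blocks))

    message = format(0xC0FFEE, "0192b")        # two 96-bit blocks of toy data
    assert decode(encode(message)) == message  # round-trips correctly

Because each block names its own position, an unordered soup of sequencing reads can still be reassembled into the original bitstream, which is exactly what makes random sequencing reads workable as a storage readout.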


Sequencing a Genome a Week

Radar Talks to OSCON Speaker David Dooling

The Human Genome Project took 13 years to fully sequence a single human's genetic information. At Washington University's Genome Center, they can now do one in a week. But when you're generating that much data, just keeping track of it can become a major challenge in itself. David Dooling is in charge of managing the massive output of the Center's herd of gene sequencing machines, and making it available to researchers inside the Center and around the world. He'll be speaking at OSCON, O'Reilly's Open Source Conference, on how he uses open source tools to keep things under control, and he agreed to give us an overview of how the field of genomics is evolving.
