Illustration by Zihao Lin.
A petabyte is 10¹⁵ bytes of digital information, one billion times larger than the megabyte, the unit typically used to size common digital files such as an audio clip or a high-resolution picture. The roughly 10 billion user photos in Facebook’s storage warehouses amount to 1.8 petabytes of data, and the entire Netflix library of raw video content totals around 3.1 petabytes.
Imagine being able to carry petabytes’ worth of information on a device that fits in your pocket. Though it may sound far-fetched, compact, portable storage of massive amounts of data is closer to reality than ever before, thanks to recent work from the bioengineering lab of Robert Grass at ETH Zurich. The researchers’ breakthrough paper, published last December, demonstrates the first examples of using DNA as a medium for information storage within inanimate objects.
The rapidly expanding field of DNA data storage, dubbed “DNA-of-things” (DoT), has important implications for industry, particularly digital storage. As society’s reliance on digital systems grows, current storage architectures such as hard drives are increasingly constrained by the physical limits of their shapes and sizes. DNA offers transformative advantages as a storage medium: it can hold data at unparalleled densities without degrading, and through DoT engineering it can take on any conceivable shape. If properly protected, DNA can withstand both environmental and artificial stresses, potentially allowing information to be “cold-stored” for thousands of years. On top of this structural durability, DNA offers a storage density of up to 455 exabytes per gram, where one exabyte equals one thousand petabytes. This value outstrips the most advanced current hard drive technology by many orders of magnitude and makes DoT an innovation that could define the next decade of information technology.
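The unit arithmetic in the figures above can be checked with a few lines of Python. This is a minimal sketch that assumes decimal (SI) prefixes and the idealized 455-exabytes-per-gram density; it ignores the encapsulation, redundancy and indexing overhead a real system would add, so the masses it prints are idealized lower bounds, not practical estimates.

```python
# Decimal (SI) prefixes, as used in the article.
MB = 10**6    # bytes in a megabyte
PB = 10**15   # bytes in a petabyte
EB = 10**18   # bytes in an exabyte (1,000 petabytes)

# Upper-bound density quoted in the article; real systems store far less per gram.
DNA_BYTES_PER_GRAM = 455 * EB


def grams_of_dna(num_bytes: float) -> float:
    """Mass of DNA needed at the quoted density, ignoring all overhead."""
    return num_bytes / DNA_BYTES_PER_GRAM


print(PB // MB)  # how many megabytes fit in a petabyte

for name, size in [("Facebook photos", 1.8 * PB), ("Netflix raw library", 3.1 * PB)]:
    micrograms = grams_of_dna(size) * 1e6
    print(f"{name}: {micrograms:.1f} micrograms of DNA")
```

At this idealized density, either library would fit in a few micrograms of DNA, which is why the article speaks of petabytes in a pocket.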
Sequencing the genetic code of an object
Interest in DoT’s largely untapped potential led Grass and his collaborator Yaniv Erlich, Professor of Computer Science at Columbia University and Chief Science Officer of the online genealogy platform MyHeritage, to test DNA’s stability as a vehicle for information. “Erlich’s idea was to put not just a trackable barcode but ‘real information’ into materials—ideally something connected to the object itself,” Grass said. They settled on 3D-printing a Stanford bunny—a common graphical quality test object—embedded with its own blueprint.
First, the 45-kilobyte file detailing the bunny’s creation was converted from binary digital format into a DNA sequence. To protect the DNA against the mechanical and thermal stresses of being incorporated into the polymer that serves as the 3D printer’s “ink,” it was encapsulated in silica nanoparticles before mixing. The encoded polymer was then extruded into a filament compatible with ordinary desktop 3D printers. Once the bunny was printed, the researchers snipped off ten milligrams of its ear, just 0.3 percent of its total mass, to recover the DNA and print another bunny. After the polymer was dissolved and the silica beads cleaned, the encapsulated DNA could be processed and the complete digital file recovered. Repeating this process with a ten-milligram sample each time, the team produced five successive generations of bunnies and recovered perfect digital files from them, including one sample that was stored for nine months before the next generation was made.
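The conversion from binary to DNA relies on the fact that each of the four bases can stand for two bits. The sketch below shows that idea with a hypothetical fixed mapping (A=00, C=01, G=10, T=11). Real DNA storage pipelines add error-correcting redundancy and screen out hard-to-synthesize sequences such as long runs of a single base; this toy mapping does neither and is not the scheme the researchers used.

```python
BASES = "ACGT"  # hypothetical mapping: A=00, C=01, G=10, T=11
BASE_VALUE = {base: i for i, base in enumerate(BASES)}


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_dna; expects a whole number of four-base groups."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            value = (value << 2) | BASE_VALUE[base]  # shift in two bits per base
        out.append(value)
    return bytes(out)


strand = bytes_to_dna(b"Hi")
print(strand)  # CAGACGGC
assert dna_to_bytes(strand) == b"Hi"
```

At two bits per base, a 45-kilobyte file (taken here as 45,000 bytes) maps to about 180,000 bases before any redundancy is added.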
The final generation of offspring suffered sequencing errors and the dropout of more than twenty percent of the original data, but Grass explains that such errors can already be overcome by adding backup data at the start of the DNA storage process. Higher levels of redundancy allow for more generations of error-free replicas. “If you know that every generation loses two percent of data then you add ten percent redundancy at the beginning to allow for five generations. That’s much easier than trying to solve the problem down the road,” Grass said. The errors do not diminish the key result, the file’s perfect recovery, he added. The bunnies’ successive generations supported the theoretical prediction that lost portions of data can be recovered from redundant DNA, a process likened to solving a sudoku puzzle from only limited information. The experiment also required only a minuscule amount of sample material to recover the data needed to produce each subsequent generation. A sample this small, essential for making offspring without impairing the object’s appearance or functionality, is possible only because of DNA’s enormous storage capacity.
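Grass’s rule of thumb and the sudoku-like recovery it relies on can both be sketched in Python. The first function assumes, as a simplification, that each generation independently loses the same fraction p of the data that remains, so losses compound rather than simply add; with p = 0.02 over five generations about 9.6 percent is lost, slightly less than the ten percent a linear estimate gives. The other two functions are a hypothetical toy erasure code (a single XOR parity chunk), far simpler than the codes used in real DNA storage, showing how a missing piece can be rebuilt from the pieces that survive.

```python
def cumulative_loss(p: float, generations: int) -> float:
    """Fraction of the original data lost after `generations` rounds,
    assuming each round independently loses a fraction p of what remains."""
    return 1 - (1 - p) ** generations


def xor_parity(chunks: list[bytes]) -> bytes:
    """Toy single-parity erasure code: byte-wise XOR of equal-length chunks."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        if len(chunk) != len(parity):
            raise ValueError("all chunks must have the same length")
        for i, value in enumerate(chunk):
            parity[i] ^= value
    return bytes(parity)


def recover_missing(survivors: list[bytes], parity: bytes) -> bytes:
    """Rebuild one lost chunk: XOR of the parity with every surviving chunk."""
    return xor_parity(survivors + [parity])


print(f"{cumulative_loss(0.02, 5):.1%}")  # prints 9.6%

chunks = [b"bunn", b"y-fi", b"le!!"]
parity = xor_parity(chunks)
# Pretend chunks[1] dropped out and rebuild it from the rest.
rebuilt = recover_missing([chunks[0], chunks[2]], parity)
print(rebuilt)  # prints b'y-fi'
```

A single parity chunk can repair only one loss; codes that tolerate many missing pieces combine far more equations, which is where the sudoku analogy fits best.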
Inspired by their successful pilot, Erlich urged the group to attempt a more ambitious follow-up, embedding more data in a new type of polymer. For the next experiment, the group repeated the encoding and material infusion processes, this time for a 1.4-megabyte YouTube video in a transparent plexiglass polymer that would be 3D-printed into a lens. When mounted in a frame, the lens had normal optical properties, allowing an ordinary-looking, functional pair of glasses to secretly contain a DNA-encoded video message. Once again, a mere ten milligrams taken from the lenses allowed full recovery of the video file.
A revolutionary industrial force
Once the technology matures, Grass and Erlich envision DoT broadly transforming manufacturing at both personalized and industrial scales. This shared perspective stems from each scientist’s industrial background: Erlich’s as CSO of MyHeritage and Grass’s as CEO of Turbobeads, a nanotechnology company he founded ten years ago.
Their group devoted a significant portion of its research to examining factors that will influence DoT’s ability to become a widespread and flexible solution to the challenges of digital storage. This included testing new types of polymers that could serve as viable materials for DNA storage, as well as discussing potential DoT applications and their economic challenges. Personalized items will likely be the first industrial application. Dental implants, for example, could be encoded through DoT with information pertaining to an individual’s unique implant design. The problem, Grass said, would be the high upfront cost of encoding an entire blueprint. For every implant, a new set of DNA would have to be synthesized at around $1,000 per megabyte.
A more feasible general application could arise in products with long lifetimes, such as construction materials, which could retain their own instructions for replication even after traditional data storage has been lost. According to Grass, however, the most notable impact will be in mass manufacturing, where the enormous initial cost of synthesizing the DNA for the first item would be offset by the ability to regenerate identical replicas with little material and no added expense. The ten milligrams snipped from each of the group’s test bunnies theoretically yielded enough data to create eighty quintillion (8 × 10¹⁹) offspring within five generations, without resynthesizing the initial DNA even once.
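The article does not explain how the eighty-quintillion figure was derived. Purely as an illustration, assume (hypothetically) that every object in a generation supplies enough DNA to seed the same number b of new objects; the count then grows geometrically as b raised to the number of generations, and the sketch below solves for the b implied by the quoted total.

```python
def branching_needed(total: float, generations: int) -> float:
    """Per-generation branching factor b with b ** generations == total
    (hypothetical geometric model, not the researchers' calculation)."""
    return total ** (1 / generations)


def final_generation_size(branching: float, generations: int) -> float:
    """Objects in the last generation when each object seeds `branching` new ones."""
    return branching ** generations


TOTAL = 80e18  # eighty quintillion, the figure quoted in the article
b = branching_needed(TOTAL, 5)
print(f"{b:,.0f} new objects per sample per generation")
```

Under this assumed model, each ten-milligram snip would need to supply DNA for roughly 9,560 new objects per generation; a different model (for example, counting offspring across all five generations rather than only the last) would give a slightly different number.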
Though the researchers’ work suggests DNA could potentially store files of much greater size, their focus is to think in terms of an everyday budget rather than speculating about the dizzying cost of encoding a vast library of information. “At the moment, one hundred kilobytes costs perhaps ten or twenty dollars,” Grass said. “What can I do with that? Right now, our products contain zero or very little information. What if I could put just ten kilobytes of information into that item instead? That’s the first step.”
These preliminary steps have already turned heads in industrial and governmental sectors. Some companies currently offer megabyte-scale personal DNA sequencers, and the United States Intelligence Advanced Research Projects Activity recently committed $50 million toward DNA data storage research.
“We dream of a world where everyone can do DNA sequencing,” Grass said. “I see it as our job to give examples, even if they’re still academic, which demonstrate where we can get value out of this technology. We have a proof of concept that shows that it’s doable. Now the next step is to show that we can build a case where we can generate value.” The future, it seems, is hidden in plain sight after all.