Recent advances in DNA computing

FOR IMMEDIATE RELEASE. ACS News Service Weekly PressPac: November 17, 2021

DNA molecules encode the instructions for life itself. They make up the genes responsible for everything from hair color to disease risk. And as if that weren’t enough, DNA can also perform calculations and compute! The molecules are fully programmable and can perform calculations very quickly in parallel, making them ideal for complex and time-consuming operations. Below are some recent papers published in ACS journals that report on innovations in DNA computing. 

“Advances in Applications of Molecular Logic Gates”
ACS Omega, Nov. 6, 2021
This review discusses recent advances in molecular logic gates, including those that incorporate DNA. The researchers describe how the gates are being used to monitor water quality, test food safety, and diagnose and treat diseases.

“CRISPR-Powered DNA Computing and Digital Display”
ACS Synthetic Biology, Oct. 27, 2021
In this paper, the researchers developed a microfluidic chip with CRISPR reactions freeze-dried onto it that can perform and display the results of several different types of mathematical calculations, such as obtaining the square root of a number. They say the chip could someday be used to encrypt and conceal messages.

“Programmable DNAzyme Computing for Specific In Vivo Imaging: Intracellular Stimulus-Unlocked Target Sensing and Signal Amplification”
Analytical Chemistry, Aug. 27, 2021
Some biomarkers of cancer are present in both healthy and tumor tissues, just at slightly different levels. To distinguish these types of cells in living mice, the authors of this paper used programmable DNAzyme computing, which could also image the tumors by making them visible through fluorescence.

“DNA Computing: NOT Logic Gates See the Light”
ACS Synthetic Biology, June 18, 2021
Researchers have used DNA to perform Boolean “AND” and “OR” functions, but it’s been difficult to construct “NOT” gates, in which the absence of an input is converted into an output. And premature execution of a “NOT” function can produce erroneous results. So, for greater control, this group created a novel photoactivatable “NOT” gate that responds to microRNA sequences.
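The timing problem described for “NOT” gates can be illustrated with a toy model. The sketch below is not the authors' chemistry; the function names, the time units, and the idea of a single activation time (standing in for the light trigger) are assumptions made purely for illustration.

```python
from typing import Optional


def not_gate(input_present: bool) -> bool:
    """Ideal NOT gate: produce an output only when the input is absent."""
    return not input_present


def read_gate(input_arrival_time: Optional[float], activation_time: float) -> bool:
    """Evaluate a NOT gate that is switched on at `activation_time`.

    `input_arrival_time` is None when the input never arrives.
    Hypothetical model: the gate only sees inputs that arrived before it was activated.
    """
    input_seen = input_arrival_time is not None and input_arrival_time <= activation_time
    return not_gate(input_seen)


# Input arrives at t = 5. A gate that runs from t = 0 fires before the input shows up.
premature = read_gate(input_arrival_time=5.0, activation_time=0.0)   # erroneous output
# A gate activated at t = 10 has already seen the input and stays silent.
controlled = read_gate(input_arrival_time=5.0, activation_time=10.0)
```

In this model, delaying activation until all inputs have had time to arrive is what prevents the erroneous output.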

The American Chemical Society (ACS) is a nonprofit organization chartered by the U.S. Congress. ACS’ mission is to advance the broader chemistry enterprise and its practitioners for the benefit of Earth and all its people. The Society is a global leader in promoting excellence in science education and providing access to chemistry-related information and research through its multiple research solutions, peer-reviewed journals, scientific conferences, eBooks and weekly news periodical Chemical & Engineering News. ACS journals are among the most cited, most trusted and most read within the scientific literature; however, ACS itself does not conduct chemical research. As a leader in scientific information solutions, its CAS division partners with global innovators to accelerate breakthroughs by curating, connecting and analyzing the world’s scientific knowledge. ACS’ main offices are in Washington, D.C., and Columbus, Ohio.

To automatically receive press releases from the American Chemical Society, contact newsroom@acs.org.

Note: ACS does not conduct research, but publishes and publicizes peer-reviewed scientific studies.

Media Contact

ACS Newsroom newsroom@acs.org

Copyright © 2024 American Chemical Society

[The current status and future prospects of DNA computing]

Affiliations.

  • 1 Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing 100071, China.
  • 2 State Key Laboratory of Pathogen and Biosecurity, Beijing 100071, China.
  • PMID: 33973429
  • DOI: 10.13345/j.cjb.200408

Abstract

As the demand for high-performance computing continues to grow, traditional computing models are facing unprecedented challenges. Among the many emerging computing technologies, DNA computing has attracted much attention because of its low energy consumption and parallelism. The DNA circuit, which is the basis of DNA computing, is an important technology for regulating and processing molecular information. This review highlights the basic principles of DNA computing, summarizes the latest research progress, and concludes with a discussion of the challenges facing DNA computing. Such integrated molecular computing systems are expected to be widely used in aerospace, information security, and defense systems.

Keywords: DNA chip; DNA circuit; DNA computing; DNA strand replacement technology; gene circuit.

Proceedings of the International Conference on Paradigms of Communication, Computing and Data Sciences, pp 243–252

Review of Research Challenges and Future of in DNA Computing Applications

  • Sapna Jain & M. Afshar Alam
  • Conference paper
  • First Online: 01 January 2022

Part of the book series: Algorithms for Intelligent Systems (AIS)

DNA computing is an emerging area of computation that uses deoxyribonucleic acid (DNA) to store data and perform complex calculations. Its usefulness lies in the way DNA molecules can act as the processor, and it takes a distinctive approach to computation. Research and development in DNA computing treats individual molecules as data carriers on which operations are performed. DNA is used in molecular technology, including the encoding of information. Molecular-scale autonomous programmable computers have been demonstrated in which both input and output data are in molecular form. This paper reviews recent advances in DNA-based problem solving and presents achievements and difficulties that researchers can expect in the near future. It also discusses the research challenges and future areas of DNA computing.

  • DNA computing
  • Cryptography
  • Cellular computing
  • Cloud computing

Author information

Authors and affiliations.

Jamia Hamdard, New Delhi, India

Sapna Jain & M. Afshar Alam

Corresponding author

Correspondence to Sapna Jain.

Editor information

Editors and affiliations.

National Institute of Technology, Kurukshetra, Kurukshetra, India

Ankit Kumar Jain

Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, India

Anupam Yadav

National Institute of Technology, Uttarakhand, Srinagar, India

Nitin Kumar

Campus Centre de Créteil, Université Paris-Est Créteil, Créteil, France

Patrick Siarry

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper.

Jain, S., Afshar Alam, M. (2022). Review of Research Challenges and Future of in DNA Computing Applications. In: Dua, M., Jain, A.K., Yadav, A., Kumar, N., Siarry, P. (eds) Proceedings of the International Conference on Paradigms of Communication, Computing and Data Sciences. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-16-5747-4_21

DOI: https://doi.org/10.1007/978-981-16-5747-4_21

Published: 01 January 2022

Publisher Name: Springer, Singapore

Print ISBN: 978-981-16-5746-7

Online ISBN: 978-981-16-5747-4

eBook Packages: Intelligent Technologies and Robotics (R0)

Emerging Approaches to DNA Data Storage: Challenges and Prospects

Andrea Doricchi

† Istituto Italiano di Tecnologia, via Morego 30, I-16163 Genova, Italy

‡ Dipartimento di Chimica e Chimica Industriale, Università di Genova, via Dodecaneso 31, 16146 Genova, Italy

Casey M. Platnich

§ Cavendish Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge CB3 0HE, U.K.

Andreas Gimpel

∥ Institute for Chemical and Bioengineering, ETH Zurich, Vladimir-Prelog-Weg 1, 8093 Zurich, Switzerland

Friederikee Horn

⊥ Technical University of Munich, Department of Electrical and Computer Engineering Munchen, Bayern, DE 80333, Germany

German Lanzavecchia

# Dipartimento di Fisica, Università di Genova, via Dodecaneso 33, 16146 Genova, Italy

Aitziber L. Cortajarena

■ Center for Cooperative Research in Biomaterials (CICbiomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramón 194, 20014 Donostia-San Sebastián, Spain

○ Ikerbasque, Basque Foundation for Science, 48009 Bilbao, Spain

Luis M. Liz-Marzán

△ Biomedical Research Networking Center in Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Av. Monforte de Lemos, 3-5. Pabellón 11. Planta 0, 28029 Madrid, Spain

▼ Second Physics Institute, University of Stuttgart, 70569 Stuttgart, Germany

⬡ Max Planck Institute for Solid State Research, 70569 Stuttgart, Germany

Reinhard Heckel

Robert N. Grass, Roman Krahne, Ulrich F. Keyser, Denis Garoli

With the total amount of worldwide data skyrocketing, the global data storage demand is predicted to grow to 1.75 × 10¹⁴ GB by 2025. Traditional storage methods have difficulties keeping pace given that current storage media have a maximum density of 10³ GB/mm³. As such, data production will far exceed the capacity of currently available storage methods. The costs of maintaining and transferring data, as well as the limited lifespans and significant data losses associated with current technologies, also demand advanced solutions for information storage. Nature offers a powerful alternative through the storage of information that defines living organisms in unique orders of four bases (A, T, C, G) located in molecules called deoxyribonucleic acid (DNA). DNA molecules as information carriers have many advantages over traditional storage media. Their high storage density, potentially low maintenance cost, ease of synthesis, and chemical modification make them an ideal alternative for information storage. To this end, rapid progress has been made over the past decade by exploiting user-defined DNA materials to encode information. In this review, we discuss the most recent advances of DNA-based data storage with a major focus on the challenges that remain in this promising field, including the current intrinsic low speed in data writing and reading and the high cost per byte stored. Alternatively, data storage relying on DNA nanostructures (as opposed to DNA sequence), as well as on other combinations of nanomaterials and biomolecules, is proposed, with promising technological and economic advantages. In summarizing the advances that have been made and underlining the challenges that remain, we provide a roadmap for the ongoing research in this rapidly growing field, which will enable the development of technological solutions to the global demand for superior storage methodologies.

1. Introduction

In the present digital era, the quantity of data being produced continues to increase exponentially, with the global demand for data storage expected to grow to 1.75 × 10¹⁴ GB by 2025 and by a further order of magnitude by the end of this decade [1]. The demand for denser and longer-lived information storage devices is also increasing [2]. Current storage technologies, including optical and magnetic devices, are reaching their information density limits and are not suitable for long-term (>50 years) storage, which means that valuable information needs to be transferred regularly to newer storage media if it is to be preserved for future generations. Innovative methods are required for long-term information storage to circumvent this laborious and costly process and to combat other pitfalls associated with current storage media (including energy consumption and insufficient data density) [3].

Nature provides an inspiring example of how to encode, transmit, and preserve information by using DNA to store all genetic information as a sequence of four nucleotides. As evidenced by DNA’s invaluable role in the perpetuation of genetic information, these molecules are stable for thousands of years under suitable storage conditions [4]; for example, 300,000-year-old mitochondrial DNA from a bear has been successfully sequenced [5]. This DNA sample was preserved in bone, thereby demonstrating that the power required for the archival storage of DNA is very low, which is another benefit compared with traditional data storage media. In addition to its stability and low cost of storage, DNA presents a key advantage over existing data storage devices: data density. On the basis of its physical dimensions, DNA has a theoretical data density of 6 bits for every 1 nm of polymer, or ∼4.5 × 10⁷ GB/g [6], which is orders of magnitude higher than the densities achievable using traditional devices [7, 8].
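The linear density figure quoted above can be checked with a short back-of-the-envelope calculation. This is a minimal sketch that assumes a rise of about 0.34 nm per nucleotide (the usual textbook value for B-form DNA, not stated in the text) and 2 bits per nucleotide for a four-letter alphabet; the constant and function names are illustrative.

```python
import math

# Assumption: each position can hold one of four bases, so log2(4) = 2 bits.
BITS_PER_NUCLEOTIDE = math.log2(4)

# Assumption: axial rise per nucleotide in B-form DNA, in nanometres.
RISE_PER_NUCLEOTIDE_NM = 0.34


def linear_density_bits_per_nm() -> float:
    """Theoretical information density along the polymer, in bits per nanometre."""
    return BITS_PER_NUCLEOTIDE / RISE_PER_NUCLEOTIDE_NM


density = linear_density_bits_per_nm()
print(f"{density:.2f} bits/nm")  # close to the ~6 bits/nm quoted in the text
```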

Significant advances have been made in recent years toward using DNA as a digital information storage medium [9–14]. Existing strategies encode arbitrary information into DNA by translating the desired data (e.g., a movie, book, or picture) directly into the nucleotide sequence, which means that chemical DNA synthesis is employed to write each data string [15]. In sequence-based DNA data storage, the major steps comprise: (1) encoding digital information, (2) data writing (synthesis of new oligonucleotides), (3) storing the DNA in physical or biological conditions, (4) random access, (5) data readout via DNA sequencing, and (6) decoding the DNA sequences back into the original digital code, as represented in Figure 1 [8, 12]. Over the past decades, substantial advances in biotechnology have significantly bolstered DNA data storage technologies. These include chemical and enzymatic DNA syntheses [16, 17], polymerase chain reaction (PCR) for DNA amplification [18], and DNA sequencing [19]. Although none of these technologies was initially designed with digital data storage in mind, these considerable developments now enable procedures for writing, random accessing, reading, and editing of data encoded in DNA sequences [10, 11].

Figure 1. General strategy for DNA data storage, wherein the data is stored directly in the sequence of the oligonucleotides. The six main steps (encoding, writing, storage, access, reading, and decoding) are depicted.

However, each of the procedures involved in DNA data storage (encoding, writing, storage, random access, reading, and decoding) has significant technical limitations that render DNA data storage, at present, not competitive with magnetic and solid-state storage devices. Because the de novo synthesis of long sequences of DNA remains challenging [20], these sequences must be broken into smaller fragments (∼200 bases), which requires massive numbers of unique DNA sequences to be made. Data readout also presents several challenges: while in theory analogous to the magnetic readout of a hard disk drive, DNA sequencing must be employed to read out the information stored in individual oligonucleotides. Sequencing often relies on fluorescence outputs, which require expensive fluorophores, optical equipment, and trained personnel, as well as substantial amounts of DNA and long reading times (Figure 2A,B). Nanopore methods may present an appealing alternative, as detailed further in this review. With the use of current technologies, DNA storage is estimated to cost 800 million USD per terabyte of data (by contrast, tape storage costs approximately 15 USD per terabyte) [12]. The high price of writing DNA data using existing methods prohibits its mainstream adoption as an information storage material.
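To give a sense of scale for "massive numbers of unique DNA sequences", the following sketch estimates how many ∼200-base fragments one terabyte would need and how the quoted costs compare. It is a rough lower bound under stated assumptions: 2 bits per nucleotide, 1 TB taken as 10¹² bytes, and no space reserved for addressing or error correction. The function name is illustrative.

```python
def oligos_needed(num_bytes: int, oligo_length_nt: int = 200, bits_per_nt: int = 2) -> int:
    """Minimum number of distinct oligonucleotides needed to hold `num_bytes`.

    Assumes every nucleotide carries payload (no index, primers, or redundancy),
    so real systems would need more sequences than this.
    """
    bits_per_oligo = oligo_length_nt * bits_per_nt
    total_bits = num_bytes * 8
    return -(-total_bits // bits_per_oligo)  # ceiling division on integers


ONE_TERABYTE = 10**12  # bytes (decimal terabyte, an assumption)

count = oligos_needed(ONE_TERABYTE)

# Cost figures quoted in the text, in USD per terabyte.
DNA_COST_PER_TB = 800_000_000
TAPE_COST_PER_TB = 15
cost_ratio = DNA_COST_PER_TB / TAPE_COST_PER_TB

print(f"{count:,} oligos of 200 nt for 1 TB")
print(f"DNA is roughly {cost_ratio:,.0f} times more expensive than tape per TB")
```

Under these assumptions, one terabyte needs on the order of tens of billions of distinct 200-nt sequences, and the quoted cost gap is about seven to eight orders of magnitude.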

Figure 2. Comparison of the main differences between sequence-based (A,B) and structure-based DNA data storage (C,D), as has been presented in the literature to date. (A,B) Sequence-based storage relies on the de novo synthesis of DNA strands, and the subsequent sequencing of these entities is performed using next-generation methods. Image adapted with permission from ref 12. Copyright 2019 Springer Nature. (C) By contrast, structure-based methods utilize self-assembly, which means that the information is encoded into their three-dimensional shape. Images adapted with permission: ref 21, copyright 2016 Springer Nature; ref 22, under a Creative Commons Attribution 4.0 License (CC BY), copyright 2021 Springer Nature. (D) These shapes can then be read off using single-molecule methods, including fluorescence, atomic force microscopy, and nanopore techniques. Image adapted from ref 23. Copyright 2019 American Chemical Society.

One potential strategy to circumvent these pitfalls is to rely on the programmable three-dimensional structure of DNA as opposed to its primary sequence (Figure 2C,D). DNA nanotechnology harnesses the specific base-pairing properties of the nitrogenous bases to create arbitrary two- and three-dimensional shapes [24]. It is possible to generate well-defined, custom objects at the nanoscale using these methods. Information can thus be stored in the 3D structures of these assemblies instead of in the sequence, with readout relying on imaging techniques, such as super-resolution imaging [22], or on single-molecule nanopore measurements [23, 25]. The structure-based strategy may reduce the number of DNA sequences that must be synthesized by allowing for the erasing and rewriting of data through simple self-assembly. These structure-based methods also eliminate the need for next-generation sequencing, which remains among the most time-consuming aspects of DNA data storage. Because DNA nanotechnology-based approaches capitalize on the self-assembly of DNA sequences, the resulting structures are inherently reconfigurable, which enables data erasing and rewriting without further synthesis [26]. Moreover, the dynamic nature of these assemblies can be exploited to perform data operations [13, 27], which allows DNA data storage to integrate directly into the field of DNA computation.

In this review, we provide a detailed description of the two aforementioned methods, which we will refer to as “sequence-based” and “structure-based” DNA data storage. A comparison between them that highlights both the similarities and differences in these approaches will provide an overview of the state of the art in DNA data storage. Finally, we also highlight the exciting potential applications of DNA data storage and manipulation, including archival storage, barcoding, cryptography [11], and DNA computing. Despite the hurdles that must be surmounted to implement DNA data storage, it is important to remember that DNA plays an irreplaceable role in biological systems. As such, DNA will never become obsolete as a data storage medium. We posit that the fundamental nature of DNA, in combination with the high density and low energy cost of DNA data storage, will continue to fuel research in this rapidly growing domain.

2. Sequence-Based DNA Data Storage Methods

2.1. From Encoding to Data Writing in DNA Data Storage

Any digital data (files of any kind such as text and pictures) can be represented as a sequence of bits (i.e., zeros and ones). One possible data storage approach is to use a set of DNA sequences of 60–200 nt in length. The limitations in sequence length arise from the chemical synthesis of DNA; producing DNA strands longer than a few hundred nucleotides (nt) introduces a significant number of errors into the sequence.
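The mapping from bits to short DNA sequences can be sketched in a few lines. The example below is only an illustration and not the encoding used by any particular system: it assumes a fixed 2-bits-per-base table (00→A, 01→C, 10→G, 11→T), a hypothetical 8-nt index prefix so that fragments can be reordered, and it ignores error correction and any constraints on the sequences themselves.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_bases(data: bytes) -> str:
    """Translate each byte into four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))


def bases_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_bases; expects a length that is a multiple of four."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def split_into_oligos(seq: str, payload_nt: int = 100, index_nt: int = 8) -> list:
    """Cut a long sequence into short oligos, each prefixed with an index in bases."""
    oligos = []
    for n, start in enumerate(range(0, len(seq), payload_nt)):
        index = bytes_to_bases(n.to_bytes(index_nt // 4, "big"))
        oligos.append(index + seq[start:start + payload_nt])
    return oligos


def join_oligos(oligos: list, index_nt: int = 8) -> str:
    """Reorder oligos by their index prefix and strip it to recover the sequence."""
    ordered = sorted(oligos, key=lambda o: bases_to_bytes(o[:index_nt]))
    return "".join(o[index_nt:] for o in ordered)


message = b"DNA data storage"
oligos = split_into_oligos(bytes_to_bases(message), payload_nt=20)
recovered = bases_to_bytes(join_oligos(list(reversed(oligos))))  # order is not preserved in a pool
```

With the default values, each oligo is 108 nt long, which falls inside the 60–200 nt range mentioned above. The index prefix is needed because a pool of DNA molecules has no inherent order.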

Once properly encoded, data are written on synthetic DNA sequences (Figure 3). Organic chemistry has presented us with a large set of techniques for synthesizing DNA and, as previously mentioned, strands up to 200 nt in length can be readily synthesized. The synthesis is typically performed using phosphoramidite chemistry, which is a four-step cyclic reaction involving the addition of the desired nucleotide to a growing oligonucleotide chain immobilized on a solid support (Figure 3A,B) [28]. The use of a solid support enables extensive parallel synthesis, as well as automation of the chemical process, which will be fundamental to the adoption of DNA for data storage applications [29, 30]. While there are many advantages to phosphoramidite synthesis, it is worth noting that it requires the use of anhydrous solvents, which produce toxic waste.

Figure 3. An overview of chemical and enzymatic strategies to synthesize custom DNA sequences. (A) Phosphoramidite synthesis, the most widely used chemical strategy for the synthesis of DNA, involves the sequential addition of nucleotides to a growing chain anchored on a solid support. Protecting groups are employed to ensure that no more than one nucleotide is added at each step and are subsequently removed via chemical deblocking. (B) Deblocking can also be performed by electrochemistry. Reproduced with permission under a Creative Commons Attribution 4.0 License (CC BY-NC) from ref 31. Copyright 2021 AAAS. (C) Enzymatic methods relying on T4 RNA ligase or TdT can also be used to specifically add bases to a growing oligonucleotide in aqueous environments, which eliminates the need for organic solvents. Image reproduced with permission under a Creative Commons Attribution 4.0 License (CC BY) from ref 32. Copyright 2021 Elsevier B.V.

Enzyme-based methods are an alternative to chemical synthesis, but they are still in their infancy. So far, only tiny amounts of data (hundreds of bits) have been stored using enzymatic synthesis, versus billions of bits using phosphoramidite synthesis. The concept of enzymatic DNA synthesis arose from the discovery of specific DNA polymerases, and this approach is expected to become both cheaper and faster than phosphoramidite synthesis for data storage applications [40]. A major limitation, however, is DNA polymerase’s need for a template strand. To create a user-defined DNA sequence as in the chemical method, enzymes capable of extending the 3′ end of single-stranded DNA (ssDNA) in a template-independent manner are required, such as polynucleotide phosphorylase (PNPase), T4 RNA ligase, and terminal deoxynucleotidyl transferase (TdT, Figure 3D) [32]. In particular, the use of TdT, a template-independent polymerase, to synthesize DNA oligonucleotides was shown to be a promising alternative to chemical synthesis [33, 16]. Among others, Lee et al. [16] reported a technique for enzymatic synthesis and digital coding based on TdT and nanopore reading. This strategy allowed the archiving of information in DNA without mandatory single-base precision, as well as cost reduction due to miniaturization and enzyme recycling. Moreover, the synthesis of 1000-nucleotide-long strands with homopolymeric stretches enabled a reduction of the synthesis time (Figure 3D). Palluk et al. [28, 33] also described an oligonucleotide synthesis strategy that uses TdT and demonstrated that TdT–dNTP conjugates can quantitatively extend a primer by a single nt in 10–20 s. Crucially, this scheme can be iterated to write a user-defined sequence. Compared with chemical synthesis, which is undertaken in organic solvents, enzymatic synthesis is compatible with aqueous conditions.

Both chemical and enzymatic syntheses are severely limited by the low speed of these processes. 29 , 39 Achievement of the necessary parallel writing capabilities while maintaining a realistic infrastructure footprint requires maximization of the number of different sequences that can be synthesized per unit area, simultaneously, on a single platform. The most space-efficient way to increase synthesis density is to reduce the area over which each unique sequence is grown (the feature size), the distance between features (the pitch), or both. To this end, photomask arrays have proven to generate high oligonucleotide densities; 34 however, this technique relies on a series of bespoke photolithographic masks to synthesize a defined set of sequences, that is, masks must be created for each set of desired sequences. An alternative method uses electrode arrays and leverages the scaling and production roadmap of the semiconductor industry, where features as small as 5 nm are now common. For example, Nguyen et al. 35 produced an electrode array and demonstrated independent electrode-specific control of DNA synthesis with electrode sizes and pitches that enabled a synthesis density of 25 million oligonucleotides/cm² (Figure 3C). Finally, the printing synthesis method has rapidly become the most applied method (also thanks to commercial technological platforms, such as Agilent and Twist).

The sequences to be synthesized are defined by the encoding process, which maps the data to a set of DNA sequences so that a corresponding decoder can reconstruct the information, even though the writing, reading, and storage of the DNA introduces errors. 9 , 14 , 36 , 7 , 37 , 29 , 38 , 16 , 39 − 41

DNA storage systems overcome these errors without losing data by capitalizing on both physical and logical redundancy. Physical redundancy is achieved by creating many, sometimes inaccurate, copies of each sequence, which enables a consensus to be reached when the data is read. Some errors cannot be resolved using physical redundancy alone. Logical redundancy guarantees reconstruction even when errors occur. While physical redundancy occurs automatically during the synthesis process (many copies of each sequence are always produced), it is fundamental to apply dedicated algorithms to include logical redundancies in the initial encoding. Moreover, encoding and decoding are strictly connected. The algorithms that encode the data to be stored add redundancy in a principled way so that a decoding algorithm can reconstruct the data from noisy reads (Figure 3A).
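The consensus step that physical redundancy makes possible can be sketched in a few lines of Python. This is a minimal illustration, assuming that all reads of a sequence have the same length and differ only by substitutions (no insertions or deletions); the function name `consensus` and the toy reads are hypothetical and are not taken from any of the cited works.

```python
from collections import Counter

def consensus(reads):
    """Per-position majority vote over equal-length reads (substitutions only)."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("this sketch assumes equal-length reads")
    # For each column of the stacked reads, keep the most frequent base.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

# Five noisy copies of the same hypothetical sequence; two carry one error each.
reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTTC", "ACGTAC"]
print(consensus(reads))  # ACGTAC
```

Errors that shift the reading frame, such as deletions, defeat this simple column-wise vote, which is one reason logical redundancy is added on top.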

During the 2010s, extensive innovations in algorithm development have enabled reliable storage of data even under significant errors. Grass et al. 4 used modern error-correcting codes in the context of DNA storage, and a variety of different schemes have been proposed. 4 , 42 − 50 , 36 , 14 While physical and logical redundancy lower the storage density of DNA, recent works have proposed to raise it by expanding the DNA alphabet using composite natural letters 7 , 51 , 52 or chemically modified nucleotides. 53

2.2. Storage and Degradation Issues

Despite DNA’s long-term stability in well-controlled environments such as ancient bone, with storage durations as long as several hundred thousand years, 54 , 55 both aqueous solutions and dried DNA only exhibit a half-life on the order of months to a few years under ambient conditions. 56 Therefore, considerations for the physical storage of data-encoding DNA are crucial for realizing its potential for long-term data storage. Without appropriate protection, DNA (and thus the data encoded within) is degraded by multiple mechanisms, including strand breaks, nucleotide mutations, strand cross-linking by UV, oxidation, hydrolysis, alkylation, or mechanical stress, all of which are due to environmental factors. Among those, hydrolysis is the dominating decay pathway in a data storage context. 57 , 58 Thus, all applicable DNA storage approaches focus on protecting the DNA from moisture and oxygen with either microscopic (i.e., on the level of individual molecules) or macroscopic (i.e., on the level of individual pools) containers. Examples of microscopic containers include encapsulation within silica particles; 56 , 59 − 61 embedding in alkaline salt, 62 polymer, 63 sugar, 64 or silk protein 65 matrices; and coprecipitation with calcium phosphates 66 imitating bone. In the latter category, dried or lyophilized DNA is stored on filter paper 64 within hermetically sealed capsules with inert atmosphere 57 , 58 , 67 , 68 or, as is common in biological practice, simply frozen in aqueous solutions and stored at −20 or −80 °C. 69

Generally, all storage approaches trade long-term stability with a decrease in storage density by 1–3 orders of magnitude, caused by the low loading ratio between DNA and carrier (see Table 1 ). Additionally, the required time and cost for protection can be a distinguishing factor for DNA data storage systems, albeit less so for long-term storage applications. 69 The size of a single DNA pool is an important consideration for the design of DNA storage media, as index sizes for random access, constraints of PCR, and required physical redundancy for retrieval imply an upper limit on the number of pooled oligos. 69 , 6 This represents the maximum data that can be stored within a single macroscopic storage container, and has been estimated to lie between a few TB up to a few hundred TB. 56 , 6 , 70 We compared the storage densities and half-lives of micro- and macroscopic storage approaches in Table 1 by using the largest model pool size for which random access has been demonstrated at 5.5 TB. 71

Current approaches towards data encoding in DNA, such as the use of altered DNA topology 23 , 72 and third-generation sequencing platforms, present new challenges to data storage, as those approaches rely on oligos with multiple hundreds to thousands of nucleotides in length, compared with the few hundreds of nucleotides commonly in use for next-generation sequencing (NGS). 70 While both micro- and macroscopic storage systems are independent of sequence length, DNA decay by hydrolysis scales with the number of nucleotides per oligo and, thus, a proportional increase in the expected number of errors is anticipated. 73 Given that some types of single-site errors, such as strand breaks, may render entire oligos and the data within unreadable, the use of longer sequences further increases the need for durable storage to prevent premature data decay beyond experimental time scales. To this end, systematic studies on decay mechanisms and rates for many approaches to data encoding in DNA are missing, a critical factor regarding approaches that heavily rely on structural integrity for data retrieval.
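The effect of oligo length on strand survival can be made concrete with a short calculation. The sketch below assumes that every nucleotide position breaks independently with the same probability over the storage period, so that the intact fraction is (1 − p)^n; the per-nucleotide probability used here is a hypothetical value chosen only for illustration.

```python
def intact_fraction(p_break_per_nt, length_nt):
    """Expected fraction of oligos without any strand break, assuming independent sites."""
    if not 0.0 <= p_break_per_nt <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return (1.0 - p_break_per_nt) ** length_nt

p = 1e-3  # hypothetical per-nucleotide break probability over the storage period
for n in (150, 1000, 5000):
    print(n, round(intact_fraction(p, n), 3))
```

Under this simplified model, the same storage conditions that leave most short oligos intact leave only a small fraction of kilobase-length strands unbroken.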

Currently, long-term storage is only feasible within a protective material and at DNA loadings of only a few percent. Consequently, the need remains for long-term DNA data storage systems closer to DNA’s true storage density. Indeed, further improvements in the coding density toward DNA’s Shannon capacity, for example, by means of improved encoding algorithms or lowering logical redundancy, are largely overshadowed by the general loss of storage density due to the storage matrix. Conversely, the loss in encoding density yielded by encoding approaches relying on DNA topology is rendered less severe by this storage overhead, and the interplay of such approaches with denser storage systems is interesting for further research.

2.3. Random Access

As discussed above, the ability to select only a subset of DNA molecules for readout limits the current data capacity of a single pool of data-encoding DNA. This access to DNA subpools, equivalent to file-level random access, is crucial to scale DNA-based data storage up to large data capacities with no need for costly, complete sequencing of the pool. This has a major implication: an addressing system is needed to select subpools from a complex DNA mixture, with high specificity. Whereas the use of a physical substrate on which DNA can be arrayed may solve this problem, 31 this approach and similar solutions relying on the physical separation of individual oligos or oligo pools render DNA’s density advantage obsolete. Instead, two other major strategies have been developed: PCR-based addressing and direct physical separation (Figure 4). In PCR-based addressing systems, the high specificity of amplification via PCR is leveraged to selectively enrich a subpool over the background by using at least one address-specific primer and corresponding priming regions on the data-encoding oligos. Because of PCR’s exponential nature, a sample of the amplified pool will contain mainly the desired file with its matching priming regions, as well as nonspecific sequences as background. Demonstrated in 2015, 74 this addressing system has now been shown to scale to well above 10¹⁰ unique sequences per reaction while only requiring about 10 copies per sequence, which is equivalent to a pool capacity on the order of terabytes. 6 , 70 , 71 Either a rigorous design of orthogonal primer sequences 6 or the use of hierarchical addressing systems would be needed to achieve the required high specificity at these scales. 71 Nonetheless, primer-based addressing systems face several constraints.
First, the incorporation of random-access priming regions into each oligo decreases the available space for data-encoding bases, thereby also decreasing the storage density (currently by about 15% per address region). 71 , 75 Second, PCR-based random access irreversibly removes oligos from the pool, which necessitates potentially lossy reamplification of the entire pool after repeated data retrieval. 75 , 76 Moreover, as pool sizes and, thus, the number of sequences, become larger, the enrichment of a few copies against an ever-increasing background will at some point hit the limitations of PCR regarding processing volumes, required amplification cycles to obtain sufficient enrichment, and nonspecific amplification due to primer–payload similarity. 77 , 70 Indeed, data retrieval from a hierarchical addressing system of 5.5 TB required additional physical separation of pools via a biotin-based bead extraction between file accesses to fully remove the background carried over from PCR. 71 Lastly, PCR-based addressing is incompatible with common storage approaches, thus necessitating the removal and re-embedding of the encoding DNA into the storage matrix for each random-access operation.
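The logic of primer-based addressing, and the payload space it costs, can be illustrated in silico. The following sketch treats each oligo as a forward address, a payload, and a reverse address, and selects a file by exact string matching; this is an idealized stand-in for PCR (no reverse complements, mismatches, or amplification bias are modeled), and the 6-nt addresses, the 150-nt oligo length, and the 20-nt address length are hypothetical values.

```python
def select_file(pool, fwd_primer, rev_site):
    """Return the payloads of oligos flanked by the given address regions."""
    selected = []
    for oligo in pool:
        long_enough = len(oligo) >= len(fwd_primer) + len(rev_site)
        if long_enough and oligo.startswith(fwd_primer) and oligo.endswith(rev_site):
            selected.append(oligo[len(fwd_primer):len(oligo) - len(rev_site)])
    return selected

def payload_fraction(oligo_len, address_len, n_addresses=2):
    """Fraction of an oligo left for data after reserving the address regions."""
    return (oligo_len - n_addresses * address_len) / oligo_len

# Hypothetical 6-nt addresses; real designs use longer, orthogonal primers.
pool = [
    "AACCGG" + "ATATAT" + "TTGGCC",  # file 1
    "AACCGG" + "CGCGCG" + "TTGGCC",  # file 1
    "GGTTAA" + "TTTTAA" + "CCAATT",  # file 2
]
print(select_file(pool, "AACCGG", "TTGGCC"))  # ['ATATAT', 'CGCGCG']
print(payload_fraction(150, 20))
```

The second function shows why every additional address region is expensive on short oligos: the reserved bases are subtracted directly from the space available for data.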

Overview of random access strategies to select a subpool of sequences, usually a file, from a large pool. PCR-based addressing methods leverage the high specificity of primers and the exponential amplification of PCR to enrich target sequences by using either a single or multiple PCR runs. Methods using physical separation as a tool to select sequences also rely on the high specificity of short primers or barcode sequences, but remove the desired sequences using magnetic bead extraction or fluorescence-activated sorting. Images adapted from ref ( 71 ) and reproduced with permission from ref ( 75 ). Copyright 2019 American Chemical Society and copyright 2021 Springer Nature, respectively.

As an alternative to PCR, sequence specificity has also been exploited to carry out physical separation of files in pools. As mentioned above, biotin-labeled primers can be used to address and extract specific files via streptavidin magnetic beads on the basis of file-specific random access regions in encoding oligos, similarly to PCR-based addressing. 71 , 78 This approach has two key advantages: the sample can be reused for subsequent retrievals and nonspecific binding and PCR-induced biases are circumvented. 78 Banal et al. 75 extended this concept to DNA pools encapsulated in silica particles by labeling their surface with DNA barcodes to facilitate random access via fluorescently labeled probes and fluorescence sorting. While this represents a scalable random access scheme compatible with long-term storage, it is likely that the DNA barcodes on silica particles would decay much faster than the data-encoding DNA within so that random access ceases to function even if the data itself may still be intact.

All random access approaches aim at facilitating file-level control in large pools of DNA-encoded files while under the constraints of specificity, scalability, and storage density. Such scaling to large pools is highly desirable because it retains DNA’s high storage density compared with the physical separation of smaller pools using storage approaches (see Section 2.2 ). Currently, the highest demonstrated data capacity for random access is on the order of terabytes of data. 6 , 12 , 71 While this does not appear to be a hard limit, 6 it is likely impractical to scale random access by PCR-based addressing indefinitely because of the aforementioned difficulty of orthogonal primer design and the requirement for many amplification cycles given the associated impact of PCR bias. 20 Whether any practical limit of PCR-based random access exists in real-life applications remains to be seen, however. As an alternative, hierarchical storage systems combining high-level access to isolated subpools with file-level random access within such subpools appear more suited to allow for random access at the data capacities envisioned for DNA data storage. The first steps in this direction have been taken, such as labeling DNA-embedding polymer disks with QR codes or automated retrieval of individual DNA pools in a digital microfluidic device, 56 , 63 but the trade-off between storage density, data longevity, and ease of automated data access requires further work.

Beyond random access, other file operations such as encryption with genomic keys, 79 erasure on the basis of obfuscation, 80 and rewriting by chemical modification or PCR 81 , 82 are also supported by sequence-based DNA data storage. As recently reviewed elsewhere, 83 these approaches highlight the versatility of file operations supported by DNA as a storage medium.

2.4. Reading

While the readout of data encoded in DNA is rarely done in its application as an archival storage system, 69 the complete and error-free retrieval of stored data must be guaranteed within a defined set of storage and sequencing conditions in order for DNA data storage to have any commercial relevance. As a result, the choice of sequencing platform has a marked impact on the design and feasibility of sequence-based DNA data storage. Currently, readout of the DNA sequences needed for data decoding relies heavily on established technologies for DNA sequencing in life science applications, most prominently sequencing-by-synthesis (SBS) as commercialized by Illumina (Figure 5A,B). 84 , 85 As an alternative, sequencing using protein nanopores, commercialized by Oxford Nanopore Technologies, has been used because of its ease of implementation, automation, and portability (Figure 5C). 70 , 81 , 86 Nanopore sequencing uses electrical readouts rather than fluorescence detection to identify each base of a DNA strand as it moves through a biological nanopore. Contrary to SBS, it is therefore also able to identify modified and unnatural nucleotides such that the readout of data encoded using an expanded molecular alphabet is possible. 53 , 87

Overview of next-generation sequencing technologies presently used in DNA data storage. (A) Illumina sequencing generates clusters of identical single-stranded oligonucleotides. As the complement is synthesized using spectrally distinct, fluorescently tagged nucleotides, the identity of each base along the strand can be determined through the color of emission. (B) Oxford Nanopore measurements do not require fluorescent dye molecules. As the oligonucleotide passes through the protein pore, the three-dimensional shape of each base will modulate the ionic current, which results in a current–time trace that corresponds to the specific sequence. Images adapted with permission from ref ( 85 ). Copyright 2016 Springer Nature.

While nanopore sequencing improves upon several limitations of SBS for DNA data storage, as reviewed by Ceze et al., 12 two key constraints of the technology are its high error rate and the required sequence length. The high error rate of nanopore sequencing (∼10% per nt in the single read), 70 , 88 compared with the nearly negligible rate of errors introduced by SBS (∼0.5% per nt), 70 necessitates the clustering of sequence information, and thus, higher sequencing coverage and additional postprocessing of sequencing data. 70 , 86 Moreover, sufficient pore utilization for high sequencing throughput can only be realized for long fragments (>1 kb). 86 , 88 Therefore, the readily available oligo libraries with a length of only a few hundred nucleotides per sequence must be combined into longer assemblies to be suitable for nanopore sequencing. This process, usually performed via Gibson assembly or overlap extension PCR, 70 , 81 , 86 reintroduces several difficult-to-automate steps into the sequencing workflow, which calls the approach’s claims of improved portability and ease of automation over SBS into question. These constraints currently render nanopore sequencing more challenging and slower than SBS. 12 Accordingly, the largest data size retrieved using the technology is currently about 1.67 MB, compared with around 200 MB for SBS. 70 , 86
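Why a higher per-read error rate translates into higher sequencing coverage can be estimated with a simple binomial model. The sketch below computes the probability that at least half of the reads covering a position are wrong, which is a pessimistic upper bound on the failure of a majority vote. It assumes independent, substitution-only errors; real reads also contain insertions and deletions, which this sketch ignores. The function name is hypothetical, and the two error rates are the approximate per-nucleotide values quoted above.

```python
from math import comb

def p_majority_wrong(p_err, n_reads):
    """Upper bound: probability that at least half of n reads are wrong at one position."""
    k_min = (n_reads + 1) // 2  # smallest number of wrong reads that can tie or outvote
    return sum(
        comb(n_reads, k) * p_err**k * (1 - p_err) ** (n_reads - k)
        for k in range(k_min, n_reads + 1)
    )

for label, p_err in (("SBS, ~0.5% per nt", 0.005), ("nanopore, ~10% per nt", 0.10)):
    for n in (1, 5, 11):
        print(label, n, p_majority_wrong(p_err, n))
```

In this idealized model, the noisier platform needs several more reads per position to reach the residual error level that the more accurate platform already achieves with a few reads.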

The use of both state-of-the-art SBS and rapidly developing nanopore sequencing for DNA data storage highlights the current trade-off between sequencing accuracy and cost, as well as implications for future scalability. To this end, the development of solid-state nanopores for the determination of DNA structures including their sequence, with the potential of increased accuracy and throughput by avoiding enzymes limiting the translocation rate, holds promise for data storage applications.

2.5. Decoding and Error Correction

In addition to the errors during DNA sequencing discussed in the previous section, errors are also introduced during the synthesis, storage, and amplification steps of DNA data storage, which presents challenges regarding data decoding. While amplification and SBS-based platforms mainly introduce substitution errors (reading a C instead of a G, for example), synthesis dominantly causes deletions (e.g., missing a base) at a final rate of around 0.2–1% per nt. 20 , 62 , 70 Insertions (addition of extra bases) are uncommon and usually occur at less than 0.1% per nt, mainly because of synthesis. 20 , 62 In addition to biases in amplification efficiency, storage mainly contributes to shifts in the copy number distribution of the sequences, which leads to the unrecoverable loss of individual sequences over time, e.g., 8% after 94% of the DNA has decayed (i.e., four half-lives). 20 , 25 This means that, in general, sequence information is never recovered error-free. As the decoding of the stored data directly depends on this sequence information, both the loss of individual sequences and the introduction of errors into these sequences pose a risk on error-free decoding. While an increase of physical redundancy to cluster sequence information alleviates this problem, doing so is undesirable and inefficient because it drastically lowers the information density. 20 As considerations for cost and automation limit most of the potential for reducing error rates within the data storage workflow, sufficient redundancy must instead be implemented at the sequence level. Therefore, the presence of errors in DNA storage necessitates the use of principled coding/decoding algorithms. The goal of a good encoder/decoder pair is to enable perfect reconstruction from noisy data by introducing a minimal amount of logical redundancy. 
Error-correcting schemes tailored to DNA data storage consider that the written sequences are relatively short and typically stored in a spatially disordered manner. The optimal coding schemes depend on the noise profile of the storage system. Reliance on logical redundancy introduced by a combination of modern error-correction codes is sufficient for low error rates. However, both for low and large error rates, dominated by deletion errors, one also uses physical redundancy to recover the original information. 42

An error-correcting code maps an original message to a larger one, which introduces redundancy. If this message is then sent over a noisy channel, thereby introducing random errors, these errors can be detected or corrected. A simple example of an error-correction code was used by Goldman et al., 9 where each part of the information was written on four subsequent DNA sequences. Thus, the loss of sequences could be corrected if fewer than four subsequent sequences were lost. This coding scheme, however, was ill-suited for the used DNA channel because it had a low effective information rate, i.e., number of information bits per total number of encoded bits, and did not recover the whole message. In contrast, good error-correcting codes ensure data recovery with minimal redundancy. The maximal information rate that an error-correcting code can achieve is theoretically bounded. 89 This bound is known as the channel capacity and depends on the characteristics of the noisy channel. This means that the parameters of a good error-correcting code depend on the rates and type of errors. For example, the Reed–Solomon code can correct up to e erasures and s substitutions with 2s + e additional symbols. 90
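The difference in information rate between repetition and a Reed–Solomon code follows directly from the 2s + e rule. The sketch below only does this bookkeeping and does not implement a Reed–Solomon encoder; the block size and error budget are hypothetical, and plain fourfold repetition is used as a simplified stand-in for the Goldman et al. scheme.

```python
def rs_parity_symbols(substitutions, erasures):
    """Redundant symbols needed to correct s substitutions and e erasures: 2s + e."""
    return 2 * substitutions + erasures

def information_rate(data_symbols, redundant_symbols):
    """Information symbols divided by the total number of encoded symbols."""
    return data_symbols / (data_symbols + redundant_symbols)

# Hypothetical block: 200 data symbols, tolerating 5 substitutions and 20 erasures.
parity = rs_parity_symbols(substitutions=5, erasures=20)
print(parity)  # 30
print(information_rate(200, parity))

# Fourfold repetition stores 3 redundant copies for every data symbol.
print(information_rate(1, 3))  # 0.25
```

With these illustrative numbers, the Reed–Solomon block spends 30 of 230 symbols on redundancy, whereas fourfold repetition spends three of every four.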

In 2015, Grass et al. 4 realized a DNA data storage system whose error-correcting scheme enabled the recovery of the full data. Its encoding/decoding algorithm is explained in Figure 6. It uses an outer code that can correct for the loss of sequences, adds an index to each sequence to be able to retrieve the order of the sequences, which is lost during storage, and uses an inner error-correcting code that can correct nucleotide errors within sequences. Following the original introduction of the inner-outer encoding scheme, the vast majority of subsequent works used such a scheme for DNA data storage. 14 , 42 , 44 , 70 In general, the outer code applies on the level of the original information, whereas the inner code protects single sequences or indices. However, different codes were used for the outer and inner codes. A Reed–Solomon code, 4 , 44 , 70 Fountain codes, 14 and LDPC (low-density parity check) code were used as an outer code. 91 As an inner code, a Reed–Solomon code was used by Grass et al. and Organick et al. 4 , 70 Blawat et al. proposed to protect the index separately with a bit-correcting code (BCH) as the inner code. 44 The inner–outer coding scheme works well for moderate error rates of 1–2% and substitutions. However, it cannot correct large error rates that are dominated by insertions and deletions. This is because no inner codes exist that work sufficiently well on short sequences in these noisy setups. 92 Here, the original message can be recovered by additionally exploiting the physical redundancy. For example, in Antkowiak et al., 42 the noisy sequences were first clustered by similarity, then the information on multiple erroneous copies was combined to construct a sequence with fewer errors. This was achieved by an alignment step within the clusters and subsequent majority voting. This resulting sequence could then be sent through the usual decoding steps.
Recent works have explored the development of efficient clustering methods tailored to DNA data storage, as well as efficient encoding schemes that allow the recovery of a sequence from multiple noisy reads. 93 − 96 Such codes could then be used as an inner code. This has led to a better understanding of efficient use of physical redundancy in DNA data storage. However, at this moment an optimal encoding/decoding scheme for long-term DNA storage, or even components of it, remains unknown. For example, the capacity of deletion and insertion channels or reconstruction from multiple reads with the combination of different errors are not fully understood yet. Also, coding for these short unordered sequences remains challenging for very high error rates. Furthermore, different synthesis and sequencing techniques might motivate different approaches. For these reasons, error correction for DNA remains an active topic of research.
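The division of labor between index, inner code, and outer code can be illustrated with a deliberately simplified toy. In the sketch below, a single XOR parity block stands in for the outer Reed–Solomon code (it can restore one lost sequence), and a one-byte checksum stands in for the inner code (it only detects errors, so a corrupted sequence is discarded and treated as an erasure). It operates on bytes rather than nucleotides, and all names are hypothetical; it is not the scheme of Grass et al. or of any other cited work.

```python
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(blocks):
    """blocks: list of equal-length bytes objects. Returns indexed, checksummed sequences."""
    parity = reduce(xor_bytes, blocks)                # outer code: one XOR parity block
    seqs = []
    for index, payload in enumerate(blocks + [parity]):
        body = bytes([index]) + payload               # the index restores the order later
        seqs.append(body + bytes([sum(body) % 256]))  # inner code: checksum byte
    return seqs

def decode(received, n_blocks):
    """Recover the data blocks if at most one sequence is lost or corrupted."""
    good = {}
    for seq in received:
        body, check = seq[:-1], seq[-1]
        if sum(body) % 256 == check:                  # inner check; failures become erasures
            good[body[0]] = body[1:]
    missing = [i for i in range(n_blocks + 1) if i not in good]
    if len(missing) > 1:
        raise ValueError("too many erasures for a single parity block")
    if missing and missing[0] < n_blocks:
        # XOR of all surviving blocks (data and parity) equals the missing data block.
        good[missing[0]] = reduce(xor_bytes, good.values())
    return [good[i] for i in range(n_blocks)]

blocks = [b"DNA ", b"data", b"toy!"]
seqs = encode(blocks)
seqs.pop(1)       # one sequence is lost during "storage"
seqs.reverse()    # a pool has no inherent order
print(decode(seqs, len(blocks)))  # [b'DNA ', b'data', b'toy!']
```

A real system replaces both stand-ins with codes that correct several errors and erasures at once, but the decoding order is the same: check each sequence, sort by index, then repair missing sequences with the outer code.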

Inner–Outer Code. Encoding . The original information is first encoded with an outer code that introduces redundancy and protects against the loss of sequences. In Grass et al. 4 the original information was first grouped into blocks of multiple sequences (light blue). Then, each row was encoded with a Reed–Solomon code that adds redundancy (yellow). The columns correspond to single DNA sequences. These are labeled with a unique index (purple). Each column is then encoded with an inner code that adds logical redundancy on the level of each sequence (green). In general, the inner and outer codes need not add the redundancy separate from the original data, but instead return a modified longer word. Decoding . The original information from the set of noisy sequences (errors marked in red) is retrieved by first decoding the inner code. This removes most errors within the sequences. For large error rates dominated by insertions and deletions, this step may be preceded by a clustering and alignment step that generates sequences with fewer errors from multiple noisy copies. The sequences are ordered by their index. The ordered sequences are then decoded by the outer code. Here, lost sequences correspond to erasures and erroneous sequences to substitutions. These are corrected by the outer code.

2.6. Limitations of DNA Data Storage

2.6.1. Issues Related to Cost

DNA has become a promising tool for next-generation data storage since it provides high data capacity and storage density 78 and it is possible to store it in multiple ways 37 over significant time periods. 4 However, in order to make DNA data storage standard, some limitations must be overcome. Arguably, the most important limit to the development of DNA data storage is cost, especially in comparison with standard storage processes. Often, synthesis costs for DNA data storage are undisclosed; 12 however, it is possible to draw some conclusions about them. The synthesis of DNA oligos for data storage was originally column-based, a method developed in the 1980s. Since then, this process has been fully automated, and now it allows the synthesis of 96–384 oligos simultaneously. The costs of this procedure range from 0.05 to 0.15 USD per nucleotide. 11 Array-based synthesis processes were developed in the 1990s. They lowered the costs because of their high-throughput nature, with an average price per nucleotide down to 10⁻⁴ USD. Thus, if a conservative estimate of 1 bit/nucleotide of encoded data is assumed, each terabyte of digital data would cost 800 million USD, on average. 12 , 97 In comparison, tape storage costs 7–8 orders of magnitude less, i.e., about 16 USD/TB of data, with prices decreasing by 10% every year (Figure 7A,B). 12 , 98 Considering this enormous disparity in cost between DNA data storage and magnetic tape, the outlook for DNA storage solutions initially appears dismal. That being said, DNA data storage has the potential to drop significantly in cost over time because of several key features. For example, optimized error-correcting codes could lower the cost 97 , 14 by increasing the overall efficiency of the storage process by means of accuracy reduction. 12 By capitalizing on error-correcting codes, it may be possible to work with cheaper, albeit less reliable, synthesis processes if it is assumed that any synthetic errors can be identified and corrected for upon readout, thereby leading to an overall reduction in cost. In 2020, Antkowiak et al. proposed that synthesis costs will drop to around 10⁶ USD/TB (i.e., 2–3 orders of magnitude reduction) as a result of improved synthesis strategies, including large parallelization, optimization of reagents, and combination of nonvolatile DNA-based memories with logical operations (Figure 7B–E). 42 In addition, Antkowiak et al. estimated the marginal costs of the chemical synthesis of DNA. With the use of photolithography to synthesize 10,000 copies of each oligo, with a nucleotide reagent cost of 100 USD/g and a logical density equal to 1 bit/nucleotide, the cost of 1 TB of data stored in DNA would be ∼10⁻² USD, with a chemical yield of 100%. Even if this chemical yield is impossible to achieve in industrial conditions, DNA data storage will be competitive against tape storage (20 USD/TB cost) even at 0.1% chemical yield. In the latter case, the cost of photolithographic DNA storage would be ∼10 USD/TB, and synthesis conditions would be similar to those used in surface chemistry (1000× reagent excess), which demonstrates that an optimization of chemical DNA synthesis processes is compatible with DNA data storage applications. Thus, Antkowiak et al. proved that the combination of synthesis processes that produce lower quality DNA oligos (i.e., photolithographic synthesis) and appropriate error-correction codes allows a major cost decrease in DNA data archives. 42 Regarding costs, there is also an important advantage with respect to traditional storage technologies that is worth mentioning. In fact, DNA storage systems’ maintenance costs are expected to be lower than those of silicon devices in contemporary data centers. 97
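The headline cost figures can be reproduced with a short calculation. This sketch assumes 1 TB = 10¹² bytes and uses the array-synthesis price of 10⁻⁴ USD per nucleotide, the logical density of 1 bit per nucleotide, and the tape price of about 16 USD/TB quoted in the text; the function name is hypothetical.

```python
import math

BITS_PER_TB = 8 * 10**12  # assumes 1 TB = 10^12 bytes

def synthesis_cost_per_tb(usd_per_nt, bits_per_nt=1.0):
    """Synthesis cost in USD for one terabyte at a given price and logical density."""
    nucleotides = BITS_PER_TB / bits_per_nt
    return nucleotides * usd_per_nt

array_cost = synthesis_cost_per_tb(1e-4)         # array-based synthesis
print(f"{array_cost:.1e} USD/TB")                # 8.0e+08 USD/TB
print(math.log10(array_cost / 16))               # gap to tape, in orders of magnitude
print(f"{synthesis_cost_per_tb(0.05):.1e} USD/TB (column-based, lower bound)")
```

Doubling the logical density to 2 bits per nucleotide would halve these numbers, which shows why synthesis price, rather than encoding efficiency, dominates the gap to tape.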

(A) Cost trend of hard disk drives (HDD), NAND flash-based storage devices, linear tape-open tape cartridges (LTO tape), and optical Blu-ray (BD-RE). Image has been reproduced with permission under a Creative Commons Attribution 4.0 License (CC BY) from ref ( 99 ). Copyright 2018 AIP Publishing LLC. (B) Cost comparison between DNA synthesis for data storage and LTO tape storage. (C–E) Comparison of different DNA synthesis platforms and their characteristic traits. (C) Printing technology is primarily used by Twist and Agilent. (D) Electrochemical synthesis is employed by Custom Array. (E) Antkowiak et al. used light-directed synthesis. (C–E) Images reproduced with permission under a Creative Commons Attribution 4.0 License (CC BY) from ref ( 42 ). Copyright 2020 Springer Nature.

A strategy toward decreasing the costs of stored DNA data may be the enzymatic synthesis of DNA strands. 72 This synthesis could, in principle, decrease the costs of reagents even if the required enzymes are still rather expensive. It occurs in aqueous environments and it yields longer strands; however, error rates need to be assessed. A brief review of the principal trends in enzymatic synthesis is provided in section 2.1 . The costs of enzymatic synthesis have been estimated by Jensen et al. for a template-independent enzymatic oligonucleotides synthesis (TiEOS) method. 100 The total costs of synthesizing 1000 strands of 1000 nucleotide length would be 136 USD with recycled TdT, 2700 USD by phosphoramidite technique, and 136,000 USD if a fresh stock of TdT was introduced at every cycle. Thus, the costs of the enzymatic synthesis would be 1 order of magnitude lower than the phosphoramidite technique if the TdT was recycled. 100 The combination of advanced error-correcting codes and synchronization algorithms could possibly achieve lower costs of enzymatic DNA synthesis, as recently reported by Tang et al. This strategy allowed the enhancement of the coding rate to more than log₂ 3 per unit time and avoidance of deletions. 45 In the future, automation 39 of the reading, writing, and operative procedures, as well as future developments in microfluidics, may move DNA data storage toward lower economic costs. 12

2.6.2. Issues Related to the Process Time Scales

Besides economic costs, automation could possibly lead to a reduction of the time costs for DNA data storage, as well. Indeed, the time requirements for the process are another limiting factor in the development of DNA data storage. For example, the reading speed is much lower than standard silicon-based storage media. 97 This could be detrimental, especially when the only possible alternative to retrieve a file would be to read the entire database: it would be a very slow process. For these reasons, DNA data storage systems have been proposed for long-term archival purposes 97 that need infrequent reading, while future investigations will be needed to fully realize random access. 78 , 75 , 70

Conversely, with regard to nanopore reads of labeled DNA, each label is read in between 10⁻¹ and 10¹ ms. 101 , 21 , 102

The writing speed of DNA data storage is lower than that of standard technologies, too. The current writing speed for DNA archives is on the order of kilobytes/second; thus, a reading/writing cycle has a significant cost in terms of time. 8 It is estimated that DNA data storage will need writing speeds on the order of gigabytes/second to be comparable with commercial cloud storage systems in around 10 years. This means DNA data storage must close a gap of 6 orders of magnitude in regard to the writing (i.e., synthesis) and a gap of 2–3 orders of magnitude in regard to the reading (i.e., sequencing). 12
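
The size of the writing gap follows directly from the two rates. The sketch below assumes decimal units and takes exactly 1 kB/s and 1 GB/s as representative values for "on the order of" kilobytes and gigabytes per second; the 1 TB archive is a hypothetical example, not a figure from the cited works.

```python
import math

KB_PER_S = 1e3  # assumed representative current writing rate (bytes/s)
GB_PER_S = 1e9  # assumed representative target writing rate (bytes/s)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def gap_in_orders(current_rate: float, target_rate: float) -> float:
    """Orders of magnitude separating the current rate from the target."""
    return math.log10(target_rate / current_rate)


def write_time_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time needed to write an archive of the given size at a fixed rate."""
    return size_bytes / rate_bytes_per_s


archive = 1e12  # hypothetical 1 TB archive
print(f"gap: {gap_in_orders(KB_PER_S, GB_PER_S):.0f} orders of magnitude")
print(f"1 TB at 1 kB/s: {write_time_seconds(archive, KB_PER_S) / SECONDS_PER_YEAR:.1f} years")
print(f"1 TB at 1 GB/s: {write_time_seconds(archive, GB_PER_S):.0f} s")
```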

In order to enhance the read/write speed of DNA data storage, one of the goals should be to make it suitable for frequent data reads and modifications. This is another pivotal reason for the investigations about synthetic polymers as data storage tools, together with the mentioned high cost of DNA. 97

While writing and reading operations regarding DNA-stored data need to be improved, when it comes to preservation time, DNA is better than current storage technologies. Indeed, the maximum preservation time of information is 50 years for digital memories and 500 years for paper, while it is millennia for inorganic matrix-encapsulated DNA.

In conclusion, DNA data storage presents both advantages and disadvantages with respect to traditional storage methods regarding costs. It is also for this reason that research interest is growing in this field.

3. Structure-Based DNA Data Storage

3.1. DNA Nanotechnology versus Synthetic DNA Sequence for Digital Data Storage

DNA nanotechnology may also be employed to overcome the limitations illustrated above in synthesis and reading. Because of the self-assembled nature of DNA nanostructures (Figure 8), it is possible to significantly reduce the synthetic demand and to eliminate the need for next-generation sequencing for DNA data storage. DNA nanotechnology leverages the unparalleled molecular recognition motifs of the nitrogenous bases to create arbitrary two- and three-dimensional structures from the self-assembly of user-defined DNA strands. 24 , 103 Through careful design of the sequences of these strands, which can be easily synthesized in an automated manner or even purchased from commercial vendors, exquisite control over their final assembly can be realized, thereby enabling the construction of nanoscale shapes and patterns. The main approaches in structural DNA nanotechnology can be divided into three groups: DNA origami, DNA tile assembly, and wireframe DNA structures, 104 all of which have been extensively reviewed elsewhere. 103 Among these, DNA origami is the most widely used method for the construction of DNA-based data storage structures at the nanoscale. Importantly, all of these bottom-up approaches enable the production of asymmetric patterns, which is a key criterion for data storage applications: instead of encoding information directly into the sequence of bases, data may be stored in the three-dimensional shape of these assemblies.

DNA nanostructures are data storage architectures. (A) DNA origami leverages the specific base-pairing motifs of DNA to create arbitrary structures. When a long scaffold strand (several thousand nucleotides in length) is combined with hundreds of short “staple” strands, complementary regions on the different strands will hybridize, thereby folding the scaffold into a desired conformation. These structures can then be examined using (B) atomic force microscopy or (C) electron microscopy, for example. (D) Data can be written onto DNA origami sheets through the site-specific addition of proteins; the data may be read using AFM. (E) Nanoparticles can also be controllably positioned on DNA origami with nanometer-scale resolution, which enables data writing with cryo-EM readout. (A) Image reproduced with permission from ref ( 108 ). Copyright Springer Nature 2021. (B) Image reproduced with permission under a Creative Commons Attribution 4.0 License (CC BY) from ref ( 109 ). Copyright 2019 AAAS. (C) Image reproduced with permission from ref ( 110 ). Copyright 2020 Springer Nature. (D) Image reproduced with permission from ref ( 111 ). Copyright 2010 Springer Nature. (E) Image reproduced with permission from ref ( 112 ). Copyright 2010 Wiley-VCH.

Because of the noncovalent nature of DNA nanostructures, they can be reconfigured using established strategies, including strand displacement, 26 thermal annealing, 105 and pH changes. 106 The reversible Watson–Crick base pairing means that, unlike data encoded directly into the primary DNA sequence, data storage platforms based on DNA nanostructures can be “erased” and “rewritten” multiple times without requiring any laborious chemical synthesis, which decreases the synthetic demand and cost associated with these methods. 25 Additionally, the reconfigurable nature of these constructs enables their use in data operations and computation, analogous to existing computer memory systems. Because each bit is formed through self-assembly, it is also possible to encrypt information by initially omitting a key element from the assembly mixture; only upon addition of the correct “password” molecule can the DNA-based data be “read.”

Compared with encoding data within the nucleotide sequence itself, data storage based on DNA nanotechnology has one major drawback: data storage density. While data written directly into the DNA sequence theoretically allows 1 exabyte (or 1 billion gigabytes) to be stored in every cubic millimeter of DNA, 107 the data density that has been attained so far using DNA secondary structure is much lower because it requires ∼100 base pairs per bit. 25 That being said, this density is still approximately 3 orders of magnitude higher than current hard drive technologies, with further improvements conceivable through the optimization of the 3D DNA structure. Considering the advantages of encoding information into the secondary structure—including ease of readout, synthetic simplicity, and reconfigurability—this is a minor obstacle and one that may be mitigated through the careful design of DNA nanostructures.

3.2. DNA Nanostructure-Based Information Storage Platforms: Assembly and Readout

When comparing DNA nanostructure data storage to traditional sequence-based methods, the major differences lie in the reading and writing steps. In particular, standard DNA data storage requires slow and costly DNA synthesis, while DNA nanostructures already store molecular data in two- and three-dimensional objects. In fact, the assembly of DNA origami is, itself, a molecular information encoding process, wherein the long scaffold strand is folded with hundreds of short “staples” to form a predetermined structure (Figure 8). The size and morphology of the resulting structures can be assessed using various ensemble and single-molecule characterization methods, thereby enabling the readout of information stored in the shape and structure of these nanoscale assemblies. The use of this suite of techniques (described in detail in the following sections) has two major advantages: (1) Depending on the design and the physical attributes of the data storage structure, it may be possible to perform more than one type of characterization. Comparing the results of different readout methodologies may allow for the identification of systematic biases in each modality, which generates a feedback cycle wherein structures may be improved upon and recharacterized. (2) The identification of larger structures (on the order of approximately tens of nanometers) de facto requires lower resolution than the differentiation of single bases, thereby facilitating the use of less precise techniques without sacrificing accuracy. Additionally, because the single-molecule readout methods used for the assessment of DNA nanostructures are also used in DNA sequencing, these techniques are constantly improving: in this way, the advancement of sequence-based DNA data storage also supports the growth of alternative, structure-based approaches.

3.2.1. Gel Electrophoresis

A first and very simple method to read data is the use of gel electrophoresis, which remains one of the key methods to differentiate DNA nanostructures of different shapes and sizes, as well as to assess their yield. Through the formation of DNA nanostructures with prescribed differences in size, it is possible to encode information and then read this out using the discrete bands formed on a gel. To this end, simple structures involving hairpins, loops, or G-quadruplexes placed along linear DNA backbones can also be used to store digital data. For example, Halvorsen and Wong used the change from a closed loop structure (“1”) to a linear structure (“0”), which have different elution times by gel electrophoresis, as a binary switch. The authors used electrophoresis to demonstrate the readout of an 11 byte ASCII message. 113 The creation of many loops of different sizes, each distinguishable by gel electrophoresis, offers a greater number of possible bits in each lane (Figure 9A). 114 The formation of loop structures is not the only operation of DNA nanostructures that can be directly probed using gel electrophoresis. In an alternative approach, five single-stranded oligonucleotides were annealed together to form an assembly with three addressable overhangs; when complementary strands to each of these overhangs were introduced, the site changed from a “0” (single-stranded) to a “1” (double-stranded) state, which could then be reversed using strand displacement. 115 These examples highlight the simple and inexpensive nature of gel electrophoresis as a readout platform, especially when compared with optical, electrochemical, and AFM-based methods. However, the relatively long read times and low data capacity of these methodologies limit their applicability. Gel electrophoresis, being a bulk measurement, also requires substantial quantities of DNA for readout relative to single-molecule methods like AFM, electron microscopies, and nanopore techniques.
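
The loop/linear binary switch described above maps naturally onto ASCII bits. The following Python sketch illustrates only the encoding logic, with “loop” standing for “1” and “linear” for “0”; the 11 byte message is a hypothetical placeholder and is not the message used by Halvorsen and Wong.

```python
def message_to_states(message: str) -> list[str]:
    """Map each ASCII bit to a structural state: loop = 1, linear = 0."""
    bits = "".join(f"{byte:08b}" for byte in message.encode("ascii"))
    return ["loop" if bit == "1" else "linear" for bit in bits]


def states_to_message(states: list[str]) -> str:
    """Recover the ASCII text from the sequence of structural states."""
    bits = "".join("1" if state == "loop" else "0" for state in states)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")


# Hypothetical 11 byte message, chosen only for illustration.
states = message_to_states("Hello world")
print(len(states), "binary switches are needed for 11 bytes")
print(states_to_message(states))
```

Each switch corresponds to one band position to be resolved, which illustrates why the data capacity of gel-based readout is limited.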

Examples of DNA nanostructures for digital information storage. (A) The folding of DNA origami into loop structures upon binding of a biomolecule target generates a shift in the assembly’s electrophoretic mobility. Image adapted with permission under a Creative Commons Attribution 4.0 license (CC BY) from ref ( 114 ). Copyright 2017 Oxford University Press. (B) The association of different DNA sequences to carbon nanotubes produces an array of morphologies and, therefore, can be used to produce barcodes. Image adapted from ref ( 116 ). Copyright 2019 American Chemical Society. (C) Data strings based on regions of varying fluorescence intensities along a DNA nanotube can be read out using single-molecule fluorescence microscopy. Image adapted from ref ( 117 ). Copyright 2021 American Chemical Society.

3.2.2. Fluorescence

Bulk fluorescence measurements can read out data encoded into DNA nanostructures. In an early example, DNA strands were used as “molecular memory” by transitioning thermally between a hairpin structure (unwritten state) and a duplex structure (written state). 118 The oligonucleotides were appended with fluorophore/quencher pairs; as the thermal cycling occurs, the fluorescence output reversibly switches between two defined states to produce a binary signal. Unfortunately, because this process is performed in solution, the whole memory is erased simultaneously, which highlights the need for alternative strategies that enable spatial addressability. To this end, single-molecule fluorescence methods may be used instead to read out DNA origami breadboards appended with fluorophores. In one approach, termed “polychromic address multiplexing,” DNA origami was separated into spatially resolved “cells,” each of which contained a set of fluorophores appended to DNA. Some of these linkers contain photocleavable groups, which enables the disruption of energy transfer processes between adjacent dyes, thus resulting in a fluorescence change. The switch between two possible intensity values provides the binary logic in this system. 119 Through the use of single-molecule total internal reflection fluorescence (TIRF) microscopy, it is possible to decode fluorescent barcodes assembled on DNA nanostructures. 120 Pan et al. utilized this diffraction-limited imaging technique to devise a method to group fluorophores into bright (“on”) lengths along a DNA origami rod. 121 Such bright spots were separated by dark (“off”) regions to create geometric barcodes using only one color of emitter (Figure 9C).
Another tactic used a DNA origami “breadboard,” which was divided into a grid of pixels or an “indexed matrix of digital information.” Each specific location on the origami represents a bit, with the presence (“1”) or absence (“0”) of a docking site for a fluorophore encoding binary information. 22 Docking sites are located using DNA points accumulation for imaging in nanoscale topography (DNA-PAINT), a form of super resolution fluorescence imaging that relies on transient binding of short DNA strands to prepositioned sites on an origami structure. 122 In this example, unique data patterns are created by selecting which staple strands within the origami possess data domains. This approach also uses error-correction algorithms that enable message recovery even when individual docking sites are missing. Unlike DNA sequencing, which requires multiple reads to reach a consensus, this tactic can read 750 origami to reach a 100% probability of full data retrieval, which means that only femtomoles of material are needed.
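
The idea of recovering a message despite missing docking sites can be illustrated with a deliberately simple stand-in for the error-correction algorithms mentioned above. The sketch below is not the scheme used in the cited work: it assumes that several copies of the same origami are imaged and applies a per-site majority vote, so that a docking site missing on a minority of copies is still called correctly. The bit pattern and the reads are invented for illustration.

```python
def majority_decode(reads: list[list[int]]) -> list[int]:
    """Call each site "1" if more than half of the imaged copies show it."""
    n_reads = len(reads)
    return [1 if 2 * sum(site) > n_reads else 0 for site in zip(*reads)]


# Hypothetical 8-site pattern (1 = docking site present, 0 = absent).
pattern = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical reads of five origami copies; some "1" sites were not detected.
reads = [
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0, 1, 0],  # site 0 missing
    [1, 0, 1, 0, 0, 0, 1, 0],  # site 3 missing
    [1, 0, 0, 1, 0, 0, 1, 0],  # site 2 missing
    [1, 0, 1, 1, 0, 0, 0, 0],  # site 6 missing
]

decoded = majority_decode(reads)
print("recovered:", decoded == pattern)
```

A plain majority vote only works while fewer than half of the copies lose a given site; dedicated error-correcting codes are used in practice because they tolerate missing sites with far fewer reads.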

3.2.3. Atomic Force Microscopy

Early examples of DNA origami were reported in the mid 2000s and involved the assembly of 2D arrays to form various images, including the letters of the alphabet, 123 a nanoscale Mona Lisa, 124 and a map of the Americas. 125 Atomic force microscopy (AFM) was used to “read-out” images formed by DNA origami, and this remains a key technique for the study of DNA-based nanomaterials. 126 AFM measurements detect differences in height over a sample surface, without affecting the sample, thus rendering this method ideally suited to reading out three-dimensional patterns on DNA origami. Binary information can be written by precisely placing nanoparticles or proteins at defined positions on a DNA breadboard. In the context of DNA data storage, Zhang et al. demonstrated in 2019 127 a “DNA braille” system, which was prepared by patterning biotinylated overhangs onto DNA origami. The data in this system are encrypted; only when streptavidin is added and binds to biotin does the pattern become readable by AFM. The decryption time for this method is 1–2 h, including sample processing, imaging, and readout; this time could be reduced by using high-speed AFM methods and fully automated image analysis algorithms. Similarly, Fan et al. used AFM to decode information stored in DNA domino arrays. 127 The use of DNA overhangs bearing streptavidin enables the use of strand displacement reactions to controllably erase and rewrite data on the DNA origami surface, 128 thereby underlining the advantages of DNA nanotechnology as an information storage platform. AFM is also suitable to look at DNA positioned on other types of nanomaterials; for example, it was found that condensing DNA strands onto carbon nanotubes creates height differences that were observable by AFM (Figure 9B). Control of the patterning of these protrusions, which interestingly do not rely on DNA hybridization, may allow for the production of two-dimensional barcodes on carbon nanotubes. 116

3.2.4. Electron Microscopy

Relying on similar principles, the decoding of DNA nanostructures can also be achieved using electron microscopy (EM). DNA itself can be difficult to visualize using EM because of insufficient electron density-related contrast, and therefore, often requires staining. As such, EM is better suited to the examination of hybrid structures, wherein the DNA is used to create “barcodes” made of gold nanoparticles, 129 for example. Different barcodes can then be used to track the cellular uptake of various nanostructures because EM allows for the identification of subcellular compartments. EM exhibits some of the same advantages and pitfalls of AFM: while these techniques allow for high-resolution two- and three-dimensional images to be formed of DNA nanostructures, they are time-consuming and expensive, as well as relatively low-throughput. As cryo-EM and liquid-cell EM techniques continue to improve, the direct imaging of biomolecules might offer an alternative in the future with better resolution on the single-molecule level even without the use of staining or nanoparticles.

3.2.5. Nanopore Measurements

More recently, through the use of long DNA backbones as in DNA origami, the organization of DNA protrusions has been used to produce three-dimensional DNA barcodes 21 or hard drives that may be read using solid-state nanopores. Nanopore methods require no labeling for readout, which makes them an attractive alternative to fluorescence. Briefly, an electric field is applied across a nanoscale hole (made from glass or Si₃N₄, for example), which causes molecules to translocate through this nanopore. As the analyte passes, it modulates the ionic current signal because of its 3D shape blocking the pore; in this way, the structure of the DNA nanoconstruct is translated directly into an electrical signal (Figure 10). The resulting current–time traces can then be analyzed using automated methods, which allows for rapid data decoding.
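
The translation of a current–time trace into bits can be sketched with a simple threshold rule. The Python example below is a toy model under strong assumptions that are not taken from the cited works: the translocation event is split into equally long windows (one per bit position), the current is given in arbitrary units relative to a known baseline, and a bit is called “1” when the current in its window drops below the baseline by more than a chosen threshold. Real traces have non-uniform translocation velocities and noise, which is why more elaborate automated analysis is used in practice.

```python
def decode_trace(trace: list[float], n_bits: int,
                 baseline: float, drop_threshold: float) -> list[int]:
    """Call one bit per equal-length window from additional current drops."""
    window = len(trace) // n_bits
    bits = []
    for i in range(n_bits):
        segment = trace[i * window:(i + 1) * window]
        deepest_drop = baseline - min(segment)
        bits.append(1 if deepest_drop > drop_threshold else 0)
    return bits


# Hypothetical 3-bit event (arbitrary units): protrusions at positions 0 and 2.
trace = [1.0, 1.0, 0.7, 1.0,
         1.0, 1.0, 1.0, 1.0,
         1.0, 0.68, 1.0, 1.0]

print(decode_trace(trace, n_bits=3, baseline=1.0, drop_threshold=0.2))
```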

DNA data storage structures relying on nanopore readout. (A) An encrypted “DNA hard drive,” wherein readout may only occur once the correct molecular “keywords” have been added. Streptavidin molecules (gray circle in inset) partially block the nanopore as they translocate, which causes a momentary decrease in the current. Image reproduced from ref ( 25 ). Copyright 2020 American Chemical Society. (B) Multilevel barcoding is achievable by exploiting DNA junctions with different sizes, which create current drops of variable magnitude. Image reproduced with permission under a Creative Commons Attribution 4.0 License (CC BY) from ref ( 102 ). Copyright 2021 Wiley-VCH. (C) A DNA barcode with “structural colors” can also be formed by closely packing structural units, which therefore read as one protrusion. These units may be based on either monovalent streptavidin or a DNA cuboid. (D) Nanopore microscope can be used to detect up to 10 structural colors within the same DNA data string. The correct identification of the “color” was verified using fluorescence microscopy, wherein fluorescently labeled (5′-fluorescein) structural units were used. (C,D) Images reproduced with permission under a Creative Commons Attribution 4.0 License (CC BY) from ref ( 130 ). Copyright 2022 Springer Nature.

The use of nanopores to read out digital information encoded in DNA nanostructures was demonstrated by Bell and Keyser, who fabricated “DNA barcodes” to capture proteins. 21 The authors used conical quartz nanopores with diameters of ∼14 nm for a 3-bit barcode that could be assigned with 94% accuracy. Now, these quartz nanopores can read out DNA hairpins along a carrier strand with a density of approximately 1 bit per 30 nm, ca. 3 times the data density of conventional hard drives. 25 One of the major benefits of this method is its high speed: a single “DNA hard drive” can be read out on the millisecond time scale using a quartz nanopore because of the superior signal-to-noise ratio when compared with DNA sequencing. Solid-state nanopores combined with DNA nanotechnology have since been used to save and encrypt a grayscale image. 102 Streptavidin-labeled scaffolds can also be used to create a secure data storage system that requires the correct molecular “keywords” to decode the data within the structure (Figure 10A). Multilevel storage architectures have been achieved using different DNA junction sizes to create a quaternary encoding system (Figure 10B). 102 Increased storage density beyond binary barcodes can also be achieved by creating blocks of repeating structural units that appear as a single protrusion within the nanopore, which creates “structural colors” to generate up to 10 data levels. 130 Compared with fluorescence, sequencing, or gel electrophoresis-based strategies, single-molecule nanopore measurements require less material and enable faster data reading; through a combination of this technology with deep learning methods, 131 real-time nanopore data analysis is attainable.
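
The gain from multilevel encoding can be quantified as log₂ of the number of distinguishable levels per position. The Python sketch below computes this for binary, quaternary, and 10-level schemes and shows how one byte maps onto four quaternary symbols; the mapping of symbol values to junction sizes or structural colors is left abstract and is an assumption of this example.

```python
import math


def bits_per_symbol(levels: int) -> float:
    """Information carried by one position with the given number of levels."""
    return math.log2(levels)


def byte_to_quaternary(byte: int) -> list[int]:
    """Split one byte into four base-4 symbols, most significant first."""
    return [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]


for levels in (2, 4, 10):
    print(f"{levels} levels: {bits_per_symbol(levels):.2f} bits per position")

# 0b10110100 is split into the bit pairs 10, 11, 01, 00.
print(byte_to_quaternary(0b10110100))
```

A quaternary barcode therefore needs half as many positions as a binary one for the same data, and 10 levels reduce the number of positions by a factor of roughly 3.3.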

Another important feature is random access, as demonstrated in 2021 by Bošković et al. 101 In their work, random access of DNA barcodes was performed by exploiting a modified PCR method to increase the number of the target DNA nanostructures. Indeed, DNA structural barcodes were annealed as short oligonucleotides containing protrusions on single-stranded DNA (ssDNA) scaffolds to form digital bits at precise locations. In these structures, DNA nicks were ligated to favor the copy of the barcode by PCR. Each of these structures had a noncomplementary end, which acted as a barcode-specific primer template for the random access of data.

3.2.6. Alternative Approaches and Polymer Chemistries

The use of double-stranded DNA as a storage medium was also exploited in recent work by Tabatabaei et al. 72 on DNA punch cards. This macromolecular storage technology was used to encode the information in the sequence of bases of the DNA strands by using their sugar–phosphate backbone, i.e., topologically. Indeed, a pattern of nicking positions was precisely realized on the backbone of native dsDNA, and here, information was encoded by means of absence (i.e., 0) or presence (i.e., 1) of nicks. On the basis of enzymatic modification of DNA, nicks enable adding several functionalities to the storage system, for example, single-bit random access, pooling, and in-memory computation. However, the DNA punch cards system was able to store only up to 14 kB of digital information. Therefore, additional research is foreseen toward scaling its costs.

DNA as a natural polymer is not the only solution for data storage technologies. Therefore, researchers started to look for alternative molecular storage platforms based on synthetic polymers. Synthetic polymers can be used to increase stability against chemical degradation while offering a wide range of base modifications. Although alternative DNA bases have been introduced, synthetic polymers could be prepared using a set of monomers with a wider set of codes, which expands the alphabet for data encoding. 97 First experiments reading single-stranded synthetic biopolymers indicate that the reading step can be performed with biological nanopores without the use of an enzyme slowing down the translocation. 132 As an example, Cao et al. used informational biopolymers composed of a backbone of poly(phosphodiesters) with dideoxyadenosine at both ends, and engineered-aerolysin nanopores. The results suggest a path to single-bit resolution at least in short polymers; however, machine learning and training are needed for the successful readout. The study suggests an alternative way to store information with high density. The idea to use the backbone of an organic polymer to store digital information is similar to the approach discussed for DNA nanostructures.

Apart from DNA, other organic molecules have been recently proposed. 133 , 134 Two interesting examples are the use of peptide sequences for data storage, as reported by Ng et al., 133 and the use of urethanes, as reported by Dahlhauser et al. 134 Unfortunately, in both of these cases, reading required the use of mass spectrometry, with the consequent limitation in terms of costs and speed. Recent advances in nanopore-based readout of short peptide sequences 135 may speed up developments in this area. 53

3.3. DNA Nanotechnology for Molecular Computation

The storage of data in DNA is undoubtedly an exciting possible solution to our ever-expanding data storage needs. This technology may lead to future hybrid electronic–biomolecular computing systems in which some portion of the burden of data storage is supported by DNA encoding, which raises the question: “Can more of the computer system’s functions be carried out using DNA?” By reducing the time overhead of conversion to a digital format and directly undertaking data processing tasks with DNA-based computation, it may be possible to create molecular computing systems that are more efficient than conventional electronic analogues. Because of the noncovalent nature of DNA nanostructures, these materials are primed for use in molecular computation. A working prototype for a DNA computer was developed by Adleman in 1994, 136 wherein he used a separation-based approach to calculate a Hamiltonian path in a graph with seven vertices. This problem was particularly suited to a molecular computing approach because it is an NP-complete problem; while verification of a putative solution has a complexity that is linear with respect to the number of nodes, the path space search is exponential in complexity with respect to the same. In a DNA computer, however, each DNA molecule plays the part of a separate processor, which enables many parallel operations to be carried out in a small reaction volume. This strategy greatly accelerates the initial path search, as statistics predict that DNA constructs corresponding to every possible path should be produced upon mixing. The task is then reduced to one of selection and filtering by removing invalid paths. The Adleman experiment acted as a proof of concept: in practice, the process was more time- and labor-intensive than a conventional digital approach. Nonetheless, the possibility of a DNA-based computer inspired researchers to further develop Adleman’s method and to devise advanced and powerful general DNA computing solutions.
Early experimental and theoretical work examining the possibilities of DNA computation was focused on this parallelization and the benefits that it offered with regard to efficiently solving other NP-complete problems. 137 , 138 Recent work on DNA computation has moved away from such problems toward recreating deterministic logical operations, for example, addition 139 and multiplication, 140 with definite outcomes. Su et al. produced DNA logic cascades that allow the construction of a full adder and a 4:1 multiplexer, and they then combined these with other logic circuits to produce a DNA arithmetic logic unit (ALU), the foundation of general-purpose processors. 139 These applications demonstrate the methods that can be used to mitigate errors in DNA computation, which arise from the leeway and tolerance of mismatch inherent in sequence-specific DNA hybridization.
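
The generate-and-filter logic of the Adleman experiment can be mimicked in software. In the Python sketch below, every ordering of the vertices plays the role of one DNA construct formed upon mixing, and the filter keeps only those orderings in which each consecutive pair is connected by an edge. The seven-vertex directed graph is a hypothetical example and is not the graph used by Adleman; a sequential program must also examine candidates one by one, whereas the molecular approach forms them in parallel.

```python
from itertools import permutations

# Hypothetical directed graph with seven vertices (0..6).
EDGES = {
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
    (0, 3), (3, 2), (2, 5), (1, 4), (4, 1),
}


def hamiltonian_paths(edges: set[tuple[int, int]], n_vertices: int,
                      start: int, end: int) -> list[tuple[int, ...]]:
    """Generate every candidate ordering, then filter out invalid paths."""
    middle = [v for v in range(n_vertices) if v not in (start, end)]
    candidates = ((start, *perm, end) for perm in permutations(middle))
    return [
        path for path in candidates
        if all((a, b) in edges for a, b in zip(path, path[1:]))
    ]


for path in hamiltonian_paths(EDGES, n_vertices=7, start=0, end=6):
    print(" -> ".join(map(str, path)))
```

The number of candidates grows factorially with the number of vertices, whereas checking a single candidate takes a number of steps proportional to its length, which mirrors the asymmetry between search and verification noted above.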

Larger engineered DNA nanostructures, as well as small origami structures, show great promise for use in biomolecular computation. Robust, rigid DNA tiles with programmable “sticky ends” have been made using double-crossover (DX), 141 triple-crossover (TX), 142 and single-stranded tile (SST) 143 motifs and used for a variety of algorithmic self-assembly experiments. This is facilitated by the logical equivalence of these tiles with Wang tiles, which are theoretical constructs with specified interactions that can simulate a Turing machine. A correctly designed, self-assembling set of these tiles is theoretically able to perform any computation that can be carried out by a conventional computer. Past applications of this idea include the design of a set of TX tiles that carry out a cumulative XOR operation, a set of DX tiles that self-assemble into a Sierpinski triangle, and impressively, a set of 355 SSTs that can be used to produce a variety of cellular automata capable of carrying out a number of computational tasks (Figure 11). 144 Particularly interesting in the latter example is the ability to controllably reintroduce indeterminism by including a plurality of tiles that could fill a given niche and leaving the ultimately realized pattern up to competition. This brings about a marriage of the benefits of deterministic logic and the power of indeterministic computing to solve combinatorial problems, thereby highlighting the utility of DNA nanotechnology not only for data storage but also for molecular computation.
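
Two of the computations mentioned above have compact software analogues. The Python sketch below is an abstraction of the rules only, not a model of tile binding: each new row of the Sierpinski pattern is obtained by XOR-ing neighboring values of the previous row, and divisibility by 3 is decided by a three-state automaton that reads a binary number one digit at a time. All function names are illustrative.

```python
def xor_rows(n_rows: int) -> list[list[int]]:
    """Build rows in which each value is the XOR of its two upper neighbors."""
    row = [1]
    rows = [row]
    for _ in range(n_rows - 1):
        padded = [0] + row + [0]
        row = [a ^ b for a, b in zip(padded, padded[1:])]
        rows.append(row)
    return rows


def is_multiple_of_3(binary_digits: str) -> bool:
    """Track the remainder modulo 3 while reading bits, most significant first."""
    remainder = 0
    for digit in binary_digits:
        remainder = (2 * remainder + int(digit)) % 3
    return remainder == 0


# Print a small Sierpinski pattern: "#" marks a 1, "." marks a 0.
for row in xor_rows(8):
    print("".join("#" if value else "." for value in row).center(8))

for number in ("110", "111", "1001"):
    print(number, is_multiple_of_3(number))
```

In both cases the update depends only on local information (two neighbors, or the current remainder and one input bit), which is the property that allows such rules to be implemented by tiles attaching at the frontier of a growing lattice.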

Tile-based computations and algorithmic self-assembly. (A) Self-assembly by SSTs. From a seed, tiles attach to the frontier of a growing SST lattice according to interaction rules determined by their exposed recognition sequences. (B) An iterated Boolean circuit mimicking the function of a computation to determine whether or not a binary number is a multiple of 3. A long enough lattice will settle into one or another fixed pattern corresponding to the calculation result. (C) The result of four “multiple of 3” tilings. The numbers at the left mark the experiment number. The tilings correctly determine which input numbers have a factor of 3. (A–C) Images adapted with permission from ref ( 144 ). Copyright 2019 Springer Nature. (D) A Sierpinski triangle created by a cumulative XOR computation performed by DNA tiles. Sierpinski’s triangle is a fractal pattern, and the self-assembly rule that creates it is Turing complete. Images reproduced with permission under a Creative Commons Attribution 4.0 License (CC BY) from ref ( 145 ). Copyright 2004 PLoS Biology.

4. Conclusions and Outlook

DNA data storage, in both its sequence- and structure-based versions, offers the possibility of storing digital information at very high data density. This promise has led a large number of actors (public and private institutions, corporations, etc.) to invest in the quest for advanced methods and experiments. Although great advances have been made toward DNA data storage, it is not yet competitive against conventional storage technologies. Significant challenges need to be overcome, in particular regarding writing speed and, hence, cost. While stored data size has been markedly increased, the current record for DNA digital data storage is still around 200 MB, with single synthesis runs lasting about 24 h. 8 , 12 Achieving the storage of TBs of data at a low cost is unattainable with the current techniques. Toward this goal, great efforts on the development of encoding schemes, writing and reading processes, and storage procedures are presently being made. 146 , 9 , 81 , 44 , 74 , 14

As the chemical and enzymatic processes for making sequence-defined nucleic acids continue to improve, the cost and time associated with writing DNA-based information is continually decreasing. These improvements are particularly important for sequence-based storage, but they importantly reduce costs for structure-based approaches, as well. Additionally, as alternative chemistries emerge, including unnatural nucleotides 147 and small molecules that can modulate the structure of DNA, 148 the parameter space for structure- and sequence-based DNA data storage is continually expanding. Importantly, these chemistries not only widen the breadth of materials that can be produced, but also may further extend the lifetime of DNA sequences, as these modifications render DNA less recognizable to enzymes.

For data readout, DNA sequencing is rapidly advancing, but current methods would be incompatible with unnatural monomer units, which limits the scope of the methods. Furthermore, all current DNA sequencing techniques require molecular machines like polymerases, which set fundamental limits on the throughput per enzyme, meaning there is an upper threshold on the rate of sequencing even with massive parallelization. Emerging, rapid approaches to establish polymer sequence or three-dimensional structure one molecule at a time will improve the competitiveness of DNA data storage. Both natural and chemically modified oligonucleotides, as well as hybrid nanostructures involving DNA and quantum dots or nanoparticles, may be read out using solid-state nanopores. The versatility of this methodology, which hinges on the possibility of finely tuning nanopore size, makes this an attractive avenue for the future characterization of both pure DNA and composite materials. Through the use of quantum dots or fluorescent dyes, nanopore readout may also be combined with optical techniques to reduce the readout error rate without requiring enzymes to slow translocation. 149 While the use of higher order nanostructures or composite nanomaterials does sacrifice data density, the advantages of these methods are expected to outweigh this drawback. In terms of synthesis, DNA nanotechnology greatly simplifies assembly procedures and produces structures that can easily be reconfigured. Indeed, computation is a natural extension for DNA nanotechnology, especially considering the vast library of naturally evolved enzymes that nature uses to copy, change, and repair genetic information. The interface between these natural systems and DNA nanotechnology is an active area of research, which generates other possibilities for DNA data storage that leverage nature’s evolved machinery. We foresee that DNA nanostructures made for information storage will find audiences in cryptography, steganography, and other fields 4 , 58 , 116 and that combining DNA data storage with data analysis techniques such as neural networks will afford opportunities in a growing number of sectors. 127

Because of the long-term stability of DNA under appropriate storage conditions, we predict that archival storage will be the most valuable application for DNA data storage. In this cold storage setting, information would be infrequently accessed from a relatively static DNA database. Considering that long-term, archival storage 97 operates over long time scales—decades, centuries, and possibly millennia—this application requires only infrequent access to the stored information, which substantially reduces the impact of reading costs and long read times associated with DNA data storage. While the long-term stability of DNA, itself, is firmly established, further studies on the lifetimes of noncovalently assembled DNA nanostructures will need to be conducted to ensure that data stored in these formats are not compromised over time. Specifically, encapsulation and retrieval of DNA nanostructures in silica beads and other matrices should be examined, as well as the readability of DNA nanomaterials after prolonged freezing. It is also important to mention that the preservation of DNA digital archives can be implemented using not only in vitro substrates but also in vivo approaches. 150 − 153 As three-dimensional nucleic acid nanostructures have also successfully been produced inside cells, 154 there is potentially important synergy between in vitro/in vivo DNA nanotechnology and data storage, which remains, as of yet, unexplored.

Even in the context of archival storage, a DNA database, like its electronic analogues, would benefit greatly from dynamic properties that allow data to be erased, rewritten, and updated. For example, in-storage file operations and computations, as well as the ability to repeatedly access DNA databases, would reduce DNA synthesis costs and obviate the need to store multiple copies of archives. In this area, DNA nanostructures may present advantages over traditional sequence-based storage methods, as the reconfiguration of these supramolecular moieties is firmly established, though rarely in the context of information storage. The implementation of dynamic properties and a full characterization of the kinetics of these processes would bring DNA-based storage systems one step closer to practical viability. 78

A combination of sequence- and structure-based approaches could represent a significant advancement to overcome the various hurdles associated with DNA data storage. For this field to reach its full potential, cooperation between scientists from a range of research areas will be essential to produce the advanced chemical techniques, instrumentation, characterization methods, and automated analysis tools that are required. As the wide range of topics, from mathematics to polymer chemistry, shows, data storage based on polymers will demand multidisciplinary consortia that ideally design the whole process from data encoding to decoding with a bottom-up approach.

Author Contributions

¶ The authors contributed equally to this work.

The research leading to these results has received funding from the European Union under the Horizon 2020 Program, FET-Open: DNA-FAIRYLIGHTS, Grant Agreement No. 964995.

The authors declare no competing financial interest.

  • Gu M.; Li X.; Cao Y. Optical Storage Arrays: A Perspective for Future Big Data Storage . Light Sci. Appl. 2014, 3 ( 5 ), e177. 10.1038/lsa.2014.58. [ CrossRef ] [ Google Scholar ]
  • Carmean D.; Ceze L.; Seelig G.; Stewart K.; Strauss K.; Willsey M. DNA Data Storage and Hybrid Molecular-Electronic Computing . Proceedings of the IEEE 2019, 107 ( 1 ), 63–72. 10.1109/JPROC.2018.2875386. [ CrossRef ] [ Google Scholar ]
  • Hilbert M.; López P. The World’s Technological Capacity to Store, Communicate, and Compute Information . Science 2011, 332 ( 6025 ), 60–65. 10.1126/science.1200970. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Grass R. N.; Heckel R.; Puddu M.; Paunescu D.; Stark W. J. Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes . Angew. Chem., Int. Ed. 2015, 54 ( 8 ), 2552–2555. 10.1002/anie.201411378. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dabney J.; Knapp M.; Glocke I.; Gansauge M. T.; Weihmann A.; Nickel B.; Valdiosera C.; García N.; Pääbo S.; Arsuaga J. L.; Meyer M. Complete Mitochondrial Genome Sequence of a Middle Pleistocene Cave Bear Reconstructed from Ultrashort DNA Fragments . Proc. Natl. Acad. Sci. U. S. A. 2013, 110 ( 39 ), 15758–15763. 10.1073/pnas.1314445110. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Organick L.; Chen Y. J.; Dumas Ang S.; Lopez R.; Liu X.; Strauss K.; Ceze L. Probing the Physical Limits of Reliable DNA Data Retrieval . Nat. Commun. 2020, 11 ( 1 ), 1–8. 10.1038/s41467-020-14319-8. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Anavy L.; Vaknin I.; Atar O.; Amit R.; Yakhini Z. Data Storage in DNA with Fewer Synthesis Cycles Using Composite DNA Letters . Nat. Biotechnol. 2019, 37 ( 10 ), 1229–1236. 10.1038/s41587-019-0240-x. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hao Y.; Li Q.; Fan C.; Wang F. Data Storage Based on DNA . Small Struct 2021, 2 ( 2 ), 2000046. 10.1002/sstr.202000046. [ CrossRef ] [ Google Scholar ]
  • Goldman N.; Bertone P.; Chen S.; Dessimoz C.; Leproust E. M.; Sipos B.; Birney E. Towards Practical, High-Capacity, Low-Maintenance Information Storage in Synthesized DNA . Nature 2013, 494 ( 7435 ), 77–80. 10.1038/nature11875. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Meiser L. C.; Nguyen B. H.; Chen Y. J.; Nivala J.; Strauss K.; Ceze L.; Grass R. N. Synthetic DNA Applications in Information Technology . Nat. Commun. 2022, 13 ( 1 ), 1–13. 10.1038/s41467-021-27846-9. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Meiser L. C.; Antkowiak P. L.; Koch J.; Chen W. D.; Kohll A. X.; Stark W. J.; Heckel R.; Grass R. N. Reading and Writing Digital Data in DNA . Nat. Protoc 2020, 15 ( 1 ), 86–101. 10.1038/s41596-019-0244-5. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Ceze L.; Nivala J.; Strauss K. Molecular Digital Data Storage Using DNA . Nat. Rev. Genet 2019, 20 ( 8 ), 456–466. 10.1038/s41576-019-0125-3. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Song X.; Reif J. Nucleic Acid Databases and Molecular-Scale Computing . ACS Nano 2019, 13 ( 6 ), 6256–6268. 10.1021/acsnano.9b02562. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Erlich Y.; Zielinski D. DNA Fountain Enables a Robust and Efficient Storage Architecture . Science 2017, 355 ( 6328 ), 950–954. 10.1126/science.aaj2038. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Caruthers M. H. A Brief Review of DNA and RNA Chemical Synthesis . Biochem. Soc. Trans. 2011, 39 ( 2 ), 575–580. 10.1042/BST0390575. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Lee H. H.; Kalhor R.; Goela N.; Bolot J.; Church G. M. Terminator-Free Template-Independent Enzymatic DNA Synthesis for Digital Information Storage . Nat. Commun. 2019, 10 , 2383. 10.1038/s41467-019-10258-1. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Lee H.; Wiegand D. J.; Griswold K.; Punthambaker S.; Chun H.; Kohman R. E.; Church G. M. Photon-Directed Multiplexed Enzymatic DNA Synthesis for Molecular Digital Data Storage . Nat. Commun. 2020, 11 , 5246. 10.1038/s41467-020-18681-5. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kubista M.; Andrade J. M.; Bengtsson M.; Forootan A.; Jonák J.; Lind K.; Sindelka R.; Sjöback R.; Sjögreen B.; Strömbom L.; Ståhlberg A.; Zoric N. The Real-Time Polymerase Chain Reaction . Mol. Aspects Med. 2006, 27 ( 2–3 ), 95–125. 10.1016/j.mam.2005.12.007. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Shendure J.; Balasubramanian S.; Church G. M.; Gilbert W.; Rogers J.; Schloss J. A.; Waterston R. H. DNA Sequencing at 40: Past, Present and Future . Nature 2017, 550 ( 7676 ), 345. 10.1038/nature24286. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Heckel R.; Mikutis G.; Grass R. N. A Characterization of the DNA Data Storage Channel . Sci. Rep 2019, 9 ( 1 ), 1–12. 10.1038/s41598-019-45832-6. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Bell N. A. W.; Keyser U. F. Digitally Encoded DNA Nanostructures for Multiplexed, Single-Molecule Protein Sensing with Nanopores . Nat. Nanotechnol 2016, 11 ( 7 ), 645–651. 10.1038/nnano.2016.50. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dickinson G. D.; Mortuza G. M.; Clay W.; Piantanida L.; Green C. M.; Watson C.; Hayden E. J.; Andersen T.; Kuang W.; Graugnard E.; Zadegan R.; Hughes W. L. An Alternative Approach to Nucleic Acid Memory . Nat. Commun. 2021, 12 ( 1 ), 2371. 10.1038/s41467-021-22277-y. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chen K.; Kong J.; Zhu J.; Ermann N.; Predki P.; Keyser U. F. Digital Data Storage Using DNA Nanostructures and Solid-State Nanopores . Nano Lett. 2019, 19 ( 2 ), 1210–1215. 10.1021/acs.nanolett.8b04715. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Seeman N. C.; Sleiman H. F. DNA Nanotechnology . Nat. Rev. Mater. 2018, 3 , 17068. 10.1038/natrevmats.2017.68. [ CrossRef ] [ Google Scholar ]
  • Chen K.; Zhu J.; Bošković F.; Keyser U. F. Nanopore-Based DNA Hard Drives for Rewritable and Secure Data Storage . Nano Lett. 2020, 20 ( 5 ), 3754–3760. 10.1021/acs.nanolett.0c00755. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Zhang D. Y.; Seelig G. Dynamic DNA Nanotechnology Using Strand-Displacement Reactions . Nat. Chem. 2011, 3 ( 2 ), 103–113. 10.1038/nchem.957. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Song T.; Eshra A.; Shah S.; Bui H.; Fu D.; Yang M.; Mokhtar R.; Reif J. Fast and Compact DNA Logic Circuits Based on Single-Stranded Gates Using Strand-Displacing Polymerase . Nat. Nanotechnol 2019, 14 ( 11 ), 1075–1081. 10.1038/s41565-019-0544-5. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Palluk S.; Arlow D. H.; de Rond T.; Barthel S.; Kang J. S.; Bector R.; Baghdassarian H. M.; Truong A. N.; Kim P. W.; Singh A. K.; Hillson N. J.; Keasling J. D. De Novo DNA Synthesis Using Polymerase-Nucleotide Conjugates . Nat. Biotechnol. 2018, 36 ( 7 ), 645–650. 10.1038/nbt.4173. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kosuri S.; Church G. M. Large-Scale de Novo DNA Synthesis: Technologies and Applications . Nat. Methods 2014, 11 ( 5 ), 499–507. 10.1038/nmeth.2918. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • LeProust E. M.; Peck B. J.; Spirin K.; McCuen H. B.; Moore B.; Namsaraev E.; Caruthers M. H. Synthesis of High-Quality Libraries of Long (150mer) Oligonucleotides by a Novel Depurination Controlled Process . Nucleic Acids Res. 2010, 38 ( 8 ), 2522–2540. 10.1093/nar/gkq163. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Xu C.; Ma B.; Gao Z.; Dong X.; Zhao C.; Liu H. Electrochemical DNA Synthesis and Sequencing on a Single Electrode with Scalability for Integrated Data Storage . Sci. Adv. 2021, 7 , abk0100. 10.1126/sciadv.abk0100. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Yoo E.; Choe D.; Shin J.; Cho S.; Cho B. K. Mini Review: Enzyme-Based DNA Synthesis and Selective Retrieval for Data Storage . Comput. Struct Biotechnol J. 2021, 19 , 2468–2476. 10.1016/j.csbj.2021.04.057. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Barthel S.; Palluk S.; Hillson N. J.; Keasling J. D.; Arlow D. H. Enhancing Terminal Deoxynucleotidyl Transferase Activity on Substrates with 3′ Terminal Structures for Enzymatic De Novo DNA Synthesis . Genes (Basel) 2020, 11 ( 1 ), 102. 10.3390/genes11010102. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Pawloski A. R.; McGall G.; Kuimelis R. G.; Barone D.; Cuppoletti A.; Ciccolella P.; Spence E.; Afroz F.; Bury P.; Chen C.; Chen C.; Pao D.; Le M.; McGee B.; Harkins E.; Savage M.; Narasimhan S.; Goldberg M.; Rava R.; Fodor S. P. A. Photolithographic Synthesis of High-Density DNA Probe Arrays: Challenges and Opportunities . Journal of Vacuum Science & Technology B: Microelectronics and Nanometer Structures 2007, 25 ( 6 ), 2537. 10.1116/1.2794325. [ CrossRef ] [ Google Scholar ]
  • Nguyen B. H.; Takahashi C. N.; Gupta G.; Smith J. A.; Rouse R.; Berndt P.; Yekhanin S.; Ward D. P.; Ang S. D.; Garvan P.; Parker H. Y.; Carlson R.; Carmean D.; Ceze L.; Strauss K. Scaling DNA Data Storage with Nanoscale Electrode Wells . Sci. Adv. 2021, 7 ( 48 ), 1–7. 10.1126/sciadv.abi6714. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Zhang Y.; Kong L.; Wang F.; Li B.; Ma C.; Chen D.; Liu K.; Fan C.; Zhang H. Information Stored in Nanoscale: Encoding Data in a Single DNA Strand with Base64 . Nano Today 2020, 33 , 100871. 10.1016/j.nantod.2020.100871. [ CrossRef ] [ Google Scholar ]
  • Newman S.; Stephenson A. P.; Willsey M.; Nguyen B. H.; Takahashi C. N.; Strauss K.; Ceze L. High Density DNA Data Storage Library via Dehydration with Digital Microfluidic Retrieval . Nat. Commun. 2019, 10 ( 1 ), 1706. 10.1038/s41467-019-09517-y. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Erlich Y. A Vision for Ubiquitous Sequencing . Genome Res. 2015, 25 ( 10 ), 1411–1416. 10.1101/gr.191692.115. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Takahashi C. N.; Nguyen B. H.; Strauss K.; Ceze L. Demonstration of End-to-End Automation of DNA Data Storage . Sci. Rep 2019, 9 ( 1 ), 1–6. 10.1038/s41598-019-41228-8. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Choi H.; Choi Y.; Choi J.; Lee A. C.; Yeom H.; Hyun J.; Ryu T.; Kwon S. Purification of Multiplex Oligonucleotide Libraries by Synthesis and Selection . Nat. Biotechnol. 2022, 40 ( 1 ), 47–53. 10.1038/s41587-021-00988-3. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wang Y.; Wang M.; Wang J.; Liu J. An Adaptive Data Redundancy Strategy in Cloud Storage . In 2019 IEEE 2nd International Conference on Electronic Information and Communication Technology (ICEICT) ; IEEE, 2019; pp 40–45. [ Google Scholar ]
  • Antkowiak P. L.; Lietard J.; Darestani M. Z.; Somoza M. M.; Stark W. J.; Heckel R.; Grass R. N. Low Cost DNA Data Storage Using Photolithographic Synthesis and Advanced Information Reconstruction and Error Correction . Nat. Commun. 2020, 11 , 5345. 10.1038/s41467-020-19148-3. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Nguyen T. T.; Cai K.; Schouhamer Immink K. A.; Kiah H. M. Capacity-Approaching Constrained Codes With Error Correction for DNA-Based Data Storage . IEEE Trans Inf Theory 2021, 67 ( 8 ), 5602–5613. 10.1109/TIT.2021.3066430. [ CrossRef ] [ Google Scholar ]
  • Blawat M.; Gaedke K.; Hütter I.; Chen X.-M.; Turczyk B.; Inverso S.; Pruitt B. W.; Church G. M. Forward Error Correction for DNA Data Storage . Procedia Comput. Sci. 2016, 80 , 1011–1022. 10.1016/j.procs.2016.05.398. [ CrossRef ] [ Google Scholar ]
  • Tang Y.; Farnoud F. Correcting Deletion Errors in DNA Data Storage with Enzymatic Synthesis . In 2021 IEEE Information Theory Workshop (ITW) ; IEEE, 2021; pp 1–6. [ Google Scholar ]
  • Lu X.; Kim S. Design of Nonbinary Error Correction Codes with a Maximum Run-Length Constraint to Correct a Single Insertion or Deletion Error for DNA Storage . IEEE Access 2021, 9 , 135354–135363. 10.1109/ACCESS.2021.3116245. [ CrossRef ] [ Google Scholar ]
  • Press W. H.; Hawkins J. A.; Jones S. K.; Schaub J. M.; Finkelstein I. J. HEDGES Error-Correcting Code for DNA Storage Corrects Indels and Allows Sequence Constraints . Proc. Natl. Acad. Sci. U. S. A. 2020, 117 ( 31 ), 18489–18496. 10.1073/pnas.2004821117. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dong Y.; Sun F.; Ping Z.; Ouyang Q.; Qian L. DNA Storage: Research Landscape and Future Prospects . Natl. Sci. Rev. 2020, 7 ( 6 ), 1092–1107. 10.1093/nsr/nwaa007. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hosseini M.; Pratas D.; Pinho A. A Survey on Data Compression Methods for Biological Sequences . Information 2016, 7 ( 4 ), 56. 10.3390/info7040056. [ CrossRef ] [ Google Scholar ]
  • Vishwakarma R. High Density Data Storage in DNA Using an Efficient Message Encoding Scheme . International Journal of Information Technology Convergence and Services 2012, 2 ( 2 ), 41–46. 10.5121/ijitcs.2012.2204. [ CrossRef ] [ Google Scholar ]
  • Choi Y.; Ryu T.; Lee A. C.; Choi H.; Lee H.; Park J.; Song S. H.; Kim S.; Kim H.; Park W.; Kwon S. High Information Capacity DNA-Based Data Storage with Augmented Encoding Characters Using Degenerate Bases . Sci. Rep 2019, 9 , 6582. 10.1038/s41598-019-43105-w. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Ren Y.; Zhang Y.; Liu Y.; Wu Q.; Su J.; Wang F.; Chen D.; Fan C.; Liu K.; Zhang H. DNA-Based Concatenated Encoding System for High-Reliability and High-Density Data Storage . Small Methods 2022, 6 ( 4 ), 2101335. 10.1002/smtd.202101335. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Tabatabaei S. K.; Pham B.; Pan C.; Liu J.; Chandak S.; Shorkey S. A.; Hernandez A. G.; Aksimentiev A.; Chen M.; Schroeder C. M.; Milenkovic O. Expanding the Molecular Alphabet of DNA-Based Data Storage Systems with Neural Network Nanopore Readout Processing . Nano Lett. 2022, 22 ( 5 ), 1905–1914. 10.1021/acs.nanolett.1c04203. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Allentoft M. E.; Collins M.; Harker D.; Haile J.; Oskam C. L.; Hale M. L.; Campos P. F.; Samaniego J. A.; Gilbert T. P. M.; Willerslev E.; Zhang G.; Scofield R. P.; Holdaway R. N.; Bunce M. The Half-Life of DNA in Bone: Measuring Decay Kinetics in 158 Dated Fossils . Proceedings of the Royal Society B: Biological Sciences 2012, 279 ( 1748 ), 4724–4733. 10.1098/rspb.2012.1745. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • van der Valk T.; Pečnerová P.; Díez-del-Molino D.; Bergström A.; Oppenheimer J.; Hartmann S.; Xenikoudakis G.; Thomas J. A.; Dehasque M.; Sağlıcan E.; Fidan F. R.; Barnes I.; Liu S.; Somel M.; Heintzman P. D.; Nikolskiy P.; Shapiro B.; Skoglund P.; Hofreiter M.; Lister A. M.; Götherström A.; Dalén L. Million-Year-Old DNA Sheds Light on the Genomic History of Mammoths . Nature 2021, 591 ( 7849 ), 265–269. 10.1038/s41586-021-03224-9. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Antkowiak P. L.; Koch J.; Nguyen B. H.; Stark W. J.; Strauss K.; Ceze L.; Grass R. N. Integrating DNA Encapsulates and Digital Microfluidics for Automated Data Storage in DNA . Small 2022, 18 , 2107381. 10.1002/smll.202107381. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Bonnet J.; Colotte M.; Coudy D.; Couallier V.; Portier J.; Morin B.; Tuffet S. Chain and Conformation Stability of Solid-State DNA: Implications for Room Temperature Storage . Nucleic Acids Res. 2010, 38 ( 5 ), 1531–1546. 10.1093/nar/gkp1060. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Coudy D.; Colotte M.; Luis A.; Tuffet S.; Bonnet J. Long Term Conservation of DNA at Ambient Temperature. Implications for DNA Data Storage . PLoS One 2021, 16 ( 11 ), e0259868. 10.1371/journal.pone.0259868. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chen W. D.; Kohll A. X.; Nguyen B. H.; Koch J.; Heckel R.; Stark W. J.; Ceze L.; Strauss K.; Grass R. N. Combining Data Longevity with High Storage Capacity—Layer-by-Layer DNA Encapsulated in Magnetic Nanoparticles . Adv. Funct. Mater. 2019, 29 , 1901672. 10.1002/adfm.201901672. [ CrossRef ] [ Google Scholar ]
  • Paunescu D.; Puddu M.; Soellner J. O. B.; Stoessel P. R.; Grass R. N. Reversible DNA Encapsulation in Silica to Produce ROS-Resistant and Heat-Resistant Synthetic DNA “Fossils” . Nat. Protoc 2013, 8 ( 12 ), 2440–2448. 10.1038/nprot.2013.154. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Koch J.; Gantenbein S.; Masania K.; Stark W. J.; Erlich Y.; Grass R. N. A DNA-of-Things Storage Architecture to Create Materials with Embedded Memory . Nat. Biotechnol. 2020, 38 ( 1 ), 39–43. 10.1038/s41587-019-0356-z. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kohll A. X.; Antkowiak P. L.; Chen W. D.; Nguyen B. H.; Stark W. J.; Ceze L.; Strauss K.; Grass R. N. Stabilizing Synthetic DNA for Long-Term Data Storage with Earth Alkaline Salts . Chem. Commun. 2020, 56 ( 25 ), 3613–3616. 10.1039/D0CC00222D. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Choi Y.; Bae H. J.; Lee A. C.; Choi H.; Lee D.; Ryu T.; Hyun J.; Kim S.; Kim H.; Song S. H.; Kim K.; Park W.; Kwon S. DNA Micro-Disks for the Management of DNA-Based Data Storage with Index and Write-Once–Read-Many (WORM) Memory Features . Adv. Mater. 2020, 32 ( 37 ), 2001249. 10.1002/adma.202001249. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Organick L.; Nguyen B. H.; McAmis R.; Chen W. D.; Kohll A. X.; Ang S. D.; Grass R. N.; Ceze L.; Strauss K. An Empirical Comparison of Preservation Methods for Synthetic DNA Data Storage . Small Methods 2021, 5 ( 5 ), 2001094. 10.1002/smtd.202001094. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Liu Y.; Zheng Z.; Gong H.; Liu M.; Guo S.; Li G.; Wang X.; Kaplan D. L. DNA Preservation in Silk . Biomater Sci. 2017, 5 ( 7 ), 1279–1292. 10.1039/C6BM00741D. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Antkowiak P. L.; Koch J.; Rzepka P.; Nguyen B. H.; Strauss K.; Stark W. J.; Grass R. N. Anhydrous Calcium Phosphate Crystals Stabilize DNA for Dry Storage . Chem. Commun. 2022, 58 ( 19 ), 3174–3177. 10.1039/D2CC00414C. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Clermont D.; Santoni S.; Saker S.; Gomard M.; Gardais E.; Bizet C. Assessment of DNA Encapsulation, a New Room-Temperature DNA Storage Method . Biopreserv Biobank 2014, 12 ( 3 ), 176–183. 10.1089/bio.2013.0082. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Fabre A. L.; Luis A.; Colotte M.; Tuffet S.; Bonnet J. High DNA Stability in White Blood Cells and Buffy Coat Lysates Stored at Ambient Temperature under Anoxic and Anhydrous Atmosphere . PLoS One 2017, 12 ( 11 ), e0188547. 10.1371/journal.pone.0188547. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Matange K.; Tuck J. M.; Keung A. J. DNA Stability: A Central Design Consideration for DNA Data Storage Systems . Nat. Commun. 2021, 12 ( 1 ), 1358. 10.1038/s41467-021-21587-5. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Organick L.; Ang S. D.; Chen Y. J.; Lopez R.; Yekhanin S.; Makarychev K.; Racz M. Z.; Kamath G.; Gopalan P.; Nguyen B.; Takahashi C. N.; Newman S.; Parker H. Y.; Rashtchian C.; Stewart K.; Gupta G.; Carlson R.; Mulligan J.; Carmean D.; Seelig G.; Ceze L.; Strauss K. Random Access in Large-Scale DNA Data Storage . Nat. Biotechnol. 2018, 36 ( 3 ), 242–248. 10.1038/nbt.4079. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Tomek K. J.; Volkel K.; Simpson A.; Hass A. G.; Indermaur E. W.; Tuck J. M.; Keung A. J. Driving the Scalability of DNA-Based Information Storage Systems . ACS Synth. Biol. 2019, 8 ( 6 ), 1241–1248. 10.1021/acssynbio.9b00100. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Tabatabaei S. K.; Wang B.; Athreya N. B. M.; Enghiad B.; Hernandez A. G.; Fields C. J.; Leburton J.-P.; Soloveichik D.; Zhao H.; Milenkovic O. DNA Punch Cards for Storing Data on Native DNA Sequences via Enzymatic Nicking . Nat. Commun. 2020, 11 ( 1 ), 1742. 10.1038/s41467-020-15588-z. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Mikutis G.; Schmid L.; Stark W. J.; Grass R. N. Length-Dependent DNA Degradation Kinetic Model: Decay Compensation in DNA Tracer Concentration Measurements . AIChE J. 2019, 65 ( 1 ), 40–48. 10.1002/aic.16433. [ CrossRef ] [ Google Scholar ]
  • Hossein Tabatabaei Yazdi S. M.; Gabrys R.; Milenkovic O. Portable and Error-Free DNA-Based Data Storage . Sci. Rep 2017, 7 , 5011. 10.1038/s41598-017-05188-1. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Banal J. L.; Shepherd T. R.; Berleant J.; Huang H.; Reyes M.; Ackerman C. M.; Blainey P. C.; Bathe M. Random Access DNA Memory Using Boolean Search in an Archival File Storage System . Nat. Mater. 2021, 20 ( 9 ), 1272–1280. 10.1038/s41563-021-01021-3. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chen Y. J.; Takahashi C. N.; Organick L.; Bee C.; Ang S. D.; Weiss P.; Peck B.; Seelig G.; Ceze L.; Strauss K. Quantifying Molecular Bias in DNA Data Storage . Nat. Commun. 2020, 11 , 3264. 10.1038/s41467-020-16958-3. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Winston C.; Organick L.; Ward D.; Ceze L.; Strauss K.; Chen Y.-J. Combinatorial PCR Method for Efficient, Selective Oligo Retrieval from Complex Oligo Pools . ACS Synth. Biol. 2022, 11 ( 5 ), 1727–1734. 10.1021/acssynbio.1c00482. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Lin K. N.; Volkel K.; Tuck J. M.; Keung A. J. Dynamic and Scalable DNA-Based Information Storage . Nat. Commun. 2020, 11 ( 1 ), 2981. 10.1038/s41467-020-16797-2. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Grass R. N.; Heckel R.; Dessimoz C.; Stark W. J. Genomic Encryption of Digital Data Stored in Synthetic DNA . Angew. Chem., Int. Ed. 2020, 59 ( 22 ), 8476–8480. 10.1002/anie.202001162. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kim J.; Bae J. H.; Baym M.; Zhang D. Y. Metastable Hybridization-Based DNA Information Storage to Allow Rapid and Permanent Erasure . Nat. Commun. 2020, 11 , 5008. 10.1038/s41467-020-18842-6. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Tabatabaei Yazdi S. M. H.; Yuan Y.; Ma J.; Zhao H.; Milenkovic O. A Rewritable, Random-Access DNA-Based Storage System . Sci. Rep 2015, 5 , 1–10. 10.1038/srep14138. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Mayer C.; McInroy G. R.; Murat P.; van Delft P.; Balasubramanian S. An Epigenetics-Inspired DNA-Based Data Storage System . Angew. Chem. 2016, 128 ( 37 ), 11310–11314. 10.1002/ange.201605531. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Zhang Y.; Ren Y.; Liu Y.; Wang F.; Zhang H.; Liu K. Preservation and Encryption in DNA Digital Data Storage . ChemPlusChem. 2022, 87 ( 9 ), e202200183. 10.1002/cplu.202200183. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Ari Ş.; Arikan M. Next-Generation Sequencing: Advantages, Disadvantages, and Future . In Plant Omics: Trends and Applications ; Springer International Publishing: Cham, 2016; pp 109–135. [ Google Scholar ]
  • Goodwin S.; McPherson J. D.; McCombie W. R. Coming of Age: Ten Years of Next-Generation Sequencing Technologies . Nat. Rev. Genet 2016, 17 ( 6 ), 333–351. 10.1038/nrg.2016.49. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Lopez R.; Chen Y. J.; Dumas Ang S.; Yekhanin S.; Makarychev K.; Racz M. Z.; Seelig G.; Strauss K.; Ceze L. DNA Assembly for Nanopore Data Storage Readout . Nat. Commun. 2019, 10 ( 1 ), 1–9. 10.1038/s41467-019-10978-4. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wang Y.; Zhang S.; Jia W.; Fan P.; Wang L.; Li X.; Chen J.; Cao Z.; Du X.; Liu Y.; Wang K.; Hu C.; Zhang J.; Hu J.; Zhang P.; Chen H.-Y.; Huang S. Identification of Nucleoside Monophosphates and Their Epigenetic Modifications Using an Engineered Nanopore . Nat. Nanotechnol 2022, 17 , 976. 10.1038/s41565-022-01169-2. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Deamer D.; Akeson M.; Branton D. Three Decades of Nanopore Sequencing . Nat. Biotechnol. 2016, 34 ( 5 ), 518–524. 10.1038/nbt.3423. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Shannon C. E. A Mathematical Theory of Communication . Bell Syst. Tech. J. 1948, 27 , 379–423. 10.1002/j.1538-7305.1948.tb01338.x. [ CrossRef ] [ Google Scholar ]
  • Roth R. Introduction . In Introduction to Coding Theory ; Cambridge University Press: Cambridge, 2006; pp 1–25. [ Google Scholar ]
  • Chandak S.; Ji H.; Tatwawadi K.; Lau B.; Mardia J.; Kubit M.; Neu J.; Griffin P.; Wootters M.; Weissman T. Improved Read/Write Cost Tradeoff in DNA-Based Data Storage Using LDPC Codes . In 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton) ; IEEE, 2019; pp 147–156. [ Google Scholar ]
  • Shomorony I.; Heckel R. Information-Theoretic Foundations of DNA Data Storage . Foundations and Trends in Communications and Information Theory 2022, 19 ( 1 ), 1–106. 10.1561/0100000117. [ CrossRef ] [ Google Scholar ]
  • Cheraghchi M.; Gabrys R.; Milenkovic O.; Ribeiro J. Coded Trace Reconstruction . IEEE Trans Inf Theory 2020, 66 ( 10 ), 6084–6103. 10.1109/TIT.2020.2996377. [ CrossRef ] [ Google Scholar ]
  • Chrisnata J.; Kiah H. M.; Yaakobi E. Optimal Reconstruction Codes for Deletion Channels . arXiv , April 13, 2020, 2004.06032, ver. 1 . 10.48550/arXiv.2004.06032. [ CrossRef ]
  • Gabrys R.; Yaakobi E. Sequence Reconstruction over the Deletion Channel . In IEEE Transactions on Information Theory , Vol. 64 ; Institute of Electrical and Electronics Engineers Inc., 2018; pp 2924–2931. [ Google Scholar ]
  • Sabary O.; Yaakobi E.; Yucovich A. The Error Probability of Maximum-Likelihood Decoding over Two Deletion/Insertion Channels . In 2020 IEEE International Symposium on Information Theory (ISIT) ; IEEE, 2020; pp 763–768. [ Google Scholar ]
  • Rutten M. G. T. A.; Vaandrager F. W.; Elemans J. A. A. W.; Nolte R. J. M. Encoding Information into Polymers . Nat. Rev. Chem. 2018, 2 ( 11 ), 365–381. 10.1038/s41570-018-0051-5. [ CrossRef ] [ Google Scholar ]
  • Fontana R. E.; Decad G. M. Moore’s Law Realities for Recording Systems and Memory Storage Components: HDD, Tape, NAND, and Optical . AIP Adv. 2018, 8 ( 5 ), 056506. 10.1063/1.5007621. [ CrossRef ] [ Google Scholar ]
  • Jensen M. A.; Davis R. W. Template-Independent Enzymatic Oligonucleotide Synthesis (TiEOS): Its History, Prospects, and Challenges . Biochemistry 2018, 57 ( 12 ), 1821–1832. 10.1021/acs.biochem.7b00937. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Bošković F.; Ohmann A.; Keyser U. F.; Chen K. DNA Structural Barcode Copying and Random Access . Small Struct 2021, 2 ( 5 ), 2000144. 10.1002/sstr.202000144. [ CrossRef ] [ Google Scholar ]
  • Zhu J.; Ermann N.; Chen K.; Keyser U. F. Image Encoding Using Multi-Level DNA Barcodes with Nanopore Readout . Small 2021, 17 ( 28 ), 2100711. 10.1002/smll.202100711. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Pinheiro A. v.; Han D.; Shih W. M.; Yan H. Challenges and Opportunities for Structural DNA Nanotechnology . Nat. Nanotechnol 2011, 6 ( 12 ), 763–772. 10.1038/nnano.2011.187. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Agarwal N. P.; Matthies M.; Joffroy B.; Schmidt T. L. Structural Transformation of Wireframe DNA Origami via DNA Polymerase Assisted Gap-Filling . ACS Nano 2018, 12 ( 3 ), 2546–2553. 10.1021/acsnano.7b08345. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Sobczak J. P. J.; Martin T. G.; Gerling T.; Dietz H. Rapid Folding of DNA into Nanoscale Shapes at Constant Temperature . Science 2012, 338 ( 6113 ), 1458–1461. 10.1126/science.1229919. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Rizzuto F. J.; Platnich C. M.; Luo X.; Shen Y.; Dore M. D.; Lachance-Brais C.; Guarné A.; Cosa G.; Sleiman H. F. A Dissipative Pathway for the Structural Evolution of DNA Fibres . Nat. Chem. 2021, 13 ( 9 ), 843–849. 10.1038/s41557-021-00751-w. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Bornholt J.; Lopez R.; Carmean D.; Ceze L.; Seelig G.; Strauss K. A DNA-Based Archival Storage System . IEEE Micro 2017, 37 , 98–104. 10.1109/MM.2017.70. [ CrossRef ] [ Google Scholar ]
  • Dey S.; Fan C.; Gothelf K. v.; Li J.; Lin C.; Liu L.; Liu N.; Nijenhuis M. A. D.; Saccà B.; Simmel F. C.; Yan H.; Zhan P. DNA Origami . Nature Reviews Methods Primers 2021, 1 ( 1 ), 13. 10.1038/s43586-020-00009-8. [ CrossRef ] [ Google Scholar ]
  • Jun H.; Zhang F.; Shepherd T.; Ratanalert S.; Qi X.; Yan H.; Bathe M. Autonomously Designed Free-Form 2D DNA Origami . Sci. Adv. 2019, 5 ( 1 ), eaav0655. 10.1126/sciadv.aav0655. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Yao G.; Zhang F.; Wang F.; Peng T.; Liu H.; Poppleton E.; Šulc P.; Jiang S.; Liu L.; Gong C.; Jing X.; Liu X.; Wang L.; Liu Y.; Fan C.; Yan H. Meta-DNA Structures . Nat. Chem. 2020, 12 ( 11 ), 1067–1075. 10.1038/s41557-020-0539-8. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Voigt N. v.; Tørring T.; Rotaru A.; Jacobsen M. F.; Ravnsbæk J. B.; Subramani R.; Mamdouh W.; Kjems J.; Mokhir A.; Besenbacher F.; Gothelf K. V. Single-Molecule Chemical Reactions on DNA Origami . Nat. Nanotechnol 2010, 5 ( 3 ), 200–203. 10.1038/nnano.2010.5. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Pal S.; Deng Z.; Ding B.; Yan H.; Liu Y. DNA-Origami-Directed Self-Assembly of Discrete Silver-Nanoparticle Architectures . Angew. Chem. 2010, 122 ( 15 ), 2760–2764. 10.1002/ange.201000330. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Halvorsen K.; Wong W. P. Binary DNA Nanostructures for Data Encryption . PLoS One 2012, 7 ( 9 ), e44212. 10.1371/journal.pone.0044212. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chandrasekaran A. R.; Levchenko O.; Patel D. S.; Macisaac M.; Halvorsen K. Addressable Configurations of DNA Nanostructures for Rewritable Memory . Nucleic Acids Res. 2017, 45 ( 19 ), 11459–11465. 10.1093/nar/gkx777. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Shin J. S.; Pierce N. A. Rewritable Memory by Controllable Nanopatterning of DNA . Nano Lett. 2004, 4 ( 5 ), 905–909. 10.1021/nl049658r. [ CrossRef ] [ Google Scholar ]
  • Zhang Y.; Li F.; Li M.; Mao X.; Jing X.; Liu X.; Li Q.; Li J.; Wang L.; Fan C.; Zuo X. Encoding Carbon Nanotubes with Tubular Nucleic Acids for Information Storage . J. Am. Chem. Soc. 2019, 141 ( 44 ), 17861–17866. 10.1021/jacs.9b09116. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Pan V.; Wang W.; Heaven I.; Bai T.; Cheng Y.; Chen C.; Ke Y.; Wei B. Monochromatic Fluorescent Barcodes Hierarchically Assembled from Modular DNA Origami Nanorods . ACS Nano 2021, 15 ( 10 ), 15892–15901. 10.1021/acsnano.1c03796. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Takinoue M.; Suyama A. Hairpin-DNA Memory Using Molecular Addressing . Small 2006, 2 ( 11 ), 1244–1247. 10.1002/smll.200600237. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Mottaghi M. D.; Dwyer C. Thousand-Fold Increase in Optical Storage Density by Polychromatic Address Multiplexing on Self-Assembled DNA Nanostructures . Adv. Mater. 2013, 25 ( 26 ), 3593–3598. 10.1002/adma.201301141. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Lin C.; Jungmann R.; Leifer A. M.; Li C.; Levner D.; Church G. M.; Shih W. M.; Yin P. Submicrometre Geometrically Encoded Fluorescent Barcodes Self-Assembled from DNA . Nat. Chem. 2012, 4 ( 10 ), 832–839. 10.1038/nchem.1451. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Choudhary A.; Maffeo C.; Aksimentiev A. Multi-Resolution Simulation of DNA Transport through Large Synthetic Nanostructures . Phys. Chem. Chem. Phys. 2022, 24 ( 5 ), 2706–2716. 10.1039/D1CP04589J. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Schnitzbauer J.; Strauss M. T.; Schlichthaerle T.; Schueder F.; Jungmann R. Super-Resolution Microscopy with DNA-PAINT . Nat. Protoc 2017, 12 ( 6 ), 1198–1228. 10.1038/nprot.2017.024. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wei B.; Dai M.; Yin P. Complex Shapes Self-Assembled from Single-Stranded DNA Tiles . Nature 2012, 485 ( 7400 ), 623–626. 10.1038/nature11075. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Tikhomirov G.; Petersen P.; Qian L. Fractal Assembly of Micrometre-Scale DNA Origami Arrays with Arbitrary Patterns . Nature 2017, 552 ( 7683 ), 67–71. 10.1038/nature24655. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Rothemund P. W. K. Folding DNA to Create Nanoscale Shapes and Patterns . Nature 2006, 440 ( 7082 ), 297–302. 10.1038/nature04586. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Platnich C. M.; Rizzuto F. J.; Cosa G.; Sleiman H. F. Single-Molecule Methods in Structural DNA Nanotechnology . Chem. Soc. Rev. 2020, 49 ( 13 ), 4220–4233. 10.1039/C9CS00776H. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Zhang Y.; Wang F.; Chao J.; Xie M.; Liu H.; Pan M.; Kopperger E.; Liu X.; Li Q.; Shi J.; Wang L.; Hu J.; Wang L.; Simmel F. C.; Fan C. DNA Origami Cryptography for Secure Communication . Nat. Commun. 2019, 10 ( 1 ), 5469. 10.1038/s41467-019-13517-3. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Numajiri K.; Kimura M.; Kuzuya A.; Komiyama M. Stepwise and Reversible Nanopatterning of Proteins on a DNA Origami Scaffold . Chem. Commun. 2010, 46 ( 28 ), 5127–5129. 10.1039/c0cc00044b. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wang P.; Rahman M. A.; Zhao Z.; Weiss K.; Zhang C.; Chen Z.; Hurwitz S. J.; Chen Z. G.; Shin D. M.; Ke Y. Visualization of the Cellular Uptake and Trafficking of DNA Origami Nanostructures in Cancer Cells . J. Am. Chem. Soc. 2018, 140 ( 7 ), 2478–2484. 10.1021/jacs.7b09024. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Bošković F.; Keyser U. F.. Nanopore Microscope Identifies RNA Isoforms with Structural Colours . Nat. Chem. 2022. 10.1038/s41557-022-01037-5. [ PubMed ] [ CrossRef ]
  • Misiunas K.; Ermann N.; Keyser U. F. QuipuNet: Convolutional Neural Network for Single-Molecule Nanopore Sensing . Nano Lett. 2018, 18 ( 6 ), 4040–4045. 10.1021/acs.nanolett.8b01709. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Cao C.; Krapp L. F.; Al Ouahabi A.; König N. F.; Cirauqui N.; Radenovic A.; Lutz J. F.; Peraro M. D. Aerolysin Nanopores Decode Digital Information Stored in Tailored Macromolecular Analytes . Sci. Adv. 2020, 6 ( 50 ), eabc2661. 10.1126/sciadv.abc2661. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Ng C. C. A.; Tam W. M.; Yin H.; Wu Q.; So P. K.; Wong M. Y. M.; Lau F. C. M.; Yao Z. P. Data Storage Using Peptide Sequences . Nat. Commun. 2021, 12 ( 1 ), 1–10. 10.1038/s41467-021-24496-9. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dahlhauser S. D.; Moor S. R.; Vera M. S.; York J. T.; Ngo P.; Boley A. J.; Coronado J. N.; Simpson Z. B.; Anslyn E. v. Efficient Molecular Encoding in Multifunctional Self-Immolative Urethanes . Cell Rep. Phys. Sci. 2021, 2 ( 4 ), 100393. 10.1016/j.xcrp.2021.100393. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Brinkerhoff H.; Kang A. S. W.; Liu J.; Aksimentiev A.; Dekker C. Multiple Rereads of Single Proteins at Single–Amino Acid Resolution Using Nanopores . Science 2021, 374 , 1509. 10.1126/science.abl4381. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Adleman L. M. Molecular Computation of Solutions to Combinatorial Problems . Science 1994, 266 ( 5187 ), 1021–1024. 10.1126/science.7973651. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Ogasawara S.; Fujimoto K. Solution of a SAT Problem on a Photochemical DNA Computer . Chem. Lett. 2005, 34 ( 3 ), 378–379. 10.1246/cl.2005.378. [ CrossRef ] [ Google Scholar ]
  • Lipton R. J. DNA Solution of Hard Computational Problems . Science 1995, 268 ( 5210 ), 542–545. 10.1126/science.7725098. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Su H.; Xu J.; Wang Q.; Wang F.; Zhou X. High-Efficiency and Integrable DNA Arithmetic and Logic System Based on Strand Displacement Synthesis . Nat. Commun. 2019, 10 ( 1 ), 1–8. 10.1038/s41467-019-13310-2. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Liu H.; Wang J.; Song S.; Fan C.; Gothelf K. v. A DNA-Based System for Selecting and Displaying the Combined Result of Two Input Variables . Nat. Commun. 2015, 6 , 1–7. 10.1038/ncomms10089. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Winfree E.; Liu F.; Wenzler L. A.; Seeman N. C. Design and Self-Assembly of Two-Dimensional DNA Crystals . Nature 1998, 394 ( 6693 ), 539–544. 10.1038/28998. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Mao C.; LaBean T. H.; Reif J. H.; Seeman N. C. Logical Computation Using Algorithmic Self-Assembly of DNA Triple-Crossover Molecules . Nature 2000, 407 ( 6803 ), 493–496. 10.1038/35035038. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Yin P.; Hariadi R. F.; Sahu S.; Choi H. M. T.; Sung H. P.; LaBean T. H.; Reif J. H. Programming DNA Tube Circumferences . Science 2008, 321 ( 5890 ), 824–826. 10.1126/science.1157312. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Woods D.; Doty D.; Myhrvold C.; Hui J.; Zhou F.; Yin P.; Winfree E. Diverse and Robust Molecular Algorithms Using Reprogrammable DNA Self-Assembly . Nature 2019, 567 ( 7748 ), 366–372. 10.1038/s41586-019-1014-9. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Rothemund P. W. K.; Papadakis N.; Winfree E. Algorithmic Self-Assembly of DNA Sierpinski Triangles . PLoS Biol. 2004, 2 ( 12 ), e424. 10.1371/journal.pbio.0020424. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Church G. M.; Gao Y.; Kosuri S. Next-Generation Digital Information Storage in DNA . Science 2012, 337 ( 6102 ), 1628–1628. 10.1126/science.1226355. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hoshika S.; Leal N. A.; Kim M.-J.; Kim M.-S.; Karalkar N. B.; Kim H.-J.; Bates A. M.; Watkins N. E.; SantaLucia H. A.; Meyer A. J.; DasGupta S.; Piccirilli J. A.; Ellington A. D.; SantaLucia J.; Georgiadis M. M.; Benner S. A. Hachimoji DNA and RNA: A Genetic System with Eight Building Blocks . Science 2019, 363 ( 6429 ), 884–887. 10.1126/science.aat0971. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Avakyan N.; Greschner A. A.; Aldaye F.; Serpell C. J.; Toader V.; Petitjean A.; Sleiman H. F. Reprogramming the Assembly of Unmodified DNA with a Small Molecule . Nat. Chem. 2016, 8 ( 4 ), 368–376. 10.1038/nchem.2451. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Li W.; Zhou J.; Maccaferri N.; Krahne R.; Wang K.; Garoli D. Enhanced Optical Spectroscopy for Multiplexed DNA and Protein-Sequencing with Plasmonic Nanopores: Challenges and Prospects . Anal. Chem. 2022, 94 ( 2 ), 503–514. 10.1021/acs.analchem.1c04459. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chen W.; Han M.; Zhou J.; Ge Q.; Wang P.; Zhang X.; Zhu S.; Song L.; Yuan Y. An Artificial Chromosome for Data Storage . Natl. Sci. Rev. 2021, 8 ( 5 ), 1–9. 10.1093/nsr/nwab028. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Farzadfard F.; Lu T. K. Genomically Encoded Analog Memory with Precise in Vivo DNA Writing in Living Cell Populations . Science 2014, 346 ( 6211 ), 1256272. 10.1126/science.1256272. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Yang L.; Nielsen A. A. K.; Fernandez-Rodriguez J.; McClune C. J.; Laub M. T.; Lu T. K.; Voigt C. A. Permanent Genetic Memory with > 1-Byte Capacity . Nat. Methods 2014, 11 ( 12 ), 1261–1266. 10.1038/nmeth.3147. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Burrill D. R.; Silver P. A. Making Cellular Memories . Cell 2010, 140 ( 1 ), 13–18. 10.1016/j.cell.2009.12.034. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Han D.; Qi X.; Myhrvold C.; Wang B.; Dai M.; Jiang S.; Bates M.; Liu Y.; An B.; Zhang F.; Yan H.; Yin P. Single-Stranded DNA and RNA Origami . Science 2017, 358 ( 6369 ), aao2648. 10.1126/science.aao2648. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Open access
  • Published: 02 April 2024

DNA methylation remodeling and the functional implication during male gametogenesis in rice

  • Xue Li 1   na1 ,
  • Bo Zhu 1   na1 ,
  • Feng Zhao 1 ,
  • Qian Liu 1 ,
  • Jiahao Wang 1 ,
  • Miaomiao Ye 1 ,
  • Siyuan Chen 1 ,
  • Junwei Nie 3 ,
  • Lizhong Xiong 1 ,
  • Yu Zhao 1 ,
  • Changyin Wu 1 &
  • Dao-Xiu Zhou   ORCID: orcid.org/0000-0002-1540-0598 1 , 4  

Genome Biology volume 25, Article number: 84 (2024)

Epigenetic marks are reprogrammed during sexual reproduction. In flowering plants, DNA methylation is only partially remodeled in the gametes and the zygote. However, the timing and functional significance of the remodeling during plant gametogenesis remain obscure.

Here we show that DNA methylation remodeling starts after male meiosis in rice, with non-CG methylation, particularly at CHG sites, being first enhanced in the microspore and subsequently decreased in sperm. Functional analysis of rice CHG methyltransferase genes CMT3a and CMT3b indicates that CMT3a functions as the major CHG methyltransferase in rice meiocyte, while CMT3b is responsible for the increase of CHG methylation in microspore. The function of the two histone demethylases JMJ706 and JMJ707 that remove H3K9me2 may contribute to the decreased CHG methylation in sperm. During male gametogenesis CMT3a mainly silences TE and TE-related genes while CMT3b is required for repression of genes encoding factors involved in transcriptional and translational activities. In addition, CMT3b functions to repress zygotic gene expression in egg and participates in establishing the zygotic epigenome upon fertilization.

Collectively, the results indicate that DNA methylation is dynamically remodeled during male gametogenesis, distinguish the function of CMT3a and CMT3b in sex cells, and underpin the functional significance of DNA methylation remodeling during rice reproduction.

DNA cytosine methylation is a hallmark for repression of transposable elements (TE) and related sequences in complex genomes such as those of flowering plants and vertebrates. In flowering plants, DNA cytosine methylation occurs in the context of CG, CHG and CHH (where H is A, C, or T) sequences. CG methylation is maintained during cell division by DNA methyltransferase1 (MET1), which recognizes and methylates hemi-methylated CG sites in the newly replicated DNA. Non-CG (i.e. CHG and CHH) methylation in heterochromatin is maintained by the plant-specific Chromomethylase3 (CMT3, at CHG sites) and CMT2 (at CHH and CHG sites), which bind to the histone methylation mark H3K9me2, while CHH methylation in euchromatin regions is maintained by Domains Rearranged Methyltransferase2 (DRM2) guided by related siRNA [ 1 ]. DRM2 also mediates de novo DNA methylation regardless of sequence contexts [ 1 ]. All three methylation sites are found in TE and TE-like sequences and in about 10–20% of genes depending on plant species. DNA methylation in positive regulatory sequences impairs gene transcription and causes gene silencing, but enhances gene activity when occurring in repressive DNA elements [ 2 ]. CG methylation is also common within the transcribed regions of genes, where it is associated with gene activity [ 3 ]. In rice, DNA methylation displays some differences in methylation levels and genomic distribution compared with Arabidopsis. For instance, rice mCHH is enriched at euchromatin regions and many genes are found to be methylated at CHH sites, especially around TSS [ 4 , 5 ]. There are also quite a number of genes methylated at CHG sites which are related to allele-specific expression in rice hybrids [ 6 ].
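The three sequence contexts defined above (CG, CHG and CHH, with H being A, C, or T) can be illustrated with a minimal sketch. The function name `cytosine_context` is hypothetical and is not part of this study's pipeline; it considers only the forward strand, so cytosines on the reverse strand would need the reverse-complemented sequence.

```python
from typing import Optional


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the cytosine at index i of a forward-strand sequence.

    Returns 'CG', 'CHG' or 'CHH', or None if the base is not a cytosine
    or the context cannot be determined (sequence end, ambiguous base).
    Illustrative helper only; not taken from the study's Methods.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    downstream = seq[i + 1 : i + 3]  # up to two bases 3' of the cytosine
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    # CHG and CHH both need two unambiguous downstream bases
    if len(downstream) < 2 or any(b not in "ACGT" for b in downstream):
        return None
    # At this point the first downstream base is A, C, or T (that is, H)
    if downstream[1] == "G":
        return "CHG"
    return "CHH"


if __name__ == "__main__":
    example = "CCGTCAA"
    for idx, base in enumerate(example):
        print(idx, base, cytosine_context(example, idx))
```

Classifying each cytosine in this way is what allows methylation levels to be reported separately as mCG, mCHG and mCHH.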

Epigenetic marks are reprogrammed in the gametes and after fertilization. In mammals, there are two distinct phases of epigenetic reprogramming to prevent inheritance of ancestral epigenetic signatures. The first phase consists of a genome-wide erasure of DNA methylation in the primordial germ cells (PGCs), the gamete precursors, followed by the reestablishment of epigenetic signatures to enable gamete maturation and function [ 7 , 8 ]. Global DNA methylation is once again erased post-fertilization in early embryos, followed by another round of global de novo methylation during embryo development [ 9 , 10 , 11 ]. In contrast with mammals, DNA methylation is not globally erased in gametes in flowering plants [ 12 , 13 , 14 , 15 , 16 , 17 ]. Unlike mammalian germ lines that are defined already at an early stage of embryogenesis, thus before meiosis, the plant male and female sexual lineages initiate as diploid meiocytes from somatic cells, which give rise to haploid microspores after meiosis. The male microspores subsequently undergo mitosis, producing the vegetative and generative cells. The generative cell is further divided to produce two sperm cells in the mature pollen grain, the male gametophyte. Previous studies showed that plant sperm DNA methylation is remodeled and shows variation relative to somatic tissues [ 12 , 14 , 16 , 17 , 18 , 19 ]. However, it remains unclear whether the remodeling process is initiated in male meiocytes before meiosis or at the subsequent steps after meiosis and whether DNA methylation remodeling is required for male gametogenesis.

In this work, we analyzed DNA methylation in rice male meiocyte, microspore and sperm and studied the function of a set of chromatin regulators during the process. The results indicate that DNA methylation remodeling starts after meiosis of the male meiocytes and that non-CG methylation, particularly at CHG sequences, is dynamically remodeled during rice male gametogenesis. The work reveals distinct functions of DNA methyltransferases and histone demethylases in the remodeling of CHG methylation and suggests that the CHG methylation remodeling has functional significance for male gametogenesis and fertilization.

Non-CG methylation is dynamically remodeled during rice male gametogenesis

To study DNA methylation dynamics during male gametogenesis in rice, we manually isolated male meiocytes, unicellular microspores and sperm cells of the Zhonghua11 (ZH11) variety as previously described [ 17 , 20 , 21 , 22 ]. About 400 meiocytes, 300 unicellular microspores and 100 sperm cells were collected for bisulfite sequencing (BS-seq) analysis using a protocol developed for small numbers of cells (Additional file 1 : Fig. S1a, b) [ 23 ]. Data with two biological replicates were obtained (Additional file 1 : Fig. S1c, Additional file 2 : Table S1). Violin plots of the BS-seq data revealed that the overall CG methylation (mCG) levels (especially in TE and TE-related genes, TEG) gradually increased during male gametogenesis (Fig.  1 a), whereas CHG methylation (mCHG) levels were first increased in unicellular microspore (UM) but subsequently decreased in sperm (S) to the lowest levels (Fig.  1 a). Density plots confirmed the mCHG variations between microspore (UM) and meiocyte (Me) and between sperm (S) and microspores (UM) (Fig.  1 b). A similar trend of CHH methylation (mCHH) variation was also observed (Fig.  1 b). Scanning of differentially methylated regions (DMRs, defined within 100-bp windows, see Methods) between microspore and meiocyte (UM-Me) detected more hyper- than hypo-DMRs in the CG (876 hyper versus 646 hypo), CHG (10,720 hyper versus 3,764 hypo), and CHH (31,501 hyper versus 20,983 hypo) contexts, indicating a clear gain of non-CG methylation in microspore (Fig.  1 c, d). In sperm relative to microspore, there were more hypo- than hyper-DMRs at non-CG sites, especially in the CHG context (30,455 hypo- compared to 1,727 hyper-DMRs), confirming a clear decrease of mCHG in sperm. The mCHG levels in sperm were the lowest when compared with those in egg and zygote or somatic tissues of the same rice variety (Additional file 1 : Fig. S2a, see below), which was also observed in other rice varieties (Additional file 1 : Fig. S2b) [ 4 , 16 , 17 , 24 ].
Analysis of sex cell methylomes obtained from the Nipponbare (NIP) variety indicated that sperm mCHG and mCHH levels were lower than pollen vegetative cell levels, but comparable to central cells of the female gametophyte (Additional file 1 : Fig. S2b) [ 16 , 24 ]. About 2/3 (5,221/7,775) of the increased mCHG (hyper DMRs) in microspore (relative to meiocyte) were erased in sperm, while the other 1/3 (2,554/7,775) of the microspore hyper DMRs remained hyper-methylated in sperm (Additional file 1 : Fig. S3a, clusters A and B). These two clusters of the microspore hyper DMRs were located mainly in genic and intergenic regions (Additional file 1 : Fig. S3b), and showed enrichment of euchromatin histone marks (H3K36me3, H3K9ac) (Additional file 1 : Fig. S3d). By contrast, many sperm hypo DMRs (12,588) showed no change between microspore and meiocyte (Additional file 1 : Fig. S3a), corresponded mainly to TE and TEG (Additional file 1 : Fig. S3b), and were enriched for the heterochromatin mark H3K9me2 (Additional file 1 : Fig. S3d). The data indicate that during male gametogenesis mCHG and mCHH at both euchromatin and heterochromatin loci are dynamically remodeled to the lowest levels in sperm.
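The idea of scanning fixed 100-bp windows for hyper- and hypo-DMRs can be sketched as follows. Only the window size comes from the text; the function names, the input tuple layout, the `min_diff` cutoff of 0.5 and the `min_reads` filter are assumptions made for illustration, since the actual criteria are given in the Methods, which are not reproduced here. Real DMR callers typically also apply a statistical test, which this sketch omits.

```python
from collections import defaultdict

WINDOW = 100  # bp, the window size stated in the text


def window_levels(calls, window=WINDOW, min_reads=1):
    """Fractional methylation per window.

    calls: iterable of (chrom, pos, methylated_reads, total_reads) for
    cytosines of one context (hypothetical input layout).
    """
    meth = defaultdict(int)
    total = defaultdict(int)
    for chrom, pos, m, t in calls:
        key = (chrom, pos // window)
        meth[key] += m
        total[key] += t
    # Require at least one read so the division is always defined
    floor = max(min_reads, 1)
    return {k: meth[k] / total[k] for k in total if total[k] >= floor}


def call_dmrs(sample_a, sample_b, min_diff=0.5):
    """Label windows as 'hyper' or 'hypo' in sample_a relative to sample_b.

    min_diff is an assumed cutoff on the fractional methylation difference.
    """
    levels_a = window_levels(sample_a)
    levels_b = window_levels(sample_b)
    dmrs = {}
    for key in levels_a.keys() & levels_b.keys():
        diff = levels_a[key] - levels_b[key]
        if diff >= min_diff:
            dmrs[key] = "hyper"
        elif diff <= -min_diff:
            dmrs[key] = "hypo"
    return dmrs


if __name__ == "__main__":
    # Toy data: microspore (UM) versus meiocyte (Me), CHG context
    um = [("chr1", 10, 8, 10), ("chr1", 150, 1, 10)]
    me = [("chr1", 20, 2, 10), ("chr1", 160, 9, 10)]
    print(call_dmrs(um, me))
```

Counting the `hyper` and `hypo` labels for each comparison would give tallies of the kind reported above for UM-Me and S-UM.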

figure 1

DNA cytosine methylation in rice meiocytes, unicellular microspores, and sperm. a Violin plots showing overall cytosine methylation levels (mCG, mCHG, and mCHH) in transposable elements (TE), TE-related genes (TEG) and protein-coding genes (Gene) of rice Zhonghua 11 (ZH11) meiocyte (Me), unicellular microspore (UM) and sperm (S). Values of the methylomes are averages from the two replicates. The average methylation levels (white dots) and median values (black bars) are indicated. b Density plot showing the frequency distribution of fractional methylation difference between the indicated samples. c Numbers of differentially methylated regions (DMRs) between the indicated comparisons, distributed in TE (> 500 bp), TEG, gene, and intergenic regions. DMRs located in TE (red), gene (light green), intergenic region (pink), and TEG (yellow) are shown. d Genome browser screenshots of mCG, mCHG, and mCHH in meiocytes (Me), unicellular microspore (UM), and sperm (S). Differentially methylated regions are grey colored

From cluster A of the microspore hyper DMRs, 315 genes were identified. These genes are enriched for translation, ribonucleoprotein complex, and cellular protein metabolic functions, implying that mCHG in protein translation and RNA-binding pathway genes was particularly dynamic during meiocyte to sperm development (Additional file 1 : Fig. S3e, Additional file 4 : Table S3). Several genes showed lower expression in microspore than meiocyte and/or sperm (Additional file 1 : Fig. S4a), suggesting that hypermethylation might play a role in their repression in microspore.

CMT3a and CMT3b function during male gametogenesis

CMT3a is a major CHG methyltransferase gene expressed at high levels throughout sporophytic development in rice [ 25 , 26 ]. However, its expression became low or undetectable in sperm in several rice varieties (Fig.  2 a). By contrast, CMT3b expression was low or undetectable in vegetative tissues/organs but was detected in reproductive cells including meiocyte, microspore, and sperm (Fig.  2 a). To study the function of CMT3 genes during male gametogenesis, we produced cmt3a and cmt3b knockout (KO) plants in the ZH11 background using the CRISPR technique and two independent lines for each gene were obtained (Additional file 1 : Fig. S5a). The cmt3a mutants produced mainly defective pollen grains and were infertile (Fig.  2 b). Cytological sections revealed that cmt3a pollen development was likely arrested at the bicellular microspore stage (Additional file 1 : Fig. S5b).

figure 2

Effects of cmt3a and cmt3b mutations on DNA methylation in meiocyte, microspore and sperm. a Transcript levels in FPKM of rice CMT3a and CMT3b in seedling (Se), roots (Ro), meiocyte (Me), unicellular microspore (UM), sperm (S), egg (E), zygote (Z), endosperm nuclei (En, 1.5 days after fertilization) and globular embryo (GE, 3 days after fertilization) from RNA-seq data. The sperm (Kit-S) in Kitaake background was reported by Anderson et al. (2013). b The pollen grains of wild type and cmt3a and cmt3b mutants were I2-KI stained. Bars = 50 μm. c Violin plots comparing overall cytosine methylation levels of wild type and cmt3a and cmt3b mutant meiocyte (Me), unicellular microspore (UM) and sperm (S). The average methylation levels (white dots) and median values (black bars) in transposable elements (TE) are shown. Values of the methylomes are averages from the two replicates. d Number of differentially methylated regions (DMRs) in cmt3a and cmt3b relative to wild type. Relative portions in TE (> 500 bp), TEG, gene, and intergenic regions are indicated by different colors. e Venn diagrams showing the overlap of hypo-CHG DMRs in cmt3a and cmt3b meiocyte (left) and sperm (right) relative to wild type cells. f Box plots of DNA methylation levels of hypo-CHG DMRs in meiocyte (Me) versus microspore (UM) (upper) and sperm (S) relative to microspore (UM) (lower) in wild type, cmt3a (3a) and cmt3b (3b) cells. The significance was calculated with multiple comparison tests. Different letters on top of the bars indicate a significant difference ( p  < 0.05). g Genome browser screen captures showing high CHG methylation sites in microspore relative to meiocyte and sperm decreased in cmt3b mutants (highlighted by grey)

By contrast, the cmt3b mutants showed no clear plant morphological phenotype. To check the effects of cmt3 mutations on DNA methylation during male gametogenesis, we performed BS-seq of meiocyte, unicellular microspore, and sperm isolated from two independent CRISPR/Cas9-free lines of cmt3a and/or cmt3b at T3-4 generation (as well as a tissue culture-regenerated wild type line) (Additional file 2 : Table S1). Violin plots of the data revealed that mCHG was almost absent from cmt3a meiocyte (Fig.  2 c, Additional file 1 : Fig. S6a), consistent with the CMT3a loss-of-function effects in somatic tissues [ 25 ]. To a lesser extent, mCG was also reduced in cmt3a meiocyte. However, cmt3a sperm mCHG (as well as mCHH) levels became higher than those of the mutant meiocyte (Fig.  2 c), suggesting that additional activities partially restored mCHG and/or ectopically mediated mCHH in the mutant sperm. Because DRM2 and CMT2 are highly expressed in rice sperm (Additional file 1 : Fig. S7a), the residual mCHG level in cmt3a sperm could be maintained by RdDM or CMT2. This hypothesis is supported by the observation that the drm2/cmt2 mutations also reduced the mCHG levels of those loci in leaves (Additional file 1 : Fig. S7b).

By contrast, the cmt3b mutation led to a clear loss of overall mCHG in microspore and sperm but the mutation effect was less clear in meiocyte (Fig.  2 c, Additional file 1 : Fig. S6a). The cmt3b mutation also resulted in some increases of mCHH in sperm. Density plots confirmed the observations (Additional file 1 : Fig. S6b). The increases of mCHH in cmt3a/b sperm might be an indirect effect that compensates for mCHG loss in the mutants. Nearly all of the cmt3b hypo-CHG DMRs in meiocyte and sperm overlapped with those of cmt3a (Fig.  2 d, e), indicating that CMT3b functioned to maintain mCHG on a fraction of the CMT3a targets. The cmt3b mutation resulted in a large number (33,412) of hypo-CHG DMRs in microspore (Fig.  2 d), and diminished the mCHG difference between microspore and sperm observed in wild type (Additional file 1 : Fig. S6c). In fact, the methylation levels of hyper-CHG DMRs in wild type microspore versus meiocyte were decreased to the meiocyte levels in cmt3b microspore, and the methylation levels of the hypo-CHG DMRs in wild type sperm versus microspore were decreased to the sperm levels in cmt3b microspore (Fig.  2 f, g). For the three clusters of DMRs shown in Additional file 1 : Fig. S3a, the cmt3b mutation largely reduced the mCHG levels in microspore (Additional file 1 : Fig. S3c). In addition, the 315 genes (from cluster A) showed higher mCHG in exons than introns in microspore (Additional file 1 : Fig. S4b), while the cmt3b mutation reduced the mCHG levels from both exons and introns, suggesting that CMT3b may preferentially target gene exons in microspore (Additional file 1 : Fig. S4a, b). The analysis indicates that CMT3b is required for the increase of mCHG in microspore.

Histone demethylases JMJ706 and JMJ707 reduce CHG methylation

As CMT3a is expressed at low or undetectable levels in sperm, mCHG diluted by mitosis may not be maintained in sperm. Alternatively, active DNA demethylation may be involved, as mutations of DNA demethylases locally remodeled DNA methylation in sperm [ 17 ]. Since mCHG is linked to H3K9me2 through a positive feedback loop [ 27 , 28 ], we investigated whether H3K9me2 demethylases were also involved in the decrease of mCHG in sperm. JMJ706 was shown to function as a H3K9 demethylase in rice [ 29 ]. JMJ707 is closely related to JMJ706 [ 29 ] and showed high expression in sperm (Additional file 1 : Fig. S8a). To test whether the genes were involved in mCHG during male gametogenesis, we produced jmj706 and jmj707 double knockout (KO) plants and obtained two independent lines (Additional file 1 : Fig. S8b). The KO lines ( j67 ) showed a reduced pollen viability and seed setting rate (Additional file 1 : Fig. S8c, d). We analyzed the DNA methylome of male meiocyte and sperm of the mutant lines (Cas9-free at T3-4 generation) and found that the mutations had no drastic effect on the overall methylation in the cells (Fig.  3 a, Additional file 1 : Fig. S9a). However, the methylation levels of the hypo-DMRs in wild type meiocyte versus microspore were increased in j67 meiocyte (Fig.  3 b, c, d). Similarly, the methylation levels of the hypo-DMRs in wild type sperm versus microspore were augmented in j67 sperm (Fig.  3 b, c, d). The jmj706/7 mutations clearly elevated mCHG levels of cluster B DMRs in meiocyte and cluster C in sperm (Additional file 1 : Fig. S3c). There was no overlap between cmt3b and jmj706/7 -affected DMRs (Additional file 1 : Fig. S9b). The analysis indicated that JMJ706/707 play a role in reducing mCHG at a set of genomic loci (mainly TE or TEG) in male meiocyte and sperm.

figure 3

Effects of jmj706/707 mutations on DNA methylation in male sex cells. a Comparison of overall TE methylation levels in wild type and jmj706/707 mutant meiocyte (Me) and sperm (S). Values of the methylomes are averages from the two replicates. The average methylation levels (white dots) and median values (black bars) are shown. b Density plots of CHG methylation differences between jmj706/707 mutant and wild-type meiocyte (upper) and sperm (lower) (black lines). The red traces are density plots confined to the hypo DMRs between meiocyte and microspore (Me-UM) (upper) or between sperm and microspore (S-UM) (lower). c Box plots showing DNA methylation levels of 50-bp hypo-CHG methylation regions in wild type meiocyte (Me) (upper) and sperm (S) (lower) relative to microspore (UM) and in jmj706/707 meiocyte (j67-Me) and sperm (j67-S). The significance was calculated with multiple comparison tests. Different letters on top of the bars indicate a significant difference (p < 0.05). d Genome browser screen shots showing low CHG methylation sites in meiocyte (upper) or sperm (lower) relative to microspore but elevated in j67 mutants. Grey illustrates differentially methylated regions

Effect of the cmt3a/b and jmj706/7 mutations on sexual lineage-specific methylation

It has been shown that Arabidopsis male sex cells display sexual-lineage-specific methylation (SLM) or sexual-lineage-hypermethylation (SLH) [ 18 ]. Using the published methods [ 18 ], we identified 555 SLH loci in rice meiocyte, microspore and sperm relative to somatic cells (seedling) (Additional file 1 : Fig. S10a). The cmt3b mutations reduced the SLH levels in all three male sex cell types, especially in microspore (Additional file 1 : Fig. S10b, c). By contrast, the jmj706/7 mutations had no clear effect on SLH in the sex cells (Additional file 1 : Fig. S10b). Further analysis divided the 555 SLH loci into 340 canonical SLH and 215 SLM loci (Additional file 1 : Fig. S11a) [ 18 ]. The canonical SLH loci corresponded mainly to TEs, while the SLM loci were located mainly in genes (body and promoter regions) (Additional file 1 : Fig. S11b), suggesting that SLM mainly targets genic regions during male gametophyte and sperm development. In total, 132 genes were targeted by SLM, and these were enriched for translational function (Additional file 1 : Fig. S11c, Additional file 5 : Table S4). The 132 genes appeared to be repressed in sperm compared to meiocyte or microspore (Additional file 1 : Fig. S11d, e). The cmt3b mutation reduced SLM and increased expression of some of the genes in sperm (Additional file 1 : Fig. S11d, e), suggesting that CMT3b may be involved in SLM and repression of some of the genes in sperm.

Function of CMT3a and CMT3b in egg and zygote DNA methylation

Unlike in sperm, CMT3a is highly expressed in egg and zygote. CMT3b transcripts are also detected in egg and zygote (Fig. 2a). To study CMT3 function in egg and zygote, we compared wild type, cmt3a and/or cmt3b egg and zygote methylomes by BS-seq analysis (Additional file 2 : Table S1). In wild type, egg and zygote mCHG levels were higher than in sperm (Fig. 4a, Additional file 1 : Fig. S12a). Because cmt3a was infertile, we only analyzed the cmt3a egg methylome. As in male meiocyte and somatic tissues [ 25 ], the cmt3a mutation eliminated almost all mCHG in egg (Fig. 2c and Fig. 4a, Additional file 1 : Fig. S12a). The cmt3b mutation had a limited effect on overall mCHG in egg, but caused a clear decrease of mCHG in zygote (Fig. 4a, Additional file 1 : Fig. S12a). The cmt3b mutation produced more hypo-CHG DMRs in zygote (23,460) than in egg (13,249) or sperm (17,576) (Fig. 4b). About 24% (5,623/23,460) of the hypo-DMRs in cmt3b zygote overlapped with the hyper-DMRs in wild type zygote versus sperm (Fig. 4c). In fact, the cmt3b mutation reduced the methylation differences of the DMRs between wild type zygote and sperm (Z-S) or egg (Z-E) (Fig. 4d, e). Together, the data indicate that CMT3b participates in reestablishing mCHG at a subset of the Z-S and Z-E DMRs in the zygote.

figure 4

Effect of cmt3a and cmt3b mutations on DNA methylation in zygote and/or egg cells. a Comparison of overall TE methylation levels in sperm (S), egg (E) and zygote (Z) of wild type and cmt3a, cmt3b mutants. Values of the methylomes are averages from the two replicates. The average methylation levels (white dots) and median values (black bars) are shown. b Number of differentially methylated regions (DMRs) in the indicated mutant cells relative to wild type. Different colors indicate the distribution of DMRs in TE (red), gene (light green), intergenic region (pink), and TEG (yellow). c Venn diagrams showing the overlap of hyper-CHG DMRs in zygote relative to sperm (Z-S) (upper) or egg (Z-E) (lower) with hypo-CHG DMRs in cmt3b zygote relative to wild type zygote (3bZ-Z). d Box plots showing DNA methylation levels of Z-S (upper) or Z-E (lower) hyper-CHG DMRs in the indicated cell types. The significance was calculated with multiple comparison tests. Different letters on top of the bars indicate a significant difference ( p < 0.05). e Screenshots of CHG methylation levels of 5 representative genes in the indicated cells

Function of rice CMT3b in gene expression in reproductive cells

To study the cmt3 mutation effects on gene expression in sex cells, we first performed RNA-seq of wild type and cmt3b male meiocyte and sperm. Three replicates (two replicates for WT sperm) were performed (Additional file 1 : Fig. S13a, Additional file 3 : Table S2). PCA analysis indicated a high reproducibility of the replicates (Additional file 1 : Fig. S13b). The WT meiocyte transcriptome showed high correlation with previously published rice meiocyte transcriptomes ( r  > 0.85) (Additional file 1 : Fig. S13c). In cmt3b meiocyte, 2,259 and 1,639 genes were respectively up- and downregulated (Fig.  5 a). Similar numbers of differentially expressed genes (DEGs) were detected in the mutant sperm (Fig.  5 a). Upregulated genes in cmt3b meiocyte were enriched for gene transcription function, while upregulated genes in cmt3b sperm were mainly enriched for translational function (Additional file 1 : Fig. S14a, b). A small number of upregulated DEGs were found to associate with hypo-DMRs in the mutant cells (Fig.  5 b, c; Additional file 6 : Table S5). The analysis suggests that CMT3b plays a role in shutdown of transcriptional activity in meiocyte for preparation of meiosis and may contribute to the low translational activity in sperm [ 30 ].

figure 5

The cmt3b mutation affected non-TE gene expression in meiocyte and sperm. a Number of differentially expressed non-TE genes (pink) and TE-related genes (TEG) (red) in cmt3b mutant meiocyte and sperm relative to wild type (padj < 0.01, FC > 2). The highly expressed genes (TPM > 10) in sperm were selected for comparison. b Number of upregulated genes overlapping with hypo-CHG methylated genes (DMG) detected in cmt3b meiocyte and sperm relative to the wild type cells. c Genome browser screenshots of the methylation and expression levels of representative genes of the 148 (left) and 122 (right) genes shown in ( b )

In parallel, we analyzed the egg and zygote transcriptomes of wild type and cmt3a/b plants. Because cmt3a was infertile, we only analyzed the cmt3a egg transcriptome. PCA analysis indicated high levels of reproducibility of the replicates. The cmt3b egg and zygote transcriptomes were close to, but distinct from, the wild type (Additional file 1 : Fig. S13b). By contrast, the cmt3a egg transcriptome was far from the wild type (Additional file 1 : Fig. S13b), consistent with the drastic effect of the cmt3a mutation on mCHG in egg. In cmt3a egg, a total of 4,868 genes were upregulated (> two-fold, Q < 0.01), of which 1,648 were hypomethylated at CHG sites (Additional file 1 : Fig. S12b, c). Interestingly, among the upregulated genes, 1,461 were TEGs, of which 982 (67.2%) were hypomethylated at CHG sites in cmt3a egg (Additional file 1 : Fig. S12b, c). The analysis indicates that CMT3a-mediated mCHG is required mainly for TEG repression in egg, consistent with previous results showing that the cmt3a mutation resulted in a burst of TE expression [ 26 ]. The cmt3b mutation resulted in a total of 3,022 upregulated genes (> two-fold, Q < 0.01) in egg, of which few were TEGs and only 120 (including 26 TEGs) were identified as hypo-CHG methylated genes in the mutant egg (Additional file 1 : Fig. S12b, c). This indicates that, unlike cmt3a , the cmt3b mutation de-repressed mainly non-TE-related genes, and that this de-repression was likely independent of a clear loss of mCHG. However, about 40% of the upregulated genes in cmt3b egg overlapped with those detected in cmt3a egg (Additional file 1 : Fig. S12d), suggesting that both CMT3 genes are required for gene (mainly non-TEG) repression in egg. The cmt3b mutation resulted in upregulation of 2,179 and downregulation of 1,870 genes in zygote (Additional file 1 : Fig. S12b). As observed in cmt3b egg, relatively few upregulated genes were TEGs or showed hypo mCHG in cmt3b zygote (Additional file 1 : Fig. S12c).

CMT3b represses zygotic genes in egg cells

In wild type zygote we identified 1,804 downregulated and 2,628 upregulated (> two-fold, Q < 0.01) genes relative to egg (Fig. 6a). Among the upregulated genes, 416 overlapped with previously identified genes expressed in rice zygotes (Fig. 6b) [ 31 ]. The 416 genes were enriched for chromatin replication and cell division functions (Fig. 6c), consistent with zygotic genome activation to promote zygote cell division in plants [ 32 ]. Of the 416 zygotic genes, 83 were upregulated in cmt3b egg. By contrast, although the cmt3a mutation caused a larger number of upregulated genes in egg, only 39 were among the 416 zygotic genes (Fig. 6d). The 83 zygotic genes upregulated in cmt3b egg included those encoding histones, chromatin proteins (HMGs, SMC2, TOP2), DNA methyltransferases (MET1 and CMT3a), transcription factors (E2F, HAP3, NAC), and cell division-related proteins (Fig. 6e, f, g, Additional file 7 : Table S6). The analysis suggested that CMT3b functions to repress the zygotic gene expression program in egg.

figure 6

The cmt3b mutation de-represses zygote-expressed genes in egg. a Number of differentially expressed genes in wild type zygote relative to egg (FC > 2, p < 0.01). b Overlap of genes highly expressed in zygote relative to egg in the NIP, DJ and ZH11 varieties (Anderson et al., 2017; Zhou et al., 2021). c GO enrichment of zygote-expressed genes detected in the three cultivars. d Transcript heatmaps of the 416 zygote-expressed genes in cmt3b egg (3b-E) and zygote (3b-Z) compared with wild type egg (E) and zygote (Z). e Transcript heatmaps of several representative zygote genes upregulated in cmt3b egg. f GO enrichment of the 83 upregulated genes in cmt3b egg shown in ( d ). g Integrative Genomics Viewer screenshots of six examples of the genes described in ( d )

Non-CG methylation dynamics during rice male gametogenesis

Previous results showed that DNA methylation in plants is partially remodeled or reconfigured in male and female gametes [ 12 , 13 , 14 , 15 , 16 , 17 , 19 ]. In this work, we provided evidence that non-CG (mainly CHG) methylation is dynamically remodeled during rice male gametogenesis, with the highest levels observed in microspore and the lowest levels in sperm. It is known that there is a cell cycle arrest during male germline development [ 33 ], which could account for the reduced mCHG and mCHH in sperm. Unlike Arabidopsis meiocyte, which has higher mCG but lower mCHH than somatic tissues [ 18 ], rice male meiocyte showed DNA methylation levels similar to those of somatic tissues (Additional file 1 : Fig. S15), suggesting that the remodeling process likely starts after meiosis in rice. Although the underlying function of the dynamic remodeling of mCHG (i.e. first increased in microspore then reduced in sperm) remains to be further explored, the association of hundreds of genes (enriched for translational function) with hypo mCHG in meiocyte and sperm relative to microspore suggests that the remodeling is linked to gene expression dynamics during male gametogenesis. This is supported by the identification of SLM, which targets mainly genes that are repressed in sperm. Alternatively, the remodeling may be associated with chromatin changes during male gametogenesis.

The increase of mCHG observed in microspore was unexpected, as methylation levels would be expected to decrease after the meiotic cell divisions. The enhanced mCHG may have functional significance in microspore development. It remains to be studied whether the increased mCHG is related to chromatin reorganization of the haploid genome, which was shown to be activated before the bicellular microspore stage in cereals [ 34 ]. As hyper-methylated genes in microspore are enriched for ribonucleoprotein complex (such as mRNA-binding) and translation proteins, the mCHG increase may also be related to repression of genes involved in transcriptional and translational activities that are shut down during meiosis or male gametogenesis [ 26 ].

The decrease of mCHG in sperm may be required for sperm development and/or reshaping the sperm chromatin that lacks the compact silent center (CSC) found in egg and zygote chromatin 3D structures [ 21 ], and contains specific histone variants and modification patterns [ 35 , 36 , 37 , 38 , 39 ]. Alternatively, the decreased mCHG might facilitate the decondensation of the sperm chromatin upon fusion with the egg cell nucleus after fertilization, allowing initiation of transcription from the paternal genome [ 40 ].

The remodeling of non-CG methylation in rice sperm seemed to differ from that in Arabidopsis sperm, where mCHH is lost or largely reduced [ 12 ]. This may be due to a different mCHH landscape in the rice genome, in which mCHH is mainly scattered in genic regions [ 4 , 5 ]. In fact, in rice sperm mCHH was found to be reduced in genic regions, but appeared to be enhanced in TE-rich pericentromeric regions (Additional file 1 : Fig. S16a, b). The increased mCHH at pericentromeric regions in sperm may be indirectly promoted by DNA demethylation at TEs and repetitive sequences in the pollen vegetative cell in rice [ 16 ]. Alternatively, siRNAs produced by microspore may be inherited by sperm, where they direct silencing of heterochromatic LTR retrotransposons through the RdDM pathway. In addition, the reduced genic mCHG and mCHH may also involve DNA demethylases that were shown to function in rice sperm [ 17 ].

Distinct functions of CMT3a/b in reproductive cells

CMT3a is the major CHG methyltransferase in rice. CMT3a expression was detected in meiocyte, but barely in sperm (Fig. 2a), which may contribute to the low mCHG levels in sperm cells. However, CMT3a-mediated mCHG appeared essential for post-meiotic development, as the cmt3a mutation, which eliminated mCHG in meiocyte, arrested pollen development at the bicellular microspore stage. The expression pattern and mutation effects indicate that CMT3b plays an important role in increasing mCHG in the microspore in addition to complementing CMT3a for mCHG maintenance in meiocyte and sperm. However, the present data suggest that CMT3b preferentially targets protein-coding genes for methylation and is required for repression of transcriptional and/or translational genes during male gametogenesis.

The present work indicated that CMT3a and CMT3b also have distinct functions in egg. CMT3a plays the major role in silencing TEGs, whereas CMT3b is required for repression of the zygotic gene expression program in egg. The repression of cell division genes by CMT3b in egg is consistent with previous results showing that rice and maize egg cells are almost devoid of transcripts encoding histone proteins [ 31 , 41 ]. The cmt3b effects on zygote mCHG and gene expression indicate that CMT3b participates in zygotic genome reprogramming by reestablishing mCHG and by regulating gene expression. In conclusion, the work reveals non-CG DNA methylation dynamics during male gametogenesis and distinguishes CMT3a/b functions in mCHG and gene expression in reproductive cells in rice.

Conclusions

The work shows that during male gametogenesis mCHG level is first enhanced in microspores but subsequently reduced to the lowest level in sperm. CMT3a is the main CHG methyltransferase in somatic cells and meiocyte to silence TE and TE-related genes, while CMT3b is required for the surge of mCHG in microspore to repress transcription and translation-related genes, and is involved in the establishment of the zygotic epigenome after fertilization. The histone H3K9me2 demethylases JMJ706 and JMJ707 contribute to the reduction of mCHG in sperm. This study reveals a dynamic remodeling of mCHG during male gametogenesis, which has functional significance for pollen development and fertilization.

Plant materials and growth conditions

Rice variety Zhonghua11 (ZH11) ( Oryza sativa ssp. japonica ) was used for transformation with the CMT3a , CMT3b , and JMJ706/707 CRISPR/Cas9 vectors in this study. Single-guide RNAs (sgRNAs) for the CRISPR/Cas9 system were designed as previously reported [ 42 ]. The sgRNA target sequences of CMT3a , CMT3b , and JMJ706/707 as well as primers for genotyping are listed in Additional file 8: Table S7. Mutations in CMT3a , CMT3b and JMJ706/707 were decoded by DSDecodeM ( http://skl.scau.edu.cn/dsdecode/ ) [ 43 , 44 ]. Cas9-free transgenic plants from T3-T4 segregating populations were used for phenotypic analysis and isolation of reproductive cells. For field growth, germinated rice seedlings were planted in Wuhan from May to October. For greenhouse growth, germinated rice seedlings were planted in soil-filled boxes under a 14-h light/10-h dark cycle at temperatures of 32 °C (in light) and 26 °C (in dark).

Sperm, meiocyte and unicellular microspore isolation

Rice sperm, meiocytes and unicellular microspores were isolated from anthers of Cas9-free transgenic plants and tissue culture-regenerated wild type plants. For sperm isolation, mature anthers were soaked in 45% (w/v) sucrose and then transferred into 15% (w/v) sucrose to release sperm pairs. For isolation of meiocytes, panicles about 5 cm in length were chosen and the middle parts of the panicles were collected. Intact anthers were carefully selected using dissecting needles, placed onto an RNase-free slide, and suspended in a PBS solution containing 1% proteinase inhibitors. The outer wall of the anthers was then removed with a capillary to release meiocytes. Isolated meiocytes were checked under a light microscope (Additional file 1 : Fig. S1a). For isolation of unicellular microspores, panicles about 8 cm in length were harvested and the middle parts of the panicles were collected. Anthers were dissected out and pierced with a capillary tube in the same solution as used for meiocyte isolation, then gently squeezed from the side opposite the hole to release unicellular microspores. Isolated microspores were checked under a light microscope (Additional file 1 : Fig. S1b). All isolated cells were collected with a micromanipulator system (Eppendorf, TransferMan® 4r).

Egg and unicellular zygote isolation

Rice eggs and unicellular zygotes were isolated from ovules of Cas9-free transgenic plants and tissue culture-regenerated wild type plants before and after fertilization, respectively [ 17 , 21 ]. Briefly, ovaries of non-pollinated and pollinated florets (about 6.5 h after pollination) were manually collected in RNase-free water under a dissection microscope. The collected ovules were transferred to a solution containing 0.53 M mannitol and 1% proteinase inhibitors to release eggs or zygotes. All isolated cells were stained with FDA (fluorescein diacetate, Sangon, 596-09-8) and collected with a micromanipulator system (Eppendorf, TransferMan® 4r).

RNA-seq and BS-seq library construction

For RNA-seq library construction, mRNA isolated from the different cell types was reverse transcribed and amplified using a Single Cell Full Length mRNA Amplification Kit (Vazyme, Cat.# N712) according to the manufacturer’s instructions. cDNAs were sheared into 200–400 bp fragments and purified using VAHTS DNA Clean Beads (Vazyme, Cat.# N411). The remaining steps were performed using the TruePrep DNA Library Prep Kit V2 for Illumina (Vazyme, Cat.# TD502). About 3000 sperm cells and 400 meiocytes were used to construct the transcriptome libraries. Fifty egg cells or unicellular zygotes were used for each replicate of the transcriptome library construction. BS-seq libraries were constructed using a reported protocol [ 23 ]. About 300–400 cells were pooled for each replicate of the sperm, meiocyte and unicellular microspore BS-seq libraries, and fifty egg cells or unicellular zygotes were used to construct each egg or zygote BS-seq library.

Semi-thin section

Florets at the appropriate stage were collected and fixed in 50% FAA (50 ml absolute ethanol, 10 ml 37% formaldehyde solution, 5 ml glacial acetic acid, made up to 100 ml with double-distilled water). Semi-thin embedding was performed using Heraeus Kulzer Technovit 7100 resin. Briefly, the materials were first passed through a graded ethanol series (70% for 4 h at 4 °C, 85% for 1 h at room temperature, 95% for 1 h at room temperature, 100% for 2 h at room temperature), then immersed separately in a pre-infiltration solution (equal parts of 96% or absolute ethanol and Technovit 7100 base liquid) and an infiltration solution (1 g hardener I dissolved in 100 ml base liquid) for two hours. The treated materials were embedded on a 65 °C film stand with polymerisation solution (1 ml hardener added with a pipette and stirred into 15 ml of preparation solution) and then sectioned.

RNA-seq data analysis

First, raw RNA-seq data were cleaned using fastp to remove adapters and filter low-quality reads [ 45 ]. Clean reads were then mapped to the MSU7.0 rice reference genome using HISAT2 [ 46 ]. featureCounts was used for read quantification, and differentially expressed genes (fold change > 2, q < 0.01) were identified with DESeq2 [ 47 ].
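
The cutoff described above can be illustrated with a minimal sketch. It assumes the differential expression results have already been exported as rows with DESeq2-style column names (log2FoldChange, padj); the function name call_degs and the example gene names are hypothetical and are not taken from the original analysis. A fold change > 2 in either direction corresponds to an absolute log2 fold change > 1.

```python
import math


def call_degs(results, min_fold=2.0, max_q=0.01):
    """Split result rows into up- and downregulated gene lists.

    results: iterable of dicts with keys "gene", "log2FoldChange", "padj".
    Rows without an adjusted p-value (padj is None) are skipped.
    """
    threshold = math.log2(min_fold)  # fold change > 2  <=>  |log2FC| > 1
    up, down = [], []
    for row in results:
        q = row["padj"]
        if q is None or q >= max_q:
            continue
        if row["log2FoldChange"] > threshold:
            up.append(row["gene"])
        elif row["log2FoldChange"] < -threshold:
            down.append(row["gene"])
    return up, down


# Hypothetical rows for illustration only.
example = [
    {"gene": "geneA", "log2FoldChange": 2.3, "padj": 0.001},
    {"gene": "geneB", "log2FoldChange": -1.7, "padj": 0.004},
    {"gene": "geneC", "log2FoldChange": 0.4, "padj": 0.0001},  # change too small
    {"gene": "geneD", "log2FoldChange": 3.0, "padj": 0.2},     # not significant
    {"gene": "geneE", "log2FoldChange": 1.5, "padj": None},    # no adjusted p-value
]
up, down = call_degs(example)
print(up, down)
```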

BS-seq data analysis

fastp was used to remove adapters and filter low-quality reads from the raw BS-seq data. Clean reads were mapped to the MSU7.0 rice genome. Bismark was used to align reads, remove duplicates and extract methylation calls [ 48 ]. To obtain more loci for analysis, we combined the two biological replicates. Duplicates were removed, uniquely mapped reads were retained, and only cytosines covered by at least five reads were used for further analysis. To avoid the methylation level being strongly influenced by single cytosine sites, the methylation level of each cytosine was calculated separately and then averaged over all cytosines in a bin: bin methylation level = (sum of individual cytosine methylation levels) / (number of cytosines within 100 bp).
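
As an illustration of the bin-level calculation, the following sketch averages per-cytosine methylation levels within 100-bp bins after applying the five-read coverage filter. The input layout (position, methylated reads, total reads) and the function name bin_methylation are assumptions made for this example; they do not reproduce the authors' scripts or any Bismark output format.

```python
from collections import defaultdict


def bin_methylation(cytosines, bin_size=100, min_cov=5):
    """Mean of per-cytosine methylation levels in fixed-size bins.

    cytosines: iterable of (position, methylated_reads, total_reads)
    for one chromosome and one sequence context, 0-based positions.
    Returns {bin_start: mean methylation level}.
    """
    per_bin = defaultdict(list)
    for pos, meth, total in cytosines:
        if total < min_cov:  # keep only cytosines covered by >= 5 reads
            continue
        bin_start = (pos // bin_size) * bin_size
        per_bin[bin_start].append(meth / total)
    # Each cytosine contributes equally, regardless of its read depth.
    return {start: sum(v) / len(v) for start, v in sorted(per_bin.items())}


# Hypothetical calls for illustration only.
calls = [
    (10, 5, 10),   # level 0.5
    (40, 10, 10),  # level 1.0
    (70, 0, 5),    # level 0.0
    (90, 3, 4),    # coverage 4, dropped
    (130, 2, 8),   # level 0.25, second bin
]
levels = bin_methylation(calls)
print(levels)
```

Averaging per-cytosine ratios, rather than pooling all reads in the bin, keeps a single deeply sequenced site from dominating the bin value, which is the motivation given in the text.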

To identify differentially methylated regions (DMRs), the whole genome was divided into 100-bp bins. Bins that contained at least five cytosines, each covered by at least five reads, were retained. Bins with an absolute methylation difference of at least 0.5, 0.3, and 0.1 for CG, CHG, and CHH, respectively, and P < 0.01 (Fisher’s exact test) were considered DMRs. Neighboring DMRs within 200 bp were merged.
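
The DMR criteria can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which counts enter Fisher's exact test, so the sketch uses pooled methylated and unmethylated read counts per bin; "within 200 bp" is read as a gap of at most 200 bp between the end of one DMR and the start of the next; and all function names and the tuple layout are hypothetical. Fisher's test is implemented directly with the standard library.

```python
from math import comb

# Minimum absolute methylation difference per context, as stated in the text.
MIN_DIFF = {"CG": 0.5, "CHG": 0.3, "CHH": 0.1}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def call_dmr_bins(bins_a, bins_b, context, bin_size=100, min_sites=5, max_p=0.01):
    """Return (start, end) of bins passing the difference and p-value cutoffs.

    bins_*: {bin_start: (n_cytosines, methylated_reads, total_reads, bin_level)}
    """
    dmrs = []
    for start in sorted(bins_a.keys() & bins_b.keys()):
        n_a, meth_a, tot_a, level_a = bins_a[start]
        n_b, meth_b, tot_b, level_b = bins_b[start]
        if n_a < min_sites or n_b < min_sites:
            continue
        if abs(level_b - level_a) < MIN_DIFF[context]:
            continue
        p = fisher_exact_two_sided(meth_a, tot_a - meth_a, meth_b, tot_b - meth_b)
        if p < max_p:
            dmrs.append((start, start + bin_size))
    return dmrs


def merge_nearby(intervals, max_gap=200):
    """Merge intervals separated by a gap of at most max_gap bp."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


# Hypothetical CHG bins for illustration only.
wt = {0: (6, 90, 100, 0.9), 100: (5, 50, 100, 0.5), 300: (7, 80, 100, 0.8),
      1000: (4, 90, 100, 0.9), 2000: (5, 10, 100, 0.1)}
mut = {0: (6, 40, 100, 0.4), 100: (5, 45, 100, 0.45), 300: (7, 30, 100, 0.3),
       1000: (4, 10, 100, 0.1), 2000: (5, 60, 100, 0.6)}
dmrs = merge_nearby(call_dmr_bins(wt, mut, "CHG"))
print(dmrs)
```

In this toy input, the bin at 100 fails the difference cutoff and the bin at 1000 has too few cytosines; the bins at 0 and 300 are merged because they lie 200 bp apart.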

TE gene and gene annotation information was downloaded from the Rice Genome Annotation Project ( http://rice.uga.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_7.0/all.dir/all.locus_brief_info.7.0 ). Differentially methylated genes (DMGs) were identified using bedtools [ 49 ], requiring > 80% overlap between DMRs and genes.
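
A minimal sketch of the overlap filter is given below. The text does not state whether the 80% refers to the DMR length or the gene length; this example assumes it is the fraction of the DMR that lies within the gene, which is similar in spirit to the minimum-overlap-fraction option of bedtools intersect, although the exact command used is not given in the text. Coordinates are half-open on a single chromosome, and all names are hypothetical.

```python
def overlap_fraction(dmr, gene):
    """Fraction of the DMR length that falls inside the gene."""
    d_start, d_end = dmr
    g_start, g_end = gene
    shared = max(0, min(d_end, g_end) - max(d_start, g_start))
    return shared / (d_end - d_start)


def differentially_methylated_genes(dmrs, genes, min_fraction=0.8):
    """Return sorted IDs of genes hit by a DMR with > min_fraction overlap.

    dmrs: list of (start, end); genes: {gene_id: (start, end)}.
    """
    hits = set()
    for gene_id, span in genes.items():
        if any(overlap_fraction(dmr, span) > min_fraction for dmr in dmrs):
            hits.add(gene_id)
    return sorted(hits)


# Hypothetical coordinates for illustration only.
genes = {"gene1": (1000, 3000), "gene2": (5000, 6000)}
dmr_list = [
    (950, 1050),   # half inside gene1: fraction 0.5
    (2900, 3000),  # fully inside gene1: fraction 1.0
    (5900, 6100),  # half inside gene2: fraction 0.5
]
dmgs = differentially_methylated_genes(dmr_list, genes)
print(dmgs)
```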

Density plots show the frequency distribution of DNA methylation differences between 50-bp window of two samples with at least 20 informative sequenced cytosines in both samples and 70% CG, 30% CHG, or 10% CHH methylation in either of the samples as previously described [ 16 ].

Identification of sexual lineage-specific methylation loci

The previously reported method [ 18 ] was used to identify SLH and SLM in rice male sex cells. Briefly, average sex cell mCG and mCHH levels within 100-bp bins were calculated from meiocytes, microspores and sperm, and average sex cell mCHG levels were calculated from meiocytes and microspores. Fractional methylation in 100-bp windows across the genome was compared between the average of the selected sex cells (SexAV) and somatic tissues (Seedling) (Diff = SexAV − Seedling). Windows were selected if the total methylation level was significantly different (Fisher's exact test p -value < 0.01), the methylation level of all sex cell replicates was higher than that of all somatic tissues, and the following criterion was met: Diff_CG > 0 & Diff_CHG > 0.05 & Diff_CHH > 0.1 & (Diff_CG + Diff_CHG + Diff_CHH) > 0.4.
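
The difference criterion can be written out as a small sketch. It covers only the Diff thresholds; the Fisher's exact test and the replicate consistency check described above are not included. The function name is_slh_candidate and the dictionary layout are hypothetical.

```python
def is_slh_candidate(sex_avg, seedling):
    """Apply the Diff thresholds to one 100-bp window.

    sex_avg, seedling: dicts of fractional methylation keyed by context
    ("CG", "CHG", "CHH"). Diff = SexAV - Seedling for each context.
    """
    diff = {ctx: sex_avg[ctx] - seedling[ctx] for ctx in ("CG", "CHG", "CHH")}
    return (
        diff["CG"] > 0
        and diff["CHG"] > 0.05
        and diff["CHH"] > 0.1
        and sum(diff.values()) > 0.4
    )


# Hypothetical windows for illustration only.
window_pass = is_slh_candidate(
    {"CG": 0.9, "CHG": 0.5, "CHH": 0.3},
    {"CG": 0.7, "CHG": 0.3, "CHH": 0.1},
)
window_fail = is_slh_candidate(
    {"CG": 0.8, "CHG": 0.32, "CHH": 0.25},  # CHG gain of 0.02 is too small
    {"CG": 0.75, "CHG": 0.3, "CHH": 0.1},
)
print(window_pass, window_fail)
```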

The refined list of SLHs (555 loci) was then separated into two groups based on the level of CHH/G methylation in somatic tissues: 1) SLMs with CHH and CHG methylation lower than 0.05 and 0.1, respectively, in somatic tissues (215 loci). 2) canonical SLHs with CHH methylation higher than 0.05 or CHG methylation higher than 0.1, in somatic tissues (340 loci) [ 18 ].
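
The split into SLM and canonical SLH loci depends only on somatic (seedling) non-CG methylation, as in the following sketch. The text does not say how a value exactly equal to a cutoff is treated; here such loci are assigned to the canonical SLH group, which is an assumption. The function name classify_slh is hypothetical.

```python
from collections import Counter


def classify_slh(seedling_chh, seedling_chg):
    """Label an SLH locus by its methylation level in somatic tissue."""
    if seedling_chh < 0.05 and seedling_chg < 0.1:
        return "SLM"
    return "canonical SLH"


# Hypothetical (CHH, CHG) seedling levels for illustration only.
loci = [(0.01, 0.04), (0.20, 0.05), (0.01, 0.30), (0.02, 0.08)]
labels = [classify_slh(chh, chg) for chh, chg in loci]
counts = Counter(labels)
print(counts)
```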

Availability of data and materials

Gene sequence data from this article can be found on the Rice Genome Annotation Project website under the following accession numbers: CMT3a, LOC_Os10g01570; CMT3b, LOC_Os03g12570; JMJ706, LOC_Os10g42690; JMJ707, LOC_Os02g46930. All high-throughput data supporting the findings of this study have been deposited in the Gene Expression Omnibus (GEO) under accession number GSE235680 [ 50 ]. The transcription profiles of previously reported rice sperm were downloaded from GEO (accession no. GSE50777) [ 51 ]. The rice central cell and vegetative cell BS-seq data are available under accession numbers GSE89789 [ 52 ] and GSE126791 [ 53 ]. The rice cmt2 and drm2 mutant BS-seq data are available under accession number GSE138705 [ 54 ]. No scripts or software other than those mentioned in the Methods section were used.

Law JA, Jacobsen SE. Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat Rev Genet. 2010;11:204–20.

Kim MY, Zilberman D. DNA methylation as a system of plant genomic immunity. Trends Plant Sci. 2014;19:320–6.

Lloyd JPB, Lister R. Epigenome plasticity in plants. Nat Rev Genet. 2022;23:55–68.

Tan F, Zhou C, Zhou Q, Zhou S, Yang W, Zhao Y, Li G, Zhou DX. Analysis of chromatin regulators reveals specific features of rice DNA methylation pathways. Plant Physiol. 2016;171:2041–54.

Zemach A, McDaniel IE, Silva P, Zilberman D. Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science. 2010;328:916–9.

Ma X, Xing F, Jia Q, Zhang Q, Hu T, Wu B, Shao L, Zhao Y, Zhang Q, Zhou DX. Parental variation in CHG methylation is associated with allelic-specific expression in elite hybrid rice. Plant Physiol. 2021;186:1025–41.

Guo F, Yan L, Guo H, Li L, Hu B, Zhao Y, Yong J, Hu Y, Wang X, Wei Y, et al. The transcriptome and DNA methylome landscapes of human primordial germ cells. Cell. 2015;161:1437–52.

Seisenberger S, Andrews S, Krueger F, Arand J, Walter J, Santos F, Popp C, Thienpont B, Dean W, Reik W. The dynamics of genome-wide DNA methylation reprogramming in mouse primordial germ cells. Mol Cell. 2012;48:849–62.

Bergman Y, Cedar H. DNA methylation dynamics in health and disease. Nat Struct Mol Biol. 2013;20:274–81.

Li C, Fan Y, Li G, Xu X, Duan J, Li R, Kang X, Ma X, Chen X, Ke Y, et al. DNA methylation reprogramming of functional elements during mammalian embryonic development. Cell Discov. 2018;4:41.

Wang X, Bhandari RK. DNA methylation dynamics during epigenetic reprogramming of medaka embryo. Epigenetics. 2019;14:611–22.

Calarco JP, Borges F, Donoghue MT, Van Ex F, Jullien PE, Lopes T, Gardner R, Berger F, Feijó JA, Becker JD, Martienssen RA. Reprogramming of DNA methylation in pollen guides epigenetic inheritance via small RNA. Cell. 2012;151:194–205.

Ibarra CA, Feng X, Schoft VK, Hsieh TF, Uzawa R, Rodrigues JA, Zemach A, Chumak N, Machlicova A, Nishimura T, et al. Active DNA demethylation in plant companion cells reinforces transposon methylation in gametes. Science. 2012;337:1360–4.

Hsieh PH, He S, Buttress T, Gao H, Couchman M, Fischer RL, Zilberman D, Feng X. Arabidopsis male sexual lineage exhibits more robust maintenance of CG methylation than somatic tissues. Proc Natl Acad Sci U S A. 2016;113:15132–7.

Li C, Xu H, Fu FF, Russell SD, Sundaresan V, Gent JI. Genome-wide redistribution of 24-nt siRNAs in rice gametes. Genome Res. 2020;30:173–84.

Kim MY, Ono A, Scholten S, Kinoshita T, Zilberman D, Okamoto T, Fischer RL. DNA demethylation by ROS1a in rice vegetative cells promotes methylation in sperm. Proc Natl Acad Sci U S A. 2019;116:9652–7.

Zhou S, Li X, Liu Q, Zhao Y, Jiang W, Wu A, Zhou DX. DNA demethylases remodel DNA methylation in rice gametes and zygote and are required for reproduction. Mol Plant. 2021;14:1569–83.

Walker J, Gao H, Zhang J, Aldridge B, Vickers M, Higgins JD, Feng X. Sexual-lineage-specific DNA methylation regulates meiosis in Arabidopsis. Nat Genet. 2018;50:130–7.

Liu Q, Ma X, Li X, Zhang X, Zhou S, Xiong L, Zhao Y, Zhou D-X. Paternal DNA methylation is remodeled to maternal levels in rice zygote. Nat Commun. 2023;14:6571.

Collado-Romero M, Alós E, Prieto P. Unravelling the proteomic profile of rice meiocytes during early meiosis. Front Plant Sci. 2014;5:356.

Zhou S, Jiang W, Zhao Y, Zhou DX. Single-cell three-dimensional genome structures of rice gametes and unicellular zygotes. Nat Plants. 2019;5:795–800.

Jiang P, Lian B, Liu C, Fu Z, Shen Y, Cheng Z, Qi Y. 21-nt phasiRNAs direct target mRNA cleavage in rice male germ cells. Nat Commun. 2020;11:5191.

Clark SJ, Smallwood SA, Lee HJ, Krueger F, Reik W, Kelsey G. Genome-wide base-resolution mapping of DNA methylation in single cells using single-cell bisulfite sequencing (scBS-seq). Nat Protoc. 2017;12:534–47.

Park K, Kim MY, Vickers M, Park JS, Hyun Y, Okamoto T, Zilberman D, Fischer RL, Feng X, Choi Y, Scholten S. DNA demethylation is initiated in the central cells of Arabidopsis and rice. Proc Natl Acad Sci U S A. 2016;113:15138–43.

Hu D, Yu Y, Wang C, Long Y, Liu Y, Feng L, Lu D, Liu B, Jia J, Xia R, et al. Multiplex CRISPR-Cas9 editing of DNA methyltransferases in rice uncovers a class of non-CG methylation specific for GC-rich regions. Plant Cell. 2021;33:2950–64.

Cheng C, Tarutani Y, Miyao A, Ito T, Yamazaki M, Sakai H, Fukai E, Hirochika H. Loss of function mutations in the rice chromomethylase OsCMT3a cause a burst of transposition. Plant J. 2015;83:1069–81.

Lister R, O’Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell. 2008;133:523–36.

Fang J, Jiang J, Leichter SM, Liu J, Biswal M, Khudaverdyan N, Zhong X, Song J. Mechanistic basis for maintenance of CHG DNA methylation in plants. Nat Commun. 2022;13:3877.

Sun Q, Zhou DX. Rice jmjC domain-containing gene JMJ706 encodes H3K9 demethylase required for floral organ development. Proc Natl Acad Sci U S A. 2008;105:13679–84.

Idler RK, Hennig GW, Yan W. Bioinformatic identification of novel elements potentially involved in messenger RNA fate control during spermatogenesis. Biol Reprod. 2012;87:138.

Anderson SN, Johnson CS, Chesnut J, Jones DS, Khanday I, Woodhouse M, Li C, Conrad LJ, Russell SD, Sundaresan V. The zygotic transition is initiated in unicellular plant zygotes with asymmetric activation of parental genomes. Dev Cell. 2017;43:349-358 e344.

Dresselhaus T, Jürgens G. Comparative embryogenesis in angiosperms: activation and patterning of embryonic cell lineages. Annu Rev Plant Biol. 2021;72:641–76.

Borges F, Donoghue MTA, LeBlanc C, Wear EE, Tanurdzic M, Berube B, Brooks A, Thompson WF, Hanley-Bowdoin L, Martienssen RA. Loss of Small-RNA-Directed DNA Methylation in the Plant Cell Cycle Promotes Germline Reprogramming and Somaclonal Variation. Curr Biol. 2021;31:591-600 e594.

Nelms B, Walbot V. Gametophyte genome activation occurs at pollen mitosis I in maize. Science. 2022;375:424–9.

Houben A, Kumke K, Nagaki K, Hause G. CENH3 distribution and differential chromatin modifications during pollen development in rye (Secale cereale L.). Chromosome Res. 2011;19:471–80.

Borg M, Jacob Y, Susaki D, LeBlanc C, Buendia D, Axelsson E, Kawashima T, Voigt P, Boavida L, Becker J, et al. Targeted reprogramming of H3K27me3 resets epigenetic memory in plant paternal chromatin. Nat Cell Biol. 2020;22:621–9.

Huang X, Sun MX. H3K27 methylation regulates the fate of two cell lineages in male gametophytes. Plant Cell. 2022;34:2989–3005.

Ingouff M, Rademacher S, Holec S, Soljic L, Xin N, Readshaw A, Foo SH, Lahouze B, Sprunck S, Berger F. Zygotic resetting of the HISTONE 3 variant repertoire participates in epigenetic reprogramming in Arabidopsis. Curr Biol. 2010;20:2137–43.

Buttress T, He S, Wang L, Zhou S, Saalbach G, Vickers M, Li G, Li P, Feng X. Histone H2B.8 compacts flowering plant sperm through chromatin phase separation. Nature. 2022;611:614–22.

Scholten S, Lörz H, Kranz E. Paternal mRNA and protein synthesis coincides with male chromatin decondensation in maize zygotes. Plant J. 2002;32:221–31.

Chen J, Strieder N, Krohn NG, Cyprys P, Sprunck S, Engelmann JC, Dresselhaus T. Zygotic genome activation occurs shortly after fertilization in maize. Plant Cell. 2017;29:2106–25.

He Y, Zhang T, Yang N, Xu M, Yan L, Wang L, Wang R, Zhao Y. Self-cleaving ribozymes enable the production of guide RNAs from unlimited choices of promoters for CRISPR/Cas9 mediated genome editing. J Genet Genomics. 2017;44:469–72.

Liu W, Xie X, Ma X, Li J, Chen J, Liu YG. DSDecode: A Web-Based Tool for Decoding of Sequencing Chromatograms for Genotyping of Targeted Mutations. Mol Plant. 2015;8:1431–3.

Xie X, Ma X, Liu YG. Decoding Sanger Sequencing Chromatograms from CRISPR-Induced Mutations. Methods Mol Biol. 2019;1917:33–43.

Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–90.

Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37:907–15.

Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550.

Krueger F, Andrews SR. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 2011;27:1571–2.

Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2.

Li X, Zhu B, Lu Y, Zhao F, Liu Q, Wang J, Ye M, Chen S, Nie J, Xiong L, et al: DNA methylation remodeling and the functional implication during male gametogenesis in rice. 2024, GSE235680. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/gds/?term=GSE235680

Anderson SN, Johnson CS, Jones DS, Conrad LJ, Gou X, Russell SD, Sundaresan V: Transcriptomes of isolated Oryza sativa gametes characterized by deep sequencing: evidence for distinct sex-dependent chromatin and epigenetic states before fertilization. 2013, GSE50777. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/gds/?term=GSE50777 .

Park K, Kim MY, Vickers M, Park JS, Hyun Y, Okamoto T, Zilberman D, Fischer RL, Feng X, Choi Y, Scholten S: DNA demethylation is initiated in the central cells of Arabidopsis and rice. 2016, GSE89789 . Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/gds/?term=GSE89789 .

Kim MY, Ono A, Scholten S, Kinoshita T, Zilberman D, Okamoto T, Fischer RL: DNA demethylation by ROS1a in rice vegetative cells promotes methylation in sperm. 2019, GSE126791. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/gds/?term=GSE126791 .

Hu D, Yu Y, Wang C, Long Y, Liu Y, Feng L, Lu D, Liu B, Jia J, Xia R, et al: Multiplex CRISPR-Cas9 editing of DNA methyltransferases in rice uncovers a class of non-CG methylation specific for GC-rich regions. Plant Cell 2021, GSE138705. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/gds/?term=GSE138705 .

Download references

Acknowledgements

We thank Mr. Hao Liu from the National Key Laboratory of Crop Genetic Improvement for essential help in managing the high-throughput computing clusters.

Peer review information

Wenjing She was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Review history

The review history is available as Additional file 9.

The work was supported by the National Natural Science Foundation of China [32070563, 31730049], the Fundamental Research Funds for the Central Universities [2662023SKPY002], and the Agence Nationale de la Recherche (LANDSREC, ANR-21-CE20-0012-01).

Author information

Xue Li and Bo Zhu contributed equally to this work.

Authors and Affiliations

National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China

Xue Li, Bo Zhu, Feng Zhao, Qian Liu, Jiahao Wang, Miaomiao Ye, Siyuan Chen, Lizhong Xiong, Yu Zhao, Changyin Wu & Dao-Xiu Zhou

Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, College of Agriculture, Yangzhou University, Yangzhou, 225009, China

Vazyme Biotech Co., Ltd, Nanjing, 210000, China

Institute of Plant Science Paris-Saclay (IPS2), CNRS, INRAE, Université Paris-Saclay, 91405, Orsay, France

Dao-Xiu Zhou


Contributions

XL produced the mutants, isolated the cells and participated in data production; BZ produced and analyzed the data; YL, FZ and QL participated in data analysis; JW, MY and SC participated in material production; LX and CW participated in the project design; YZ participated in supervision and management of the project; DXZ coordinated the project, analyzed the data and wrote the paper.

Corresponding author

Correspondence to Dao-Xiu Zhou.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

Junwei Nie claims competing interests; the remaining authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Figure S1.

Isolation of rice meiocytes and unicellular microspores and quality control of rice male germ cells BS-seq. Figure S2. DNA methylation levels in reproduction cells compared with somatic tissues. Figure S3. Analysis of microspore hyper DMRs relative to meiocyte and sperm. Figure S4. Analysis of mCHG genes in unicellular microspore. Figure S5. Effects of cmt3a and cmt3b in pollen development. Figure S6. CMT3b maintains high mCHG level in microspore. Figure S7. Analysis had no effect on mCHG in cmt3a mutant sperm cells. Figure S8. Production and phenotypic analysis of jmj706/707 double mutants. Figure S9. Average methylation level of TEG and genes in jmj706/707 meiocyte and sperm cells compared with wild type. Figure S10. Effect of the cmt3a/b and jmj706/7 mutations on sexual lineage-specific methylation. Figure S11. Analysis of the canonical SLH and lineage-specific methylation (SLM). Figure S12. Effects of cmt3a and cmt3b mutations in zygote and/or egg DNA methylation. Figure S13. Transcriptome data analysis. Figure S14. Enrichment of upregulated genes in cmt3b meiocyte and sperm. Figure S15. DNA methylation levels in rice reproductive cells and seedlings. Figure S16. DNA CHH methylation landscape in rice reproductive cells and seedlings.

Additional file 2: Table S1.

Summary of BS-seq data.

Additional file 3: Table S2.

Summary of RNA-seq data.

Additional file 4: Table S3.

315 genes annotated by Cluster A.

Additional file 5: Table S4.

132 genes annotated by SLM.

Additional file 6: Table S5.

CMT3b regulates gene in meiocyte and sperm.

Additional file 7: Table S6.

83 zygotic genes upregulated in cmt3b egg.

Additional file 8: Table S7.

Primers used in this study.

Additional file 9.

Review history.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Li, X., Zhu, B., Lu, Y. et al. DNA methylation remodeling and the functional implication during male gametogenesis in rice. Genome Biol 25, 84 (2024). https://doi.org/10.1186/s13059-024-03222-w

Received: 09 August 2023

Accepted: 25 March 2024

Published: 02 April 2024

DOI: https://doi.org/10.1186/s13059-024-03222-w


  • Rice male gametogenesis
  • DNA methylation

Genome Biology

ISSN: 1474-760X

  • Published: 20 September 2023

Unlocking biomolecular intelligence

Nature Machine Intelligence volume 5, page 949 (2023)

Advances in DNA nanoengineering promise the development of new computing devices within biological systems, with applications in nanoscale sensing, diagnostics and therapeutics.

As nanotechnology emerged in the 1980s as a scientific field, researchers found inspiration in a bottom-up approach to designing machines, using molecules as building blocks [1]. After all, molecules and biomolecules can pack a large amount of functionality and processing power at the micro- or nanoscale. DNA has been a particularly promising candidate for designing molecular machines, as DNA molecules can be easily synthesized and modified to have specific functions and self-assembling properties. Logic circuits with DNA, based on sequence recognition and strand displacement, were demonstrated in 2006 [2]. DNA origami, in which DNA units self-assemble into interesting nanomaterials with specific shapes and patterns, became a popular direction around the same time [3]. Recently, convolutional neural networks have been implemented in vitro using DNA strand-displacement circuits, which have been used to perform classification tasks [4]. As the field of DNA computing progresses, such classification capabilities might be deployed in the context of clinical diagnostics.

Various molecule-based smart devices have been proposed for applications in sensing, diagnosis and therapeutic delivery [5]. Inspiration can be found in nature, where biomolecular systems can sense dynamic environmental signals in complex environments, and, in response, activate regulatory processes that are contingent on the timing of these signals. Implementing such functionality with artificially designed molecular machines is challenging, but one approach is to make use of the concept of a finite state machine (FSM), which is a model of computation used in fields such as mathematics, engineering and biology. An FSM is a device that can transition between a finite number of distinct states depending on the previous state and the present inputs. This enables FSMs to store and process information that depends on the sequence of events over time.
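To make the idea of order-dependent state concrete, here is a minimal software sketch of a finite state machine. The state names, signal names and transition table are hypothetical and chosen only for illustration; they are not taken from any DNA device described in this editorial.

```python
from dataclasses import dataclass, field

# Hypothetical transition table: (current state, input signal) -> next state.
TRANSITIONS = {
    ("closed", "cue_A"): "half_open",
    ("half_open", "cue_B"): "open",
    ("half_open", "reset"): "closed",
    ("open", "reset"): "closed",
}


@dataclass
class FiniteStateMachine:
    state: str = "closed"
    history: list = field(default_factory=list)

    def step(self, signal: str) -> str:
        # A (state, signal) pair with no entry leaves the state unchanged.
        self.state = TRANSITIONS.get((self.state, signal), self.state)
        self.history.append(self.state)
        return self.state


def run(signals):
    """Feed a sequence of signals to a fresh machine and return the final state."""
    machine = FiniteStateMachine()
    for signal in signals:
        machine.step(signal)
    return machine.state


print(run(["cue_A", "cue_B"]))  # open
print(run(["cue_B", "cue_A"]))  # half_open
```

The same two signals lead to different final states depending on the order in which they arrive, which is the property that lets an FSM respond to the timing of events rather than only to their presence.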

DNA lends itself well to building molecular FSMs, as it can be modified to take on various stable molecular shapes depending on specific interactions with the environment [6]. In a research Article in this issue of Nature Machine Intelligence, Zhao et al. present an innovative approach for a DNA-based FSM that is suitable for use within living cells. The authors’ DNA FSM consists of two three-helix subunits that are connected to each other to form a six-helix nanotubular framework via four ‘locks’. These locks can be opened, with a specific combination of signals, to produce five more possible different structural configurations. The FSM can reversibly switch between the six structural states in response to temporally ordered physiologically relevant inputs. Which configuration the DNA framework is in can be observed by fluorescence patterns.

As a proof of concept, Zhao et al. show that their DNA framework FSM can complete a task in the complex environment of a living cell, targeting a ‘theranostic’ application that integrates diagnosis and therapeutics. The authors show that their DNA machine can be used as a carrier for the controlled use of the gene-editing tool CRISPR–Cas9 to specifically attack tumour cells. With the multi-stage change in configuration after a specific sequence of several molecular cues, the DNA machine can release CRISPR–Cas9 with spatiotemporal precision.

The versatility of DNA nanoengineering offers the possibility to design smart devices with a range of structures and functionalities that can respond to signals at the cellular level. The area is already poised to have a considerable impact on healthcare and other industries [7]. Work at the interface between nanotechnology and machine learning, in combination with advances in molecular computing such as those proposed by Zhao et al., could tackle major scientific and technological challenges. Such interdisciplinary research could eventually unlock biomolecular machine intelligence, with the goal of addressing the complexity of nature at cell level with scientific ingenuity.

1. Jones, R. Nat. Nanotechnol. 4, 207 (2009).

2. Seelig, G. et al. Science 314, 1585–1588 (2006).

3. Rothemund, P. Nature 440, 297–302 (2006).

4. Xiong, X. et al. Nat. Mach. Intell. 4, 625–635 (2022).

5. Chen, Y. J. et al. Nat. Nanotechnol. 10, 748–760 (2015).

6. Liu, L. et al. Sci. Adv. 8, eabm9530 (2022).

7. Ricci, F. & Dietz, H. Nat. Nanotechnol. 18, 541–542 (2023).


About this article

Cite this article

Unlocking biomolecular intelligence. Nat Mach Intell 5, 949 (2023). https://doi.org/10.1038/s42256-023-00730-5

Published: 20 September 2023

Issue Date: September 2023

DOI: https://doi.org/10.1038/s42256-023-00730-5


Princeton University

Princeton Engineering

Can language models read the genome? This one decoded mRNA to make better vaccines.

By Scott Lyon

April 8, 2024


Princeton researchers led by Mengdi Wang have developed a language model to home in on partial genome sequences and optimize those sequences to improve function for the development of mRNA vaccines and other therapies. Illustration from Adobe Stock.

The same class of artificial intelligence that made headlines coding software and passing the bar exam has learned to read a different kind of text — the genetic code.

That code contains instructions for all of life’s functions and follows rules not unlike those that govern human languages. Each sequence in a genome adheres to an intricate grammar and syntax, the structures that give rise to meaning. Just as changing a few words can radically alter the impact of a sentence, small variations in a biological sequence can make a huge difference in the forms that sequence encodes.

Now Princeton University researchers led by machine learning expert Mengdi Wang are using language models to home in on partial genome sequences and optimize those sequences to study biology and improve medicine. That work is already underway.

In a paper published April 5 in the journal Nature Machine Intelligence, the authors detail a language model that used its powers of semantic representation to design a more effective mRNA vaccine such as those used to protect against COVID-19.

Found in Translation

Mengdi Wang in her Princeton office.

Scientists have a simple way to summarize the flow of genetic information. They call it the central dogma of biology. Information moves from DNA to RNA to proteins. Proteins create the structures and functions of living cells.

Messenger RNA, or mRNA, converts the information into proteins in that final step, called translation. But mRNA is interesting. Only part of it holds the code for the protein. The rest is not translated but controls vital aspects of the translation process.
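The split between translated and untranslated parts of an mRNA can be sketched in a few lines. This is a simplified illustration, not the researchers' method: it assumes translation starts at the first AUG and ends at the first in-frame stop codon, and the function name `split_mrna` and the example sequence are made up for this sketch.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna: str):
    """Return (five_prime_utr, coding_region, three_prime_utr).

    Simplifying assumption: the first AUG is the start codon and the
    coding region runs to the first in-frame stop codon (inclusive).
    """
    start = mrna.find("AUG")
    if start == -1:
        # No start codon found: treat the whole sequence as untranslated.
        return mrna, "", ""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    # No in-frame stop codon: everything after the start is coding.
    return mrna[:start], mrna[start:], ""


# Made-up example sequence: 5' UTR + coding region + 3' UTR.
utr5, cds, utr3 = split_mrna("GGCACC" + "AUGGCUUAA" + "CCGU")
print(utr5, cds, utr3)  # GGCACC AUGGCUUAA CCGU
```

In this picture, the 5' untranslated region is simply everything upstream of the start codon. It encodes no amino acids, yet it is the part of the molecule the language model described in this article was built to read.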

Governing the efficiency of protein production is a key mechanism by which mRNA vaccines work. The researchers focused their language model there, on the untranslated region, to see how they could optimize efficiency and improve vaccines.

After training the model on a small variety of species, the researchers generated hundreds of new optimized sequences and validated those results through lab experiments. The best sequences outperformed several leading benchmarks for vaccine development, including a 33% increase in the overall efficiency of protein production.

Increasing protein production efficiency by even a small amount provides a major boost for emerging therapeutics, according to the researchers. Beyond COVID-19, mRNA vaccines promise to protect against many infectious diseases and cancers.

Wang, a professor of electrical and computer engineering and the principal investigator in this study, said the model’s success also pointed to a more fundamental possibility. Trained on mRNA from a handful of species, it was able to decode nucleotide sequences and reveal something new about gene regulation. Scientists believe gene regulation, one of life’s most basic functions, holds the key to unlocking the origins of disease and disorder. Language models like this one could provide a new way to probe.

Wang’s collaborators include researchers from the biotech firm RVAC Medicines as well as the Stanford University School of Medicine.

The Language of Disease

The new model differs in degree, not kind, from the large language models that power today’s AI chat bots. Instead of being trained on billions of pages of text from the internet, their model was trained on a few hundred thousand sequences. The model also was trained to incorporate additional knowledge about the production of proteins, including structural and energy-related information.

The research team used the trained model to create a library of 211 new sequences. Each was optimized for a desired function, primarily an increase in the efficiency of translation. The proteins produced by translation, like the spike protein targeted by COVID-19 vaccines, drive the immune response to infectious disease.

Previous studies have created language models to decode various biological sequences, including proteins and DNA, but this was the first language model to focus on the untranslated region of mRNA. In addition to a boost in overall efficiency, it was also able to predict how well a sequence would perform at a variety of related tasks.

Wang said the real challenge in creating this language model was in understanding the full context of the available data. Training a model requires not only the raw data with all its features but also the downstream consequences of those features. If a program is designed to filter spam from email, each email it trains on would be labeled “spam” or “not spam.” Along the way, the model develops semantic representations that allow it to determine what sequences of words indicate a “spam” label. Therein lies the meaning.

Wang said looking at one narrow dataset and developing a model around it was not enough to be useful for life scientists. She needed to do something new. Because this model was working at the leading edge of biological understanding, the data she found was all over the place.

“Part of my dataset comes from a study where there are measures for efficiency,” Wang said. “Another part of my dataset comes from another study [that] measured expression levels. We also collected unannotated data from multiple resources.” Organizing those parts into one coherent and robust whole — a multifaceted dataset that she could use to train a sophisticated language model — was a massive challenge.

“Training a model is not only about putting together all those sequences, but also putting together sequences with the labels that have been collected so far. This had never been done before.”
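One way to picture the data-organization problem Wang describes is a single table in which each sequence carries whichever labels happen to exist for it. The sketch below is an assumption-laden illustration, not the team's pipeline: the `Record` fields, the `merge_sources` function and all sequences and values are hypothetical placeholders.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    sequence: str
    efficiency: Optional[float]   # label from one study, if measured
    expression: Optional[float]   # label from another study, if measured


def merge_sources(efficiency_study, expression_study, unannotated):
    """Combine two labeled sources and one unlabeled source into one dataset.

    Missing labels are kept as None so that a model can learn from a
    sequence even when only some (or none) of its labels are known.
    """
    merged = {}
    for seq, value in efficiency_study.items():
        merged[seq] = Record(seq, value, None)
    for seq, value in expression_study.items():
        previous = merged.get(seq, Record(seq, None, None))
        merged[seq] = previous._replace(expression=value)
    for seq in unannotated:
        merged.setdefault(seq, Record(seq, None, None))
    return list(merged.values())


# Placeholder inputs, invented purely for illustration.
records = merge_sources(
    efficiency_study={"ACGU": 1.5},
    expression_study={"ACGU": 0.5, "GGCC": 2.0},
    unannotated=["UUAA", "GGCC"],
)
for record in records:
    print(record)
```

Keeping absent labels explicit, rather than dropping incomplete rows, is one common design choice when sources measure different things; whether the study handled its data this way is not stated in the article.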

The paper, “A 5′ UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions,” was published in Nature Machine Intelligence. Additional authors include Dan Yu, Yupeng Li, Yue Shen and Jason Zhang, from RVAC Medicines; Le Cong from Stanford; and Yanyi Chu and Kaixuan Huang from Princeton.


DNA Computing A Survey



April 11, 2024


New advances promise secure quantum computing at home

by University of Oxford

Breakthrough promises secure quantum computing at home

The full power of next-generation quantum computing could soon be harnessed by millions of individuals and companies, thanks to a breakthrough by scientists at Oxford University Physics guaranteeing security and privacy. This advance promises to unlock the transformative potential of cloud-based quantum computing and is detailed in a new study published in Physical Review Letters. The paper is titled "Verifiable blind quantum computing with trapped ions and single photons."

Quantum computing is developing rapidly, paving the way for new applications that could transform services in many areas like health care and financial services. It works in a fundamentally different way than conventional computing and is potentially far more powerful. However, it currently requires controlled conditions to remain stable and there are concerns around data authenticity and the effectiveness of current security and encryption systems.

Several leading providers of cloud-based services, like Google, Amazon, and IBM, already separately offer some elements of quantum computing. Safeguarding the privacy and security of customer data is a vital precursor to scaling up and expanding its use, and for the development of new applications as the technology advances. The new study by researchers at Oxford University Physics addresses these challenges.

"We have shown for the first time that quantum computing in the cloud can be accessed in a scalable, practical way which will also give people complete security and privacy of data, plus the ability to verify its authenticity," said Professor David Lucas, who co-heads the Oxford University Physics research team and is lead scientist at the UK Quantum Computing and Simulation Hub, led from Oxford University Physics.


In the new study, the researchers use an approach dubbed "blind quantum computing," which connects two totally separate quantum computing entities—potentially an individual at home or in an office accessing a cloud server—in a completely secure way. Importantly, their new methods could be scaled up to large quantum computations.

"Using blind quantum computing, clients can access remote quantum computers to process confidential data with secret algorithms and even verify the results are correct, without revealing any useful information. Realizing this concept is a big step forward in both quantum computing and keeping our information safe online," said study lead Dr. Peter Drmota, of Oxford University Physics.

The researchers created a system comprising a fiber network link between a quantum computing server and a simple device detecting photons, or particles of light, at an independent computer remotely accessing its cloud services. This allows so-called blind quantum computing over a network.

Every computation incurs a correction that must be applied to all that follow and needs real-time information to comply with the algorithm. The researchers used a unique combination of quantum memory and photons to achieve this.


"Never in history have the issues surrounding privacy of data and code been more urgently debated than in the present era of cloud computing and artificial intelligence," said Professor David Lucas. "As quantum computers become more capable, people will seek to use them with complete security and privacy over networks, and our new results mark a step change in capability in this respect."

The results could ultimately lead to commercial development of devices to plug into laptops, to safeguard data when people are using quantum cloud computing services.

Researchers exploring quantum computing and technologies at Oxford University Physics have access to the state-of-the-art Beecroft laboratory facility, specially constructed to create stable and secure conditions including eliminating vibration.

Journal information: Physical Review Letters

Provided by University of Oxford


COMMENTS

  1. DNA computing

    DNA computing is a branch of biomolecular computing concerned with the use of DNA as a carrier of information to perform arithmetic and logic operations.

  2. Concept, Development and Applications of DNA Computation

    As traditional silicon-based chips approach their theoretical limits on computing power, DNA-based computation presents a promising alternative with potential advantages such as reduced size, high storage density, low consumption, long-term stability and the ability to perform in-memory computing. This review provides a summarized overview of ...

  3. Recent advances in DNA computing

    Below are some recent papers published in ACS journals that report on innovations in DNA computing. "Advances in Applications of Molecular Logic Gates". ACS Omega. Nov. 6, 2021. This review discusses recent advances in molecular logic gates, including those that incorporate DNA. The researchers describe how the gates are being used to ...

  4. Introduction to DNA computing

    Experiment, theory, and implementation are the fields of DNA computing-based research and development. Unlike the conventional binary digits, i.e., 1 and 0, data or file in DNA computing is primarily expressed using four genetic alphabets, i.e., ... His research interests are Cloud Computing and DNA Computing. He has published several papers in ...

  5. DNA Computing and Its Applications

    Today, many researchers all over the world concentrate on subjects either to improve available methods used in DNA computing or to suggest a new way to solve engineering or application problems with a DNA computing approach. This paper gives an overview of research achievements in DNA computing and touches on the achievements of improved ...

  6. DNA Computing: Principle, Construction, and Applications in Intelligent

    Demands on faster information processing speed and denser data storage are catalyzing new computation modes. DNA, as an important biomolecule that carries genetic information, has shown its potential in information processing and storage due to its predictable base pairings and nanoscale size for programmable and high-throughput coding, as well as computing.

  7. DNA Computing

    We present another significant milestone in DNA computing research, the first experiment that demonstrated that DNA computing devices can exceed the computational power of an unaided human. ... Moreover, in the paper, the authors show that an analogous result holds even for Watson-Crick automata, that is, the identity relation suffices for WK ...

  8. [The current status and future prospects of DNA computing]

    The DNA circuit, which is the basis for DNA computing, is an important technology for the regulation and processing of the molecular information. This review highlights the basic principles of DNA computing, summarizes the latest research progress, and concludes with a discussion of the challenges of DNA computing.

  9. (PDF) Propelling DNA Computing with Materials' Power: Recent

    Initially, the operating principles and functions of different logic devices (common logic gates, advanced arithmetic and non-arithmetic logic devices, versatile logic library, etc.) are ...

  10. Review of Research Challenges and Future of in DNA Computing

    2.3 Cellular DNA Computing Models Research Limitations. ... This paper discusses the significance of DNA in engineering and biomolecular devices. The field of DNA computing holds tremendous potential yet to be explored, owing to its applications in many distinct fields. However, DNA computing remains in its early stages ...

  11. DNA computing

    Computer scientists are joining forces with molecular biologists and chemists to explore the potential for computation using information-carrying biological polymers such as nucleic acids (DNA and RNA). DNA computing is a subset of molecular computing. The key feature of DNA for computing is its information content. The self-assembly properties of DNA suggest an indirect application to ...

  12. Introduction to DNA computing

    Experiment, theory, and implementation are the fields of DNA computing-based research and development. Unlike the conventional binary digits, i.e., 1 and 0, data or file in DNA computing is primarily expressed using four genetic alphabets, i.e., ... His research interests are Cloud Computing and DNA Computing. He has published several papers in ...

  13. (PDF) DNA computing: a review

    This paper will discuss evolution of cloud computing, components of cloud computing, different cloud services and types of cloud computing by reviewing over 10 research papers.

  14. (PDF) DNA Computing and Its Applications

    This paper gives an overview of research achievements in DNA computing and touches on the achievements of improved methods employed in DNA computing as well as in solving application problems. At ...

  15. DNA computing research progress and application

    In computing terms, the basic idea of DNA computing is to use the information-processing power of organic molecules instead of digital switching components. In recent years, DNA computing has become an important research area for solving complex problems. In this paper, alongside an analysis of the development of DNA computing, we introduce the working principle and mathematical model ...

  16. Emerging Approaches to DNA Data Storage: Challenges and Prospects

    A comparison between them that highlights both the similarities and differences in these approaches will provide an overview of the state of the art in DNA data storage. Finally, we also highlight the exciting potential applications of DNA data storage and manipulation, including archival storage, barcoding, cryptography, 11 and DNA computing ...

  17. DNA methylation remodeling and the functional implication during male

    DNA cytosine methylation in rice meiocytes, unicellular microspores, sperms. a Violin plots showing overall cytosine methylation levels (mCG, mCHG, and mCHH) in transposable elements (TE), transposable gene (TEG) and protein coding gene (Gene) of rice Zhonghua 11 (ZH11) meiocyte (Me), unicellular microspore (UM) and sperm (S). Values of the methylomes are averages from the two replicates.

  18. (PDF) DNA Computing Made Simple

    Abstract. DNA computing is essentially computation using biological molecules rather than traditional silicon chips. In recent years, DNA computing has been a research tool for solving complex ...

  19. Unlocking biomolecular intelligence

    Advances in DNA nanoengineering promise the development of new computing devices within biological systems, with applications in nanoscale sensing, diagnostics and therapeutics.

  20. Can language models read the genome? This one decoded mRNA to make

    Scientists have a simple way to summarize the flow of genetic information. They call it the central dogma of biology. Information moves from DNA to RNA to proteins. Proteins create the structures and functions of living cells. Messenger RNA, or mRNA, converts the information into proteins in that final step, called translation. But mRNA is ...

  21. DNA computing inspired deep networks design

    In this paper, we propose a DNA computing inspired method called DNAND for automatic neural architecture design. DNA molecules can be regarded as a set of sequences over the alphabet Ω = {A, G, C, T}, where "A, G, C, T" refer to the 4 types of nitrogenous bases: adenine, guanine, cytosine and thymine, respectively. The nitrogenous bases of two separate DNA strands are bound together to ...

  22. DNA Computing A Survey

    Researchers' work on computationally intensive problems such as the Hamiltonian path and Traveling Salesman problems drove the need for DNA computing. DNA computing is a secure and efficient way to solve computationally intensive problems. [1] Nowadays it is a significant area of research and technology. DNA encodes within itself huge information in a secure and efficient way hence providing a ...

  23. Ed Braun publishes in Nature AND PNAS on the same day!

    Ed's PNAS paper is entitled "A region of suppressed recombination misleads neoavian phylogenomics". Ed and his colleagues found that it is possible to use computational methods to detect a strong signal of chromosomal rearrangements that occurred about 65 million years ago in birds, near the time when dinosaurs went extinct and most ...

  24. New advances promise secure quantum computing at home

    The paper is titled "Verifiable blind quantum computing with trapped ions and single photons." ... Traces of DNA in the stomachs of predatory snails provide new insights into the ecology of ...