Sequence genome assembly is the process of reconstructing long DNA sequences, ideally complete chromosomes, from the short fragments (reads) produced by a sequencing instrument.
It is used in a variety of applications, such as mapping genetic variation, assembling genomes de novo, and discovering new organisms. Sequencing and assembly together are essential for reading the genetic code and for improving our knowledge of the evolution of life on Earth. Although sequencing has become much cheaper and faster thanks to technological advances, assembling the resulting reads into complete sequences remains difficult and computationally demanding. This article gives an overview of the field of sequence genome assembly, including its history, existing methods, and applications.
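The basic idea of assembling reads by their overlaps can be illustrated with a toy greedy assembler. It repeatedly merges the two sequences with the longest suffix-prefix overlap until no overlap of at least a minimum length remains. This is a minimal sketch that assumes error-free reads from a single strand, no read contained inside another, and a genome without repeats. The function names and the `min_len` threshold are illustrative choices, not taken from any particular assembler. Real data violate all of these assumptions, which is one reason assembly remains hard.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`, or 0 if under min_len."""
    # Try the longest candidate first so the first hit is the maximal overlap.
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair of sequences with the longest overlap until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep input order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_len)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            return contigs  # no overlap of at least min_len is left
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATTAGACC", "ACCTGCCG", "CCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is simple but makes irreversible choices, so a repeat longer than the overlap threshold can cause two unrelated regions to be joined.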
History
The history of sequence genome assembly dates back to the late 1970s, when the first DNA sequencing methods were developed. These methods were first applied to small viral genomes and later to bacterial genomes and to the genomes of larger organisms such as the yeast Saccharomyces cerevisiae. Since then, advances in sequencing technology have made it possible to assemble larger and more complex genomes, including those of humans and of many plant and animal species. The Human Genome Project was completed in 2003, laying the foundation for the current understanding of human genetics.
Types of Assembly
Sequence genome assembly is broadly classified into two types: de novo assembly and reference-guided assembly.
De novo assembly: De novo assembly builds a sequence from its component reads without a reference genome, relying only on the overlaps between the reads themselves. It is used to assemble the genome of a species for which no reference exists. It is the most difficult and computationally demanding type of assembly, because there is no existing reference to guide the process.
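Many short-read de novo assemblers, including Velvet, SOAPdenovo2, and ABySS, are based on a de Bruijn graph. Every k-mer (substring of length k) in the reads becomes an edge from its first (k-1)-mer to its last (k-1)-mer, and the genome corresponds to a path that uses every edge exactly once (an Eulerian path). The sketch below shows that idea under strong simplifying assumptions: error-free reads, a single strand, every k-mer of the genome present in some read, and no repeated (k-1)-mers, so that the path is unique. The helper names are hypothetical. Real assemblers also have to handle sequencing errors, reverse complements, coverage, and repeats.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read's k-mers."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: dict[str, list[str]] = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the output deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph: dict[str, list[str]]) -> list[str]:
    """Hierholzer's algorithm: return a walk that uses every edge exactly once."""
    in_degree: dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1
    # A path starts at the node with one more outgoing than incoming edge, if any.
    start = next((node for node in graph if len(graph[node]) - in_degree[node] == 1),
                 next(iter(graph)))
    remaining = {node: list(successors) for node, successors in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # no unused edges left: node is finished
    return path[::-1]


def spell(path: list[str]) -> str:
    """Glue consecutive (k-1)-mers, which overlap by k-2 characters, into one sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATTAGACC", "ACCTGCCG", "CCGGAATAC"]
print(spell(eulerian_path(de_bruijn_graph(reads, k=4))))  # ATTAGACCTGCCGGAATAC
```

Unlike the all-against-all overlap search of a greedy assembler, building the graph only requires collecting k-mers, which is one reason this design scales to the very large read sets produced by short-read sequencers.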
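Reference-guided assembly: This method aligns reads to a known reference genome, usually from the same or a closely related species, and uses it to reconstruct the target sequence. It is typically faster and less computationally demanding than de novo assembly, because the reference supplies the overall order of the sequence; however, regions that differ strongly from the reference, such as large insertions or rearrangements, can be missed or misassembled. This method is often used to map genetic variation and to assemble the genomes of new individuals or strains of species that already have a reference.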
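A reference-guided assembly can be sketched in two steps. First, place each read at the position on the reference where it matches best. Second, take the majority base at every position (a consensus). Comparing the consensus with the reference then reveals variants. This minimal sketch assumes reads on the same strand as the reference, reads no longer than the reference, and substitutions only (no insertions or deletions). It also uses a brute-force mismatch count instead of the indexed aligners that real pipelines use. All names are illustrative.

```python
from collections import Counter


def best_offset(read: str, reference: str) -> int:
    """Offset on the reference where the read has the fewest mismatches (ties: leftmost)."""
    best_pos, best_mismatches = 0, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos


def reference_guided_consensus(reads: list[str], reference: str) -> str:
    """Pile reads onto the reference and call the majority base at each position."""
    columns = [Counter() for _ in reference]
    for read in reads:
        pos = best_offset(read, reference)
        for offset, base in enumerate(read):
            columns[pos + offset][base] += 1
    # Positions that no read covers fall back to the reference base.
    return "".join(column.most_common(1)[0][0] if column else ref_base
                   for column, ref_base in zip(columns, reference))


def differences(consensus: str, reference: str) -> list[tuple[int, str, str]]:
    """(position, reference base, consensus base) for every substitution."""
    return [(i, r, c) for i, (r, c) in enumerate(zip(reference, consensus)) if r != c]


reference = "ATTAGACCTGCCGGAATAC"
reads = ["ATTAGACC", "ACCTACCG", "CCTACCGG", "CCGGAATAC"]  # sample carries G->A at position 9
consensus = reference_guided_consensus(reads, reference)
print(consensus)                          # ATTAGACCTACCGGAATAC
print(differences(consensus, reference))  # [(9, 'G', 'A')]
```

The majority vote at each position means a single read with a sequencing error is outvoted by the other reads covering the same position, which is why reasonably deep coverage helps.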
Tools for Assembly
There are a variety of tools and software packages for performing sequence genome assembly. Widely used de novo assemblers include Velvet, ALLPATHS-LG, SGA, and SOAPdenovo2.
• Velvet: This popular open-source assembler builds a de Bruijn graph from short reads and is widely used for de novo assembly of small genomes, such as those of bacteria.
• ALLPATHS-LG: This de novo assembler, developed at the Broad Institute, is designed for large genomes, including mammalian genomes, sequenced with short paired-end reads.
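• SGA (String Graph Assembler): This de novo assembler builds a string graph of read overlaps, using a compressed FM-index of the reads to keep memory use low enough for large genomes.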
• SOAPdenovo2: This de Bruijn graph assembler from BGI is designed for short-read de novo assembly of large genomes, such as human and plant genomes.
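SGA's name refers to the string graph: an overlap graph in which each read is a node, an edge joins two reads when a suffix of one matches a prefix of the other, and edges that can be inferred from a two-step path are removed (transitive reduction). The sketch below builds such a graph with brute-force exact overlaps and a simplified two-step transitive reduction. It is not SGA's FM-index-based algorithm, and it assumes error-free, single-strand reads with no repeats, so that the reduced graph is one unbranched chain. The function names and the `min_len` parameter are illustrative.

```python
def longest_overlap(a: str, b: str, min_len: int) -> int:
    """Longest proper suffix of `a` equal to a prefix of `b` (at least min_len), else 0."""
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def overlap_graph(reads: list[str], min_len: int) -> dict[tuple[int, int], int]:
    """Edge (i, j) -> overlap length whenever read i's suffix matches read j's prefix."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                length = longest_overlap(a, b, min_len)
                if length:
                    edges[(i, j)] = length
    return edges


def transitive_reduction(edges: dict[tuple[int, int], int]) -> dict[tuple[int, int], int]:
    """Drop edge u->w whenever u->v and v->w also exist (a simplified reduction)."""
    successors: dict[int, set[int]] = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)
    return {
        (u, w): length
        for (u, w), length in edges.items()
        if not any(w in successors.get(v, set()) for v in successors[u] if v != w)
    }


def spell_chain(reads: list[str], edges: dict[tuple[int, int], int]) -> str:
    """Walk an unbranched chain of reads, appending the non-overlapping part of each."""
    next_read = {u: (v, length) for (u, v), length in edges.items()}
    has_incoming = {v for _, v in edges}
    node = next(i for i in range(len(reads)) if i not in has_incoming)
    sequence = reads[node]
    while node in next_read:
        node, length = next_read[node]
        sequence += reads[node][length:]
    return sequence


reads = ["ATTAGA", "TAGACC", "GACCTG", "CCTGCC", "TGCCGG"]
reduced = transitive_reduction(overlap_graph(reads, min_len=2))
print(sorted(reduced))             # [(0, 1), (1, 2), (2, 3), (3, 4)]
print(spell_chain(reads, reduced))  # ATTAGACCTGCCGG
```

Before the reduction, read 0 also overlaps read 2, and so on. Removing those redundant edges leaves a graph whose unbranched paths can be read off directly as contigs.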
Additional Tools
In addition to the assemblers above, several related tools support assembly pipelines, parameter tuning, scaffolding, large-genome assembly, and transcript assembly. These include AMOS, VelvetOptimiser, Ragout, ABySS, and Cufflinks.
• AMOS: This open-source, modular framework provides libraries and programs for building assembly pipelines, including components for both de novo and comparative (reference-guided) assembly.
• VelvetOptimiser: This script automatically searches for the Velvet parameters, such as the k-mer (hash) length and the coverage cutoff, that give the best assembly.
• Ragout: This tool uses one or more related reference genomes to order and orient the contigs of a draft assembly into larger scaffolds, bringing it closer to a complete genome.
• ABySS: This de Bruijn graph assembler can distribute its work across many computers, which allows it to assemble large genomes, such as the human genome, from short reads.
• Cufflinks: Unlike the genome assemblers above, this package assembles transcripts from RNA-Seq reads that have already been aligned to a reference genome, and it estimates their abundance.
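The core idea behind reference-assisted scaffolding, using a related genome to decide the order and orientation of contigs, can be illustrated with a much simpler sketch than Ragout itself. It finds each contig (or its reverse complement) in a single reference, sorts the contigs by position, and joins them with runs of `N` whose length matches the gap on the reference. Ragout combines several references and is far more sophisticated. This sketch assumes exact, unique, non-overlapping matches, and its function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """The sequence of the opposite DNA strand, read in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def scaffold(contigs: list[str], reference: str, gap_char: str = "N") -> str:
    """Order and orient contigs by where they occur on the reference, filling gaps with N."""
    placed = []
    # Try each contig as given, then reverse-complemented; contigs found on
    # neither strand are simply left out of the scaffold in this sketch.
    for contig in contigs:
        for oriented in (contig, reverse_complement(contig)):
            pos = reference.find(oriented)
            if pos != -1:
                placed.append((pos, oriented))
                break
    placed.sort()
    result, end = "", 0
    for pos, seq in placed:
        if result:
            # Gap length mirrors the reference distance; at least one N marks the join.
            result += gap_char * max(pos - end, 1)
        result += seq
        end = pos + len(seq)
    return result


contigs = ["GGAATAC", "CTAAT", "CCTG"]  # "CTAAT" lies on the opposite strand
print(scaffold(contigs, "ATTAGACCTGCCGGAATAC"))  # ATTAGNCCTGNNGGAATAC
```

The `N` runs record that the order and approximate spacing of the contigs are known even though the bases between them are not.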
Applications
Sequence genome assembly is used for a variety of applications, including:
• Mapping genetic variation: This involves comparing sequence data from different individuals or species, usually against a reference, to identify variants such as single-nucleotide changes, insertions, deletions, and larger structural changes. This is important for understanding how mutations, including those that alter gene expression, lead to biological differences.
• De novo genome assembly: This involves assembling a complete genome, or segments of a genome, directly from the reads produced by sequencing. It is used to assemble the genomes of species that have not been sequenced before.
• Finding new organisms: This involves using sequence data to detect previously uncharacterized organisms in a sample. It is often used to study new species of bacteria or viruses present in a given environment.
• Comparative genomics: This involves comparing the genomes of different species to understand their evolutionary relationships. It enables scientists to study an organism’s evolutionary history and its adaptations to different environments.
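Two of these applications can be illustrated with k-mer sets, the sets of all substrings of length k. For comparative genomics, the Jaccard similarity of two genomes' k-mer sets (shared k-mers divided by all distinct k-mers) gives a rough, alignment-free measure of relatedness. For finding new organisms, reads that share few k-mers with every known reference are candidates for an uncharacterized organism and could then be assembled de novo. This is a minimal sketch. The function names, the choice of k, and the `min_shared` threshold are hypothetical, sequences are assumed to be at least k bases long, and real tools rely on much larger reference databases and more careful statistics.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int) -> float:
    """Shared k-mers divided by all distinct k-mers of the two sequences."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    return len(ka & kb) / len(ka | kb)


def flag_unknown_reads(reads: list[str], references: dict[str, str],
                       k: int, min_shared: float = 0.5) -> list[str]:
    """Reads whose k-mers are mostly absent from every known reference genome."""
    known = set().union(*(kmer_set(genome, k) for genome in references.values()))
    flagged = []
    for read in reads:
        kmers = kmer_set(read, k)
        if kmers and len(kmers & known) / len(kmers) < min_shared:
            flagged.append(read)
    return flagged


genome_1 = "ATTAGACCTGCCGGAATAC"
genome_2 = "ATTAGACCTACCGGAATAC"  # differs from genome_1 at one position
print(jaccard(genome_1, genome_2, k=4))  # 0.6
print(flag_unknown_reads(["ATTAGACCTG", "GGGTTTCCCAAA"], {"known": genome_1}, k=5))
# ['GGGTTTCCCAAA']
```

A single substitution changes every k-mer that covers it, which is why two genomes differing at one position already share only 12 of their 20 distinct 4-mers here.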
In conclusion, sequence genome assembly is a powerful tool for understanding genetic variation and for assembling new genomes. The field has come a long way since its beginnings in the late 1970s, and today a variety of tools are available for both de novo and reference-guided assembly. These tools support applications such as mapping genetic variation, de novo genome assembly, finding new organisms, and comparative genomics. As technology continues to progress, sequence genome assembly is likely to become even more accurate and efficient, and it will continue to play an essential role in understanding the evolution of life on Earth.