Thursday, November 1, 2012

Sequencing Technologies: Recent Advances and Implications for the Future

1.   Introduction
Sequencing in genetics is the determination of the primary structure of biopolymers such as nucleic acids and proteins. Several sequencing techniques have been developed and extensively commercialised. Companies worldwide now offer sequencing services, reagents, sequencers and analytical services to research and industry as a ‘commodity’.
Recent developments in sequencing, annotation and sequence-based technologies, supported by bioinformatics, are driving a step-wise revolution in our knowledge of the biological sciences as a whole. They are bound to bridge the gap between classical (forward) and reverse approaches in genetics and to deepen our understanding of different aspects of genotype-phenotype correlation (Figure 1).
Figure 1 Classical/forward and reverse approaches in genetics (Nieduszynski and Liti, 2011).
2.   Sequencing Technologies Past, Present and Future
2.1. Protein sequencing
In 1902 Fischer and Hofmeister proposed that proteins are formed by peptide bonds between amino acids in a linear structure (Lichtenthaler, 2002); it took another half century to decipher an amino acid sequence. Most proteins fold into unique three-dimensional structures, which are as important to their function as the amino acid sequence. Edman degradation, peptide mass fingerprinting, protease digests and mass spectrometry can be used for protein sequencing, the last being the most advanced and widely used. However, thanks to the central dogma and the genetic code, it is much easier to infer the protein sequence when the gene encoding it is known, and vice versa. Large amounts of proteomic data are now available for diverse organisms, allowing researchers to predict secondary structure, identify homologous proteins efficiently by sequence alignment, construct phylogenetic trees and so on.
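The inference from gene to protein mentioned above can be illustrated with a minimal Python sketch. It assumes the standard genetic code and a coding sequence that starts in frame; the function name `translate` and the example sequence are illustrative only and are not taken from any of the cited works.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA coding sequence into one-letter amino acids.

    Translation stops at the first stop codon; trailing bases that do not
    form a complete codon are ignored. Characters other than A, C, G and T
    raise a KeyError.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical example sequence: ATG (Met), GCC (Ala), TGG (Trp), TAA (stop).
print(translate("ATGGCCTGGTAA"))
```

Going the other way is less direct: most amino acids are encoded by more than one codon, so a given protein sequence is compatible with many possible gene sequences.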
2.2. Nucleic acid sequencing
The first attempts to decipher the nucleotide sequence of nucleic acids were made on bacteriophage RNA in 1969, and eventually led to the sequencing of the first complete gene in 1972 and the first complete genome in 1976 (Adams et al., 1969; Min Jou et al., 1972; Fiers et al., 1976). The major constraints at that time were the difficulty of purification and the large size of the polymers. The milestones in classical sequencing approaches are summarised in Table 1.
Table 1 Classical Sequencing Techniques
There have been remarkable improvements in nucleic acid sequencing technologies and data-production pipelines in recent years. Today companies offer overnight DNA sequencing services with read lengths of thousands of bases. Figure 2 shows the trend in DNA sequencing costs, which the National Human Genome Research Institute (NHGRI) tracks to assess improvements in DNA sequencing technologies.
Figure 2 Cost per mega-base of DNA sequence; Cost per genome 2001 – 2012 (www.genome.gov/)
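The sudden and profound out-pacing of Moore's Law in 2008 marks the transition of sequencing centres from Sanger-based to 'second-generation' or 'next-generation' DNA sequencing technologies. Moore's Law, which describes the doubling of computing power roughly every two years, is widely used in information technology as a benchmark: a technology that keeps pace with it is regarded as advancing exceptionally well.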
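To make the comparison with Moore's Law concrete, the following Python sketch projects what a cost would be if it merely halved every two years, which is how a Moore's Law trend line is commonly drawn. The function name and the starting value are hypothetical placeholders, not NHGRI figures.

```python
def moores_law_cost(start_cost, years_elapsed, halving_period_years=2.0):
    """Cost after `years_elapsed` years if it halves every `halving_period_years`."""
    return start_cost * 0.5 ** (years_elapsed / halving_period_years)


# Hypothetical starting cost of 100 units; these are not measured values.
for years in (0, 2, 4, 6):
    print(years, moores_law_cost(100.0, years))
```

A technology whose observed cost falls below such a curve is improving faster than Moore's Law, which is the pattern Figure 2 shows for sequencing from 2008 onwards.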
2.3. Next Generation Sequencing (NGS)
Next-generation high-throughput sequencing (HT-NGS) technologies were developed to overcome the limitations of the earlier technologies. They offer higher speed, less labour and lower cost.
The 454 FLX Pyro-sequencer from Roche Applied Science was the first next-generation sequencer to become commercially available, in 2004. It was followed by the Solexa 1G Genetic Analyzer from Illumina in 2006, the SOLiD (Supported Oligonucleotide Ligation and Detection) System from Applied Biosystems in 2007 and the HeliScope from Helicos BioSciences in 2008. These HT-NGS platforms use different detection principles, as summarised in Table 2.
Table 2 Second/next generation sequencing techniques
The last decade saw a race between these industrial giants for improved sequencing technologies. In 2006, the X Prize Foundation, in collaboration with the J. Craig Venter Science Foundation, established the Archon X Prize for Genomics, a US$10 million award to “the first Team that can build a device and use it to sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 1,000,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $1,000 per genome” (http://www.xprize.org). The major players in the field have been upgrading their technologies and cutting their prices ever since. However, the X Prize still remains unclaimed.
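The targets quoted in the prize rules translate into a few simple figures. The Python sketch below works them out, assuming a human genome of roughly 3 billion bases; that genome size and all of the names used are assumptions introduced here for illustration and are not part of the prize text.

```python
# Targets quoted from the prize rules.
GENOMES = 100
DAYS = 10
BASES_PER_ALLOWED_ERROR = 1_000_000   # at most one error per this many bases
MIN_COVERAGE_PERCENT = 98
MAX_COST_PER_GENOME_USD = 1_000

# Assumption for illustration: approximate size of one human genome in bases.
ASSUMED_GENOME_SIZE = 3_000_000_000


def xprize_targets(genome_size=ASSUMED_GENOME_SIZE):
    """Return (genomes per day, bases covered, error budget, total cost in USD).

    The error budget is computed for the minimum required coverage.
    """
    genomes_per_day = GENOMES // DAYS
    bases_covered = genome_size * MIN_COVERAGE_PERCENT // 100
    max_errors_per_genome = bases_covered // BASES_PER_ALLOWED_ERROR
    total_cost_usd = GENOMES * MAX_COST_PER_GENOME_USD
    return genomes_per_day, bases_covered, max_errors_per_genome, total_cost_usd


print(xprize_targets())
```

Under these assumptions the rules imply an average of 10 genomes per day, at most about 2,940 errors across the covered portion of each genome, and a recurring cost of no more than $100,000 for the full set of 100 genomes.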
Today, different sequencing technologies are recommended for specific needs. Roche 454 is preferred for ultra-deep sequencing and for producing reference genomes in whole-genome or transcriptome sequencing, whereas SOLiD is recommended for targeted re-sequencing. The automated Sanger method continues to be used for sequencing PCR products and plasmids, and for gap closure or finishing of genomes.
Third-generation sequencing technologies are now being reported. These 'next-next-generation' technologies include nanopore sequencing, in which a nanopore is immersed in a conducting fluid and a potential is applied across it so that the characteristic electric current produced by ions passing through the pore can be detected; real-time monitoring of PCR activity through fluorescence resonance energy transfer (FRET); single-molecule real-time sequencing using zero-mode waveguides; pay-as-you-go sequencing; and direct-to-consumer whole-genome sequencing (Clarke et al., 2009; Wabuyele et al., 2003; Levene et al., 2003; Pollack, 2012; Vorhaus, 2012). Scalability, simplicity, efficiency and economy are their key features, with real-time results as the principal objective.
3.   Implications for the future
With the recent advances in sequencing technologies, the amount that can be sequenced per unit time and per unit cost is increasing by the day. Improvements in the speed, accuracy and availability of high-throughput DNA sequencing technologies have caused a meteoric rise in the volume of ‘omic’ sequences available in the public domain (Figure 3). Ongoing research is developing virtual environments for exploring genomic space at the gene, protein, function and pathway-network levels. The large volumes of data thus generated are creating powerful resources for scientific research in all areas of life; a few cases are described below (Collins et al., 2003).

Figure 3 Number of genome sequences in GenBank and WGS databases; Number of protein sequences in public domain (http://www.ncbi.nlm.nih.gov/genbank/statistics, Galperin and Koonin, 2010).
3.1. Genomic Medicine
Unprecedented progress in genomics is elucidating the genetic and genomic basis of health, illness, disease risk and treatment response, with applications in both biomedical research and clinical medicine. This could revolutionise healthcare through earlier diagnosis, identification of the genetic factors associated with diseases, more effective prevention, designer drugs, treatments tailored to the individual and avoidance of drug side effects. Genomic medicine brings humanity closer to offering a better quality of life to high-risk individuals and to finding cures for many life-threatening diseases such as cancer (Guttmacher and Collins, 2002).
3.2. Agri-genomic Revolution
The genomics revolution is at the core of plant and animal breeding. Complete genome sequences of model plants and crops (for example Arabidopsis, rice, wheat and date palm), the availability of omic databases, and high-throughput, parallel approaches to mutation analysis allow us to understand the function of genes in terms of their relationship to the phenotype. These technologies may soon be able to decipher the relationship between genetic variation in gene sequences and phenotypic variation in traits, rather than just between a gene and a mutant phenotype. The genomic approach may also help in studying quantitative trait variation and the molecular diversity of genes.
New approaches to QTL mapping, quantitative trait nucleotide (QTN) identification, candidate-gene analysis and whole-genome scans have emerged from the recent advances in sequencing. Association studies based on existing populations or germplasm collections will be a major advance for species in which experimental populations are difficult to obtain, e.g. oil palm. Future prospects lie in improved plant breeding efficiency through Marker Assisted Selection (MAS), identification of new trait-supporting alleles in wild germplasm, targeted mutagenesis and more (Morgante and Salamini, 2003).
In animal breeding, genome-wide SNP panels are now available for an increasing number of livestock species, enabling breeders to determine a genomic estimated breeding value cost-effectively and accurately, making traditional approaches obsolete and revolutionising global livestock industries.
3.3. Ecology and Evolution Studies
Comparative genome analysis in a phylogenetic context can provide the most meaningful insights into both germplasm characterisation and the processes of evolution. Genomic and metagenomic sequencing techniques are beginning to reshape the study of ecology and evolution, starting with our understanding of Bacteria and Archaea. NGS technologies have the potential to bring the genomics revolution to whole populations, and to endangered species and species of ecological and evolutionary importance (Hudson, 2008; Shokralla et al., 2012).
3.4. Practical Difficulties
Biology is in the middle of a paradigm shift towards becoming a fully data-driven science. Analysing the growing volume of gene expression data produced by the various post-genomics technologies will be a challenge, both in generating the necessary annotations and in providing large-scale computational support.
3.5. Public Concerns
Sequencing technologies have improved our understanding of the genetic makeup of living organisms. However, many aspects of public policy must be addressed before such advances can be put into practice. Concerns regarding privacy, discrimination, biological terrorism, equitable access, intellectual property, validation of tests and products, ethics, economics and public awareness are only a few of them.
4.   Conclusion
NGS technologies provide practical, massively parallel sequencing at lower cost and without the need for large, automated facilities, making genome and transcriptome sequencing and re-sequencing possible for small and large endeavours alike in research and practice (Morozova and Marra, 2008). Still, many ethical, legal and social issues surround access to genetic information.
The ramifications of the rapid advances in sequencing technology for our daily lives will surely be profound and lasting, even though they are unpredictable. As Eric Lander[1] reflected, “it was easier to predict 10 years ago what we will be doing today than to predict today what is going to be possible in a few years’ time”.
5.   References
Adams, J. M., Jeppesen, P. G. N., Sanger, F., Barrell, B. G. (1969). "Nucleotide sequences from fragments of R17 bacteriophage RNA." Cold Spring Harbor Symposia on Quantitative Biology 34: 611-620. Cold Spring Harbor Laboratory Press.
Brenner, S., Johnson, M., Bridgham, J., Golda, G., Lloyd, D. H., Johnson, D., .. Corcoran, K. (2000). "Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays". Nature Biotechnology 18 (6): 630–634.
Clarke, J., Wu, H. C., Jayasinghe, L., Patel, A., Reid, S., Bayley, H. (2009). "Continuous base identification for single-molecule nanopore DNA sequencing." Nature Nanotechnology 4(4): 265-270.
Collins, F. S., Green, E. D., Guttmacher, A. E., Guyer, M. S. (2003). "A vision for the future of genomics research." Nature 422(6934): 835-847.
Commins, J., Toft, C., Fares, M. A. (2009). "Computational Biology Methods and Their Application to the Comparative Genomics of Endocellular Symbiotic Bacteria of Insects." Biol. Procedures Online.
Fiers, W., Contreras, R., Duerinck, F., Haegeman, G., Iserentant, D., Merregaert, J., Min Jou, W., Molemans, F., Raeymaekers, A., van den Berghe, A., Volckaert, G., Ysebaert, M. (1976). "Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene". Nature 260 (5551): 500–7.
Galperin, M. Y., Koonin, E. V. (2010). "From complete genome sequence to ‘complete’ understanding?" Trends in Biotechnology 28(8): 398-406.
Guttmacher, A. E., & Collins, F. S. (2002). Genomic medicine—a primer. New England Journal of Medicine 347(19): 1512-1520.
Hudson, M. E. (2008). "Sequencing breakthroughs for genomic ecology and evolutionary biology." Molecular Ecology Resources 8(1): 3-17.
Levene, M. J., Korlach, J., Turner, S. W., Foquet, M., Craighead, H. G., Webb, W. W. (2003). "Zero-mode waveguides for single-molecule analysis at high concentrations." Science 299(5607): 682-686.
Lichtenthaler, F. W. (2002). "Emil Fischer, his personality, his achievements, and his scientific progeny." European Journal of Organic Chemistry 2002(24): 4095-4122.
Margulies, M., Egholm, M., Altman, W. E., Attiya, S., Bader, J. S., Bemben, L. A.,.. Volkmer, G. A. (2005). "Genome Sequencing in Open Microfabricated High Density Picoliter Reactors". Nature 437 (7057): 376–80.
Maxam, A.M., Gilbert, W. (1977). "A new method for sequencing DNA". Proceedings of the National Academy of Sciences of the United States of America 74 (2): 560–4.
Min Jou, W., Haegeman, G., Ysebaert, M., Fiers, W. (1972). "Nucleotide sequence of the gene coding for the bacteriophage MS2 coat protein". Nature 237
Morgante, M., Salamini, F. (2003). "From plant genomics to breeding practice." Current Opinion in Biotechnology 14(2): 214-219.
Morozova, O., Marra, M. A. (2008). “Applications of next-generation sequencing technologies in functional genomics.” Genomics 92(5): 255-264.
NCBI Genbank Statistics <http://www.ncbi.nlm.nih.gov/genbank/statistics>, [accessed on 20/02/2013]
Nieduszynski, C. A., Liti, G. (2011). "From sequence to function: insights from natural variation in budding yeasts." Biochimica et Biophysica Acta (BBA) - General Subjects 1810(10): 959-966.
“DNA Sequencing Costs.” NHGRI statistics <http://www.genome.gov/sequencingcosts/> [accessed on 20/02/2012]
Pollack, A. (2012). "Company Unveils DNA Sequencing Device Meant to Be Portable, Disposable and Cheap." The New York Times, 17 February 2012 <http://www.nytimes.com/2012/02/18/health/oxford-nanopore-unveils-tiny-dna-sequencing-device.html?_r=0>
Ronaghi, M., Uhlén, M., Nyrén, P. (1998). "A sequencing method based on real-time pyrophosphate." Science 281(5375): 363-365.
Rothberg, J. M., Hinz, W., Rearick, T. M., Schultz, J., Mileski, W., Davey, M., ...Bustillo, J. (2011). "An integrated semiconductor device enabling non-optical genome sequencing." Nature 475(7356): 348-352.
Sanger, F., Nicklen, S., Coulson, A.R. (1977). "DNA sequencing with chain-terminating inhibitors". Proceedings of the National Academy of Sciences of the United States of America 74 (12): 5463–7.
Sanger, F., Coulson, A.R. (1975). "A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase". Journal of Molecular Biology 94 (3): 441–448.
Shendure, J., Porreca, G. J., Reppas, N. B., Lin, X., McCutcheon, J. P., Rosenbaum, A. M., Wang, M.D., Zhang, K., Mitra, R. D., Church, G. M. (2005). "Accurate multiplex polony sequencing of an evolved bacterial genome." Science 309(5741): 1728-1732.
Shokralla, S., Spall, J. L., Gibson, J. F., Hajibabaei, M. (2012). “Next‐generation sequencing technologies for environmental DNA research.” Molecular Ecology 21(8): 1794-1805.
Staden, R (1979). "A strategy of DNA sequencing employing computer programs.". Nucleic Acids Research 6 (7): 2601–10.
Vorhaus, D. (2012). "DNA DTC: The Return of Direct to Consumer Whole Genome Sequencing". genomicslawreport.com, 29 November 2012.
Wabuyele, M. B., Farquar, H., Stryjewski, W., Hammer, R. P., Soper, S. A., Cheng, Y. W., Barany, F. (2003). "Approaching real-time molecular diagnostics: single-pair fluorescence resonance energy transfer (spFRET) detection for the analysis of low abundant point mutations in K-ras oncogenes." Journal of the American Chemical Society 125(23): 6937-6945.



[1] in his Genetics Society Mendel Medal Lecture at the Fourth International Conference of Quantitative Genetics 2012, Edinburgh
