Abstract
The de Bruijn graph assembly approach breaks reads into k-mers before assembling them into contigs. The string graph approach forms contigs by connecting two reads with k or more overlapping nucleotides. Both approaches must deal with the following problems: false-positive vertices, due to erroneous reads; gap problem, due to non-uniform coverage; branching problem, due to erroneous reads and repeat regions. A proper choice of k is crucial but for single k there is always a trade-off: a small k favors the situation of erroneous reads and non-uniform coverage, and a large k favors short repeat regions.
We propose an iterative de Bruijn graph approach iterating from small to large k exploring the advantages of the in between values. Our IDBA outperforms the existing algorithms by constructing longer contigs with similar accuracy and using less memory, both with real and simulated data. The running time of the algorithm is comparable to existing algorithms.
Availability: IDBA is available at http://www.cs.hku.hk/~alse/idba/
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Wang, J., et al.: The diploid genome sequence of an Asian individual. Nature 456(7218), 60–65 (2008)
Chaisson, M.J., Brinza, D., Pevzner, P.A.: De novo fragment assembly with short mate-paired reads: Does the read length matter? Genome Res. 19(2), 336–346 (2009)
Warren, R.L., et al.: Assembling millions of short DNA sequences using SSAKE. Bioinformatics 23(4), 500–501 (2007)
Jeck, W.R., et al.: Extending assembly of short DNA sequences to handle error. Bioinformatics 23(21), 2942–2944 (2007)
Dohm, J.C., et al.: SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res. 17(11), 1697–1706 (2007)
Chaisson, M.J., Pevzner, P.A.: Short read fragment assembly of bacterial genomes. Genome Res. 18(2), 324–330 (2008)
Zerbino, D.R., Birney, E.: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18(5), 821–829 (2008)
Simpson, J.T., et al.: ABySS: a parallel assembler for short read sequence data. Genome Res. 19(6), 1117–1123 (2009)
Hernandez, D., et al.: De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. Genome Res. 18(5), 802–809 (2008)
Chaisson, M., Pevzner, P., Tang, H.: Fragment assembly with short reads. Bioinformatics 20(13), 2067–2074 (2004)
Butler, J., et al.: ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res. 18(5), 810–820 (2008)
Pevzner, P.A., Tang, H., Waterman, M.S.: An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. U.S.A. 98(17), 9748–9753 (2001)
Idury, R.M., Waterman, M.S.: A new algorithm for DNA sequence assembly. J. Comput. Biol. 2(2), 291–306 (1995)
Myers, E.W.: The fragment assembly string graph. Bioinformatics 21(suppl. 2), ii79–ii85 (2005)
Chin, F.Y., et al.: Finding optimal threshold for correction error reads in DNA assembling. BMC Bioinformatics 10(suppl. 1), S15 (2009)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Peng, Y., Leung, H.C.M., Yiu, S.M., Chin, F.Y.L. (2010). IDBA – A Practical Iterative de Bruijn Graph De Novo Assembler. In: Berger, B. (eds) Research in Computational Molecular Biology. RECOMB 2010. Lecture Notes in Computer Science(), vol 6044. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12683-3_28
Download citation
DOI: https://doi.org/10.1007/978-3-642-12683-3_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12682-6
Online ISBN: 978-3-642-12683-3
eBook Packages: Computer ScienceComputer Science (R0)