MERS Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Arapan-S: a fast and highly accurate whole-genome assembly software for viruses and small genomes

Internal identifier: 000961 (Ncbi/Merge); previous: 000960; next: 000962

Authors: Mohammed Sahli; Tetsuo Shibuya

Source: BMC Research Notes (eISSN 1756-0500), 2012

RBID : PMC:3441218

French descriptors

English descriptors

Abstract

Background

Genome assembly is considered to be a challenging problem in computational biology, and has been studied extensively by many researchers. It is extremely difficult to build a general assembler that is able to reconstruct the original sequence instead of many contigs. However, we believe that creating specific assemblers, for solving specific cases, will be much more fruitful than creating general assemblers.

Findings

In this paper, we present Arapan-S, a whole-genome assembly program dedicated to handling small genomes. It provides only one contig (along with the reverse complement of this contig) in many cases. Although genomes consist of a number of segments, the implemented algorithm can detect all the segments, as we demonstrate for Influenza Virus A. The Arapan-S program is based on the de Bruijn graph. We have implemented a very sophisticated and fast method to reconstruct the original sequence and neglect erroneous k-mers. The method explores the graph by using neither the shortest nor the longest path, but rather a specific and reliable path based on the coverage level or k-mers’ lengths. Arapan-S uses short reads, and it was tested on raw data downloaded from the NCBI Trace Archive.
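
The idea of walking a de Bruijn graph along its best-covered path can be illustrated with a toy example. The following is only a minimal sketch and not the Arapan-S implementation, whose internals are not described in this abstract. All function names, the min_coverage parameter, and the toy genome and reads are assumptions made for illustration. The sketch counts k-mers, treats each k-mer as an edge between two (k-1)-mers, and at every branch follows the unused edge with the highest coverage, so that a k-mer produced by a sequencing error (seen only once) is bypassed.

```python
from collections import Counter, defaultdict


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads (its coverage)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_graph(kmer_counts, min_coverage):
    """Nodes are (k-1)-mers; each retained k-mer is an edge weighted by coverage."""
    graph = defaultdict(dict)
    for kmer, cov in kmer_counts.items():
        if cov >= min_coverage:  # hypothetical threshold, not taken from the paper
            graph[kmer[:-1]][kmer[1:]] = cov
    return graph


def find_starts(graph):
    """Nodes with outgoing edges but no incoming edge: candidate contig starts."""
    targets = {nxt for edges in graph.values() for nxt in edges}
    return sorted(node for node in graph if node not in targets)


def walk_by_coverage(graph, start):
    """Greedy walk: at each node follow the unused edge with the highest coverage."""
    contig, node, used = start, start, set()
    while True:
        candidates = [
            (cov, nxt)
            for nxt, cov in graph.get(node, {}).items()
            if (node, nxt) not in used
        ]
        if not candidates:
            return contig
        _, nxt = max(candidates)
        used.add((node, nxt))  # never reuse an edge, so the walk terminates
        contig += nxt[-1]
        node = nxt


def assemble(reads, k, min_coverage=1):
    """Return one contig per start node (several starts suggest several segments)."""
    graph = build_graph(count_kmers(reads, k), min_coverage)
    return [walk_by_coverage(graph, start) for start in find_starts(graph)]


# Toy data: error-free reads (each seen three times) plus one read with a substitution.
toy_genome = "ATGGCGTGCAATCCGTA"
toy_reads = [toy_genome[i:i + 9] for i in range(len(toy_genome) - 8)] * 3
toy_reads.append("CGTGTAATC")  # erroneous copy of CGTGCAATC
contigs = assemble(toy_reads, k=5)
print(contigs, [reverse_complement(c) for c in contigs])
```

In this sketch the erroneous branch stays in the graph but is never taken, because its edges have coverage 1 while the true edges are covered by several reads. A circular genome would have no start node under these assumptions, so a real assembler needs a different seeding rule.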

Conclusions

Our findings show that the accuracy of the assembly was very high; the result was checked against the European Bioinformatics Institute (EBI) database using the NCBI BLAST Sequence Similarity Search. The identity and the genome coverage were more than 99%. We also compared the efficiency of Arapan-S with other well-known assemblers. In dealing with small genomes, the accuracy of Arapan-S is significantly higher than the accuracy of other assemblers. The assembly process is very fast and requires only a few seconds.

Arapan-S is available for free to the public. The binary files for Arapan-S are available through http://sourceforge.net/projects/dnascissor/files/.

Electronic supplementary material

The online version of this article (doi:10.1186/1756-0500-5-243) contains supplementary material, which is available to authorized users.


URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441218
DOI: 10.1186/1756-0500-5-243
PubMed: 22591859
PubMed Central: 3441218

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Arapan-S: a fast and highly accurate whole-genome assembly software for viruses and small genomes</title>
<author>
<name sortKey="Sahli, Mohammed" sort="Sahli, Mohammed" uniqKey="Sahli M" first="Mohammed" last="Sahli">Mohammed Sahli</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution-id institution-id-type="ISNI">000000012151536X</institution-id>
<institution>Department of Computer Science, Graduate School of Information Science and Technology,</institution>
<institution>University of Tokyo,</institution>
</institution-wrap>
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033 Japan</nlm:aff>
<wicri:noCountry code="subfield">Tokyo 113-0033 Japan</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Shibuya, Tetsuo" sort="Shibuya, Tetsuo" uniqKey="Shibuya T" first="Tetsuo" last="Shibuya">Tetsuo Shibuya</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution-id institution-id-type="ISNI">000000012151536X</institution-id>
<institution>Human Genome Center, Institute of Medical Science,</institution>
<institution>University of Tokyo,</institution>
</institution-wrap>
4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639 Japan</nlm:aff>
<wicri:noCountry code="subfield">Tokyo 108-8639 Japan</wicri:noCountry>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22591859</idno>
<idno type="pmc">3441218</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441218</idno>
<idno type="RBID">PMC:3441218</idno>
<idno type="doi">10.1186/1756-0500-5-243</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000336</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000336</idno>
<idno type="wicri:Area/Pmc/Curation">000336</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000336</idno>
<idno type="wicri:Area/Pmc/Checkpoint">001309</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">001309</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:22591859</idno>
<idno type="wicri:Area/PubMed/Corpus">001D78</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001D78</idno>
<idno type="wicri:Area/PubMed/Curation">001D78</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001D78</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001D04</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">001D04</idno>
<idno type="wicri:Area/Ncbi/Merge">000961</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Arapan-S: a fast and highly accurate whole-genome assembly software for viruses and small genomes</title>
<author>
<name sortKey="Sahli, Mohammed" sort="Sahli, Mohammed" uniqKey="Sahli M" first="Mohammed" last="Sahli">Mohammed Sahli</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution-id institution-id-type="ISNI">000000012151536X</institution-id>
<institution>Department of Computer Science, Graduate School of Information Science and Technology,</institution>
<institution>University of Tokyo,</institution>
</institution-wrap>
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033 Japan</nlm:aff>
<wicri:noCountry code="subfield">Tokyo 113-0033 Japan</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Shibuya, Tetsuo" sort="Shibuya, Tetsuo" uniqKey="Shibuya T" first="Tetsuo" last="Shibuya">Tetsuo Shibuya</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution-id institution-id-type="ISNI">000000012151536X</institution-id>
<institution>Human Genome Center, Institute of Medical Science,</institution>
<institution>University of Tokyo,</institution>
</institution-wrap>
4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639 Japan</nlm:aff>
<wicri:noCountry code="subfield">Tokyo 108-8639 Japan</wicri:noCountry>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Research Notes</title>
<idno type="eISSN">1756-0500</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Computational Biology</term>
<term>Contig Mapping</term>
<term>DNA, Viral (analysis)</term>
<term>Databases, Genetic</term>
<term>Genome, Viral</term>
<term>Influenza A virus (genetics)</term>
<term>Reproducibility of Results</term>
<term>Software</term>
<term>Time Factors</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>ADN viral (analyse)</term>
<term>Algorithmes</term>
<term>Bases de données génétiques</term>
<term>Biologie informatique</term>
<term>Cartographie de contigs</term>
<term>Facteurs temps</term>
<term>Génome viral</term>
<term>Logiciel</term>
<term>Reproductibilité des résultats</term>
<term>Virus de la grippe A (génétique)</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="analysis" xml:lang="en">
<term>DNA, Viral</term>
</keywords>
<keywords scheme="MESH" qualifier="analyse" xml:lang="fr">
<term>ADN viral</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Influenza A virus</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr">
<term>Virus de la grippe A</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Computational Biology</term>
<term>Contig Mapping</term>
<term>Databases, Genetic</term>
<term>Genome, Viral</term>
<term>Reproducibility of Results</term>
<term>Software</term>
<term>Time Factors</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Bases de données génétiques</term>
<term>Biologie informatique</term>
<term>Cartographie de contigs</term>
<term>Facteurs temps</term>
<term>Génome viral</term>
<term>Logiciel</term>
<term>Reproductibilité des résultats</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Genome assembly is considered to be a challenging problem in computational biology, and has been studied extensively by many researchers. It is extremely difficult to build a general assembler that is able to reconstruct the original sequence instead of many contigs. However, we believe that creating specific assemblers, for solving specific cases, will be much more fruitful than creating general assemblers.</p>
</sec>
<sec>
<title>Findings</title>
<p>In this paper, we present Arapan-S, a whole-genome assembly program dedicated to handling small genomes. It provides only one contig (along with the reverse complement of this contig) in many cases. Although genomes consist of a number of segments, the implemented algorithm can detect all the segments, as we demonstrate for
<italic>Influenza Virus A</italic>
. The Arapan-S program is based on the de Bruijn graph. We have implemented a very sophisticated and fast method to reconstruct the original sequence and neglect erroneous
<italic>k</italic>
-mers. The method explores the graph by using neither the shortest nor the longest path, but rather a specific and reliable path based on the coverage level or
<italic>k</italic>
-mers’ lengths. Arapan-S uses short reads, and it was tested on raw data downloaded from the NCBI Trace Archive.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Our findings show that the accuracy of the assembly was very high; the result was checked against the European Bioinformatics Institute (EBI) database using the NCBI BLAST Sequence Similarity Search. The identity and the genome coverage were more than 99%. We also compared the efficiency of Arapan-S with other well-known assemblers. In dealing with small genomes, the accuracy of Arapan-S is significantly higher than the accuracy of other assemblers. The assembly process is very fast and requires only a few seconds.</p>
<p>Arapan-S is available for free to the public. The binary files for Arapan-S are available through
<ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/projects/dnascissor/files/">http://sourceforge.net/projects/dnascissor/files/</ext-link>
.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/1756-0500-5-243) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Sutton, Gg" uniqKey="Sutton G">GG Sutton</name>
</author>
<author>
<name sortKey="White, O" uniqKey="White O">O White</name>
</author>
<author>
<name sortKey="Adams, Md" uniqKey="Adams M">MD Adams</name>
</author>
<author>
<name sortKey="Kerlavage, Ar" uniqKey="Kerlavage A">AR Kerlavage</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, X" uniqKey="Huang X">X Huang</name>
</author>
<author>
<name sortKey="Madan, A" uniqKey="Madan A">A Madan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, X" uniqKey="Huang X">X Huang</name>
</author>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
<author>
<name sortKey="Aluru, S" uniqKey="Aluru S">S Aluru</name>
</author>
<author>
<name sortKey="Yang, Sp" uniqKey="Yang S">SP Yang</name>
</author>
<author>
<name sortKey="Hillier, L" uniqKey="Hillier L">L Hillier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chevreux, B" uniqKey="Chevreux B">B Chevreux</name>
</author>
<author>
<name sortKey="Pfisterer, T" uniqKey="Pfisterer T">T Pfisterer</name>
</author>
<author>
<name sortKey="Drescher, B" uniqKey="Drescher B">B Drescher</name>
</author>
<author>
<name sortKey="Driesel, Aj" uniqKey="Driesel A">AJ Driesel</name>
</author>
<author>
<name sortKey="Muller, Weg" uniqKey="Muller W">WEG Müller</name>
</author>
<author>
<name sortKey="Wetter, T" uniqKey="Wetter T">T Wetter</name>
</author>
<author>
<name sortKey="Suhai, S" uniqKey="Suhai S">S Suhai</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author>
<name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Warren, Rl" uniqKey="Warren R">RL Warren</name>
</author>
<author>
<name sortKey="Sutton, Gg" uniqKey="Sutton G">GG Sutton</name>
</author>
<author>
<name sortKey="Jones, Sj" uniqKey="Jones S">SJ Jones</name>
</author>
<author>
<name sortKey="Holt, Ra" uniqKey="Holt R">RA Holt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chaisson, Mj" uniqKey="Chaisson M">MJ Chaisson</name>
</author>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author>
<name sortKey="Mcewen, Gk" uniqKey="Mcewen G">GK McEwen</name>
</author>
<author>
<name sortKey="Margulies, Eh" uniqKey="Margulies E">EH Margulies</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Butler, J" uniqKey="Butler J">J Butler</name>
</author>
<author>
<name sortKey="Maccallum, I" uniqKey="Maccallum I">I MacCallum</name>
</author>
<author>
<name sortKey="Kleber, M" uniqKey="Kleber M">M Kleber</name>
</author>
<author>
<name sortKey="Shlyakhter, Ia" uniqKey="Shlyakhter I">IA Shlyakhter</name>
</author>
<author>
<name sortKey="Belmonte, Mk" uniqKey="Belmonte M">MK Belmonte</name>
</author>
<author>
<name sortKey="Lander, Es" uniqKey="Lander E">ES Lander</name>
</author>
<author>
<name sortKey="Nusbaum, C" uniqKey="Nusbaum C">C Nusbaum</name>
</author>
<author>
<name sortKey="Jaffe, Db" uniqKey="Jaffe D">DB Jaffe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Maccallum, I" uniqKey="Maccallum I">I MacCallum</name>
</author>
<author>
<name sortKey="Przybylski, D" uniqKey="Przybylski D">D Przybylski</name>
</author>
<author>
<name sortKey="Gnerre, S" uniqKey="Gnerre S">S Gnerre</name>
</author>
<author>
<name sortKey="Burton, J" uniqKey="Burton J">J Burton</name>
</author>
<author>
<name sortKey="Shlyakhter, I" uniqKey="Shlyakhter I">I Shlyakhter</name>
</author>
<author>
<name sortKey="Gnirke, A" uniqKey="Gnirke A">A Gnirke</name>
</author>
<author>
<name sortKey="Malek, J" uniqKey="Malek J">J Malek</name>
</author>
<author>
<name sortKey="Mckernan, K" uniqKey="Mckernan K">K McKernan</name>
</author>
<author>
<name sortKey="Ranade, S" uniqKey="Ranade S">S Ranade</name>
</author>
<author>
<name sortKey="Shea, Tp" uniqKey="Shea T">TP Shea</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Simpson, Jt" uniqKey="Simpson J">JT Simpson</name>
</author>
<author>
<name sortKey="Wong, K" uniqKey="Wong K">K Wong</name>
</author>
<author>
<name sortKey="Jackman, Sd" uniqKey="Jackman S">SD Jackman</name>
</author>
<author>
<name sortKey="Schein, Je" uniqKey="Schein J">JE Schein</name>
</author>
<author>
<name sortKey="Jones, Sj" uniqKey="Jones S">SJ Jones</name>
</author>
<author>
<name sortKey="Birol, I" uniqKey="Birol I">I Birol</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, R" uniqKey="Li R">R Li</name>
</author>
<author>
<name sortKey="Zhu, H" uniqKey="Zhu H">H Zhu</name>
</author>
<author>
<name sortKey="Ruan, J" uniqKey="Ruan J">J Ruan</name>
</author>
<author>
<name sortKey="Qian, W" uniqKey="Qian W">W Qian</name>
</author>
<author>
<name sortKey="Fang, X" uniqKey="Fang X">X Fang</name>
</author>
<author>
<name sortKey="Shi, Z" uniqKey="Shi Z">Z Shi</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Li, S" uniqKey="Li S">S Li</name>
</author>
<author>
<name sortKey="Shan, G" uniqKey="Shan G">G Shan</name>
</author>
<author>
<name sortKey="Kristiansen, K" uniqKey="Kristiansen K">K Kristiansen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bryant, Dw" uniqKey="Bryant D">DW Bryant</name>
</author>
<author>
<name sortKey="Wong, Wk" uniqKey="Wong W">WK Wong</name>
</author>
<author>
<name sortKey="Mockler, Tc" uniqKey="Mockler T">TC Mockler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sommer, Dd" uniqKey="Sommer D">DD Sommer</name>
</author>
<author>
<name sortKey="Dlecher, Al" uniqKey="Dlecher A">AL Delcher</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
<author>
<name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Medvedev, P" uniqKey="Medvedev P">P Medvedev</name>
</author>
<author>
<name sortKey="Brudno, M" uniqKey="Brudno M">M Brudno</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<double pmid="22591859">
<pmc>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Arapan-S: a fast and highly accurate whole-genome assembly software for viruses and small genomes</title>
<author>
<name sortKey="Sahli, Mohammed" sort="Sahli, Mohammed" uniqKey="Sahli M" first="Mohammed" last="Sahli">Mohammed Sahli</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution-id institution-id-type="ISNI">000000012151536X</institution-id>
<institution>Department of Computer Science, Graduate School of Information Science and Technology,</institution>
<institution>University of Tokyo,</institution>
</institution-wrap>
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033 Japan</nlm:aff>
<wicri:noCountry code="subfield">Tokyo 113-0033 Japan</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Shibuya, Tetsuo" sort="Shibuya, Tetsuo" uniqKey="Shibuya T" first="Tetsuo" last="Shibuya">Tetsuo Shibuya</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution-id institution-id-type="ISNI">000000012151536X</institution-id>
<institution>Human Genome Center, Institute of Medical Science,</institution>
<institution>University of Tokyo,</institution>
</institution-wrap>
4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639 Japan</nlm:aff>
<wicri:noCountry code="subfield">Tokyo 108-8639 Japan</wicri:noCountry>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22591859</idno>
<idno type="pmc">3441218</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441218</idno>
<idno type="RBID">PMC:3441218</idno>
<idno type="doi">10.1186/1756-0500-5-243</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000336</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000336</idno>
<idno type="wicri:Area/Pmc/Curation">000336</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000336</idno>
<idno type="wicri:Area/Pmc/Checkpoint">001309</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">001309</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Arapan-S: a fast and highly accurate whole-genome assembly software for viruses and small genomes</title>
<author>
<name sortKey="Sahli, Mohammed" sort="Sahli, Mohammed" uniqKey="Sahli M" first="Mohammed" last="Sahli">Mohammed Sahli</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution-id institution-id-type="ISNI">000000012151536X</institution-id>
<institution>Department of Computer Science, Graduate School of Information Science and Technology,</institution>
<institution>University of Tokyo,</institution>
</institution-wrap>
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033 Japan</nlm:aff>
<wicri:noCountry code="subfield">Tokyo 113-0033 Japan</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Shibuya, Tetsuo" sort="Shibuya, Tetsuo" uniqKey="Shibuya T" first="Tetsuo" last="Shibuya">Tetsuo Shibuya</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution-id institution-id-type="ISNI">000000012151536X</institution-id>
<institution>Human Genome Center, Institute of Medical Science,</institution>
<institution>University of Tokyo,</institution>
</institution-wrap>
4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639 Japan</nlm:aff>
<wicri:noCountry code="subfield">Tokyo 108-8639 Japan</wicri:noCountry>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Research Notes</title>
<idno type="eISSN">1756-0500</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Genome assembly is considered to be a challenging problem in computational biology, and has been studied extensively by many researchers. It is extremely difficult to build a general assembler that is able to reconstruct the original sequence instead of many contigs. However, we believe that creating specific assemblers, for solving specific cases, will be much more fruitful than creating general assemblers.</p>
</sec>
<sec>
<title>Findings</title>
<p>In this paper, we present Arapan-S, a whole-genome assembly program dedicated to handling small genomes. It provides only one contig (along with the reverse complement of this contig) in many cases. Although genomes consist of a number of segments, the implemented algorithm can detect all the segments, as we demonstrate for
<italic>Influenza Virus A</italic>
. The Arapan-S program is based on the de Bruijn graph. We have implemented a very sophisticated and fast method to reconstruct the original sequence and neglect erroneous
<italic>k</italic>
-mers. The method explores the graph by using neither the shortest nor the longest path, but rather a specific and reliable path based on the coverage level or
<italic>k</italic>
-mers’ lengths. Arapan-S uses short reads, and it was tested on raw data downloaded from the NCBI Trace Archive.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Our findings show that the accuracy of the assembly was very high; the result was checked against the European Bioinformatics Institute (EBI) database using the NCBI BLAST Sequence Similarity Search. The identity and the genome coverage were more than 99%. We also compared the efficiency of Arapan-S with other well-known assemblers. In dealing with small genomes, the accuracy of Arapan-S is significantly higher than the accuracy of other assemblers. The assembly process is very fast and requires only a few seconds.</p>
<p>Arapan-S is available for free to the public. The binary files for Arapan-S are available through
<ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/projects/dnascissor/files/">http://sourceforge.net/projects/dnascissor/files/</ext-link>
.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/1756-0500-5-243) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Sutton, Gg" uniqKey="Sutton G">GG Sutton</name>
</author>
<author>
<name sortKey="White, O" uniqKey="White O">O White</name>
</author>
<author>
<name sortKey="Adams, Md" uniqKey="Adams M">MD Adams</name>
</author>
<author>
<name sortKey="Kerlavage, Ar" uniqKey="Kerlavage A">AR Kerlavage</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, X" uniqKey="Huang X">X Huang</name>
</author>
<author>
<name sortKey="Madan, A" uniqKey="Madan A">A Madan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, X" uniqKey="Huang X">X Huang</name>
</author>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
<author>
<name sortKey="Aluru, S" uniqKey="Aluru S">S Aluru</name>
</author>
<author>
<name sortKey="Yang, Sp" uniqKey="Yang S">SP Yang</name>
</author>
<author>
<name sortKey="Hillier, L" uniqKey="Hillier L">L Hillier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chevreux, B" uniqKey="Chevreux B">B Chevreux</name>
</author>
<author>
<name sortKey="Pfisterer, T" uniqKey="Pfisterer T">T Pfisterer</name>
</author>
<author>
<name sortKey="Drescher, B" uniqKey="Drescher B">B Drescher</name>
</author>
<author>
<name sortKey="Driesel, Aj" uniqKey="Driesel A">AJ Driesel</name>
</author>
<author>
<name sortKey="Muller, Weg" uniqKey="Muller W">WEG Müller</name>
</author>
<author>
<name sortKey="Wetter, T" uniqKey="Wetter T">T Wetter</name>
</author>
<author>
<name sortKey="Suhai, S" uniqKey="Suhai S">S Suhai</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author>
<name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Warren, Rl" uniqKey="Warren R">RL Warren</name>
</author>
<author>
<name sortKey="Sutton, Gg" uniqKey="Sutton G">GG Sutton</name>
</author>
<author>
<name sortKey="Jones, Sj" uniqKey="Jones S">SJ Jones</name>
</author>
<author>
<name sortKey="Holt, Ra" uniqKey="Holt R">RA Holt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chaisson, Mj" uniqKey="Chaisson M">MJ Chaisson</name>
</author>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author>
<name sortKey="Mcewen, Gk" uniqKey="Mcewen G">GK McEwen</name>
</author>
<author>
<name sortKey="Margulies, Eh" uniqKey="Margulies E">EH Margulies</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Butler, J" uniqKey="Butler J">J Butler</name>
</author>
<author>
<name sortKey="Maccallum, I" uniqKey="Maccallum I">I MacCallum</name>
</author>
<author>
<name sortKey="Kleber, M" uniqKey="Kleber M">M Kleber</name>
</author>
<author>
<name sortKey="Shlyakhter, Ia" uniqKey="Shlyakhter I">IA Shlyakhter</name>
</author>
<author>
<name sortKey="Belmonte, Mk" uniqKey="Belmonte M">MK Belmonte</name>
</author>
<author>
<name sortKey="Lander, Es" uniqKey="Lander E">ES Lander</name>
</author>
<author>
<name sortKey="Nusbaum, C" uniqKey="Nusbaum C">C Nusbaum</name>
</author>
<author>
<name sortKey="Jaffe, Db" uniqKey="Jaffe D">DB Jaffe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Maccallum, I" uniqKey="Maccallum I">I MacCallum</name>
</author>
<author>
<name sortKey="Przybylski, D" uniqKey="Przybylski D">D Przybylski</name>
</author>
<author>
<name sortKey="Gnerre, S" uniqKey="Gnerre S">S Gnerre</name>
</author>
<author>
<name sortKey="Burton, J" uniqKey="Burton J">J Burton</name>
</author>
<author>
<name sortKey="Shlyakhter, I" uniqKey="Shlyakhter I">I Shlyakhter</name>
</author>
<author>
<name sortKey="Gnirke, A" uniqKey="Gnirke A">A Gnirke</name>
</author>
<author>
<name sortKey="Malek, J" uniqKey="Malek J">J Malek</name>
</author>
<author>
<name sortKey="Mckernan, K" uniqKey="Mckernan K">K McKernan</name>
</author>
<author>
<name sortKey="Ranade, S" uniqKey="Ranade S">S Ranade</name>
</author>
<author>
<name sortKey="Shea, Tp" uniqKey="Shea T">TP Shea</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Simpson, Jt" uniqKey="Simpson J">JT Simpson</name>
</author>
<author>
<name sortKey="Wong, K" uniqKey="Wong K">K Wong</name>
</author>
<author>
<name sortKey="Jackman, Sd" uniqKey="Jackman S">SD Jackman</name>
</author>
<author>
<name sortKey="Schein, Je" uniqKey="Schein J">JE Schein</name>
</author>
<author>
<name sortKey="Jones, Sj" uniqKey="Jones S">SJ Jones</name>
</author>
<author>
<name sortKey="Birol, I" uniqKey="Birol I">I Birol</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, R" uniqKey="Li R">R Li</name>
</author>
<author>
<name sortKey="Zhu, H" uniqKey="Zhu H">H Zhu</name>
</author>
<author>
<name sortKey="Ruan, J" uniqKey="Ruan J">J Ruan</name>
</author>
<author>
<name sortKey="Qian, W" uniqKey="Qian W">W Qian</name>
</author>
<author>
<name sortKey="Fang, X" uniqKey="Fang X">X Fang</name>
</author>
<author>
<name sortKey="Shi, Z" uniqKey="Shi Z">Z Shi</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Li, S" uniqKey="Li S">S Li</name>
</author>
<author>
<name sortKey="Shan, G" uniqKey="Shan G">G Shan</name>
</author>
<author>
<name sortKey="Kristiansen, K" uniqKey="Kristiansen K">K Kristiansen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bryant, Dw" uniqKey="Bryant D">DW Bryant</name>
</author>
<author>
<name sortKey="Wong, Wk" uniqKey="Wong W">WK Wong</name>
</author>
<author>
<name sortKey="Mockler, Tc" uniqKey="Mockler T">TC Mockler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sommer, Dd" uniqKey="Sommer D">DD Sommer</name>
</author>
<author>
<name sortKey="Dlecher, Al" uniqKey="Dlecher A">AL Delcher</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
<author>
<name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Medvedev, P" uniqKey="Medvedev P">P Medvedev</name>
</author>
<author>
<name sortKey="Brudno, M" uniqKey="Brudno M">M Brudno</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
</pmc>
<pubmed>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Arapan-S: a fast and highly accurate whole-genome assembly software for viruses and small genomes.</title>
<author>
<name sortKey="Sahli, Mohammed" sort="Sahli, Mohammed" uniqKey="Sahli M" first="Mohammed" last="Sahli">Mohammed Sahli</name>
<affiliation wicri:level="4">
<nlm:affiliation>Department of Computer Science, Graduate School of Information Science and Technology, University of Tokyo, Bunkyo-ku, Tokyo, Japan. mohammed@hgc.jp</nlm:affiliation>
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Department of Computer Science, Graduate School of Information Science and Technology, University of Tokyo, Bunkyo-ku, Tokyo</wicri:regionArea>
<placeName>
<settlement type="city">Tokyo</settlement>
<region type="région">Région de Kantō</region>
</placeName>
<orgName type="university">Université de Tokyo</orgName>
</affiliation>
</author>
<author>
<name sortKey="Shibuya, Tetsuo" sort="Shibuya, Tetsuo" uniqKey="Shibuya T" first="Tetsuo" last="Shibuya">Tetsuo Shibuya</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2012">2012</date>
<idno type="RBID">pubmed:22591859</idno>
<idno type="pmid">22591859</idno>
<idno type="doi">10.1186/1756-0500-5-243</idno>
<idno type="wicri:Area/PubMed/Corpus">001D78</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001D78</idno>
<idno type="wicri:Area/PubMed/Curation">001D78</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001D78</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001D04</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001D04</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Arapan-S: a fast and highly accurate whole-genome assembly software for viruses and small genomes.</title>
<author>
<name sortKey="Sahli, Mohammed" sort="Sahli, Mohammed" uniqKey="Sahli M" first="Mohammed" last="Sahli">Mohammed Sahli</name>
<affiliation wicri:level="4">
<nlm:affiliation>Department of Computer Science, Graduate School of Information Science and Technology, University of Tokyo, Bunkyo-ku, Tokyo, Japan. mohammed@hgc.jp</nlm:affiliation>
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Department of Computer Science, Graduate School of Information Science and Technology, University of Tokyo, Bunkyo-ku, Tokyo</wicri:regionArea>
<placeName>
<settlement type="city">Tokyo</settlement>
<region type="région">Région de Kantō</region>
</placeName>
<orgName type="university">Université de Tokyo</orgName>
</affiliation>
</author>
<author>
<name sortKey="Shibuya, Tetsuo" sort="Shibuya, Tetsuo" uniqKey="Shibuya T" first="Tetsuo" last="Shibuya">Tetsuo Shibuya</name>
</author>
</analytic>
<series>
<title level="j">BMC research notes</title>
<idno type="eISSN">1756-0500</idno>
<imprint>
<date when="2012" type="published">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Computational Biology</term>
<term>Contig Mapping</term>
<term>DNA, Viral (analysis)</term>
<term>Databases, Genetic</term>
<term>Genome, Viral</term>
<term>Influenza A virus (genetics)</term>
<term>Reproducibility of Results</term>
<term>Software</term>
<term>Time Factors</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>ADN viral (analyse)</term>
<term>Algorithmes</term>
<term>Bases de données génétiques</term>
<term>Biologie informatique</term>
<term>Cartographie de contigs</term>
<term>Facteurs temps</term>
<term>Génome viral</term>
<term>Logiciel</term>
<term>Reproductibilité des résultats</term>
<term>Virus de la grippe A (génétique)</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="analysis" xml:lang="en">
<term>DNA, Viral</term>
</keywords>
<keywords scheme="MESH" qualifier="analyse" xml:lang="fr">
<term>ADN viral</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Influenza A virus</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr">
<term>Virus de la grippe A</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Computational Biology</term>
<term>Contig Mapping</term>
<term>Databases, Genetic</term>
<term>Genome, Viral</term>
<term>Reproducibility of Results</term>
<term>Software</term>
<term>Time Factors</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Bases de données génétiques</term>
<term>Biologie informatique</term>
<term>Cartographie de contigs</term>
<term>Facteurs temps</term>
<term>Génome viral</term>
<term>Logiciel</term>
<term>Reproductibilité des résultats</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Genome assembly is considered to be a challenging problem in computational biology, and has been studied extensively by many researchers. It is extremely difficult to build a general assembler that is able to reconstruct the original sequence instead of many contigs. However, we believe that creating specific assemblers, for solving specific cases, will be much more fruitful than creating general assemblers.</div>
</front>
</TEI>
</pubmed>
</double>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Ncbi/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000961 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 000961 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     PMC:3441218
   |texte=   Arapan-S: a fast and highly accurate whole-genome assembly software for viruses and small genomes
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:22591859" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021