MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

MegaGTA: a sensitive and accurate metagenomic gene-targeted assembler using iterative de Bruijn graphs

Internal identifier: 000268 (Pmc/Curation); previous: 000267; next: 000269

Authors: Dinghua Li [Hong Kong]; Yukun Huang [Hong Kong]; Chi-Ming Leung [Hong Kong]; Ruibang Luo [Hong Kong]; Hing-Fung Ting [Hong Kong]; Tak-Wah Lam [Hong Kong]

RBID : PMC:5657035

Abstract

Background

The recent release of the gene-targeted metagenomics assembler Xander has demonstrated that using a trained hidden Markov model (HMM) to guide the traversal of a de Bruijn graph gives a clear advantage over other assembly methods. As a pilot study, however, Xander leaves considerable room for improvement. Apart from its slow speed, Xander uses only one k-mer size for graph construction, and any single choice of k compromises either sensitivity or accuracy. Xander uses a Bloom-filter representation of the de Bruijn graph to achieve a lower memory footprint, but Bloom filters introduce false positives, and it is not clear how these affect assembly quality. Xander also does not keep track of the multiplicity of k-mers, which would have been an effective way to distinguish erroneous k-mers from correct ones.
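
To illustrate why k-mer multiplicity helps separate erroneous k-mers from correct ones, the following minimal Python sketch counts k-mers across a few toy reads and keeps only those seen at least a threshold number of times. The reads, the function names `count_kmers` and `solid_kmers`, and the threshold of 2 are assumptions made for illustration; they are not taken from Xander or MegaGTA.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count how many times each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, min_multiplicity):
    """Keep only k-mers whose multiplicity reaches the threshold."""
    return {kmer for kmer, c in counts.items() if c >= min_multiplicity}


# Hypothetical toy reads: three error-free copies and one copy
# carrying a single substitution (A -> T at index 4).
reads = ["ACGTACGA", "ACGTACGA", "ACGTACGA", "ACGTTCGA"]

counts = count_kmers(reads, k=4)
solid = solid_kmers(counts, min_multiplicity=2)

for kmer, c in sorted(counts.items()):
    label = "kept" if kmer in solid else "dropped"
    print(kmer, c, label)
```

In this toy input, every k-mer that spans the substituted base occurs only once, while k-mers from the error-free reads occur at least three times, so a simple threshold separates them. Real metagenomic coverage is uneven, so a fixed threshold is only a rough heuristic.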

Results

In this paper, we present a new gene-targeted assembler, MegaGTA, which attempts to improve on Xander in several respects. Quality-wise, it uses iterative de Bruijn graphs to take full advantage of multiple k-mer sizes and thereby achieve both sensitivity and accuracy. Computation-wise, it employs succinct de Bruijn graphs (SdBG) to achieve a low memory footprint and high speed (the latter benefits from a highly efficient parallel algorithm for constructing the SdBG). Unlike a Bloom filter, an SdBG is an exact representation of a de Bruijn graph. This enables MegaGTA to avoid false-positive contigs and to easily incorporate the multiplicity of k-mers when building a better HMM.
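
The difference between a Bloom filter and an exact representation can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the `BloomFilter` class, its size (256 bits) and its two SHA-256-derived hash functions are hypothetical choices, and a plain Python `set` stands in for an exact k-mer store. It is not a succinct de Bruijn graph and does not reflect the internals of Xander or MegaGTA.

```python
import hashlib
from itertools import product


class BloomFilter:
    """A deliberately small Bloom filter (illustrative, not tuned)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


# All 256 possible 4-mers; the first 64 are inserted, the rest are absent.
all_kmers = ["".join(p) for p in product("ACGT", repeat=4)]
inserted, absent = all_kmers[:64], all_kmers[64:]

exact = set(inserted)  # exact membership: never reports an absent k-mer
bloom = BloomFilter(num_bits=256, num_hashes=2)
for kmer in inserted:
    bloom.add(kmer)

false_positives = [k for k in absent if k in bloom]
print(len(false_positives), "of", len(absent),
      "absent k-mers reported present by the Bloom filter")
print(sum(k in exact for k in absent), "of", len(absent),
      "absent k-mers reported present by the exact set")
```

A Bloom filter never misses an inserted k-mer, but it may report k-mers that were never inserted; in a de Bruijn graph these would appear as spurious nodes. An exact structure avoids such spurious nodes at the cost of needing a more careful, compact encoding.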

We compared MegaGTA and Xander on an HMP-defined mock metagenomic dataset and showed that MegaGTA excelled in both sensitivity and accuracy. On a large rhizosphere soil metagenomic sample (327 Gbp), MegaGTA produced 9.7–19.3% more contigs than Xander, and these contigs were assigned to 10–25% more gene references. In our experiments, MegaGTA was two to ten times faster than Xander, depending on the number of k-mer sizes used.

Conclusion

MegaGTA improves on the algorithm of Xander and achieves higher sensitivity, accuracy and speed. Moreover, it is capable of assembling gene sequences from ultra-large metagenomic datasets. Its source code is freely available at https://github.com/HKU-BAL/megagta.


DOI: 10.1186/s12859-017-1825-3
PubMed: 29072142
PubMed Central: 5657035

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">MegaGTA: a sensitive and accurate metagenomic gene-targeted assembler using iterative de Bruijn graphs</title>
<author>
<name sortKey="Li, Dinghua" sort="Li, Dinghua" uniqKey="Li D" first="Dinghua" last="Li">Dinghua Li</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121742757</institution-id>
<institution-id institution-id-type="GRID">grid.194645.b</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of Hong Kong,</institution>
</institution-wrap>
Pokfulam, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Pokfulam</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Huang, Yukun" sort="Huang, Yukun" uniqKey="Huang Y" first="Yukun" last="Huang">Yukun Huang</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121742757</institution-id>
<institution-id institution-id-type="GRID">grid.194645.b</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of Hong Kong,</institution>
</institution-wrap>
Pokfulam, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Pokfulam</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Leung, Chi Ming" sort="Leung, Chi Ming" uniqKey="Leung C" first="Chi-Ming" last="Leung">Chi-Ming Leung</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121742757</institution-id>
<institution-id institution-id-type="GRID">grid.194645.b</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of Hong Kong,</institution>
</institution-wrap>
Pokfulam, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Pokfulam</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">L3 Bioinformatics Limited, Western District, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>L3 Bioinformatics Limited, Western District</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Luo, Ruibang" sort="Luo, Ruibang" uniqKey="Luo R" first="Ruibang" last="Luo">Ruibang Luo</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121742757</institution-id>
<institution-id institution-id-type="GRID">grid.194645.b</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of Hong Kong,</institution>
</institution-wrap>
Pokfulam, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Pokfulam</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">L3 Bioinformatics Limited, Western District, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>L3 Bioinformatics Limited, Western District</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Ting, Hing Fung" sort="Ting, Hing Fung" uniqKey="Ting H" first="Hing-Fung" last="Ting">Hing-Fung Ting</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121742757</institution-id>
<institution-id institution-id-type="GRID">grid.194645.b</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of Hong Kong,</institution>
</institution-wrap>
Pokfulam, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Pokfulam</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Lam, Tak Wah" sort="Lam, Tak Wah" uniqKey="Lam T" first="Tak-Wah" last="Lam">Tak-Wah Lam</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121742757</institution-id>
<institution-id institution-id-type="GRID">grid.194645.b</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of Hong Kong,</institution>
</institution-wrap>
Pokfulam, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Pokfulam</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">L3 Bioinformatics Limited, Western District, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>L3 Bioinformatics Limited, Western District</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">29072142</idno>
<idno type="pmc">5657035</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5657035</idno>
<idno type="RBID">PMC:5657035</idno>
<idno type="doi">10.1186/s12859-017-1825-3</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000268</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000268</idno>
<idno type="wicri:Area/Pmc/Curation">000268</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000268</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">MegaGTA: a sensitive and accurate metagenomic gene-targeted assembler using iterative de Bruijn graphs</title>
<author>
<name sortKey="Li, Dinghua" sort="Li, Dinghua" uniqKey="Li D" first="Dinghua" last="Li">Dinghua Li</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121742757</institution-id>
<institution-id institution-id-type="GRID">grid.194645.b</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of Hong Kong,</institution>
</institution-wrap>
Pokfulam, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Pokfulam</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Huang, Yukun" sort="Huang, Yukun" uniqKey="Huang Y" first="Yukun" last="Huang">Yukun Huang</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121742757</institution-id>
<institution-id institution-id-type="GRID">grid.194645.b</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of Hong Kong,</institution>
</institution-wrap>
Pokfulam, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Pokfulam</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Leung, Chi Ming" sort="Leung, Chi Ming" uniqKey="Leung C" first="Chi-Ming" last="Leung">Chi-Ming Leung</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121742757</institution-id>
<institution-id institution-id-type="GRID">grid.194645.b</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of Hong Kong,</institution>
</institution-wrap>
Pokfulam, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Pokfulam</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">L3 Bioinformatics Limited, Western District, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>L3 Bioinformatics Limited, Western District</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Luo, Ruibang" sort="Luo, Ruibang" uniqKey="Luo R" first="Ruibang" last="Luo">Ruibang Luo</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121742757</institution-id>
<institution-id institution-id-type="GRID">grid.194645.b</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of Hong Kong,</institution>
</institution-wrap>
Pokfulam, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Pokfulam</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">L3 Bioinformatics Limited, Western District, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>L3 Bioinformatics Limited, Western District</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Ting, Hing Fung" sort="Ting, Hing Fung" uniqKey="Ting H" first="Hing-Fung" last="Ting">Hing-Fung Ting</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121742757</institution-id>
<institution-id institution-id-type="GRID">grid.194645.b</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of Hong Kong,</institution>
</institution-wrap>
Pokfulam, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Pokfulam</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Lam, Tak Wah" sort="Lam, Tak Wah" uniqKey="Lam T" first="Tak-Wah" last="Lam">Tak-Wah Lam</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121742757</institution-id>
<institution-id institution-id-type="GRID">grid.194645.b</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of Hong Kong,</institution>
</institution-wrap>
Pokfulam, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Pokfulam</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">L3 Bioinformatics Limited, Western District, Hong Kong</nlm:aff>
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>L3 Bioinformatics Limited, Western District</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p id="Par1">The recent release of the gene-targeted metagenomics assembler Xander has demonstrated that using the trained Hidden Markov Model (HMM) to guide the traversal of
<italic>de Bruijn</italic>
graph gives obvious advantage over other assembly methods. Xander, as a pilot study, indeed has a lot of room for improvement. Apart from its slow speed, Xander uses only 1 
<italic>k</italic>
-mer size for graph construction and whatever choice of
<italic>k</italic>
will compromise either sensitivity or accuracy. Xander uses a Bloom-filter representation of
<italic>de Bruijn</italic>
graph to achieve a lower memory footprint. Bloom filters bring in false positives, and it is not clear how this would impact the quality of assembly. Xander does not keep track of the multiplicity of
<italic>k</italic>
-mers, which would have been an effective way to differentiate between erroneous
<italic>k</italic>
-mers and correct
<italic>k</italic>
-mers.</p>
</sec>
<sec>
<title>Results</title>
<p id="Par2">In this paper, we present a new gene-targeted assembler MegaGTA, which attempts to improve Xander in different aspects. Quality-wise, it utilizes iterative
<italic>de Bruijn</italic>
graphs to take full advantage of multiple
<italic>k</italic>
-mer sizes to make the best of both sensitivity and accuracy. Computation-wise, it employs succinct
<italic>de Bruijn</italic>
graphs (SdBG) to achieve low memory footprint and high speed (the latter is benefited from a highly efficient parallel algorithm for constructing SdBG). Unlike Bloom filters, an SdBG is an exact representation of a
<italic>de Bruijn</italic>
graph. It enables MegaGTA to avoid false-positive contigs and to easily incorporate the multiplicity of
<italic>k</italic>
-mers for building better HMM model.</p>
<p id="Par3">We have compared MegaGTA and Xander on an HMP-defined mock metagenomic dataset, and showed that MegaGTA excelled in both sensitivity and accuracy. On a large rhizosphere soil metagenomic sample (327Gbp), MegaGTA produced 9.7–19.3% more contigs than Xander, and these contigs were assigned to 10–25% more gene references. In our experiments, MegaGTA, depending on the number of
<italic>k</italic>
-mers used, is two to ten times faster than Xander.</p>
</sec>
<sec>
<title>Conclusion</title>
<p id="Par4">MegaGTA improves on the algorithm of Xander and achieves higher sensitivity, accuracy and speed. Moreover, it is capable of assembling gene sequences from ultra-large metagenomic datasets. Its source code is freely available at
<ext-link ext-link-type="uri" xlink:href="https://github.com/HKU-BAL/megagta">https://github.com/HKU-BAL/megagta</ext-link>
.</p>
</sec>
</div>
</front>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">29072142</article-id>
<article-id pub-id-type="pmc">5657035</article-id>
<article-id pub-id-type="publisher-id">1825</article-id>
<article-id pub-id-type="doi">10.1186/s12859-017-1825-3</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Software</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>MegaGTA: a sensitive and accurate metagenomic gene-targeted assembler using iterative de Bruijn graphs</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Li</surname>
<given-names>Dinghua</given-names>
</name>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Huang</surname>
<given-names>Yukun</given-names>
</name>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Leung</surname>
<given-names>Chi-Ming</given-names>
</name>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Luo</surname>
<given-names>Ruibang</given-names>
</name>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ting</surname>
<given-names>Hing-Fung</given-names>
</name>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Lam</surname>
<given-names>Tak-Wah</given-names>
</name>
<address>
<email>twlam@cs.hku.hk</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<aff id="Aff1">
<label>1</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121742757</institution-id>
<institution-id institution-id-type="GRID">grid.194645.b</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of Hong Kong,</institution>
</institution-wrap>
Pokfulam, Hong Kong</aff>
<aff id="Aff2">
<label>2</label>
L3 Bioinformatics Limited, Western District, Hong Kong</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>16</day>
<month>10</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>16</day>
<month>10</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="collection">
<year>2017</year>
</pub-date>
<volume>18</volume>
<issue>Suppl 12</issue>
<issue-sponsor>Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. The articles have undergone the journal's standard peer review process for supplements. The Supplement Editors declare that they have no competing interests.</issue-sponsor>
<elocation-id>408</elocation-id>
<permissions>
<copyright-statement>© The Author(s). 2017</copyright-statement>
<license license-type="OpenAccess">
<license-p>
<bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p id="Par1">The recent release of the gene-targeted metagenomics assembler Xander has demonstrated that using the trained Hidden Markov Model (HMM) to guide the traversal of
<italic>de Bruijn</italic>
graph gives obvious advantage over other assembly methods. Xander, as a pilot study, indeed has a lot of room for improvement. Apart from its slow speed, Xander uses only 1 
<italic>k</italic>
-mer size for graph construction and whatever choice of
<italic>k</italic>
will compromise either sensitivity or accuracy. Xander uses a Bloom-filter representation of
<italic>de Bruijn</italic>
graph to achieve a lower memory footprint. Bloom filters bring in false positives, and it is not clear how this would impact the quality of assembly. Xander does not keep track of the multiplicity of
<italic>k</italic>
-mers, which would have been an effective way to differentiate between erroneous
<italic>k</italic>
-mers and correct
<italic>k</italic>
-mers.</p>
</sec>
<sec>
<title>Results</title>
<p id="Par2">In this paper, we present a new gene-targeted assembler MegaGTA, which attempts to improve Xander in different aspects. Quality-wise, it utilizes iterative
<italic>de Bruijn</italic>
graphs to take full advantage of multiple
<italic>k</italic>
-mer sizes to make the best of both sensitivity and accuracy. Computation-wise, it employs succinct
<italic>de Bruijn</italic>
graphs (SdBG) to achieve low memory footprint and high speed (the latter is benefited from a highly efficient parallel algorithm for constructing SdBG). Unlike Bloom filters, an SdBG is an exact representation of a
<italic>de Bruijn</italic>
graph. It enables MegaGTA to avoid false-positive contigs and to easily incorporate the multiplicity of
<italic>k</italic>
-mers for building better HMM model.</p>
<p id="Par3">We have compared MegaGTA and Xander on an HMP-defined mock metagenomic dataset, and showed that MegaGTA excelled in both sensitivity and accuracy. On a large rhizosphere soil metagenomic sample (327Gbp), MegaGTA produced 9.7–19.3% more contigs than Xander, and these contigs were assigned to 10–25% more gene references. In our experiments, MegaGTA, depending on the number of
<italic>k</italic>
-mers used, is two to ten times faster than Xander.</p>
</sec>
<sec>
<title>Conclusion</title>
<p id="Par4">MegaGTA improves on the algorithm of Xander and achieves higher sensitivity, accuracy and speed. Moreover, it is capable of assembling gene sequences from ultra-large metagenomic datasets. Its source code is freely available at
<ext-link ext-link-type="uri" xlink:href="https://github.com/HKU-BAL/megagta">https://github.com/HKU-BAL/megagta</ext-link>
.</p>
</sec>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>Metagenomics</kwd>
<kwd>Assembly</kwd>
<kwd>De Bruijn graph</kwd>
<kwd>Targeted gene</kwd>
</kwd-group>
<conference>
<conf-name>12th International Symposium on Bioinformatics Research and Applications (ISBRA 2016)</conf-name>
<conf-acronym>ISBRA 2016</conf-acronym>
<conf-loc>Minsk, Belarus</conf-loc>
<conf-date>5-8 June 2016</conf-date>
</conference>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2017</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
</pmc>
</record>
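
As a rough sketch of how the identifiers in such a record might be read programmatically, the Python snippet below parses a trimmed excerpt of the `<publicationStmt>` element with the standard library. The excerpt and the helper name `extract_identifiers` are assumptions for illustration. The full record uses prefixed names (`wicri:`, `nlm:`, `xlink:`) whose namespace declarations are not shown here, so a strict XML parser would likely need those declarations added before it accepts the whole record.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the record above, limited to elements without
# namespace prefixes so that it parses as standalone XML.
excerpt = """<publicationStmt>
<idno type="pmid">29072142</idno>
<idno type="pmc">5657035</idno>
<idno type="doi">10.1186/s12859-017-1825-3</idno>
<date when="2017">2017</date>
</publicationStmt>"""


def extract_identifiers(xml_text):
    """Map each <idno> element's type attribute to its text content."""
    root = ET.fromstring(xml_text)
    return {idno.get("type"): idno.text for idno in root.iter("idno")}


identifiers = extract_identifiers(excerpt)
year = ET.fromstring(excerpt).find("date").get("when")

for id_type, value in identifiers.items():
    print(id_type, value)
print("year", year)
```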

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000268 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000268 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:5657035
   |texte=   MegaGTA: a sensitive and accurate metagenomic gene-targeted assembler using iterative de Bruijn graphs
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:29072142" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021