MERS Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Estimating the total genome length of a metagenomic sample using k-mers

Internal identifier: 000318 (Pmc/Checkpoint); previous: 000317; next: 000319

Authors: Kui Hua [People's Republic of China]; Xuegong Zhang [People's Republic of China]

Source:

RBID : PMC:6456951

Abstract

Background

Metagenomic sequencing is a powerful technology for studying the mixture of microbes or the microbiomes on humans and in the environment. One basic task of analyzing metagenomic data is to identify the component genomes in the community. This task is challenging due to the complexity of microbiome composition, limited availability of known reference genomes, and usually insufficient sequencing coverage.

Results

As an initial step toward understanding the complete composition of a metagenomic sample, we studied the problem of estimating the total length of all distinct component genomes in a metagenomic sample. We showed that this problem can be solved by estimating the total number of distinct k-mers in all the metagenomic sequencing data. We proposed a method for this estimation based on the sequencing coverage distribution of observed k-mers, and introduced a k-mer redundancy index (KRI) to fill in the gap between the count of distinct k-mers and the total genome length. We showed the effectiveness of the proposed method on a set of carefully designed simulation data corresponding to multiple situations of true metagenomic data. Results on real data indicate that the uncaptured genomic information can vary dramatically across metagenomic samples, with the potential to mislead downstream analyses.

Conclusions

We posed the question of what the total genome length of all distinct species in a microbial community is, and introduced a method to answer it.

Electronic supplementary material

The online version of this article (10.1186/s12864-019-5467-x) contains supplementary material, which is available to authorized users.
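
As a rough illustration of the idea in the Results paragraph, the following Python sketch counts the distinct k-mers in a set of reads, builds their coverage (abundance) histogram, and applies the classical Chao1 lower bound to estimate how many distinct k-mers exist in total, including those never observed. Chao1 is a stand-in chosen for brevity; it is not the estimator proposed in the article. The function names (`kmer_counts`, `abundance_histogram`, `chao1_distinct_kmers`) and the toy reads are hypothetical. The sketch ignores reverse complements and sequencing errors, and it does not compute the KRI.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (length-k substring) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_histogram(counts):
    """Map j -> number of distinct k-mers observed exactly j times."""
    return Counter(counts.values())


def chao1_distinct_kmers(hist):
    """Chao1 lower bound on the total number of distinct k-mers.

    Assumption: Chao1 is used here only as a simple, well-known
    estimator; it is NOT the method proposed in the article.
    """
    observed = sum(hist.values())  # distinct k-mers actually seen
    f1 = hist.get(1, 0)            # k-mers seen exactly once
    f2 = hist.get(2, 0)            # k-mers seen exactly twice
    if f2 > 0:
        return observed + (f1 * f1) / (2 * f2)
    # Bias-corrected form when no k-mer was seen exactly twice.
    return observed + f1 * (f1 - 1) / 2


if __name__ == "__main__":
    toy_reads = ["ACGT", "CGTA", "TTTT"]  # hypothetical toy data
    hist = abundance_histogram(kmer_counts(toy_reads, k=3))
    print(chao1_distinct_kmers(hist))  # prints 5.0
```

In the toy data, four distinct 3-mers are observed, two of them once and two of them twice, so the bound adds one unobserved k-mer. Turning such a count into a genome length would still require a correction for k-mers shared within or between genomes, which is the role the abstract assigns to the KRI.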


Url:
DOI: 10.1186/s12864-019-5467-x
PubMed: 30967110
PubMed Central: 6456951


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Estimating the total genome length of a metagenomic sample using k-mers</title>
<author>
<name sortKey="Hua, Kui" sort="Hua, Kui" uniqKey="Hua K" first="Kui" last="Hua">Kui Hua</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 0369 313X</institution-id>
<institution-id institution-id-type="GRID">grid.419897.a</institution-id>
<institution>MOE Key Laboratory of Bioinformatics Division and Center for Synthetic &amp; System Biology, BNRIST,</institution>
</institution-wrap>
Beijing, 100084 China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Beijing</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0662 3178</institution-id>
<institution-id institution-id-type="GRID">grid.12527.33</institution-id>
<institution>Department of Automation, Tsinghua University,</institution>
</institution-wrap>
Beijing, 100084 China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Beijing</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Xuegong" sort="Zhang, Xuegong" uniqKey="Zhang X" first="Xuegong" last="Zhang">Xuegong Zhang</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 0369 313X</institution-id>
<institution-id institution-id-type="GRID">grid.419897.a</institution-id>
<institution>MOE Key Laboratory of Bioinformatics Division and Center for Synthetic &amp; System Biology, BNRIST,</institution>
</institution-wrap>
Beijing, 100084 China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Beijing</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0662 3178</institution-id>
<institution-id institution-id-type="GRID">grid.12527.33</institution-id>
<institution>Department of Automation, Tsinghua University,</institution>
</institution-wrap>
Beijing, 100084 China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Beijing</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff3">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0662 3178</institution-id>
<institution-id institution-id-type="GRID">grid.12527.33</institution-id>
<institution>School of Life Sciences, Tsinghua University,</institution>
</institution-wrap>
Beijing, 100084 China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Beijing</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">30967110</idno>
<idno type="pmc">6456951</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6456951</idno>
<idno type="RBID">PMC:6456951</idno>
<idno type="doi">10.1186/s12864-019-5467-x</idno>
<date when="2019">2019</date>
<idno type="wicri:Area/Pmc/Corpus">000302</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000302</idno>
<idno type="wicri:Area/Pmc/Curation">000302</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000302</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000318</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000318</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Estimating the total genome length of a metagenomic sample using k-mers</title>
<author>
<name sortKey="Hua, Kui" sort="Hua, Kui" uniqKey="Hua K" first="Kui" last="Hua">Kui Hua</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 0369 313X</institution-id>
<institution-id institution-id-type="GRID">grid.419897.a</institution-id>
<institution>MOE Key Laboratory of Bioinformatics Division and Center for Synthetic &amp; System Biology, BNRIST,</institution>
</institution-wrap>
Beijing, 100084 China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Beijing</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0662 3178</institution-id>
<institution-id institution-id-type="GRID">grid.12527.33</institution-id>
<institution>Department of Automation, Tsinghua University,</institution>
</institution-wrap>
Beijing, 100084 China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Beijing</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Xuegong" sort="Zhang, Xuegong" uniqKey="Zhang X" first="Xuegong" last="Zhang">Xuegong Zhang</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 0369 313X</institution-id>
<institution-id institution-id-type="GRID">grid.419897.a</institution-id>
<institution>MOE Key Laboratory of Bioinformatics Division and Center for Synthetic &amp; System Biology, BNRIST,</institution>
</institution-wrap>
Beijing, 100084 China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Beijing</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0662 3178</institution-id>
<institution-id institution-id-type="GRID">grid.12527.33</institution-id>
<institution>Department of Automation, Tsinghua University,</institution>
</institution-wrap>
Beijing, 100084 China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Beijing</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff3">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0662 3178</institution-id>
<institution-id institution-id-type="GRID">grid.12527.33</institution-id>
<institution>School of Life Sciences, Tsinghua University,</institution>
</institution-wrap>
Beijing, 100084 China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Beijing</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Genomics</title>
<idno type="eISSN">1471-2164</idno>
<imprint>
<date when="2019">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Metagenomic sequencing is a powerful technology for studying the mixture of microbes or the microbiomes on humans and in the environment. One basic task of analyzing metagenomic data is to identify the component genomes in the community. This task is challenging due to the complexity of microbiome composition, limited availability of known reference genomes, and usually insufficient sequencing coverage.</p>
</sec>
<sec>
<title>Results</title>
<p>As an initial step toward understanding the complete composition of a metagenomic sample, we studied the problem of estimating the total length of all distinct component genomes in a metagenomic sample. We showed that this problem can be solved by estimating the total number of distinct k-mers in all the metagenomic sequencing data. We proposed a method for this estimation based on the sequencing coverage distribution of observed k-mers, and introduced a k-mer redundancy index (
<italic>KRI</italic>
) to fill in the gap between the count of distinct k-mers and the total genome length. We showed the effectiveness of the proposed method on a set of carefully designed simulation data corresponding to multiple situations of true metagenomic data. Results on real data indicate that the uncaptured genomic information can vary dramatically across metagenomic samples, with the potential to mislead downstream analyses.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>We posed the question of what the total genome length of all distinct species in a microbial community is, and introduced a method to answer it.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (10.1186/s12864-019-5467-x) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Gordon, Ji" uniqKey="Gordon J">JI Gordon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Falony, G" uniqKey="Falony G">G Falony</name>
</author>
<author>
<name sortKey="Wijmenga, C" uniqKey="Wijmenga C">C Wijmenga</name>
</author>
<author>
<name sortKey="Raes, J" uniqKey="Raes J">J Raes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhernakova, A" uniqKey="Zhernakova A">A Zhernakova</name>
</author>
<author>
<name sortKey="Wijmenga, C" uniqKey="Wijmenga C">C Wijmenga</name>
</author>
<author>
<name sortKey="Fu, J" uniqKey="Fu J">J Fu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cui, H" uniqKey="Cui H">H Cui</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Zhang, X" uniqKey="Zhang X">X Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, X" uniqKey="Zhang X">X Zhang</name>
</author>
<author>
<name sortKey="Liu, S" uniqKey="Liu S">S Liu</name>
</author>
<author>
<name sortKey="Cui, H" uniqKey="Cui H">H Cui</name>
</author>
<author>
<name sortKey="Chen, T" uniqKey="Chen T">T Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rodriguez, Rl" uniqKey="Rodriguez R">RL Rodriguez</name>
</author>
<author>
<name sortKey="Konstantinidis, Kt" uniqKey="Konstantinidis K">KT Konstantinidis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lander, Es" uniqKey="Lander E">ES Lander</name>
</author>
<author>
<name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hooper, Sd" uniqKey="Hooper S">SD Hooper</name>
</author>
<author>
<name sortKey="Dalevi, D" uniqKey="Dalevi D">D Dalevi</name>
</author>
<author>
<name sortKey="Pati, A" uniqKey="Pati A">A Pati</name>
</author>
<author>
<name sortKey="Mavromatis, K" uniqKey="Mavromatis K">K Mavromatis</name>
</author>
<author>
<name sortKey="Ivanova, Nn" uniqKey="Ivanova N">NN Ivanova</name>
</author>
<author>
<name sortKey="Kyrpides, Nc" uniqKey="Kyrpides N">NC Kyrpides</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Daley, T" uniqKey="Daley T">T Daley</name>
</author>
<author>
<name sortKey="Smith, Ad" uniqKey="Smith A">AD Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rodriguez, Rl" uniqKey="Rodriguez R">RL Rodriguez</name>
</author>
<author>
<name sortKey="Konstantinidis, Kt" uniqKey="Konstantinidis K">KT Konstantinidis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tamames, J" uniqKey="Tamames J">J Tamames</name>
</author>
<author>
<name sortKey="De La Pena, S" uniqKey="De La Pena S">S de la Pena</name>
</author>
<author>
<name sortKey="De Lorenzo, V" uniqKey="De Lorenzo V">V de Lorenzo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wendl, Mc" uniqKey="Wendl M">MC Wendl</name>
</author>
<author>
<name sortKey="Kota, K" uniqKey="Kota K">K Kota</name>
</author>
<author>
<name sortKey="Weinstock, Gm" uniqKey="Weinstock G">GM Weinstock</name>
</author>
<author>
<name sortKey="Mitreva, M" uniqKey="Mitreva M">M Mitreva</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Segata, N" uniqKey="Segata N">N Segata</name>
</author>
<author>
<name sortKey="Waldron, L" uniqKey="Waldron L">L Waldron</name>
</author>
<author>
<name sortKey="Ballarini, A" uniqKey="Ballarini A">A Ballarini</name>
</author>
<author>
<name sortKey="Narasimhan, V" uniqKey="Narasimhan V">V Narasimhan</name>
</author>
<author>
<name sortKey="Jousson, O" uniqKey="Jousson O">O Jousson</name>
</author>
<author>
<name sortKey="Huttenhower, C" uniqKey="Huttenhower C">C Huttenhower</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Oh, J" uniqKey="Oh J">J Oh</name>
</author>
<author>
<name sortKey="Byrd, Al" uniqKey="Byrd A">AL Byrd</name>
</author>
<author>
<name sortKey="Deming, C" uniqKey="Deming C">C Deming</name>
</author>
<author>
<name sortKey="Conlan, S" uniqKey="Conlan S">S Conlan</name>
</author>
<author>
<name sortKey="Program, Ncs" uniqKey="Program N">NCS Program</name>
</author>
<author>
<name sortKey="Kong, Hh" uniqKey="Kong H">HH Kong</name>
</author>
<author>
<name sortKey="Segre, Ja" uniqKey="Segre J">JA Segre</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bankevich, A" uniqKey="Bankevich A">A Bankevich</name>
</author>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Barbour, Ad" uniqKey="Barbour A">AD Barbour</name>
</author>
<author>
<name sortKey="Chen, Lhy" uniqKey="Chen L">LHY Chen</name>
</author>
<author>
<name sortKey="Loh, Wl" uniqKey="Loh W">WL Loh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Daley, T" uniqKey="Daley T">T Daley</name>
</author>
<author>
<name sortKey="Smith, Ad" uniqKey="Smith A">AD Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Golub, Gh" uniqKey="Golub G">GH Golub</name>
</author>
<author>
<name sortKey="Welsch, Jh" uniqKey="Welsch J">JH Welsch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Truong, Dt" uniqKey="Truong D">DT Truong</name>
</author>
<author>
<name sortKey="Franzosa, Ea" uniqKey="Franzosa E">EA Franzosa</name>
</author>
<author>
<name sortKey="Tickle, Tl" uniqKey="Tickle T">TL Tickle</name>
</author>
<author>
<name sortKey="Scholz, M" uniqKey="Scholz M">M Scholz</name>
</author>
<author>
<name sortKey="Weingart, G" uniqKey="Weingart G">G Weingart</name>
</author>
<author>
<name sortKey="Pasolli, E" uniqKey="Pasolli E">E Pasolli</name>
</author>
<author>
<name sortKey="Tett, A" uniqKey="Tett A">A Tett</name>
</author>
<author>
<name sortKey="Huttenhower, C" uniqKey="Huttenhower C">C Huttenhower</name>
</author>
<author>
<name sortKey="Segata, N" uniqKey="Segata N">N Segata</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Freitas, Tak" uniqKey="Freitas T">TAK Freitas</name>
</author>
<author>
<name sortKey="Li, P E" uniqKey="Li P">P-E Li</name>
</author>
<author>
<name sortKey="Scholz, Mb" uniqKey="Scholz M">MB Scholz</name>
</author>
<author>
<name sortKey="Chain, Ps" uniqKey="Chain P">PS Chain</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marinier, E" uniqKey="Marinier E">E Marinier</name>
</author>
<author>
<name sortKey="Brown, Dg" uniqKey="Brown D">DG Brown</name>
</author>
<author>
<name sortKey="Mcconkey, Bj" uniqKey="Mcconkey B">BJ McConkey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marcais, G" uniqKey="Marcais G">G Marcais</name>
</author>
<author>
<name sortKey="Kingsford, C" uniqKey="Kingsford C">C Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pruitt, Kd" uniqKey="Pruitt K">KD Pruitt</name>
</author>
<author>
<name sortKey="Tatusova, T" uniqKey="Tatusova T">T Tatusova</name>
</author>
<author>
<name sortKey="Maglott, Dr" uniqKey="Maglott D">DR Maglott</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mavromatis, K" uniqKey="Mavromatis K">K Mavromatis</name>
</author>
<author>
<name sortKey="Hugenholtz, P" uniqKey="Hugenholtz P">P Hugenholtz</name>
</author>
<author>
<name sortKey="Kyrpides, Nc" uniqKey="Kyrpides N">NC Kyrpides</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, S" uniqKey="Liu S">S Liu</name>
</author>
<author>
<name sortKey="Hua, K" uniqKey="Hua K">K Hua</name>
</author>
<author>
<name sortKey="Chen, S" uniqKey="Chen S">S Chen</name>
</author>
<author>
<name sortKey="Zhang, X" uniqKey="Zhang X">X Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Turnbaugh, Pj" uniqKey="Turnbaugh P">PJ Turnbaugh</name>
</author>
<author>
<name sortKey="Ley, Re" uniqKey="Ley R">RE Ley</name>
</author>
<author>
<name sortKey="Hamady, M" uniqKey="Hamady M">M Hamady</name>
</author>
<author>
<name sortKey="Fraser Liggett, Cm" uniqKey="Fraser Liggett C">CM Fraser-Liggett</name>
</author>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
<author>
<name sortKey="Gordon, Ji" uniqKey="Gordon J">JI Gordon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Qin, J" uniqKey="Qin J">J Qin</name>
</author>
<author>
<name sortKey="Kristiansen, K" uniqKey="Kristiansen K">K Kristiansen</name>
</author>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Genomics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Genomics</journal-id>
<journal-title-group>
<journal-title>BMC Genomics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2164</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">30967110</article-id>
<article-id pub-id-type="pmc">6456951</article-id>
<article-id pub-id-type="publisher-id">5467</article-id>
<article-id pub-id-type="doi">10.1186/s12864-019-5467-x</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Estimating the total genome length of a metagenomic sample using k-mers</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Hua</surname>
<given-names>Kui</given-names>
</name>
<address>
<email>huak14@mails.tsinghua.edu.cn</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Zhang</surname>
<given-names>Xuegong</given-names>
</name>
<address>
<email>zhangxg@tsinghua.edu.cn</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff2">2</xref>
<xref ref-type="aff" rid="Aff3">3</xref>
</contrib>
<aff id="Aff1">
<label>1</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 0369 313X</institution-id>
<institution-id institution-id-type="GRID">grid.419897.a</institution-id>
<institution>MOE Key Laboratory of Bioinformatics Division and Center for Synthetic &amp; System Biology, BNRIST,</institution>
</institution-wrap>
Beijing, 100084 China</aff>
<aff id="Aff2">
<label>2</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0662 3178</institution-id>
<institution-id institution-id-type="GRID">grid.12527.33</institution-id>
<institution>Department of Automation, Tsinghua University,</institution>
</institution-wrap>
Beijing, 100084 China</aff>
<aff id="Aff3">
<label>3</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0662 3178</institution-id>
<institution-id institution-id-type="GRID">grid.12527.33</institution-id>
<institution>School of Life Sciences, Tsinghua University,</institution>
</institution-wrap>
Beijing, 100084 China</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>4</day>
<month>4</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>4</day>
<month>4</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection">
<year>2019</year>
</pub-date>
<volume>20</volume>
<issue>Suppl 2</issue>
<issue-sponsor>Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. The articles have undergone the journal's standard peer review process for supplements. The Supplement Editors declare that they have no competing interests.</issue-sponsor>
<elocation-id>183</elocation-id>
<permissions>
<copyright-statement>© The Author(s) 2019</copyright-statement>
<license license-type="OpenAccess">
<license-p>
<bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p>Metagenomic sequencing is a powerful technology for studying the mixture of microbes or the microbiomes on humans and in the environment. One basic task of analyzing metagenomic data is to identify the component genomes in the community. This task is challenging due to the complexity of microbiome composition, limited availability of known reference genomes, and usually insufficient sequencing coverage.</p>
</sec>
<sec>
<title>Results</title>
<p>As an initial step toward understanding the complete composition of a metagenomic sample, we studied the problem of estimating the total length of all distinct component genomes in a metagenomic sample. We showed that this problem can be solved by estimating the total number of distinct k-mers in all the metagenomic sequencing data. We proposed a method for this estimation based on the sequencing coverage distribution of observed k-mers, and introduced a k-mer redundancy index (
<italic>KRI</italic>
) to fill in the gap between the count of distinct k-mers and the total genome length. We showed the effectiveness of the proposed method on a set of carefully designed simulation data corresponding to multiple situations of true metagenomic data. Results on real data indicate that the uncaptured genomic information can vary dramatically across metagenomic samples, with the potential to mislead downstream analyses.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>We posed the question of what the total genome length of all distinct species in a microbial community is, and introduced a method to answer it.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (10.1186/s12864-019-5467-x) contains supplementary material, which is available to authorized users.</p>
</sec>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>Metagenomics</kwd>
<kwd>Sequencing coverage</kwd>
<kwd>Distinct k-mers</kwd>
<kwd>Genome length</kwd>
</kwd-group>
<conference xlink:href="http://glab.hzau.edu.cn/APBC2019/">
<conf-name>The 17th Asia Pacific Bioinformatics Conference (APBC 2019)</conf-name>
<conf-acronym>APBC 2019</conf-acronym>
<conf-loc>Wuhan, China</conf-loc>
<conf-date>14-16 January 2019</conf-date>
</conference>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2019</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
</pmc>
<affiliations>
<list>
<country>
<li>République populaire de Chine</li>
</country>
</list>
<tree>
<country name="République populaire de Chine">
<noRegion>
<name sortKey="Hua, Kui" sort="Hua, Kui" uniqKey="Hua K" first="Kui" last="Hua">Kui Hua</name>
</noRegion>
<name sortKey="Hua, Kui" sort="Hua, Kui" uniqKey="Hua K" first="Kui" last="Hua">Kui Hua</name>
<name sortKey="Zhang, Xuegong" sort="Zhang, Xuegong" uniqKey="Zhang X" first="Xuegong" last="Zhang">Xuegong Zhang</name>
<name sortKey="Zhang, Xuegong" sort="Zhang, Xuegong" uniqKey="Zhang X" first="Xuegong" last="Zhang">Xuegong Zhang</name>
<name sortKey="Zhang, Xuegong" sort="Zhang, Xuegong" uniqKey="Zhang X" first="Xuegong" last="Zhang">Xuegong Zhang</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000318 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd -nk 000318 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Checkpoint
   |type=    RBID
   |clé=     PMC:6456951
   |texte=   Estimating the total genome length of a metagenomic sample using k-mers
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/RBID.i   -Sk "pubmed:30967110" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021