MERS Exploration Server

Internal identifier: 0002930 (Pmc/Corpus); previous: 0002929; next: 0002931

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Quality control of microbiota metagenomics by k-mer analysis</title>
<author>
<name sortKey="Plaza Onate, Florian" sort="Plaza Onate, Florian" uniqKey="Plaza Onate F" first="Florian" last="Plaza Onate">Florian Plaza Onate</name>
<affiliation>
<nlm:aff id="Aff1">INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350 Jouy en Josas, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Batto, Jean Michel" sort="Batto, Jean Michel" uniqKey="Batto J" first="Jean-Michel" last="Batto">Jean-Michel Batto</name>
<affiliation>
<nlm:aff id="Aff2">UMR1319 Micalis, INRA, Jouy-en-Josas, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Juste, Catherine" sort="Juste, Catherine" uniqKey="Juste C" first="Catherine" last="Juste">Catherine Juste</name>
<affiliation>
<nlm:aff id="Aff1">INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350 Jouy en Josas, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">UMR1319 Micalis, INRA, Jouy-en-Josas, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fadlallah, Jehane" sort="Fadlallah, Jehane" uniqKey="Fadlallah J" first="Jehane" last="Fadlallah">Jehane Fadlallah</name>
<affiliation>
<nlm:aff id="Aff3">Sorbonne Universités, UPMC Univ Paris 06, CR7, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), Hôpital Pitié-Salpêtrière, 83 bd. de l’Hôpital, 75013 Paris, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff4">Département d’Immunologie, AP-HP, Groupement Hospitalier Pitié-Salpêtrière, F-75013 Paris, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fougeroux, Cyrielle" sort="Fougeroux, Cyrielle" uniqKey="Fougeroux C" first="Cyrielle" last="Fougeroux">Cyrielle Fougeroux</name>
<affiliation>
<nlm:aff id="Aff3">Sorbonne Universités, UPMC Univ Paris 06, CR7, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), Hôpital Pitié-Salpêtrière, 83 bd. de l’Hôpital, 75013 Paris, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gouas, Doriane" sort="Gouas, Doriane" uniqKey="Gouas D" first="Doriane" last="Gouas">Doriane Gouas</name>
<affiliation>
<nlm:aff id="Aff3">Sorbonne Universités, UPMC Univ Paris 06, CR7, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), Hôpital Pitié-Salpêtrière, 83 bd. de l’Hôpital, 75013 Paris, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff5">Inserm UMR-S1135, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), F-75013 Paris, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pons, Nicolas" sort="Pons, Nicolas" uniqKey="Pons N" first="Nicolas" last="Pons">Nicolas Pons</name>
<affiliation>
<nlm:aff id="Aff1">INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350 Jouy en Josas, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kennedy, Sean" sort="Kennedy, Sean" uniqKey="Kennedy S" first="Sean" last="Kennedy">Sean Kennedy</name>
<affiliation>
<nlm:aff id="Aff1">INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350 Jouy en Josas, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Levenez, Florence" sort="Levenez, Florence" uniqKey="Levenez F" first="Florence" last="Levenez">Florence Levenez</name>
<affiliation>
<nlm:aff id="Aff1">INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350 Jouy en Josas, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">UMR1319 Micalis, INRA, Jouy-en-Josas, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Dore, Joel" sort="Dore, Joel" uniqKey="Dore J" first="Joel" last="Dore">Joel Dore</name>
<affiliation>
<nlm:aff id="Aff1">INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350 Jouy en Josas, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">UMR1319 Micalis, INRA, Jouy-en-Josas, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ehrlich, S Dusko" sort="Ehrlich, S Dusko" uniqKey="Ehrlich S" first="S Dusko" last="Ehrlich">S Dusko Ehrlich</name>
<affiliation>
<nlm:aff id="Aff1">INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350 Jouy en Josas, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">UMR1319 Micalis, INRA, Jouy-en-Josas, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gorochov, Guy" sort="Gorochov, Guy" uniqKey="Gorochov G" first="Guy" last="Gorochov">Guy Gorochov</name>
<affiliation>
<nlm:aff id="Aff3">Sorbonne Universités, UPMC Univ Paris 06, CR7, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), Hôpital Pitié-Salpêtrière, 83 bd. de l’Hôpital, 75013 Paris, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff4">Département d’Immunologie, AP-HP, Groupement Hospitalier Pitié-Salpêtrière, F-75013 Paris, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff5">Inserm UMR-S1135, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), F-75013 Paris, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Larsen, Martin" sort="Larsen, Martin" uniqKey="Larsen M" first="Martin" last="Larsen">Martin Larsen</name>
<affiliation>
<nlm:aff id="Aff3">Sorbonne Universités, UPMC Univ Paris 06, CR7, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), Hôpital Pitié-Salpêtrière, 83 bd. de l’Hôpital, 75013 Paris, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff4">Département d’Immunologie, AP-HP, Groupement Hospitalier Pitié-Salpêtrière, F-75013 Paris, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff5">Inserm UMR-S1135, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), F-75013 Paris, France</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">25887914</idno>
<idno type="pmc">4373121</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4373121</idno>
<idno type="RBID">PMC:4373121</idno>
<idno type="doi">10.1186/s12864-015-1406-7</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000293</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000293</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Quality control of microbiota metagenomics by k-mer analysis</title>
<author>
<name sortKey="Plaza Onate, Florian" sort="Plaza Onate, Florian" uniqKey="Plaza Onate F" first="Florian" last="Plaza Onate">Florian Plaza Onate</name>
<affiliation>
<nlm:aff id="Aff1">INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350 Jouy en Josas, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Batto, Jean Michel" sort="Batto, Jean Michel" uniqKey="Batto J" first="Jean-Michel" last="Batto">Jean-Michel Batto</name>
<affiliation>
<nlm:aff id="Aff2">UMR1319 Micalis, INRA, Jouy-en-Josas, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Juste, Catherine" sort="Juste, Catherine" uniqKey="Juste C" first="Catherine" last="Juste">Catherine Juste</name>
<affiliation>
<nlm:aff id="Aff1">INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350 Jouy en Josas, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">UMR1319 Micalis, INRA, Jouy-en-Josas, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fadlallah, Jehane" sort="Fadlallah, Jehane" uniqKey="Fadlallah J" first="Jehane" last="Fadlallah">Jehane Fadlallah</name>
<affiliation>
<nlm:aff id="Aff3">Sorbonne Universités, UPMC Univ Paris 06, CR7, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), Hôpital Pitié-Salpêtrière, 83 bd. de l’Hôpital, 75013 Paris, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff4">Département d’Immunologie, AP-HP, Groupement Hospitalier Pitié-Salpêtrière, F-75013 Paris, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fougeroux, Cyrielle" sort="Fougeroux, Cyrielle" uniqKey="Fougeroux C" first="Cyrielle" last="Fougeroux">Cyrielle Fougeroux</name>
<affiliation>
<nlm:aff id="Aff3">Sorbonne Universités, UPMC Univ Paris 06, CR7, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), Hôpital Pitié-Salpêtrière, 83 bd. de l’Hôpital, 75013 Paris, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gouas, Doriane" sort="Gouas, Doriane" uniqKey="Gouas D" first="Doriane" last="Gouas">Doriane Gouas</name>
<affiliation>
<nlm:aff id="Aff3">Sorbonne Universités, UPMC Univ Paris 06, CR7, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), Hôpital Pitié-Salpêtrière, 83 bd. de l’Hôpital, 75013 Paris, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff5">Inserm UMR-S1135, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), F-75013 Paris, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pons, Nicolas" sort="Pons, Nicolas" uniqKey="Pons N" first="Nicolas" last="Pons">Nicolas Pons</name>
<affiliation>
<nlm:aff id="Aff1">INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350 Jouy en Josas, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kennedy, Sean" sort="Kennedy, Sean" uniqKey="Kennedy S" first="Sean" last="Kennedy">Sean Kennedy</name>
<affiliation>
<nlm:aff id="Aff1">INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350 Jouy en Josas, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Levenez, Florence" sort="Levenez, Florence" uniqKey="Levenez F" first="Florence" last="Levenez">Florence Levenez</name>
<affiliation>
<nlm:aff id="Aff1">INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350 Jouy en Josas, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">UMR1319 Micalis, INRA, Jouy-en-Josas, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Dore, Joel" sort="Dore, Joel" uniqKey="Dore J" first="Joel" last="Dore">Joel Dore</name>
<affiliation>
<nlm:aff id="Aff1">INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350 Jouy en Josas, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">UMR1319 Micalis, INRA, Jouy-en-Josas, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ehrlich, S Dusko" sort="Ehrlich, S Dusko" uniqKey="Ehrlich S" first="S Dusko" last="Ehrlich">S Dusko Ehrlich</name>
<affiliation>
<nlm:aff id="Aff1">INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350 Jouy en Josas, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">UMR1319 Micalis, INRA, Jouy-en-Josas, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gorochov, Guy" sort="Gorochov, Guy" uniqKey="Gorochov G" first="Guy" last="Gorochov">Guy Gorochov</name>
<affiliation>
<nlm:aff id="Aff3">Sorbonne Universités, UPMC Univ Paris 06, CR7, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), Hôpital Pitié-Salpêtrière, 83 bd. de l’Hôpital, 75013 Paris, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff4">Département d’Immunologie, AP-HP, Groupement Hospitalier Pitié-Salpêtrière, F-75013 Paris, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff5">Inserm UMR-S1135, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), F-75013 Paris, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Larsen, Martin" sort="Larsen, Martin" uniqKey="Larsen M" first="Martin" last="Larsen">Martin Larsen</name>
<affiliation>
<nlm:aff id="Aff3">Sorbonne Universités, UPMC Univ Paris 06, CR7, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), Hôpital Pitié-Salpêtrière, 83 bd. de l’Hôpital, 75013 Paris, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff4">Département d’Immunologie, AP-HP, Groupement Hospitalier Pitié-Salpêtrière, F-75013 Paris, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff5">Inserm UMR-S1135, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), F-75013 Paris, France</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Genomics</title>
<idno type="eISSN">1471-2164</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>The biological and clinical consequences of the tight interactions between host and microbiota are rapidly being unraveled by next generation sequencing technologies and sophisticated bioinformatics, also referred to as microbiota metagenomics. The recent success of metagenomics has created a demand to rapidly apply the technology to large case–control cohort studies and to studies of microbiota from various habitats, including habitats relatively poor in microbes. It is therefore of foremost importance to enable a robust and rapid quality assessment of metagenomic data from samples that challenge present technological limits (sample numbers and size). Here we demonstrate that the distribution of overlapping k-mers of metagenome sequence data predicts sequence quality as defined by gene distribution and efficiency of sequence mapping to a reference gene catalogue.</p>
</sec>
<sec>
<title>Results</title>
<p>We used serial dilutions of gut microbiota metagenomic datasets to generate well-defined high to low quality metagenomes. We also analyzed a collection of 52 microbiota-derived metagenomes. We demonstrate that k-mer distributions of metagenomic sequence data identify sequence contaminations, such as sequences derived from “empty” ligation products. Of note, k-mer distributions were also able to predict the frequency of sequences mapping to a reference gene catalogue not only for the well-defined serial dilution datasets, but also for 52 human gut microbiota derived metagenomic datasets.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>We propose that k-mer analysis of raw metagenome sequence reads should be implemented as a first quality assessment prior to more extensive bioinformatics analysis, such as sequence filtering and gene mapping. With the rising demand for metagenomic analysis of microbiota it is crucial to provide tools for rapid and efficient decision making. This will eventually lead to a faster turn-around time, improved analytical quality including sample quality metrics and a significant cost reduction. Finally, improved quality assessment will have a major impact on the robustness of biological and clinical conclusions drawn from metagenomic studies.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12864-015-1406-7) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Arumugam, M" uniqKey="Arumugam M">M Arumugam</name>
</author>
<author>
<name sortKey="Raes, J" uniqKey="Raes J">J Raes</name>
</author>
<author>
<name sortKey="Pelletier, E" uniqKey="Pelletier E">E Pelletier</name>
</author>
<author>
<name sortKey="Le Paslier, D" uniqKey="Le Paslier D">D Le Paslier</name>
</author>
<author>
<name sortKey="Yamada, T" uniqKey="Yamada T">T Yamada</name>
</author>
<author>
<name sortKey="Mende, Dr" uniqKey="Mende D">DR Mende</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cotillard, A" uniqKey="Cotillard A">A Cotillard</name>
</author>
<author>
<name sortKey="Kennedy, Sp" uniqKey="Kennedy S">SP Kennedy</name>
</author>
<author>
<name sortKey="Kong, Lc" uniqKey="Kong L">LC Kong</name>
</author>
<author>
<name sortKey="Prifti, E" uniqKey="Prifti E">E Prifti</name>
</author>
<author>
<name sortKey="Pons, N" uniqKey="Pons N">N Pons</name>
</author>
<author>
<name sortKey="Le Chatelier, E" uniqKey="Le Chatelier E">E Le Chatelier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Le Chatelier, E" uniqKey="Le Chatelier E">E Le Chatelier</name>
</author>
<author>
<name sortKey="Nielsen, T" uniqKey="Nielsen T">T Nielsen</name>
</author>
<author>
<name sortKey="Qin, J" uniqKey="Qin J">J Qin</name>
</author>
<author>
<name sortKey="Prifti, E" uniqKey="Prifti E">E Prifti</name>
</author>
<author>
<name sortKey="Hildebrand, F" uniqKey="Hildebrand F">F Hildebrand</name>
</author>
<author>
<name sortKey="Falony, G" uniqKey="Falony G">G Falony</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Qin, J" uniqKey="Qin J">J Qin</name>
</author>
<author>
<name sortKey="Li, R" uniqKey="Li R">R Li</name>
</author>
<author>
<name sortKey="Raes, J" uniqKey="Raes J">J Raes</name>
</author>
<author>
<name sortKey="Arumugam, M" uniqKey="Arumugam M">M Arumugam</name>
</author>
<author>
<name sortKey="Burgdorf, Ks" uniqKey="Burgdorf K">KS Burgdorf</name>
</author>
<author>
<name sortKey="Manichanh, C" uniqKey="Manichanh C">C Manichanh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yatsunenko, T" uniqKey="Yatsunenko T">T Yatsunenko</name>
</author>
<author>
<name sortKey="Rey, Fe" uniqKey="Rey F">FE Rey</name>
</author>
<author>
<name sortKey="Manary, Mj" uniqKey="Manary M">MJ Manary</name>
</author>
<author>
<name sortKey="Trehan, I" uniqKey="Trehan I">I Trehan</name>
</author>
<author>
<name sortKey="Dominguez Bello, Mg" uniqKey="Dominguez Bello M">MG Dominguez-Bello</name>
</author>
<author>
<name sortKey="Contreras, M" uniqKey="Contreras M">M Contreras</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kamada, N" uniqKey="Kamada N">N Kamada</name>
</author>
<author>
<name sortKey="Seo, Su" uniqKey="Seo S">SU Seo</name>
</author>
<author>
<name sortKey="Chen, Gy" uniqKey="Chen G">GY Chen</name>
</author>
<author>
<name sortKey="Nunez, G" uniqKey="Nunez G">G Nunez</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ding, T" uniqKey="Ding T">T Ding</name>
</author>
<author>
<name sortKey="Schloss, Pd" uniqKey="Schloss P">PD Schloss</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Adler, Cj" uniqKey="Adler C">CJ Adler</name>
</author>
<author>
<name sortKey="Dobney, K" uniqKey="Dobney K">K Dobney</name>
</author>
<author>
<name sortKey="Weyrich, Ls" uniqKey="Weyrich L">LS Weyrich</name>
</author>
<author>
<name sortKey="Kaidonis, J" uniqKey="Kaidonis J">J Kaidonis</name>
</author>
<author>
<name sortKey="Walker, Aw" uniqKey="Walker A">AW Walker</name>
</author>
<author>
<name sortKey="Haak, W" uniqKey="Haak W">W Haak</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Biesbroek, G" uniqKey="Biesbroek G">G Biesbroek</name>
</author>
<author>
<name sortKey="Sanders, Ea" uniqKey="Sanders E">EA Sanders</name>
</author>
<author>
<name sortKey="Roeselers, G" uniqKey="Roeselers G">G Roeselers</name>
</author>
<author>
<name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author>
<name sortKey="Caspers, Mp" uniqKey="Caspers M">MP Caspers</name>
</author>
<author>
<name sortKey="Trzcinski, K" uniqKey="Trzcinski K">K Trzcinski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schroder, J" uniqKey="Schroder J">J Schroder</name>
</author>
<author>
<name sortKey="Bailey, J" uniqKey="Bailey J">J Bailey</name>
</author>
<author>
<name sortKey="Conway, T" uniqKey="Conway T">T Conway</name>
</author>
<author>
<name sortKey="Zobel, J" uniqKey="Zobel J">J Zobel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, Xv" uniqKey="Wang X">XV Wang</name>
</author>
<author>
<name sortKey="Blades, N" uniqKey="Blades N">N Blades</name>
</author>
<author>
<name sortKey="Ding, J" uniqKey="Ding J">J Ding</name>
</author>
<author>
<name sortKey="Sultana, R" uniqKey="Sultana R">R Sultana</name>
</author>
<author>
<name sortKey="Parmigiani, G" uniqKey="Parmigiani G">G Parmigiani</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Keegan, Kp" uniqKey="Keegan K">KP Keegan</name>
</author>
<author>
<name sortKey="Trimble, Wl" uniqKey="Trimble W">WL Trimble</name>
</author>
<author>
<name sortKey="Wilkening, J" uniqKey="Wilkening J">J Wilkening</name>
</author>
<author>
<name sortKey="Wilke, A" uniqKey="Wilke A">A Wilke</name>
</author>
<author>
<name sortKey="Harrison, T" uniqKey="Harrison T">T Harrison</name>
</author>
<author>
<name sortKey="D Souza, M" uniqKey="D Souza M">M D’Souza</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leggett, Rm" uniqKey="Leggett R">RM Leggett</name>
</author>
<author>
<name sortKey="Ramirez Gonzalez, Rh" uniqKey="Ramirez Gonzalez R">RH Ramirez-Gonzalez</name>
</author>
<author>
<name sortKey="Clavijo, Bj" uniqKey="Clavijo B">BJ Clavijo</name>
</author>
<author>
<name sortKey="Waite, D" uniqKey="Waite D">D Waite</name>
</author>
<author>
<name sortKey="Davey, Rp" uniqKey="Davey R">RP Davey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Simpson, Jt" uniqKey="Simpson J">JT Simpson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koonin, Ev" uniqKey="Koonin E">EV Koonin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mccutcheon, Jp" uniqKey="Mccutcheon J">JP McCutcheon</name>
</author>
<author>
<name sortKey="Moran, Na" uniqKey="Moran N">NA Moran</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Turnbaugh, Pj" uniqKey="Turnbaugh P">PJ Turnbaugh</name>
</author>
<author>
<name sortKey="Hamady, M" uniqKey="Hamady M">M Hamady</name>
</author>
<author>
<name sortKey="Yatsunenko, T" uniqKey="Yatsunenko T">T Yatsunenko</name>
</author>
<author>
<name sortKey="Cantarel, Bl" uniqKey="Cantarel B">BL Cantarel</name>
</author>
<author>
<name sortKey="Duncan, A" uniqKey="Duncan A">A Duncan</name>
</author>
<author>
<name sortKey="Ley, Re" uniqKey="Ley R">RE Ley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edwards, Ra" uniqKey="Edwards R">RA Edwards</name>
</author>
<author>
<name sortKey="Olson, R" uniqKey="Olson R">R Olson</name>
</author>
<author>
<name sortKey="Disz, T" uniqKey="Disz T">T Disz</name>
</author>
<author>
<name sortKey="Pusch, Gd" uniqKey="Pusch G">GD Pusch</name>
</author>
<author>
<name sortKey="Vonstein, V" uniqKey="Vonstein V">V Vonstein</name>
</author>
<author>
<name sortKey="Stevens, R" uniqKey="Stevens R">R Stevens</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edgar, Rc" uniqKey="Edgar R">RC Edgar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Williams, D" uniqKey="Williams D">D Williams</name>
</author>
<author>
<name sortKey="Trimble, Wl" uniqKey="Trimble W">WL Trimble</name>
</author>
<author>
<name sortKey="Shilts, M" uniqKey="Shilts M">M Shilts</name>
</author>
<author>
<name sortKey="Meyer, F" uniqKey="Meyer F">F Meyer</name>
</author>
<author>
<name sortKey="Ochman, H" uniqKey="Ochman H">H Ochman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gao, L" uniqKey="Gao L">L Gao</name>
</author>
<author>
<name sortKey="Qi, J" uniqKey="Qi J">J Qi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shannon, Ce" uniqKey="Shannon C">CE Shannon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Juste, C" uniqKey="Juste C">C Juste</name>
</author>
<author>
<name sortKey="Kreil, Dp" uniqKey="Kreil D">DP Kreil</name>
</author>
<author>
<name sortKey="Beauvallet, C" uniqKey="Beauvallet C">C Beauvallet</name>
</author>
<author>
<name sortKey="Guillot, A" uniqKey="Guillot A">A Guillot</name>
</author>
<author>
<name sortKey="Vaca, S" uniqKey="Vaca S">S Vaca</name>
</author>
<author>
<name sortKey="Carapito, C" uniqKey="Carapito C">C Carapito</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Godon, Jj" uniqKey="Godon J">JJ Godon</name>
</author>
<author>
<name sortKey="Zumstein, E" uniqKey="Zumstein E">E Zumstein</name>
</author>
<author>
<name sortKey="Dabert, P" uniqKey="Dabert P">P Dabert</name>
</author>
<author>
<name sortKey="Habouzit, F" uniqKey="Habouzit F">F Habouzit</name>
</author>
<author>
<name sortKey="Moletta, R" uniqKey="Moletta R">R Moletta</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mardis, Er" uniqKey="Mardis E">ER Mardis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
<author>
<name sortKey="Trapnell, C" uniqKey="Trapnell C">C Trapnell</name>
</author>
<author>
<name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dillies, Ma" uniqKey="Dillies M">MA Dillies</name>
</author>
<author>
<name sortKey="Rau, A" uniqKey="Rau A">A Rau</name>
</author>
<author>
<name sortKey="Aubert, J" uniqKey="Aubert J">J Aubert</name>
</author>
<author>
<name sortKey="Hennequet Antier, C" uniqKey="Hennequet Antier C">C Hennequet-Antier</name>
</author>
<author>
<name sortKey="Jeanmougin, M" uniqKey="Jeanmougin M">M Jeanmougin</name>
</author>
<author>
<name sortKey="Servant, N" uniqKey="Servant N">N Servant</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ward, J" uniqKey="Ward J">J Ward</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yang, B" uniqKey="Yang B">B Yang</name>
</author>
<author>
<name sortKey="Peng, Y" uniqKey="Peng Y">Y Peng</name>
</author>
<author>
<name sortKey="Leung, Hc" uniqKey="Leung H">HC Leung</name>
</author>
<author>
<name sortKey="Yiu, Sm" uniqKey="Yiu S">SM Yiu</name>
</author>
<author>
<name sortKey="Chen, Jc" uniqKey="Chen J">JC Chen</name>
</author>
<author>
<name sortKey="Chin, Fy" uniqKey="Chin F">FY Chin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Glenn, Tc" uniqKey="Glenn T">TC Glenn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author>
<name sortKey="Jia, H" uniqKey="Jia H">H Jia</name>
</author>
<author>
<name sortKey="Cai, X" uniqKey="Cai X">X Cai</name>
</author>
<author>
<name sortKey="Zhong, H" uniqKey="Zhong H">H Zhong</name>
</author>
<author>
<name sortKey="Feng, Q" uniqKey="Feng Q">Q Feng</name>
</author>
<author>
<name sortKey="Sunagawa, S" uniqKey="Sunagawa S">S Sunagawa</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Genomics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Genomics</journal-id>
<journal-title-group>
<journal-title>BMC Genomics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2164</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">25887914</article-id>
<article-id pub-id-type="pmc">4373121</article-id>
<article-id pub-id-type="publisher-id">1406</article-id>
<article-id pub-id-type="doi">10.1186/s12864-015-1406-7</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Methodology Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Quality control of microbiota metagenomics by k-mer analysis</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Plaza Onate</surname>
<given-names>Florian</given-names>
</name>
<address>
<email>florian.plaza@jouy.inra.fr</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Batto</surname>
<given-names>Jean-Michel</given-names>
</name>
<address>
<email>jean-michel.batto@jouy.inra.fr</email>
</address>
<xref ref-type="aff" rid="Aff2"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Juste</surname>
<given-names>Catherine</given-names>
</name>
<address>
<email>catherine.juste@jouy.inra.fr</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
<xref ref-type="aff" rid="Aff2"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Fadlallah</surname>
<given-names>Jehane</given-names>
</name>
<address>
<email>jehane_fad@yahoo.fr</email>
</address>
<xref ref-type="aff" rid="Aff3"></xref>
<xref ref-type="aff" rid="Aff4"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Fougeroux</surname>
<given-names>Cyrielle</given-names>
</name>
<address>
<email>cyrielle.fougeroux@yahoo.fr</email>
</address>
<xref ref-type="aff" rid="Aff3"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Gouas</surname>
<given-names>Doriane</given-names>
</name>
<address>
<email>dorianegouas@gmail.com</email>
</address>
<xref ref-type="aff" rid="Aff3"></xref>
<xref ref-type="aff" rid="Aff5"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Pons</surname>
<given-names>Nicolas</given-names>
</name>
<address>
<email>nicolas.pons@jouy.inra.fr</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kennedy</surname>
<given-names>Sean</given-names>
</name>
<address>
<email>skennedy@jouy.inra.fr</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Levenez</surname>
<given-names>Florence</given-names>
</name>
<address>
<email>florence.levenez@jouy.inra.fr</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
<xref ref-type="aff" rid="Aff2"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Dore</surname>
<given-names>Joel</given-names>
</name>
<address>
<email>joel.dore@jouy.inra.fr</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
<xref ref-type="aff" rid="Aff2"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ehrlich</surname>
<given-names>S Dusko</given-names>
</name>
<address>
<email>dusko.ehrlich@jouy.inra.fr</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
<xref ref-type="aff" rid="Aff2"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Gorochov</surname>
<given-names>Guy</given-names>
</name>
<address>
<email>guy.gorochov@upmc.fr</email>
</address>
<xref ref-type="aff" rid="Aff3"></xref>
<xref ref-type="aff" rid="Aff4"></xref>
<xref ref-type="aff" rid="Aff5"></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Larsen</surname>
<given-names>Martin</given-names>
</name>
<address>
<email>Martin.Larsen@upmc.fr</email>
</address>
<xref ref-type="aff" rid="Aff3"></xref>
<xref ref-type="aff" rid="Aff4"></xref>
<xref ref-type="aff" rid="Aff5"></xref>
</contrib>
<aff id="Aff1">
<label></label>
INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350 Jouy en Josas, France</aff>
<aff id="Aff2">
<label></label>
UMR1319 Micalis, INRA, Jouy-en-Josas, France</aff>
<aff id="Aff3">
<label></label>
Sorbonne Universités, UPMC Univ Paris 06, CR7, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), Hôpital Pitié-Salpêtrière, 83 bd. de l’Hôpital, 75013 Paris, France</aff>
<aff id="Aff4">
<label></label>
Département d’Immunologie, AP-HP, Groupement Hospitalier Pitié-Salpêtrière, F-75013 Paris, France</aff>
<aff id="Aff5">
<label></label>
Inserm UMR-S1135, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), F-75013 Paris, France</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>14</day>
<month>3</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>14</day>
<month>3</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="collection">
<year>2015</year>
</pub-date>
<volume>16</volume>
<issue>1</issue>
<elocation-id>183</elocation-id>
<history>
<date date-type="received">
<day>20</day>
<month>6</month>
<year>2014</year>
</date>
<date date-type="accepted">
<day>26</day>
<month>2</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-statement>© Plaza Onate et al.; licensee BioMed Central. 2015</copyright-statement>
<license license-type="open-access">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0">http://creativecommons.org/licenses/by/4.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p>The biological and clinical consequences of the tight interactions between host and microbiota are rapidly being unraveled by next generation sequencing technologies and sophisticated bioinformatics, also referred to as microbiota metagenomics. The recent success of metagenomics has created a demand to rapidly apply the technology to large case–control cohort studies and to studies of microbiota from various habitats, including habitats relatively poor in microbes. It is therefore of foremost importance to enable a robust and rapid quality assessment of metagenomic data from samples that challenge present technological limits (sample numbers and size). Here we demonstrate that the distribution of overlapping k-mers of metagenome sequence data predicts sequence quality as defined by gene distribution and efficiency of sequence mapping to a reference gene catalogue.</p>
</sec>
<sec>
<title>Results</title>
<p>We used serial dilutions of gut microbiota metagenomic datasets to generate well-defined high to low quality metagenomes. We also analyzed a collection of 52 microbiota-derived metagenomes. We demonstrate that k-mer distributions of metagenomic sequence data identify sequence contaminations, such as sequences derived from “empty” ligation products. Of note, k-mer distributions were also able to predict the frequency of sequences mapping to a reference gene catalogue not only for the well-defined serial dilution datasets, but also for 52 human gut microbiota derived metagenomic datasets.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>We propose that k-mer analysis of raw metagenome sequence reads should be implemented as a first quality assessment prior to more extensive bioinformatics analysis, such as sequence filtering and gene mapping. With the rising demand for metagenomic analysis of microbiota it is crucial to provide tools for rapid and efficient decision making. This will eventually lead to a faster turn-around time, improved analytical quality including sample quality metrics and a significant cost reduction. Finally, improved quality assessment will have a major impact on the robustness of biological and clinical conclusions drawn from metagenomic studies.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12864-015-1406-7) contains supplementary material, which is available to authorized users.</p>
</sec>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>Metagenomics</kwd>
<kwd>Next generation sequencing</kwd>
<kwd>Quality control</kwd>
<kwd>Sampling bias</kwd>
<kwd>Sample size limits</kwd>
</kwd-group>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2015</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="Sec1">
<title>Background</title>
<p>Analysis of human microbiota has in recent years unraveled a universe of intricate interactions between man and microorganisms with direct implications for health and disease [
<xref ref-type="bibr" rid="CR1">1</xref>
-
<xref ref-type="bibr" rid="CR5">5</xref>
]. A large proportion of commensal bacterial species are presently either highly fastidious or cannot be cultured
<italic>in vitro</italic>
. This has been a major obstacle to accurately describing the microbiota composition. Metagenomic analysis based on state-of-the-art next generation sequencing (NGS) along with sophisticated bioinformatics overcomes these barriers by analyzing complex samples
<italic>ex vivo</italic>
.</p>
<p>Quantitative metagenomic analysis creates a gene and species profile, which allows the identification and phylogenetic classification of known as well as novel genes and species. Arumugam and co-workers discovered 3 functionally distinct gut microbiota compositions designated “enterotypes” [
<xref ref-type="bibr" rid="CR1">1</xref>
]. Indeed, highly diverse consortia of commensals may functionally synergize to derive energy from nutrients in a highly coordinated and efficient manner. An imbalance of gut microbiota composition has been associated with a large range of pathologies, such as obesity [
<xref ref-type="bibr" rid="CR2">2</xref>
], allergy and autoimmunity [
<xref ref-type="bibr" rid="CR6">6</xref>
].</p>
<p>Although most studies make use of bacteria-rich stool samples, a range of other body habitats with a much lower bacterial load is steadily gaining interest, such as vaginal, skin, oral and nasal habitats [
<xref ref-type="bibr" rid="CR7">7</xref>
]. A recent study demonstrates that it is technically feasible to analyze microbiota composition in samples of poor genomic DNA quantity and quality, such as dental plaques of pre-historic skeletons [
<xref ref-type="bibr" rid="CR8">8</xref>
]. However, it is also increasingly clear that this type of analysis is often associated with strong biases, which are difficult to discern and complicated to correct [
<xref ref-type="bibr" rid="CR9">9</xref>
]. The increasing number of samples and the use of samples from sites of low microbial density increase the importance of speed and quality control in sample processing, sequencing and data analysis. A number of studies have addressed this need by developing bioinformatics tools to monitor and correct NGS errors. Errors in this context refer to direct sequence errors at the individual base level [
<xref ref-type="bibr" rid="CR10">10</xref>
,
<xref ref-type="bibr" rid="CR11">11</xref>
], but also the distribution and abundance of individual sequences including sequences derived from sample or technological contaminants [
<xref ref-type="bibr" rid="CR12">12</xref>
-
<xref ref-type="bibr" rid="CR14">14</xref>
]. We developed a novel method, which rapidly determines and quantifies the quality of metagenomic sequence distribution at the sample level. Metagenomic analysis of complex microbiota communities is particularly sensitive to errors in sequence distribution, because abundance measures of individual bacterial genes and strains are based on sequence distribution within a given sample.</p>
<p>The information density of bacterial genomes is higher than that of complex eukaryotic organisms, because they harbor far fewer non-coding nucleotides [
<xref ref-type="bibr" rid="CR15">15</xref>
]. Moreover, bacterial genome size is tightly linked with host symbiosis. Indeed, commensals with a long history of host symbiosis generally have small genome sizes as compared to more recent bacterial symbionts [
<xref ref-type="bibr" rid="CR16">16</xref>
]. The metagenome of human gut microbiota consists of approximately 1000 different bacterial genomes and therefore has a size of approximately 1 Gbp. Of note, no single bacterial strain surpasses an abundance of 0.5% of the total gut microbiota [
<xref ref-type="bibr" rid="CR17">17</xref>
], emphasizing its highly diverse nature. We therefore hypothesize that, contrary to genomes of individual bacterial strains [
<xref ref-type="bibr" rid="CR18">18</xref>
], the short sequences of length k (k-mers) obtained by fragmenting a highly diverse metagenome would be distributed uniformly if k is sufficiently small.</p>
<p>K-mers are regarded as strings of length k restricted to the 4-letter alphabet (A, G, C, T). They have been used to solve various problems, such as rapid comparison of DNA sequences [
<xref ref-type="bibr" rid="CR19">19</xref>
], estimation of bacterial genome size [
<xref ref-type="bibr" rid="CR20">20</xref>
] and phylogeny of double-stranded DNA viruses [
<xref ref-type="bibr" rid="CR21">21</xref>
]. We propose to introduce an automated k-mer distribution analysis of raw DNA sequences directly downstream of the deep-sequencing analysis. Practically, we count the occurrences of all 4
<sup>k</sup>
possible k-mers in the raw metagenomic sequence dataset (k-mers and their reverse complements are aggregated when sequencing direction is arbitrary) and evaluate their distribution using a metric based on the information theory of Shannon [
<xref ref-type="bibr" rid="CR22">22</xref>
].</p>
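<p>As an illustration of such an entropy-based metric, the following minimal Python sketch computes a normalized Shannon entropy from a table of k-mer counts. It is not the in-house software used in this study. The function name, the choice of normalizing by the logarithm of the number of bins, and the convention that the caller supplies the number of bins (for example 4 to the power k, or the number of canonical k-mers when reverse complements are aggregated) are assumptions made for this example.</p>

```python
import math
from typing import Mapping


def normalized_shannon_entropy(counts: Mapping[str, int], n_bins: int) -> float:
    """Shannon entropy of a count table divided by log(n_bins).

    The value is 1.0 when all n_bins bins are equally populated and 0.0
    when every observation falls into a single bin. Normalizing by
    log(n_bins) is an assumption of this sketch.
    """
    total = sum(counts.values())
    if not (total > 0 and n_bins > 1):
        raise ValueError("need at least one observation and two bins")
    entropy = 0.0
    for count in counts.values():
        if count > 0:  # empty bins contribute nothing to the sum
            p = count / total
            entropy -= p * math.log(p)
    return entropy / math.log(n_bins)
```

<p>Under these assumptions, a value close to 1 indicates a nearly uniform k-mer distribution, and lower values indicate that a subset of k-mers dominates the dataset.</p>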
<p>Here we show that k-mers of good-quality metagenomic sequence data from complex gut microbiota samples are uniformly distributed, unlike those of genomic sequences of individual bacterial species. We furthermore demonstrate that k-mer distribution is associated with the quality of the metagenomic data. Moreover, the Shannon Entropy of the k-mer distribution predicts the rate of sequence mapping to a predefined reference gene catalogue. Our approach analyses unprocessed raw sequences and may significantly facilitate the decision of whether to 1) recollect a sample, 2) reprocess a sample or 3) increase the number of sequence reads before continuing with more extensive analysis. Moreover, it introduces a quality metric that may help validate conclusions drawn from metagenomic data.</p>
</sec>
<sec id="Sec2" sec-type="methods">
<title>Methods</title>
<sec id="Sec3">
<title>Faecal sample collection and processing</title>
<p>Faecal samples from 30 human donors were collected in dedicated hermetically closed plastic containers kept anaerobically (oxygen poor and CO
<sub>2</sub>
rich) with activated Anaerocult® A strips (Merck Millipore, Molsheim, France). Samples were aliquoted anaerobically and cryopreserved (−80°C) within 24 hours. Microbiota from 2.5 g of stool were separated from the faecal matrix on an inverse Nycodenz® gradient under anaerobic conditions as previously described [
<xref ref-type="bibr" rid="CR23">23</xref>
]. The separation yielded an average of 1.59 × 10
<sup>11</sup>
(95% confidence interval = [7.8 × 10
<sup>10</sup>
:3.2 × 10
<sup>11</sup>
]) purified microbial cells per sample. Undiluted microbiota as well as four 10-fold serial dilutions were pelleted by centrifugation (3000 × g for 10 minutes) and cryopreserved as dry pellets for subsequent DNA extraction.</p>
</sec>
<sec id="Sec4">
<title>DNA extraction</title>
<p>Genomic DNA was extracted using two distinct but overlapping protocols for whole stool and gradient purified commensals, respectively. Whole stool samples were treated as previously described [
<xref ref-type="bibr" rid="CR24">24</xref>
]. Briefly, 200 mg of faecal sample was lysed chemically (guanidine thiocyanate and N-lauroyl sarcosine) and mechanically (glass beads), followed by elimination of cell debris by centrifugation and precipitation of genomic DNA. Finally, genomic DNA was RNase treated. DNA concentration and molecular size were estimated by Nanodrop (Thermo Scientific) and agarose gel electrophoresis, respectively. Gradient purified commensal samples were treated similarly to whole stool samples, with the exception that DNA precipitation was performed in smaller volumes and with extra-long incubation times.</p>
</sec>
<sec id="Sec5">
<title>Metagenomic library construction</title>
<p>Libraries were constructed according to the manufacturer’s protocol (Life Technologies). Briefly, extracted genomic DNA was sheared by sonication, size-exclusion purified by Agencourt beads (Beckman Coulter), ligated to P1 and P2 adaptor oligonucleotides with appropriate barcodes, PCR amplified (6 cycles by default for all 52 metagenomes analysed, but more for dilution series metagenomes as indicated in Table 
<xref rid="Tab1" ref-type="table">1</xref>
) and loaded onto the flow-chip for downstream SOLiD sequencing.
<table-wrap id="Tab1">
<label>Table 1</label>
<caption>
<p>
<bold>DNA quantity used for serial dilution library constructions</bold>
</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr valign="top">
<th></th>
<th colspan="3">
<bold>Donor #1</bold>
</th>
<th colspan="3">
<bold>Donor #2</bold>
</th>
</tr>
<tr valign="top">
<th>
<bold>Sample Size (10</bold>
<sup>
<bold>x</bold>
</sup>
<bold>bacteria)</bold>
</th>
<th>
<bold>Purified dsDNA (ng/ml)</bold>
<sup>
<bold>1</bold>
</sup>
</th>
<th>
<bold>DNA for ligation (μg)</bold>
<sup>
<bold>2</bold>
</sup>
</th>
<th>
<bold>PCR cycles</bold>
</th>
<th>
<bold>Purified dsDNA (ng/ml)</bold>
<sup>
<bold>1</bold>
</sup>
</th>
<th>
<bold>DNA for ligation (μg)</bold>
<sup>
<bold>2</bold>
</sup>
</th>
<th>
<bold>PCR cycles</bold>
</th>
</tr>
</thead>
<tbody>
<tr valign="top">
<td>10</td>
<td>34.9</td>
<td>1.00</td>
<td>6</td>
<td>30.7</td>
<td>1.00</td>
<td>6</td>
</tr>
<tr valign="top">
<td>9</td>
<td>4.67</td>
<td>0.41</td>
<td>7</td>
<td>7.17</td>
<td>0.35</td>
<td>7</td>
</tr>
<tr valign="top">
<td>8</td>
<td>2.68</td>
<td>0.07</td>
<td>8</td>
<td>2.63</td>
<td>0.06</td>
<td>8</td>
</tr>
<tr valign="top">
<td>7</td>
<td>0.358</td>
<td>&lt;0.04</td>
<td>9</td>
<td>0.356</td>
<td>&lt;0.04</td>
<td>9</td>
</tr>
<tr valign="top">
<td>6</td>
<td>0.296</td>
<td>&lt;0.03</td>
<td>10</td>
<td>0.228</td>
<td>&lt;0.02</td>
<td>10</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>
<sup>1</sup>
Genomic dsDNA extracted from indicated number of bacteria.</p>
<p>
<sup>2</sup>
Amount of sheared and size purified genomic DNA utilized for ligation with P1 and P2 adaptor oligonucleotides.</p>
</table-wrap-foot>
</table-wrap>
</p>
</sec>
<sec id="Sec6">
<title>Metagenomic sequencing and data analysis</title>
<p>Microbiota gene content was determined by high-throughput SOLiD sequencing of total faecal DNA [
<xref ref-type="bibr" rid="CR25">25</xref>
]. An average of 34.3 million ± 36 million (mean ± s.d.) and 52.6 million ± 56.8 million 35-base-long single reads were determined for each sample from 10 dilution series samples and 52 whole stool samples, respectively (a total of 3.1 Gb of sequence). Raw sequences for all dilution series samples have been deposited in the European Bioinformatics Institute (EBI) European Nucleotide Archive (ENA) under the accession number PRJEB7925. By using Bowtie (version 1.0.0) [
<xref ref-type="bibr" rid="CR26">26</xref>
] an average of 4.6 million ± 3.5 million and 13.8 million ± 15.4 million reads per individual from the two groups of samples, respectively, were mapped on the reference catalogue of 3.3 million genes [
<xref ref-type="bibr" rid="CR4">4</xref>
] with a maximum of 3 mismatches. Reads mapping at multiple positions were discarded and an average of 3.6 million ± 2.7 million and 13.0 million ± 14.7 million uniquely mapped reads per individual from the two sample groups, respectively, were retained for estimating the abundance of each reference gene by using METEOR software [
<xref ref-type="bibr" rid="CR27">27</xref>
]. Abundance of each gene in an individual was normalized with the method coined Reads Per Kilobase per Million (RPKM) as previously described [
<xref ref-type="bibr" rid="CR28">28</xref>
]. Briefly, gene abundance was determined as the number of reads that uniquely mapped to a defined gene. Subsequently, normalized gene abundances were transformed into frequencies by dividing them by the total number of uniquely mapped reads for a given sample. The resulting microbial gene profile was used for further analyses.</p>
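<p>The normalization described above can be sketched in Python as follows. This is an illustrative reading of the procedure, not the METEOR implementation. The function and argument names are hypothetical, and the final rescaling so that frequencies sum to one is an assumption of this example; it does not change the rank order of genes and therefore does not affect rank-based correlations.</p>

```python
from typing import Dict


def gene_frequencies(read_counts: Dict[str, int],
                     gene_lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Convert uniquely mapped read counts into length-normalized frequencies.

    read_counts     : uniquely mapped reads per gene (hypothetical input)
    gene_lengths_bp : gene length in base pairs for every counted gene
    """
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("no uniquely mapped reads")
    # RPKM: reads per kilobase of gene per million uniquely mapped reads.
    rpkm = {
        gene: count / (gene_lengths_bp[gene] / 1e3) / (total_reads / 1e6)
        for gene, count in read_counts.items()
    }
    # Assumption of this sketch: rescale so that the profile sums to one.
    total_rpkm = sum(rpkm.values())
    return {gene: value / total_rpkm for gene, value in rpkm.items()}
```

<p>For two genes with equal read counts, the gene that is twice as long receives half the frequency of the shorter one, which is the intended effect of the length normalization.</p>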
</sec>
<sec id="Sec7">
<title>Bacterial genome sequences</title>
<p>28 bacterial genomes from a range of species covering common human commensals were extracted from the collection of available reference genomes from NCBI (cf. Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S1).</p>
</sec>
<sec id="Sec8">
<title>K-mer analysis</title>
<p>The abundances of all overlapping k-mer sequences present in a set of whole-genome shotgun short-read sequences were counted with in-house developed C++ software (
<ext-link ext-link-type="uri" xlink:href="http://www.mgps.eu/people/fplaza/">www.mgps.eu/people/fplaza/</ext-link>
) optimized for small k, which supports colour space reads and the CSFasta file format as input. Sequence reads with missing colour cells were discarded and remaining reads were trimmed to 35 bases. K-mer analysis of bacterial genomes was conducted with Jellyfish version 1.1 (
<ext-link ext-link-type="uri" xlink:href="http://www.cbcb.umd.edu/software/jellyfish/">http://www.cbcb.umd.edu/software/jellyfish/</ext-link>
). The number of distinct k-mers observed at each abundance value in a set of sequences is plotted as a k-mer abundance histogram. A repeated sequence in a sampled genome affects the shape of these k-mer abundance spectra depending on its length and copy number. A DNA sequence of length l will contain (l – k + 1) different k-mers if it does not contain repeats of length greater than k – 1.</p>
<p>Each k-mer has a reverse complement; e.g. the reverse complement of the 4-mer ATTC is GAAT. Note that a k-mer can be its own reverse complement (e.g. AGCT) only if k is even. Since the shot-gun short-read sequencing technology applied does not differentiate according to sequence orientation, we apply a “canonical representation”, which considers a k-mer and its reverse complement equivalent (e.g. the 4-mers ATTC and GAAT are grouped together).</p>
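<p>The canonical representation can be illustrated with a short Python sketch. It is not the in-house C++ software, which reads colour space data; this example assumes base space reads over the alphabet A, C, G, T. Choosing the lexicographically smaller of a k-mer and its reverse complement as the representative is one common convention and is an assumption here, as are all function names.</p>

```python
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Complement every base, then reverse the string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Representative of a k-mer and its reverse complement.

    Assumption of this sketch: the lexicographically smaller string.
    """
    return min(kmer, reverse_complement(kmer))


def count_canonical_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count all overlapping k-mers of every read in canonical form."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        # A read of length l yields l - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if not set(kmer).issubset("ACGT"):
                continue  # skip k-mers with ambiguous bases such as N
            counts[canonical(kmer)] += 1
    return counts
```

<p>With this convention the 256 possible 4-mers collapse into 136 canonical bins, because 16 of them are their own reverse complement and the remaining 240 form 120 pairs.</p>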
<p>If the same sequence occurs n times in a genome, shotgun sequencing would sample k-mers from this sequence n times as often as those from a single-copy sequence, which are sampled at the average read depth. Therefore, repeated sequences in the genome result in higher abundances of associated k-mers. These collections of k-mers at higher-than-normal abundances appear as multiple peaks at different positions along the x-axis of the k-mer abundance histogram.</p>
</sec>
<sec id="Sec9">
<title>Hierarchical cluster analysis</title>
<p>Agglomerative hierarchical cluster analysis of k-mer distributions of individual bacterial genomes performed according to Ward’s minimum variance method [
<xref ref-type="bibr" rid="CR29">29</xref>
] was accomplished using JMP7 software (SAS Software, NC, USA). The optimal number of clusters was identified according to the largest distance change between successive junctions of the associated dendrogram plot. Validity and reproducibility of the classification obtained with hierarchical cluster analysis was assessed using non-hierarchical k-means cluster analysis, in which the optimal number of clusters identified through hierarchical cluster analysis was pre-specified. Reproducibility of the classifications obtained with both hierarchical and non-hierarchical clustering was assessed by determination of the kappa value.</p>
</sec>
<sec id="Sec10">
<title>Ethics statement</title>
<p>The study was conducted in accordance with the Declaration of Helsinki. Human stool samples were obtained following acquisition of the study participants’ written informed consent, and the study protocol was reviewed and approved by the local ethics committee of Pitié-Salpêtrière Hospital, Paris (“Les Comités de protection des personnes”).</p>
</sec>
<sec id="Sec11">
<title>Statistical analysis</title>
<p>Spearman’s rank correlation was calculated using R (
<ext-link ext-link-type="uri" xlink:href="http://www.r-project.org/">http://www.R-project.org</ext-link>
, Vienna, Austria).
<italic>P</italic>
-values < 0.05 were considered statistically significant.</p>
</sec>
</sec>
<sec id="Sec12" sec-type="discussion">
<title>Results and discussion</title>
<sec id="Sec13">
<title>K-mer distribution of complex microbiota is homogenous irrespective of bacterial composition</title>
<p>Highly complex microbiota metagenomic raw sequence data can be split into short sequences of length k bases, which can be binned into a finite set of possible k-mer sequences (4
<sup>k</sup>
combinations). K-mer analysis of single bacterial genome data has previously revealed differences in k-mer distribution between bacterial species [
<xref ref-type="bibr" rid="CR30">30</xref>
]. In contrast, we hypothesize that k-mer distribution of a large set of sequence data derived from a complex mix of microorganisms follows a relatively uniform distribution. To validate this hypothesis we selected two distinct stool samples representing two different enterotypes (
<italic>Prevotella</italic>
dominated for donor #1 and
<italic>Bacteroides</italic>
dominated for donor #2 - Figure 
<xref rid="Fig1" ref-type="fig">1</xref>
A). We then analysed the occurrence of each 4-mer by searching through all raw sequence reads for the two metagenomes. Interestingly, the two selected metagenomes had very similar 4-mer distributions despite their highly different bacterial compositions (Figure 
<xref rid="Fig1" ref-type="fig">1</xref>
B). Of note, the Shannon-Entropy for both samples was high (0.9932 and 0.9930 for donor #1 and #2, respectively) characteristic of a uniform distribution of 4-mers (Figure 
<xref rid="Fig1" ref-type="fig">1</xref>
B). In line with our hypothesis, the Shannon-Entropy of the two selected metagenomes was clearly higher than the one of 28 known genomes of bacterial species from a large spectrum of phyla and classes (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S1A top panel and C). In other words, genomes from individual bacterial species have a more heterogenous 4-mer distribution than complex metagenomes, even when such metagenomes are derived from very different gut microbiota compositions. This result was confirmed by evaluating the average normalized Shannon-index of the k-mer distribution for genomes derived from 28 bacterial strains compared to gut metagenomes derived from 21 low (<10
<sup>10</sup>
bacteria) (cf. Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S1A middle panel) and 31 high (>10
<sup>10</sup>
bacteria) (cf. Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S1A bottom panel) bacterial content human stool samples (
<italic>P</italic>
 = 0.001 and <0.0001, respectively, cf. Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S1B). Similarly, we compared the 28 bacterial strains with metagenomes from 110 healthy individuals from the study by Yatsunenko
<italic>et al.</italic>
(mean and 95% confidence intervals for strains and metagenomes: 0.972 [0.963:0.980] and 0.983 [0.981:0.984], respectively,
<italic>P</italic>
 = 0.004) [
<xref ref-type="bibr" rid="CR5">5</xref>
]. Of note, the Yatsunenko study employed Illumina sequencing, showing that the methodology is platform-independent.
<fig id="Fig1">
<label>Figure 1</label>
<caption>
<p>
<bold>4-mer distribution analysis for complex microbiota metagenomes compared to individual bacterial genomes. A</bold>
, Bar diagram of quantitative metagenomics of gut microbiota from two healthy volunteers, donor #1 (blue) and #2 (red), aggregated to express the frequency of a selected number of taxonomic classes from the
<italic>Bacteroidetes</italic>
and
<italic>Firmicutes</italic>
phyla.
<bold>B</bold>
, Line graph showing the 4-mer distribution of metagenomic sequences from gut microbiota of donor #1 and #2. A histogram depicting the 4-mer abundance distribution is plotted to the right of the line graph. Distribution entropy is indicated (normalized Shannon Entropy).
<bold>C</bold>
, Scatter plot visualizes the 4-mer distribution entropy for 28 bacterial genomes and two gut microbiota metagenomes.
<bold>D</bold>
, The 28 bacterial genomes are divided into 6 clusters by unsupervised agglomerative hierarchical cluster analysis of their 4-mer distributions based on Ward’s minimum variance method.</p>
</caption>
<graphic xlink:href="12864_2015_1406_Fig1_HTML" id="MO1"></graphic>
</fig>
</p>
<p>Moreover, individual bacterial genomes aggregated into 6 clusters defined by their k-mer distribution using agglomerative hierarchical cluster analysis (Figure 
<xref rid="Fig1" ref-type="fig">1</xref>
D). The clusters were validated with a non-hierarchical k-means cluster analysis. The agreement between the two clustering techniques, as measured by Cohen’s kappa, was κ = 0.48. Interestingly, the identified clusters are associated with the phylogeny of the bacteria and can be used to evaluate taxonomic relations, as previously suggested [
<xref ref-type="bibr" rid="CR30">30</xref>
]. This result suggests that the 4-mer distribution of a metagenome from a complex bacterial mixture could be decomposed, by linear regression, into the k-mer distribution vectors of individual bacterial genomes and a residual representing the component unexplained by known bacterial genomes. In other words, this type of analysis could identify novel bacterial species and potentially elucidate their phylogenetic descent. This approach is beyond the scope of the present study.</p>
</sec>
<sec id="Sec14">
<title>Quantitative metagenomic analysis of serially diluted gut microbiota identifies lowest analyzable sample size limit</title>
<p>Biased metagenomic sequence distribution can be a result of technical obstacles (DNA extraction and library construction), contaminations and limiting amount of sample material [
<xref ref-type="bibr" rid="CR9">9</xref>
,
<xref ref-type="bibr" rid="CR31">31</xref>
]. Whereas the first two causes can be mitigated or avoided, the last is most often unavoidable. Of note, the reliability of sequence distribution directly affects the validity of quantitative metagenomic data. Therefore, there is an urgent need for a method to evaluate metagenomic quality. To investigate whether k-mer distribution analysis of complex metagenomes could predict metagenomic quality of samples with limiting material, we generated 10-fold serial dilutions of the two purified gut microbiota samples presented above (cf. Figure 
<xref rid="Fig1" ref-type="fig">1</xref>
- donor #1 and #2). Each dilution underwent genomic DNA extraction and metagenomic analysis (Table 
<xref rid="Tab1" ref-type="table">1</xref>
). All dilutions of the same sample should ideally have identical gene distributions, with the most concentrated sample being the most representative of the underlying gut microbiota and thus of the best quality. We therefore mapped raw metagenomic sequences onto a reference gene catalogue [
<xref ref-type="bibr" rid="CR4">4</xref>
] for all analyzed samples and correlated gene frequencies from four 10-fold dilutions with gene frequencies from the most concentrated sample, serving as internal reference sample (Figure 
<xref rid="Fig2" ref-type="fig">2</xref>
A). For both samples (donor #1 and #2) this analysis demonstrated strong correlations between all serial dilutions and their reference sample with a clear reduction in correlation for the highest dilution for both samples, indicating the analytical sample size limitation associated with our analytical protocol (Figure 
<xref rid="Fig2" ref-type="fig">2</xref>
B). As expected, the correlation between the two unrelated donors (the highest-concentration samples from donor #1 and #2; Spearman r = 0.22) was markedly lower than the intra-donor correlations (Figure 
<xref rid="Fig2" ref-type="fig">2</xref>
B and C).
<fig id="Fig2">
<label>Figure 2</label>
<caption>
<p>
<bold>Quantitative metagenomics of serially diluted gut microbiota. A</bold>
, Scatter plot of gene frequencies derived from quantitative metagenomic profiles of undiluted gut microbiota on the x-axis versus colour coded 10-, 100-, 1,000- and 10,000-fold diluted gut microbiota on the y-axis (samples derived from donor #1 gut microbiota).
<bold>B</bold>
, Categorical line graph depicts Spearman rank correlation coefficients between gene frequencies from metagenomic analysis of undiluted gut microbiota versus gene frequencies of 10-, 100-, 1,000- and 10,000-fold diluted gut microbiota from donor #1 (blue) and donor #2 (red).
<bold>C</bold>
, Scatter plot of gene frequencies of undiluted samples from the two unrelated donors #1 (x-axis) and #2 (y-axis); their Spearman rank correlation is indicated as a dotted line in B. Genes present in the reference gene catalogue that are not detected in the samples are excluded from the analysis.</p>
</caption>
<graphic xlink:href="12864_2015_1406_Fig2_HTML" id="MO2"></graphic>
</fig>
</p>
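<p>The comparison of a diluted sample with its undiluted reference can be sketched as a Spearman rank correlation over gene frequencies. The statistics in this study were computed with R; the Python code below is only an illustration with hypothetical function names. It assumes that a gene is excluded when it is undetected (frequency zero) in both samples being compared, which is one possible reading of the exclusion rule stated in the legend of Figure 2.</p>

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        while (end + 1 != len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for j in range(start, end + 1):
            ranks[order[j]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_detected(reference: Dict[str, float],
                      diluted: Dict[str, float]) -> float:
    """Spearman correlation over genes detected in at least one sample."""
    genes = [g for g in sorted(set(reference) | set(diluted))
             if reference.get(g, 0.0) > 0 or diluted.get(g, 0.0) > 0]
    x = [reference.get(g, 0.0) for g in genes]
    y = [diluted.get(g, 0.0) for g in genes]
    return pearson(average_ranks(x), average_ranks(y))
```

<p>Spearman correlation is computed here as the Pearson correlation of the tie-averaged ranks, which is the standard definition in the presence of ties.</p>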
</sec>
<sec id="Sec15">
<title>K-mer distribution analysis of metagenomic sequences identifies the same lower sample size limit as quantitative metagenomic analysis</title>
<p>Having established a metagenomic dataset including metagenomes with a defined decline in quality, we investigated whether k-mer analysis of raw sequences of the same dataset could predict the lower sample size limit defined in the previous section by a comparative gene mapping procedure. 4-mer analysis of raw metagenomes corresponding to dilution series samples (undiluted to 10,000-fold dilutions) of gut microbiota from donor #1 and #2 identified a biased 4-mer distribution for the 1,000- and 10,000-fold dilution samples from both donors (Figure 
<xref rid="Fig3" ref-type="fig">3</xref>
A, left panel). Interestingly, aberrant k-mers were not fully overlapping between sample dilutions (Figure 
<xref rid="Fig3" ref-type="fig">3</xref>
A, right panel), suggesting that low quality is derived from both sample preparation and system noise. Calculating the Shannon-Entropy for 4-mer distributions from all metagenomes confirmed that the two most dilute samples suffered from a particularly biased raw sequence read composition (Figure 
<xref rid="Fig3" ref-type="fig">3</xref>
B). To identify aberrant 4-mers, we correlated the 4-mer frequency observed for each dilution series metagenome with the 4-mer frequency observed for the undiluted reference sample of donor #1 and #2, respectively (Figure 
<xref rid="Fig3" ref-type="fig">3</xref>
C). This analysis revealed a distinct subset of 4-mers largely overrepresented in the diluted samples. A closer look at these 4-mers uncovered a tight association with the unique barcode-cassette sequence flanking the genome fragments of the metagenomic shot-gun repertoire. These sequences are derived from self-ligated shot-gun cassettes. Excessive amounts of these sequences are a consequence of limited genomic DNA and subsequent reduced ligation efficiency. Indeed, when we removed all raw sequence reads matching the barcode-cassette sequence of the respective metagenome repertoire, the 4-mer distributions of diluted samples were less aberrant (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S2A), although the 10,000-fold diluted sample remained quantitatively more biased (reduced Shannon entropy) than the other dilutions for both donor #1 and #2 (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S2B). Similarly, the correlation analysis revealed that the 10,000-fold diluted sample still contained k-mers that were strongly overrepresented relative to the k-mer distribution of the undiluted reference (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S2C). Of note, this bias correlates with the skewed gene distribution observed for the 10,000-fold dilution (Figure 
<xref rid="Fig2" ref-type="fig">2</xref>
B).
<fig id="Fig3">
<label>Figure 3</label>
<caption>
<p>
<bold>4-mer distribution analysis of raw metagenomic sequences of serially diluted gut microbiota. A,</bold>
4-mer abundance distribution (left panel) and individual frequency (right panel) of metagenomic sequences from the colour-coded dilution series metagenomes of gut microbiota from donor #1 (upper panel) and #2 (lower panel).
<bold>B,</bold>
Bar plot of the normalized Shannon entropy of the 4-mer distribution for undiluted and 10-, 100-, 1,000- and 10,000-fold diluted gut microbiota metagenomes from donor #1 (blue) and #2 (red).
<bold>C,</bold>
Scatter plots depict the correlation between the 4-mer distributions of metagenomic sequences from undiluted gut microbiota (y-axis) and those from 10-, 100-, 1,000- and 10,000-fold diluted gut microbiota (x-axis) for donor #1 (upper panel) and #2 (lower panel).</p>
</caption>
<graphic xlink:href="12864_2015_1406_Fig3_HTML" id="MO3"></graphic>
</fig>
</p>
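<p>The normalized Shannon entropy of a 4-mer distribution, as used in the analysis above, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names <italic>kmer_counts</italic> and <italic>normalized_shannon_entropy</italic> are hypothetical, and the sketch assumes that overlapping 4-mers are counted on the forward strand only, that 4-mers containing ambiguous bases are skipped, and that normalization means division by the maximum entropy log2(256).</p>
<preformat>
```python
import math
from collections import Counter
from itertools import product

K = 4
# All 256 possible 4-mers over the DNA alphabet (assumption: A, C, G, T only).
ALL_KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_counts(reads, k=K):
    """Count overlapping k-mers over all reads, skipping k-mers with non-ACGT characters."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer).issubset("ACGT"):
                counts[kmer] += 1
    return counts


def normalized_shannon_entropy(counts, k=K):
    """Shannon entropy of the k-mer frequencies divided by its maximum, log2(4**k)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid k-mers found")
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return entropy / math.log2(4 ** k)
```
</preformat>
<p>Under these assumptions, a perfectly uniform 4-mer distribution gives a value of 1 and a read set dominated by a single repeated sequence, such as a self-ligated cassette, gives a value close to 0, which is the direction of the bias described for the most dilute samples.</p>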
<p>These observations demonstrate that metagenomic quality, defined as the capacity to determine the gene distributions of microbiota precisely and robustly, can be predicted by a k-mer distribution analysis of metagenomic raw sequences. It is, however, not clear whether the skewed k-mer distribution observed for the highest sample dilutions (corresponding to low quality metagenomes) is due to aberrant bacterial gene sequences, as observed by correlative analysis of mapped reads (Figure 
<xref rid="Fig2" ref-type="fig">2</xref>
), or due to concomitant non-mappable sequences similar to but distinct from the barcode-cassette sequences discussed above. We therefore filtered the raw metagenome sequences to retain only mappable sequences. 4-mer analysis of the filtered reads revealed a nearly identical 4-mer distribution for all dilution series metagenomes (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S3A), resulting in very similar Shannon entropy values for the 4-mer distributions of all samples (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S3B). Equally, k-mer frequencies correlated perfectly between dilution series samples from the same donor (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S3C). The predictive power of the k-mer analysis therefore relies on a secondary but concomitant degradation of sequence quality and distribution.</p>
</sec>
<sec id="Sec16">
<title>K-mer distribution predicts metagenomic sequence mapping to a reference gene catalogue</title>
<p>Our data demonstrate that k-mer analysis primarily identifies the presence of aberrant sequences, such as contaminants linked to poor metagenome library assembly resulting from a limited quantity of genomic DNA. Because contaminating sequences are unlikely to map to known bacterial genes, we speculated that skewed k-mer distributions could predict the frequency of raw sequence reads mapping to the reference gene catalogue. Of note, raw sequences in this context refer to entirely unmanipulated NGS datasets. This approach was chosen to make the methodology broadly applicable. Indeed, we observed a clear positive association between the 4-mer distribution, quantified as Shannon entropy, and the frequency of mapped reads for the dilution series metagenomes of donors #1 and #2 (r = 0.88,
<italic>P</italic>
= 0.0009 - Figure 
<xref rid="Fig4" ref-type="fig">4</xref>
A). Of note, the three most concentrated dilution series samples of both donors had very similar 4-mer distributions and thus similar gene mapping frequencies, whereas the more diluted samples showed a pronounced drop in the uniformity of their 4-mer distribution, with an associated drop in gene mapping efficiency. Applying this analytical approach to a set of 52 metagenomes from 28 human gut microbiota (some gut microbiota were analyzed up to three times with different initial sample sizes) showed that our observation was generally applicable, and that 4-mer analysis predicted gene mapping efficiencies below approximately 20% (r = 0.34,
<italic>P</italic>
= 0.0141 - Figure 
<xref rid="Fig4" ref-type="fig">4</xref>
B). Of note, the mapping rate was based on unfiltered raw sequences and was therefore lower than previously reported [
<xref ref-type="bibr" rid="CR32">32</xref>
]. We observed that low mapping efficiency was strongly associated with limiting sample material (less than 10
<sup>10</sup>
bacteria per sample – Figure 
<xref rid="Fig4" ref-type="fig">4</xref>
B). Low (<10
<sup>10</sup>
bacteria) and high (>10
<sup>10</sup>
bacteria) quantity samples differed significantly with regard to the quantity of DNA available for the ligation step of metagenomic library construction (P = 0.0004; median values and 25%-75% ranges are 1.0 μg [1.0;1.0] and 0.7 μg [0.6;1.0], respectively). These quantities were consistent with those observed for the dilution series samples (cf. Table 
<xref rid="Tab1" ref-type="table">1</xref>
). Above a mapping efficiency of 20%, the normalized Shannon entropy reaches a plateau despite variation in mapping efficiency. This is likely a consequence of the relatively large inherent variation in gene distributions between individuals, which is covered to varying degrees by the known but still incomplete reference gene catalogue [
<xref ref-type="bibr" rid="CR4">4</xref>
]. The continuing increase in gene coverage provided by reference catalogues should eventually remove this variation in gene mapping between samples.
<fig id="Fig4">
<label>Figure 4</label>
<caption>
<p>
<bold>4-mer distribution of microbiota metagenomes correlates with gene mapping efficiency to a reference gene catalogue. A</bold>
, Line graphs depict the frequency of gene mapping to a reference gene catalogue as a function of the normalized Shannon entropy of 4-mer distributions for undiluted and 10-, 100-, 1,000- and 10,000-fold diluted gut microbiota metagenomes from donor #1 (blue) and #2 (red).
<bold>B</bold>
, Scatter plot illustrates the association between normalized Shannon Entropy of 4-mer distributions and the frequency of gene mapping to a reference gene catalogue for 52 gut microbiota metagenomic profiles stratified according to small (red dots, <10
<sup>10</sup>
bacteria) and large (black dots, >10
<sup>10</sup>
bacteria) sample size. Spearman rank correlation statistics are indicated.</p>
</caption>
<graphic xlink:href="12864_2015_1406_Fig4_HTML" id="MO4"></graphic>
</fig>
</p>
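<p>The association between the normalized Shannon entropy and the fraction of mapped reads is summarized by a Spearman rank correlation (Figure 4B). The sketch below shows how such a coefficient can be computed from paired per-sample values. It is an illustration only: the function names <italic>average_ranks</italic> and <italic>spearman_rho</italic> are hypothetical, no P value is computed, and nothing here reproduces the authors' statistical software.</p>
<preformat>
```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        # Extend the block while the next value is tied with the current one.
        while end + 1 != len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for j in range(start, end + 1):
            ranks[order[j]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) in (0, 1):
        raise ValueError("need two equally long sequences with at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```
</preformat>
<p>Here x would hold the normalized Shannon entropy of each metagenome and y the corresponding fraction of reads mapped to the reference gene catalogue. Because only ranks are used, the coefficient captures a monotonic association and is not distorted by the plateau in entropy described above.</p>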
</sec>
</sec>
<sec id="Sec17" sec-type="conclusions">
<title>Conclusion</title>
<p>The metagenomic protocol employed in the present study enabled analysis of samples containing more than 10
<sup>8</sup>
bacteria (1,000-fold dilution). This lower limit is compatible with most microbiota derived from live habitats, whereas the analysis of, e.g., dental plaque from skeletons [
<xref ref-type="bibr" rid="CR8">8</xref>
] or other low-density microbiota habitats may be inherently biased in gene and/or species distribution because of limiting sample size. Our study suggests that such studies should validate the metagenomic protocol employed (e.g. by analyzing a serial dilution of a known quantity of commensals), as described here. Of note, the present study monitors the gene distribution of microbiota. It is likely that lowering the resolution from the gene level to a given phylogenetic level would even out much of the variance observed at the gene distribution level in low quality metagenomic datasets.</p>
<p>Our study demonstrates that a k-mer distribution analysis of metagenomic raw sequence reads identifies metagenomes of low quality and predicts low gene mapping efficiency. Low quality metagenomes were defined as metagenomes whose gene distribution differed considerably from that of a reference sample. In the present study, this was modelled by comparing concentrated and dilute preparations of two stool samples, so that metagenome quality was lowered by a substantial reduction in sample size. It remains to be validated whether the approach also applies to metagenomes affected by, e.g., technical biases or contamination.</p>
<p>We propose that k-mer analysis of raw metagenome sequence reads be implemented as a first quality assessment of raw NGS data, prior to filtering and gene mapping analysis. It would allow an informed decision as to whether 1) the metagenomic dataset obtained should be analyzed further (filtering, gene mapping etc.), 2) more sequence reads should be acquired to surpass a predetermined threshold of mapped reads, or 3) the sample should be discarded or reprocessed to improve metagenomic quality. With the rising demand for metagenomic analysis of microbiota, it is crucial to provide tools for rapid and efficient decision making. This should eventually lead to a faster turn-around time, higher quality analyses with measurable quality metrics, and a significant cost reduction. Finally, increased quality would have a major impact on the robustness of biological and clinical conclusions drawn from metagenomic studies.</p>
</sec>
</body>
<back>
<app-group>
<app id="App1">
<sec id="Sec18">
<title>Additional file</title>
<p>
<media position="anchor" xlink:href="12864_2015_1406_MOESM1_ESM.pdf" id="MOESM1">
<label>Additional file 1: Table S1.</label>
<caption>
<p>All bacterial genomes can be obtained from NCBI (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/taxonomy">http://www.ncbi.nlm.nih.gov/taxonomy</ext-link>
).
<bold>Figure S1.</bold>
4-mer distribution analysis of 26 bacterial genomes and 52 metagenomic sequences of gut microbiota from low and high bacterial content samples.
<bold>Figure S2.</bold>
4-mer distribution analysis of barcode-cassette filtered metagenomic sequences of serially diluted gut microbiota.
<bold>Figure S3.</bold>
4-mer distribution analysis of gene mapped metagenomic sequences of serially diluted gut microbiota.</p>
</caption>
</media>
</p>
</sec>
</app>
</app-group>
<fn-group>
<fn>
<p>
<bold>Competing interests</bold>
</p>
<p>The authors declare that they have no competing interests.</p>
</fn>
<fn>
<p>
<bold>Authors’ contributions</bold>
</p>
<p>Conceived and designed the experiments: JMB and ML. Performed the experiments: CJ, JF, CF, DG, SK, FL and ML. Performed metagenomic analysis and gene mapping: NP, SK and ML. Conceived and designed k-mer analysis: FPO, JMB and ML. Performed k-mer analysis: FPO, JMB and ML. Wrote the manuscript: ML. Critical revision of the manuscript: FPO, JMB, CJ, JD, DE and GG. All authors read and approved the final manuscript.</p>
</fn>
</fn-group>
<ack>
<title>Acknowledgement</title>
<p>The authors acknowledge the funding agencies and the volunteers who provided samples for the study. The study was funded by INSERM, the University Pierre et Marie Curie “EMERGENCE” program, Fondation pour l’Aide à la Recherche sur la Sclérose en Plaques (ARSEP), ARTHRITIS Fondation COURTIN and Agence Nationale de la Recherche (ANR). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</p>
</ack>
<ref-list id="Bib1">
<title>References</title>
<ref id="CR1">
<label>1.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arumugam</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Raes</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Pelletier</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Le Paslier</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Yamada</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Mende</surname>
<given-names>DR</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Enterotypes of the human gut microbiome</article-title>
<source>Nature</source>
<year>2011</year>
<volume>473</volume>
<issue>7346</issue>
<fpage>174</fpage>
<lpage>80</lpage>
<pub-id pub-id-type="doi">10.1038/nature09944</pub-id>
<pub-id pub-id-type="pmid">21508958</pub-id>
</element-citation>
</ref>
<ref id="CR2">
<label>2.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cotillard</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kennedy</surname>
<given-names>SP</given-names>
</name>
<name>
<surname>Kong</surname>
<given-names>LC</given-names>
</name>
<name>
<surname>Prifti</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Pons</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Le Chatelier</surname>
<given-names>E</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Dietary intervention impact on gut microbial gene richness</article-title>
<source>Nature</source>
<year>2013</year>
<volume>500</volume>
<issue>7464</issue>
<fpage>585</fpage>
<lpage>8</lpage>
<pub-id pub-id-type="doi">10.1038/nature12480</pub-id>
<pub-id pub-id-type="pmid">23985875</pub-id>
</element-citation>
</ref>
<ref id="CR3">
<label>3.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Le Chatelier</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Nielsen</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Qin</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Prifti</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Hildebrand</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Falony</surname>
<given-names>G</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Richness of human gut microbiome correlates with metabolic markers</article-title>
<source>Nature</source>
<year>2013</year>
<volume>500</volume>
<issue>7464</issue>
<fpage>541</fpage>
<lpage>6</lpage>
<pub-id pub-id-type="doi">10.1038/nature12506</pub-id>
<pub-id pub-id-type="pmid">23985870</pub-id>
</element-citation>
</ref>
<ref id="CR4">
<label>4.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qin</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Raes</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Arumugam</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Burgdorf</surname>
<given-names>KS</given-names>
</name>
<name>
<surname>Manichanh</surname>
<given-names>C</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A human gut microbial gene catalogue established by metagenomic sequencing</article-title>
<source>Nature</source>
<year>2010</year>
<volume>464</volume>
<issue>7285</issue>
<fpage>59</fpage>
<lpage>65</lpage>
<pub-id pub-id-type="doi">10.1038/nature08821</pub-id>
<pub-id pub-id-type="pmid">20203603</pub-id>
</element-citation>
</ref>
<ref id="CR5">
<label>5.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yatsunenko</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Rey</surname>
<given-names>FE</given-names>
</name>
<name>
<surname>Manary</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Trehan</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Dominguez-Bello</surname>
<given-names>MG</given-names>
</name>
<name>
<surname>Contreras</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Human gut microbiome viewed across age and geography</article-title>
<source>Nature</source>
<year>2012</year>
<volume>486</volume>
<issue>7402</issue>
<fpage>222</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="pmid">22699611</pub-id>
</element-citation>
</ref>
<ref id="CR6">
<label>6.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kamada</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Seo</surname>
<given-names>SU</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>GY</given-names>
</name>
<name>
<surname>Nunez</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Role of the gut microbiota in immunity and inflammatory disease</article-title>
<source>Nat Rev Immunol</source>
<year>2013</year>
<volume>13</volume>
<issue>5</issue>
<fpage>321</fpage>
<lpage>35</lpage>
<pub-id pub-id-type="pmid">23618829</pub-id>
</element-citation>
</ref>
<ref id="CR7">
<label>7.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Schloss</surname>
<given-names>PD</given-names>
</name>
</person-group>
<article-title>Dynamics and associations of microbial community types across the human body</article-title>
<source>Nature</source>
<year>2014</year>
<volume>509</volume>
<issue>7500</issue>
<fpage>357</fpage>
<lpage>60</lpage>
<pub-id pub-id-type="doi">10.1038/nature13178</pub-id>
<pub-id pub-id-type="pmid">24739969</pub-id>
</element-citation>
</ref>
<ref id="CR8">
<label>8.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Adler</surname>
<given-names>CJ</given-names>
</name>
<name>
<surname>Dobney</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Weyrich</surname>
<given-names>LS</given-names>
</name>
<name>
<surname>Kaidonis</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Walker</surname>
<given-names>AW</given-names>
</name>
<name>
<surname>Haak</surname>
<given-names>W</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Sequencing ancient calcified dental plaque shows changes in oral microbiota with dietary shifts of the Neolithic and Industrial revolutions</article-title>
<source>Nat Genet</source>
<year>2013</year>
<volume>45</volume>
<issue>4</issue>
<fpage>450</fpage>
<lpage>455</lpage>
<pub-id pub-id-type="doi">10.1038/ng.2536</pub-id>
<pub-id pub-id-type="pmid">23416520</pub-id>
</element-citation>
</ref>
<ref id="CR9">
<label>9.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Biesbroek</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Sanders</surname>
<given-names>EA</given-names>
</name>
<name>
<surname>Roeselers</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Caspers</surname>
<given-names>MP</given-names>
</name>
<name>
<surname>Trzcinski</surname>
<given-names>K</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Deep sequencing analyses of low density microbial communities: working at the boundary of accurate microbiota detection</article-title>
<source>PLoS One</source>
<year>2012</year>
<volume>7</volume>
<issue>3</issue>
<fpage>e32942</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0032942</pub-id>
<pub-id pub-id-type="pmid">22412957</pub-id>
</element-citation>
</ref>
<ref id="CR10">
<label>10.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schroder</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bailey</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Conway</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Zobel</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Reference-free validation of short read data</article-title>
<source>PLoS One</source>
<year>2010</year>
<volume>5</volume>
<issue>9</issue>
<fpage>e12681</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0012681</pub-id>
<pub-id pub-id-type="pmid">20877643</pub-id>
</element-citation>
</ref>
<ref id="CR11">
<label>11.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>XV</given-names>
</name>
<name>
<surname>Blades</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sultana</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Parmigiani</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Estimation of sequencing error rates in short reads</article-title>
<source>BMC Bioinformatics</source>
<year>2012</year>
<volume>13</volume>
<fpage>185</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-13-185</pub-id>
<pub-id pub-id-type="pmid">22846331</pub-id>
</element-citation>
</ref>
<ref id="CR12">
<label>12.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Keegan</surname>
<given-names>KP</given-names>
</name>
<name>
<surname>Trimble</surname>
<given-names>WL</given-names>
</name>
<name>
<surname>Wilkening</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wilke</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Harrison</surname>
<given-names>T</given-names>
</name>
<name>
<surname>D’Souza</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A platform-independent method for detecting errors in metagenomic sequencing data: DRISEE</article-title>
<source>PLoS Comput Biol</source>
<year>2012</year>
<volume>8</volume>
<issue>6</issue>
<fpage>e1002541</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.1002541</pub-id>
<pub-id pub-id-type="pmid">22685393</pub-id>
</element-citation>
</ref>
<ref id="CR13">
<label>13.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Leggett</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Ramirez-Gonzalez</surname>
<given-names>RH</given-names>
</name>
<name>
<surname>Clavijo</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Waite</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Davey</surname>
<given-names>RP</given-names>
</name>
</person-group>
<article-title>Sequencing quality assessment tools to enable data-driven informatics for high throughput genomics</article-title>
<source>Front Genet</source>
<year>2013</year>
<volume>4</volume>
<fpage>288</fpage>
<pub-id pub-id-type="doi">10.3389/fgene.2013.00288</pub-id>
<pub-id pub-id-type="pmid">24381581</pub-id>
</element-citation>
</ref>
<ref id="CR14">
<label>14.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Simpson</surname>
<given-names>JT</given-names>
</name>
</person-group>
<article-title>Exploring genome characteristics and sequence quality without a reference</article-title>
<source>Bioinformatics</source>
<year>2014</year>
<volume>30</volume>
<issue>9</issue>
<fpage>1228</fpage>
<lpage>35</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btu023</pub-id>
<pub-id pub-id-type="pmid">24443382</pub-id>
</element-citation>
</ref>
<ref id="CR15">
<label>15.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Koonin</surname>
<given-names>EV</given-names>
</name>
</person-group>
<article-title>Evolution of genome architecture</article-title>
<source>Int J Biochem Cell Biol</source>
<year>2009</year>
<volume>41</volume>
<issue>2</issue>
<fpage>298</fpage>
<lpage>306</lpage>
<pub-id pub-id-type="doi">10.1016/j.biocel.2008.09.015</pub-id>
<pub-id pub-id-type="pmid">18929678</pub-id>
</element-citation>
</ref>
<ref id="CR16">
<label>16.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McCutcheon</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Moran</surname>
<given-names>NA</given-names>
</name>
</person-group>
<article-title>Extreme genome reduction in symbiotic bacteria</article-title>
<source>Nat Rev Microbiol</source>
<year>2011</year>
<volume>10</volume>
<issue>1</issue>
<fpage>13</fpage>
<lpage>26</lpage>
<pub-id pub-id-type="pmid">22064560</pub-id>
</element-citation>
</ref>
<ref id="CR17">
<label>17.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Turnbaugh</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Hamady</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Yatsunenko</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Cantarel</surname>
<given-names>BL</given-names>
</name>
<name>
<surname>Duncan</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ley</surname>
<given-names>RE</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A core gut microbiome in obese and lean twins</article-title>
<source>Nature</source>
<year>2009</year>
<volume>457</volume>
<issue>7228</issue>
<fpage>480</fpage>
<lpage>4</lpage>
<pub-id pub-id-type="doi">10.1038/nature07540</pub-id>
<pub-id pub-id-type="pmid">19043404</pub-id>
</element-citation>
</ref>
<ref id="CR18">
<label>18.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Edwards</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Olson</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Disz</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Pusch</surname>
<given-names>GD</given-names>
</name>
<name>
<surname>Vonstein</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Stevens</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Real time metagenomics: using k-mers to annotate metagenomes</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<issue>24</issue>
<fpage>3316</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bts599</pub-id>
<pub-id pub-id-type="pmid">23047562</pub-id>
</element-citation>
</ref>
<ref id="CR19">
<label>19.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Edgar</surname>
<given-names>RC</given-names>
</name>
</person-group>
<article-title>MUSCLE: multiple sequence alignment with high accuracy and high throughput</article-title>
<source>Nucleic Acids Res</source>
<year>2004</year>
<volume>32</volume>
<issue>5</issue>
<fpage>1792</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkh340</pub-id>
<pub-id pub-id-type="pmid">15034147</pub-id>
</element-citation>
</ref>
<ref id="CR20">
<label>20.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Williams</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Trimble</surname>
<given-names>WL</given-names>
</name>
<name>
<surname>Shilts</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Meyer</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Ochman</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Rapid quantification of sequence repeats to resolve the size, structure and contents of bacterial genomes</article-title>
<source>BMC Genomics</source>
<year>2013</year>
<volume>14</volume>
<fpage>537</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-14-537</pub-id>
<pub-id pub-id-type="pmid">23924250</pub-id>
</element-citation>
</ref>
<ref id="CR21">
<label>21.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gao</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Qi</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Whole genome molecular phylogeny of large dsDNA viruses using composition vector method</article-title>
<source>BMC Evol Biol</source>
<year>2007</year>
<volume>7</volume>
<fpage>41</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2148-7-41</pub-id>
<pub-id pub-id-type="pmid">17359548</pub-id>
</element-citation>
</ref>
<ref id="CR22">
<label>22.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shannon</surname>
<given-names>CE</given-names>
</name>
</person-group>
<article-title>A mathematical theory of communication</article-title>
<source>Bell System Technical Journal</source>
<year>1948</year>
<volume>27</volume>
<issue>4</issue>
<fpage>623</fpage>
<lpage>656</lpage>
<pub-id pub-id-type="doi">10.1002/j.1538-7305.1948.tb00917.x</pub-id>
</element-citation>
</ref>
<ref id="CR23">
<label>23.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Juste</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Kreil</surname>
<given-names>DP</given-names>
</name>
<name>
<surname>Beauvallet</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Guillot</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Vaca</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Carapito</surname>
<given-names>C</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Bacterial protein signals are associated with Crohn’s disease</article-title>
<source>Gut</source>
<year>2014</year>
<volume>63</volume>
<issue>10</issue>
<fpage>1566</fpage>
<lpage>77</lpage>
<pub-id pub-id-type="doi">10.1136/gutjnl-2012-303786</pub-id>
<pub-id pub-id-type="pmid">24436141</pub-id>
</element-citation>
</ref>
<ref id="CR24">
<label>24.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Godon</surname>
<given-names>JJ</given-names>
</name>
<name>
<surname>Zumstein</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Dabert</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Habouzit</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Moletta</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Molecular microbial diversity of an anaerobic digestor as determined by small-subunit rDNA sequence analysis</article-title>
<source>Appl Environ Microbiol</source>
<year>1997</year>
<volume>63</volume>
<issue>7</issue>
<fpage>2802</fpage>
<lpage>13</lpage>
<pub-id pub-id-type="pmid">9212428</pub-id>
</element-citation>
</ref>
<ref id="CR25">
<label>25.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mardis</surname>
<given-names>ER</given-names>
</name>
</person-group>
<article-title>The impact of next-generation sequencing technology on genetics</article-title>
<source>Trends Genet</source>
<year>2008</year>
<volume>24</volume>
<issue>3</issue>
<fpage>133</fpage>
<lpage>41</lpage>
<pub-id pub-id-type="doi">10.1016/j.tig.2007.12.007</pub-id>
<pub-id pub-id-type="pmid">18262675</pub-id>
</element-citation>
</ref>
<ref id="CR26">
<label>26.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Langmead</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Trapnell</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Pop</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>Ultrafast and memory-efficient alignment of short DNA sequences to the human genome</article-title>
<source>Genome Biol</source>
<year>2009</year>
<volume>10</volume>
<issue>3</issue>
<fpage>R25</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2009-10-3-r25</pub-id>
<pub-id pub-id-type="pmid">19261174</pub-id>
</element-citation>
</ref>
<ref id="CR27">
<label>27.</label>
<mixed-citation publication-type="other">Pons N, Batto JM, Kennedy S, Almeida M, Boumezbeur F, Moumen B, et al. METEOR, a platform for quantitative metagenomic profiling of complex ecosystems. In: Journées Ouvertes en Biologie, Informatique et Mathématiques. 2010.
<ext-link ext-link-type="uri" xlink:href="http://www.jobim2010.fr/sites/default/files/presentations/27Pons.pdf">http://www.jobim2010.fr/sites/default/files/presentations/27Pons.pdf</ext-link>
</mixed-citation>
</ref>
<ref id="CR28">
<label>28.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dillies</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Rau</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Aubert</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hennequet-Antier</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Jeanmougin</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Servant</surname>
<given-names>N</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis</article-title>
<source>Brief Bioinform</source>
<year>2013</year>
<volume>14</volume>
<issue>6</issue>
<fpage>671</fpage>
<lpage>83</lpage>
<pub-id pub-id-type="doi">10.1093/bib/bbs046</pub-id>
<pub-id pub-id-type="pmid">22988256</pub-id>
</element-citation>
</ref>
<ref id="CR29">
<label>29.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ward</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Hierarchical grouping to optimize an objective function</article-title>
<source>J Am Stat Assoc</source>
<year>1963</year>
<volume>58</volume>
<fpage>236</fpage>
<lpage>44</lpage>
<pub-id pub-id-type="doi">10.1080/01621459.1963.10500845</pub-id>
</element-citation>
</ref>
<ref id="CR30">
<label>30.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Leung</surname>
<given-names>HC</given-names>
</name>
<name>
<surname>Yiu</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Chin</surname>
<given-names>FY</given-names>
</name>
</person-group>
<article-title>Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers</article-title>
<source>BMC Bioinformatics</source>
<year>2010</year>
<volume>11</volume>
<issue>2</issue>
<fpage>S5</fpage>
</element-citation>
</ref>
<ref id="CR31">
<label>31.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Glenn</surname>
<given-names>TC</given-names>
</name>
</person-group>
<article-title>Field guide to next-generation DNA sequencers</article-title>
<source>Mol Ecol Resour</source>
<year>2011</year>
<volume>11</volume>
<issue>5</issue>
<fpage>759</fpage>
<lpage>69</lpage>
<pub-id pub-id-type="doi">10.1111/j.1755-0998.2011.03024.x</pub-id>
<pub-id pub-id-type="pmid">21592312</pub-id>
</element-citation>
</ref>
<ref id="CR32">
<label>32.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Jia</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Cai</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Zhong</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Sunagawa</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>An integrated catalog of reference genes in the human gut microbiome</article-title>
<source>Nat Biotechnol</source>
<year>2014</year>
<volume>32</volume>
<issue>8</issue>
<fpage>834</fpage>
<lpage>41</lpage>
<pub-id pub-id-type="doi">10.1038/nbt.2942</pub-id>
<pub-id pub-id-type="pmid">24997786</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 0002930 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 0002930 | SxmlIndent | more
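
Once the record has been extracted with one of the commands above, its reference list can be post-processed with ordinary XML tooling. The following is only a minimal sketch in Python, not part of Dilib: it assumes the extracted text is well-formed XML (the page header warns that this may not hold for this record), and the function name `summarize_refs` and the inline `SAMPLE` string (copied from reference CR25) are illustrative choices rather than anything the site provides.

```python
import xml.etree.ElementTree as ET

# Sample copied from reference CR25 of this record. In practice the text
# would come from the saved output of the HfdSelect pipeline (an assumption).
SAMPLE = """<ref-list>
<ref id="CR25">
<label>25.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mardis</surname>
<given-names>ER</given-names>
</name>
</person-group>
<article-title>The impact of next-generation sequencing technology on genetics</article-title>
<source>Trends Genet</source>
<year>2008</year>
<volume>24</volume>
<issue>3</issue>
<fpage>133</fpage>
<lpage>41</lpage>
<pub-id pub-id-type="doi">10.1016/j.tig.2007.12.007</pub-id>
<pub-id pub-id-type="pmid">18262675</pub-id>
</element-citation>
</ref>
</ref-list>"""


def summarize_refs(xml_text):
    """Return one dict per <ref> that holds an <element-citation>."""
    root = ET.fromstring(xml_text)
    rows = []
    for ref in root.iter("ref"):
        cit = ref.find("element-citation")
        if cit is None:  # entries using <mixed-citation> are skipped
            continue
        # Join surname and initials, e.g. "Mardis ER"
        authors = [
            f"{n.findtext('surname', '')} {n.findtext('given-names', '')}".strip()
            for n in cit.iter("name")
        ]
        # Map identifier type ("doi", "pmid") to its value
        ids = {p.get("pub-id-type"): (p.text or "") for p in cit.iter("pub-id")}
        rows.append({
            "id": ref.get("id"),
            "authors": authors,
            "title": cit.findtext("article-title", ""),
            "source": cit.findtext("source", ""),
            "year": cit.findtext("year", ""),
            "doi": ids.get("doi"),
            "pmid": ids.get("pmid"),
        })
    return rows


if __name__ == "__main__":
    for row in summarize_refs(SAMPLE):
        print(row["id"], row["year"], row["source"], row["pmid"])
```

Entries stored as `<mixed-citation>` (such as CR27) are skipped by this sketch; they also carry an `xlink:href` attribute, so parsing them would additionally require the `xlink` namespace to be declared in the input.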

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021