MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Estimating the total genome length of a metagenomic sample using k-mers.

Internal identifier: 000577 (PubMed/Corpus); previous: 000576; next: 000578

Authors: Kui Hua; Xuegong Zhang

Source: BMC Genomics, 2019

RBID : pubmed:30967110

English descriptors

Abstract

Metagenomic sequencing is a powerful technology for studying the mixture of microbes or the microbiomes on human and in the environment. One basic task of analyzing metagenomic data is to identify the component genomes in the community. This task is challenging due to the complexity of microbiome composition, limited availability of known reference genomes, and usually insufficient sequencing coverage.

DOI: 10.1186/s12864-019-5467-x
PubMed: 30967110
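
According to the RESULTS part of the abstract in the XML record below, the total genome length is approached through the number of distinct k-mers and the sequencing coverage distribution of the observed k-mers. The following is only a minimal sketch of those two raw quantities, not the authors' estimator: the function names and the toy reads are hypothetical, sequencing errors and reverse complements are ignored, and the k-mer redundancy index (KRI) from the abstract is not reproduced.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def coverage_histogram(kmer_counts):
    """Map each multiplicity to the number of distinct k-mers observed that many times."""
    return Counter(kmer_counts.values())


# Toy reads, invented purely for illustration (not data from the paper).
reads = ["ACGTAC", "CGTACG", "TTTTAC"]

kmer_counts = count_kmers(reads, k=3)
distinct_observed = len(kmer_counts)        # distinct k-mers actually seen
histogram = coverage_histogram(kmer_counts)  # multiplicity -> number of distinct k-mers
```

With real data, the number of observed distinct k-mers generally undercounts the k-mers present in the community when coverage is insufficient, which is the gap the abstract says the proposed estimation addresses.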

Links to Exploration step

pubmed:30967110

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Estimating the total genome length of a metagenomic sample using k-mers.</title>
<author>
<name sortKey="Hua, Kui" sort="Hua, Kui" uniqKey="Hua K" first="Kui" last="Hua">Kui Hua</name>
<affiliation>
<nlm:affiliation>MOE Key Laboratory of Bioinformatics Division and Center for Synthetic &amp; System Biology, BNRIST, Beijing, 100084, China.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Xuegong" sort="Zhang, Xuegong" uniqKey="Zhang X" first="Xuegong" last="Zhang">Xuegong Zhang</name>
<affiliation>
<nlm:affiliation>MOE Key Laboratory of Bioinformatics Division and Center for Synthetic &amp; System Biology, BNRIST, Beijing, 100084, China. zhangxg@tsinghua.edu.cn.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2019">2019</date>
<idno type="RBID">pubmed:30967110</idno>
<idno type="pmid">30967110</idno>
<idno type="doi">10.1186/s12864-019-5467-x</idno>
<idno type="wicri:Area/PubMed/Corpus">000577</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000577</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Estimating the total genome length of a metagenomic sample using k-mers.</title>
<author>
<name sortKey="Hua, Kui" sort="Hua, Kui" uniqKey="Hua K" first="Kui" last="Hua">Kui Hua</name>
<affiliation>
<nlm:affiliation>MOE Key Laboratory of Bioinformatics Division and Center for Synthetic &amp; System Biology, BNRIST, Beijing, 100084, China.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Xuegong" sort="Zhang, Xuegong" uniqKey="Zhang X" first="Xuegong" last="Zhang">Xuegong Zhang</name>
<affiliation>
<nlm:affiliation>MOE Key Laboratory of Bioinformatics Division and Center for Synthetic &amp; System Biology, BNRIST, Beijing, 100084, China. zhangxg@tsinghua.edu.cn.</nlm:affiliation>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC genomics</title>
<idno type="eISSN">1471-2164</idno>
<imprint>
<date when="2019" type="published">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Datasets as Topic</term>
<term>High-Throughput Nucleotide Sequencing</term>
<term>Humans</term>
<term>Metagenome</term>
<term>Metagenomics (methods)</term>
<term>Microbiota (genetics)</term>
<term>Sequence Analysis, DNA</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Microbiota</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Metagenomics</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Datasets as Topic</term>
<term>High-Throughput Nucleotide Sequencing</term>
<term>Humans</term>
<term>Metagenome</term>
<term>Sequence Analysis, DNA</term>
<term>Software</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Metagenomic sequencing is a powerful technology for studying the mixture of microbes or the microbiomes on human and in the environment. One basic task of analyzing metagenomic data is to identify the component genomes in the community. This task is challenging due to the complexity of microbiome composition, limited availability of known reference genomes, and usually insufficient sequencing coverage.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">30967110</PMID>
<DateCompleted>
<Year>2019</Year>
<Month>08</Month>
<Day>28</Day>
</DateCompleted>
<DateRevised>
<Year>2020</Year>
<Month>02</Month>
<Day>25</Day>
</DateRevised>
<Article PubModel="Electronic">
<Journal>
<ISSN IssnType="Electronic">1471-2164</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>20</Volume>
<Issue>Suppl 2</Issue>
<PubDate>
<Year>2019</Year>
<Month>Apr</Month>
<Day>04</Day>
</PubDate>
</JournalIssue>
<Title>BMC genomics</Title>
<ISOAbbreviation>BMC Genomics</ISOAbbreviation>
</Journal>
<ArticleTitle>Estimating the total genome length of a metagenomic sample using k-mers.</ArticleTitle>
<Pagination>
<MedlinePgn>183</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1186/s12864-019-5467-x</ELocationID>
<Abstract>
<AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">Metagenomic sequencing is a powerful technology for studying the mixture of microbes or the microbiomes on human and in the environment. One basic task of analyzing metagenomic data is to identify the component genomes in the community. This task is challenging due to the complexity of microbiome composition, limited availability of known reference genomes, and usually insufficient sequencing coverage.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">As an initial step toward understanding the complete composition of a metagenomic sample, we studied the problem of estimating the total length of all distinct component genomes in a metagenomic sample. We showed that this problem can be solved by estimating the total number of distinct k-mers in all the metagenomic sequencing data. We proposed a method for this estimation based on the sequencing coverage distribution of observed k-mers, and introduced a k-mer redundancy index (KRI) to fill in the gap between the count of distinct k-mers and the total genome length. We showed the effectiveness of the proposed method on a set of carefully designed simulation data corresponding to multiple situations of true metagenomic data. Results on real data indicate that the uncaptured genomic information can vary dramatically across metagenomic samples, with the potential to mislead downstream analyses.</AbstractText>
<AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">We proposed the question of how long the total genome length of all different species in a microbial community is and introduced a method to answer it.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Hua</LastName>
<ForeName>Kui</ForeName>
<Initials>K</Initials>
<AffiliationInfo>
<Affiliation>MOE Key Laboratory of Bioinformatics Division and Center for Synthetic &amp; System Biology, BNRIST, Beijing, 100084, China.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>Department of Automation, Tsinghua University, Beijing, 100084, China.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Zhang</LastName>
<ForeName>Xuegong</ForeName>
<Initials>X</Initials>
<AffiliationInfo>
<Affiliation>MOE Key Laboratory of Bioinformatics Division and Center for Synthetic &amp; System Biology, BNRIST, Beijing, 100084, China. zhangxg@tsinghua.edu.cn.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>Department of Automation, Tsinghua University, Beijing, 100084, China. zhangxg@tsinghua.edu.cn.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>School of Life Sciences, Tsinghua University, Beijing, 100084, China. zhangxg@tsinghua.edu.cn.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2019</Year>
<Month>04</Month>
<Day>04</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>BMC Genomics</MedlineTA>
<NlmUniqueID>100965258</NlmUniqueID>
<ISSNLinking>1471-2164</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="N">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D066264" MajorTopicYN="N">Datasets as Topic</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D059014" MajorTopicYN="N">High-Throughput Nucleotide Sequencing</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D054892" MajorTopicYN="Y">Metagenome</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D056186" MajorTopicYN="N">Metagenomics</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D064307" MajorTopicYN="N">Microbiota</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D012984" MajorTopicYN="Y">Software</DescriptorName>
</MeshHeading>
</MeshHeadingList>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">Distinct k-mers</Keyword>
<Keyword MajorTopicYN="N">Genome length</Keyword>
<Keyword MajorTopicYN="N">Metagenomics</Keyword>
<Keyword MajorTopicYN="N">Sequencing coverage</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="entrez">
<Year>2019</Year>
<Month>4</Month>
<Day>11</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2019</Year>
<Month>4</Month>
<Day>11</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2019</Year>
<Month>8</Month>
<Day>29</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">30967110</ArticleId>
<ArticleId IdType="doi">10.1186/s12864-019-5467-x</ArticleId>
<ArticleId IdType="pii">10.1186/s12864-019-5467-x</ArticleId>
<ArticleId IdType="pmc">PMC6456951</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>Nucleic Acids Res. 2007 Jan;35(Database issue):D61-5</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17130148</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Methods. 2007 Jun;4(6):495-500</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17468765</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2007 Oct 18;449(7164):804-10</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17943116</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2010 Feb 1;26(3):295-301</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20008478</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2011 Mar 15;27(6):764-70</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21217122</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Science. 2012 Jun 8;336(6086):1251-3</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22674326</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Methods. 2012 Jun 10;9(8):811-4</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22688413</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Math Biol. 2013 Nov;67(5):1141-61</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22965653</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2012 Oct 4;490(7418):55-60</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23023125</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Methods. 2013 Apr;10(4):325-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23435259</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Environ Microbiol Rep. 2012 Jun;4(3):335-41</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23760797</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2014 Mar 1;30(5):629-35</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24123672</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>ISME J. 2014 Nov;8(11):2349-51</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24824669</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2014 Nov 15;30(22):3159-65</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25107873</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2014 Oct 2;514(7520):59-64</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25279917</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2015 Jan 16;16:10</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25592313</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2015 May 26;43(10):e69</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25765641</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Methods. 2015 Oct;12(10):902-3</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26418763</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Science. 2016 Apr 29;352(6285):560-4</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27126039</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Science. 2016 Apr 29;352(6285):565-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27126040</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Cell Syst. 2018 Aug 22;7(2):192-200.e3</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">30056005</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genomics. 1988 Apr;2(3):231-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">3294162</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000577 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 000577 | SxmlIndent | more
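
Outside Dilib, the `<pubmed>` part of the record can also be read with a generic XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on an abridged excerpt copied from the record above. It is an illustration only and not part of the Dilib toolchain. It is deliberately limited to the `<pubmed>` subtree, on the assumption that the `nlm:` and `wicri:` prefixes of the TEI part are not declared in the listing as shown, which a namespace-aware parser would reject.

```python
import xml.etree.ElementTree as ET

# Abridged excerpt of the <pubmed> element from the record above.
PUBMED_XML = """
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">30967110</PMID>
<Article PubModel="Electronic">
<ArticleTitle>Estimating the total genome length of a metagenomic sample using k-mers.</ArticleTitle>
</Article>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D054892" MajorTopicYN="Y">Metagenome</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D056186" MajorTopicYN="N">Metagenomics</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
</pubmed>
"""

root = ET.fromstring(PUBMED_XML.strip())

# Simple path lookups relative to the <pubmed> root element.
pmid = root.findtext("MedlineCitation/PMID")
title = root.findtext("MedlineCitation/Article/ArticleTitle")

# MeSH descriptors flagged as major topics in this excerpt.
major_descriptors = [
    d.text for d in root.iter("DescriptorName") if d.get("MajorTopicYN") == "Y"
]
```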

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:30967110
   |texte=   Estimating the total genome length of a metagenomic sample using k-mers.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:30967110" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021