Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers.
Internal identifier: 001F54 (PubMed/Curation); previous: 001F53; next: 001F55
Authors: Bin Yang [People's Republic of China]; Yu Peng; Henry Chi-Ming Leung; Siu-Ming Yiu; Jing-Chi Chen; Francis Yuk-Lun Chin
Source:
- BMC bioinformatics [ 1471-2105 ] ; 2010.
French descriptors
- KwdFr :
- MESH :
English descriptors
- KwdEn :
- MESH :
- chemical , chemistry : DNA.
- genetics : Escherichia coli, Genome, Bacterial, Lactobacillus.
- methods : Data Mining, Metagenomics, Sequence Analysis, DNA.
- Algorithms, Cluster Analysis, Databases, Genetic, Environmental Microbiology.
Abstract
With the rapid development of genome sequencing techniques, traditional research methods based on the isolation and cultivation of microorganisms are being gradually replaced by metagenomics, which is also known as environmental genomics. The first step of metagenomics, which is still a major bottleneck, is the taxonomic characterization of DNA fragments (reads) resulting from sequencing a sample of mixed species. This step is usually referred to as "binning". Existing binning methods are based on supervised or semi-supervised approaches which rely heavily on reference genomes of known microorganisms and phylogenetic marker genes. Due to the limited availability of reference genomes and the bias and instability of marker genes, existing binning methods may not be applicable in many cases.
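The RESULTS section of the abstract in the XML record describes the method as relying on the distribution of l-mers, defined there as substrings of length l in DNA fragments. As a rough illustration of that one notion only, the sketch below computes the relative frequency of every l-mer in a single read. The function name `lmer_frequencies`, the default `l = 4`, and the handling of ambiguous bases are assumptions made for this example; the paper's error-robust selection of l-mers and its clustering step are not reproduced here.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def lmer_frequencies(read: str, l: int = 4) -> dict[str, float]:
    """Return the relative frequency of each l-mer found in one read.

    Hypothetical helper: windows containing characters other than
    A, C, G, T (for example N) are skipped rather than counted.
    """
    read = read.upper()
    counts = Counter(
        read[i:i + l]
        for i in range(len(read) - l + 1)
        if set(read[i:i + l]) <= VALID_BASES
    )
    total = sum(counts.values())
    if total == 0:
        # Read shorter than l, or every window contained an ambiguous base.
        return {}
    return {lmer: n / total for lmer, n in counts.items()}


# Example read (made up for illustration, not taken from the test datasets).
profile = lmer_frequencies("ACGTACGTAC", l=4)
```

A frequency vector of this kind, computed per read, is the sort of input an unsupervised clustering step could work on; how the published tool selects and weights l-mers is described in the paper itself.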
DOI: 10.1186/1471-2105-11-S2-S5
PubMed: 20406503
Links toward previous steps (curation, corpus...)
- to stream PubMed, to step Corpus: To go to this record in the Curation step: 001F54
Links to Exploration step
pubmed:20406503
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers.</title>
<author><name sortKey="Yang, Bin" sort="Yang, Bin" uniqKey="Yang B" first="Bin" last="Yang">Bin Yang</name>
<affiliation wicri:level="1"><nlm:affiliation>State Key Laboratory of Bioelectronics, School of Biological Science &amp; Medical Engineering, Southeast University, Nanjing, Jiangsu, 210096 PR China. byang@cs.hku.hk</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>State Key Laboratory of Bioelectronics, School of Biological Science &amp; Medical Engineering, Southeast University, Nanjing, Jiangsu</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Peng, Yu" sort="Peng, Yu" uniqKey="Peng Y" first="Yu" last="Peng">Yu Peng</name>
</author>
<author><name sortKey="Leung, Henry Chi Ming" sort="Leung, Henry Chi Ming" uniqKey="Leung H" first="Henry Chi-Ming" last="Leung">Henry Chi-Ming Leung</name>
</author>
<author><name sortKey="Yiu, Siu Ming" sort="Yiu, Siu Ming" uniqKey="Yiu S" first="Siu-Ming" last="Yiu">Siu-Ming Yiu</name>
</author>
<author><name sortKey="Chen, Jing Chi" sort="Chen, Jing Chi" uniqKey="Chen J" first="Jing-Chi" last="Chen">Jing-Chi Chen</name>
</author>
<author><name sortKey="Chin, Francis Yuk Lun" sort="Chin, Francis Yuk Lun" uniqKey="Chin F" first="Francis Yuk-Lun" last="Chin">Francis Yuk-Lun Chin</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2010">2010</date>
<idno type="RBID">pubmed:20406503</idno>
<idno type="pmid">20406503</idno>
<idno type="doi">10.1186/1471-2105-11-S2-S5</idno>
<idno type="wicri:Area/PubMed/Corpus">001F54</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001F54</idno>
<idno type="wicri:Area/PubMed/Curation">001F54</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001F54</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers.</title>
<author><name sortKey="Yang, Bin" sort="Yang, Bin" uniqKey="Yang B" first="Bin" last="Yang">Bin Yang</name>
<affiliation wicri:level="1"><nlm:affiliation>State Key Laboratory of Bioelectronics, School of Biological Science &amp; Medical Engineering, Southeast University, Nanjing, Jiangsu, 210096 PR China. byang@cs.hku.hk</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>State Key Laboratory of Bioelectronics, School of Biological Science &amp; Medical Engineering, Southeast University, Nanjing, Jiangsu</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Peng, Yu" sort="Peng, Yu" uniqKey="Peng Y" first="Yu" last="Peng">Yu Peng</name>
</author>
<author><name sortKey="Leung, Henry Chi Ming" sort="Leung, Henry Chi Ming" uniqKey="Leung H" first="Henry Chi-Ming" last="Leung">Henry Chi-Ming Leung</name>
</author>
<author><name sortKey="Yiu, Siu Ming" sort="Yiu, Siu Ming" uniqKey="Yiu S" first="Siu-Ming" last="Yiu">Siu-Ming Yiu</name>
</author>
<author><name sortKey="Chen, Jing Chi" sort="Chen, Jing Chi" uniqKey="Chen J" first="Jing-Chi" last="Chen">Jing-Chi Chen</name>
</author>
<author><name sortKey="Chin, Francis Yuk Lun" sort="Chin, Francis Yuk Lun" uniqKey="Chin F" first="Francis Yuk-Lun" last="Chin">Francis Yuk-Lun Chin</name>
</author>
</analytic>
<series><title level="j">BMC bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint><date when="2010" type="published">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Cluster Analysis</term>
<term>DNA (chemistry)</term>
<term>Data Mining (methods)</term>
<term>Databases, Genetic</term>
<term>Environmental Microbiology</term>
<term>Escherichia coli (genetics)</term>
<term>Genome, Bacterial (genetics)</term>
<term>Lactobacillus (genetics)</term>
<term>Metagenomics (methods)</term>
<term>Sequence Analysis, DNA (methods)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>ADN ()</term>
<term>Algorithmes</term>
<term>Analyse de regroupements</term>
<term>Analyse de séquence d'ADN ()</term>
<term>Bases de données génétiques</term>
<term>Escherichia coli (génétique)</term>
<term>Fouille de données ()</term>
<term>Génome bactérien (génétique)</term>
<term>Lactobacillus (génétique)</term>
<term>Microbiologie de l'environnement</term>
<term>Métagénomique ()</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="chemistry" xml:lang="en"><term>DNA</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en"><term>Escherichia coli</term>
<term>Genome, Bacterial</term>
<term>Lactobacillus</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr"><term>Escherichia coli</term>
<term>Génome bactérien</term>
<term>Lactobacillus</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Data Mining</term>
<term>Metagenomics</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Cluster Analysis</term>
<term>Databases, Genetic</term>
<term>Environmental Microbiology</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>ADN</term>
<term>Algorithmes</term>
<term>Analyse de regroupements</term>
<term>Analyse de séquence d'ADN</term>
<term>Bases de données génétiques</term>
<term>Fouille de données</term>
<term>Microbiologie de l'environnement</term>
<term>Métagénomique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">With the rapid development of genome sequencing techniques, traditional research methods based on the isolation and cultivation of microorganisms are being gradually replaced by metagenomics, which is also known as environmental genomics. The first step of metagenomics, which is still a major bottleneck, is the taxonomic characterization of DNA fragments (reads) resulting from sequencing a sample of mixed species. This step is usually referred to as "binning". Existing binning methods are based on supervised or semi-supervised approaches which rely heavily on reference genomes of known microorganisms and phylogenetic marker genes. Due to the limited availability of reference genomes and the bias and instability of marker genes, existing binning methods may not be applicable in many cases.</div>
</front>
</TEI>
<pubmed><MedlineCitation Status="MEDLINE" Owner="NLM"><PMID Version="1">20406503</PMID>
<DateCompleted><Year>2010</Year>
<Month>08</Month>
<Day>02</Day>
</DateCompleted>
<DateRevised><Year>2018</Year>
<Month>11</Month>
<Day>13</Day>
</DateRevised>
<Article PubModel="Electronic"><Journal><ISSN IssnType="Electronic">1471-2105</ISSN>
<JournalIssue CitedMedium="Internet"><Volume>11 Suppl 2</Volume>
<PubDate><Year>2010</Year>
<Month>Apr</Month>
<Day>16</Day>
</PubDate>
</JournalIssue>
<Title>BMC bioinformatics</Title>
<ISOAbbreviation>BMC Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers.</ArticleTitle>
<Pagination><MedlinePgn>S5</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1186/1471-2105-11-S2-S5</ELocationID>
<Abstract><AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">With the rapid development of genome sequencing techniques, traditional research methods based on the isolation and cultivation of microorganisms are being gradually replaced by metagenomics, which is also known as environmental genomics. The first step of metagenomics, which is still a major bottleneck, is the taxonomic characterization of DNA fragments (reads) resulting from sequencing a sample of mixed species. This step is usually referred to as "binning". Existing binning methods are based on supervised or semi-supervised approaches which rely heavily on reference genomes of known microorganisms and phylogenetic marker genes. Due to the limited availability of reference genomes and the bias and instability of marker genes, existing binning methods may not be applicable in many cases.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">In this paper, we present an unsupervised binning method based on the distribution of a carefully selected set of l-mers (substrings of length l in DNA fragments). From our experiments, we show that our method can accurately bin DNA fragments with various lengths and relative species abundance ratios without using any reference or training datasets. Another feature of our method is its error robustness. The binning accuracy decreases by less than 1% when the sequencing error rate increases from 0% to 5%. Note that the typical sequencing error rate of existing commercial sequencing platforms is less than 2%.</AbstractText>
<AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">We provide a new and effective tool to solve the metagenome binning problem without using any reference datasets or marker information of any known reference genomes (species). The source code of our software tool, the reference genomes of the species for generating the test datasets and the corresponding test datasets are available at http://i.cs.hku.hk/~alse/MetaCluster/.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Yang</LastName>
<ForeName>Bin</ForeName>
<Initials>B</Initials>
<AffiliationInfo><Affiliation>State Key Laboratory of Bioelectronics, School of Biological Science &amp; Medical Engineering, Southeast University, Nanjing, Jiangsu, 210096 PR China. byang@cs.hku.hk</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Peng</LastName>
<ForeName>Yu</ForeName>
<Initials>Y</Initials>
</Author>
<Author ValidYN="Y"><LastName>Leung</LastName>
<ForeName>Henry Chi-Ming</ForeName>
<Initials>HC</Initials>
</Author>
<Author ValidYN="Y"><LastName>Yiu</LastName>
<ForeName>Siu-Ming</ForeName>
<Initials>SM</Initials>
</Author>
<Author ValidYN="Y"><LastName>Chen</LastName>
<ForeName>Jing-Chi</ForeName>
<Initials>JC</Initials>
</Author>
<Author ValidYN="Y"><LastName>Chin</LastName>
<ForeName>Francis Yuk-Lun</ForeName>
<Initials>FY</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList><PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic"><Year>2010</Year>
<Month>04</Month>
<Day>16</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo><Country>England</Country>
<MedlineTA>BMC Bioinformatics</MedlineTA>
<NlmUniqueID>100965194</NlmUniqueID>
<ISSNLinking>1471-2105</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList><Chemical><RegistryNumber>9007-49-2</RegistryNumber>
<NameOfSubstance UI="D004247">DNA</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList><MeshHeading><DescriptorName UI="D000465" MajorTopicYN="N">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D016000" MajorTopicYN="N">Cluster Analysis</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D004247" MajorTopicYN="N">DNA</DescriptorName>
<QualifierName UI="Q000737" MajorTopicYN="Y">chemistry</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D057225" MajorTopicYN="N">Data Mining</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D030541" MajorTopicYN="N">Databases, Genetic</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D004783" MajorTopicYN="N">Environmental Microbiology</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D004926" MajorTopicYN="N">Escherichia coli</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D016680" MajorTopicYN="N">Genome, Bacterial</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D007778" MajorTopicYN="N">Lactobacillus</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D056186" MajorTopicYN="N">Metagenomics</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="entrez"><Year>2010</Year>
<Month>4</Month>
<Day>22</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed"><Year>2010</Year>
<Month>5</Month>
<Day>6</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2010</Year>
<Month>8</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList><ArticleId IdType="pubmed">20406503</ArticleId>
<ArticleId IdType="pii">1471-2105-11-S2-S5</ArticleId>
<ArticleId IdType="doi">10.1186/1471-2105-11-S2-S5</ArticleId>
<ArticleId IdType="pmc">PMC3165929</ArticleId>
</ArticleIdList>
<ReferenceList><Reference><Citation>Science. 2000 Mar 24;287(5461):2204-15</Citation>
<ArticleIdList><ArticleId IdType="pubmed">10731134</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>J Dent Hyg. 2008 Oct;82 Suppl 3:4-9</Citation>
<ArticleIdList><ArticleId IdType="pubmed">19275822</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Nature. 2004 Mar 4;428(6978):37-43</Citation>
<ArticleIdList><ArticleId IdType="pubmed">14961025</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Science. 2004 Apr 2;304(5667):66-74</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15001713</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Environ Microbiol. 2004 Sep;6(9):938-47</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15305919</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Appl Environ Microbiol. 1990 Jun;56(6):1919-25</Citation>
<ArticleIdList><ArticleId IdType="pubmed">2200342</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Nucleic Acids Res. 1992 Mar 25;20(6):1363-70</Citation>
<ArticleIdList><ArticleId IdType="pubmed">1313968</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Proc Natl Acad Sci U S A. 1994 Dec 20;91(26):12832-6</Citation>
<ArticleIdList><ArticleId IdType="pubmed">7809130</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Trends Genet. 1995 Jul;11(7):283-90</Citation>
<ArticleIdList><ArticleId IdType="pubmed">7482779</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>J Bacteriol. 1997 Jun;179(12):3899-913</Citation>
<ArticleIdList><ArticleId IdType="pubmed">9190805</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Nucleic Acids Res. 1997 Sep 1;25(17):3389-402</Citation>
<ArticleIdList><ArticleId IdType="pubmed">9254694</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>BMC Bioinformatics. 2004 Oct 26;5:163</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15507136</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Nucleic Acids Res. 2005 Jan 1;33(Database issue):D294-6</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15608200</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Science. 2005 Apr 22;308(5721):554-7</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15845853</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Nat Biotechnol. 2006 Oct;24(10):1263-9</Citation>
<ArticleIdList><ArticleId IdType="pubmed">16998472</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Appl Environ Microbiol. 2007 Jan;73(1):278-88</Citation>
<ArticleIdList><ArticleId IdType="pubmed">17071787</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Nat Methods. 2007 Jan;4(1):63-72</Citation>
<ArticleIdList><ArticleId IdType="pubmed">17179938</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Genome Res. 2007 Mar;17(3):377-86</Citation>
<ArticleIdList><ArticleId IdType="pubmed">17255551</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Nat Methods. 2007 Jun;4(6):495-500</Citation>
<ArticleIdList><ArticleId IdType="pubmed">17468765</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Nature. 2008 Mar 20;452(7185):340-3</Citation>
<ArticleIdList><ArticleId IdType="pubmed">18311127</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>PLoS One. 2008;3(8):e3064</Citation>
<ArticleIdList><ArticleId IdType="pubmed">18725973</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Proc Natl Acad Sci U S A. 2008 Sep 9;105(36):13580-5</Citation>
<ArticleIdList><ArticleId IdType="pubmed">18757757</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>BMC Bioinformatics. 2009;10:56</Citation>
<ArticleIdList><ArticleId IdType="pubmed">19210774</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Genome Res. 2001 Aug;11(8):1404-9</Citation>
<ArticleIdList><ArticleId IdType="pubmed">11483581</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001F54 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd -nk 001F54 | SxmlIndent | more
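For readers without a Dilib installation, the identifiers stored in the record can also be read with a general-purpose XML parser. The sketch below is a minimal example under stated assumptions: it parses a small fragment copied from the record's `<PubmedData>` element rather than the full record, and the helper name `article_ids` is hypothetical. The full record as displayed uses the `wicri:` and `nlm:` prefixes without visible namespace declarations, so a namespace-aware parser would likely need those declarations supplied before it could load the whole document.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the record above; only the article's own ID list is kept.
FRAGMENT = """
<PubmedData>
  <ArticleIdList>
    <ArticleId IdType="pubmed">20406503</ArticleId>
    <ArticleId IdType="pii">1471-2105-11-S2-S5</ArticleId>
    <ArticleId IdType="doi">10.1186/1471-2105-11-S2-S5</ArticleId>
    <ArticleId IdType="pmc">PMC3165929</ArticleId>
  </ArticleIdList>
</PubmedData>
"""


def article_ids(xml_text: str) -> dict[str, str]:
    """Map each IdType to its value for the article itself.

    Only the ArticleIdList that is a direct child of the root is read,
    so ID lists nested inside cited references are ignored.
    """
    root = ET.fromstring(xml_text)
    id_list = root.find("ArticleIdList")
    if id_list is None:
        return {}
    return {el.get("IdType"): el.text for el in id_list.findall("ArticleId")}


ids = article_ids(FRAGMENT)
```

Here `ids["doi"]` holds the DOI string and `ids["pmc"]` the PubMed Central identifier listed in the record.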
To add a link to this page in the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= PubMed |étape= Curation |type= RBID |clé= pubmed:20406503 |texte= Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Curation/RBID.i -Sk "pubmed:20406503" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd \
       | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.