MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers.

Internal identifier: 001F54 (PubMed/Corpus)

Authors: Bin Yang; Yu Peng; Henry Chi-Ming Leung; Siu-Ming Yiu; Jing-Chi Chen; Francis Yuk-Lun Chin

RBID : pubmed:20406503

Abstract

With the rapid development of genome sequencing techniques, traditional research methods based on the isolation and cultivation of microorganisms are gradually being replaced by metagenomics, also known as environmental genomics. The first step of metagenomics, and still a major bottleneck, is the taxonomic characterization of the DNA fragments (reads) obtained by sequencing a sample of mixed species. This step is usually referred to as "binning". Existing binning methods use supervised or semi-supervised approaches that rely heavily on reference genomes of known microorganisms and on phylogenetic marker genes. Due to the limited availability of reference genomes and the bias and instability of marker genes, existing binning methods may not be applicable in many cases.
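The title and the RESULTS section of the record describe binning reads without any reference data, using the distribution of selected l-mers (substrings of length l). As a rough illustration only, the Python sketch below bins reads by their l-mer frequency profiles with plain k-means. It is not the authors' method: their error-robust selection of l-mers is not reproduced, and this sketch simply counts every l-mer. The function names (`lmer_profile`, `kmeans`, `bin_reads`), the defaults (l = 4, Euclidean distance, farthest-point initialisation, 20 iterations) and the toy reads are hypothetical choices.

```python
import math
import random
from collections import Counter
from itertools import product


def lmer_profile(read: str, l: int = 4) -> list[float]:
    """Relative frequency of every l-mer over {A, C, G, T} in one read."""
    kmers = ["".join(p) for p in product("ACGT", repeat=l)]
    counts = Counter(read[i:i + l] for i in range(len(read) - l + 1))
    # l-mers containing other symbols (e.g. N) are ignored; avoid dividing by 0.
    total = sum(counts[k] for k in kmers) or 1
    return [counts[k] / total for k in kmers]


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two profiles (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: list[list[float]], k: int, iterations: int = 20) -> list[int]:
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(distance(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each profile to its nearest center.
        labels = [min(range(k), key=lambda j: distance(p, centers[j])) for p in points]
        # Move each center to the mean of its members (keep it if the bin is empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def bin_reads(reads: list[str], k: int, l: int = 4) -> list[int]:
    """Unsupervised binning: no reference genomes, only l-mer composition."""
    return kmeans([lmer_profile(r, l) for r in reads], k)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two made-up "species": one AT-rich, one GC-rich (toy data, not from the paper).
    at_rich = ["".join(rng.choice("AAAATTTTGC") for _ in range(1000)) for _ in range(5)]
    gc_rich = ["".join(rng.choice("GGGGCCCCAT") for _ in range(1000)) for _ in range(5)]
    print(bin_reads(at_rich + gc_rich, k=2))
```

On such strongly contrasting toy data the two groups of reads should land in separate bins. Real metagenomic reads are shorter, noisier and far less distinct, which is why the record stresses the careful, error-robust choice of l-mers rather than counting all of them.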

DOI: 10.1186/1471-2105-11-S2-S5
PubMed: 20406503

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers.</title>
<author>
<name sortKey="Yang, Bin" sort="Yang, Bin" uniqKey="Yang B" first="Bin" last="Yang">Bin Yang</name>
<affiliation>
<nlm:affiliation>State Key Laboratory of Bioelectronics, School of Biological Science &amp; Medical Engineering, Southeast University, Nanjing, Jiangsu, 210096 PR China. byang@cs.hku.hk</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Peng, Yu" sort="Peng, Yu" uniqKey="Peng Y" first="Yu" last="Peng">Yu Peng</name>
</author>
<author>
<name sortKey="Leung, Henry Chi Ming" sort="Leung, Henry Chi Ming" uniqKey="Leung H" first="Henry Chi-Ming" last="Leung">Henry Chi-Ming Leung</name>
</author>
<author>
<name sortKey="Yiu, Siu Ming" sort="Yiu, Siu Ming" uniqKey="Yiu S" first="Siu-Ming" last="Yiu">Siu-Ming Yiu</name>
</author>
<author>
<name sortKey="Chen, Jing Chi" sort="Chen, Jing Chi" uniqKey="Chen J" first="Jing-Chi" last="Chen">Jing-Chi Chen</name>
</author>
<author>
<name sortKey="Chin, Francis Yuk Lun" sort="Chin, Francis Yuk Lun" uniqKey="Chin F" first="Francis Yuk-Lun" last="Chin">Francis Yuk-Lun Chin</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2010">2010</date>
<idno type="RBID">pubmed:20406503</idno>
<idno type="pmid">20406503</idno>
<idno type="doi">10.1186/1471-2105-11-S2-S5</idno>
<idno type="wicri:Area/PubMed/Corpus">001F54</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001F54</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers.</title>
<author>
<name sortKey="Yang, Bin" sort="Yang, Bin" uniqKey="Yang B" first="Bin" last="Yang">Bin Yang</name>
<affiliation>
<nlm:affiliation>State Key Laboratory of Bioelectronics, School of Biological Science &amp; Medical Engineering, Southeast University, Nanjing, Jiangsu, 210096 PR China. byang@cs.hku.hk</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Peng, Yu" sort="Peng, Yu" uniqKey="Peng Y" first="Yu" last="Peng">Yu Peng</name>
</author>
<author>
<name sortKey="Leung, Henry Chi Ming" sort="Leung, Henry Chi Ming" uniqKey="Leung H" first="Henry Chi-Ming" last="Leung">Henry Chi-Ming Leung</name>
</author>
<author>
<name sortKey="Yiu, Siu Ming" sort="Yiu, Siu Ming" uniqKey="Yiu S" first="Siu-Ming" last="Yiu">Siu-Ming Yiu</name>
</author>
<author>
<name sortKey="Chen, Jing Chi" sort="Chen, Jing Chi" uniqKey="Chen J" first="Jing-Chi" last="Chen">Jing-Chi Chen</name>
</author>
<author>
<name sortKey="Chin, Francis Yuk Lun" sort="Chin, Francis Yuk Lun" uniqKey="Chin F" first="Francis Yuk-Lun" last="Chin">Francis Yuk-Lun Chin</name>
</author>
</analytic>
<series>
<title level="j">BMC bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2010" type="published">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Cluster Analysis</term>
<term>DNA (chemistry)</term>
<term>Data Mining (methods)</term>
<term>Databases, Genetic</term>
<term>Environmental Microbiology</term>
<term>Escherichia coli (genetics)</term>
<term>Genome, Bacterial (genetics)</term>
<term>Lactobacillus (genetics)</term>
<term>Metagenomics (methods)</term>
<term>Sequence Analysis, DNA (methods)</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="chemistry" xml:lang="en">
<term>DNA</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Escherichia coli</term>
<term>Genome, Bacterial</term>
<term>Lactobacillus</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Data Mining</term>
<term>Metagenomics</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Cluster Analysis</term>
<term>Databases, Genetic</term>
<term>Environmental Microbiology</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">With the rapid development of genome sequencing techniques, traditional research methods based on the isolation and cultivation of microorganisms are being gradually replaced by metagenomics, which is also known as environmental genomics. The first step, which is still a major bottleneck, of metagenomics is the taxonomic characterization of DNA fragments (reads) resulting from sequencing a sample of mixed species. This step is usually referred as "binning". Existing binning methods are based on supervised or semi-supervised approaches which rely heavily on reference genomes of known microorganisms and phylogenetic marker genes. Due to the limited availability of reference genomes and the bias and instability of marker genes, existing binning methods may not be applicable in many cases.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">20406503</PMID>
<DateCompleted>
<Year>2010</Year>
<Month>08</Month>
<Day>02</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>11</Month>
<Day>13</Day>
</DateRevised>
<Article PubModel="Electronic">
<Journal>
<ISSN IssnType="Electronic">1471-2105</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>11 Suppl 2</Volume>
<PubDate>
<Year>2010</Year>
<Month>Apr</Month>
<Day>16</Day>
</PubDate>
</JournalIssue>
<Title>BMC bioinformatics</Title>
<ISOAbbreviation>BMC Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers.</ArticleTitle>
<Pagination>
<MedlinePgn>S5</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1186/1471-2105-11-S2-S5</ELocationID>
<Abstract>
<AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">With the rapid development of genome sequencing techniques, traditional research methods based on the isolation and cultivation of microorganisms are being gradually replaced by metagenomics, which is also known as environmental genomics. The first step, which is still a major bottleneck, of metagenomics is the taxonomic characterization of DNA fragments (reads) resulting from sequencing a sample of mixed species. This step is usually referred as "binning". Existing binning methods are based on supervised or semi-supervised approaches which rely heavily on reference genomes of known microorganisms and phylogenetic marker genes. Due to the limited availability of reference genomes and the bias and instability of marker genes, existing binning methods may not be applicable in many cases.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">In this paper, we present an unsupervised binning method based on the distribution of a carefully selected set of l-mers (substrings of length l in DNA fragments). From our experiments, we show that our method can accurately bin DNA fragments with various lengths and relative species abundance ratios without using any reference and training datasets. Another feature of our method is its error robustness. The binning accuracy decreases by less than 1% when the sequencing error rate increases from 0% to 5%. Note that the typical sequencing error rate of existing commercial sequencing platforms is less than 2%.</AbstractText>
<AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">We provide a new and effective tool to solve the metagenome binning problem without using any reference datasets or markers information of any known reference genomes (species). The source code of our software tool, the reference genomes of the species for generating the test datasets and the corresponding test datasets are available at http://i.cs.hku.hk/~alse/MetaCluster/.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Yang</LastName>
<ForeName>Bin</ForeName>
<Initials>B</Initials>
<AffiliationInfo>
<Affiliation>State Key Laboratory of Bioelectronics, School of Biological Science &amp; Medical Engineering, Southeast University, Nanjing, Jiangsu, 210096 PR China. byang@cs.hku.hk</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Peng</LastName>
<ForeName>Yu</ForeName>
<Initials>Y</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Leung</LastName>
<ForeName>Henry Chi-Ming</ForeName>
<Initials>HC</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Yiu</LastName>
<ForeName>Siu-Ming</ForeName>
<Initials>SM</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Chen</LastName>
<ForeName>Jing-Chi</ForeName>
<Initials>JC</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Chin</LastName>
<ForeName>Francis Yuk-Lun</ForeName>
<Initials>FY</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2010</Year>
<Month>04</Month>
<Day>16</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>BMC Bioinformatics</MedlineTA>
<NlmUniqueID>100965194</NlmUniqueID>
<ISSNLinking>1471-2105</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>9007-49-2</RegistryNumber>
<NameOfSubstance UI="D004247">DNA</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="N">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016000" MajorTopicYN="N">Cluster Analysis</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D004247" MajorTopicYN="N">DNA</DescriptorName>
<QualifierName UI="Q000737" MajorTopicYN="Y">chemistry</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D057225" MajorTopicYN="N">Data Mining</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D030541" MajorTopicYN="N">Databases, Genetic</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D004783" MajorTopicYN="N">Environmental Microbiology</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D004926" MajorTopicYN="N">Escherichia coli</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016680" MajorTopicYN="N">Genome, Bacterial</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D007778" MajorTopicYN="N">Lactobacillus</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D056186" MajorTopicYN="N">Metagenomics</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="entrez">
<Year>2010</Year>
<Month>4</Month>
<Day>22</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2010</Year>
<Month>5</Month>
<Day>6</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2010</Year>
<Month>8</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">20406503</ArticleId>
<ArticleId IdType="pii">1471-2105-11-S2-S5</ArticleId>
<ArticleId IdType="doi">10.1186/1471-2105-11-S2-S5</ArticleId>
<ArticleId IdType="pmc">PMC3165929</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>Science. 2000 Mar 24;287(5461):2204-15</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">10731134</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Dent Hyg. 2008 Oct;82 Suppl 3:4-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19275822</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2004 Mar 4;428(6978):37-43</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14961025</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Science. 2004 Apr 2;304(5667):66-74</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15001713</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Environ Microbiol. 2004 Sep;6(9):938-47</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15305919</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Appl Environ Microbiol. 1990 Jun;56(6):1919-25</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">2200342</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 1992 Mar 25;20(6):1363-70</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">1313968</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 1994 Dec 20;91(26):12832-6</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">7809130</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Trends Genet. 1995 Jul;11(7):283-90</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">7482779</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Bacteriol. 1997 Jun;179(12):3899-913</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">9190805</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 1997 Sep 1;25(17):3389-402</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">9254694</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2004 Oct 26;5:163</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15507136</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2005 Jan 1;33(Database issue):D294-6</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15608200</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Science. 2005 Apr 22;308(5721):554-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15845853</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Biotechnol. 2006 Oct;24(10):1263-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16998472</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Appl Environ Microbiol. 2007 Jan;73(1):278-88</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17071787</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Methods. 2007 Jan;4(1):63-72</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17179938</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2007 Mar;17(3):377-86</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17255551</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Methods. 2007 Jun;4(6):495-500</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17468765</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2008 Mar 20;452(7185):340-3</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18311127</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS One. 2008;3(8):e3064</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18725973</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2008 Sep 9;105(36):13580-5</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18757757</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2009;10:56</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19210774</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2001 Aug;11(8):1404-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11483581</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
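The `<pubmed>` element of the record follows the PubMed/MEDLINE citation layout. The sketch that follows shows one way to pull the PMID, DOI and MeSH headings out of it with Python's standard `xml.etree.ElementTree` module. It parses a trimmed excerpt of that element rather than the whole record, because the record as displayed uses the `nlm:` and `wicri:` namespace prefixes without declaring them, which a strict XML parser rejects. The `RECORD` excerpt and the `summarize` helper are hypothetical; they are not part of Dilib.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed excerpt of the <pubmed> element above. It contains no
# namespace prefixes, so ElementTree can parse it as-is.
RECORD = """<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">20406503</PMID>
<Article PubModel="Electronic">
<ELocationID EIdType="doi" ValidYN="Y">10.1186/1471-2105-11-S2-S5</ELocationID>
</Article>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="N">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D056186" MajorTopicYN="N">Metagenomics</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
</pubmed>"""


def summarize(xml_text: str) -> dict:
    """Return the PMID, DOI and MeSH headings of one <pubmed> element."""
    citation = ET.fromstring(xml_text).find("MedlineCitation")
    pmid = citation.findtext("PMID")
    # ElementTree's limited XPath supports [@attr='value'] predicates.
    doi = citation.findtext("Article/ELocationID[@EIdType='doi']")
    mesh = []
    major = []
    for heading in citation.iterfind("MeshHeadingList/MeshHeading"):
        # Join descriptor and qualifiers as "Descriptor/qualifier".
        terms = [heading.findtext("DescriptorName")]
        terms += [q.text for q in heading.findall("QualifierName")]
        mesh.append("/".join(terms))
        # Headings where any descriptor or qualifier carries MajorTopicYN="Y".
        if any(el.get("MajorTopicYN") == "Y" for el in heading):
            major.append(heading.findtext("DescriptorName"))
    return {"pmid": pmid, "doi": doi, "mesh": mesh, "major_topics": major}


if __name__ == "__main__":
    print(summarize(RECORD))
```

Working from an excerpt keeps the example self-contained. To process the full record, the `nlm:` and `wicri:` prefixes would first need namespace declarations (or the `<pubmed>` element would need to be extracted on its own).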

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001F54 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001F54 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:20406503
   |texte=   Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:20406503" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021