MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A two-phase binning algorithm using l-mer frequency on groups of non-overlapping reads.

Internal identifier: 001709 (PubMed/Corpus)


Authors: Le Van Vinh; Tran Van Lang; Le Thanh Binh; Tran Van Hoai


RBID: pubmed:25648210

Abstract

Metagenomics is the study of genetic material derived directly from complex microbial samples rather than from cultures. One of the crucial steps in metagenomic analysis, referred to as "binning", is to separate reads into clusters that represent genomes from closely related organisms. Among existing binning methods, unsupervised methods base the classification on features extracted from the reads themselves, which makes them especially useful when reference databases are limited. However, their performance in various respects is still being investigated in recent theoretical and empirical studies. The method presented in this paper is one such effort to improve classification accuracy.
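The title names l-mer frequency as the read-derived feature used for binning. As a minimal sketch (not the authors' code), assuming reads over the plain A/C/G/T alphabet and l = 4 as an illustrative default, a normalized l-mer frequency vector for a group of reads can be computed as below. The function name `lmer_frequency` is hypothetical, and reverse complements are deliberately not merged here, although practical tools often do so.

```python
from itertools import product


def lmer_frequency(reads, l=4):
    """Normalized l-mer frequency vector for a group of reads (hypothetical helper).

    The vector has 4**l entries, one per l-mer over A/C/G/T in lexicographic
    order (AA..A first). l-mers containing any other symbol, such as N, are
    skipped. Reverse complements are NOT merged in this sketch.
    """
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=l))}
    counts = [0] * len(index)
    for read in reads:
        read = read.upper()
        # Slide a window of width l over the read and count each l-mer.
        for start in range(len(read) - l + 1):
            i = index.get(read[start:start + l])
            if i is not None:
                counts[i] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


# Usage: pool two reads and build their 2-mer profile (16 entries).
profile = lmer_frequency(["ACGTACGT", "TTGCA"], l=2)
print(len(profile), round(sum(profile), 6))  # prints: 16 1.0
```

Pooling a whole group of reads into one vector gives a more stable profile than a single short read, which is the motivation the title points to for working on groups of reads.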

DOI: 10.1186/s13015-014-0030-4
PubMed: 25648210


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A two-phase binning algorithm using l-mer frequency on groups of non-overlapping reads.</title>
<author>
<name sortKey="Vinh, Le Van" sort="Vinh, Le Van" uniqKey="Vinh L" first="Le Van" last="Vinh">Le Van Vinh</name>
<affiliation>
<nlm:affiliation>Faculty of Computer Science and Engineering, HCMC University of Technology, 268 Ly Thuong Kiet, Q10, Ho Chi Minh City, Vietnam.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Lang, Tran Van" sort="Lang, Tran Van" uniqKey="Lang T" first="Tran Van" last="Lang">Tran Van Lang</name>
<affiliation>
<nlm:affiliation>Institute of Applied Mechanics and Informatics, Vietnam Academy of Science and Technology (VAST), 01 Mac Dinh Chi, Q1, Ho Chi Minh City, Vietnam ; Faculty of Information Technology, Lac Hong University, 10 Huynh Van Nghe, Bien Hoa, Dong Nai Vietnam.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Binh, Le Thanh" sort="Binh, Le Thanh" uniqKey="Binh L" first="Le Thanh" last="Binh">Le Thanh Binh</name>
<affiliation>
<nlm:affiliation>Institute of Biotechnology, Vietnam Academy of Science and Technology (VAST), 18 Hoang Quoc Viet, Cau Giay, Ha Noi Vietnam.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Hoai, Tran Van" sort="Hoai, Tran Van" uniqKey="Hoai T" first="Tran Van" last="Hoai">Tran Van Hoai</name>
<affiliation>
<nlm:affiliation>Faculty of Computer Science and Engineering, HCMC University of Technology, 268 Ly Thuong Kiet, Q10, Ho Chi Minh City, Vietnam.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2015">2015</date>
<idno type="RBID">pubmed:25648210</idno>
<idno type="pmid">25648210</idno>
<idno type="doi">10.1186/s13015-014-0030-4</idno>
<idno type="wicri:Area/PubMed/Corpus">001709</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001709</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">A two-phase binning algorithm using l-mer frequency on groups of non-overlapping reads.</title>
<author>
<name sortKey="Vinh, Le Van" sort="Vinh, Le Van" uniqKey="Vinh L" first="Le Van" last="Vinh">Le Van Vinh</name>
<affiliation>
<nlm:affiliation>Faculty of Computer Science and Engineering, HCMC University of Technology, 268 Ly Thuong Kiet, Q10, Ho Chi Minh City, Vietnam.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Lang, Tran Van" sort="Lang, Tran Van" uniqKey="Lang T" first="Tran Van" last="Lang">Tran Van Lang</name>
<affiliation>
<nlm:affiliation>Institute of Applied Mechanics and Informatics, Vietnam Academy of Science and Technology (VAST), 01 Mac Dinh Chi, Q1, Ho Chi Minh City, Vietnam ; Faculty of Information Technology, Lac Hong University, 10 Huynh Van Nghe, Bien Hoa, Dong Nai Vietnam.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Binh, Le Thanh" sort="Binh, Le Thanh" uniqKey="Binh L" first="Le Thanh" last="Binh">Le Thanh Binh</name>
<affiliation>
<nlm:affiliation>Institute of Biotechnology, Vietnam Academy of Science and Technology (VAST), 18 Hoang Quoc Viet, Cau Giay, Ha Noi Vietnam.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Hoai, Tran Van" sort="Hoai, Tran Van" uniqKey="Hoai T" first="Tran Van" last="Hoai">Tran Van Hoai</name>
<affiliation>
<nlm:affiliation>Faculty of Computer Science and Engineering, HCMC University of Technology, 268 Ly Thuong Kiet, Q10, Ho Chi Minh City, Vietnam.</nlm:affiliation>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Algorithms for molecular biology : AMB</title>
<idno type="ISSN">1748-7188</idno>
<imprint>
<date when="2015" type="published">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Metagenomics is the study of genetic materials derived directly from complex microbial samples, instead of from culture. One of the crucial steps in metagenomic analysis, referred to as "binning", is to separate reads into clusters that represent genomes from closely related organisms. Among the existing binning methods, unsupervised methods base the classification on features extracted from reads, and especially taking advantage in case of the limitation of reference database availability. However, their performance, under various aspects, is still being investigated by recent theoretical and empirical studies. The one addressed in this paper is among those efforts to enhance the accuracy of the classification.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
<PMID Version="1">25648210</PMID>
<DateCompleted>
<Year>2015</Year>
<Month>02</Month>
<Day>04</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>11</Month>
<Day>13</Day>
</DateRevised>
<Article PubModel="Electronic-eCollection">
<Journal>
<ISSN IssnType="Print">1748-7188</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>10</Volume>
<Issue>1</Issue>
<PubDate>
<Year>2015</Year>
</PubDate>
</JournalIssue>
<Title>Algorithms for molecular biology : AMB</Title>
<ISOAbbreviation>Algorithms Mol Biol</ISOAbbreviation>
</Journal>
<ArticleTitle>A two-phase binning algorithm using l-mer frequency on groups of non-overlapping reads.</ArticleTitle>
<Pagination>
<MedlinePgn>2</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1186/s13015-014-0030-4</ELocationID>
<Abstract>
<AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">Metagenomics is the study of genetic materials derived directly from complex microbial samples, instead of from culture. One of the crucial steps in metagenomic analysis, referred to as "binning", is to separate reads into clusters that represent genomes from closely related organisms. Among the existing binning methods, unsupervised methods base the classification on features extracted from reads, and especially taking advantage in case of the limitation of reference database availability. However, their performance, under various aspects, is still being investigated by recent theoretical and empirical studies. The one addressed in this paper is among those efforts to enhance the accuracy of the classification.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">This paper presents an unsupervised algorithm, called BiMeta, for binning of reads from different species in a metagenomic dataset. The algorithm consists of two phases. In the first phase of the algorithm, reads are grouped into groups based on overlap information between the reads. The second phase merges the groups by using an observation on l-mer frequency distribution of sets of non-overlapping reads. The experimental results on simulated and real datasets showed that BiMeta outperforms three state-of-the-art binning algorithms for both short and long reads (≥700 bp) datasets.</AbstractText>
<AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">This paper developed a novel and efficient algorithm for binning of metagenomic reads, which does not require any reference database. The software implementing the algorithm and all test datasets mentioned in this paper can be downloaded at http://it.hcmute.edu.vn/bioinfo/bimeta/index.htm.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Vinh</LastName>
<ForeName>Le Van</ForeName>
<Initials>le V</Initials>
<AffiliationInfo>
<Affiliation>Faculty of Computer Science and Engineering, HCMC University of Technology, 268 Ly Thuong Kiet, Q10, Ho Chi Minh City, Vietnam.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Lang</LastName>
<ForeName>Tran Van</ForeName>
<Initials>TV</Initials>
<AffiliationInfo>
<Affiliation>Institute of Applied Mechanics and Informatics, Vietnam Academy of Science and Technology (VAST), 01 Mac Dinh Chi, Q1, Ho Chi Minh City, Vietnam ; Faculty of Information Technology, Lac Hong University, 10 Huynh Van Nghe, Bien Hoa, Dong Nai Vietnam.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Binh</LastName>
<ForeName>Le Thanh</ForeName>
<Initials>le T</Initials>
<AffiliationInfo>
<Affiliation>Institute of Biotechnology, Vietnam Academy of Science and Technology (VAST), 18 Hoang Quoc Viet, Cau Giay, Ha Noi Vietnam.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Hoai</LastName>
<ForeName>Tran Van</ForeName>
<Initials>TV</Initials>
<AffiliationInfo>
<Affiliation>Faculty of Computer Science and Engineering, HCMC University of Technology, 268 Ly Thuong Kiet, Q10, Ho Chi Minh City, Vietnam.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2015</Year>
<Month>01</Month>
<Day>16</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Algorithms Mol Biol</MedlineTA>
<NlmUniqueID>101265088</NlmUniqueID>
<ISSNLinking>1748-7188</ISSNLinking>
</MedlineJournalInfo>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">Algorithm</Keyword>
<Keyword MajorTopicYN="N">Binning</Keyword>
<Keyword MajorTopicYN="N">Metagenomics</Keyword>
<Keyword MajorTopicYN="N">Next-generation sequencing</Keyword>
<Keyword MajorTopicYN="N">l-mers frequency</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2014</Year>
<Month>07</Month>
<Day>10</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2014</Year>
<Month>10</Month>
<Day>20</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2015</Year>
<Month>2</Month>
<Day>5</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2015</Year>
<Month>2</Month>
<Day>5</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2015</Year>
<Month>2</Month>
<Day>5</Day>
<Hour>6</Hour>
<Minute>1</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">25648210</ArticleId>
<ArticleId IdType="doi">10.1186/s13015-014-0030-4</ArticleId>
<ArticleId IdType="pii">30</ArticleId>
<ArticleId IdType="pmc">PMC4304631</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>PLoS One. 2008 Oct 08;3(10):e3373</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18841204</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2008 Dec 17;9:546</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19091119</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Comput Biol. 2011 Mar;18(3):523-34</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21385052</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2009 Oct 02;10:316</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19799776</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2012 Sep 15;28(18):i356-i362</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22962452</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol. 2008 Oct 13;9(10):R151</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18851752</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2009 Feb 11;10:56</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19210774</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2010 Nov 02;11:544</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21044341</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Appl Environ Microbiol. 2007 Jan;73(1):278-88</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17071787</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Microbiol Rev. 1995 Mar;59(1):143-69</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">7535888</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2004 Mar 4;428(6978):37-43</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14961025</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Algorithms Mol Biol. 2012 Sep 26;7(1):27</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23009059</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2007 Mar;17(3):377-86</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17255551</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS Biol. 2007 Mar;5(3):e82</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17355177</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Genomics. 2014;15 Suppl 1:S12</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24564377</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2010 Mar 4;464(7285):59-65</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20203603</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Comput Biol. 2012 Feb;19(2):241-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22300323</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>IEEE/ACM Trans Comput Biol Bioinform. 2014 Jan-Feb;11(1):42-54</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26355506</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS Comput Biol. 2010 Feb 26;6(2):e1000667</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20195499</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Biotechnol. 2008 Oct;26(10):1135-45</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18846087</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Science. 2004 Apr 2;304(5667):66-74</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15001713</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2011 Nov 1;27(21):2957-63</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21903629</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2011 Jun 1;27(11):1489-95</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21493653</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Biomed Biotechnol. 2012;2012:251364</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22829749</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2008 Apr;36(7):2230-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18285365</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol. 2009;10(10):R108</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19814784</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
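The RESULTS paragraph of the record above summarizes BiMeta's two phases: reads are first grouped using overlap information, then groups are merged using the l-mer frequency distribution of sets of non-overlapping reads. The following is a toy illustration of that idea under explicitly stated assumptions, not the authors' implementation. Overlaps are taken to be exact suffix-prefix matches of at least `min_overlap` bases. Groups are the connected components of the overlap graph. Each group is represented by a greedily chosen set of mutually non-overlapping reads. Groups are then merged greedily while the Euclidean distance between their l-mer vectors stays below a hypothetical `threshold`. All function and parameter names are hypothetical, and the paper's actual merging procedure may differ.

```python
import math
from itertools import product


def lmer_vector(reads, l):
    """Normalized l-mer frequency vector over A/C/G/T (other l-mers skipped)."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=l))}
    counts = [0] * len(index)
    for read in reads:
        for s in range(len(read) - l + 1):
            i = index.get(read[s:s + l])
            if i is not None:
                counts[i] += 1
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]


def overlaps(a, b, min_overlap):
    """True if a suffix of one read equals a prefix of the other, length >= min_overlap."""
    for x, y in ((a, b), (b, a)):
        for k in range(min(len(x), len(y)), min_overlap - 1, -1):
            if x[-k:] == y[:k]:
                return True
    return False


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def two_phase_binning(reads, min_overlap=5, l=2, threshold=0.1):
    """Toy two-phase binning (hypothetical); returns bins as sorted lists of read indices."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be >= 1")
    n = len(reads)
    # Phase 1: connected components of the read-overlap graph (all pairs, O(n^2)).
    parent = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if overlaps(reads[i], reads[j], min_overlap):
                parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    bins = list(groups.values())

    def seed_vector(members):
        # Greedily keep reads that overlap no already-kept read, so that each
        # genomic region is counted roughly once in the group's l-mer profile.
        seeds = []
        for i in members:
            if not any(overlaps(reads[i], reads[s], min_overlap) for s in seeds):
                seeds.append(i)
        return lmer_vector([reads[s] for s in seeds], l)

    # Phase 2: repeatedly merge the closest pair of bins while distance < threshold.
    while len(bins) > 1:
        vecs = [seed_vector(b) for b in bins]
        best = None
        for a in range(len(bins)):
            for b in range(a + 1, len(bins)):
                d = math.dist(vecs[a], vecs[b])
                if d < threshold and (best is None or d < best[0]):
                    best = (d, a, b)
        if best is None:
            break
        _, a, b = best
        other = bins.pop(b)  # b > a, so index a is unaffected
        bins[a].extend(other)
    return sorted(sorted(b) for b in bins)


reads = ["ACGTACGGT", "ACGGTTTCA", "GGGCCCGGG"]
print(two_phase_binning(reads, threshold=0.0))   # prints: [[0, 1], [2]]
print(two_phase_binning(reads, threshold=10.0))  # prints: [[0, 1, 2]]
```

Here the first two reads share the 5-base overlap `ACGGT`, so phase 1 groups them; whether the third read joins depends only on the phase 2 threshold. The all-pairs overlap test is quadratic and only suitable for a toy example. Real binning tools index reads to find overlaps efficiently.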

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001709 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001709 | SxmlIndent | more
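Both commands select record 001709 from `biblio.hfd` and pipe it through `SxmlIndent`. The sketch below assumes that this output has been saved to a UTF-8 file with no XML declaration, for example by replacing `| more` with `> 001709.xml`. Under that assumption, the article identifiers can be pulled out with Python's standard `xml.etree.ElementTree` module. The record as displayed on this page uses the `nlm:` and `wicri:` prefixes without declaring them, and a strict XML parser rejects that. The sketch therefore wraps the record in an element that binds both prefixes to placeholder URIs. This is an assumption, since the real namespace URIs are not given here. The function name `record_ids` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): they only make the undeclared
# nlm: and wicri: prefixes in the record parseable.
WRAPPER = ('<wrapper xmlns:nlm="urn:example:nlm" '
           'xmlns:wicri="urn:example:wicri">{}</wrapper>')


def record_ids(record_xml):
    """Return the PMID and the article's own ArticleId values as a dict."""
    root = ET.fromstring(WRAPPER.format(record_xml))
    ids = {"pmid": root.findtext(".//MedlineCitation/PMID")}
    # Only the ArticleIdList that is a direct child of PubmedData describes
    # this article; the ones under ReferenceList belong to cited works.
    for art_id in root.iterfind(".//PubmedData/ArticleIdList/ArticleId"):
        ids[art_id.get("IdType")] = art_id.text
    return ids


# Usage (hypothetical file name):
# with open("001709.xml", encoding="utf-8") as f:
#     print(record_ids(f.read()))
```

For this record, the keys would be `pmid`, `pubmed`, `doi`, `pii` and `pmc`, taken from the `PMID` and `ArticleIdList` elements shown above.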

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:25648210
   |texte=   A two-phase binning algorithm using l-mer frequency on groups of non-overlapping reads.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:25648210" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021