MersV1, Ncbi, Merge, bibRecord, 001656

KCMBT: a k-mer Counter based on Multiple Burst Trees.

Internal identifier: 001656 (Ncbi/Merge); previous: 001655; next: 001657

Authors: Abdullah-Al Mamun [United States]; Soumitra Pal [United States]; Sanguthevar Rajasekaran [United States]

Source:

Bioinformatics (Oxford, England) [ 1367-4811 ] ; 2016.

RBID : pubmed:27283950

French descriptors

KwdFr :
- Algorithmes, Alignement de séquences, Analyse de séquence d'ADN, Biologie informatique (), Génome, Humains, Logiciel, Séquence nucléotidique.
MESH :
- Algorithmes, Alignement de séquences, Analyse de séquence d'ADN, Biologie informatique, Génome, Humains, Logiciel, Séquence nucléotidique.

English descriptors

KwdEn :
- Algorithms, Base Sequence, Computational Biology (methods), Genome, Humans, Sequence Alignment, Sequence Analysis, DNA, Software.
MESH :
- methods : Computational Biology.
- Algorithms, Base Sequence, Genome, Humans, Sequence Alignment, Sequence Analysis, DNA, Software.

Abstract

A massive number of bioinformatics applications require counting of k-length substrings in genetically important long strings. A k-mer counter generates the frequencies of each k-length substring in genome sequences. Genome assembly, repeat detection, multiple sequence alignment, error detection and many other related applications use a k-mer counter as a building block. Very fast and efficient algorithms are necessary to count k-mers in large data sets to be useful in such applications.
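As a rough illustration of the counting task the abstract describes, the sketch below tallies every k-length substring of a sequence with an ordinary hash map. It is a naive baseline for orientation only and is not the trie-based KCMBT algorithm; the function name `count_kmers` and the sample sequence are assumptions made for this example.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every k-length substring of `sequence` with a plain hash map.

    Illustrative baseline only: this is NOT the burst-trie method of KCMBT.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    # Slide a window of width k over the sequence; a sequence shorter
    # than k produces an empty range and therefore no k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    # Hypothetical toy input; real inputs are reads or whole genomes.
    counts = count_kmers("ACGTACGT", 3)
    for kmer, n in sorted(counts.items()):
        print(kmer, n)
```

A hash map like this needs memory for every distinct k-mer and does no batching or parallel work, which is the kind of cost that dedicated counters such as KCMBT, KMC2 and Jellyfish2 are designed to reduce on large data sets.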

DOI: 10.1093/bioinformatics/btw345
PubMed: 27283950

Links toward previous steps (curation, corpus...)

to stream PubMed, to step Corpus: 001088
to stream PubMed, to step Curation: 001088
to stream PubMed, to step Checkpoint: 001037

Links to Exploration step

pubmed:27283950

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">KCMBT: a k-mer Counter based on Multiple Burst Trees.</title>
<author><name sortKey="Mamun, Abdullah Al" sort="Mamun, Abdullah Al" uniqKey="Mamun A" first="Abdullah-Al" last="Mamun">Abdullah-Al Mamun</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Pal, Soumitra" sort="Pal, Soumitra" uniqKey="Pal S" first="Soumitra" last="Pal">Soumitra Pal</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2016">2016</date>
<idno type="RBID">pubmed:27283950</idno>
<idno type="pmid">27283950</idno>
<idno type="doi">10.1093/bioinformatics/btw345</idno>
<idno type="wicri:Area/PubMed/Corpus">001088</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001088</idno>
<idno type="wicri:Area/PubMed/Curation">001088</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001088</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001037</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001037</idno>
<idno type="wicri:Area/Ncbi/Merge">001656</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">KCMBT: a k-mer Counter based on Multiple Burst Trees.</title>
<author><name sortKey="Mamun, Abdullah Al" sort="Mamun, Abdullah Al" uniqKey="Mamun A" first="Abdullah-Al" last="Mamun">Abdullah-Al Mamun</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Pal, Soumitra" sort="Pal, Soumitra" uniqKey="Pal S" first="Soumitra" last="Pal">Soumitra Pal</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint><date when="2016" type="published">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Base Sequence</term>
<term>Computational Biology (methods)</term>
<term>Genome</term>
<term>Humans</term>
<term>Sequence Alignment</term>
<term>Sequence Analysis, DNA</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Algorithmes</term>
<term>Alignement de séquences</term>
<term>Analyse de séquence d'ADN</term>
<term>Biologie informatique ()</term>
<term>Génome</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Séquence nucléotidique</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Computational Biology</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Base Sequence</term>
<term>Genome</term>
<term>Humans</term>
<term>Sequence Alignment</term>
<term>Sequence Analysis, DNA</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Algorithmes</term>
<term>Alignement de séquences</term>
<term>Analyse de séquence d'ADN</term>
<term>Biologie informatique</term>
<term>Génome</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Séquence nucléotidique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">A massive number of bioinformatics applications require counting of k-length substrings in genetically important long strings. A k-mer counter generates the frequencies of each k-length substring in genome sequences. Genome assembly, repeat detection, multiple sequence alignment, error detection and many other related applications use a k-mer counter as a building block. Very fast and efficient algorithms are necessary to count k-mers in large data sets to be useful in such applications.</div>
</front>
</TEI>
<pubmed><MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM"><PMID Version="1">27283950</PMID>
<DateCompleted><Year>2017</Year>
<Month>07</Month>
<Day>31</Day>
</DateCompleted>
<DateRevised><Year>2018</Year>
<Month>12</Month>
<Day>02</Day>
</DateRevised>
<Article PubModel="Print-Electronic"><Journal><ISSN IssnType="Electronic">1367-4811</ISSN>
<JournalIssue CitedMedium="Internet"><Volume>32</Volume>
<Issue>18</Issue>
<PubDate><Year>2016</Year>
<Month>09</Month>
<Day>15</Day>
</PubDate>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
<ISOAbbreviation>Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>KCMBT: a k-mer Counter based on Multiple Burst Trees.</ArticleTitle>
<Pagination><MedlinePgn>2783-90</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/btw345</ELocationID>
<Abstract><AbstractText Label="MOTIVATION">A massive number of bioinformatics applications require counting of k-length substrings in genetically important long strings. A k-mer counter generates the frequencies of each k-length substring in genome sequences. Genome assembly, repeat detection, multiple sequence alignment, error detection and many other related applications use a k-mer counter as a building block. Very fast and efficient algorithms are necessary to count k-mers in large data sets to be useful in such applications.</AbstractText>
<AbstractText Label="RESULTS">We propose a novel trie-based algorithm for this k-mer counting problem. We compare our devised algorithm k-mer Counter based on Multiple Burst Trees (KCMBT) with available all well-known algorithms. Our experimental results show that KCMBT is around 30% faster than the previous best-performing algorithm KMC2 for human genome dataset. As another example, our algorithm is around six times faster than Jellyfish2. Overall, KCMBT is 20-30% faster than KMC2 on five benchmark data sets when both the algorithms were run using multiple threads.</AbstractText>
<AbstractText Label="AVAILABILITY AND IMPLEMENTATION">KCMBT is freely available on GitHub: (https://github.com/abdullah009/kcmbt_mt).</AbstractText>
<AbstractText Label="CONTACT">rajasek@engr.uconn.edu</AbstractText>
<AbstractText Label="SUPPLEMENTARY INFORMATION">Supplementary data are available at Bioinformatics online.</AbstractText>
<CopyrightInformation>© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Mamun</LastName>
<ForeName>Abdullah-Al</ForeName>
<Initials>AA</Initials>
<AffiliationInfo><Affiliation>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Pal</LastName>
<ForeName>Soumitra</ForeName>
<Initials>S</Initials>
<AffiliationInfo><Affiliation>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Rajasekaran</LastName>
<ForeName>Sanguthevar</ForeName>
<Initials>S</Initials>
<AffiliationInfo><Affiliation>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y"><Grant><GrantID>R01 LM010101</GrantID>
<Acronym>LM</Acronym>
<Agency>NLM NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList><PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic"><Year>2016</Year>
<Month>06</Month>
<Day>09</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo><Country>England</Country>
<MedlineTA>Bioinformatics</MedlineTA>
<NlmUniqueID>9808944</NlmUniqueID>
<ISSNLinking>1367-4803</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList><MeshHeading><DescriptorName UI="D000465" MajorTopicYN="Y">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D001483" MajorTopicYN="N">Base Sequence</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D019295" MajorTopicYN="N">Computational Biology</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="N">methods</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D016678" MajorTopicYN="N">Genome</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D016415" MajorTopicYN="Y">Sequence Alignment</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D017422" MajorTopicYN="Y">Sequence Analysis, DNA</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D012984" MajorTopicYN="N">Software</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="received"><Year>2016</Year>
<Month>01</Month>
<Day>25</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted"><Year>2016</Year>
<Month>05</Month>
<Day>25</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez"><Year>2016</Year>
<Month>6</Month>
<Day>11</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed"><Year>2016</Year>
<Month>6</Month>
<Day>11</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2017</Year>
<Month>8</Month>
<Day>2</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList><ArticleId IdType="pubmed">27283950</ArticleId>
<ArticleId IdType="pii">btw345</ArticleId>
<ArticleId IdType="doi">10.1093/bioinformatics/btw345</ArticleId>
<ArticleId IdType="pmc">PMC5939891</ArticleId>
</ArticleIdList>
<ReferenceList><Reference><Citation>Genome Biol. 2010;11(11):R116</Citation>
<ArticleIdList><ArticleId IdType="pubmed">21114842</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2008 Dec 15;24(24):2818-24</Citation>
<ArticleIdList><ArticleId IdType="pubmed">18952627</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2014 Jul 15;30(14):1950-7</Citation>
<ArticleIdList><ArticleId IdType="pubmed">24618471</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Genome Res. 2008 May;18(5):821-9</Citation>
<ArticleIdList><ArticleId IdType="pubmed">18349386</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2011 Mar 15;27(6):764-70</Citation>
<ArticleIdList><ArticleId IdType="pubmed">21217122</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>BMC Bioinformatics. 2011 Aug 10;12:333</Citation>
<ArticleIdList><ArticleId IdType="pubmed">21831268</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2014 Jul 15;30(14):2070-2</Citation>
<ArticleIdList><ArticleId IdType="pubmed">24642064</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2013 Mar 1;29(5):652-3</Citation>
<ArticleIdList><ArticleId IdType="pubmed">23325618</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Proc Natl Acad Sci U S A. 2001 Aug 14;98(17):9748-53</Citation>
<ArticleIdList><ArticleId IdType="pubmed">11504945</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>BMC Genomics. 2008 Oct 31;9:517</Citation>
<ArticleIdList><ArticleId IdType="pubmed">18976482</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2011 Jul 1;27(13):i137-41</Citation>
<ArticleIdList><ArticleId IdType="pubmed">21685062</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2015 May 15;31(10):1569-76</Citation>
<ArticleIdList><ArticleId IdType="pubmed">25609798</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2013 Feb 1;29(3):308-15</Citation>
<ArticleIdList><ArticleId IdType="pubmed">23202746</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>BMC Bioinformatics. 2013 May 16;14:160</Citation>
<ArticleIdList><ArticleId IdType="pubmed">23679007</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Genome Res. 2003 Jan;13(1):91-6</Citation>
<ArticleIdList><ArticleId IdType="pubmed">12529310</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Connecticut</li>
</region>
</list>
<tree><country name="États-Unis"><region name="Connecticut"><name sortKey="Mamun, Abdullah Al" sort="Mamun, Abdullah Al" uniqKey="Mamun A" first="Abdullah-Al" last="Mamun">Abdullah-Al Mamun</name>
</region>
<name sortKey="Pal, Soumitra" sort="Pal, Soumitra" uniqKey="Pal S" first="Soumitra" last="Pal">Soumitra Pal</name>
<name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Ncbi/Merge

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001656 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 001656 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     pubmed:27283950
   |texte=   KCMBT: a k-mer Counter based on Multiple Burst Trees.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:27283950" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021

	MERS exploration server
	Warning: this site is under development! It was generated automatically from raw corpora, so the information has not been validated.
