Gerbil: a fast and memory-efficient k-mer counter with GPU-support.
Internal identifier: 000D36 (PubMed/Corpus); previous: 000D35; next: 000D37
Authors: Marius Erbert; Steffen Rechner; Matthias Müller-Hannemann
Source:
- Algorithms for molecular biology : AMB [1748-7188]; 2017.
Abstract
A basic task in bioinformatics is the counting of k-mers in genome sequences. Existing k-mer counting tools are most often optimized for small k < 32 and suffer from excessive memory resource consumption or degrading performance for large k. However, given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important.
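As an illustration of the task, the following is a minimal Python sketch of k-mer counting in two steps, loosely following the outline given in the RESULTS part of the abstract in the XML record (redistribute the input into temporary partitions, then count each partition with a hash table). It is not Gerbil's implementation: the function names, the use of in-memory lists in place of temporary files, and the CRC32-based partitioning are all assumptions made for this example.

```python
from collections import Counter
import zlib


def kmers(read, k):
    """Yield every length-k substring (k-mer) of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def distribute(reads, k, num_bins):
    """Step 1 (assumed form): spread k-mers over bins.

    Plain lists stand in for the temporary files mentioned in the abstract.
    """
    bins = [[] for _ in range(num_bins)]
    for read in reads:
        for kmer in kmers(read, k):
            # Equal k-mers always map to the same bin, so each bin
            # can later be counted independently of the others.
            index = zlib.crc32(kmer.encode("ascii")) % num_bins
            bins[index].append(kmer)
    return bins


def count_bins(bins):
    """Step 2: count the k-mers of each bin with a hash table."""
    total = Counter()
    for one_bin in bins:
        total.update(Counter(one_bin))
    return total


def count_kmers(reads, k, num_bins=4):
    """Run both steps and return a Counter mapping k-mer to frequency."""
    return count_bins(distribute(reads, k, num_bins))
```

Calling `count_kmers(["ACGTACGT"], 4)` walks over the five 4-mers of the read and reports `ACGT` twice, since it occurs at both ends. Partitioning first keeps each hash table small, which is the reason a sketch like this splits the work into two passes.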
DOI: 10.1186/s13015-017-0097-9
PubMed: 28373894
Links to Exploration step
pubmed:28373894
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Gerbil: a fast and memory-efficient <i>k</i>-mer counter with GPU-support.</title>
<author><name sortKey="Erbert, Marius" sort="Erbert, Marius" uniqKey="Erbert M" first="Marius" last="Erbert">Marius Erbert</name>
<affiliation><nlm:affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Rechner, Steffen" sort="Rechner, Steffen" uniqKey="Rechner S" first="Steffen" last="Rechner">Steffen Rechner</name>
<affiliation><nlm:affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Muller Hannemann, Matthias" sort="Muller Hannemann, Matthias" uniqKey="Muller Hannemann M" first="Matthias" last="Müller-Hannemann">Matthias Müller-Hannemann</name>
<affiliation><nlm:affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2017">2017</date>
<idno type="RBID">pubmed:28373894</idno>
<idno type="pmid">28373894</idno>
<idno type="doi">10.1186/s13015-017-0097-9</idno>
<idno type="wicri:Area/PubMed/Corpus">000D36</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000D36</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Gerbil: a fast and memory-efficient <i>k</i>-mer counter with GPU-support.</title>
<author><name sortKey="Erbert, Marius" sort="Erbert, Marius" uniqKey="Erbert M" first="Marius" last="Erbert">Marius Erbert</name>
<affiliation><nlm:affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Rechner, Steffen" sort="Rechner, Steffen" uniqKey="Rechner S" first="Steffen" last="Rechner">Steffen Rechner</name>
<affiliation><nlm:affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Muller Hannemann, Matthias" sort="Muller Hannemann, Matthias" uniqKey="Muller Hannemann M" first="Matthias" last="Müller-Hannemann">Matthias Müller-Hannemann</name>
<affiliation><nlm:affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</nlm:affiliation>
</affiliation>
</author>
</analytic>
<series><title level="j">Algorithms for molecular biology : AMB</title>
<idno type="ISSN">1748-7188</idno>
<imprint><date when="2017" type="published">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">A basic task in bioinformatics is the counting of <i>k</i>-mers in genome sequences. Existing <i>k</i>-mer counting tools are most often optimized for small <i>k</i> &lt; 32 and suffer from excessive memory resource consumption or degrading performance for large <i>k</i>. However, given the technology trend towards long reads of next-generation sequencers, support for large <i>k</i> becomes increasingly important.</div>
</front>
</TEI>
<pubmed><MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM"><PMID Version="1">28373894</PMID>
<DateRevised><Year>2019</Year>
<Month>11</Month>
<Day>20</Day>
</DateRevised>
<Article PubModel="Electronic-eCollection"><Journal><ISSN IssnType="Print">1748-7188</ISSN>
<JournalIssue CitedMedium="Print"><Volume>12</Volume>
<PubDate><Year>2017</Year>
</PubDate>
</JournalIssue>
<Title>Algorithms for molecular biology : AMB</Title>
<ISOAbbreviation>Algorithms Mol Biol</ISOAbbreviation>
</Journal>
<ArticleTitle>Gerbil: a fast and memory-efficient <i>k</i>-mer counter with GPU-support.</ArticleTitle>
<Pagination><MedlinePgn>9</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1186/s13015-017-0097-9</ELocationID>
<Abstract><AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">A basic task in bioinformatics is the counting of <i>k</i>-mers in genome sequences. Existing <i>k</i>-mer counting tools are most often optimized for small <i>k</i> &lt; 32 and suffer from excessive memory resource consumption or degrading performance for large <i>k</i>. However, given the technology trend towards long reads of next-generation sequencers, support for large <i>k</i> becomes increasingly important.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">We present the open source <i>k</i>-mer counting software <i>Gerbil</i> that has been designed for the efficient counting of <i>k</i>-mers for <i>k</i> ≥ 32. Our software is the result of an intensive process of algorithm engineering. It implements a two-step approach. In the first step, genome reads are loaded from disk and redistributed to temporary files. In a second step, the <i>k</i>-mers of each temporary file are counted via a hash table approach. In addition to its basic functionality, <i>Gerbil</i> can optionally use GPUs to accelerate the counting step. In a set of experiments with real-world genome data sets, we show that <i>Gerbil</i> is able to efficiently support both small and large <i>k</i>.</AbstractText>
<AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">While <i>Gerbil</i>'s performance is comparable to existing state-of-the-art open source <i>k</i>-mer counting tools for small <i>k</i> &lt; 32, it vastly outperforms its competitors for large <i>k</i>, thereby enabling new applications which require large values of <i>k</i>.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Erbert</LastName>
<ForeName>Marius</ForeName>
<Initials>M</Initials>
<AffiliationInfo><Affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</Affiliation>
<Identifier Source="GRID">grid.9018.0</Identifier>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Rechner</LastName>
<ForeName>Steffen</ForeName>
<Initials>S</Initials>
<AffiliationInfo><Affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</Affiliation>
<Identifier Source="GRID">grid.9018.0</Identifier>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Müller-Hannemann</LastName>
<ForeName>Matthias</ForeName>
<Initials>M</Initials>
<AffiliationInfo><Affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</Affiliation>
<Identifier Source="GRID">grid.9018.0</Identifier>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList><PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic"><Year>2017</Year>
<Month>03</Month>
<Day>31</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo><Country>England</Country>
<MedlineTA>Algorithms Mol Biol</MedlineTA>
<NlmUniqueID>101265088</NlmUniqueID>
<ISSNLinking>1748-7188</ISSNLinking>
</MedlineJournalInfo>
<KeywordList Owner="NOTNLM"><Keyword MajorTopicYN="N">Algorithm engineering</Keyword>
<Keyword MajorTopicYN="N">GPU computing</Keyword>
<Keyword MajorTopicYN="N">Genome sequences</Keyword>
<Keyword MajorTopicYN="N">de novo assembly</Keyword>
<Keyword MajorTopicYN="N">k-mer counting</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="received"><Year>2016</Year>
<Month>12</Month>
<Day>23</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted"><Year>2017</Year>
<Month>02</Month>
<Day>23</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez"><Year>2017</Year>
<Month>4</Month>
<Day>5</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed"><Year>2017</Year>
<Month>4</Month>
<Day>5</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2017</Year>
<Month>4</Month>
<Day>5</Day>
<Hour>6</Hour>
<Minute>1</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList><ArticleId IdType="pubmed">28373894</ArticleId>
<ArticleId IdType="doi">10.1186/s13015-017-0097-9</ArticleId>
<ArticleId IdType="pii">97</ArticleId>
<ArticleId IdType="pmc">PMC5374613</ArticleId>
</ArticleIdList>
<ReferenceList><Reference><Citation>J Comput Biol. 2004;11(4):734-52</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15579242</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2014 Jul 15;30(14):1950-7</Citation>
<ArticleIdList><ArticleId IdType="pubmed">24618471</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2016 Sep 15;32(18):2783-90</Citation>
<ArticleIdList><ArticleId IdType="pubmed">27283950</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>J Comput Biol. 2016 Apr;23 (4):248-55</Citation>
<ArticleIdList><ArticleId IdType="pubmed">26982880</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2011 Mar 15;27(6):764-70</Citation>
<ArticleIdList><ArticleId IdType="pubmed">21217122</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>BMC Bioinformatics. 2011 Aug 10;12:333</Citation>
<ArticleIdList><ArticleId IdType="pubmed">21831268</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Brief Bioinform. 2017 Jan;18(1):1-8</Citation>
<ArticleIdList><ArticleId IdType="pubmed">26868358</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2004 Dec 12;20(18):3363-9</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15256412</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2013 Mar 1;29(5):652-3</Citation>
<ArticleIdList><ArticleId IdType="pubmed">23325618</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>BMC Res Notes. 2014 Jul 30;7:484</Citation>
<ArticleIdList><ArticleId IdType="pubmed">25077983</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2015 May 15;31(10):1569-76</Citation>
<ArticleIdList><ArticleId IdType="pubmed">25609798</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Sci Data. 2014 Nov 25;1:140045</Citation>
<ArticleIdList><ArticleId IdType="pubmed">25977796</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2014 Jan 1;30(1):31-7</Citation>
<ArticleIdList><ArticleId IdType="pubmed">23732276</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>BMC Bioinformatics. 2013 May 16;14:160</Citation>
<ArticleIdList><ArticleId IdType="pubmed">23679007</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000D36 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 000D36 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= PubMed |étape= Corpus |type= RBID |clé= pubmed:28373894 |texte= Gerbil: a fast and memory-efficient k-mer counter with GPU-support. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i -Sk "pubmed:28373894" \
  | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd \
  | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.