MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Gerbil: a fast and memory-efficient k-mer counter with GPU-support

Internal identifier: 000D76 (Main/Exploration); previous: 000D75; next: 000D77

Authors: Marius Erbert; Steffen Rechner; Matthias Müller-Hannemann

Source: Algorithms for Molecular Biology : AMB (2017)

RBID : PMC:5374613

Abstract

Background

A basic task in bioinformatics is the counting of k-mers in genome sequences. Existing k-mer counting tools are most often optimized for small k < 32 and suffer from excessive memory resource consumption or degrading performance for large k. However, given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important.

Results

We present the open source k-mer counting software Gerbil that has been designed for the efficient counting of k-mers for k ≥ 32. Our software is the result of an intensive process of algorithm engineering. It implements a two-step approach. In the first step, genome reads are loaded from disk and redistributed to temporary files. In a second step, the k-mers of each temporary file are counted via a hash table approach. In addition to its basic functionality, Gerbil can optionally use GPUs to accelerate the counting step. In a set of experiments with real-world genome data sets, we show that Gerbil is able to efficiently support both small and large k.
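
The two-step approach described above can be illustrated with a minimal Python sketch. It is not Gerbil's implementation: the abstract does not say how reads are distributed to temporary files, so the binning rule used here (a CRC32 hash of each k-mer, in the hypothetical helper `bin_of`) is an assumption made for illustration, as are all function names and the default of four bins. Reverse complements and GPU acceleration are ignored.

```python
import os
import tempfile
import zlib
from collections import Counter


def bin_of(kmer, num_bins):
    # Assumed binning rule: a deterministic hash of the k-mer itself.
    return zlib.crc32(kmer.encode("ascii")) % num_bins


def partition_kmers(reads, k, num_bins, workdir):
    """Step 1: write every k-mer of every read to one of num_bins temporary files."""
    paths = [os.path.join(workdir, f"bin_{i}.txt") for i in range(num_bins)]
    files = [open(p, "w") for p in paths]
    try:
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                files[bin_of(kmer, num_bins)].write(kmer + "\n")
    finally:
        for f in files:
            f.close()
    return paths


def count_bin(path):
    """Step 2: count the k-mers of one temporary file with a hash table."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            counts[line.rstrip("\n")] += 1
    return counts


def count_kmers(reads, k, num_bins=4):
    total = {}
    with tempfile.TemporaryDirectory() as workdir:
        for path in partition_kmers(reads, k, num_bins, workdir):
            # Equal k-mers always land in the same bin, so the per-bin
            # tables are disjoint and can simply be merged.
            total.update(count_bin(path))
    return total


counts = count_kmers(["ACGTACGT", "CGTA"], k=4)
```

Because each temporary file is counted on its own, only one hash table has to fit in memory at a time. That is the general motivation for disk-based partitioning in this sketch.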

Conclusions

While Gerbil’s performance is comparable to existing state-of-the-art open source k-mer counting tools for small k < 32, it vastly outperforms its competitors for large k, thereby enabling new applications which require large values of k.

Electronic supplementary material

The online version of this article (doi:10.1186/s13015-017-0097-9) contains supplementary material, which is available to authorized users.


Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374613
DOI: 10.1186/s13015-017-0097-9
PubMed: 28373894
PubMed Central: 5374613



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Gerbil: a fast and memory-efficient 
<italic>k</italic>
-mer counter with GPU-support</title>
<author>
<name sortKey="Erbert, Marius" sort="Erbert, Marius" uniqKey="Erbert M" first="Marius" last="Erbert">Marius Erbert</name>
<affiliation>
<nlm:aff id="Aff1"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Rechner, Steffen" sort="Rechner, Steffen" uniqKey="Rechner S" first="Steffen" last="Rechner">Steffen Rechner</name>
<affiliation>
<nlm:aff id="Aff1"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Muller Hannemann, Matthias" sort="Muller Hannemann, Matthias" uniqKey="Muller Hannemann M" first="Matthias" last="Müller-Hannemann">Matthias Müller-Hannemann</name>
<affiliation>
<nlm:aff id="Aff1"></nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">28373894</idno>
<idno type="pmc">5374613</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374613</idno>
<idno type="RBID">PMC:5374613</idno>
<idno type="doi">10.1186/s13015-017-0097-9</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000247</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000247</idno>
<idno type="wicri:Area/Pmc/Curation">000247</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000247</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000849</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000849</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:28373894</idno>
<idno type="wicri:Area/PubMed/Corpus">000D36</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000D36</idno>
<idno type="wicri:Area/PubMed/Curation">000D36</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000D36</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000C94</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000C94</idno>
<idno type="wicri:Area/Ncbi/Merge">001995</idno>
<idno type="wicri:Area/Ncbi/Curation">001995</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001995</idno>
<idno type="wicri:Area/Main/Merge">000D79</idno>
<idno type="wicri:Area/Main/Curation">000D76</idno>
<idno type="wicri:Area/Main/Exploration">000D76</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Gerbil: a fast and memory-efficient 
<italic>k</italic>
-mer counter with GPU-support</title>
<author>
<name sortKey="Erbert, Marius" sort="Erbert, Marius" uniqKey="Erbert M" first="Marius" last="Erbert">Marius Erbert</name>
<affiliation>
<nlm:aff id="Aff1"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Rechner, Steffen" sort="Rechner, Steffen" uniqKey="Rechner S" first="Steffen" last="Rechner">Steffen Rechner</name>
<affiliation>
<nlm:aff id="Aff1"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Muller Hannemann, Matthias" sort="Muller Hannemann, Matthias" uniqKey="Muller Hannemann M" first="Matthias" last="Müller-Hannemann">Matthias Müller-Hannemann</name>
<affiliation>
<nlm:aff id="Aff1"></nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Algorithms for Molecular Biology : AMB</title>
<idno type="eISSN">1748-7188</idno>
<imprint>
<date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>A basic task in bioinformatics is the counting of 
<italic>k</italic>
-mers in genome sequences. Existing 
<italic>k</italic>
-mer counting tools are most often optimized for small 
<italic>k</italic>
&lt; 32 and suffer from excessive memory resource consumption or degrading performance for large 
<italic>k</italic>
. However, given the technology trend towards long reads of next-generation sequencers, support for large 
<italic>k</italic>
becomes increasingly important.</p>
</sec>
<sec>
<title>Results</title>
<p>We present the open source 
<italic>k</italic>
-mer counting software
<italic>Gerbil</italic>
that has been designed for the efficient counting of 
<italic>k</italic>
-mers for 
<italic>k</italic>
≥ 32. Our software is the result of an intensive process of algorithm engineering. It implements a two-step approach. In the first step, genome reads are loaded from disk and redistributed to temporary files. In a second step, the 
<italic>k</italic>
-mers of each temporary file are counted via a hash table approach. In addition to its basic functionality,
<italic>Gerbil</italic>
can optionally use GPUs to accelerate the counting step. In a set of experiments with real-world genome data sets, we show that
<italic>Gerbil</italic>
is able to efficiently support both small and large 
<italic>k</italic>
.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>While
<italic>Gerbil</italic>
’s performance is comparable to existing state-of-the-art open source 
<italic>k</italic>
-mer counting tools for small 
<italic>k</italic>
&lt; 32, it vastly outperforms its competitors for large 
<italic>k</italic>
, thereby enabling new applications which require large values of 
<italic>k</italic>
.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s13015-017-0097-9) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Xavier, Bb" uniqKey="Xavier B">BB Xavier</name>
</author>
<author>
<name sortKey="Sabirova, J" uniqKey="Sabirova J">J Sabirova</name>
</author>
<author>
<name sortKey="Pieter, M" uniqKey="Pieter M">M Pieter</name>
</author>
<author>
<name sortKey="Hernalsteens, J P" uniqKey="Hernalsteens J">J-P Hernalsteens</name>
</author>
<author>
<name sortKey="De Greve, H" uniqKey="De Greve H">H de Greve</name>
</author>
<author>
<name sortKey="Goossens, H" uniqKey="Goossens H">H Goossens</name>
</author>
<author>
<name sortKey="Malhotra Kumar, S" uniqKey="Malhotra Kumar S">S Malhotra-Kumar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chikhi, R" uniqKey="Chikhi R">R Chikhi</name>
</author>
<author>
<name sortKey="Medvedev, P" uniqKey="Medvedev P">P Medvedev</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sameith, K" uniqKey="Sameith K">K Sameith</name>
</author>
<author>
<name sortKey="Roscito, Jg" uniqKey="Roscito J">JG Roscito</name>
</author>
<author>
<name sortKey="Hiller, M" uniqKey="Hiller M">M Hiller</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marcais, G" uniqKey="Marcais G">G Marçais</name>
</author>
<author>
<name sortKey="Kingsford, C" uniqKey="Kingsford C">C Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Melsted, P" uniqKey="Melsted P">P Melsted</name>
</author>
<author>
<name sortKey="Pritchard, Jk" uniqKey="Pritchard J">JK Pritchard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rizk, G" uniqKey="Rizk G">G Rizk</name>
</author>
<author>
<name sortKey="Lavenier, D" uniqKey="Lavenier D">D Lavenier</name>
</author>
<author>
<name sortKey="Chikhi, R" uniqKey="Chikhi R">R Chikhi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Deorowicz, S" uniqKey="Deorowicz S">S Deorowicz</name>
</author>
<author>
<name sortKey="Debudaj Grabysz, A" uniqKey="Debudaj Grabysz A">A Debudaj-Grabysz</name>
</author>
<author>
<name sortKey="Grabowski, S" uniqKey="Grabowski S">S Grabowski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roy, Rs" uniqKey="Roy R">RS Roy</name>
</author>
<author>
<name sortKey="Bhattacharya, D" uniqKey="Bhattacharya D">D Bhattacharya</name>
</author>
<author>
<name sortKey="Schliep, A" uniqKey="Schliep A">A Schliep</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Deorowicz, S" uniqKey="Deorowicz S">S Deorowicz</name>
</author>
<author>
<name sortKey="Kokot, M" uniqKey="Kokot M">M Kokot</name>
</author>
<author>
<name sortKey="Grabowski, S" uniqKey="Grabowski S">S Grabowski</name>
</author>
<author>
<name sortKey="Debudaj Grabysz, A" uniqKey="Debudaj Grabysz A">A Debudaj-Grabysz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Perez, N" uniqKey="Perez N">N Pérez</name>
</author>
<author>
<name sortKey="Gutierrez, M" uniqKey="Gutierrez M">M Gutierrez</name>
</author>
<author>
<name sortKey="Vera, N" uniqKey="Vera N">N Vera</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mamun, Aa" uniqKey="Mamun A">AA Mamun</name>
</author>
<author>
<name sortKey="Pal, S" uniqKey="Pal S">S Pal</name>
</author>
<author>
<name sortKey="Rajasekaran, S" uniqKey="Rajasekaran S">S Rajasekaran</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roberts, M" uniqKey="Roberts M">M Roberts</name>
</author>
<author>
<name sortKey="Hunt, Br" uniqKey="Hunt B">BR Hunt</name>
</author>
<author>
<name sortKey="Yorke, Ja" uniqKey="Yorke J">JA Yorke</name>
</author>
<author>
<name sortKey="Bolanos, Ra" uniqKey="Bolanos R">RA Bolanos</name>
</author>
<author>
<name sortKey="Delcher, Al" uniqKey="Delcher A">AL Delcher</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roberts, M" uniqKey="Roberts M">M Roberts</name>
</author>
<author>
<name sortKey="Hayes, W" uniqKey="Hayes W">W Hayes</name>
</author>
<author>
<name sortKey="Hunt, Br" uniqKey="Hunt B">BR Hunt</name>
</author>
<author>
<name sortKey="Mount, Sm" uniqKey="Mount S">SM Mount</name>
</author>
<author>
<name sortKey="Yorke, Ja" uniqKey="Yorke J">JA Yorke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, Ke" uniqKey="Kim K">KE Kim</name>
</author>
<author>
<name sortKey="Peluso, P" uniqKey="Peluso P">P Peluso</name>
</author>
<author>
<name sortKey="Babayan, P" uniqKey="Babayan P">P Babayan</name>
</author>
<author>
<name sortKey="Yeadon, Pj" uniqKey="Yeadon P">PJ Yeadon</name>
</author>
<author>
<name sortKey="Yu, C" uniqKey="Yu C">C Yu</name>
</author>
<author>
<name sortKey="Fisher, Ww" uniqKey="Fisher W">WW Fisher</name>
</author>
<author>
<name sortKey="Chin, Cs" uniqKey="Chin C">CS Chin</name>
</author>
<author>
<name sortKey="Rapicavoli, Na" uniqKey="Rapicavoli N">NA Rapicavoli</name>
</author>
<author>
<name sortKey="Rank, Dr" uniqKey="Rank D">DR Rank</name>
</author>
<author>
<name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<affiliations>
<list></list>
<tree>
<noCountry>
<name sortKey="Erbert, Marius" sort="Erbert, Marius" uniqKey="Erbert M" first="Marius" last="Erbert">Marius Erbert</name>
<name sortKey="Muller Hannemann, Matthias" sort="Muller Hannemann, Matthias" uniqKey="Muller Hannemann M" first="Matthias" last="Müller-Hannemann">Matthias Müller-Hannemann</name>
<name sortKey="Rechner, Steffen" sort="Rechner, Steffen" uniqKey="Rechner S" first="Steffen" last="Rechner">Steffen Rechner</name>
</noCountry>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000D76 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000D76 | SxmlIndent | more
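
Outside Dilib, the identifiers in such a record can also be read with a general-purpose XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a small fragment copied from the record above. The function name `extract_ids` is hypothetical. The full record uses the `nlm:` and `wicri:` prefixes without declaring them, so this sketch assumes those prefixes have been declared or stripped before parsing.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the <publicationStmt> of the record above,
# restricted to elements that carry no namespace prefix.
FRAGMENT = """
<publicationStmt>
<idno type="pmid">28373894</idno>
<idno type="pmc">5374613</idno>
<idno type="doi">10.1186/s13015-017-0097-9</idno>
<date when="2017">2017</date>
</publicationStmt>
"""


def extract_ids(xml_text, wanted=("pmid", "pmc", "doi")):
    """Return a dict mapping each wanted idno type to its text content."""
    root = ET.fromstring(xml_text.strip())
    ids = {}
    for idno in root.iter("idno"):
        kind = idno.get("type")
        if kind in wanted:
            ids[kind] = (idno.text or "").strip()
    return ids


ids = extract_ids(FRAGMENT)
```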

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     PMC:5374613
   |texte=   Gerbil: a fast and memory-efficient k-mer counter with GPU-support
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:28373894" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021