MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Gerbil: a fast and memory-efficient k-mer counter with GPU-support.

Internal identifier: 000D36 (PubMed/Corpus); previous: 000D35; next: 000D37

Authors: Marius Erbert; Steffen Rechner; Matthias Müller-Hannemann

Source: Algorithms for molecular biology : AMB [1748-7188]; 2017.

RBID : pubmed:28373894

Abstract

A basic task in bioinformatics is the counting of k-mers in genome sequences. Existing k-mer counting tools are most often optimized for small k < 32 and suffer from excessive memory resource consumption or degrading performance for large k. However, given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important.
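
To make the task concrete, here is a minimal Python sketch of k-mer counting with a hash table. It is an illustration only, not Gerbil's implementation; the function name `count_kmers` and the toy sequence are assumptions introduced for this example.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of `sequence`."""
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    # Slide a window of width k over the sequence.
    # A sequence of length n contains n - k + 1 k-mers (none if n < k).
    for i in range(len(sequence) - k + 1):
        counts[sequence[i:i + k]] += 1
    return counts


if __name__ == "__main__":
    toy = "ACGTACGT"  # hypothetical toy input
    for kmer, n in sorted(count_kmers(toy, 3).items()):
        print(kmer, n)
```

Storing each k-mer as a Python string is convenient but memory-hungry. As a general observation (not a statement about any particular tool), with a 2-bit encoding per base a k-mer with k > 32 no longer fits in a single 64-bit word, which is one reason large k is harder to handle efficiently.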

DOI: 10.1186/s13015-017-0097-9
PubMed: 28373894

Links to Exploration step

pubmed:28373894

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Gerbil: a fast and memory-efficient <i>k</i>-mer counter with GPU-support.</title>
<author>
<name sortKey="Erbert, Marius" sort="Erbert, Marius" uniqKey="Erbert M" first="Marius" last="Erbert">Marius Erbert</name>
<affiliation>
<nlm:affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Rechner, Steffen" sort="Rechner, Steffen" uniqKey="Rechner S" first="Steffen" last="Rechner">Steffen Rechner</name>
<affiliation>
<nlm:affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Muller Hannemann, Matthias" sort="Muller Hannemann, Matthias" uniqKey="Muller Hannemann M" first="Matthias" last="Müller-Hannemann">Matthias Müller-Hannemann</name>
<affiliation>
<nlm:affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2017">2017</date>
<idno type="RBID">pubmed:28373894</idno>
<idno type="pmid">28373894</idno>
<idno type="doi">10.1186/s13015-017-0097-9</idno>
<idno type="wicri:Area/PubMed/Corpus">000D36</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000D36</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Gerbil: a fast and memory-efficient <i>k</i>-mer counter with GPU-support.</title>
<author>
<name sortKey="Erbert, Marius" sort="Erbert, Marius" uniqKey="Erbert M" first="Marius" last="Erbert">Marius Erbert</name>
<affiliation>
<nlm:affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Rechner, Steffen" sort="Rechner, Steffen" uniqKey="Rechner S" first="Steffen" last="Rechner">Steffen Rechner</name>
<affiliation>
<nlm:affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Muller Hannemann, Matthias" sort="Muller Hannemann, Matthias" uniqKey="Muller Hannemann M" first="Matthias" last="Müller-Hannemann">Matthias Müller-Hannemann</name>
<affiliation>
<nlm:affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</nlm:affiliation>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Algorithms for molecular biology : AMB</title>
<idno type="ISSN">1748-7188</idno>
<imprint>
<date when="2017" type="published">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">A basic task in bioinformatics is the counting of <i>k</i>-mers in genome sequences. Existing <i>k</i>-mer counting tools are most often optimized for small <i>k</i> &lt; 32 and suffer from excessive memory resource consumption or degrading performance for large <i>k</i>. However, given the technology trend towards long reads of next-generation sequencers, support for large <i>k</i> becomes increasingly important.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
<PMID Version="1">28373894</PMID>
<DateRevised>
<Year>2019</Year>
<Month>11</Month>
<Day>20</Day>
</DateRevised>
<Article PubModel="Electronic-eCollection">
<Journal>
<ISSN IssnType="Print">1748-7188</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>12</Volume>
<PubDate>
<Year>2017</Year>
</PubDate>
</JournalIssue>
<Title>Algorithms for molecular biology : AMB</Title>
<ISOAbbreviation>Algorithms Mol Biol</ISOAbbreviation>
</Journal>
<ArticleTitle>Gerbil: a fast and memory-efficient <i>k</i>-mer counter with GPU-support.</ArticleTitle>
<Pagination>
<MedlinePgn>9</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1186/s13015-017-0097-9</ELocationID>
<Abstract>
<AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">A basic task in bioinformatics is the counting of <i>k</i>-mers in genome sequences. Existing <i>k</i>-mer counting tools are most often optimized for small <i>k</i> &lt; 32 and suffer from excessive memory resource consumption or degrading performance for large <i>k</i>. However, given the technology trend towards long reads of next-generation sequencers, support for large <i>k</i> becomes increasingly important.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">We present the open source <i>k</i>-mer counting software <i>Gerbil</i> that has been designed for the efficient counting of <i>k</i>-mers for <i>k</i> ≥ 32. Our software is the result of an intensive process of algorithm engineering. It implements a two-step approach. In the first step, genome reads are loaded from disk and redistributed to temporary files. In a second step, the <i>k</i>-mers of each temporary file are counted via a hash table approach. In addition to its basic functionality, <i>Gerbil</i> can optionally use GPUs to accelerate the counting step. In a set of experiments with real-world genome data sets, we show that <i>Gerbil</i> is able to efficiently support both small and large <i>k</i>.</AbstractText>
<AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">While <i>Gerbil</i>'s performance is comparable to existing state-of-the-art open source <i>k</i>-mer counting tools for small <i>k</i> &lt; 32, it vastly outperforms its competitors for large <i>k</i>, thereby enabling new applications which require large values of <i>k</i>.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Erbert</LastName>
<ForeName>Marius</ForeName>
<Initials>M</Initials>
<AffiliationInfo>
<Affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</Affiliation>
<Identifier Source="GRID">grid.9018.0</Identifier>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Rechner</LastName>
<ForeName>Steffen</ForeName>
<Initials>S</Initials>
<AffiliationInfo>
<Affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</Affiliation>
<Identifier Source="GRID">grid.9018.0</Identifier>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Müller-Hannemann</LastName>
<ForeName>Matthias</ForeName>
<Initials>M</Initials>
<AffiliationInfo>
<Affiliation>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany.</Affiliation>
<Identifier Source="GRID">grid.9018.0</Identifier>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2017</Year>
<Month>03</Month>
<Day>31</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Algorithms Mol Biol</MedlineTA>
<NlmUniqueID>101265088</NlmUniqueID>
<ISSNLinking>1748-7188</ISSNLinking>
</MedlineJournalInfo>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">Algorithm engineering</Keyword>
<Keyword MajorTopicYN="N">GPU computing</Keyword>
<Keyword MajorTopicYN="N">Genome sequences</Keyword>
<Keyword MajorTopicYN="N">de novo assembly</Keyword>
<Keyword MajorTopicYN="N">k-mer counting</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2016</Year>
<Month>12</Month>
<Day>23</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2017</Year>
<Month>02</Month>
<Day>23</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2017</Year>
<Month>4</Month>
<Day>5</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2017</Year>
<Month>4</Month>
<Day>5</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2017</Year>
<Month>4</Month>
<Day>5</Day>
<Hour>6</Hour>
<Minute>1</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">28373894</ArticleId>
<ArticleId IdType="doi">10.1186/s13015-017-0097-9</ArticleId>
<ArticleId IdType="pii">97</ArticleId>
<ArticleId IdType="pmc">PMC5374613</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>J Comput Biol. 2004;11(4):734-52</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15579242</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2014 Jul 15;30(14):1950-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24618471</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2016 Sep 15;32(18):2783-90</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27283950</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Comput Biol. 2016 Apr;23 (4):248-55</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26982880</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2011 Mar 15;27(6):764-70</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21217122</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2011 Aug 10;12:333</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21831268</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Brief Bioinform. 2017 Jan;18(1):1-8</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26868358</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2004 Dec 12;20(18):3363-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15256412</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2013 Mar 1;29(5):652-3</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23325618</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Res Notes. 2014 Jul 30;7:484</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25077983</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2015 May 15;31(10):1569-76</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25609798</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Sci Data. 2014 Nov 25;1:140045</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25977796</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2014 Jan 1;30(1):31-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23732276</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2013 May 16;14:160</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23679007</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
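
The RESULTS paragraph of the abstract in the record above describes a two-step approach: reads are first redistributed to temporary files, then the k-mers of each temporary file are counted with a hash table. The Python sketch below illustrates that general idea under simplifying assumptions. It is not Gerbil's code: the abstract does not say how data is assigned to files, so the partitioning rule used here (a CRC32 hash of each k-mer), the function names, and the default number of temporary files are hypothetical choices made for this example. As a further simplification, the sketch writes individual k-mers to the temporary files rather than reads.

```python
import os
import tempfile
import zlib
from collections import Counter


def distribute_kmers(reads, k, num_files, tmp_dir):
    """Step 1: write each k-mer to one of `num_files` temporary files.

    Assumption: the file is chosen by a CRC32 hash of the k-mer, so all
    copies of the same k-mer land in the same file.
    """
    paths = [os.path.join(tmp_dir, f"part_{i}.txt") for i in range(num_files)]
    handles = [open(p, "w") for p in paths]
    try:
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                bucket = zlib.crc32(kmer.encode("ascii")) % num_files
                handles[bucket].write(kmer + "\n")
    finally:
        for h in handles:
            h.close()
    return paths


def count_partition(path):
    """Step 2: count the k-mers of one temporary file with a hash table."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            counts[line.rstrip("\n")] += 1
    return counts


def count_kmers_two_step(reads, k, num_files=4):
    """Run both steps and merge the per-file counts."""
    total = Counter()
    with tempfile.TemporaryDirectory() as tmp_dir:
        for path in distribute_kmers(reads, k, num_files, tmp_dir):
            # Each k-mer occurs in exactly one partition, so the merge
            # simply collects disjoint sets of keys.
            total.update(count_partition(path))
    return total


if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTACG"]  # hypothetical toy reads
    for kmer, n in sorted(count_kmers_two_step(reads, 3).items()):
        print(kmer, n)
```

In this sketch the partitions hold disjoint sets of k-mers, so only one partition's hash table has to be in memory at a time. The final merge into a single `Counter` is there only to keep the example small; a tool working at scale would more plausibly write each partition's counts out separately.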

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000D36 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 000D36 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:28373894
   |texte=   Gerbil: a fast and memory-efficient k-mer counter with GPU-support.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:28373894" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021