MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

KAnalyze: a fast versatile pipelined K-mer toolkit

Internal identifier: 001C08 (Main/Exploration); previous: 001C07; next: 001C09

Authors: Peter Audano; Fredrik Vannberg

RBID : PMC:4080738

French descriptors

English descriptors

Abstract

Motivation: Converting nucleotide sequences into short overlapping fragments of uniform length, k-mers, is a common step in many bioinformatics applications. While existing software packages count k-mers, few are optimized for speed, offer an application programming interface (API), a graphical interface or contain features that make it extensible and maintainable. We designed KAnalyze to compete with the fastest k-mer counters, to produce reliable output and to support future development efforts through well-architected, documented and testable code. Currently, KAnalyze can output k-mer counts in a sorted tab-delimited file or stream k-mers as they are read. KAnalyze can process large datasets with 2 GB of memory. This project is implemented in Java 7, and the command line interface (CLI) is designed to integrate into pipelines written in any language.
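
The k-mer decomposition described above can be illustrated with a minimal Python sketch. It is not KAnalyze's implementation (which the abstract states is written in Java 7) and it makes no attempt at the memory handling needed for large datasets. The function names `count_kmers` and `to_sorted_tsv` are hypothetical, and upper-casing the input is an assumption made here for simplicity.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in `sequence`."""
    seq = sequence.upper()  # assumption: treat lower- and upper-case bases alike
    # A sequence of length n yields n - k + 1 overlapping k-mers (none if n < k).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def to_sorted_tsv(counts):
    """Render counts as tab-delimited lines, sorted by k-mer."""
    return "\n".join(f"{kmer}\t{n}" for kmer, n in sorted(counts.items()))


counts = count_kmers("ACGTACGT", 3)
print(to_sorted_tsv(counts))
```

For the 8-base input and k = 3, the sequence yields six overlapping 3-mers, two of which occur twice. Holding all counts in one in-memory `Counter` is only reasonable for small inputs.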

Results: As a k-mer counter, KAnalyze outperforms Jellyfish, DSK and a pipeline built on Perl and Linux utilities. Through extensive unit and system testing, we have verified that KAnalyze produces the correct k-mer counts over multiple datasets and k-mer sizes.

Availability and implementation: KAnalyze is available on SourceForge:

https://sourceforge.net/projects/kanalyze/

Contact: fredrik.vannberg@biology.gatech.edu

Supplementary information: Supplementary data are available at Bioinformatics online.


Url:
DOI: 10.1093/bioinformatics/btu152
PubMed: 24642064
PubMed Central: 4080738


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">KAnalyze: a fast versatile pipelined K-mer toolkit</title>
<author>
<name sortKey="Audano, Peter" sort="Audano, Peter" uniqKey="Audano P" first="Peter" last="Audano">Peter Audano</name>
</author>
<author>
<name sortKey="Vannberg, Fredrik" sort="Vannberg, Fredrik" uniqKey="Vannberg F" first="Fredrik" last="Vannberg">Fredrik Vannberg</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">24642064</idno>
<idno type="pmc">4080738</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4080738</idno>
<idno type="RBID">PMC:4080738</idno>
<idno type="doi">10.1093/bioinformatics/btu152</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">000B11</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000B11</idno>
<idno type="wicri:Area/Pmc/Curation">000B11</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000B11</idno>
<idno type="wicri:Area/Pmc/Checkpoint">001052</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">001052</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:24642064</idno>
<idno type="wicri:Area/PubMed/Corpus">001A25</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001A25</idno>
<idno type="wicri:Area/PubMed/Curation">001A25</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001A25</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001861</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">001861</idno>
<idno type="wicri:Area/Ncbi/Merge">000D18</idno>
<idno type="wicri:Area/Ncbi/Curation">000D18</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000D18</idno>
<idno type="wicri:doubleKey">1367-4803:2014:Audano P:kanalyze:a:fast</idno>
<idno type="wicri:Area/Main/Merge">001C21</idno>
<idno type="wicri:Area/Main/Curation">001C08</idno>
<idno type="wicri:Area/Main/Exploration">001C08</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">KAnalyze: a fast versatile pipelined K-mer toolkit</title>
<author>
<name sortKey="Audano, Peter" sort="Audano, Peter" uniqKey="Audano P" first="Peter" last="Audano">Peter Audano</name>
</author>
<author>
<name sortKey="Vannberg, Fredrik" sort="Vannberg, Fredrik" uniqKey="Vannberg F" first="Fredrik" last="Vannberg">Fredrik Vannberg</name>
</author>
</analytic>
<series>
<title level="j">Bioinformatics</title>
<idno type="ISSN">1367-4803</idno>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Chromosomes, Human, Pair 1 (chemistry)</term>
<term>Humans</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Chromosomes humains de la paire 1</term>
<term>Humains</term>
<term>Logiciel</term>
</keywords>
<keywords scheme="MESH" qualifier="chemistry" xml:lang="en">
<term>Chromosomes, Human, Pair 1</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Humans</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Chromosomes humains de la paire 1</term>
<term>Humains</term>
<term>Logiciel</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>
<bold>Motivation</bold>
: Converting nucleotide sequences into short overlapping fragments of uniform length, k-mers, is a common step in many bioinformatics applications. While existing software packages count k-mers, few are optimized for speed, offer an application programming interface (API), a graphical interface or contain features that make it extensible and maintainable. We designed KAnalyze to compete with the fastest k-mer counters, to produce reliable output and to support future development efforts through well-architected, documented and testable code. Currently, KAnalyze can output k-mer counts in a sorted tab-delimited file or stream k-mers as they are read. KAnalyze can process large datasets with 2 GB of memory. This project is implemented in Java 7, and the command line interface (CLI) is designed to integrate into pipelines written in any language.</p>
<p>
<bold>Results</bold>
: As a k-mer counter, KAnalyze outperforms Jellyfish, DSK and a pipeline built on Perl and Linux utilities. Through extensive unit and system testing, we have verified that KAnalyze produces the correct k-mer counts over multiple datasets and k-mer sizes.</p>
<p>
<bold>Availability and implementation</bold>
: KAnalyze is available on SourceForge:</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://sourceforge.net/projects/kanalyze/">https://sourceforge.net/projects/kanalyze/</ext-link>
</p>
<p>
<bold>Contact:</bold>
<email>fredrik.vannberg@biology.gatech.edu</email>
</p>
<p>
<bold>Supplementary information</bold>
:
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.oxfordjournals.org/lookup/suppl/doi:10.1093/bioinformatics/btu152/-/DC1">Supplementary data</ext-link>
are available at
<italic>Bioinformatics</italic>
online.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Knuth, D" uniqKey="Knuth D">D Knuth</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marcais, G" uniqKey="Marcais G">G Marçais</name>
</author>
<author>
<name sortKey="Kingsford, C" uniqKey="Kingsford C">C Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Newburger, De" uniqKey="Newburger D">DE Newburger</name>
</author>
<author>
<name sortKey="Bulyk, Ml" uniqKey="Bulyk M">ML Bulyk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nordstrom, Kjv" uniqKey="Nordstrom K">KJV Nordström</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rizk, G" uniqKey="Rizk G">G Rizk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilson, G" uniqKey="Wilson G">G Wilson</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<affiliations>
<list></list>
<tree>
<noCountry>
<name sortKey="Audano, Peter" sort="Audano, Peter" uniqKey="Audano P" first="Peter" last="Audano">Peter Audano</name>
<name sortKey="Vannberg, Fredrik" sort="Vannberg, Fredrik" uniqKey="Vannberg F" first="Fredrik" last="Vannberg">Fredrik Vannberg</name>
</noCountry>
</tree>
</affiliations>
</record>
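
The identifiers in the `publicationStmt` block above can also be read programmatically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` and a hand-copied fragment of the record. The full record as displayed uses `wicri:` and `xlink:` prefixes without declaring them, so a namespace-aware parser would reject it unless those declarations were added. The function name `read_identifiers` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the record above; only prefix-free <idno> elements are kept.
FRAGMENT = """
<publicationStmt>
<idno type="pmid">24642064</idno>
<idno type="pmc">4080738</idno>
<idno type="doi">10.1093/bioinformatics/btu152</idno>
</publicationStmt>
"""


def read_identifiers(xml_text):
    """Map each <idno> element's type attribute to its text content."""
    root = ET.fromstring(xml_text)
    return {idno.get("type"): idno.text for idno in root.iter("idno")}


ids = read_identifiers(FRAGMENT)
print(ids["doi"])
```

A dictionary keyed on `type` suffices for this fragment because each type appears once. The full record repeats some types (for example `RBID`), so a list of pairs would be safer there.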

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001C08 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001C08 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     PMC:4080738
   |texte=   KAnalyze: a fast versatile pipelined K-mer toolkit
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:24642064" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021