KAnalyze: a fast versatile pipelined k-mer toolkit.
Internal identifier: 001A25 (PubMed/Corpus); previous: 001A24; next: 001A26
Authors: Peter Audano; Fredrik Vannberg
Source:
- Bioinformatics (Oxford, England) [ 1367-4811 ] ; 2014.
English descriptors
- KwdEn :
- MESH :
- chemistry : Chromosomes, Human, Pair 1.
- methods : Sequence Analysis, DNA.
- Algorithms, Humans, Software.
Abstract
Converting nucleotide sequences into short overlapping fragments of uniform length, k-mers, is a common step in many bioinformatics applications. While existing software packages count k-mers, few are optimized for speed, offer an application programming interface (API), a graphical interface or contain features that make it extensible and maintainable. We designed KAnalyze to compete with the fastest k-mer counters, to produce reliable output and to support future development efforts through well-architected, documented and testable code. Currently, KAnalyze can output k-mer counts in a sorted tab-delimited file or stream k-mers as they are read. KAnalyze can process large datasets with 2 GB of memory. This project is implemented in Java 7, and the command line interface (CLI) is designed to integrate into pipelines written in any language.
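The conversion described in the abstract, splitting a sequence into overlapping fragments of length k and reporting counts as sorted tab-delimited lines, can be illustrated with a small sketch. This is not KAnalyze's code or API (KAnalyze is a Java 7 tool); the function names `count_kmers` and `format_counts` and the example sequence are assumptions made only for illustration.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer in a nucleotide sequence (illustrative helper)."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # A sequence of length n yields n - k + 1 overlapping windows of length k;
    # if the sequence is shorter than k, the range is empty and no k-mers result.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def format_counts(counts):
    """Render counts as sorted tab-delimited lines: k-mer, tab, count."""
    return "\n".join(f"{kmer}\t{n}" for kmer, n in sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical 8-base input; k = 3 gives six overlapping windows.
    print(format_counts(count_kmers("ACGTACGT", 3)))
```

Holding all counts in memory, as this sketch does, is only workable for small inputs; it does not reflect how a production counter keeps memory use bounded on large datasets.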
DOI: 10.1093/bioinformatics/btu152
PubMed: 24642064
Links to Exploration step
pubmed:24642064
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">KAnalyze: a fast versatile pipelined k-mer toolkit.</title>
<author><name sortKey="Audano, Peter" sort="Audano, Peter" uniqKey="Audano P" first="Peter" last="Audano">Peter Audano</name>
<affiliation><nlm:affiliation>School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Vannberg, Fredrik" sort="Vannberg, Fredrik" uniqKey="Vannberg F" first="Fredrik" last="Vannberg">Fredrik Vannberg</name>
<affiliation><nlm:affiliation>School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2014">2014</date>
<idno type="RBID">pubmed:24642064</idno>
<idno type="pmid">24642064</idno>
<idno type="doi">10.1093/bioinformatics/btu152</idno>
<idno type="wicri:Area/PubMed/Corpus">001A25</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001A25</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">KAnalyze: a fast versatile pipelined k-mer toolkit.</title>
<author><name sortKey="Audano, Peter" sort="Audano, Peter" uniqKey="Audano P" first="Peter" last="Audano">Peter Audano</name>
<affiliation><nlm:affiliation>School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Vannberg, Fredrik" sort="Vannberg, Fredrik" uniqKey="Vannberg F" first="Fredrik" last="Vannberg">Fredrik Vannberg</name>
<affiliation><nlm:affiliation>School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.</nlm:affiliation>
</affiliation>
</author>
</analytic>
<series><title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint><date when="2014" type="published">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Chromosomes, Human, Pair 1 (chemistry)</term>
<term>Humans</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" qualifier="chemistry" xml:lang="en"><term>Chromosomes, Human, Pair 1</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Humans</term>
<term>Software</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Converting nucleotide sequences into short overlapping fragments of uniform length, k-mers, is a common step in many bioinformatics applications. While existing software packages count k-mers, few are optimized for speed, offer an application programming interface (API), a graphical interface or contain features that make it extensible and maintainable. We designed KAnalyze to compete with the fastest k-mer counters, to produce reliable output and to support future development efforts through well-architected, documented and testable code. Currently, KAnalyze can output k-mer counts in a sorted tab-delimited file or stream k-mers as they are read. KAnalyze can process large datasets with 2 GB of memory. This project is implemented in Java 7, and the command line interface (CLI) is designed to integrate into pipelines written in any language.</div>
</front>
</TEI>
<pubmed><MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM"><PMID Version="1">24642064</PMID>
<DateCompleted><Year>2014</Year>
<Month>09</Month>
<Day>18</Day>
</DateCompleted>
<DateRevised><Year>2018</Year>
<Month>12</Month>
<Day>02</Day>
</DateRevised>
<Article PubModel="Print-Electronic"><Journal><ISSN IssnType="Electronic">1367-4811</ISSN>
<JournalIssue CitedMedium="Internet"><Volume>30</Volume>
<Issue>14</Issue>
<PubDate><Year>2014</Year>
<Month>Jul</Month>
<Day>15</Day>
</PubDate>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
<ISOAbbreviation>Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>KAnalyze: a fast versatile pipelined k-mer toolkit.</ArticleTitle>
<Pagination><MedlinePgn>2070-2</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/btu152</ELocationID>
<Abstract><AbstractText Label="MOTIVATION" NlmCategory="BACKGROUND">Converting nucleotide sequences into short overlapping fragments of uniform length, k-mers, is a common step in many bioinformatics applications. While existing software packages count k-mers, few are optimized for speed, offer an application programming interface (API), a graphical interface or contain features that make it extensible and maintainable. We designed KAnalyze to compete with the fastest k-mer counters, to produce reliable output and to support future development efforts through well-architected, documented and testable code. Currently, KAnalyze can output k-mer counts in a sorted tab-delimited file or stream k-mers as they are read. KAnalyze can process large datasets with 2 GB of memory. This project is implemented in Java 7, and the command line interface (CLI) is designed to integrate into pipelines written in any language.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">As a k-mer counter, KAnalyze outperforms Jellyfish, DSK and a pipeline built on Perl and Linux utilities. Through extensive unit and system testing, we have verified that KAnalyze produces the correct k-mer counts over multiple datasets and k-mer sizes.</AbstractText>
<AbstractText Label="AVAILABILITY AND IMPLEMENTATION" NlmCategory="METHODS">KAnalyze is available on SourceForge: https://sourceforge.net/projects/kanalyze/.</AbstractText>
<CopyrightInformation>© The Author 2014. Published by Oxford University Press.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Audano</LastName>
<ForeName>Peter</ForeName>
<Initials>P</Initials>
<AffiliationInfo><Affiliation>School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Vannberg</LastName>
<ForeName>Fredrik</ForeName>
<Initials>F</Initials>
<AffiliationInfo><Affiliation>School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y"><Grant><GrantID>T32 GM105490</GrantID>
<Acronym>GM</Acronym>
<Agency>NIGMS NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList><PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic"><Year>2014</Year>
<Month>03</Month>
<Day>18</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo><Country>England</Country>
<MedlineTA>Bioinformatics</MedlineTA>
<NlmUniqueID>9808944</NlmUniqueID>
<ISSNLinking>1367-4803</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList><MeshHeading><DescriptorName UI="D000465" MajorTopicYN="N">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D002878" MajorTopicYN="N">Chromosomes, Human, Pair 1</DescriptorName>
<QualifierName UI="Q000737" MajorTopicYN="N">chemistry</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D012984" MajorTopicYN="Y">Software</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="entrez"><Year>2014</Year>
<Month>3</Month>
<Day>20</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed"><Year>2014</Year>
<Month>3</Month>
<Day>20</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2014</Year>
<Month>9</Month>
<Day>19</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList><ArticleId IdType="pubmed">24642064</ArticleId>
<ArticleId IdType="pii">btu152</ArticleId>
<ArticleId IdType="doi">10.1093/bioinformatics/btu152</ArticleId>
<ArticleId IdType="pmc">PMC4080738</ArticleId>
</ArticleIdList>
<ReferenceList><Reference><Citation>Nat Biotechnol. 2013 Apr;31(4):325-30</Citation>
<ArticleIdList><ArticleId IdType="pubmed">23475072</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>PLoS Biol. 2014 Jan;12(1):e1001745</Citation>
<ArticleIdList><ArticleId IdType="pubmed">24415924</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2013 Mar 1;29(5):652-3</Citation>
<ArticleIdList><ArticleId IdType="pubmed">23325618</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2011 Mar 15;27(6):764-70</Citation>
<ArticleIdList><ArticleId IdType="pubmed">21217122</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Nucleic Acids Res. 2009 Jan;37(Database issue):D77-82</Citation>
<ArticleIdList><ArticleId IdType="pubmed">18842628</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001A25 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001A25 | SxmlIndent | more
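Outside Dilib, the `<pubmed>` part of the record can also be read with a general-purpose XML parser. The following is a minimal sketch, assuming that the `<pubmed>` element has been extracted on its own: as shown on this page, the TEI part uses `nlm:` and `wicri:` prefixes without visible namespace declarations, which a strict parser would be expected to reject. The `FRAGMENT` string is an abridged copy of the record above, and `summarize` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the <pubmed> element from the record above (assumption:
# it has been saved separately from the TEI part).
FRAGMENT = """<pubmed><MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">24642064</PMID>
<Article PubModel="Print-Electronic">
<ArticleTitle>KAnalyze: a fast versatile pipelined k-mer toolkit.</ArticleTitle>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/btu152</ELocationID>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Audano</LastName><ForeName>Peter</ForeName></Author>
<Author ValidYN="Y"><LastName>Vannberg</LastName><ForeName>Fredrik</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation></pubmed>"""


def summarize(xml_text):
    """Return the PMID, title, DOI and author names from a <pubmed> element."""
    root = ET.fromstring(xml_text)
    citation = root.find("MedlineCitation")
    article = citation.find("Article")
    authors = [
        f"{a.findtext('ForeName')} {a.findtext('LastName')}"
        for a in article.iter("Author")
    ]
    return {
        "pmid": citation.findtext("PMID"),
        "title": article.findtext("ArticleTitle"),
        # Select the ELocationID whose EIdType attribute is "doi".
        "doi": article.findtext("ELocationID[@EIdType='doi']"),
        "authors": authors,
    }


if __name__ == "__main__":
    for key, value in summarize(FRAGMENT).items():
        print(f"{key}: {value}")
```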
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= PubMed |étape= Corpus |type= RBID |clé= pubmed:24642064 |texte= KAnalyze: a fast versatile pipelined k-mer toolkit. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i -Sk "pubmed:24642064" \
  | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd \
  | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.