KmerStream: streaming algorithms for k-mer abundance estimation.
Internal identifier: 001805 (PubMed/Corpus); previous: 001804; next: 001806
Authors: Páll Melsted; Bjarni V. Halldórsson
Source:
- Bioinformatics (Oxford, England) [1367-4811]; 2014.
English descriptors
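- KwdEn: Algorithms; Genome Size; Genome, Human; Genomics (methods); High-Throughput Nucleotide Sequencing (methods); Humans; Sequence Analysis, DNA (methods); Software
- MESH: Algorithms; Genome Size; Genome, Human; Genomics (methods); High-Throughput Nucleotide Sequencing (methods); Humans; Sequence Analysis, DNA (methods); Software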
Abstract
Several applications in bioinformatics, such as genome assemblers and error correction methods, rely on counting and keeping track of k-mers (substrings of length k). Histograms of k-mer frequencies can give valuable insight into the underlying distribution and indicate the error rate and genome size sampled in the sequencing experiment.
DOI: 10.1093/bioinformatics/btu713
PubMed: 25355787
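The abstract above refers to counting k-mers and building a histogram of their frequencies. As a rough illustration only, the following Python sketch does this exactly and in memory for a few toy reads. It is not the KmerStream algorithm, which estimates these quantities in a streaming fashion; the function names `count_kmers` and `frequency_histogram` and the example reads are assumptions made for this sketch.

```python
from collections import Counter


def count_kmers(reads, k):
    """Exact count of every k-mer (substring of length k) across the reads."""
    counts = Counter()
    for read in reads:
        # A read shorter than k yields an empty range and contributes nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequency_histogram(counts):
    """Map each abundance f to the number of distinct k-mers seen exactly f times."""
    return Counter(counts.values())


# Toy reads, invented for illustration only.
reads = ["ACGTACGT", "CGTACGTA"]
counts = count_kmers(reads, k=4)
histogram = frequency_histogram(counts)
print(len(counts), dict(histogram))
```

Exact counting like this needs memory proportional to the number of distinct k-mers, which is the cost that streaming estimators aim to avoid on large sequencing datasets.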
Links to Exploration step
pubmed:25355787
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">KmerStream: streaming algorithms for k-mer abundance estimation.</title>
<author><name sortKey="Melsted, Pall" sort="Melsted, Pall" uniqKey="Melsted P" first="Páll" last="Melsted">Páll Melsted</name>
<affiliation><nlm:affiliation>Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland, deCODE Genetics/Amgen, Reykjavík, Iceland and School of Science and Engineering, Reykjavík University, Reykjavík, Iceland.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Halld Rsson, Bjarni V" sort="Halld Rsson, Bjarni V" uniqKey="Halld Rsson B" first="Bjarni V" last="Halld Rsson">Bjarni V. Halldórsson</name>
<affiliation><nlm:affiliation>Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland, deCODE Genetics/Amgen, Reykjavík, Iceland and School of Science and Engineering, Reykjavík University, Reykjavík, Iceland.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2014">2014</date>
<idno type="RBID">pubmed:25355787</idno>
<idno type="pmid">25355787</idno>
<idno type="doi">10.1093/bioinformatics/btu713</idno>
<idno type="wicri:Area/PubMed/Corpus">001805</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001805</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">KmerStream: streaming algorithms for k-mer abundance estimation.</title>
<author><name sortKey="Melsted, Pall" sort="Melsted, Pall" uniqKey="Melsted P" first="Páll" last="Melsted">Páll Melsted</name>
<affiliation><nlm:affiliation>Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland, deCODE Genetics/Amgen, Reykjavík, Iceland and School of Science and Engineering, Reykjavík University, Reykjavík, Iceland.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Halld Rsson, Bjarni V" sort="Halld Rsson, Bjarni V" uniqKey="Halld Rsson B" first="Bjarni V" last="Halld Rsson">Bjarni V. Halldórsson</name>
<affiliation><nlm:affiliation>Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland, deCODE Genetics/Amgen, Reykjavík, Iceland and School of Science and Engineering, Reykjavík University, Reykjavík, Iceland.</nlm:affiliation>
</affiliation>
</author>
</analytic>
<series><title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint><date when="2014" type="published">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Genome Size</term>
<term>Genome, Human</term>
<term>Genomics (methods)</term>
<term>High-Throughput Nucleotide Sequencing (methods)</term>
<term>Humans</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Genomics</term>
<term>High-Throughput Nucleotide Sequencing</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Genome Size</term>
<term>Genome, Human</term>
<term>Humans</term>
<term>Software</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Several applications in bioinformatics, such as genome assemblers and error correction methods, rely on counting and keeping track of k-mers (substrings of length k). Histograms of k-mer frequencies can give valuable insight into the underlying distribution and indicate the error rate and genome size sampled in the sequencing experiment.</div>
</front>
</TEI>
<pubmed><MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM"><PMID Version="1">25355787</PMID>
<DateCompleted><Year>2015</Year>
<Month>03</Month>
<Day>05</Day>
</DateCompleted>
<DateRevised><Year>2018</Year>
<Month>12</Month>
<Day>02</Day>
</DateRevised>
<Article PubModel="Print-Electronic"><Journal><ISSN IssnType="Electronic">1367-4811</ISSN>
<JournalIssue CitedMedium="Internet"><Volume>30</Volume>
<Issue>24</Issue>
<PubDate><Year>2014</Year>
<Month>Dec</Month>
<Day>15</Day>
</PubDate>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
<ISOAbbreviation>Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>KmerStream: streaming algorithms for k-mer abundance estimation.</ArticleTitle>
<Pagination><MedlinePgn>3541-7</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/btu713</ELocationID>
<Abstract><AbstractText Label="MOTIVATION" NlmCategory="BACKGROUND">Several applications in bioinformatics, such as genome assemblers and error correction methods, rely on counting and keeping track of k-mers (substrings of length k). Histograms of k-mer frequencies can give valuable insight into the underlying distribution and indicate the error rate and genome size sampled in the sequencing experiment.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">We present KmerStream, a streaming algorithm for estimating the number of distinct k-mers present in high-throughput sequencing data. The algorithm runs in time linear in the size of the input, and its space requirements are logarithmic in the size of the input. We derive a simple model that allows us to estimate the error rate of the sequencing experiment, as well as the genome size, using only the aggregate statistics reported by KmerStream. As an application, we show how KmerStream can be used to compute the error rate of a DNA sequencing experiment. We run KmerStream on a set of 2656 whole genome sequenced individuals and compare the error rate to quality values reported by the sequencing equipment. We discover that while the quality values alone are largely reliable as a predictor of error rate, there is considerable variability in the error rates between sequencing runs, even when accounting for reported quality values.</AbstractText>
<CopyrightInformation>© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Melsted</LastName>
<ForeName>Páll</ForeName>
<Initials>P</Initials>
<AffiliationInfo><Affiliation>Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland, deCODE Genetics/Amgen, Reykjavík, Iceland and School of Science and Engineering, Reykjavík University, Reykjavík, Iceland.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Halldórsson</LastName>
<ForeName>Bjarni V</ForeName>
<Initials>BV</Initials>
<AffiliationInfo><Affiliation>Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland, deCODE Genetics/Amgen, Reykjavík, Iceland and School of Science and Engineering, Reykjavík University, Reykjavík, Iceland.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList><PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic"><Year>2014</Year>
<Month>10</Month>
<Day>28</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo><Country>England</Country>
<MedlineTA>Bioinformatics</MedlineTA>
<NlmUniqueID>9808944</NlmUniqueID>
<ISSNLinking>1367-4803</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList><MeshHeading><DescriptorName UI="D000465" MajorTopicYN="Y">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D059646" MajorTopicYN="N">Genome Size</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D015894" MajorTopicYN="N">Genome, Human</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D023281" MajorTopicYN="N">Genomics</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="N">methods</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D059014" MajorTopicYN="N">High-Throughput Nucleotide Sequencing</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D012984" MajorTopicYN="N">Software</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="entrez"><Year>2014</Year>
<Month>10</Month>
<Day>31</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed"><Year>2014</Year>
<Month>10</Month>
<Day>31</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2015</Year>
<Month>3</Month>
<Day>7</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList><ArticleId IdType="pubmed">25355787</ArticleId>
<ArticleId IdType="pii">btu713</ArticleId>
<ArticleId IdType="doi">10.1093/bioinformatics/btu713</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001805 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001805 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= PubMed |étape= Corpus |type= RBID |clé= pubmed:25355787 |texte= KmerStream: streaming algorithms for k-mer abundance estimation. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i -Sk "pubmed:25355787" \
| HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd \
| NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.