MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

KmerStream: streaming algorithms for k-mer abundance estimation.

Internal identifier: 001859 (PubMed/Checkpoint); previous: 001858; next: 001860

Authors: Páll Melsted [Iceland]; Bjarni V. Halldórsson [Iceland]

Source: Bioinformatics (Oxford, England), 2014; 30(24): 3541-7

RBID : pubmed:25355787

French descriptors

English descriptors

Abstract

Several applications in bioinformatics, such as genome assemblers and error correction methods, rely on counting and keeping track of k-mers (substrings of length k). Histograms of k-mer frequencies can give valuable insight into the underlying distribution and indicate the error rate and genome size sampled in the sequencing experiment.
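
To make the two notions in this abstract concrete, here is a minimal sketch of exact k-mer counting and of the resulting frequency histogram. It is only an illustration: it stores every k-mer in memory, so it is not the streaming estimator that KmerStream implements, and the function names and toy reads are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequency_histogram(counts):
    """Map each abundance f to the number of distinct k-mers seen exactly f times."""
    return Counter(counts.values())


# Toy reads (hypothetical); the third read differs from the first by one base,
# which introduces k-mers that occur only once.
reads = ["ACGTAC", "CGTACG", "ACGTTC"]
counts = count_kmers(reads, k=3)
histogram = frequency_histogram(counts)

print(len(counts))               # number of distinct 3-mers
print(sorted(histogram.items())) # (abundance, number of distinct k-mers) pairs
```

In this toy input the k-mers seen only once come from the read with the substituted base, which is the intuition behind reading an error rate off the low-abundance end of the histogram.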

DOI: 10.1093/bioinformatics/btu713
PubMed: 25355787


Affiliations:


Links toward previous steps (curation, corpus...)


Links to Exploration step

pubmed:25355787

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">KmerStream: streaming algorithms for k-mer abundance estimation.</title>
<author>
<name sortKey="Melsted, Pall" sort="Melsted, Pall" uniqKey="Melsted P" first="Páll" last="Melsted">Páll Melsted</name>
<affiliation wicri:level="1">
<nlm:affiliation>Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland, deCODE Genetics/Amgen, Reykjavík, Iceland and School of Science and Engineering, Reykjavík University, Reykjavík, Iceland.</nlm:affiliation>
<country xml:lang="fr">Islande</country>
<wicri:regionArea>Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland, deCODE Genetics/Amgen, Reykjavík, Iceland and School of Science and Engineering, Reykjavík University, Reykjavík</wicri:regionArea>
<wicri:noRegion>Reykjavík</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Halld Rsson, Bjarni V" sort="Halld Rsson, Bjarni V" uniqKey="Halld Rsson B" first="Bjarni V" last="Halld Rsson">Bjarni V. Halldórsson</name>
<affiliation wicri:level="1">
<nlm:affiliation>Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland, deCODE Genetics/Amgen, Reykjavík, Iceland and School of Science and Engineering, Reykjavík University, Reykjavík, Iceland.</nlm:affiliation>
<country xml:lang="fr">Islande</country>
<wicri:regionArea>Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland, deCODE Genetics/Amgen, Reykjavík, Iceland and School of Science and Engineering, Reykjavík University, Reykjavík</wicri:regionArea>
<wicri:noRegion>Reykjavík</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2014">2014</date>
<idno type="RBID">pubmed:25355787</idno>
<idno type="pmid">25355787</idno>
<idno type="doi">10.1093/bioinformatics/btu713</idno>
<idno type="wicri:Area/PubMed/Corpus">001805</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001805</idno>
<idno type="wicri:Area/PubMed/Curation">001805</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001805</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001859</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001859</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">KmerStream: streaming algorithms for k-mer abundance estimation.</title>
<author>
<name sortKey="Melsted, Pall" sort="Melsted, Pall" uniqKey="Melsted P" first="Páll" last="Melsted">Páll Melsted</name>
<affiliation wicri:level="1">
<nlm:affiliation>Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland, deCODE Genetics/Amgen, Reykjavík, Iceland and School of Science and Engineering, Reykjavík University, Reykjavík, Iceland.</nlm:affiliation>
<country xml:lang="fr">Islande</country>
<wicri:regionArea>Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland, deCODE Genetics/Amgen, Reykjavík, Iceland and School of Science and Engineering, Reykjavík University, Reykjavík</wicri:regionArea>
<wicri:noRegion>Reykjavík</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Halld Rsson, Bjarni V" sort="Halld Rsson, Bjarni V" uniqKey="Halld Rsson B" first="Bjarni V" last="Halld Rsson">Bjarni V. Halldórsson</name>
<affiliation wicri:level="1">
<nlm:affiliation>Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland, deCODE Genetics/Amgen, Reykjavík, Iceland and School of Science and Engineering, Reykjavík University, Reykjavík, Iceland.</nlm:affiliation>
<country xml:lang="fr">Islande</country>
<wicri:regionArea>Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland, deCODE Genetics/Amgen, Reykjavík, Iceland and School of Science and Engineering, Reykjavík University, Reykjavík</wicri:regionArea>
<wicri:noRegion>Reykjavík</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2014" type="published">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Genome Size</term>
<term>Genome, Human</term>
<term>Genomics (methods)</term>
<term>High-Throughput Nucleotide Sequencing (methods)</term>
<term>Humans</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN (méthodes)</term>
<term>Génome humain</term>
<term>Génomique (méthodes)</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Séquençage nucléotidique à haut débit (méthodes)</term>
<term>Taille du génome</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Genomics</term>
<term>High-Throughput Nucleotide Sequencing</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Genome Size</term>
<term>Genome, Human</term>
<term>Humans</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Génome humain</term>
<term>Génomique</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Séquençage nucléotidique à haut débit</term>
<term>Taille du génome</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Several applications in bioinformatics, such as genome assemblers and error correction methods, rely on counting and keeping track of k-mers (substrings of length k). Histograms of k-mer frequencies can give valuable insight into the underlying distribution and indicate the error rate and genome size sampled in the sequencing experiment.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM">
<PMID Version="1">25355787</PMID>
<DateCompleted>
<Year>2015</Year>
<Month>03</Month>
<Day>05</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>12</Month>
<Day>02</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1367-4811</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>30</Volume>
<Issue>24</Issue>
<PubDate>
<Year>2014</Year>
<Month>Dec</Month>
<Day>15</Day>
</PubDate>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
<ISOAbbreviation>Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>KmerStream: streaming algorithms for k-mer abundance estimation.</ArticleTitle>
<Pagination>
<MedlinePgn>3541-7</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/btu713</ELocationID>
<Abstract>
<AbstractText Label="MOTIVATION" NlmCategory="BACKGROUND">Several applications in bioinformatics, such as genome assemblers and error correction methods, rely on counting and keeping track of k-mers (substrings of length k). Histograms of k-mer frequencies can give valuable insight into the underlying distribution and indicate the error rate and genome size sampled in the sequencing experiment.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">We present KmerStream, a streaming algorithm for estimating the number of distinct k-mers present in high-throughput sequencing data. The algorithm runs in time linear in the size of the input and the space requirement is logarithmic in the size of the input. We derive a simple model that allows us to estimate the error rate of the sequencing experiment, as well as the genome size, using only the aggregate statistics reported by KmerStream. As an application we show how KmerStream can be used to compute the error rate of a DNA sequencing experiment. We run KmerStream on a set of 2656 whole genome sequenced individuals and compare the error rate to quality values reported by the sequencing equipment. We discover that while the quality values alone are largely reliable as a predictor of error rate, there is considerable variability in the error rates between sequencing runs, even when accounting for reported quality values.</AbstractText>
<CopyrightInformation>© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Melsted</LastName>
<ForeName>Páll</ForeName>
<Initials>P</Initials>
<AffiliationInfo>
<Affiliation>Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland, deCODE Genetics/Amgen, Reykjavík, Iceland and School of Science and Engineering, Reykjavík University, Reykjavík, Iceland.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Halldórsson</LastName>
<ForeName>Bjarni V</ForeName>
<Initials>BV</Initials>
<AffiliationInfo>
<Affiliation>Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland, deCODE Genetics/Amgen, Reykjavík, Iceland and School of Science and Engineering, Reykjavík University, Reykjavík, Iceland.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2014</Year>
<Month>10</Month>
<Day>28</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Bioinformatics</MedlineTA>
<NlmUniqueID>9808944</NlmUniqueID>
<ISSNLinking>1367-4803</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="Y">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D059646" MajorTopicYN="N">Genome Size</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D015894" MajorTopicYN="N">Genome, Human</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D023281" MajorTopicYN="N">Genomics</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="N">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D059014" MajorTopicYN="N">High-Throughput Nucleotide Sequencing</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D012984" MajorTopicYN="N">Software</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="entrez">
<Year>2014</Year>
<Month>10</Month>
<Day>31</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2014</Year>
<Month>10</Month>
<Day>31</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2015</Year>
<Month>3</Month>
<Day>7</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">25355787</ArticleId>
<ArticleId IdType="pii">btu713</ArticleId>
<ArticleId IdType="doi">10.1093/bioinformatics/btu713</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
<affiliations>
<list>
<country>
<li>Islande</li>
</country>
</list>
<tree>
<country name="Islande">
<noRegion>
<name sortKey="Melsted, Pall" sort="Melsted, Pall" uniqKey="Melsted P" first="Páll" last="Melsted">Páll Melsted</name>
</noRegion>
<name sortKey="Halld Rsson, Bjarni V" sort="Halld Rsson, Bjarni V" uniqKey="Halld Rsson B" first="Bjarni V" last="Halld Rsson">Bjarni V. Halldórsson</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001859 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd -nk 001859 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Checkpoint
   |type=    RBID
   |clé=     pubmed:25355787
   |texte=   KmerStream: streaming algorithms for k-mer abundance estimation.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/RBID.i   -Sk "pubmed:25355787" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021