MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Turtle: identifying frequent k-mers with cache-efficient algorithms.

Internal identifier: 001A30 (PubMed/Corpus); previous: 001A29; next: 001A31

Authors: Rajat Shuvro Roy; Debashish Bhattacharya; Alexander Schliep

Source: Bioinformatics (Oxford, England) [eISSN 1367-4811]; 2014

RBID : pubmed:24618471

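English descriptors

Algorithms; Genome, Human; High-Throughput Nucleotide Sequencing (methods); Humans; Sequence Analysis, DNA (methods); Software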

Abstract

Counting the frequencies of k-mers in read libraries is often a first step in the analysis of high-throughput sequencing data. Infrequent k-mers are assumed to be a result of sequencing errors. The frequent k-mers constitute a reduced but error-free representation of the experiment, which can inform read error correction or serve as the input to de novo assembly methods. Ideally, the memory requirement for counting should be linear in the number of frequent k-mers and not in the, typically much larger, total number of k-mers in the read library.
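
To make the task concrete, here is a minimal Python sketch of exact k-mer counting followed by a frequency cut-off. It is an illustration only, not the Turtle implementation: the function names, the `min_count` threshold and the toy reads are assumptions, and reverse complements are not merged. Because it stores a counter for every distinct k-mer, its memory grows with the total number of k-mers, which is the cost the abstract says an ideal method would avoid.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (exact, hash-based)."""
    counts = Counter()
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequent_kmers(reads, k, min_count=2):
    """Keep only k-mers seen at least min_count times (hypothetical threshold)."""
    counts = count_kmers(reads, k)
    return {kmer: c for kmer, c in counts.items() if c >= min_count}


# Toy input, invented for illustration.
reads = ["ACGTAC", "CGTACG", "ACGTTT"]
all_counts = count_kmers(reads, k=3)
freq = frequent_kmers(reads, k=3, min_count=2)
```

In this toy input the k-mers seen only once (`GTT`, `TTT`) are dropped, mirroring the assumption that infrequent k-mers stem from sequencing errors.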

DOI: 10.1093/bioinformatics/btu132
PubMed: 24618471

Links to Exploration step

pubmed:24618471

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Turtle: identifying frequent k-mers with cache-efficient algorithms.</title>
<author>
<name sortKey="Roy, Rajat Shuvro" sort="Roy, Rajat Shuvro" uniqKey="Roy R" first="Rajat Shuvro" last="Roy">Rajat Shuvro Roy</name>
<affiliation>
<nlm:affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Bhattacharya, Debashish" sort="Bhattacharya, Debashish" uniqKey="Bhattacharya D" first="Debashish" last="Bhattacharya">Debashish Bhattacharya</name>
<affiliation>
<nlm:affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Schliep, Alexander" sort="Schliep, Alexander" uniqKey="Schliep A" first="Alexander" last="Schliep">Alexander Schliep</name>
<affiliation>
<nlm:affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2014">2014</date>
<idno type="RBID">pubmed:24618471</idno>
<idno type="pmid">24618471</idno>
<idno type="doi">10.1093/bioinformatics/btu132</idno>
<idno type="wicri:Area/PubMed/Corpus">001A30</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001A30</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Turtle: identifying frequent k-mers with cache-efficient algorithms.</title>
<author>
<name sortKey="Roy, Rajat Shuvro" sort="Roy, Rajat Shuvro" uniqKey="Roy R" first="Rajat Shuvro" last="Roy">Rajat Shuvro Roy</name>
<affiliation>
<nlm:affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Bhattacharya, Debashish" sort="Bhattacharya, Debashish" uniqKey="Bhattacharya D" first="Debashish" last="Bhattacharya">Debashish Bhattacharya</name>
<affiliation>
<nlm:affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Schliep, Alexander" sort="Schliep, Alexander" uniqKey="Schliep A" first="Alexander" last="Schliep">Alexander Schliep</name>
<affiliation>
<nlm:affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</nlm:affiliation>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2014" type="published">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Genome, Human</term>
<term>High-Throughput Nucleotide Sequencing (methods)</term>
<term>Humans</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>High-Throughput Nucleotide Sequencing</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Genome, Human</term>
<term>Humans</term>
<term>Software</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Counting the frequencies of k-mers in read libraries is often a first step in the analysis of high-throughput sequencing data. Infrequent k-mers are assumed to be a result of sequencing errors. The frequent k-mers constitute a reduced but error-free representation of the experiment, which can inform read error correction or serve as the input to de novo assembly methods. Ideally, the memory requirement for counting should be linear in the number of frequent k-mers and not in the, typically much larger, total number of k-mers in the read library.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM">
<PMID Version="1">24618471</PMID>
<DateCompleted>
<Year>2014</Year>
<Month>09</Month>
<Day>18</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>12</Month>
<Day>02</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1367-4811</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>30</Volume>
<Issue>14</Issue>
<PubDate>
<Year>2014</Year>
<Month>Jul</Month>
<Day>15</Day>
</PubDate>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
<ISOAbbreviation>Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>Turtle: identifying frequent k-mers with cache-efficient algorithms.</ArticleTitle>
<Pagination>
<MedlinePgn>1950-7</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/btu132</ELocationID>
<Abstract>
<AbstractText Label="MOTIVATION" NlmCategory="BACKGROUND">Counting the frequencies of k-mers in read libraries is often a first step in the analysis of high-throughput sequencing data. Infrequent k-mers are assumed to be a result of sequencing errors. The frequent k-mers constitute a reduced but error-free representation of the experiment, which can inform read error correction or serve as the input to de novo assembly methods. Ideally, the memory requirement for counting should be linear in the number of frequent k-mers and not in the, typically much larger, total number of k-mers in the read library.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">We present a novel method that balances time, space and accuracy requirements to efficiently extract frequent k-mers even for high-coverage libraries and large genomes such as human. Our method is designed to minimize cache misses in a cache-efficient manner by using a pattern-blocked Bloom filter to remove infrequent k-mers from consideration in combination with a novel sort-and-compact scheme, instead of a hash, for the actual counting. Although this increases theoretical complexity, the savings in cache misses reduce the empirical running times. A variant of the method can resort to a counting Bloom filter for even larger savings in memory at the expense of false-negative rates in addition to the false-positive rates common to all Bloom filter-based approaches. A comparison with the state-of-the-art shows reduced memory requirements and running times.</AbstractText>
<AbstractText Label="AVAILABILITY AND IMPLEMENTATION" NlmCategory="METHODS">The tools are freely available for download at http://bioinformatics.rutgers.edu/Software/Turtle and http://figshare.com/articles/Turtle/791582.</AbstractText>
<CopyrightInformation>© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Roy</LastName>
<ForeName>Rajat Shuvro</ForeName>
<Initials>RS</Initials>
<AffiliationInfo>
<Affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Bhattacharya</LastName>
<ForeName>Debashish</ForeName>
<Initials>D</Initials>
<AffiliationInfo>
<Affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Schliep</LastName>
<ForeName>Alexander</ForeName>
<Initials>A</Initials>
<AffiliationInfo>
<Affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013486">Research Support, U.S. Gov't, Non-P.H.S.</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2014</Year>
<Month>03</Month>
<Day>10</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Bioinformatics</MedlineTA>
<NlmUniqueID>9808944</NlmUniqueID>
<ISSNLinking>1367-4803</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="Y">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D015894" MajorTopicYN="N">Genome, Human</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D059014" MajorTopicYN="N">High-Throughput Nucleotide Sequencing</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D012984" MajorTopicYN="Y">Software</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="entrez">
<Year>2014</Year>
<Month>3</Month>
<Day>13</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2014</Year>
<Month>3</Month>
<Day>13</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2014</Year>
<Month>9</Month>
<Day>19</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">24618471</ArticleId>
<ArticleId IdType="pii">btu132</ArticleId>
<ArticleId IdType="doi">10.1093/bioinformatics/btu132</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>
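
The RESULTS text in the record above describes filtering infrequent k-mers with a Bloom filter and then counting by sorting and compacting rather than hashing. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' code. It uses a plain Bloom filter (not the pattern-blocked variant the abstract names), sorts once at the end, and all class, function and parameter names (`BloomFilter`, `sort_and_compact`, `num_bits`, `num_hashes`) are hypothetical.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; an assumed stand-in for the pattern-blocked variant."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from one digest (double hashing).
        digest = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add_and_check(self, item):
        """Insert item; return True if it already appeared to be present."""
        seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


def sort_and_compact(kmers):
    """Sort k-mers, then merge runs of equal k-mers into [kmer, count] pairs."""
    compacted = []
    for kmer in sorted(kmers):
        if compacted and compacted[-1][0] == kmer:
            compacted[-1][1] += 1
        else:
            compacted.append([kmer, 1])
    return compacted


def frequent_kmers_bloom(reads, k, num_bits=1 << 16, num_hashes=3):
    """Report k-mers seen at least twice, with approximate counts."""
    bloom = BloomFilter(num_bits, num_hashes)
    candidates = []
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Only k-mers the filter has (apparently) seen before are stored.
            if bloom.add_and_check(kmer):
                candidates.append(kmer)
    # The first sighting went only into the filter, so add 1 to each count.
    return {kmer: c + 1 for kmer, c in sort_and_compact(candidates)}


result = frequent_kmers_bloom(["ACGTAC", "CGTACG", "ACGTTT"], k=3)
```

In this sketch the candidate list only grows with repeated k-mers, so singletons cost filter bits rather than list entries. A Bloom filter false positive can let a singleton through or inflate a count by one, which corresponds to the false-positive behaviour the abstract attributes to all Bloom filter-based approaches.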

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001A30 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001A30 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:24618471
   |texte=   Turtle: identifying frequent k-mers with cache-efficient algorithms.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:24618471" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021