Turtle: identifying frequent k-mers with cache-efficient algorithms.
Internal identifier: 001A30 (PubMed/Corpus); previous: 001A29; next: 001A31
Authors: Rajat Shuvro Roy; Debashish Bhattacharya; Alexander Schliep
Source:
- Bioinformatics (Oxford, England) [ 1367-4811 ] ; 2014.
English descriptors
- KwdEn :
- MESH :
Abstract
Counting the frequencies of k-mers in read libraries is often a first step in the analysis of high-throughput sequencing data. Infrequent k-mers are assumed to be a result of sequencing errors. The frequent k-mers constitute a reduced but error-free representation of the experiment, which can inform read error correction or serve as the input to de novo assembly methods. Ideally, the memory requirement for counting should be linear in the number of frequent k-mers and not in the, typically much larger, total number of k-mers in the read library.
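The counting idea can be illustrated with a minimal Python sketch. It is not the Turtle implementation: the RESULTS text of this record describes a pattern-blocked Bloom filter followed by a sort-and-compact scheme, whereas the sketch below assumes an exact Python `set` as a stand-in for the Bloom filter (so it has no false positives and none of the memory savings), ignores reverse complements, and uses hypothetical names (`kmers`, `frequent_kmers`, `min_count`). It only shows how dropping first occurrences and then sorting and run-length compacting the remainder yields counts for k-mers seen at least twice.

```python
from itertools import groupby


def kmers(read, k):
    """Yield every length-k substring of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def frequent_kmers(reads, k, min_count=2):
    """Count k-mers that occur at least min_count times (min_count >= 2).

    Assumption: `seen_once` is an exact set standing in for a Bloom
    filter; a real Bloom filter would use less memory but admit some
    false positives.
    """
    seen_once = set()   # k-mers observed at least once
    candidates = []     # every occurrence after the first one

    for read in reads:
        for kmer in kmers(read, k):
            if kmer in seen_once:
                candidates.append(kmer)
            else:
                seen_once.add(kmer)

    # Sort-and-compact: sorting makes equal k-mers adjacent, so a single
    # linear pass can collapse each run into one (k-mer, count) entry.
    candidates.sort()
    counts = {}
    for kmer, run in groupby(candidates):
        # +1 restores the first occurrence absorbed by the filter step.
        counts[kmer] = sum(1 for _ in run) + 1

    return {kmer: c for kmer, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    print(frequent_kmers(["ACGTACGT", "ACGTTT"], k=4))
```

In this toy input only `ACGT` occurs more than once, so it is the only k-mer that survives the filter step and reaches the sort-and-compact pass.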
DOI: 10.1093/bioinformatics/btu132
PubMed: 24618471
Links to Exploration step
pubmed:24618471
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Turtle: identifying frequent k-mers with cache-efficient algorithms.</title>
<author><name sortKey="Roy, Rajat Shuvro" sort="Roy, Rajat Shuvro" uniqKey="Roy R" first="Rajat Shuvro" last="Roy">Rajat Shuvro Roy</name>
<affiliation><nlm:affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Bhattacharya, Debashish" sort="Bhattacharya, Debashish" uniqKey="Bhattacharya D" first="Debashish" last="Bhattacharya">Debashish Bhattacharya</name>
<affiliation><nlm:affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Schliep, Alexander" sort="Schliep, Alexander" uniqKey="Schliep A" first="Alexander" last="Schliep">Alexander Schliep</name>
<affiliation><nlm:affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2014">2014</date>
<idno type="RBID">pubmed:24618471</idno>
<idno type="pmid">24618471</idno>
<idno type="doi">10.1093/bioinformatics/btu132</idno>
<idno type="wicri:Area/PubMed/Corpus">001A30</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001A30</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Turtle: identifying frequent k-mers with cache-efficient algorithms.</title>
<author><name sortKey="Roy, Rajat Shuvro" sort="Roy, Rajat Shuvro" uniqKey="Roy R" first="Rajat Shuvro" last="Roy">Rajat Shuvro Roy</name>
<affiliation><nlm:affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Bhattacharya, Debashish" sort="Bhattacharya, Debashish" uniqKey="Bhattacharya D" first="Debashish" last="Bhattacharya">Debashish Bhattacharya</name>
<affiliation><nlm:affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Schliep, Alexander" sort="Schliep, Alexander" uniqKey="Schliep A" first="Alexander" last="Schliep">Alexander Schliep</name>
<affiliation><nlm:affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</nlm:affiliation>
</affiliation>
</author>
</analytic>
<series><title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint><date when="2014" type="published">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Genome, Human</term>
<term>High-Throughput Nucleotide Sequencing (methods)</term>
<term>Humans</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>High-Throughput Nucleotide Sequencing</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Genome, Human</term>
<term>Humans</term>
<term>Software</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Counting the frequencies of k-mers in read libraries is often a first step in the analysis of high-throughput sequencing data. Infrequent k-mers are assumed to be a result of sequencing errors. The frequent k-mers constitute a reduced but error-free representation of the experiment, which can inform read error correction or serve as the input to de novo assembly methods. Ideally, the memory requirement for counting should be linear in the number of frequent k-mers and not in the, typically much larger, total number of k-mers in the read library.</div>
</front>
</TEI>
<pubmed><MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM"><PMID Version="1">24618471</PMID>
<DateCompleted><Year>2014</Year>
<Month>09</Month>
<Day>18</Day>
</DateCompleted>
<DateRevised><Year>2018</Year>
<Month>12</Month>
<Day>02</Day>
</DateRevised>
<Article PubModel="Print-Electronic"><Journal><ISSN IssnType="Electronic">1367-4811</ISSN>
<JournalIssue CitedMedium="Internet"><Volume>30</Volume>
<Issue>14</Issue>
<PubDate><Year>2014</Year>
<Month>Jul</Month>
<Day>15</Day>
</PubDate>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
<ISOAbbreviation>Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>Turtle: identifying frequent k-mers with cache-efficient algorithms.</ArticleTitle>
<Pagination><MedlinePgn>1950-7</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/btu132</ELocationID>
<Abstract><AbstractText Label="MOTIVATION" NlmCategory="BACKGROUND">Counting the frequencies of k-mers in read libraries is often a first step in the analysis of high-throughput sequencing data. Infrequent k-mers are assumed to be a result of sequencing errors. The frequent k-mers constitute a reduced but error-free representation of the experiment, which can inform read error correction or serve as the input to de novo assembly methods. Ideally, the memory requirement for counting should be linear in the number of frequent k-mers and not in the, typically much larger, total number of k-mers in the read library.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">We present a novel method that balances time, space and accuracy requirements to efficiently extract frequent k-mers even for high-coverage libraries and large genomes such as human. Our method is designed to minimize cache misses by using a pattern-blocked Bloom filter to remove infrequent k-mers from consideration, in combination with a novel sort-and-compact scheme, instead of a hash, for the actual counting. Although this increases theoretical complexity, the savings in cache misses reduce the empirical running times. A variant of the method can resort to a counting Bloom filter for even larger savings in memory, at the expense of false-negative rates in addition to the false-positive rates common to all Bloom filter-based approaches. A comparison with the state-of-the-art shows reduced memory requirements and running times.</AbstractText>
<AbstractText Label="AVAILABILITY AND IMPLEMENTATION" NlmCategory="METHODS">The tools are freely available for download at http://bioinformatics.rutgers.edu/Software/Turtle and http://figshare.com/articles/Turtle/791582.</AbstractText>
<CopyrightInformation>© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Roy</LastName>
<ForeName>Rajat Shuvro</ForeName>
<Initials>RS</Initials>
<AffiliationInfo><Affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Bhattacharya</LastName>
<ForeName>Debashish</ForeName>
<Initials>D</Initials>
<AffiliationInfo><Affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Schliep</LastName>
<ForeName>Alexander</ForeName>
<Initials>A</Initials>
<AffiliationInfo><Affiliation>Department of Computer Science, Department of Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences and BioMaPS Institute for Quantitative Biology, Rutgers University, New Brunswick, NJ 08901, USA.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList><PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013486">Research Support, U.S. Gov't, Non-P.H.S.</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic"><Year>2014</Year>
<Month>03</Month>
<Day>10</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo><Country>England</Country>
<MedlineTA>Bioinformatics</MedlineTA>
<NlmUniqueID>9808944</NlmUniqueID>
<ISSNLinking>1367-4803</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList><MeshHeading><DescriptorName UI="D000465" MajorTopicYN="Y">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D015894" MajorTopicYN="N">Genome, Human</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D059014" MajorTopicYN="N">High-Throughput Nucleotide Sequencing</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D012984" MajorTopicYN="Y">Software</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="entrez"><Year>2014</Year>
<Month>3</Month>
<Day>13</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed"><Year>2014</Year>
<Month>3</Month>
<Day>13</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2014</Year>
<Month>9</Month>
<Day>19</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList><ArticleId IdType="pubmed">24618471</ArticleId>
<ArticleId IdType="pii">btu132</ArticleId>
<ArticleId IdType="doi">10.1093/bioinformatics/btu132</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001A30 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001A30 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= PubMed |étape= Corpus |type= RBID |clé= pubmed:24618471 |texte= Turtle: identifying frequent k-mers with cache-efficient algorithms. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i -Sk "pubmed:24618471" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd \
       | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.