MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.

Internal identifier: 001F12 (PubMed/Corpus); previous: 001F11; next: 001F13


Authors: Guillaume Marçais; Carl Kingsford

Source: Bioinformatics (Oxford, England), 2011 Mar 15; 27(6): 764-70

RBID : pubmed:21217122


Abstract

Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and memory intensive. At the same time, large multicore computers have become commonplace in research facilities allowing for a new parallel computational paradigm.
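The counting problem defined above can be made concrete with a minimal, single-threaded Python sketch. It is a plain baseline, not the Jellyfish algorithm: every overlapping window of length k is extracted and tallied in a dictionary keyed by the k-mer string. The helper name `count_kmers` is hypothetical.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of seq, overlapping windows included."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A string of length n has n - k + 1 windows of length k (none when k > n).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


counts = count_kmers("ACGTACGT", 3)
print(sorted(counts.items()))  # [('ACG', 2), ('CGT', 2), ('GTA', 1), ('TAC', 1)]
```

In this baseline, memory grows with the number of distinct k-mers, each held as a full string, and only one core does the work; those are the two limitations (memory and speed) that the abstract attributes to existing tools on deep-coverage data.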

DOI: 10.1093/bioinformatics/btr011
PubMed: 21217122

Links to Exploration step

pubmed:21217122

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.</title>
<author>
<name sortKey="Marcais, Guillaume" sort="Marcais, Guillaume" uniqKey="Marcais G" first="Guillaume" last="Marçais">Guillaume Marçais</name>
<affiliation>
<nlm:affiliation>Department of Computer Science, University of Maryland, College Park, MD 20742, USA. gmarcais@umd.edu</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2011">2011</date>
<idno type="RBID">pubmed:21217122</idno>
<idno type="pmid">21217122</idno>
<idno type="doi">10.1093/bioinformatics/btr011</idno>
<idno type="wicri:Area/PubMed/Corpus">001F12</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001F12</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.</title>
<author>
<name sortKey="Marcais, Guillaume" sort="Marcais, Guillaume" uniqKey="Marcais G" first="Guillaume" last="Marçais">Guillaume Marçais</name>
<affiliation>
<nlm:affiliation>Department of Computer Science, University of Maryland, College Park, MD 20742, USA. gmarcais@umd.edu</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
</author>
</analytic>
<series>
<title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2011" type="published">2011</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Animals</term>
<term>Base Sequence</term>
<term>Computational Biology (methods)</term>
<term>Genome</term>
<term>Humans</term>
<term>Sequence Alignment</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Computational Biology</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Animals</term>
<term>Base Sequence</term>
<term>Genome</term>
<term>Humans</term>
<term>Sequence Alignment</term>
<term>Software</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and memory intensive. At the same time, large multicore computers have become commonplace in research facilities allowing for a new parallel computational paradigm.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM">
<PMID Version="1">21217122</PMID>
<DateCompleted>
<Year>2011</Year>
<Month>05</Month>
<Day>31</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>12</Month>
<Day>01</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1367-4811</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>27</Volume>
<Issue>6</Issue>
<PubDate>
<Year>2011</Year>
<Month>Mar</Month>
<Day>15</Day>
</PubDate>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
<ISOAbbreviation>Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.</ArticleTitle>
<Pagination>
<MedlinePgn>764-70</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/btr011</ELocationID>
<Abstract>
<AbstractText Label="MOTIVATION" NlmCategory="BACKGROUND">Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and memory intensive. At the same time, large multicore computers have become commonplace in research facilities allowing for a new parallel computational paradigm.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">We propose a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Due to their flexibility, suffix arrays have been the data structure of choice for solving many string problems. For the task of k-mer counting, important in many biological applications, Jellyfish offers a much faster and more memory-efficient solution.</AbstractText>
<AbstractText Label="AVAILABILITY" NlmCategory="BACKGROUND">The Jellyfish software is written in C++ and is GPL licensed. It is available for download at http://www.cbcb.umd.edu/software/jellyfish.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Marçais</LastName>
<ForeName>Guillaume</ForeName>
<Initials>G</Initials>
<AffiliationInfo>
<Affiliation>Department of Computer Science, University of Maryland, College Park, MD 20742, USA. gmarcais@umd.edu</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Kingsford</LastName>
<ForeName>Carl</ForeName>
<Initials>C</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y">
<Grant>
<GrantID>R21 AI085376</GrantID>
<Acronym>AI</Acronym>
<Agency>NIAID NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant>
<GrantID>1R01HG0294501</GrantID>
<Acronym>HG</Acronym>
<Agency>NHGRI NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant>
<GrantID>1R21AI085376</GrantID>
<Acronym>AI</Acronym>
<Agency>NIAID NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D052061">Research Support, N.I.H., Extramural</PublicationType>
<PublicationType UI="D013486">Research Support, U.S. Gov't, Non-P.H.S.</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2011</Year>
<Month>01</Month>
<Day>07</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Bioinformatics</MedlineTA>
<NlmUniqueID>9808944</NlmUniqueID>
<ISSNLinking>1367-4803</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="Y">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D000818" MajorTopicYN="N">Animals</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D001483" MajorTopicYN="N">Base Sequence</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D019295" MajorTopicYN="N">Computational Biology</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016678" MajorTopicYN="N">Genome</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016415" MajorTopicYN="N">Sequence Alignment</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D012984" MajorTopicYN="Y">Software</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="entrez">
<Year>2011</Year>
<Month>1</Month>
<Day>11</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2011</Year>
<Month>1</Month>
<Day>11</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2011</Year>
<Month>6</Month>
<Day>1</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">21217122</ArticleId>
<ArticleId IdType="pii">btr011</ArticleId>
<ArticleId IdType="doi">10.1093/bioinformatics/btr011</ArticleId>
<ArticleId IdType="pmc">PMC3051319</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>Nucleic Acids Res. 2004;32(5):1792-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15034147</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2005 Mar 1;21(5):582-8</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15374857</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2008 Dec 15;24(24):2818-24</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18952627</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Genomics. 2008;9:517</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18976482</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Phys Rev E Stat Nonlin Soft Matter Phys. 2008 Dec;78(6 Pt 1):061912</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19256873</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2010 Jan 21;463(7279):311-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20010809</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2003 Oct;13(10):2306-15</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12975312</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS Biol. 2010;8(9). pii: e1000475. doi: 10.1371/journal.pbio.1000475</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20838655</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol. 2010;11(11):R116</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21114842</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Science. 2000 Mar 24;287(5461):2196-204</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">10731133</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2003 Jan;13(1):91-6</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12529310</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2003 Feb 12;19(3):319-26</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12584116</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2010 Sep;20(9):1165-73</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20508146</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
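The RESULTS paragraph of the record above limits k-mers to 31 bases. One way to see why a bound just under 32 is natural: with a 2-bit code per nucleotide, a 31-mer needs 62 bits and fits in a single 64-bit integer. The record does not say how Jellyfish stores its keys, so the code below (A=0, C=1, G=2, T=3, first base in the high bits) and the helper names `encode`, `decode` and `count_packed_kmers` are assumptions for illustration only.

```python
# Hypothetical 2-bit nucleotide code (an assumption, not taken from the record).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"
MAX_K = 31  # 2 * 31 = 62 bits, which fits in one 64-bit word


def encode(kmer: str) -> int:
    """Pack a k-mer into an integer, 2 bits per base, first base in the high bits."""
    if not 0 < len(kmer) <= MAX_K:
        raise ValueError(f"k must be between 1 and {MAX_K}")
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]  # raises KeyError on non-ACGT characters
    return value


def decode(value: int, k: int) -> str:
    """Unpack an integer produced by encode() back into a k-mer string."""
    bases = []
    for _ in range(k):
        bases.append(BASES[value & 0b11])  # lowest 2 bits hold the last base
        value >>= 2
    return "".join(reversed(bases))


def count_packed_kmers(seq: str, k: int) -> dict[int, int]:
    """Count k-mers using packed integer keys and a rolling 2k-bit window."""
    if not 0 < k <= MAX_K:
        raise ValueError(f"k must be between 1 and {MAX_K}")
    mask = (1 << (2 * k)) - 1  # keeps only the 2k bits of the current window
    counts: dict[int, int] = {}
    value = 0
    for i, base in enumerate(seq):
        # Shift in the new base and drop the base that left the window.
        value = ((value << 2) | CODE[base]) & mask
        if i >= k - 1:  # the first complete window ends at index k - 1
            counts[value] = counts.get(value, 0) + 1
    return counts


print(encode("ACGT"))                       # 27
print(encode("T" * MAX_K).bit_length())     # 62
```

The rolling window updates each key in constant time instead of slicing a new substring per position, and integer keys are far smaller than string keys; a production counter would still need a compact, concurrent hash table, which this sketch does not provide.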
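The RESULTS paragraph also describes the counter as a multithreaded, lock-free hash table. A common lock-free pattern for such a table is open addressing in which threads claim an empty slot with compare-and-swap (CAS) and then increment the slot's counter with a CAS retry loop. The Python sketch below shows only that pattern; it is not the Jellyfish implementation. Python's standard library exposes no atomic CAS, so `_cas` emulates one with a `threading.Lock`, which means this class is NOT actually lock-free. The class name, linear reprobing, table size and reprobe limit are all hypothetical choices.

```python
import threading


class CasCountingTable:
    """Open-addressing counting table whose slots change only through compare-and-swap.

    Hypothetical sketch: _cas emulates one atomic CAS with a lock, so the retry
    logic is written in lock-free style but the class itself is not lock-free.
    """

    EMPTY = None

    def __init__(self, size: int, max_reprobe: int = 16) -> None:
        self.size = size
        self.max_reprobe = max_reprobe
        self.keys = [self.EMPTY] * size
        self.values = [0] * size
        self._atomic = threading.Lock()  # stands in for a hardware CAS instruction

    def _cas(self, array: list, index: int, expected, new) -> bool:
        """Set array[index] = new only if it still equals expected; report success."""
        with self._atomic:
            if array[index] == expected:
                array[index] = new
                return True
            return False

    def add(self, key) -> None:
        """Increment the count of key, claiming a free slot the first time it is seen."""
        start = hash(key) % self.size
        for probe in range(self.max_reprobe):
            i = (start + probe) % self.size  # linear reprobing (an assumption)
            current = self.keys[i]
            if current is self.EMPTY:
                if self._cas(self.keys, i, self.EMPTY, key):
                    current = key
                else:
                    current = self.keys[i]  # another thread claimed the slot first
            if current == key:
                while True:  # retry until no other thread raced this increment
                    old = self.values[i]
                    if self._cas(self.values, i, old, old + 1):
                        return
        raise RuntimeError("too many reprobes: table is too full")

    def get(self, key) -> int:
        """Return the count of key, or 0 if it was never added."""
        start = hash(key) % self.size
        for probe in range(self.max_reprobe):
            i = (start + probe) % self.size
            if self.keys[i] is self.EMPTY:
                return 0  # an empty slot ends the probe sequence
            if self.keys[i] == key:
                return self.values[i]
        return 0


def worker(table: CasCountingTable, kmers: list) -> None:
    for kmer in kmers:
        table.add(kmer)


table = CasCountingTable(size=64)
threads = [threading.Thread(target=worker, args=(table, ["ACG", "CGT", "ACG"])) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(table.get("ACG"), table.get("CGT"), table.get("TAC"))  # 8 4 0
```

Slots are never freed, so once a key occupies a slot any thread can rely on it staying there; that invariant is what lets the claim step and the increment step be separate CAS operations without a lock around the whole update.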

To work with this document on Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001F12 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001F12 | SxmlIndent | more

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:21217122
   |texte=   A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:21217122" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021